<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>oneminusp.com</title>
	
	<link>http://oneminusp.com</link>
	<description>Computational Finance, Markets, Programming &amp; co</description>
	<lastBuildDate>Thu, 11 Mar 2010 10:27:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/Oneminuspcom" /><feedburner:info uri="oneminuspcom" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Empirical probabilities Entropy function</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/uAasG6745og/empirical-probabilities-entropy-function.html</link>
		<comments>http://oneminusp.com/code/empirical-probabilities-entropy-function.html#comments</comments>
		<pubDate>Thu, 11 Mar 2010 10:27:16 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[entropy]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=156</guid>
		<description><![CDATA[Just added this simple but often useful entropy function which can be run on any data set with multiple occurrences of symbols. Say your data is the vector
c(1,1,1,2,2,2,3,3,3)
then
entropy.count
calculates the Shannon entropy using empirical/maximum likelihood probabilities for all unique symbols 1,2,3.
entropy.count &#60;- function(entry) {
	counts &#60;- lapply(split(entry, as.factor(entry)), length)
	counts &#60;- unlist(counts)
	ps &#60;- counts / sum(counts)
	entropy(ps)
}
The code is [...]]]></description>
			<content:encoded><![CDATA[<p>Just added this simple but often useful entropy function which can be run on any data set with multiple occurrences of symbols. Say your data is the vector</p>
<pre>c(1,1,1,2,2,2,3,3,3)</pre>
<p>then</p>
<pre>entropy.count</pre>
<p>calculates the Shannon entropy using empirical/maximum likelihood probabilities for all unique symbols 1,2,3.</p>
<pre>entropy.count &lt;- function(entry) {
	counts &lt;- lapply(split(entry, as.factor(entry)), length)
	counts &lt;- unlist(counts)
	ps &lt;- counts / sum(counts)
	entropy(ps)
}</pre>
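<p>For illustration, here is a minimal self-contained sketch of the same calculation that does not rely on the <em>entropy</em> helper. The name <em>shannon_plugin</em> and its <em>base</em> argument are assumptions made for this example, they are not part of the code above.</p>
<pre># shannon_plugin is a hypothetical stand-alone helper, for illustration only
shannon_plugin &lt;- function(symbols, base = 2) {
  # table() counts the occurrences of every unique symbol
  counts &lt;- as.numeric(table(symbols))
  # empirical / maximum likelihood probabilities
  ps &lt;- counts / sum(counts)
  -sum(ps * log(ps, base = base))
}

# three equally frequent symbols give log2(3) bits
shannon_plugin(c(1,1,1,2,2,2,3,3,3))</pre>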
<p>The code is now also available through the "Papers / Code" tab above.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/code/empirical-probabilities-entropy-function.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/code/empirical-probabilities-entropy-function.html</feedburner:origLink></item>
		<item>
		<title>Open Source Information theory frameworks</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/HyeszBE2FSI/open-source-information-theory-frameworks.html</link>
		<comments>http://oneminusp.com/code/open-source-information-theory-frameworks.html#comments</comments>
		<pubDate>Thu, 18 Feb 2010 12:51:56 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=147</guid>
		<description><![CDATA[I found two frameworks (for languages I'm interested in) which provide a range of entropy estimators and other information theoretical measures:
For python there is pyentropy
For R there is the entropy package by Hausser and Strimmer.
If you know of others, please let me know.
Also, I will soon add an additional page to the blog where you [...]]]></description>
			<content:encoded><![CDATA[<p>I found two frameworks (for languages I'm interested in) which provide a range of entropy estimators and other information theoretical measures:</p>
<p>For Python there is <a href="http://code.google.com/p/pyentropy/">pyentropy</a>.</p>
<p>For R there is the <a href="http://strimmerlab.org/software/entropy/">entropy package</a> by Hausser and Strimmer.</p>
<p>If you know of others, please let me know.</p>
<p>Also, I will soon add a page to the blog where you can download all the source code presented in these posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/code/open-source-information-theory-frameworks.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/code/open-source-information-theory-frameworks.html</feedburner:origLink></item>
		<item>
		<title>Entropy estimators and predictability</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/1QXryPL252w/entropy-estimators-and-predictability.html</link>
		<comments>http://oneminusp.com/quant/entropy-estimators-and-predictability.html#comments</comments>
		<pubDate>Sat, 13 Feb 2010 01:54:15 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Quant]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[bias]]></category>
		<category><![CDATA[entropy estimation]]></category>
		<category><![CDATA[estimation]]></category>
		<category><![CDATA[predictability]]></category>
		<category><![CDATA[sampling]]></category>
		<category><![CDATA[variance reduction]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=127</guid>
		<description><![CDATA[In previous posts I discussed the local uncertainty h_n^(1) and the block entropy H_n. We also saw the rapid decrease in the H_n uncertainty -- this is due to sampling errors. With larger n our empirical probability estimate n_i / n gets worse because it would require more samples to "fill up the histogram", i.e. there are missing ngrams [...]]]></description>
			<content:encoded><![CDATA[<p>In previous posts I discussed the local uncertainty <img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}' title='h_n^{(1)}' class='latex' /> and the block entropy <img src='http://s.wordpress.com/latex.php?latex=H_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H_n' title='H_n' class='latex' />. We also saw the rapid decrease in the <img src='http://s.wordpress.com/latex.php?latex=H_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H_n' title='H_n' class='latex' /> uncertainty -- this is due to sampling errors. With larger <em>n</em> our empirical probability estimate <img src='http://s.wordpress.com/latex.php?latex=n_i%20%2F%20n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n_i / n' title='n_i / n' class='latex' /> gets worse because it would require more samples to "fill up the histogram", i.e. there are missing ngrams and the ngrams that were seen have a poor probability estimate.</p>
<p>There's a vast number of papers and techniques on reducing the bias and variance of entropy estimates and I decided to write a few posts about this, with the aim of finding the best entropy estimators for our (local) uncertainty measure. With a suitable entropy estimator we will be able to analyse local predictabilities conditioned on a larger number of previous symbols with higher significance.</p>
<p>The estimator we used so far is called "plug-in" or maximum likelihood estimator and is defined as</p>
<p><img src='http://s.wordpress.com/latex.php?latex=%5Chat%7BH%7D%28X%29%20%3D%20-%20%5Csum_X%20%5Chat%7BP%7D%28x%29%20log%20%5Chat%7BP%7D%28x%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{H}(X) = - \sum_X \hat{P}(x) log \hat{P}(x)' title='\hat{H}(X) = - \sum_X \hat{P}(x) log \hat{P}(x)' class='latex' /></p>
<p>where <img src='http://s.wordpress.com/latex.php?latex=%5Chat%7BP%7D%28x%29%20%3D%20n_x%20%2F%20n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\hat{P}(x) = n_x / n' title='\hat{P}(x) = n_x / n' class='latex' />, i.e. the number of occurrences of the word <em>x</em> divided by the total number of observations <em>n</em>. It is well known that the maximum likelihood estimator is negatively biased. What does that mean?</p>
<p><span id="more-127"></span></p>
<p>Bias is the difference between the expected value of the estimator and the true value, i.e.</p>
<p><img src='http://s.wordpress.com/latex.php?latex=Bias%5B%5Chat%7BH%7D%5D%20%3D%20E%5B%5Chat%7BH%7D%5D%20-%20H&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Bias[\hat{H}] = E[\hat{H}] - H' title='Bias[\hat{H}] = E[\hat{H}] - H' class='latex' /></p>
<p>Thus it is a measure of how accurate the estimator is on average. The negative bias of the MLE means that it systematically underestimates the true entropy of a system. Let us check if we can find this behaviour in data -- theory is nice but data is what counts.</p>
<p>We need to set up a system where we can easily compare the MLE with the true entropy, so it's easiest if we just sample from a uniform distribution with, e.g., 256 different observations, thus the true probability is <img src='http://s.wordpress.com/latex.php?latex=1%2F256&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='1/256' title='1/256' class='latex' /> for each observation; it easily follows that the entropy is then just <img src='http://s.wordpress.com/latex.php?latex=log%28256%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='log(256)' title='log(256)' class='latex' />, i.e. 8 bits with base-2 logarithms.</p>
<p>The function <em>biassim</em> below estimates the bias by repeating the following experiment <em>steps</em> times: draw <em>n</em> samples with replacement from a pool of 100'000 uniformly distributed integers between 1 and <em>m</em>, turn the counts of the observed values into empirical probabilities and calculate the entropy of that sample. Also notice that in the example below the number of samples and the number of possible words are equal, which is obviously not the case in general. Finally, we subtract the true entropy value <em>trueEV</em> from the estimates.</p>
<pre>biassim &lt;- function(n, m, steps) {
	# true entropy (in bits) of a uniform distribution over m symbols
	trueEV &lt;- log2(m)
	# pool of 100'000 uniformly distributed integers in 1..m
	uniformM &lt;- sample.int(m, 100000, replace = TRUE)
	f &lt;- function(i) {
		sampleN &lt;- sample(uniformM, n, replace = TRUE)
		# empirical probabilities: count of each observed symbol divided by n
		probN &lt;- as.numeric(table(sampleN)) / n
		entropy(probN[probN &gt; 0])
	}
	accu &lt;- sapply(1:steps, f)
	return(accu - trueEV)
}</pre>
<p>We apply the function above as follows:</p>
<pre>&gt; r &lt;- biassim(256,256,2000)
&gt; hist(r,breaks=100,xlim=0:-1,main="Bias and variance of MLE")</pre>
<p>The histogram should look similar to the plot below.</p>
<p><img class="alignnone size-full wp-image-135" title="Screen shot 2010-02-12 at 19.37.36" src="http://oneminusp.com/wp-content/uploads/2010/02/Screen-shot-2010-02-12-at-19.37.36.png" alt="Screen shot 2010-02-12 at 19.37.36" width="553" height="393" /></p>
<p>It shows the negative bias very nicely, with the red vertical line being the true value to be estimated.</p>
<p>If our naive maximum likelihood estimator has this systematic underestimation then surely we can easily fix this issue by "shifting" the estimates to the positive side? Yes and no. The simplest bias correction is known as the Miller-Madow correction and it suggests adding <img src='http://s.wordpress.com/latex.php?latex=%28m%20-%201%29%2F%282n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(m - 1)/(2n)' title='(m - 1)/(2n)' class='latex' /> to the estimate (in nats, i.e. when natural logarithms are used). The only problem is that generally <em>m</em>, the number of possible words, is unknown, so it would have to be estimated as well.</p>
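<p>As a rough sketch of what such a correction could look like in R: the helper name <em>mm_entropy</em> is hypothetical, and estimating <em>m</em> by the number of distinct symbols actually observed is an assumption made for this example, one of several possible choices.</p>
<pre># mm_entropy is a hypothetical helper: plug-in entropy in bits
# plus the Miller-Madow correction term
mm_entropy &lt;- function(symbols) {
  counts &lt;- as.numeric(table(symbols))
  n &lt;- sum(counts)
  ps &lt;- counts / n
  plugin &lt;- -sum(ps * log2(ps))
  # assumption: m is estimated by the number of distinct observed symbols
  m_hat &lt;- length(counts)
  # (m - 1) / (2n) is in nats; dividing by log(2) converts it to bits
  plugin + (m_hat - 1) / (2 * n * log(2))
}

mm_entropy(c(1,1,2,2))</pre>
<p>Because unseen symbols are not counted in this estimate of <em>m</em>, the correction is itself too small when the histogram is badly undersampled.</p>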
<p>In the next post we will apply the Miller-Madow correction along with a few other known estimators and analyse which one is most suitable for our purposes.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/quant/entropy-estimators-and-predictability.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/quant/entropy-estimators-and-predictability.html</feedburner:origLink></item>
		<item>
		<title>Local order and predictability: Significance testing</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/j7PAF1p5Uf8/local-order-and-predictabilitiy-significance-testing.html</link>
		<comments>http://oneminusp.com/quant/local-order-and-predictabilitiy-significance-testing.html#comments</comments>
		<pubDate>Fri, 29 Jan 2010 20:02:49 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Quant]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[significance]]></category>
		<category><![CDATA[surrogate]]></category>
		<category><![CDATA[uncertainty]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=110</guid>
		<description><![CDATA[The two previous posts described an implementation of a paper about finding local order (return patterns with higher than average predictability of the next symbol) in financial time series.
One important unanswered question so far is about the significance of the local uncertainties h_n(A_1 ... A_n). Does a deviation from almost no order ( &#62; 0.99) really mean [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://oneminusp.com/quant/local-order-and-predictability-of-financial-time-series.html">two</a> <a href="http://oneminusp.com/quant/local-order-and-predictability-implementation.html">previous</a> posts described an implementation of a paper about finding local order (return patterns with higher than average predictability of the next symbol) in financial time series.</p>
<p>One important unanswered question so far is about the significance of the local uncertainties <img src='http://s.wordpress.com/latex.php?latex=h_n%28A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n(A_1 \dots A_n)' title='h_n(A_1 \dots A_n)' class='latex' />. Does a deviation from almost no order ( &gt; 0.99) really mean something or is it due to imprecision/undersampling of the empirical probabilities? As the original paper notes, the larger the value we choose for <em>n</em>, i.e. the more previous trading days we consider to predict the next one, the more ngrams are possible and therefore the more samples we need to approximate the probabilities <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%28n%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{(n)}' title='p^{(n)}' class='latex' /> more or less accurately.</p>
<p>There are two ways to go:</p>
<ul>
<li>As in the original paper, use empirical probabilities and the basic plug-in entropy estimator and restrict <em>n</em> to at most 5, as their significance level K dictates (more on that below)</li>
<li>Experiment with larger <em>n</em>, including more sophisticated probability and entropy estimators</li>
</ul>
<p>We will do both. But for now I'll concentrate on the significance level K as introduced in the paper. A so-called surrogate sequence of length n is generated out of the partitioned time series. These surrogates have the same mean and standard deviation as the original sequence; you could see them as a random shuffling of the sequence with some further rules. The local uncertainties from the surrogates are called <img src='http://s.wordpress.com/latex.php?latex=h_n%5ES%28A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^S(A_1 \dots A_n)' title='h_n^S(A_1 \dots A_n)' class='latex' />. The significance level K is then calculated as:</p>
<p><img src='http://s.wordpress.com/latex.php?latex=K_n%28A_1%20%5Cdots%20A_n%29%20%3D%20%5Cvert%20%5Cfrac%7Bh_n%28A_1%20%5Cdots%20A_n%29%20-%20%5Clangle%20h_n%5ES%28A_1%20%5Cdots%20A_n%29%20%5Crangle%7D%7B%5Csigma_%7Bh_n%5ES%7D%7D%5Cvert&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K_n(A_1 \dots A_n) = \vert \frac{h_n(A_1 \dots A_n) - \langle h_n^S(A_1 \dots A_n) \rangle}{\sigma_{h_n^S}}\vert' title='K_n(A_1 \dots A_n) = \vert \frac{h_n(A_1 \dots A_n) - \langle h_n^S(A_1 \dots A_n) \rangle}{\sigma_{h_n^S}}\vert' class='latex' /></p>
<p><span id="more-110"></span></p>
<p>where <img src='http://s.wordpress.com/latex.php?latex=%5Clangle%20h_n%5ES%20%5Crangle&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\langle h_n^S \rangle' title='\langle h_n^S \rangle' class='latex' /> is the mean and <img src='http://s.wordpress.com/latex.php?latex=%5Csigma&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\sigma' title='\sigma' class='latex' /> is the standard deviation of the surrogates' local uncertainties. Intuitively, this is the difference between the actual uncertainty and the mean randomised one in units of the standard deviation of the surrogates. A K value larger than 2 would correspond to a confidence level greater than 95%, assuming roughly normally distributed surrogate values. Due to the exponential nature however we'll just say: the bigger K, the better. The reference for this test is <em>Characterisation of Low-Dimensional Dynamics in the Crayfish Caudal Photoreceptor</em>, Nature 379, 618.</p>
<p>It is not completely clear how this should get implemented. For now, I decided to create 4 surrogate sequences of length n, and build up the definition above from that.</p>
<pre>K &lt;- function(seq, slideseq, ngram) {
	# generate 4 surrogate sequences out of seq
	sur1 &lt;- surrogate(seq, length(ngram))
	sur2 &lt;- surrogate(seq, length(ngram))
	sur3 &lt;- surrogate(seq, length(ngram))
	sur4 &lt;- surrogate(seq, length(ngram))
	# local uncertainty of the ngram in each surrogate (alphabet size 3)
	p1 &lt;- h_n_cond(sur1, ngram,3)
	p2 &lt;- h_n_cond(sur2, ngram,3)
	p3 &lt;- h_n_cond(sur3, ngram,3)
	p4 &lt;- h_n_cond(sur4, ngram,3)

	# standard deviation and mean of the surrogate uncertainties
	s &lt;- sd(c(p1,p2,p3,p4))
	m &lt;- mean(c(p1,p2,p3,p4))
	# local uncertainty of the ngram in the actual data
	p &lt;- h_n_cond(slideseq, ngram,3)
	# distance from the surrogate mean in units of the surrogate standard deviation
	return( abs( (p - m) / s) )
}</pre>
<p>The length of the alphabet is fixed to 3 here in the code, but you can change that easily.</p>
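<p>The last line of <em>K</em> is simply a z-score of the observed value against the surrogate values. A minimal stand-alone sketch of that step follows; the helper name <em>significance_K</em> is hypothetical and the numbers are made up for illustration, they are not results from the DJI data.</p>
<pre># significance_K is a hypothetical helper: distance of the observed value
# from the surrogate mean, in units of the surrogate standard deviation
significance_K &lt;- function(observed, surrogate_values) {
  abs((observed - mean(surrogate_values)) / sd(surrogate_values))
}

# illustrative values only
significance_K(0.93, c(0.99, 0.98, 1.00, 0.97))</pre>
<p>With only a handful of surrogates the estimate of the standard deviation is itself noisy, so K values obtained from such a small sample should be read with some caution.</p>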
<p>Now I'd like to slide a window of 4-grams over the last 200 DJI closes and calculate the local uncertainties and their corresponding K values. The following function does just that:</p>
<pre>localallinfo &lt;- function(seq, slideseq, lambda, limit=0:0) {
  len &lt;- dim(slideseq)[1]
  # by default process every window; otherwise only the rows given in limit
  range &lt;- 1:len
  if(length(limit) &gt; 1) {
  	  range &lt;- limit
  }
  # we store 2 columns for h_n_cond value and K, and dim(slideseq)[2] for length of word
  v &lt;- array(dim=c(length(range), 2 + dim(slideseq)[2]))
  for(k in seq_along(range)) {
	# k indexes the result rows, i the rows of slideseq
	i &lt;- range[k]
	v[k,] &lt;- c(h_n_cond(slideseq, slideseq[i,], lambda),
                       K(seq, slideseq, slideseq[i,]), slideseq[i,])
  }
  return(v)
}</pre>
<p>This will return a 2+n column array with the values for (local uncertainty, K value, ngram). Stored in variable <em>x</em> we have the return value of <em>localallinfo</em>. We can query all the results with high predictability:</p>
<pre>&gt; x[x[,1] &lt; 0.932,]
          [,1]      [,2] [,3] [,4] [,5] [,6]
[1,] 0.9308985 30.705164    0    2    1    1
[2,] 0.9308985 13.576514    0    2    1    1
[3,] 0.9246361  6.738199    1    1    2    1
[4,] 0.9308985  7.542447    0    2    1    1
[5,] 0.9246361 13.721803    1    1    2    1
[6,] 0.9245277  9.663806    1    1    1    1</pre>
<p>Very satisfying to see this develop so nicely. So what we have here are the local uncertainties, the K values (all quite a bit bigger than 2) and the patterns. We can read this as follows for entry 1:</p>
<p>The ngram pattern 0 2 1 1 has a local uncertainty of 93% with a significance level K of 30.7</p>
<p>Thanks to the R package <a href="http://had.co.nz/ggplot2">ggplot2</a>, we generate the graph below with the commands:</p>
<pre>&gt; z&lt;-data.frame(A=x[,1],B=1:200,K=x[,2])
&gt; d&lt;-qplot(B,A,data=z,colour=K,xlab="Time",ylab="Local Uncertainty")
&gt; d + scale_colour_gradient(limits=c(1, 15), low="yellow",high="red")</pre>
<p><img class="alignnone size-full wp-image-118" title="significance" src="http://oneminusp.com/wp-content/uploads/2010/01/significance.png" alt="significance" width="557" height="351" /></p>
<p>This is the same data as the plot in the previous post, just now color coded with the significance level. We can see the trend that lower uncertainties have higher K values, just like described in the original paper.</p>
<p>In the next post I'll try to play a bit more with graphical representations, K levels, and then will move on to different entropy estimators and applying our code to more time series.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/quant/local-order-and-predictabilitiy-significance-testing.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://oneminusp.com/quant/local-order-and-predictabilitiy-significance-testing.html</feedburner:origLink></item>
		<item>
		<title>Local order and predictability – Implementation</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/tBo0H58IrY0/local-order-and-predictability-implementation.html</link>
		<comments>http://oneminusp.com/quant/local-order-and-predictability-implementation.html#comments</comments>
		<pubDate>Tue, 26 Jan 2010 17:28:10 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Quant]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[conditional entropy]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[local order]]></category>
		<category><![CDATA[predictability]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=92</guid>
		<description><![CDATA[Part 1 discussed a paper on local order and predictability of time series. I will now describe the implementation of the described functions in R.
First we assume that we already have our real returns data partitioned into symbols A_t = {0,1,2} so lambda is 3. Thus our time series is just a vector of values 0 1 2.
Next, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://oneminusp.com/quant/local-order-and-predictability-of-financial-time-series.html">Part 1</a> discussed a paper on local order and predictability of time series. I will now describe the implementation of the described functions in R.</p>
<p>First we assume that we already have our real returns data partitioned into symbols <img src='http://s.wordpress.com/latex.php?latex=A_t%20%3D%20%5C%7B0%2C1%2C2%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_t = \{0,1,2\}' title='A_t = \{0,1,2\}' class='latex' /> so <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' /> is 3. Thus our time series is just a vector of values 0, 1 and 2.</p>
<p>Next, all our functions will consider trajectories <img src='http://s.wordpress.com/latex.php?latex=A_1%20%5Cdots%20A_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_1 \dots A_n' title='A_1 \dots A_n' class='latex' /> of that original vector. I will implement this as a sliding window of length n. So if our sequence is 012020120 the function <em>slide</em> will create the array 012, 120, 202, 020, 201, 012, 120 out of it.</p>
<pre>slide &lt;- function(seq,windowsize) {
	# number of windows of length windowsize that fit into seq
	steps &lt;- length(seq)-windowsize+1
	accu &lt;- array(0,dim=c(steps,windowsize))
	for(i in 1:steps) {
		# window i covers positions i .. i+windowsize-1
		accu[i,] &lt;- seq[i:(i+windowsize-1)]
	}
	return(accu)
}
<span id="more-92"></span></pre>
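<p>The same sliding window can probably be obtained without an explicit loop. The sketch below uses base R's <em>embed</em>, which returns the windows with their columns in reverse order, so the columns are flipped back; the name <em>slide_embed</em> is hypothetical.</p>
<pre># slide_embed is a hypothetical loop-free variant of slide
slide_embed &lt;- function(x, windowsize) {
  # embed() puts the newest value first, so reverse the column order
  embed(x, windowsize)[, windowsize:1, drop = FALSE]
}

# 9 symbols and a window of 3 give the 7 rows 012, 120, 202, 020, 201, 012, 120
slide_embed(c(0,1,2,0,2,0,1,2,0), 3)</pre>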
<p>The probability of a trajectory of length n is <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%28n%29%7D%28A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{(n)}(A_1 \dots A_n)' title='p^{(n)}(A_1 \dots A_n)' class='latex' />. It is estimated by the function <em>prob_n</em> as the empirical frequency: the number of times the ngram is found, divided by the total number of windows:</p>
<pre>prob_n &lt;- function(seq, ngram) {
 l &lt;- dim(seq)
 count &lt;- 0
 for(j in 1:l[1]) {
	if(identical(seq[j,], ngram)) { count&lt;-count+1 }
 }
 return(count/l[1])
}</pre>
<p>The actual entropy <img src='http://s.wordpress.com/latex.php?latex=H_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H_n' title='H_n' class='latex' /> is the average uncertainty in an ngram given the complete sequence. So if a specific ngram has probability <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%28n%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{(n)}' title='p^{(n)}' class='latex' /> then its contribution to the entropy is the negative of <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%28n%29%7D%20%5Clog_%7B%5Clambda%7D%20p%5E%7B%28n%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{(n)} \log_{\lambda} p^{(n)}' title='p^{(n)} \log_{\lambda} p^{(n)}' class='latex' />. This quantity is added up for all possible ngram sequences. This is implemented in the following function:</p>
<pre>H_n &lt;- function(seq,n,lambda) {
	nseq &lt;- slide(seq,n)
	# get the unique ngrams
	uniques &lt;- unique(nseq)
	dim_u &lt;- dim(uniques)
	accu &lt;- 0
	# for every unique ngram, get its prob_n and add to its entropy
	for(i in 1:dim_u[1]) {
	  p &lt;- prob_n(nseq, uniques[i,])
	  #accu &lt;- accu + p
	  accu &lt;- accu + (-p *log(p)/log(lambda))
	}
	return(accu)
}</pre>
<p>The average uncertainty of predicting the next symbol, given that you've already seen n previous ones, is just the difference <img src='http://s.wordpress.com/latex.php?latex=H_%7Bn%2B1%7D%20-%20H_%7Bn%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H_{n+1} - H_{n}' title='H_{n+1} - H_{n}' class='latex' />, as follows:</p>
<pre>h_n &lt;- function(seq,n,lambda) {
	return(H_n(seq,n+1,lambda) - H_n(seq,n,lambda))
}</pre>
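<p>For comparison, the block entropy and the conditional entropy can also be sketched compactly by turning every window into a string key and counting the keys with <em>table</em>. The names <em>block_entropy</em> and <em>cond_entropy</em> are hypothetical and this is not the code used for the numbers in this post.</p>
<pre># block_entropy is a hypothetical compact variant of H_n
block_entropy &lt;- function(x, n, lambda) {
  starts &lt;- 1:(length(x) - n + 1)
  # encode every window of length n as a string key such as "0 1 2"
  keys &lt;- sapply(starts, function(i) paste(x[i:(i + n - 1)], collapse = " "))
  # empirical probability of every ngram that actually occurs
  ps &lt;- as.numeric(table(keys)) / length(keys)
  -sum(ps * log(ps, base = lambda))
}

# hypothetical variant of h_n: uncertainty of the next symbol after n symbols
cond_entropy &lt;- function(x, n, lambda) {
  block_entropy(x, n + 1, lambda) - block_entropy(x, n, lambda)
}

x &lt;- c(0,1,2,0,1,2,0,1,2)
block_entropy(x, 1, 3)   # three equally frequent symbols: entropy 1 in base 3
cond_entropy(x, 1, 3)</pre>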
<p>The more interesting bit is how to calculate the conditional probabilities in <img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}' title='h_n^{(1)}' class='latex' />, <img src='http://s.wordpress.com/latex.php?latex=p%28A_%7Bn%2B1%7D%20%5Cvert%20A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(A_{n+1} \vert A_1 \dots A_n)' title='p(A_{n+1} \vert A_1 \dots A_n)' class='latex' />. Say the ngram we are looking at is the sequence 0120, <img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D%280120%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}(0120)' title='h_n^{(1)}(0120)' class='latex' /> is the uncertainty in predicting the next symbol after observing 0120. This is implemented in a way where we only consider the sequences of length 5 with an unknown fifth symbol 0120<em>x</em>. The code <em>h_n_cond</em> finds the empirical probability distribution for <em>x</em> in the restricted sample space of 0120<em>x</em>. Once the probability distribution is accumulated in the variable <em>probdist</em>, we simply calculate the <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' /> based entropy of it to arrive at <img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}' title='h_n^{(1)}' class='latex' />.</p>
<pre>h_n_cond &lt;- function(seq, ngram, lambda) {
  nextcounts &lt;- array(data=0, dim=c(lambda,1))
  l &lt;- dim(seq)
  skip &lt;- length(ngram)
  #count &lt;- 0
  for(j in 1:l[1]) {
  	if(identical(seq[j,], ngram)) {
  	   #count &lt;- count + 1
  	   # keep count for each possible next step element
  	   if(j &lt; l[1]-(skip-1)) {
  		nextcounts[seq[j+skip,1]+1] &lt;- nextcounts[seq[j+skip,1]+1] + 1
  	   }
  	}
  }
  probdist &lt;- (nextcounts/sum(nextcounts))
  entropy(probdist, lambda)
}</pre>
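<p>To make the idea of the restricted sample space concrete, here is a small sketch that works directly on the symbol vector instead of the sliding-window array. The name <em>local_uncertainty</em> is hypothetical, and the case of an ngram that never occurs is deliberately not handled.</p>
<pre># local_uncertainty is a hypothetical stand-alone variant of h_n_cond
local_uncertainty &lt;- function(x, ngram, lambda) {
  n &lt;- length(ngram)
  nextcounts &lt;- numeric(lambda)
  # only positions that still have a following symbol are considered
  for (i in seq_len(length(x) - n)) {
    if (all(x[i:(i + n - 1)] == ngram)) {
      nxt &lt;- x[i + n]
      # symbols are 0 .. lambda-1, R indices start at 1
      nextcounts[nxt + 1] &lt;- nextcounts[nxt + 1] + 1
    }
  }
  ps &lt;- nextcounts / sum(nextcounts)
  ps &lt;- ps[ps &gt; 0]
  -sum(ps * log(ps, base = lambda))
}

# after "0 1" the next symbol is always 2, so there is no uncertainty
local_uncertainty(c(0,1,2,0,1,2,0,1,2), c(0,1), 3)</pre>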
<p><strong>Testing</strong></p>
<p>Next up, we need some real data to test our functions on! I downloaded all of DJI historical data which was available from Yahoo. The data is stored in the variable dji and I will only consider dji$Close prices for now. I generate the daily logarithmic returns which will be partitioned into the discrete symbols 0, 1 and 2. We will use the same partitioning as in the paper. In R this is quite straightforward as:</p>
<p>&gt; djipart &lt;- ifelse(djiret &lt; -0.0025, 0, ifelse(djiret &gt; 0.0034, 2, 1))</p>
<p>The authors justified the asymmetric partitioning with the slightly positive mean of the logarithmic returns. I rather see a negative mean of -0.0001835142 in my data but I want to reproduce the numbers of the paper as well as possible so I use theirs.</p>
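<p>How <em>djiret</em> was computed is not shown here. Assuming it holds the daily logarithmic returns of the closing prices, the two steps can be sketched as follows; the price vector is made up for illustration and <em>partition_returns</em> is a hypothetical helper name.</p>
<pre># partition_returns is a hypothetical wrapper around the ifelse call above
partition_returns &lt;- function(ret, lower = -0.0025, upper = 0.0034) {
  ifelse(ret &lt; lower, 0, ifelse(ret &gt; upper, 2, 1))
}

# made-up closing prices, for illustration only
prices &lt;- c(100, 101, 100.9, 99, 99.1)
# daily logarithmic returns
ret &lt;- diff(log(prices))
partition_returns(ret)   # gives 2 1 0 1</pre>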
<p>Before running any functions on the real data I want to test it on uniformly distributed random data</p>
<p>&gt; sequnif &lt;-round(runif(10000, min=0,max=100)) %% 3</p>
<p>Obviously, I expect all calculated uncertainties to be close to 1, since every next symbol is (almost) equally likely; there shouldn't be any higher predictability from considering longer trajectories:</p>
<p>&gt; h_n(sequnif, 3, 3)<br />
[1] 0.997702<br />
&gt; h_n(sequnif, 4, 3)<br />
[1] 0.9928539<br />
&gt; sequnif4 &lt;- slide(sequnif,4)<br />
&gt; h_n_cond(sequnif4, c(1,0,1,0), 3)<br />
[1] 0.9980187<br />
&gt; h_n_cond(sequnif4, c(1,0,1,1), 3)<br />
[1] 0.9982527<br />
&gt; h_n_cond(sequnif4, c(0,1,1,1), 3)<br />
[1] 0.9974415</p>
<p>Our functions seem to produce the expected answers of almost total uncertainty. For the DJI data, the conditional entropies <img src='http://s.wordpress.com/latex.php?latex=h_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n' title='h_n' class='latex' /> for n from 1 to 6 are plotted in the picture below. Here are some numerical values:</p>
<p>&gt; h_n(djipart, 3, 3)<br />
[1] 0.9841248<br />
&gt; h_n(djipart, 4, 3)<br />
[1] 0.9788154<br />
&gt; h_n(djipart, 5, 3)<br />
[1] 0.9679176<br />
&gt; h_n(djipart, 6, 3)<br />
[1] 0.9429579</p>
<p>We observe a nice gradual decrease of uncertainty for longer trajectories.</p>
<p><img class="alignnone size-full wp-image-100" title="condentropy" src="http://oneminusp.com/wp-content/uploads/2010/01/condentropy.png" alt="condentropy" width="480" height="480" /></p>
<p>This graph pretty much reflects the shape of <strong>Fig. 2</strong> in the original paper. Again, the graph reflects the average uncertainty of predicting the next symbol for ngrams of different lengths (with values from the x axis).</p>
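<p>The local uncertainty for specific ngrams can be queried with h_n_cond as follows (dji4 is assumed here to be the 4-gram sliding window slide(djipart, 4), analogous to sequnif4 above):</p>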
<p>&gt; h_n_cond(dji4, c(0,2,1,2), 3)<br />
[1] 0.9924063<br />
&gt; h_n_cond(dji4, c(1,1,1,1), 3)<br />
[1] 0.9245277</p>
<p>The next state after observing pattern 1111 is more predictable than after pattern 0212: its local uncertainty is lower by about 6.8 percentage points!</p>
<p>The next graph I will generate is the local uncertainty of 4-grams over the last 200 trading day closes, using a rolling window as follows:</p>
<pre>len &lt;- dim(dji4)[1]
v4 &lt;- vector()
for(i in 1:200) {
   h &lt;- h_n_cond(dji4, dji4[i,], 3)
   v4 &lt;- c(v4,h)
}</pre>
<p>&gt; plot(rev(v4), type='l', axes=F, xlab="DJI close", ylab="local uncertainty")<br />
&gt; axis(1,at=pretty(c(1,200)),labels=rev(dji$Close[pretty(c(0,201))]))<br />
&gt; axis(2)</p>
<p><img class="alignnone size-full wp-image-102" title="localuncert" src="http://oneminusp.com/wp-content/uploads/2010/01/localuncert.png" alt="localuncert" width="480" height="480" /></p>
<p>This resembles the graph in <strong>Fig. 1</strong> of the original paper. Given the original definitions, the mean of the local uncertainty sequence v4 should be close to the average uncertainty of length 4:</p>
<p>&gt; mean(v4)<br />
[1] 0.9764447<br />
&gt; h_n(djipart, 4, 3)<br />
[1] 0.9788154</p>
<p>The minimal value in this graph is at 0.9245.</p>
<p>What we haven't considered so far is how significant the predictions are. The maximum likelihood estimator of entropy we are using here is biased and the sampling error grows rapidly with increasing ngram length. The paper uses something it calls surrogate sequences to calculate the significance. I don't really know that approach yet but I will look into it and write about it in a future article in this series on local uncertainties. One approach I could imagine is using different entropy estimators with bias correction.</p>
<p>I hope you enjoyed the article so far. Please leave a comment if you feel like it.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/quant/local-order-and-predictability-implementation.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/quant/local-order-and-predictability-implementation.html</feedburner:origLink></item>
		<item>
		<title>Local order and predictability of financial time series</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/_l3-dc9mmys/local-order-and-predictability-of-financial-time-series.html</link>
		<comments>http://oneminusp.com/quant/local-order-and-predictability-of-financial-time-series.html#comments</comments>
		<pubDate>Tue, 26 Jan 2010 11:41:59 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[Quant]]></category>
		<category><![CDATA[conditional entropy]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[local order]]></category>
		<category><![CDATA[predictability]]></category>
		<category><![CDATA[returns]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=63</guid>
		<description><![CDATA[In this series of posts I will discuss an implementation and tests of the paper Local order, entropy and predictability of financial time series by L. Molgedey and W. Ebeling. (pdf)
The paper presents an excellent application of information theory to time series analysis. The idea is simple: is it possible to find sub-trajectories in financial [...]]]></description>
			<content:encoded><![CDATA[<p>In this series of posts I will discuss an implementation and tests of the paper <em>Local order, entropy and predictability of financial time series</em> by L. Molgedey and W. Ebeling. (<a href="http://oneminusp.com/wp-content/uploads/2010/01/sd2.pdf">pdf</a>)</p>
<p>The paper presents an excellent application of information theory to time series analysis. The idea is simple: is it possible to find sub-trajectories in financial time series (here the daily returns of some indices or stocks) where a "local order" exists with higher than average predictability?</p>
<p>I won't explain the paper in full, so please have a look at the pdf above for notation and details. However, I will describe the most important concepts below. We consider one-dimensional, discretely partitioned time series. The authors use Shannon entropy H as the basic tool to measure the uncertainty or predictability of the probability distribution described by the time series. For a certain trajectory of length n, the uncertainty of predicting the next state is the difference in Shannon entropies for trajectories of length n+1 and n:</p>
<p><img src='http://s.wordpress.com/latex.php?latex=h_n%20%3D%20H_%7Bn%2B1%7D%20-%20H_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n = H_{n+1} - H_n' title='h_n = H_{n+1} - H_n' class='latex' /></p>
<p><span id="more-63"></span></p>
<p>Let <img src='http://s.wordpress.com/latex.php?latex=L&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='L' title='L' class='latex' /> be the total length of the time series, let <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' /> be the length of the alphabet, and let <img src='http://s.wordpress.com/latex.php?latex=A_1%20%5Cdots%20A_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_1 \dots A_n' title='A_1 \dots A_n' class='latex' /> be a specific subtrajectory of length <img src='http://s.wordpress.com/latex.php?latex=n%20%5Cle%20L&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n \le L' title='n \le L' class='latex' />.</p>
<p>Further, <img src='http://s.wordpress.com/latex.php?latex=p%5E%7B%28n%29%7D%28A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p^{(n)}(A_1 \dots A_n)' title='p^{(n)}(A_1 \dots A_n)' class='latex' /> is the probability to find a subtrajectory with the letters <img src='http://s.wordpress.com/latex.php?latex=A_1%20%5Cdots%20A_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_1 \dots A_n' title='A_1 \dots A_n' class='latex' /> in the complete time series. Then the entropy is simply</p>
<p><img src='http://s.wordpress.com/latex.php?latex=H_n%20%3D%20-%5Csum%7Bp%5E%7B%28n%29%7D%28A_1%20%5Cdots%20A_n%29%20%5Clog_%7B%5Clambda%7D%20p%5E%7B%28n%29%7D%28A_1%20%5Cdots%20A_n%29%20%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H_n = -\sum{p^{(n)}(A_1 \dots A_n) \log_{\lambda} p^{(n)}(A_1 \dots A_n) }' title='H_n = -\sum{p^{(n)}(A_1 \dots A_n) \log_{\lambda} p^{(n)}(A_1 \dots A_n) }' class='latex' /></p>
<p>Notice that we take the logarithm to base <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' />, so the quantities are not measured in the more conventional bits. From the conditional entropy <img src='http://s.wordpress.com/latex.php?latex=h_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n' title='h_n' class='latex' /> described above, we define the average predictability as</p>
<p><img src='http://s.wordpress.com/latex.php?latex=r_n%20%3D%201%20-%20h_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='r_n = 1 - h_n' title='r_n = 1 - h_n' class='latex' /></p>
<p>Clearly, since we consider <img src='http://s.wordpress.com/latex.php?latex=log%28%5Clambda%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='log(\lambda)' title='log(\lambda)' class='latex' /> units, the maximum uncertainty is <img src='http://s.wordpress.com/latex.php?latex=h_n%20%3D%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n = 1' title='h_n = 1' class='latex' />. Also we can deduce that <img src='http://s.wordpress.com/latex.php?latex=h_%7Bn%2B1%7D%20%5Cle%20h_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_{n+1} \le h_n' title='h_{n+1} \le h_n' class='latex' /> which is intuitively understandable as: the more I know the less uncertainty there can be, unless of course every observation was completely independent, where the equality would apply.</p>
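<p>To make these definitions concrete, here is a minimal R sketch that estimates the block entropy from relative ngram frequencies and derives the conditional entropy and the average predictability from it. The helper <em>block_entropy</em> and the toy sequence are hypothetical and only meant as an illustration; they are not the implementation discussed in the later posts of this series.</p>
<pre># Empirical block entropy H_n of a symbol vector, in units of log(lambda).
# Assumes single-character symbols such as 0, 1, 2.
block_entropy &lt;- function(symbols, n, lambda) {
  starts &lt;- seq_len(length(symbols) - n + 1)
  # paste every window of length n into one string, e.g. "012"
  ngrams &lt;- vapply(starts,
                   function(i) paste(symbols[i:(i + n - 1)], collapse = ""),
                   character(1))
  p &lt;- table(ngrams) / length(ngrams)        # relative frequencies p^(n)
  -sum(p * log(p, base = lambda))
}

toy &lt;- c(0,1,1,2,0,1,2,2,2,0,2,0,1)         # made-up symbol sequence
H1 &lt;- block_entropy(toy, 1, 3)
H2 &lt;- block_entropy(toy, 2, 3)
h1 &lt;- H2 - H1                                # uncertainty of the next symbol given one symbol
r1 &lt;- 1 - h1                                 # average predictability</pre>
<p>On a sequence this short the estimates are very noisy, since there are only 12 windows of length 2 spread over up to 9 possible ngrams.</p>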
<p>Next, the paper describes the "local uncertainty" measure as the uncertainty of the next symbol given the observation of specific symbols <img src='http://s.wordpress.com/latex.php?latex=A_1%20%5Cdots%20A_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_1 \dots A_n' title='A_1 \dots A_n' class='latex' />. The authors write it as</p>
<p><img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D%28A_1%20%5Cdots%20A_n%29%20%3D%20-%20%5Csum%7Bp%28A_%7Bn%2B1%7D%20%5Cvert%20A_1%20%5Cdots%20A_n%29%20%5Clog%20p%28A_%7Bn%2B1%7D%20%5Cvert%20A_1%20%5Cdots%20A_n%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}(A_1 \dots A_n) = - \sum{p(A_{n+1} \vert A_1 \dots A_n) \log p(A_{n+1} \vert A_1 \dots A_n)}' title='h_n^{(1)}(A_1 \dots A_n) = - \sum{p(A_{n+1} \vert A_1 \dots A_n) \log p(A_{n+1} \vert A_1 \dots A_n)}' class='latex' /></p>
<p>Notice that this <em>almost</em> coincides with the <a href="http://en.wikipedia.org/wiki/Conditional_entropy">conditional entropy</a> <img src='http://s.wordpress.com/latex.php?latex=H%28A_%7Bn%2B1%7D%20%5Cvert%20A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H(A_{n+1} \vert A_1 \dots A_n)' title='H(A_{n+1} \vert A_1 \dots A_n)' class='latex' /> except that it leaves out the weighting factor of <img src='http://s.wordpress.com/latex.php?latex=p%28A_1%20%5Cdots%20A_n%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='p(A_1 \dots A_n)' title='p(A_1 \dots A_n)' class='latex' />, so it is not an averaging measure but a local one. The function <img src='http://s.wordpress.com/latex.php?latex=h_n%5E%7B%281%29%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='h_n^{(1)}' title='h_n^{(1)}' class='latex' /> will be the main measure we apply to our time series.</p>
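<p>A minimal sketch of the local measure, assuming the conditional probabilities are estimated by counting which symbols followed a given context in the series. The helper <em>local_uncertainty</em> and the toy sequence are hypothetical names and data used only for illustration.</p>
<pre># Local uncertainty of the next symbol after a specific context,
# in units of log(lambda); returns NA if the context never occurs.
local_uncertainty &lt;- function(symbols, context, lambda) {
  n &lt;- length(context)
  starts &lt;- seq_len(length(symbols) - n)     # windows that still have a successor
  # TRUE where the window starting at i equals the context
  hit &lt;- vapply(starts,
                function(i) all(symbols[i:(i + n - 1)] == context),
                logical(1))
  nxt &lt;- symbols[starts[hit] + n]            # symbols that followed the context
  if (length(nxt) == 0) return(NA)
  p &lt;- table(nxt) / length(nxt)              # estimate of p(A_{n+1} | A_1 ... A_n)
  -sum(p * log(p, base = lambda))
}

x &lt;- c(0,1,0,1,0,1,0,2)                     # made-up symbol sequence
local_uncertainty(x, c(1), 3)                # 0: after a 1 the next symbol was always 0
local_uncertainty(x, c(0), 3)                # positive: 0 was followed by 1 three times and by 2 once</pre>
<p>Weighting each context's value by how often that context occurs and summing over all contexts gives back the averaged, conditional-entropy form.</p>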
<p><strong>Partitioning</strong></p>
<p>As a next step we need to partition the real-valued data of the time series into discrete symbols. First we take the daily logarithmic price changes of some stock <img src='http://s.wordpress.com/latex.php?latex=S_t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='S_t' title='S_t' class='latex' />:</p>
<p><img src='http://s.wordpress.com/latex.php?latex=x_t%20%3D%20ln%28S_t%29%20-%20ln%28S_%7Bt-1%7D%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_t = ln(S_t) - ln(S_{t-1})' title='x_t = ln(S_t) - ln(S_{t-1})' class='latex' /></p>
<p>The returns <img src='http://s.wordpress.com/latex.php?latex=x_t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_t' title='x_t' class='latex' /> now need to be partitioned into symbols <img src='http://s.wordpress.com/latex.php?latex=A_t&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_t' title='A_t' class='latex' /> of an alphabet with length <img src='http://s.wordpress.com/latex.php?latex=%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda' title='\lambda' class='latex' />. The optimal choice of partitioning is a whole other question, but for now we just assume that it doesn't matter. The paper uses <img src='http://s.wordpress.com/latex.php?latex=%5Clambda%20%3D%203&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\lambda = 3' title='\lambda = 3' class='latex' /> and partitions Dow Jones daily returns with the following thresholds:</p>
<p><img src='http://s.wordpress.com/latex.php?latex=x_t%20%3C%20-0.0025%20%5Crightarrow%20A_t%20%3D%200%2C%20x_t%20%3E%200.0034%20%5Crightarrow%20A_t%20%3D%202%2C%20%5Ctext%7Botherwise%20%7D%20A_t%20%3D%201&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='x_t &lt; -0.0025 \rightarrow A_t = 0, x_t &gt; 0.0034 \rightarrow A_t = 2, \text{otherwise } A_t = 1' title='x_t &lt; -0.0025 \rightarrow A_t = 0, x_t &gt; 0.0034 \rightarrow A_t = 2, \text{otherwise } A_t = 1' class='latex' /></p>
<p>These thresholds obviously differ for every stock we look at and need to be chosen depending on its basic statistics (skew, mean).</p>
<p>A randomly chosen snippet of a trajectory <img src='http://s.wordpress.com/latex.php?latex=A_1%20%5Cdots%20A_n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='A_1 \dots A_n' title='A_1 \dots A_n' class='latex' /> then might look like the string 0112012220201.</p>
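<p>The partitioning step can be sketched in a few lines of R. The prices below are made up purely for illustration (they are not real index data), and the thresholds are the ones quoted above for the Dow Jones; any other series would need its own.</p>
<pre># made-up closing prices, only for illustration
S &lt;- c(100, 100.5, 100.1, 100.2, 99.7, 100.3)
x &lt;- diff(log(S))                            # x_t = ln(S_t) - ln(S_{t-1})

# below -0.0025 gives symbol 0, above 0.0034 gives symbol 2, otherwise symbol 1
A &lt;- ifelse(x &lt; -0.0025, 0, ifelse(x &gt; 0.0034, 2, 1))
paste(A, collapse = "")                      # "20102"</pre>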
<p>This is all we need to know for our implementation, which will be described in the next post of this series.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/quant/local-order-and-predictability-of-financial-time-series.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://oneminusp.com/quant/local-order-and-predictability-of-financial-time-series.html</feedburner:origLink></item>
		<item>
		<title>Calculating Entropy the Functional Way</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/vpdQRMvPisM/calculating-entropy-the-functional-way.html</link>
		<comments>http://oneminusp.com/code/calculating-entropy-the-functional-way.html#comments</comments>
		<pubDate>Fri, 22 Jan 2010 17:04:50 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[reduce]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=54</guid>
		<description><![CDATA[Previously, I wrote a short article on how to implement fold left in R. It was fairly obvious that there must be a builtin function for it in R. At the time, I just assumed it would be "reduce" or it would not exist, however the proper function name is called "Reduce" with a capital [...]]]></description>
			<content:encoded><![CDATA[<p>Previously, I wrote a short article on <a href="http://oneminusp.com/code/fold-left-in-r.html">how to implement fold left in R</a>. It was fairly obvious that there must be a built-in function for it in R. At the time, I just assumed it would be called "reduce" or would not exist; however, the proper function name is "Reduce" with a capital R. As a side note, I do not really understand the naming scheme of functions in the R base library.</p>
<p>So here's a fairly obvious way to calculate Shannon's entropy in R using Reduce:</p>
<pre>&gt; fentropy &lt;- function(x,y) { x + (-y * log2(y)) }
&gt; Reduce(fentropy, c(0.5,0.5), 0)
[1] 1
&gt; Reduce(fentropy, c(0.25,0.25,0.25,0.25), 0)
[1] 2</pre>
<p>The first call is the binary case, with answer 1 bit; the second is four uniformly distributed values, with answer 2 bits.</p>
<p>Last but not least, we could also write an entropy function the "R way", using its vectorised functions:</p>
<pre>entropy &lt;- function(ps) {
     # terms with ps == 0 contribute 0, which avoids 0 * log2(0) = NaN
     H &lt;- -sum(ifelse(ps&gt;0, ps * log2(ps), 0))
     return(H)
}</pre>
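<p>A short check that the two versions agree, and where they differ. The two functions are repeated from above so the snippet runs on its own; the probability vectors are made up for illustration.</p>
<pre>fentropy &lt;- function(x,y) { x + (-y * log2(y)) }
entropy &lt;- function(ps) { -sum(ifelse(ps&gt;0, ps * log2(ps), 0)) }

ps &lt;- c(0.5, 0.25, 0.25)
Reduce(fentropy, ps, 0)        # 1.5
entropy(ps)                    # 1.5

# a zero probability breaks the Reduce version, because 0 * log2(0) is 0 * -Inf
Reduce(fentropy, c(1, 0), 0)   # NaN
entropy(c(1, 0))               # 0</pre>
<p>So the vectorised version is the safer choice for empirical distributions, where some symbols may never occur.</p>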
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/code/calculating-entropy-the-functional-way.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/code/calculating-entropy-the-functional-way.html</feedburner:origLink></item>
		<item>
		<title>Information Theory and Financial Markets</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/SJuLLuQzZBY/information-theory-and-financial-markets.html</link>
		<comments>http://oneminusp.com/quant/information-theory-and-financial-markets.html#comments</comments>
		<pubDate>Sun, 17 Jan 2010 20:27:08 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[Quant]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[entropy]]></category>
		<category><![CDATA[forecasting]]></category>
		<category><![CDATA[information theory]]></category>
		<category><![CDATA[predictability]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=45</guid>
		<description><![CDATA[I would like to discuss and implement ideas from papers applying information theoretical (IT) notions to trading in financial markets. I will provide links to all papers I'll read on this topic and describe certain concepts in more detail.
The current list is:
Uncertainty analysis in financial markets: can entropy be a solution?, Andreia Dionísio, Rui Menezes and [...]]]></description>
			<content:encoded><![CDATA[<p>I would like to discuss and implement ideas from papers applying information theoretical (IT) notions to trading in financial markets. I will provide links to all papers I'll read on this topic and describe certain concepts in more detail.</p>
<p>The current list is:</p>
<p><em>Uncertainty analysis in financial markets: can entropy be a solution?</em>, Andreia Dionísio, Rui Menezes and Diana A. Mendes (<a href="http://oneminusp.com/wp-content/uploads/2010/01/DionisioMenezesMendesPaper.pdf">pdf</a>)</p>
<p><em>Forecasting Foreign Exchange Market Movements via Entropy Coding</em>, Arman Glodjo, Campbell R. Harvey (<a href="http://oneminusp.com/wp-content/uploads/2010/01/W13_Forecasting_foreign_exchange.pdf">pdf</a>)</p>
<p><em>Local order, entropy and predictability of financial time series,</em> L. Molgedey and W. Ebeling (<a href="http://oneminusp.com/wp-content/uploads/2010/01/sd2.pdf">pdf</a>)</p>
<p>These three papers all use Shannon entropy in place of more traditional statistical measures. What is interesting, however, is that each of them applies entropy in a different way.<br />
The first paper, by Dionísio et al., compares entropy as a measure of uncertainty with variance/standard deviation in portfolio management.<br />
The second paper, by Glodjo and Harvey, applies techniques from coding theory (the original and most successful application of IT) to forecasting high-frequency time series. It also provides good arguments for using IT in finance.<br />
The last paper, by Molgedey and Ebeling, applies conditional entropy directly to return time series to quantify "local order" in highly stochastic time series. A local order would be a point in time where the next step is more predictable than average.</p>
<p>I will have a look at some of those techniques in more detail and might implement some of them to see if I can replicate the authors' results.</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/quant/information-theory-and-financial-markets.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://oneminusp.com/quant/information-theory-and-financial-markets.html</feedburner:origLink></item>
		<item>
		<title>Understanding Biotech Companies</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/7IzQf1u3kvI/understanding-biotech-companies.html</link>
		<comments>http://oneminusp.com/investing/understanding-biotech-companies.html#comments</comments>
		<pubDate>Sun, 10 Jan 2010 12:54:12 +0000</pubDate>
		<dc:creator>greenpepper</dc:creator>
				<category><![CDATA[Investing]]></category>
		<category><![CDATA[biopharmacology]]></category>
		<category><![CDATA[biotech]]></category>
		<category><![CDATA[biotechnology]]></category>
		<category><![CDATA[pipeline]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=13</guid>
		<description><![CDATA[Biotechnology companies that develop new medicines are not quite like other companies. Understanding how their product development process works is crucial for investing profitably.
To make a medical product from zero to market takes a long time and is a risky undertaking. For a new discovery, both safety and effectiveness have to be demonstrated over time. [...]]]></description>
			<content:encoded><![CDATA[<p>Biotechnology companies that develop new medicines are not quite like other companies. Understanding how their product development process works is crucial for investing profitably.</p>
<p>Taking a medical product from zero to market takes a long time and is a risky undertaking. For a new discovery, both safety and effectiveness have to be demonstrated over time. Only then may the regulatory authorities grant permission to sell the medicine on the market. This research process, with its multiple phases, is called <strong>a pipeline</strong>.</p>
<p><span id="more-13"></span>To reduce the risks involved, a company will have research <strong>projects</strong> ongoing for multiple <strong>candidate</strong> molecules simultaneously. The projects within a pipeline are usually in different phases, some more mature than others, and some just starting their entry into the development pipeline.</p>
<p style="text-align: center;"><a href="http://oneminusp.com/wp-content/uploads/2010/01/biotech_pipeline.png"><img class="aligncenter size-thumbnail wp-image-33" title="Biopharma pipeline" src="http://oneminusp.com/wp-content/uploads/2010/01/biotech_pipeline-150x150.png" alt="Biopharma pipeline" width="150" height="150" /></a>Image of a pipeline. Click the image to see it bigger.</p>
<p>At a high level, the pipeline can be split into two phases: the <strong>discovery</strong> phase and the <strong>development</strong> phase.</p>
<p>The discovery phase consists of finding suitable molecules for some chosen <strong>indication</strong>, such as pollen allergy, pain, inflammation, cancer etc. All aspects of the molecule are documented and examined, for example an analysis will be made of how difficult the molecule will be to manufacture.</p>
<p>After the discovery phase, a promising molecule enters the development phase. The development phase is split into two: the <strong>preclinical</strong> and <strong>clinical</strong> phases. The preclinical phase consists of non-human testing, whereas the clinical phase starts when testing on humans begins.</p>
<p>The clinical phase itself is split into three phases: (clinical) <strong>phase I</strong>, (clinical) <strong>phase II</strong> and (clinical) <strong>phase III</strong>.</p>
<p>The clinical phase I is conducted on healthy volunteers. The focus is on determining how the human body reacts to the candidate molecule, and on establishing a suitable dosage and dosage interval to minimize any side effects. This phase usually takes around 1 to 1.5 years. On average, a candidate molecule reaching this step will enter the market with <strong>20% probability</strong>.</p>
<p>The clinical phase II examines a larger group, this time consisting of <strong>patients</strong> (people with an <em>indication</em> which the molecule targets). The efficacy and safety of the molecule are examined. As part of efficacy testing, part of the group is given a placebo (a substance which does nothing) to make sure the actual molecule works better than the placebo effect. This phase usually takes around 1.5 to 2.5 years. On average, a candidate molecule reaching this step will enter the market with <strong>30% probability</strong>.</p>
<p>The clinical phase III examines yet a larger group of patients. The purpose is to confirm the effects seen in clinical phase II, and to verify the safety of the candidate molecule. This phase usually takes around 2 to 4 years. On average, a candidate molecule reaching this step will enter the market with <strong>70% probability</strong>.</p>
<p>Therefore, the average time to market after human (clinical) trials start is somewhere between 4.5 and 8 years. Sometimes, for example when patients suffer health problems or even die during the clinical phases, the authorities may require more data to assure safety. This will further delay the molecule's entry to the market.</p>
<p>After the clinical trials are completed, the regulatory authorities will decide whether the molecule will be allowed to enter the market. After approval, the medicine can be sold and used by patients.</p>
<p>Why is understanding this so important?</p>
<p>Since the probability of a candidate molecule reaching the market grows immensely when entering the clinical phase III (from ca. 30% to 70%), this will be reflected in the valuation of the company. Other <strong>milestone</strong> events such as entering clinical phase II will have the same effect, but to a lesser degree.</p>
<p>Smaller biotechnology companies doing pharmacological development usually need or want to partner with a bigger company. In such a case, the bigger company buys rights for the molecule and pays the smaller company as development progresses: when a milestone is reached, a milestone payment will be triggered. These payments are often a substantial source of revenue for the smaller company until the molecule reaches the market.</p>
<p>Other news, such as finding enough patients for a phase II clinical trial to start, will also be a positive signal.</p>
<p>In a nutshell, transitioning forward along the stages within the pipeline will have major effects on the future value of the company. In other words, the stock price tends to jump at these points.</p>
<p>Conversely, negative news will drive the stock price down due to lost future value. This can be problems or fatalities in clinical testing, a lack of demonstrated clinical efficacy, further information requests from authorities, and so on.</p>
<p>Understanding the pipeline and the projects within it is a key requirement for making a rational investment decision.</p>
<p>The best place to find pipeline information is the company's web page.</p>
<p>Happy analysing!</p>
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/investing/understanding-biotech-companies.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://oneminusp.com/investing/understanding-biotech-companies.html</feedburner:origLink></item>
		<item>
		<title>fold left in R</title>
		<link>http://feedproxy.google.com/~r/Oneminuspcom/~3/a__3Fa8JOSQ/fold-left-in-r.html</link>
		<comments>http://oneminusp.com/code/fold-left-in-r.html#comments</comments>
		<pubDate>Sun, 10 Jan 2010 01:44:02 +0000</pubDate>
		<dc:creator />
				<category><![CDATA[Code]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[dice]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[mean]]></category>
		<category><![CDATA[variance]]></category>

		<guid isPermaLink="false">http://oneminusp.com/?p=17</guid>
		<description><![CDATA[Often used high order functions in functional programming are left and right folds.
A left fold [foldleft f accu l] applies the head of the list l to the function f together with the accumulator variable accu. The result is the new accumulator which is used in the next recursive call together with the tail of [...]]]></description>
			<content:encoded><![CDATA[<p>Frequently used higher-order functions in functional programming are the left and right folds.</p>
<p>A left fold <em>[foldleft f accu l]</em> applies the function <em>f</em> to the accumulator variable <em>accu</em> and the head of the list <em>l</em>. The result is the new accumulator, which is used in the next recursive call together with the tail of the list.</p>
<p>A left (or right) fold is easily implemented in R as follows:</p>
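<pre>foldleft &lt;- function(f,accu,l) {
	if(length(l)==0) {
		# empty list: the accumulator is the result
		accu
	} else {
		head &lt;- l[1]
		tail &lt;- l[-1]
		# fold the head into the accumulator, then recurse on the tail
		foldleft(f, f(accu, head), tail)
	}
}</pre>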
<p>To see how it works, we could apply it to calculate the variance of a fair die. Remember the variance is just <img src='http://s.wordpress.com/latex.php?latex=Var%28X%29%20%3D%20E%5B%28X-%5Cmu%29%5E2%5D%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Var(X) = E[(X-\mu)^2] ' title='Var(X) = E[(X-\mu)^2] ' class='latex' /> where <img src='http://s.wordpress.com/latex.php?latex=%5Cmu&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\mu' title='\mu' class='latex' /> is the mean, which is implemented in the following function <em>f</em>:</p>
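<pre>mu &lt;- sum(1:6)/6   # mean of a fair die (3.5); avoids reusing the name of base mean()

# add one outcome's contribution 1/6 * (i - mu)^2 to the running variance
f &lt;- function(accu,i) {
	accu+(1/6 * (i-mu)^2)
}

foldleft(f,0,c(1,2,3,4,5,6))</pre>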
<p>where our last call to <em>foldleft</em> evaluates to 2.916667, that is 35/12.</p>
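<p>The right fold mentioned at the start can be sketched in the same recursive style. The name <em>foldright</em> and the argument order of <em>f</em> (element first, accumulator second) are assumptions made for this illustration. Subtraction is used as the example because, unlike addition, it shows that the direction of the fold matters.</p>
<pre># hypothetical companion to foldleft: the tail is folded first,
# then f combines the head with that result
foldright &lt;- function(f,accu,l) {
	if(length(l)==0) {
		accu
	} else {
		f(l[1], foldright(f, accu, l[-1]))
	}
}

foldright(`-`, 0, c(1,2,3))   # 2, i.e. 1 - (2 - (3 - 0))</pre>
<p>A left fold over the same input would instead compute ((0 - 1) - 2) - 3, which is -6.</p>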
]]></content:encoded>
			<wfw:commentRss>http://oneminusp.com/code/fold-left-in-r.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://oneminusp.com/code/fold-left-in-r.html</feedburner:origLink></item>
	</channel>
</rss>
