<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Quantitative thoughts</title>
	
	<link>http://www.investuotojas.eu</link>
	<description>Quantitative investment strategies</description>
	<lastBuildDate>Tue, 15 May 2012 10:48:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/investuotojas" /><feedburner:info uri="investuotojas" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>GitHub data analysis</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/uKTQpSKJeGM/</link>
		<comments>http://www.investuotojas.eu/2012/05/15/github-data-analysis/#comments</comments>
		<pubDate>Tue, 15 May 2012 10:48:49 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[data analysis]]></category>
		<category><![CDATA[EN]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[quantitative]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=846</guid>
		<description><![CDATA[A few weeks ago GitHub announced that its timeline data is available on BigQuery for analysis. Moreover, it offers prizes for the best visualization of the data. Despite my limited art skills and minimal chances of winning a beauty contest, I decided to crunch the GitHub data and run some analysis. After an initial trial of the BigQuery service, I found it hard [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago GitHub <a href="https://github.com/blog/1112-data-at-github" target="_blank">announced</a> that its timeline data is available on <a href="https://bigquery.cloud.google.com" target="_blank">BigQuery</a> for analysis. Moreover, it <a href="https://github.com/blog/1118-the-github-data-challenge" target="_blank">offers prizes</a> for the best visualization of the data. Despite my limited art skills and minimal chances of winning a beauty contest, I decided to crunch the GitHub data and run some analysis.</p>
<p>After an initial trial of the BigQuery service, I found it hard to tell what price, if any, I would pay for it. Hence, I pulled the data (6.5 GB) from BigQuery to my machine and ran the rest of the analysis locally. Bash scripts were used to clean up and extract the necessary data, R for data analysis and visualization, and C++ for text extraction.</p>
<p>The GitHub dataset is a single table in which each row consists of information about a repository (path, creation date, name, description, programming language, number of forks and watchers, etc.) and about the action performed by a user (username, location, timestamp, etc.).</p>
<p>As a result, we can check how GitHub users&#8217; actions are spread over the day. The X axis of the graph below is labeled with the hours of the day (GMT) and the Y axis represents the median number of actions for each hour. From it we can deduce that the highest load for GitHub can be expected between 15:00 and 17:00 GMT and the lowest between 05:00 and 07:00 GMT. The color of the line indicates how busy the day was, based on quantiles: green for calm days (20% of days), blue for normal days (50% quantile) and red for busy days (80% quantile). I should mention that auto-correlation (serial correlation) is high (70% for the following hour), which means that busy hours tend to be followed by busy hours and calm hours by calm hours. Moreover, busy days tend to follow busy days.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=actions.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/actions.png" alt="Photobucket" border="0" /></a></p>
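<p>As a rough illustration of how such hourly medians could be computed, here is a minimal shell sketch. It assumes (the post does not say so) that the timestamps have already been extracted, one per line, in <code>YYYY-MM-DD HH:MM:SS</code> form. The function name <code>hourly_median</code> is made up for this example and is not part of the published code.</p>

```shell
# Median number of actions per hour of day, taken across all days.
# Assumed input on stdin: one "YYYY-MM-DD HH:MM:SS" timestamp per line (GMT).
# Output: "HH median", one line per hour that occurs in the input.
hourly_median() {
    # Step 1: count the actions for every (date, hour) pair.
    awk '{ split($2, t, ":"); n[$1 " " t[1]]++ }
         END { for (k in n) { split(k, p, " "); print p[2], n[k] } }' |
    # Step 2: order the daily counts within each hour.
    sort -k1,1 -k2,2n |
    # Step 3: pick the middle value (or the mean of the two middle values).
    awk '{ v[$1, ++c[$1]] = $2 }
         END {
             for (h in c) {
                 m = c[h]
                 if (m % 2) med = v[h, (m + 1) / 2]
                 else med = (v[h, m / 2] + v[h, m / 2 + 1]) / 2
                 print h, med
             }
         }' |
    sort
}
```

<p>Piping a file of timestamps into <code>hourly_median</code> prints one line per hour. Note that an hour with no activity on a given day is simply absent from that day&#8217;s counts rather than counted as zero, which is good enough for a sketch but would bias the median upwards on sparse data.</p>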
<p>The second graph below shows the median number of actions by weekday. There is no big surprise: weekends are slower than weekdays, although programmers are also slightly less productive on Mondays and Fridays.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=actions_weekdays.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/actions_weekdays.png" alt="Photobucket" border="0" /></a></p>
<p>The analysis of new repository creation shows that the pattern of busy and calm hours persists over the years. This can be attributed to the fact that the majority of users come from North America and Europe.<br />
Another hypothesis can be drawn from this information: the number of new repositories grows exponentially. However, bear in mind that the graph below is biased: most likely, GitHub users update recent projects, so more recent projects appear on the timeline. Even so, the years 2009-2011 show exponential growth.<br />
The X axis of the graph below is labeled with the hour of the day; the Y axis shows the log of the median number of new repositories.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=new_repos.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/new_repos.png" alt="Photobucket" border="0" /></a></p>
<p>The following graph shows the number of forks per project (the X axis, log scale) versus the number of watchers (the Y axis, log scale). As expected, there is a linear correlation between forks and watchers. Even so, there is something interesting about the outliers below the bottom line: projects where the number of watchers is low but the number of forks is high. These are anomalies and worth checking.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=fork_watch.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/fork_watch.png" alt="Photobucket" border="0" /></a></p>
<p>The next thing to do is to look at the repository descriptions. Let&#8217;s group the repositories by programming language and count the most frequent words in the descriptions. The graph below has the C++ word cloud on the left and the Java one on the right. C++ projects are about library, game, simple(?), engine, Arduino. Java is dominated by android, plugin, server, minecraft, spring, maven.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=cpp_java.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/cpp_java.png" alt="Photobucket" border="0" /></a><br />
Ruby (left) vs Python (right):<br />
<a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=ruby_python.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/ruby_python.png" alt="Photobucket" border="0" /></a></p>
<p>Surprise, surprise: R projects (left) are largely about data analysis; however, the word &#8220;machine&#8221;, which corresponds to machine learning, is very small. Shell (right) is dominated by configuration, managing, git(?).</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=r_bash.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/r_bash.png" alt="Photobucket" border="0" /></a></p>
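<p>The word counts behind clouds like these can be approximated with a short shell pipeline. This is only a sketch and not the method used for the images above (the text extraction for this post was done in C++). It assumes a hypothetical tab-separated input with the programming language in the first column and the description in the second. The function name <code>top_words</code> is invented for the example.</p>

```shell
# Print the most frequent words in the descriptions of one language.
# Assumed input on stdin: language, TAB, description on each line (hypothetical layout).
# $1 = language name, $2 = how many words to print (default 10).
# Output: "word count", most frequent first, ties in alphabetical order.
top_words() {
    # Keep only the descriptions of the requested language, lower-cased.
    awk -F '\t' -v lang="$1" '$1 == lang { print tolower($2) }' |
    # Put every word on its own line; letters, digits, + and # count as word characters.
    tr -cs 'a-z0-9+#' '\n' |
    grep -v '^$' |
    # Count identical words and order them by frequency.
    sort | uniq -c | sort -k1,1nr -k2,2 |
    head -n "${2:-10}" |
    awk '{ print $2, $1 }'
}
```

<p>For example, piping the table into <code>top_words Java 20</code> would list the 20 most frequent words in Java descriptions. A real word cloud would also need a stop-word list to drop words such as &#8220;a&#8221; or &#8220;the&#8221;, which this sketch leaves in.</p>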
<p>The GitHub dataset includes a location field. Unfortunately, users can enter whatever they want (a country, a city) or leave it empty. Nevertheless, I was able to extract a good chunk of actions where the location field has a meaningful value. The video below shows user activity by country, where dark red corresponds to high activity and light red to low activity. Only the 30 most active countries are included; the rest are grey.<br />
The same pattern persists over the days: activity in Asia increases around midnight, Europe wakes up around 8:00 or 9:00, and America starts around 15:00. Who said that hackers and programmers work at night?<br />
<iframe src="http://player.vimeo.com/video/42186230" frameborder="0" width="500" height="369"></iframe></p>
<p>What else can be done with the GitHub dataset? Most repositories have a description field, which can be used to find similar projects with the <a href="http://en.wikipedia.org/wiki/Tf*idf">tf-idf</a> method. I tried that method and the results are satisfying.</p>
<p>Most of the graphs shown above are reproducible (except the word clouds) and the code can be found on <a href="https://github.com/kafka399/githubdata" target="_blank">GitHub</a>.</p>
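<p>To make the idea concrete, here is a minimal sketch of tf-idf weighting in shell and awk. It is not the code I used for the experiment. It assumes one repository description per line and uses one common variant of the formula: term frequency is the word count divided by the description length, and inverse document frequency is the natural log of the number of descriptions divided by the number of descriptions containing the word. The function name <code>tfidf</code> is made up for the example.</p>

```shell
# Print "doc word weight" for every distinct word of every description.
# Assumed input on stdin: one repository description per line.
# doc is the line number of the description; weight = tf * idf.
tfidf() {
    awk '
    {
        n = split(tolower($0), w, /[^a-z0-9]+/)
        for (i = 1; i <= n; i++) {
            if (w[i] == "") continue
            # count each word once per description for the document frequency
            if (!((NR, w[i]) in tf)) df[w[i]]++
            tf[NR, w[i]]++
            len[NR]++
        }
    }
    END {
        for (key in tf) {
            split(key, p, SUBSEP)
            weight = (tf[key] / len[p[1]]) * log(NR / df[p[2]])
            printf "%d %s %.3f\n", p[1], p[2], weight
        }
    }' | sort -k1,1n -k2,2
}
```

<p>Words that occur in every description get a weight of zero, while words that are rare across the dataset but frequent in one description get the highest weights. Similar projects can then be found by comparing these weight vectors, for example with cosine similarity.</p>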

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/05/15/github-data-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/05/15/github-data-analysis/</feedburner:origLink></item>
		<item>
		<title>Machine learning for identification of cars</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/szutkqOnfkI/</link>
		<comments>http://www.investuotojas.eu/2012/04/22/machine-learning-for-identification-of-cars/#comments</comments>
		<pubDate>Sun, 22 Apr 2012 14:53:33 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[ML]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=814</guid>
		<description><![CDATA[There is plenty of data on the internet; however, it is raw data. Think for a second about public surveillance cameras - useful for checking the traffic on a route or in a busy place, but anything else? What if you want to know how many cars are on the route? How many cars were there yesterday at the same time? [...]]]></description>
			<content:encoded><![CDATA[<p>There is plenty of data on the internet; however, it is raw data. Think for a second about public surveillance cameras: they are useful for checking the traffic on a route or in a busy place, but anything else? What if you want to know how many cars are on the route? How many cars were there yesterday at the same time? Given so many cars on the route, how polluted is the air in the area?<br />
While working on the road map for a data dive event, I started to wonder how feasible it is to use data from public surveillance cameras. So I quickly built a pilot project, and now I would like to share my experience.</p>
<p>First step &#8211; <strong>data acquisition</strong>. At the beginning I was thinking of setting up my smartphone somewhere to collect data on a busy route. However, I quickly found surveillance cameras in Vilnius and started to collect images. Run a search and I&#8217;m sure you will find some in your city:</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=example.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/example.png" alt="Photobucket" border="0" /></a></p>
<p>Here is the bash script that I use to collect images:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p814code4'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p8144"><td class="code" id="p814code4"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#you need full path for crontab</span>
<span style="color: #7a0874; font-weight: bold;">cd</span> <span style="color: #000000; font-weight: bold;">/</span>home<span style="color: #000000; font-weight: bold;">/</span>git<span style="color: #000000; font-weight: bold;">/</span>carCount<span style="color: #000000; font-weight: bold;">/</span>img
<span style="color: #007800;">a</span>=<span style="color: #000000; font-weight: bold;">`</span><span style="color: #c20cb9; font-weight: bold;">date</span> +<span style="color: #000000; font-weight: bold;">%</span>s<span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">b</span>=<span style="color: #800000;">${a}</span>_4.jpg
<span style="color: #c20cb9; font-weight: bold;">wget</span> <span style="color: #660033;">-O</span> <span style="color: #007800;">$b</span> <span style="color: #660033;">-q</span> <span style="color: #ff0000;">&quot;http://www.sviesoforai.lt/map/camera.aspx?size=full&amp;image=K7742-1.jpg&amp;rnd=0.15417794161476195&quot;</span></pre></td></tr></table></div>

<p><strong>Data preparation</strong>. After a while you will have enough data to train your machine (to begin with, more than 30 images should be O.K.).<br />
How do we train the algorithm? The goal is to identify the cars in a given image. That means we have to provide examples of positive images (a clear image of a car) and negative images (no car, parts of a car, etc.). Important note: we don&#8217;t feed the whole image; instead we cut the chosen image with a sliding window (100&#215;100 in my case). Four examples of positive images:</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=4.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/4.png" alt="Photobucket" border="0" /></a></p>
<p>Meanwhile, it is worth converting each image to the <a href="http://en.wikipedia.org/wiki/Netpbm_format" target="_blank">portable graymap format (PGM)</a>. For this specific task we can sacrifice information about the color of the car; it won&#8217;t improve prediction. Besides, PGM images can be loaded into R and easily transformed into a matrix. Here is the bash script that converts jpg to pgm and slices each image:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p814code5'); return false;">View Code</a> BASH</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p8145"><td class="code" id="p814code5"><pre class="bash" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#remove image duplicates</span>
<span style="color: #c20cb9; font-weight: bold;">find</span> . <span style="color: #660033;">-maxdepth</span> <span style="color: #000000;">1</span> <span style="color: #660033;">-type</span> f <span style="color: #660033;">-exec</span> md5sum <span style="color: #7a0874; font-weight: bold;">&#123;</span><span style="color: #7a0874; font-weight: bold;">&#125;</span> \;  <span style="color: #000000; font-weight: bold;">&gt;</span>test.txt
<span style="color: #c20cb9; font-weight: bold;">awk</span> <span style="color: #ff0000;">'a[$1]++ {gsub(/^\*/,&quot;&quot;,$2); print &quot;rm &quot;, $2}'</span> test.txt <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sh</span>
<span style="color: #c20cb9; font-weight: bold;">rm</span> test.txt
&nbsp;
<span style="color: #666666; font-style: italic;">#convert jpg</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">if</span> <span style="color: #7a0874; font-weight: bold;">&#91;</span> <span style="color: #660033;">-d</span> <span style="color: #ff0000;">&quot;out&quot;</span> <span style="color: #7a0874; font-weight: bold;">&#93;</span>; <span style="color: #000000; font-weight: bold;">then</span>
        <span style="color: #c20cb9; font-weight: bold;">rm</span> <span style="color: #660033;">-r</span> out
<span style="color: #000000; font-weight: bold;">fi</span>
<span style="color: #c20cb9; font-weight: bold;">mkdir</span> out
<span style="color: #000000; font-weight: bold;">for</span> k <span style="color: #000000; font-weight: bold;">in</span> $<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">ls</span> <span style="color: #000000; font-weight: bold;">*</span>.jpg<span style="color: #7a0874; font-weight: bold;">&#41;</span>; <span style="color: #000000; font-weight: bold;">do</span> convert <span style="color: #007800;">$k</span> out<span style="color: #000000; font-weight: bold;">/</span><span style="color: #007800;">$k</span>.pgm; <span style="color: #000000; font-weight: bold;">done</span>
&nbsp;
<span style="color: #7a0874; font-weight: bold;">cd</span> out
<span style="color: #c20cb9; font-weight: bold;">mkdir</span> slide
<span style="color: #000000; font-weight: bold;">for</span> filename <span style="color: #000000; font-weight: bold;">in</span> $<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #c20cb9; font-weight: bold;">ls</span> <span style="color: #000000; font-weight: bold;">*</span>.pgm<span style="color: #7a0874; font-weight: bold;">&#41;</span>;
 <span style="color: #000000; font-weight: bold;">do</span> 
&nbsp;
<span style="color: #007800;">w</span>=<span style="color: #000000; font-weight: bold;">`</span>convert <span style="color: #007800;">$filename</span> <span style="color: #660033;">-print</span> <span style="color: #ff0000;">&quot;%w&quot;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null<span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #007800;">h</span>=<span style="color: #000000; font-weight: bold;">`</span>convert <span style="color: #007800;">$filename</span> <span style="color: #660033;">-print</span> <span style="color: #ff0000;">&quot;%h&quot;</span> <span style="color: #000000; font-weight: bold;">/</span>dev<span style="color: #000000; font-weight: bold;">/</span>null<span style="color: #000000; font-weight: bold;">`</span>
<span style="color: #7a0874; font-weight: bold;">let</span> <span style="color: #ff0000;">&quot;ww= <span style="color: #007800;">$w</span>/100&quot;</span>
<span style="color: #7a0874; font-weight: bold;">let</span> <span style="color: #ff0000;">&quot;hh= <span style="color: #007800;">$h</span>/100&quot;</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #007800;">y</span>=<span style="color: #000000;">150</span>;y<span style="color: #000000; font-weight: bold;">&lt;</span>=<span style="color: #000000;">250</span>;y+=<span style="color: #000000;">50</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">do</span>
<span style="color: #000000; font-weight: bold;">for</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #007800;">i</span>=<span style="color: #000000;">100</span>;i<span style="color: #000000; font-weight: bold;">&lt;</span>=<span style="color: #000000;">400</span>;i+=<span style="color: #000000;">50</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">do</span>
<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;slide/<span style="color: #007800;">$y</span>.<span style="color: #007800;">$i</span>.<span style="color: #007800;">$filename</span>&quot;</span>
<span style="color: #7a0874; font-weight: bold;">let</span> <span style="color: #ff0000;">&quot;h_slide=<span style="color: #007800;">$i</span>&quot;</span>
convert <span style="color: #007800;">$filename</span> <span style="color: #660033;">-crop</span> 100x100+<span style="color: #007800;">$i</span>+<span style="color: #007800;">$y</span> slide<span style="color: #000000; font-weight: bold;">/</span><span style="color: #007800;">$y</span>.<span style="color: #007800;">$i</span>.<span style="color: #007800;">$filename</span>
<span style="color: #000000; font-weight: bold;">done</span>
<span style="color: #000000; font-weight: bold;">done</span>
<span style="color: #000000; font-weight: bold;">done</span></pre></td></tr></table></div>

<p><strong>Training, predicting, cross validation</strong>. Now it is time to open R, load the 100&#215;100 images from the &#8220;train/out/slide&#8221; directory and train the algorithm. Important note: each image is a matrix, but the learning algorithm (a support vector machine in my case) expects a single matrix containing all images. What you have to do is &#8220;unroll&#8221; each image matrix into a 1&#215;10000 vector and build a new matrix in which each row is an image.<br />
Once training is done, load unseen data from the &#8220;crossval/out/slide&#8221; directory and check the &#8220;result/&#8221; directory, where you will find images of the cars. The R script that does all of the above:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p814code6'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p8146"><td class="code" id="p814code6"><pre class="rsplus" style="font-family:monospace;"><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/setwd.html"><span style="color: #0000FF; font-weight: bold;">setwd</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'/home/git/carCount/'</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;">#pixmap provides read.pnm, e1071 provides svm</span>
library<span style="color: #080;">&#40;</span>pixmap<span style="color: #080;">&#41;</span>
library<span style="color: #080;">&#40;</span>e1071<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">######read positives############</span>
files<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/list.files.html"><span style="color: #0000FF; font-weight: bold;">list.<span style="">files</span></span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'test/pos/'</span><span style="color: #080;">&#41;</span>
pos<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/matrix.html"><span style="color: #0000FF; font-weight: bold;">matrix</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/nrow.html"><span style="color: #0000FF; font-weight: bold;">nrow</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ncol.html"><span style="color: #0000FF; font-weight: bold;">ncol</span></a><span style="color: #080;">=</span><span style="color: #ff0000;">100</span><span style="color: #080;">*</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/for.html"><span style="color: #0000FF; font-weight: bold;">for</span></a><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
  gray_file<span style="color: #080;">=</span>read.<span style="">pnm</span><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/paste.html"><span style="color: #0000FF; font-weight: bold;">paste</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'test/pos/'</span>,files<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  pos<span style="color: #080;">&#91;</span>i,<span style="color: #080;">&#93;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span>gray_file@<a href="http://astrostatistics.psu.edu/su07/R/html/stats/html/summary.lm.html"><span style="color: #0000FF; font-weight: bold;">grey</span></a><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
outcome<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/vector.html"><span style="color: #0000FF; font-weight: bold;">vector</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/length.html"><span style="color: #0000FF; font-weight: bold;">length</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
outcome<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>outcome<span style="color: #080;">!=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span><span style="color: #ff0000;">1</span>
&nbsp;
<span style="color: #228B22;">########read negatives#############</span>
files<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/list.files.html"><span style="color: #0000FF; font-weight: bold;">list.<span style="">files</span></span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'test/neg/'</span><span style="color: #080;">&#41;</span>
neg<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/matrix.html"><span style="color: #0000FF; font-weight: bold;">matrix</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/nrow.html"><span style="color: #0000FF; font-weight: bold;">nrow</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ncol.html"><span style="color: #0000FF; font-weight: bold;">ncol</span></a><span style="color: #080;">=</span><span style="color: #ff0000;">100</span><span style="color: #080;">*</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/for.html"><span style="color: #0000FF; font-weight: bold;">for</span></a><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
  gray_file<span style="color: #080;">=</span>read.<span style="">pnm</span><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/paste.html"><span style="color: #0000FF; font-weight: bold;">paste</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'test/neg/'</span>,files<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  neg<span style="color: #080;">&#91;</span>i,<span style="color: #080;">&#93;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span>gray_file@<a href="http://astrostatistics.psu.edu/su07/R/html/stats/html/summary.lm.html"><span style="color: #0000FF; font-weight: bold;">grey</span></a><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
tmp<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/vector.html"><span style="color: #0000FF; font-weight: bold;">vector</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/length.html"><span style="color: #0000FF; font-weight: bold;">length</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
tmp<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>tmp<span style="color: #080;">!=</span><span style="color: #ff0000;">0</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span><span style="color: #ff0000;">0</span>
outcome<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span>outcome,tmp<span style="color: #080;">&#41;</span>
forecast<span style="color: #080;">=</span>svm<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/rbind.html"><span style="color: #0000FF; font-weight: bold;">rbind</span></a><span style="color: #080;">&#40;</span>pos,neg<span style="color: #080;">&#41;</span>,outcome<span style="color: #080;">&#41;</span>
cross_val<span style="color: #080;">=</span>pos<span style="color: #080;">&#91;</span><span style="color: #ff0000;">84</span><span style="color: #080;">:</span><span style="color: #ff0000;">90</span>,<span style="color: #080;">&#93;</span>
pred<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">predict</span><span style="color: #080;">&#40;</span>forecast,cross_val,decision.<span style="">values</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">##########################unseen data######################</span>
files<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/list.files.html"><span style="color: #0000FF; font-weight: bold;">list.<span style="">files</span></span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'crossval/out/slide/'</span><span style="color: #080;">&#41;</span>
cross<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/matrix.html"><span style="color: #0000FF; font-weight: bold;">matrix</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/nrow.html"><span style="color: #0000FF; font-weight: bold;">nrow</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ncol.html"><span style="color: #0000FF; font-weight: bold;">ncol</span></a><span style="color: #080;">=</span><span style="color: #ff0000;">100</span><span style="color: #080;">*</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/for.html"><span style="color: #0000FF; font-weight: bold;">for</span></a><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>files<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
  gray_file<span style="color: #080;">=</span>read.<span style="">pnm</span><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/paste.html"><span style="color: #0000FF; font-weight: bold;">paste</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'crossval/out/slide/'</span>,files<span style="color: #080;">&#91;</span>i<span style="color: #080;">&#93;</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  cross<span style="color: #080;">&#91;</span>i,<span style="color: #080;">&#93;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span>gray_file@<a href="http://astrostatistics.psu.edu/su07/R/html/stats/html/summary.lm.html"><span style="color: #0000FF; font-weight: bold;">grey</span></a><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
pred<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">predict</span><span style="color: #080;">&#40;</span>forecast,cross,decision.<span style="">values</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">###############copy positives into result directory###############</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/dir.create.html"><span style="color: #0000FF; font-weight: bold;">dir.<span style="">create</span></span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'result'</span><span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/file.copy.html"><span style="color: #0000FF; font-weight: bold;">file.<span style="">copy</span></span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/paste.html"><span style="color: #0000FF; font-weight: bold;">paste</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'crossval/out/slide/'</span>,files<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.double.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">double</span></span></a><span style="color: #080;">&#40;</span>pred<span style="color: #080;">&#41;</span><span style="color: #080;">&gt;</span><span style="color: #ff0000;">0.6</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span>,<span style="color: #ff0000;">'result/'</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>&nbsp;</p>
<p>Classified as positive by the algorithm:<br />
<a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=pos.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/pos.png" alt="Photobucket" border="0" /></a></p>
<p>Classified as negative by the algorithm:<br />
<a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=neg.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/neg.png" alt="Photobucket" border="0" /></a></p>
<p>&nbsp;</p>
<p><strong>Conclusion</strong>. It is truly amazing how well the algorithm separates the wheat from the chaff without additional tuning. Mind you, my impression is biased after so many failures with financial data, which is noisy and where good predictions are scarce.<br />
Nevertheless, this project is far from ideal &#8211; it doesn&#8217;t take into account weather conditions, traffic jams, perspective, camera movements, etc. But I leave this fun for the data-dive event.</p>
<p><strong>Fork the code</strong>: <a href="https://github.com/kafka399/carCount/" target="_blank">https://github.com/kafka399/carCount/</a></p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/04/22/machine-learning-for-identification-of-cars/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/04/22/machine-learning-for-identification-of-cars/</feedburner:origLink></item>
		<item>
		<title>How to organize R user group</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/N_XJHhbQIkc/</link>
		<comments>http://www.investuotojas.eu/2012/04/18/how-to-organize-r-user-group/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 11:12:32 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[Lithuania]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=795</guid>
		<description><![CDATA[The first thing you have to do is estimate how many users will be interested in a local R group. I would say that out of one million inhabitants you can expect 10-20 users. Based on this rough number, you will know what challenges await you. If you expect 100 or more users, you have [...]]]></description>
			<content:encoded><![CDATA[<p>The first thing you have to do is estimate how many users will be interested in a local R group. I would say that out of one million inhabitants you can expect 10-20 users. Based on this rough number, you will know what challenges await you. If you expect 100 or more users, you have to think about an appropriate place to hold the first meetup, how to manage so many people, and what topics to present the first time.<br />
However, if you expect a small community (as I did), your challenge is spreading the news about the local R group. Get to know a few local users and ask them what they think about a meetup. The right place to find such users is the local university. Most likely the university will be happy to provide a place for the first official meeting.</p>
<p>Fortunately for me, I met a <a href="http://vzemlys.wordpress.com/" target="_blank">powerful</a> R user &#8211; <a href="https://twitter.com/#!/mpiktas" target="_blank">@mpiktas</a>, who is a lecturer at Vilnius University. With his help I was able to identify more R-infected users and get a room at the university for the presentation.</p>
<p>The next step, once you are sure that you are not alone, is to choose a name for the group and build a simple web site, either from scratch or by using a service like <a href="http://www.meetup.com">meetup</a>. This won&#8217;t cost you a fortune, but it is worth knowing that you can <a href="http://www.revolutionanalytics.com/news-events/r-user-group/" target="_blank">apply for sponsorship</a>. Revolution Analytics not only provides sponsorship, but also maintains <a href="http://blog.revolutionanalytics.com/local-r-groups.html" target="_blank">a directory of R user groups</a>.</p>
<p>Once you have created the virtual community, you have to think about a meetup. As the founder, prepare an introduction about the local R group, its future plans and your own usage of R. Here is mine, which I used for the <a href="http://www.VilniusR.org" target="_blank">VilniusR</a> introduction:</p>
<div id="__ss_12506088" style="width: 425px;">
<p><strong style="display: block; margin: 12px 0 4px;"><a title="R language presentation" href="http://www.slideshare.net/kafka399/vilniusr-group" target="_blank">R language presentation</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/12506088" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
</div>
<p>&nbsp;</p>
<p>What&#8217;s next? During the first meetup you can outline future meetups. I found it fascinating that many participants expressed an interest in a datadive event. So now we are in the process of organizing one!</p>
<p>P.S. If you happen to be from Luxembourg and are interested in a local user group &#8211; <a href="http://www.investuotojas.eu/contact-me/" target="_blank">let me know</a>.</p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/04/18/how-to-organize-r-user-group/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/04/18/how-to-organize-r-user-group/</feedburner:origLink></item>
		<item>
		<title>Levenshtein distance in C++ and code profiling in R</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/XxQiIJ0-O10/</link>
		<comments>http://www.investuotojas.eu/2012/03/25/levenshtein-distance-in-c-and-code-profiling-in-r/#comments</comments>
		<pubDate>Sun, 25 Mar 2012 15:26:23 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[quantitative]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=764</guid>
		<description><![CDATA[At work, a client asked whether the existing search engine could treat singular and plural forms equally, e.g. &#8220;partner&#8221; and &#8220;partners&#8221; would lead to the same result. The first option is stemming. In that case, the search engine would use the root of a word, e.g. &#8220;partn&#8221;. However, stemming has many weaknesses: two different words might have the same root, a [...]]]></description>
			<content:encoded><![CDATA[<p>At work, a client asked whether the existing search engine could treat singular and plural forms equally, e.g. &#8220;partner&#8221; and &#8220;partners&#8221; would lead to the same result.</p>
<p>The first option is <a href="http://en.wikipedia.org/wiki/Stemming" target="_blank">stemming</a>. In that case, the search engine would use the root of a word, e.g. &#8220;partn&#8221;. However, stemming has many weaknesses: two different words might have the same root, a user can misspell the root of the word, and for languages other than English and a few others stemming is not trivial to implement.</p>
<p><a href="http://en.wikipedia.org/wiki/Levenshtein_distance" target="_blank">Levenshtein distance</a> comes as the second option. The algorithm is simple &#8211; you have two words and you calculate the difference between them. You can insert, delete or replace any character, but each operation costs you. Let&#8217;s imagine a user enters &#8220;Levenstin distances&#8221; into the search engine and expects to find relevant information. However, he has just made 2 errors by misspelling the author&#8217;s name and 1 more by using the plural form of &#8220;distance&#8221;. If the search engine accepts 3 errors, the user will get relevant information.</p>
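<p>As a quick check of that count, here is a minimal sketch in R. It relies on adist() from the utils package that ships with base R, which computes a generalized Levenshtein distance; the variable names are mine and have nothing to do with the actual search engine.</p>
<pre class="rsplus" style="font-family:monospace;"># what the user typed vs. the term stored in the index
typed = 'Levenstin distances'
term = 'Levenshtein distance'

# adist() returns a matrix of edit distances, where every insertion,
# deletion and substitution costs 1 by default
errors = adist(typed, term)[1, 1]

# accept the match if it stays within the tolerance mentioned above
max_errors = 3
errors &lt;= max_errors</pre>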
<p>The challenge comes when you have a dictionary of terms (e.g. more than 1 million) and you want to get similar terms based on Levenshtein distance. You can visit every entry in the dictionary (very costly) or you can push the dictionary into a <a href="http://en.wikipedia.org/wiki/Trie">trie</a>. Do you need proof of the cost? Here we go:</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=test.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/test.png" alt="Photobucket" border="0" /></a></p>
<p>Red indicates the performance of the search when all terms are in the trie, green &#8211; the simple dictionary.</p>
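<p>For reference, the &#8220;visit every entry&#8221; approach can be sketched in a few lines of R. This is only an illustration with a made-up five-word dictionary and query, not the C++ code that was benchmarked; adist() again comes with base R.</p>
<pre class="rsplus" style="font-family:monospace;"># toy dictionary and query, invented for this example
dictionary = c('distance', 'distances', 'instance', 'partner', 'partners')
query = 'distanse'
max_errors = 2

# one distance per dictionary entry, so the cost grows with the dictionary size
d = adist(query, dictionary)[1, ]

# keep only the terms within the allowed number of errors
hits = dictionary[d &lt;= max_errors]
hits</pre>
<p>Every query has to touch every term here, which is why the time for the simple dictionary barely varies. As far as I understand it, a trie lets terms with a common prefix share the work, so a branch can be abandoned as soon as it exceeds the allowed number of errors.</p>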
<p>Now we come to the second part of the post &#8211; why bother plotting such graphs, if we could check a <strong>few</strong> entries to determine the average time and the winner? The reason is simple &#8211; in God we trust, all others must bring data. To put it differently &#8211; while profiling code, you should be interested in the average time AND the variation. As you can see in the graph above, the variation of the green curve is very small &#8211; it takes approximately the same time to scan the whole dictionary. Red, however, has higher variation &#8211; a search can take a while or it can finish almost immediately, but overall it is faster.<br />
Now imagine that a programmer wants to determine which implementation of a volatile cache, A or B, is faster. Let&#8217;s assume that big O notation is not going to help and she runs 2 tests for A and 2 for B. While test A runs, the cache size expands; while B runs, it shrinks. As a result, B wins over A and she makes the wrong choice. However, her colleague claims that although A has greater variation, it is much faster, and she tried it with 500 queries! Whom should I trust?</p>
<p>I use this piece of code for profiling:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p764code8'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p7648"><td class="code" id="p764code8"><pre class="rsplus" style="font-family:monospace;">simple<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'simple.txt'</span><span style="color: #080;">&#41;</span>
node<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'node.txt'</span><span style="color: #080;">&#41;</span>
&nbsp;
simple<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/cbind.html"><span style="color: #0000FF; font-weight: bold;">cbind</span></a><span style="color: #080;">&#40;</span>simple,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.character.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'simple'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/colnames.html"><span style="color: #0000FF; font-weight: bold;">colnames</span></a><span style="color: #080;">&#40;</span>simple<span style="color: #080;">&#41;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'time'</span>,<span style="color: #ff0000;">'type'</span><span style="color: #080;">&#41;</span>
node<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/cbind.html"><span style="color: #0000FF; font-weight: bold;">cbind</span></a><span style="color: #080;">&#40;</span>node,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'node'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/colnames.html"><span style="color: #0000FF; font-weight: bold;">colnames</span></a><span style="color: #080;">&#40;</span>node<span style="color: #080;">&#41;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'time'</span>,<span style="color: #ff0000;">'type'</span><span style="color: #080;">&#41;</span>
&nbsp;
rez<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/data.frame.html"><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/rbind.html"><span style="color: #0000FF; font-weight: bold;">rbind</span></a><span style="color: #080;">&#40;</span>simple, node<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/require.html"><span style="color: #0000FF; font-weight: bold;">require</span></a><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
&nbsp;
ggplot<span style="color: #080;">&#40;</span>rez,aes<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">time</span>,fill<span style="color: #080;">=</span>type<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_density<span style="color: #080;">&#40;</span>alpha<span style="color: #080;">=</span><span style="color: #ff0000;">0.6</span>,size<span style="color: #080;">=</span><span style="color: #ff0000;">1.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>scale_x_log10<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>
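<p>The density plot shows both the average and the spread at a glance. If you also want numbers, a minimal sketch follows. The timings are simulated here (the real ones live in simple.txt and node.txt), so only the shape of the data frame matches the code above, and the distributions are my own assumption.</p>
<pre class="rsplus" style="font-family:monospace;">set.seed(1)

# simulated timings in arbitrary units: 'simple' is steady,
# 'node' is faster on average but varies a lot more
rez = data.frame(
  time = c(rnorm(500, mean = 100, sd = 2), rlnorm(500, meanlog = 2, sdlog = 1)),
  type = rep(c('simple', 'node'), each = 500)
)

# average time AND variation per implementation
avg = tapply(rez$time, rez$type, mean)
spread = tapply(rez$time, rez$type, sd)
rbind(mean = avg, sd = spread)</pre>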

<p>The data and the C++ code for Levenshtein distance and the trie can be found on <a href="https://github.com/kafka399/Levenshtein-distance" target="_blank">GitHub</a>.</p>
<p>I found this source very useful: <a href="http://stevehanov.ca/blog/index.php?id=114" target="_blank">http://stevehanov.ca/blog/index.php?id=114</a></p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/03/25/levenshtein-distance-in-c-and-code-profiling-in-r/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/03/25/levenshtein-distance-in-c-and-code-profiling-in-r/</feedburner:origLink></item>
		<item>
		<title>I see high frequency data</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/b2-rD1oRjMQ/</link>
		<comments>http://www.investuotojas.eu/2012/03/01/i-see-high-frequency-data/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 17:02:16 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[quantitative]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[trading]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=753</guid>
		<description><![CDATA[In the previous post I shared an example of how to get high frequency data from IB broker (well, it is a retail version of HFD &#8211; it has only the best bid/ask and the trades). Now, once you have saved some data &#8211; what should you do next? The next logical step would be a data sanity check and visualization. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.investuotojas.eu/2012/02/24/save-high-frequency-data-in-mongodb/" target="_blank">In the previous post</a> I shared an example of how to get high frequency data from <a href="http://www.interactivebrokers.com" target="_blank">IB broker</a> (well, it is a retail version of HFD &#8211; it has only the best bid/ask and the trades). Now, once you have saved some data &#8211; what should you do next?</p>
<p>The next logical step would be a data sanity check and visualization. For example, while preparing the R script for this post, I found that IB data contains numerous duplicates in the quotes. Every time a trade happens, the IB trading platform sends the price and the size of the trade bundled together. Additionally, it sends the size of the trade as a separate quote, and this completely messes up the data. So the data sanity check and visualization gave me a hint that something was wrong with the data.</p>
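<p>A check of that kind can be sketched in R as follows. The ticks are invented, the labels in the field column are placeholders, and only the column names (tstamp, field, size) mirror the fields used in the script further down; what the duplicates look like in your own data may well differ.</p>
<pre class="rsplus" style="font-family:monospace;"># invented ticks: the trade at tstamp 1002 shows up twice, once bundled
# with the price and once more as a separate size quote
ticks = data.frame(
  tstamp = c(1000, 1001, 1002, 1002, 1003),
  field = c('bid', 'ask', 'last', 'last_size', 'bid'),
  size = c(5, 7, 2, 2, 4)
)

# flag rows that repeat the timestamp and size of an earlier row
dup = duplicated(ticks[, c('tstamp', 'size')])
table(dup)

# keep the first occurrence only
clean = ticks[!dup, ]</pre>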
<p>Today I want to show an example in R which loads data from mongodb and plots some parts of it. This should give you a better intuition about the collected data.</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=intra.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/intra.png" alt="Photobucket" border="0" /></a></p>
<p>The plot shows bid prices (light blue), ask prices (green) and the trades (red). The size of a red dot indicates the volume of the trade.</p>
<p>The source code is shared on <a href="https://github.com/kafka399/Rproject/blob/master/HFD/showHFD.R" target="_blank">github</a> and below:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p753code10'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p75310"><td class="code" id="p753code10"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;">#Author Dzidorius Martinaitis</span>
<span style="color: #228B22;">#Date 2012-03-01</span>
<span style="color: #228B22;">#Description </span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/require.html"><span style="color: #0000FF; font-weight: bold;">require</span></a><span style="color: #080;">&#40;</span>rmongodb<span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/require.html"><span style="color: #0000FF; font-weight: bold;">require</span></a><span style="color: #080;">&#40;</span>xts<span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/require.html"><span style="color: #0000FF; font-weight: bold;">require</span></a><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
mongo<span style="color: #080;">=</span>mongo.<span style="">create</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
&nbsp;
buf <span style="color: #080;">=</span> mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">create</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">append</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;tickerId&quot;</span>, 20L<span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">start</span>.<span style="">object</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;size&quot;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">append</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;$exists&quot;</span>, <span style="color: #ff0000;">&quot;true&quot;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">finish</span>.<span style="">object</span><span style="color: #080;">&#40;</span>buf<span style="color: #080;">&#41;</span>
&nbsp;
query <span style="color: #080;">=</span> mongo.<span style="">bson</span>.<span style="">from</span>.<span style="">buffer</span><span style="color: #080;">&#40;</span>buf<span style="color: #080;">&#41;</span>
&nbsp;
count <span style="color: #080;">=</span> mongo.<span style="">count</span><span style="color: #080;">&#40;</span>mongo,<span style="color: #ff0000;">'quotes.trinti'</span>,query<span style="color: #080;">&#41;</span>
cursor<span style="color: #080;">=</span>mongo.<span style="">find</span><span style="color: #080;">&#40;</span>mongo,<span style="color: #ff0000;">'quotes.trinti'</span>,query<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">#############  very slow code #############</span>
<span style="color: #228B22;">#size=''</span>
<span style="color: #228B22;">#system.time(</span>
<span style="color: #228B22;">#while(mongo.cursor.next(cursor)){</span>
<span style="color: #228B22;">#  temp=(mongo.cursor.value(cursor));</span>
<span style="color: #228B22;">#  if(is.xts(size))</span>
<span style="color: #228B22;">#    size=rbind(size,xts(cbind(mongo.bson.value(temp,&quot;field&quot;),mongo.bson.value(temp,&quot;size&quot;)),order.by=as.POSIXct(mongo.bson.value(temp,&quot;tstamp&quot;)/1000,origin='1970-01-01',tz='Europe/Paris')))</span>
<span style="color: #228B22;">#  else</span>
<span style="color: #228B22;">#    size=xts(cbind(mongo.bson.value(temp,&quot;field&quot;),mongo.bson.value(temp,&quot;size&quot;)),order.by=as.POSIXct(mongo.bson.value(temp,&quot;tstamp&quot;)/1000,origin='1970-01-01',tz='Europe/Paris'))</span>
<span style="color: #228B22;">#})</span>
<span style="color: #228B22;">#############  end very slow  #############</span>
&nbsp;
size<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/matrix.html"><span style="color: #0000FF; font-weight: bold;">matrix</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/nrow.html"><span style="color: #0000FF; font-weight: bold;">nrow</span></a><span style="color: #080;">=</span>count,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ncol.html"><span style="color: #0000FF; font-weight: bold;">ncol</span></a><span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
counter<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
  <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/while.html"><span style="color: #0000FF; font-weight: bold;">while</span></a><span style="color: #080;">&#40;</span>mongo.<span style="">cursor</span>.<span style="">next</span><span style="color: #080;">&#40;</span>cursor<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#123;</span>
    temp<span style="color: #080;">=</span><span style="color: #080;">&#40;</span>mongo.<span style="">cursor</span>.<span style="">value</span><span style="color: #080;">&#40;</span>cursor<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    size<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;field&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    size<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;size&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    size<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;tstamp&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    counter<span style="color: #080;">=</span>counter<span style="color: #080;">+</span><span style="color: #ff0000;">1</span><span style="color: #080;">;</span>
    <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/if.html"><span style="color: #0000FF; font-weight: bold;">if</span></a><span style="color: #080;">&#40;</span>counter<span style="color: #080;">&gt;</span>count<span style="color: #080;">&#41;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/break.html"><span style="color: #0000FF; font-weight: bold;">break</span></a><span style="color: #080;">;</span>
  <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span>
size<span style="color: #080;">=</span>xts<span style="color: #080;">&#40;</span>size<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>,order.<span style="">by</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.POSIXct.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">POSIXct</span></span></a><span style="color: #080;">&#40;</span>size<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">/</span><span style="color: #ff0000;">1000</span>,origin<span style="color: #080;">=</span><span style="color: #ff0000;">'1970-01-01'</span>,tz<span style="color: #080;">=</span><span style="color: #ff0000;">'Europe/Paris'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/colnames.html"><span style="color: #0000FF; font-weight: bold;">colnames</span></a><span style="color: #080;">&#40;</span>size<span style="color: #080;">&#41;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'field'</span>,<span style="color: #ff0000;">'size'</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
buf <span style="color: #080;">=</span> mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">create</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">append</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;tickerId&quot;</span>, 26L<span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">start</span>.<span style="">object</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;price&quot;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">append</span><span style="color: #080;">&#40;</span>buf, <span style="color: #ff0000;">&quot;$exists&quot;</span>, <span style="color: #ff0000;">&quot;true&quot;</span><span style="color: #080;">&#41;</span>
mongo.<span style="">bson</span>.<span style="">buffer</span>.<span style="">finish</span>.<span style="">object</span><span style="color: #080;">&#40;</span>buf<span style="color: #080;">&#41;</span>
&nbsp;
query <span style="color: #080;">=</span> mongo.<span style="">bson</span>.<span style="">from</span>.<span style="">buffer</span><span style="color: #080;">&#40;</span>buf<span style="color: #080;">&#41;</span>
count <span style="color: #080;">=</span> mongo.<span style="">count</span><span style="color: #080;">&#40;</span>mongo,<span style="color: #ff0000;">'quotes.trinti'</span>,query<span style="color: #080;">&#41;</span>
&nbsp;
cursor<span style="color: #080;">=</span>mongo.<span style="">find</span><span style="color: #080;">&#40;</span>mongo,<span style="color: #ff0000;">'quotes.trinti'</span>,query<span style="color: #080;">&#41;</span>
price<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/matrix.html"><span style="color: #0000FF; font-weight: bold;">matrix</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/nrow.html"><span style="color: #0000FF; font-weight: bold;">nrow</span></a><span style="color: #080;">=</span>count,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ncol.html"><span style="color: #0000FF; font-weight: bold;">ncol</span></a><span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span>
counter<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
  <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/while.html"><span style="color: #0000FF; font-weight: bold;">while</span></a><span style="color: #080;">&#40;</span>mongo.<span style="">cursor</span>.<span style="">next</span><span style="color: #080;">&#40;</span>cursor<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#123;</span>
    temp<span style="color: #080;">=</span><span style="color: #080;">&#40;</span>mongo.<span style="">cursor</span>.<span style="">value</span><span style="color: #080;">&#40;</span>cursor<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    price<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;field&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    price<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;price&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    price<span style="color: #080;">&#91;</span>counter,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>mongo.<span style="">bson</span>.<span style="">value</span><span style="color: #080;">&#40;</span>temp,<span style="color: #ff0000;">&quot;tstamp&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span>
    counter<span style="color: #080;">=</span>counter<span style="color: #080;">+</span><span style="color: #ff0000;">1</span><span style="color: #080;">;</span>
    <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/if.html"><span style="color: #0000FF; font-weight: bold;">if</span></a><span style="color: #080;">&#40;</span>counter<span style="color: #080;">&gt;</span>count<span style="color: #080;">&#41;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/break.html"><span style="color: #0000FF; font-weight: bold;">break</span></a><span style="color: #080;">;</span>
  <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span>
price<span style="color: #080;">=</span>xts<span style="color: #080;">&#40;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>,order.<span style="">by</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.POSIXct.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">POSIXct</span></span></a><span style="color: #080;">&#40;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">/</span><span style="color: #ff0000;">1000</span>,origin<span style="color: #080;">=</span><span style="color: #ff0000;">'1970-01-01'</span>,tz<span style="color: #080;">=</span><span style="color: #ff0000;">'Europe/Paris'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
price<span style="color: #080;">=</span><span style="color: #080;">&#40;</span>price<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&gt;</span><span style="color: #ff0000;">0</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/colnames.html"><span style="color: #0000FF; font-weight: bold;">colnames</span></a><span style="color: #080;">&#40;</span>price<span style="color: #080;">&#41;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'field'</span>,<span style="color: #ff0000;">'price'</span><span style="color: #080;">&#41;</span>
&nbsp;
quotes<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/cbind.html"><span style="color: #0000FF; font-weight: bold;">cbind</span></a><span style="color: #080;">&#40;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">==</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>,
             <span style="color: #228B22;">#cac40.volume[,2][cac40.volume[,1]==0],</span>
             price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">==</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>,
             <span style="color: #228B22;">#cac40.volume[,2][cac40.volume[,1]==3],</span>
             price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span>price<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">==</span><span style="color: #ff0000;">4</span><span style="color: #080;">&#93;</span>
             ,size<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span>size<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">==</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span>
             <span style="color: #080;">&#41;</span>
&nbsp;
quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>na.<span style="">locf</span><span style="color: #080;">&#40;</span>quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>na.<span style="">locf</span><span style="color: #080;">&#40;</span>quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>na.<span style="">locf</span><span style="color: #080;">&#40;</span>quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
quotes<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/is.na.html"><span style="color: #0000FF; font-weight: bold;">is.<span style="">na</span></span></a><span style="color: #080;">&#40;</span>quotes<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">4</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">=</span>NA
&nbsp;
temp<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">tail</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">head</span><span style="color: #080;">&#40;</span>quotes,<span style="color: #ff0000;">3000</span><span style="color: #080;">&#41;</span>,<span style="color: #ff0000;">1000</span><span style="color: #080;">&#41;</span>
temp<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/data.frame.html"><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span></a><span style="color: #080;">&#40;</span>ind<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>temp<span style="color: #080;">&#41;</span>,trd<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>temp<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>                
                ,bid<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>temp<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>,ask<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>temp<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
                ,size<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>temp<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">4</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
                <span style="color: #080;">&#41;</span>
temp<span style="color: #080;">=</span>melt<span style="color: #080;">&#40;</span>temp,id<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'ind'</span><span style="color: #080;">&#41;</span>,na.<span style="">rm</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
x<span style="color: #080;">=</span>temp<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>temp$variable<span style="color: #080;">==</span><span style="color: #ff0000;">'trd'</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
&nbsp;
rez<span style="color: #080;">=</span>temp<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>temp$variable<span style="color: #080;">!=</span><span style="color: #ff0000;">'trd'</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
rez<span style="color: #080;">=</span>rez<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>rez$variable<span style="color: #080;">!=</span><span style="color: #ff0000;">'size'</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
a<span style="color: #080;">=</span>temp<span style="color: #080;">&#91;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/which.html"><span style="color: #0000FF; font-weight: bold;">which</span></a><span style="color: #080;">&#40;</span>temp$variable<span style="color: #080;">==</span><span style="color: #ff0000;">'size'</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span>
ggplot<span style="color: #080;">&#40;</span>rez,aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>ind,y<span style="color: #080;">=</span>value,color<span style="color: #080;">=</span>variable<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_line<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>x,aes<span style="color: #080;">&#40;</span>size<span style="color: #080;">=</span>a<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>&nbsp;</p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/03/01/i-see-high-frequency-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/03/01/i-see-high-frequency-data/</feedburner:origLink></item>
		<item>
		<title>How to save high frequency data in mongodb</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/XfgO6ndP39g/</link>
		<comments>http://www.investuotojas.eu/2012/02/24/save-high-frequency-data-in-mongodb/#comments</comments>
		<pubDate>Fri, 24 Feb 2012 15:09:12 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[quantitative]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=749</guid>
		<description><![CDATA[Are you looking for a way to save real-time, high-frequency data from the Interactivebrokers.com API? I built an example in C++ that saves all incoming data to MongoDB. Check this link if you are interested: https://github.com/kafka399/TwsMongo]]></description>
			<content:encoded><![CDATA[<p>Are you looking for a way to save real-time, high-frequency data from the Interactivebrokers.com API? I built an example in C++ that saves all incoming data to MongoDB. Check this link if you are interested:</p>
<p><a href="https://github.com/kafka399/TwsMongo">https://github.com/kafka399/TwsMongo</a></p>
<p>&nbsp;</p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/02/24/save-high-frequency-data-in-mongodb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/02/24/save-high-frequency-data-in-mongodb/</feedburner:origLink></item>
		<item>
		<title>VilniusR – R users group in Lithuania</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/w_79gBsMOjI/</link>
		<comments>http://www.investuotojas.eu/2012/02/16/vilniusr-r-users-group-in-lithuania/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 13:54:53 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[Lithuania]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=747</guid>
		<description><![CDATA[Today is Lithuania&#8217;s independence day, and I have created an R user group in Lithuania &#8211; VilniusR. If you are nearby, please follow the link and sign up; I hope that we will have a meeting soon.]]></description>
			<content:encoded><![CDATA[<p>Today is <a href="http://en.wikipedia.org/wiki/Lithuania" target="_blank">Lithuania&#8217;s</a> independence day, and I have created an R user group in Lithuania &#8211; <a href="http://www.vilniusr.org" target="_blank">VilniusR</a>. If you are nearby, please follow <a href="http://www.vilniusr.org" target="_blank">the link</a> and sign up; I hope that we will have <a href="http://www.vilniusr.org/2012/02/16/vilniusr/" target="_blank">a meeting soon</a>.</p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/02/16/vilniusr-r-users-group-in-lithuania/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/02/16/vilniusr-r-users-group-in-lithuania/</feedburner:origLink></item>
		<item>
		<title>Vectorized R vs Rcpp</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/h9hMxfqWSIc/</link>
		<comments>http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 20:03:09 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[quantitative]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rcpp]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=730</guid>
		<description><![CDATA[In my previous post, I tried to show that Rcpp is 1000 times faster than pure R, and that caused a fuss in the comments. Being lazy, I didn&#8217;t vectorize the R code, so in the end I was comparing apples with oranges. To fix that problem, I built a new script, where I&#8217;m trying to compare [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.investuotojas.eu/2012/01/30/the-power-of-rcpp/"  target="_blank">In my previous post</a>, I tried to show that Rcpp is 1000 times faster than pure R, and that caused a fuss in the comments. Being lazy, I didn&#8217;t vectorize the R code, so in the end I was comparing apples with oranges.</p>
<p>To fix that problem, I built a new script that tries to compare apples with apples. The first piece of code, named &#8220;ifelse R&#8221;, uses the R &#8220;ifelse&#8221; function to vectorize the code. The second piece is fully vectorized code written in R, the third is pure C++ code, and the last one is C++ that uses the Rcpp &#8220;ifelse&#8221; function.</p>
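<p>As a rough illustration of the difference between the first two variants (a minimal sketch, not the benchmark script itself), the same price-selection rule can be written once with nested ifelse calls and once with logical masks and indexed assignment. The four-row data frame and the function names pick_ifelse and pick_masked are made up for this example.</p>

```r
# Toy quotes; the values are invented for illustration only.
x = data.frame(bid        = c(5.0, 6.0, 7.0, 8.0),
               ask        = c(5.5, 6.5, 6.8, 8.5),
               last_price = c(4.0, 6.2, 7.5, 0.0))

# Nested ifelse(): every branch is evaluated for the whole vector.
pick_ifelse = function(x) {
  ifelse(x$last_price > 0,
         ifelse(x$bid > x$last_price, x$bid,
                ifelse((x$ask > 0) & (x$ask < x$last_price), x$ask, x$last_price)),
         0.5 * (x$ask + x$bid))
}

# Logical masks: start from last_price and overwrite only the matching rows.
pick_masked = function(x) {
  has_last  = x$last_price > 0
  bid_above = x$bid > x$last_price
  ask_below = (x$ask > 0) & (x$ask < x$last_price)
  use_ask   = has_last & !bid_above & ask_below
  use_bid   = has_last & bid_above
  out = x$last_price
  out[use_ask] = x$ask[use_ask]
  out[use_bid] = x$bid[use_bid]
  out[!has_last] = 0.5 * (x$ask[!has_last] + x$bid[!has_last])
  out
}

res_ifelse = pick_ifelse(x)
res_masked = pick_masked(x)
print(res_ifelse)
print(all.equal(res_ifelse, res_masked))
```

<p>Both functions return the same vector; the masked version only touches the rows that need a different value, which is one plausible reason a fully vectorized script can beat nested ifelse calls.</p>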
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=performance.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/performance.png" alt="Photobucket" border="0" /></a></p>
<p>&nbsp;</p>
<table border="0">
<tbody>
<tr>
<th>name</th>
<th>seconds</th>
</tr>
<tr>
<td align="right">ifelse R</td>
<td align="right">27.50</td>
</tr>
<tr>
<td align="right">vectorized R</td>
<td align="right">10.40</td>
</tr>
<tr>
<td align="right">pure C++</td>
<td align="right">0.44</td>
</tr>
<tr>
<td align="right">vectorized C++</td>
<td align="right">2.24</td>
</tr>
</tbody>
</table>
<p>Here we go &#8211; vectorization truly helps, but the pure C++ code is still about 23 times faster than the vectorized R code (0.44 s vs 10.40 s). Of course, you pay the price when writing it in C++.<br />
I find it a bit strange that the vectorized C++ code doesn&#8217;t perform that well&#8230;</p>
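<p>The ratios behind that comparison can be checked with a few lines of R. This is only a convenience sketch: the timings are copied from the table above, and the variable names timings and slowdown are chosen here for illustration.</p>

```r
# Timings in seconds, copied from the table above.
timings = c("ifelse R" = 27.50, "vectorized R" = 10.40,
            "pure C++" = 0.44, "vectorized C++" = 2.24)

# How many times slower each variant is than the pure C++ loop.
slowdown = round(timings / timings[["pure C++"]], 1)
print(slowdown)
```

<p>On these numbers, &#8220;ifelse R&#8221; is roughly 62 times slower than pure C++, vectorized R roughly 24 times, and vectorized C++ roughly 5 times.</p>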
<p>You can get the code from <a href="https://github.com/kafka399/Rproject/blob/master/performance/performance.R"  target="_blank">GitHub</a> or review it below:</p>

<div class="wp_codebox"><table><tr id="p73012"><td class="code" id="p730code12"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;">#Author Dzidorius Martinaitis</span>
<span style="color: #228B22;">#Date 2012-02-01</span>
<span style="color: #228B22;">#Description http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp</span>
&nbsp;
bid <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">50000000</span>,<span style="color: #ff0000;">5</span>,<span style="color: #ff0000;">9</span><span style="color: #080;">&#41;</span>
ask <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">50000000</span>,<span style="color: #ff0000;">5</span>,<span style="color: #ff0000;">9</span><span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/close.html"><span style="color: #0000FF; font-weight: bold;">close</span></a> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">50000000</span>,<span style="color: #ff0000;">5</span>,<span style="color: #ff0000;">9</span><span style="color: #080;">&#41;</span>
&nbsp;
x<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/data.frame.html"><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span></a><span style="color: #080;">&#40;</span>bid<span style="color: #080;">=</span>bid,ask<span style="color: #080;">=</span>ask,last_price<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/close.html"><span style="color: #0000FF; font-weight: bold;">close</span></a><span style="color: #080;">&#41;</span>
rez<span style="color: #080;">=</span><span style="color: #ff0000;">0</span>
&nbsp;
<span style="color: #228B22;">###########    ifelse R  #################</span>
answ<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.vector.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">vector</span></span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
<span style="color: #080;">&#123;</span>
rez <span style="color: #080;">=</span> <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ifelse.html"><span style="color: #0000FF; font-weight: bold;">ifelse</span></a><span style="color: #080;">&#40;</span>x$last_price<span style="color: #080;">&gt;</span><span style="color: #ff0000;">0</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ifelse.html"><span style="color: #0000FF; font-weight: bold;">ifelse</span></a><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;bid&quot;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&gt;</span> x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;last_price&quot;</span><span style="color: #080;">&#93;</span>, x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;bid&quot;</span><span style="color: #080;">&#93;</span>, <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/ifelse.html"><span style="color: #0000FF; font-weight: bold;">ifelse</span></a><span style="color: #080;">&#40;</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;ask&quot;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&gt;</span> <span style="color: #ff0000;">0</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&amp;</span> <span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;ask&quot;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&lt;</span> x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;last_price&quot;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>, x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;ask&quot;</span><span style="color: #080;">&#93;</span>, x<span 
style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;last_price&quot;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">0.5</span><span style="color: #080;">*</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">&quot;ask&quot;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">+</span> x<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">&quot;bid&quot;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;">###########   end ifelse R  #################</span>
&nbsp;
<span style="color: #228B22;">###########    vectorized R  #################</span>
&nbsp;
answ<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/append.html"><span style="color: #0000FF; font-weight: bold;">append</span></a><span style="color: #080;">&#40;</span>answ,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
<span style="color: #080;">&#123;</span>
lgt0 <span style="color: #080;">=</span> x$last_price <span style="color: #080;">&gt;</span> <span style="color: #ff0000;">0</span>
bgtl <span style="color: #080;">=</span> x$bid <span style="color: #080;">&gt;</span> x$last_price
agt0 <span style="color: #080;">=</span> x$ask <span style="color: #080;">&gt;</span> <span style="color: #ff0000;">0</span>
altl <span style="color: #080;">=</span> x$ask <span style="color: #080;">&lt;</span> x$last_price
rez <span style="color: #080;">=</span> x$last_price
rez<span style="color: #080;">&#91;</span>lgt0 <span style="color: #080;">&amp;</span> agt0 <span style="color: #080;">&amp;</span> altl<span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> x$ask<span style="color: #080;">&#91;</span>lgt0 <span style="color: #080;">&amp;</span> agt0 <span style="color: #080;">&amp;</span> altl<span style="color: #080;">&#93;</span>
rez<span style="color: #080;">&#91;</span>lgt0 <span style="color: #080;">&amp;</span> bgtl<span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> x$bid<span style="color: #080;">&#91;</span>lgt0 <span style="color: #080;">&amp;</span> bgtl<span style="color: #080;">&#93;</span>
rez<span style="color: #080;">&#91;</span><span style="color: #080;">!</span>lgt0<span style="color: #080;">&#93;</span> <span style="color: #080;">=</span> <span style="color: #080;">&#40;</span>x$ask<span style="color: #080;">&#91;</span><span style="color: #080;">!</span>lgt0<span style="color: #080;">&#93;</span><span style="color: #080;">+</span>x$bid<span style="color: #080;">&#91;</span><span style="color: #080;">!</span>lgt0<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">/</span><span style="color: #ff0000;">2</span>
<span style="color: #080;">&#125;</span>
<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;">###########   end vectorized R  #################</span>
&nbsp;
<span style="color: #228B22;">#C++ code starts here</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/library.html"><span style="color: #0000FF; font-weight: bold;">library</span></a><span style="color: #080;">&#40;</span>inline<span style="color: #080;">&#41;</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/library.html"><span style="color: #0000FF; font-weight: bold;">library</span></a><span style="color: #080;">&#40;</span>Rcpp<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">###########    pure C++  #################</span>
&nbsp;
code<span style="color: #080;">=</span><span style="color: #ff0000;">'
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret(bid_size);
for(int i =0;i&lt;bid_size;i++)
{
  if(close[i]&gt;0)
  {
    if(bid[i]&gt;close[i])
    {
      ret[i] = bid[i]; 
    }
    else if(ask[i]&gt;0 &amp;&amp; ask[i]&lt;close[i])
    {
      ret[i] = ask[i];//
    }
    else
    {
      ret[i] = close[i];//
    }
  }
  else
  {
    ret[i]=(bid[i]+ask[i])/2;
  }
&nbsp;
}
return ret;
'</span>
getLastPrice <span style="color: #080;">&lt;-</span> cxxfunction<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/base/html/Log.html"><span style="color: #0000FF; font-weight: bold;">signature</span></a><span style="color: #080;">&#40;</span> bid_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,ask_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,close_<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;numeric&quot;</span><span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/body.html"><span style="color: #0000FF; font-weight: bold;">body</span></a><span style="color: #080;">=</span>code,plugin<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Rcpp&quot;</span><span style="color: #080;">&#41;</span>
rez<span style="color: #080;">=</span><span style="color: #ff0000;">0</span>
answ<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/append.html"><span style="color: #0000FF; font-weight: bold;">append</span></a><span style="color: #080;">&#40;</span>answ,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
  <span style="color: #080;">&#123;</span>
    rez<span style="color: #080;">=</span>getLastPrice<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$bid<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$ask<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$last_price<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">###########   end pure C++  #################</span>
&nbsp;
<span style="color: #228B22;">#summary(rez)</span>
&nbsp;
&nbsp;
<span style="color: #228B22;">###########    vectorized C++  #################</span>
code<span style="color: #080;">=</span><span style="color: #ff0000;">'
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret=ifelse(close&gt;0,ifelse(bid &gt;close, bid, ifelse(ask &gt; 0,ifelse(ask &lt; close,ask, close),close)), 0.5*(ask + bid));
return ret;
'</span>
getLastPrice <span style="color: #080;">&lt;-</span> cxxfunction<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/base/html/Log.html"><span style="color: #0000FF; font-weight: bold;">signature</span></a><span style="color: #080;">&#40;</span> bid_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,ask_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,close_<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;numeric&quot;</span><span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/body.html"><span style="color: #0000FF; font-weight: bold;">body</span></a><span style="color: #080;">=</span>code,plugin<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Rcpp&quot;</span><span style="color: #080;">&#41;</span>
rez<span style="color: #080;">=</span><span style="color: #ff0000;">0</span>
answ<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/append.html"><span style="color: #0000FF; font-weight: bold;">append</span></a><span style="color: #080;">&#40;</span>answ,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/system.time.html"><span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span></a><span style="color: #080;">&#40;</span>
<span style="color: #080;">&#123;</span>
  rez<span style="color: #080;">=</span>getLastPrice<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$bid<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$ask<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$last_price<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">###########   end vectorized C++  #################</span>
&nbsp;
<span style="color: #228B22;">#summary(rez)</span>
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/names.html"><span style="color: #0000FF; font-weight: bold;">names</span></a><span style="color: #080;">&#40;</span>answ<span style="color: #080;">&#41;</span><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'ifelse R'</span>,<span style="color: #ff0000;">'vectorized R'</span>,<span style="color: #ff0000;">'pure C++'</span>,<span style="color: #ff0000;">'vectorized C++'</span><span style="color: #080;">&#41;</span>
&nbsp;
<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/library.html"><span style="color: #0000FF; font-weight: bold;">library</span></a><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
a<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/data.frame.html"><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span></a><span style="color: #080;">&#40;</span>ind<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">4</span>,val<span style="color: #080;">=</span>answ<span style="color: #080;">&#41;</span>
ggplot<span style="color: #080;">&#40;</span>a,aes<span style="color: #080;">&#40;</span>ind,val<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/stats/html/legend.html"><span style="color: #0000FF; font-weight: bold;">legend</span></a><span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/F.html"><span style="color: #0000FF; font-weight: bold;">F</span></a><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_text<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>label<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/names.html"><span style="color: #0000FF; font-weight: bold;">names</span></a><span style="color: #080;">&#40;</span>answ<span style="color: #080;">&#41;</span>,hjust<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #080;">-</span><span style="color: #ff0000;">0.2</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">0.2</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">0.2</span>,<span style="color: #ff0000;">0.8</span><span style="color: #080;">&#41;</span>,vjust<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/c.html"><span style="color: #0000FF; font-weight: bold;">c</span></a><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">0</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: 
#080;">&#41;</span>,size<span style="color: #080;">=</span><span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>


]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/02/01/vectorized-r-vs-rcpp/</feedburner:origLink></item>
		<item>
		<title>The power of Rcpp</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/fzEt94ihsRs/</link>
		<comments>http://www.investuotojas.eu/2012/01/30/the-power-of-rcpp/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 21:51:38 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[R-language]]></category>
		<category><![CDATA[quantitative]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[rcpp]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=703</guid>
		<description><![CDATA[A while ago I built two R scripts to track the OMX Baltic Benchmark Fund against the index. One script returns the deviation of the fund from the index, and it works fast enough. The second calculates the value of the fund every minute, and it used to take a while. For example, it spent 2 minutes or [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago I built two R scripts to track the <a href="http://markets.ft.com/research/Markets/Tearsheets/Financials?s=OAMOBBF1L:VLX" target="_blank">OMX Baltic Benchmark Fund</a> against the index. One script returns the deviation of the fund from the index, and it works fast enough. The second calculates the value of the fund every minute, and it used to take a while: 2 minutes or more to get the values for one day. Here is an example of the result:</p>
<p><a href="http://s176.photobucket.com/albums/w180/investuotojas/?action=view&amp;current=ind.png" target="_blank"><img src="http://i176.photobucket.com/albums/w180/investuotojas/ind.png" alt="Photobucket" border="0" /></a></p>
<p>The following piece of code was the bottleneck:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p703code15'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p70315"><td class="code" id="p703code15"><pre class="rsplus" style="font-family:monospace;"><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/for.html"><span style="color: #0000FF; font-weight: bold;">for</span></a><span style="color: #080;">&#40;</span>y <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/NROW.html"><span style="color: #0000FF; font-weight: bold;">NROW</span></a><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
 <span style="color: #080;">&#123;</span>
    z<span style="color: #080;">=</span>x<span style="color: #080;">&#91;</span>y,<span style="color: #080;">&#93;</span>
    <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/if.html"><span style="color: #0000FF; font-weight: bold;">if</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>z$last_price<span style="color: #080;">&gt;</span><span style="color: #ff0000;">0</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
    <span style="color: #080;">&#123;</span>
      <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/if.html"><span style="color: #0000FF; font-weight: bold;">if</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>z$bid<span style="color: #080;">&gt;</span>z$last_price<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>rez<span style="color: #080;">&#91;</span>y<span style="color: #080;">&#93;</span><span style="color: #080;">=</span>z$bid
      <span style="color: #0000FF; font-weight: bold;">else</span> <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/if.html"><span style="color: #0000FF; font-weight: bold;">if</span></a><span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>z$ask<span style="color: #080;">&#41;</span><span style="color: #080;">&gt;</span><span style="color: #ff0000;">0</span> <span style="color: #080;">&amp;</span> <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>z$ask<span style="color: #080;">&#41;</span><span style="color: #080;">&lt;</span>z$last_price<span style="color: #080;">&#41;</span>rez<span style="color: #080;">&#91;</span>y<span style="color: #080;">&#93;</span><span style="color: #080;">=</span>z$ask
      <span style="color: #0000FF; font-weight: bold;">else</span> rez<span style="color: #080;">&#91;</span>y<span style="color: #080;">&#93;</span><span style="color: #080;">=</span>z$last_price
    <span style="color: #080;">&#125;</span>
    <span style="color: #0000FF; font-weight: bold;">else</span>
    <span style="color: #080;">&#123;</span>
      rez<span style="color: #080;">&#91;</span>y<span style="color: #080;">&#93;</span><span style="color: #080;">=</span><span style="color: #080;">&#40;</span>z$ask<span style="color: #080;">+</span>z$bid<span style="color: #080;">&#41;</span><span style="color: #080;">/</span><span style="color: #ff0000;">2</span>
    <span style="color: #080;">&#125;</span>
 <span style="color: #080;">&#125;</span></pre></td></tr></table></div>

<p>The code above loops over the time series row by row and, based on a set of rules, decides which price (bid, ask or last price) to use for the calculations. The pure R script used to take 100 seconds to derive the prices.</p>
<p>During the weekend I found time to watch a very interesting <a href="http://goo.gl/zzq0B" target="_blank">Rcpp presentation</a>. To my surprise, there are numerous ways to seamlessly integrate C++ into R code. So I decided to rewrite the code above in C++, using the Rcpp and inline packages.</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p703code16'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p70316"><td class="code" id="p703code16"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;">#c++ code embed in code value</span>
code<span style="color: #080;">=</span><span style="color: #ff0000;">'
NumericVector bid(bid_);NumericVector ask(ask_);NumericVector close(close_);
int bid_size = bid.size();
NumericVector ret(bid_size); // fresh output vector: ret(ask_) would share memory with the ask input and overwrite it
for(int i =0;i&lt;bid_size;i++)
{
  if(close[i]&gt;0)
  {
    if(bid[i]&gt;close[i])
    {
      ret[i] = bid[i];
    }
    else if(ask[i]&gt;0 &amp;&amp; ask[i]&lt;close[i])
    {
      ret[i] = ask[i];//
    }
    else
    {
      ret[i] = close[i];//
    }
  }
  else
  {
    ret[i]=(bid[i]+ask[i])/2;
  }
&nbsp;
}
return ret;
'</span>
<span style="color: #228B22;">#a glue function between C++ and R</span>
getLastPrice <span style="color: #080;">=</span> cxxfunction<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/base/html/Log.html"><span style="color: #0000FF; font-weight: bold;">signature</span></a><span style="color: #080;">&#40;</span> bid_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,ask_ <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;numeric&quot;</span>,close_<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;numeric&quot;</span><span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/body.html"><span style="color: #0000FF; font-weight: bold;">body</span></a><span style="color: #080;">=</span>code,plugin<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Rcpp&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;">#and the call of the function</span>
getLastPrice<span style="color: #080;">&#40;</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$bid<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$ask<span style="color: #080;">&#41;</span>,<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/as.numeric.html"><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span></a><span style="color: #080;">&#40;</span>x$last_price<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>What did I get in return? Well, 0.1 seconds instead of 100 seconds, roughly a 1000-fold speedup!</p>
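<p>The selection rule can also be checked on its own, outside R. Below is a minimal standalone sketch, assuming plain C++ arrays instead of Rcpp vectors; the function names pickPrice and pickPrices are made up for illustration and are not part of Rcpp or of my scripts. As in the snippet above, close plays the role of last_price.</p>

```cpp
// Illustrative sketch only: pickPrice and pickPrices are hypothetical names,
// and plain arrays stand in for the Rcpp vectors used above.

// Choose the price for one observation; close plays the role of last_price.
double pickPrice(double bid, double ask, double close) {
    if (close > 0) {
        if (bid > close) return bid;             // bid is above the last price
        if (ask > 0 && ask < close) return ask;  // valid ask below the last price
        return close;                            // otherwise keep the last price
    }
    return (bid + ask) / 2;                      // no last price: use the midpoint
}

// Apply the rule to n observations, writing into a separate output array
// so that none of the inputs is modified.
void pickPrices(const double* bid, const double* ask, const double* close,
                double* out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = pickPrice(bid[i], ask[i], close[i]);
    }
}
```

<p>For example, with bid 9.0, ask 9.5 and last price 10.0 the rule returns the ask, 9.5; with no last price (0), bid 9.0 and ask 11.0 it returns the midpoint, 10.0.</p>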

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2012/01/30/the-power-of-rcpp/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2012/01/30/the-power-of-rcpp/</feedburner:origLink></item>
		<item>
		<title>ai-class.com vs ml-class.com</title>
		<link>http://feedproxy.google.com/~r/investuotojas/~3/lxXj_H2Njzw/</link>
		<comments>http://www.investuotojas.eu/2011/12/16/ai-class-com-vs-ml-class-com/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 22:52:04 +0000</pubDate>
		<dc:creator>Dzidorius Martinaitis</dc:creator>
				<category><![CDATA[books]]></category>
		<category><![CDATA[EN]]></category>
		<category><![CDATA[quant]]></category>
		<category><![CDATA[R-language]]></category>

		<guid isPermaLink="false">http://www.investuotojas.eu/?p=690</guid>
		<description><![CDATA[For those who did not know, Stanford University offered 3 courses free of charge at the beginning of the autumn. It is kind of shocking &#8211; a US-based institution offers education for free! Take any socialism-oriented country and one of the promises is education for free. But it seems that the argument is losing its power &#8211; Stanford, [...]]]></description>
			<content:encoded><![CDATA[<p>For those who did not know, Stanford University offered 3 courses free of charge at the beginning of the autumn. It is kind of shocking &#8211; a US-based institution offers education for free! Take any socialism-oriented country and one of the promises is education for free. But it seems that the argument is losing its power &#8211; Stanford, <a href="http://www.khanacademy.org/" target="_blank">khanacademy</a> and <a href="http://ocw.mit.edu/index.htm" target="_blank">a bunch of others</a> offer high-quality learning for everyone.</p>
<p><a href="http://jan2012.ml-class.org" target="_blank">In January</a> (scroll down to get the full list), Stanford will provide more than 15 courses for free, so I thought I could share my own opinion about the courses.</p>
<p><a href="http://www.ml-class.com" target="_blank">ml-class.com</a> This course was a perfect fit for my personality and I loved it. Every week there were video lessons on topics like machine learning, data mining and statistical pattern recognition, followed by review questions and programming exercises, which had to be completed in Octave/Matlab. The quality of the video was superb, the lessons were 8-14 minutes long and their format was great as well (Prof. Andrew Ng switched seamlessly between the whiteboard and talking).<br />
This course inspired me to build an anomaly detection system at work, where we have already spotted a few anomalies. Now I&#8217;m working on a kind of &#8220;spam filter implementation&#8221; for text analysis.<br />
For me, the practical part of the course is like water for a fish &#8211; without it the theoretical part is empty and forgotten within hours.</p>
<p><a href="https://www.ai-class.com/">ai-class.com</a> This course gave me a broad view of artificial intelligence: machine learning, robotics, natural language processing, computer vision, search algorithms, etc. I suppose that because the topics are so different, the course was aligned towards the theoretical part &#8211; otherwise the practical parts would take forever. However, in the last part there was an optional exercise &#8211; to encrypt two texts, which I loved!<br />
The instructors, namely <a href="http://www.linkedin.com/pub/sebastian-thrun/17/713/88">Sebastian Thrun</a> and <a href="http://www.linkedin.com/in/pnorvig">Peter Norvig</a>, recommend this book: <a href="http://www.amazon.com/gp/product/0136042597/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=quantitativ0e-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0136042597">Artificial Intelligence: A Modern Approach</a>. I should say that the book was very helpful during the course, but I won&#8217;t use it outside the course.<br />
The courses have different evaluation systems. The AI class scores your homework and exams, and the top 1% will be awarded with a special paper and maybe a <a href="http://pastebin.com/JiczaBxb" target="_blank">job offer</a>, while the ML class is inclined towards delivering knowledge &#8211; almost everyone who works hard can get a 100% score without a penalty. I think that, in such environments, different communities sprang up: the <a href="http://www.aiqus.com/" target="_blank">aiqus.com</a> forum is very harsh to any question, and the answers start with something like &#8220;I know the answer, but hey, I can&#8217;t tell you anything, because the honor code doesn&#8217;t allow it and I&#8217;m the smartest guy on Earth&#8221;, while the <a href="http://www.ml-class.org/course/qna/index" target="_blank">ml-class forum</a> is more open-minded &#8211; if you can&#8217;t crack the problem, other students will help you.<br />
I was in light shock when I first saw the format of the AI lectures &#8211; the instructors used a real whiteboard, namely paper and pencil, and it took me a while to get used to it.</p>
<p>But overall, I really enjoyed both courses, and special thanks go to the Stanford professors, namely Andrew Ng, Sebastian Thrun and Peter Norvig!</p>

]]></content:encoded>
			<wfw:commentRss>http://www.investuotojas.eu/2011/12/16/ai-class-com-vs-ml-class-com/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.investuotojas.eu/2011/12/16/ai-class-com-vs-ml-class-com/</feedburner:origLink></item>
	</channel>
</rss>

