<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Chris Lowis</title>
 
 <link href="http://blog.chrislowis.co.uk/" />
 <updated>2009-08-25T21:49:35+01:00</updated>
 <id>http://blog.chrislowis.co.uk/</id>
 <author>
   <name>Chris Lowis</name>
   <email>chris.lowis@gmail.com</email>
 </author>

 
 <link rel="self" href="http://feeds.feedburner.com/ChrisLowis" type="application/atom+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry>
   <title>Recent Activities</title>
   <link href="http://blog.chrislowis.co.uk/2009/08/25/recent-activities.html" />
   <updated>2009-08-25T00:00:00+01:00</updated>
   <id>http://blog.chrislowis.co.uk/2009/08/25/recent-activities</id>
   <content type="html">&lt;h1&gt;Recent Activities&lt;/h1&gt;
&lt;p class="meta"&gt;25 Aug 2009 &amp;#8211; London&lt;/p&gt;
&lt;h2&gt;Recently &amp;#8230;&lt;/h2&gt;
&lt;p&gt;Back in May I started a new job at the &lt;span class="caps"&gt;BBC&lt;/span&gt;. I&amp;#8217;ve been working on some &lt;a href="http://www.bbc.co.uk/music/introducing"&gt;interesting&lt;/a&gt; &lt;a href="http://www.bbc.co.uk/music"&gt;projects&lt;/a&gt; with some great people. I&amp;#8217;ve also been lucky enough to take part in a couple of Hack Days in an official-ish capacity, and I intend to blog about some of the technical aspects soon. For now:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a href="http://musichackday.org/hacks.php?page=MusicBore"&gt;The Music Bore&lt;/a&gt; &amp;#8211; an automated DJ system that &lt;a href="http://www.aelius.com/njh/"&gt;Nick&lt;/a&gt;, &lt;a href="http://www.metade.org/"&gt;Patrick&lt;/a&gt;, &lt;a href="http://moustaki.org/"&gt;Yves&lt;/a&gt; and I built at the Music Hack Day. I wrote some more about this over on the &lt;a href="http://www.bbc.co.uk/blogs/radiolabs/2009/07/the_music_bore.shtml"&gt;&lt;span class="caps"&gt;BBC&lt;/span&gt; Radio Labs blog&lt;/a&gt;. Coming soon to an &lt;span class="caps"&gt;IRC&lt;/span&gt; channel, a hosted server or, if Patrick and I get our way, an art installation near you!&lt;/li&gt;
	&lt;li&gt;David and I were invited to the Guardian&amp;#8217;s Hack Day, where we worked on an &lt;span class="caps"&gt;RDF&lt;/span&gt;- and Processing-powered mash-up of Guardian and &lt;span class="caps"&gt;BBC&lt;/span&gt; content around MPs. David &lt;a href="http://www.bbc.co.uk/blogs/bbcinternet/2009/08/at_the_end_of_july.html"&gt;blogged&lt;/a&gt; about this on the &lt;span class="caps"&gt;BBC&lt;/span&gt; Internet blog.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Check back, or subscribe to the &lt;span class="caps"&gt;RSS&lt;/span&gt; feed for some technical details coming soon.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/KwxwPtaJkLg" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Using R and Ruby - slides from February LRUG</title>
   <link href="http://blog.chrislowis.co.uk/2009/02/15/LRUG-R-Ruby-talk.html" />
   <updated>2009-02-15T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2009/02/15/LRUG-R-Ruby-talk</id>
   <content type="html">&lt;h1&gt;Using R and Ruby &amp;#8211; slides from February &lt;span class="caps"&gt;LRUG&lt;/span&gt;&lt;/h1&gt;
&lt;p class="meta"&gt;15 Feb 2009 &amp;#8211; Southampton&lt;/p&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Last week I gave a short talk at the &lt;a href="http://lrug.org/meetings/2009/01/20/february-2009-meeting/"&gt;February &lt;span class="caps"&gt;LRUG&lt;/span&gt; meeting&lt;/a&gt; about using the scientific programming environment R together with Ruby.&lt;/p&gt;
&lt;p&gt;My presentation gave a few examples of the kind of statistical analysis that can be performed with R, and showed how easy this functionality is to access from within your Ruby code.&lt;/p&gt;
&lt;h2&gt;Slides&lt;/h2&gt;
&lt;p&gt;&lt;object width='1024' height='768'&gt;
&lt;param name='movie' value='lrug.swf'&gt;
&lt;embed src='/images/lrug.swf' width='600' height='450'&gt;
&lt;/embed&gt;
&lt;/object&gt;&lt;/p&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;I&amp;#8217;ll post the code for the Twitter analysis to GitHub soon. It needs a little tidying up first.&lt;/p&gt;
&lt;p&gt;An initial draft of this presentation used R to develop a recommendation algorithm (based on k-means clustering) using the GitHub API. Although it worked, the recommendations it made were not great, so I removed the code from the presentation and replaced it with the Twitter analysis. I&amp;#8217;d like to resurrect this code at some point, so remember to subscribe to my &lt;a href="http://feeds.feedburner.com/ChrisLowis"&gt;&lt;span class="caps"&gt;RSS&lt;/span&gt; feed&lt;/a&gt; if you&amp;#8217;re interested, or nag me to do it in the comments field below!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/90CkCj0du1c" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Calculating the Pearson correlation coefficient using R and Ruby</title>
   <link href="http://blog.chrislowis.co.uk/2009/01/21/pearson-correlation-using-R-and-Ruby.html" />
   <updated>2009-01-21T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2009/01/21/pearson-correlation-using-R-and-Ruby</id>
   <content type="html">&lt;h1&gt;Calculating the Pearson correlation coefficient using R and Ruby&lt;/h1&gt;
&lt;p class="meta"&gt;21 Jan 2009 &amp;#8211; Southampton&lt;/p&gt;
&lt;p&gt;In a &lt;a href="http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson.html"&gt;previous article&lt;/a&gt; I talked about using the &lt;span class="caps"&gt;GNU&lt;/span&gt; Scientific Library to implement the Pearson correlation algorithm, as used for example in &lt;a href="http://github.com/maccman/acts_as_recommendable/tree/master"&gt;acts_as_recommendable&lt;/a&gt;. As a prelude to some forthcoming articles, I&amp;#8217;d like to show you how easy it is to implement the same thing using the Ruby bindings to the &lt;a href="http://www.r-project.org/"&gt;R project&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For this to work you&amp;#8217;ll need to install &lt;a href="http://www.r-project.org/"&gt;R&lt;/a&gt; and the &lt;a href="http://web.kuicr.kyoto-u.ac.jp/~alexg/rsruby/"&gt;rsruby&lt;/a&gt; gem. Take a look at the documentation for the rsruby gem: installation is straightforward, but there are a couple of things to be aware of.&lt;/p&gt;
&lt;p&gt;Having done that, let&amp;#8217;s reopen the Pearson class from the &lt;a href="http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson.html"&gt;previous article&lt;/a&gt; and add a new R-based method:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
class Pearson
  def initialize
    require 'rsruby'
    @r = RSRuby.instance
  end

  def R_pearson(x,y)
    @r.cor(x,y)
  end
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The new initialize method sets up the communication with the R instance. The actual definition of the Pearson method is simple: conversion from Ruby Arrays to R vectors is handled automatically by the bindings, so you only need to know the correct R function to call &amp;#8211; in this case &amp;#8216;cor&amp;#8217;. Take a look through the R manual to learn more about this powerful tool &amp;#8211; almost all the features are accessible through the Ruby bindings.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s a quick modification to the benchmark to compare performance:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
require 'benchmark'

n = 100000
x = []; n.times{x &amp;lt;&amp;lt; rand}
y = []; n.times{y &amp;lt;&amp;lt; rand}

p = Pearson.new()

Benchmark.bm() do |bm|
  bm.report("Ruby:") {p.ruby_pearson(x,y)}
  bm.report("GSL:") {p.gsl_pearson(x,y)}
  bm.report("Inline:") {p.inline_pearson(n,x,y)}
  bm.report("R:") {p.R_pearson(x,y)}
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the results:&lt;/p&gt;
&lt;pre&gt;
             user     system      total        real
Ruby:    1.590000   0.020000   1.610000 (  1.610470)
GSL:     0.010000   0.000000   0.010000 (  0.062538)
Inline:  0.010000   0.000000   0.010000 (  0.004548)
R:       0.220000   0.010000   0.230000 (  0.227184)
&lt;/pre&gt;
&lt;p&gt;The R version is around 7 times faster than the native Ruby version, but not as fast as the C-based approaches &lt;a href="http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson.html"&gt;described earlier&lt;/a&gt;.&lt;/p&gt;
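&lt;p&gt;The &amp;#8220;around 7 times&amp;#8221; figure comes from dividing the wall-clock (&amp;#8220;real&amp;#8221;) times in the table. As a rough sketch, the same ratio can be computed for every row. The hash below simply copies the numbers above, and its symbol keys are labels chosen for this example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
# Wall-clock ("real") times in seconds, copied from the benchmark table above.
real_times = {
  ruby:   1.610470,
  gsl:    0.062538,
  inline: 0.004548,
  r:      0.227184
}

# Speed-up of each implementation relative to the pure Ruby version.
speedups = real_times.map { |name, seconds| [name, real_times[:ruby] / seconds] }.to_h

speedups.each do |name, factor|
  puts format("%-8s %8.1fx", name, factor)
end
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On these numbers R comes out at roughly 7.1 times faster than pure Ruby, while the two C-based versions are faster still. A single run like this is only indicative.&lt;/p&gt;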
&lt;p&gt;The real power of interfacing with R, however, is the ability to quickly swap out one algorithm for another, or to experiment interactively with your data in an irb console. Once you have an algorithm that works well in your case, you can re-implement it in a faster language if performance is a concern.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ll be talking about R and Ruby a little more in the future, so subscribe to the &lt;a href="http://feeds.feedburner.com/ChrisLowis"&gt;&lt;span class="caps"&gt;RSS&lt;/span&gt; feed&lt;/a&gt; if you&amp;#8217;re interested.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/81VWsh7pJLk" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Identify Programming Languages with SourceClassifier</title>
   <link href="http://blog.chrislowis.co.uk/2009/01/04/identify-programming-languages-with-source-classifier.html" />
   <updated>2009-01-04T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2009/01/04/identify-programming-languages-with-source-classifier</id>
   <content type="html">&lt;h1&gt;Identify Programming Languages with SourceClassifier&lt;/h1&gt;
&lt;p class="meta"&gt;4 Jan 2009 &amp;#8211; Southampton&lt;/p&gt;
&lt;p&gt;Do you need to identify the programming language used in a snippet of code, for example in a &lt;a href="http://pastie.org/"&gt;pastie&lt;/a&gt;-style application or in your blog&amp;#8217;s comment system? I&amp;#8217;ve just released version 0.2.1 of SourceClassifier over on &lt;a href="http://github.com/chrislo/sourceclassifier/tree/master"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;SourceClassifier identifies programming languages using a Bayesian classifier trained on a corpus generated from the &lt;a href="http://shootout.alioth.debian.org/"&gt;Computer Language Benchmarks Game&lt;/a&gt;. It is written in Ruby and available as a gem. Out of the box SourceClassifier recognises C, Java, JavaScript, Perl, Python and Ruby. A nice advantage of using a Bayesian classifier to identify the source code is that even false matches will still give some usable highlighting. To train the classifier to identify new languages, download the sources from &lt;a href="http://github.com/chrislo/sourceclassifier/tree/master"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
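&lt;p&gt;To give a feel for how a Bayesian classifier can tell languages apart, here is a toy naive Bayes sketch in plain Ruby. It is not the code used by SourceClassifier or the Classifier gem: the TinyBayes class, its tokenizer and the two one-snippet training examples are all made up for illustration, and it assumes every language is equally likely beforehand.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
# A toy naive Bayes text classifier. This is NOT the SourceClassifier or
# Classifier gem implementation; TinyBayes and its methods are hypothetical.
class TinyBayes
  def initialize
    # category -> token -> number of times the token was seen in training
    @counts = Hash.new { |hash, category| hash[category] = Hash.new(0) }
  end

  # Record every token of an example program under the given category.
  def train(category, text)
    tokenize(text).each { |token| @counts[category][token] += 1 }
  end

  # Return the category with the highest log-probability score.
  def identify(text)
    tokens = tokenize(text)
    @counts.keys.max_by { |category| score(category, tokens) }
  end

  private

  # Very crude tokenizer: runs of letters, underscores and '#'.
  def tokenize(text)
    text.scan(/[A-Za-z_#]+/)
  end

  # Sum of log P(token | category) with add-one (Laplace) smoothing, so a
  # token never seen in training does not zero out the whole score.
  def score(category, tokens)
    total      = @counts[category].values.sum
    vocabulary = @counts.values.flat_map { |seen| seen.keys }.uniq.size
    tokens.sum do |token|
      Math.log((@counts[category][token] + 1.0) / (total + vocabulary))
    end
  end
end

classifier = TinyBayes.new
classifier.train("Ruby", "def greet(name)\n  puts name\nend")
classifier.train("C", "#include stdio.h\nint main() { return 0; }")

puts classifier.identify("def sort(a)\n  a.sort\nend")  # prints Ruby
puts classifier.identify("int main() { return 1; }")    # prints C
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A real corpus would contain many programs per language, which is what makes the token statistics reliable.&lt;/p&gt;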
&lt;h2&gt;Usage&lt;/h2&gt;
&lt;p&gt;First install the gem using GitHub as a source:&lt;/p&gt;
&lt;pre&gt;
  $ gem sources -a http://gems.github.com 
  $ sudo gem install chrislo-sourceclassifier
&lt;/pre&gt;
&lt;p&gt;Then, to use it:&lt;/p&gt;
&lt;pre&gt;
  require 'rubygems'
  require 'sourceclassifier'
  
  s = SourceClassifier.new
  
  ruby_text = &amp;lt;&amp;lt;EOT
  def my_sorting_function(a)
    a.sort
  end
  EOT
  
  c_text = &amp;lt;&amp;lt;EOT
  #include &amp;lt;unistd.h&amp;gt;
  
  int main() {
    write(1, "hello world\n", 12);
    return(0);
  }
  EOT
  
  s.identify(ruby_text) #=&amp;gt; Ruby
  s.identify(c_text) #=&amp;gt; Gcc
&lt;/pre&gt;
&lt;h2&gt;Training&lt;/h2&gt;
&lt;p&gt;Download the sources from GitHub and, in the project directory, run the training rake task:&lt;/p&gt;
&lt;pre&gt;
  $ rake train
&lt;/pre&gt;
&lt;p&gt;In the ./sources directory are sub-directories for each language you wish to identify. Each sub-directory contains examples of programs written in that language. The name of the directory is significant &amp;#8211; it is the value returned by the SourceClassifier.identify() method.&lt;/p&gt;
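&lt;p&gt;As an illustration of that layout, the sketch below builds a throwaway directory tree and lists which label each example file would be filed under. The training_labels helper and the file names are hypothetical and not part of SourceClassifier; only the convention that the directory name is the label comes from the description above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
require 'tmpdir'
require 'fileutils'

# Map each sub-directory of root (the label) to the example files inside it.
# Plain files sitting directly in root are ignored.
def training_labels(root)
  Dir.children(root).sort.each_with_object({}) do |name, labels|
    path = File.join(root, name)
    labels[name] = Dir.children(path).sort if File.directory?(path)
  end
end

# Build a small stand-in for ./sources in a temporary directory.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "Ruby"))
  FileUtils.mkdir_p(File.join(root, "Gcc"))
  File.write(File.join(root, "Ruby", "sort.rb"), "puts [3, 1, 2].sort.inspect\n")
  File.write(File.join(root, "Gcc", "hello.c"), "int main() { return 0; }\n")

  training_labels(root).each do |label, files|
    puts "#{label}: #{files.join(', ')}"
  end
end
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The C example earlier is reported as Gcc, which suggests the bundled C examples live in a directory of that name.&lt;/p&gt;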
&lt;p&gt;The rake task populate can be used to build these sub-directories from a checkout of the &lt;a href="http://alioth.debian.org/scm/?group_id=30402"&gt;computer language shootout sources&lt;/a&gt;, but you are free to train the classifier using any available examples.&lt;/p&gt;
&lt;h2&gt;Acknowledgments&lt;/h2&gt;
&lt;p&gt;This library depends heavily on the great &lt;a href="http://classifier.rubyforge.org/"&gt;Classifier&lt;/a&gt; gem by Lucas Carlson and David Fayram II.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/HZUtUTJXH4E" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Get working quickly with a customised Rails project launcher</title>
   <link href="http://blog.chrislowis.co.uk/2008/12/30/get-started-quickly-rails-project-launcher.html" />
   <updated>2008-12-30T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2008/12/30/get-started-quickly-rails-project-launcher</id>
   <content type="html">&lt;h1&gt;Get working quickly with a customised Rails project launcher&lt;/h1&gt;
&lt;p class="meta"&gt;30 Dec 2008 &amp;#8211; Southampton&lt;/p&gt;
&lt;p&gt;I just knocked up a quick Ruby script to automate some of the steps I always do when starting work on a Rails project, namely:&lt;br /&gt;
  &amp;#8211; open an iTerm tab running script/server&lt;br /&gt;
  &amp;#8211; open an iTerm tab running script/console&lt;br /&gt;
  &amp;#8211; open an iTerm tab running autotest&lt;br /&gt;
  &amp;#8211; open an iTerm tab running a shell window&lt;br /&gt;
  &amp;#8211; launch Safari on localhost:3000 for testing.&lt;/p&gt;
&lt;p&gt;Customise this little script to suit your own needs. You&amp;#8217;ll need to install the rb-appscript gem to start with, and obviously this only applies to OS X users.&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
#!/usr/bin/env ruby

require 'rubygems'
require 'appscript'

RAILS_PROJECT_PATH = "~/path_to_your_rails_project" # Customise this

def create_iterm_tab( command = "" )
  @iterm ||= Appscript::app( 'iTerm' )
  session = @iterm.current_terminal.sessions.end.make( :new =&amp;gt; :session )
  session.exec( :command =&amp;gt; 'bash -l' )
  session.write( :text =&amp;gt; "cd #{RAILS_PROJECT_PATH}" )
  session.write( :text =&amp;gt; command ) unless command.nil? || command.empty?
end

def launch_browser( url = "http://localhost:3000")
  @safari ||= Appscript::app('Safari')
  @safari.open_location(url)
end

create_iterm_tab("./script/server")
create_iterm_tab("./script/console")
create_iterm_tab("autotest")
create_iterm_tab()
sleep(5)
launch_browser()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sleep(5) is in there to give time for the server to spin up before launching the browser.&lt;/p&gt;
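&lt;p&gt;A fixed sleep is a guess: too short on a slow machine, wasted time on a fast one. One alternative, sketched here with only Ruby&amp;#8217;s standard socket library, is to poll the port until something accepts a connection. wait_for_port is a hypothetical helper, not part of rb-appscript or Rails.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
require 'socket'

# Poll until something accepts TCP connections on host:port, or give up
# after max_wait seconds. Returns true if the port opened, false otherwise.
# (Hypothetical helper for illustration.)
def wait_for_port(host, port, max_wait = 30)
  deadline = Time.now + max_wait
  while (deadline - Time.now).positive?
    begin
      TCPSocket.new(host, port).close
      return true
    rescue SystemCallError
      sleep 0.2 # nothing is listening yet; try again shortly
    end
  end
  false
end
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the script above, the sleep(5) line could then be replaced with a call such as wait_for_port("localhost", 3000) before launch_browser().&lt;/p&gt;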
&lt;p&gt;Now put all this in a file called &amp;#8220;launch_project.command&amp;#8221; on the Desktop, and chmod +x it to allow it to execute when double-clicked.&lt;/p&gt;
&lt;p&gt;I got some tips from &lt;a href="http://logaan.wordpress.com/2008/01/16/quicksilver-rails-project-opener-revisited/"&gt;Dribblings of a Deranged Hermit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy New Year!&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/YxV3bf_tCoI" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Fitting curves to data using Ruby and the GNU Scientific Library</title>
   <link href="http://blog.chrislowis.co.uk/2008/12/01/curve-fit-with-ruby-gsl.html" />
   <updated>2008-12-01T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2008/12/01/curve-fit-with-ruby-gsl</id>
   <content type="html">&lt;h1&gt;Fitting curves to data using Ruby and the &lt;span class="caps"&gt;GNU&lt;/span&gt; Scientific Library&lt;/h1&gt;
&lt;p class="meta"&gt;6 Dec 2008 &amp;#8211; Southampton&lt;/p&gt;
&lt;p&gt;In this post I&amp;#8217;ll show you how to use the &lt;a href="http://www.gnu.org/software/gsl/"&gt;&lt;span class="caps"&gt;GNU&lt;/span&gt; Scientific Library&lt;/a&gt; and its &lt;a href="http://rb-gsl.rubyforge.org/"&gt;Ruby bindings&lt;/a&gt; to fit curves to data. This technique is useful if, for example, you want to extrapolate into the future on the basis of some past information.&lt;/p&gt;
&lt;p&gt;By way of example, your boss approaches you with some historic revenue figures from your new Web 2.0 venture, and asks you to predict future growth. Let&amp;#8217;s generate some example data to play with:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
require 'gsl'
time = GSL::Vector.linspace(0,24,100)
revenue = GSL::Vector.linspace(0,1,100).collect{|yi| yi+rand()/10}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this code we&amp;#8217;re using the &lt;span class="caps"&gt;GSL&lt;/span&gt; Vector class to represent time. Here we have a vector of months representing two years of data (24 months). The linspace method causes our vector to have 100 evenly spaced elements between 0 and 24.&lt;/p&gt;
&lt;p&gt;We concoct some revenue data using &lt;span class="caps"&gt;GSL&lt;/span&gt;::Vector again. The &lt;span class="caps"&gt;GSL&lt;/span&gt; Vector class does not implement all of the methods found in Ruby&amp;#8217;s native Array class, but it does have &amp;#8216;collect&amp;#8217;. Here we use it to add a small random offset (between 0 and 0.1) to each point on a straight line from 0 to 1, giving a noisy scattering of points.&lt;/p&gt;
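&lt;p&gt;For readers without the &lt;span class="caps"&gt;GSL&lt;/span&gt; installed, here is a rough plain-Ruby equivalent of the two data-generating lines above. It assumes linspace includes both end points, as the Octave function of the same name does; the linspace helper defined here is a stand-in for illustration, not the rb-gsl method, and it returns an ordinary Array.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
# n evenly spaced values from first to last, both end points included.
# (Hypothetical stand-in for GSL::Vector.linspace; returns a plain Array.)
def linspace(first, last, n)
  step = (last - first) / (n - 1).to_f
  Array.new(n) { |i| first + i * step }
end

time    = linspace(0, 24, 100)
# A straight line from 0 to 1 plus a random offset between 0 and 0.1.
revenue = linspace(0, 1, 100).collect { |yi| yi + rand / 10 }

puts time.length   # 100 points
puts time[1]       # spacing between neighbouring points: 24 / 99
puts revenue.first # somewhere between 0 and 0.1
```
&lt;/code&gt;&lt;/pre&gt;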
&lt;p&gt;Let&amp;#8217;s visualise this data first to see what we are dealing with. We have a number of options for this; here I&amp;#8217;d like to show how we can use Gnuplot to do the visualisation.&lt;/p&gt;
&lt;p&gt;First install the gnuplot gem:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
$ gem install gnuplot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&amp;#8217;ll need to have &lt;a href="http://www.gnuplot.info/"&gt;gnuplot&lt;/a&gt; installed to use the bindings. Binary packages are available for many platforms; just make sure that gnuplot is in your path after installation. OS X users should install &lt;a href="http://sourceforge.net/projects/aquaterm/"&gt;AquaTerm&lt;/a&gt; to allow plotting to a desktop window.&lt;/p&gt;
&lt;p&gt;Now let&amp;#8217;s plot the data generated above:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
require 'rubygems'
require 'gnuplot'

Gnuplot.open do |gp|
  Gnuplot::Plot.new( gp ) do |plot|

    plot.title  "Company turnover"
    plot.xlabel "Month"
    plot.ylabel "Billions $"
    
    plot.data &amp;lt;&amp;lt; Gnuplot::DataSet.new( [time.to_a, revenue.to_a] ) do |ds|
      ds.with = "points"
      ds.notitle
    end

    plot.terminal "svg"
    plot.output "revenue.svg"
  end
end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code above generates the following graphic in &lt;span class="caps"&gt;SVG&lt;/span&gt; format.&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/revenue.jpg" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;Gnuplot has a number of output formats, known as &amp;#8220;terminals&amp;#8221;, to choose from, including one which plots straight to screen &amp;#8211; great for rapid exploration of data.&lt;/p&gt;
&lt;p&gt;Looking at the data, we see that a straight line drawn through the points would be a good model for the growth in revenue. We can use GSL&amp;#8217;s line-fitting tools to perform this fit for us:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
(c0, c1, cov00, cov01, cov11, chisq, status) = GSL::Fit::linear(time,revenue)
revenue_fit = (time * c1) + c0
puts c1
puts c0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The value of c1 is the growth rate per month, in this case around 0.04 Billion $ per month! We have used a linear fit here, but the &lt;span class="caps"&gt;GSL&lt;/span&gt; provides &lt;a href="http://rb-gsl.rubyforge.org/fit.html"&gt;many&lt;/a&gt; &lt;a href="http://rb-gsl.rubyforge.org/nonlinearfit.html"&gt;other&lt;/a&gt; &lt;a href="http://rb-gsl.rubyforge.org/bspline.html"&gt;fitting methods&lt;/a&gt; for more complicated data. A quick plot checks the goodness of our fit:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
    plot.data &amp;lt;&amp;lt; Gnuplot::DataSet.new( [time.to_a, revenue_fit.to_a] ) do |ds|
      ds.with = "lines"
      ds.notitle
    end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Add this snippet of code to the plotting code above, after the first plot.data block and before the call to plot.terminal.&lt;/p&gt;
&lt;p&gt;&lt;img src="/images/fitted.jpg" alt="" /&gt;&lt;/p&gt;
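&lt;p&gt;To see what the fit is doing, here is a minimal ordinary least-squares sketch in plain Ruby. linear_fit is a hypothetical helper, not the &lt;span class="caps"&gt;GSL&lt;/span&gt; routine, and it returns only the two coefficients. The sample data is a noise-free version of the revenue above: a line rising from 0 to 1 over 24 months, shifted up by 0.05 (the average of the random term), so the slope should come out close to 1/24, or about 0.042.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```ruby
# Ordinary least-squares fit of y = c0 + c1 * x in plain Ruby.
# (Hypothetical helper for illustration; the post itself uses GSL::Fit::linear.)
def linear_fit(xs, ys)
  n      = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  # Slope: co-variation of x and y divided by the variation of x.
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  c1  = sxy / sxx
  c0  = mean_y - c1 * mean_x # the fitted line passes through the two means
  [c0, c1]
end

# Noise-free stand-in for the revenue data: 0 to 1 over 24 months, plus 0.05.
months  = (0..24).map { |m| m.to_f }
revenue = months.map { |m| m / 24.0 + 0.05 }

c0, c1 = linear_fit(months, revenue)
puts "c1 (slope):     #{c1.round(3)}"
puts "c0 (intercept): #{c0.round(3)}"
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the random term included, the fitted values will wobble slightly around these figures from run to run.&lt;/p&gt;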
&lt;p&gt;If you need to perform data analysis, provide graphics for your users in your webapp, or produce high quality plots, I encourage you to investigate the combination of Ruby, &lt;span class="caps"&gt;GSL&lt;/span&gt; and Gnuplot.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/94ql0gHVirk" height="1" width="1"/&gt;</content>
 </entry>
 
 <entry>
   <title>Implementing the Pearson correlation algorithm using Ruby and the GNU Scientific Library</title>
   <link href="http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson.html" />
   <updated>2008-11-24T00:00:00+00:00</updated>
   <id>http://blog.chrislowis.co.uk/2008/11/24/ruby-gsl-pearson</id>
   <content type="html">&lt;h1&gt;Implementing the Pearson correlation algorithm using Ruby and the &lt;span class="caps"&gt;GNU&lt;/span&gt; Scientific Library&lt;/h1&gt;
&lt;p class="meta"&gt;24 Nov 2008 &amp;#8211; Southampton&lt;/p&gt;
&lt;p&gt;At &lt;a href="http://rubymanor.org/"&gt;Ruby Manor&lt;/a&gt; on Saturday &lt;a href="http://www.eribium.org/blog/"&gt;Alex MacCaw&lt;/a&gt; gave a great introduction to his &lt;a href="http://github.com/maccman/acts_as_recommendable/tree/master"&gt;acts_as_recommendable&lt;/a&gt; plugin for Rails. acts_as_recommendable simplifies collaborative filtering for Rails models, automatically generating recommended items, at an on-line store for example, based on a database of user preferences.&lt;/p&gt;
&lt;p&gt;At its heart, acts_as_recommendable uses a statistical measure known as the Pearson correlation coefficient to calculate the &amp;#8220;nearness&amp;#8221; of items to one another. Alex talked about the performance issues he encountered when implementing the algorithm in pure Ruby. To allow recommendations to be calculated for the entire database he had to switch to making the calculations offline and reimplement the algorithm in C using &lt;a href="http://rubyforge.org/projects/rubyinline/"&gt;RubyInline&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="http://www.gnu.org/software/gsl/"&gt;&lt;span class="caps"&gt;GNU&lt;/span&gt; Scientific Library&lt;/a&gt; has an implementation of the Pearson algorithm, and in this post I&amp;#8217;d like to show how the Ruby code, or its inline-C equivalent, can be replaced with &lt;span class="caps"&gt;GSL&lt;/span&gt; code using the Ruby bindings to the &lt;span class="caps"&gt;GSL&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;My naive pure Ruby version of the algorithm looks like this:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
 def ruby_pearson(x,y)
   n=x.length.to_f # float, so integer inputs are not truncated by the divisions below

   sumx=x.inject(0) {|r,i| r + i}
   sumy=y.inject(0) {|r,i| r + i}

   sumxSq=x.inject(0) {|r,i| r + i**2}
   sumySq=y.inject(0) {|r,i| r + i**2}

   prods=[]; x.each_with_index{|this_x,i| prods &amp;lt;&amp;lt; this_x*y[i]}
   pSum=prods.inject(0){|r,i| r + i}

   # Calculate Pearson score 
   num=pSum-(sumx*sumy/n) 
   den=((sumxSq-(sumx**2)/n)*(sumySq-(sumy**2)/n))**0.5 
   if den==0
     return 0 
   end
   r=num/den 
   return r 
 end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here I should note that the acts_as_recommendable code is considerably more complicated; however, the heart of the calculation looks something like the above. We can replace that with an inline C version using something like:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;

 require 'rubygems'
 require 'inline'

 inline do |builder|
   builder.c '
     #include &amp;lt;math.h&amp;gt;
     double inline_pearson(int n, VALUE x, VALUE y) {
     double sum1 = 0.0;
     double sum2 = 0.0;
     double sum1Sq = 0.0;
     double sum2Sq = 0.0;
     double pSum = 0.0;

     VALUE *x_a = RARRAY(x)-&amp;gt;ptr;
     VALUE *y_a = RARRAY(y)-&amp;gt;ptr;

     int i;
     for(i=0; i&amp;lt;n; i++) {
       double this_x;
       double this_y;
       this_x = NUM2DBL(x_a[i]);
       this_y = NUM2DBL(y_a[i]);
       sum1 += this_x;
       sum2 += this_y;
       sum1Sq += pow(this_x, 2);
       sum2Sq += pow(this_y, 2);
       pSum += this_y * this_x;
     }

     double num;
     double den;
     num = pSum - ( ( sum1 * sum2 ) / n );
     den = sqrt( ( sum1Sq - ( pow(sum1, 2) ) / n ) * 
           ( sum2Sq - ( pow(sum2, 2) ) / n ) );
     if(den == 0){
       return 0.0;
     } else {
       return num / den;
     }
    }'
 end

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is a considerable amount of code. I love the fact that this code can be embedded directly in the Ruby source code but, as was pointed out to me, some people would see this as something of a maintenance nightmare. If you&amp;#8217;re prepared to install the prerequisite &lt;span class="caps"&gt;GSL&lt;/span&gt; and its &lt;a href="http://rb-gsl.rubyforge.org/"&gt;Ruby bindings&lt;/a&gt;, you can replace all of the above code with simply:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
 require 'gsl'

  def gsl_pearson(x,y)
   GSL::Stats::correlation(GSL::Vector.alloc(x),GSL::Vector.alloc(y))
  end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this code, the &lt;code&gt;GSL::Vector.alloc()&lt;/code&gt; method converts a Ruby Array to a &lt;span class="caps"&gt;GSL&lt;/span&gt; Vector. We then call the correlation method, helpfully provided by the &lt;a href="http://rb-gsl.rubyforge.org/stats.html"&gt;Statistics&lt;/a&gt; portion of the &lt;span class="caps"&gt;GSL&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So, how does this perform? A quick benchmark:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
 require 'benchmark'

 n = 100000
 x = []; n.times{x &amp;lt;&amp;lt; rand}
 y = []; n.times{y &amp;lt;&amp;lt; rand}
 
 Benchmark.bm() do |bm|
   bm.report("Ruby:") {ruby_pearson(x,y)}
   bm.report("GSL:") {gsl_pearson(x,y)}
   bm.report("Inline:") {inline_pearson(n,x,y)}
 end
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This gives some indicative results:&lt;/p&gt;
&lt;pre class="terminal"&gt;&lt;code&gt;
            user     system      total        real
Ruby:   1.530000   0.020000   1.550000 (  1.544765)
GSL:    0.010000   0.000000   0.010000 (  0.015925)
Inline: 0.000000   0.000000   0.000000 (  0.004115)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While the Inline version in this example outperforms the &lt;span class="caps"&gt;GSL&lt;/span&gt; version, both offer considerable savings on the Ruby version. One of the real advantages of the &lt;span class="caps"&gt;GSL&lt;/span&gt; is that it allows you to rapidly experiment with alternative implementations of an algorithm. You can also be safe in the knowledge that your code is based on a well-tested library of scientific functions.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ChrisLowis/~4/BIThI6CXLcA" height="1" width="1"/&gt;</content>
 </entry>
 
 
</feed>
