<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Christophe Ladroue &#187; Blog</title>
	<atom:link href="http://chrisladroue.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://chrisladroue.com</link>
	<description></description>
	<lastBuildDate>Thu, 31 Dec 2015 10:05:54 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.3.15</generator>
	<item>
		<title>Another take on building a multi-lingual shiny app</title>
		<link>http://chrisladroue.com/2014/11/another-take-on-building-a-multi-lingual-shiny-app/</link>
		<comments>http://chrisladroue.com/2014/11/another-take-on-building-a-multi-lingual-shiny-app/#comments</comments>
		<pubDate>Sat, 08 Nov 2014 15:58:00 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[multilingual]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[shiny]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=1134</guid>
		<description><![CDATA[I was reading this interesting post about how to build a multi-lingual Shiny app. I&#8217;m also building a multi-lingual Shiny app and came up with a slightly different take on it. First, I don&#8217;t use a function for finding the translation, &#8230; <a href="http://chrisladroue.com/2014/11/another-take-on-building-a-multi-lingual-shiny-app/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://chrisladroue.com/wp-content/uploads/2014/11/multilingualShinyApp1.png"><img src="http://chrisladroue.com/wp-content/uploads/2014/11/multilingualShinyApp1.png" alt="" title="multilingualShinyApp" width="252" height="174" class="alignleft size-full wp-image-1163" /></a>I was reading <a href="http://withr.me/blog/2014/10/17/design-a-bilingual-shiny-application/">this interesting post</a> about how to build a multi-lingual <TT>Shiny</TT> app. I&#8217;m also building a multi-lingual <TT>Shiny</TT> app and came up with a slightly different take on it.</p>
<p>First, I don&#8217;t use a function for finding the translation, but a 2D list. This way I can directly get to the translation with a simple access to the list.</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">translation <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span>
  <span style="color: #ff0000;">&quot;youhaveselected&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;en&quot;</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;You have selected:&quot;</span>, <span style="color: #ff0000;">&quot;fr&quot;</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Vous avez sélectionné:&quot;</span><span style="color: #080;">&#41;</span>,
  <span style="color: #ff0000;">&quot;greetings&quot;</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;en&quot;</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;Hello&quot;</span>, <span style="color: #ff0000;">&quot;fr&quot;</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Bonjour&quot;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># then:</span>
translation<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">'greetings'</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">'en'</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #228B22;"># Hello</span>
translation<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">'greetings'</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">'fr'</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>  <span style="color: #228B22;"># Bonjour</span></pre></td></tr></table></div></div>

<p>Second, I don&#8217;t use <TT>observe</TT>, as I didn&#8217;t find it necessary. I simply have a radio button for switching between languages, and a function <TT>tr()</TT> to translate a phrase or a list of phrases. As in the original post, the UI is built from <TT>server.R</TT> using <TT>renderUI()</TT>.</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">  tr <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">text</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span> <span style="color: #228B22;"># translates text into current language</span>
    <span style="color: #0000FF; font-weight: bold;">sapply</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">text</span>,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>s<span style="color: #080;">&#41;</span> translation<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>s<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>input$language<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>, USE.<span style="">NAMES</span><span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span></pre></td></tr></table></div></div>


<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">&nbsp;
  <span style="color: #228B22;"># UI</span>
  output$uiObs <span style="color: #080;">&lt;-</span> renderUI<span style="color: #080;">&#40;</span><span style="color: #080;">&#123;</span>
    sliderInput<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;obs&quot;</span>, tr<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;numberOfObservations&quot;</span><span style="color: #080;">&#41;</span>,  
                  <span style="color: #0000FF; font-weight: bold;">min</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">max</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">100</span>, value <span style="color: #080;">=</span> <span style="color: #ff0000;">50</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span>
&nbsp;
  output$uiWeekdays <span style="color: #080;">&lt;-</span> renderUI<span style="color: #080;">&#40;</span><span style="color: #080;">&#123;</span>
    <span style="color: #228B22;"># Using a named vector so that the same value is passed regardless of the label (which changes with the language)</span>
    daysValue <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Monday&quot;</span>, <span style="color: #ff0000;">&quot;Tuesday&quot;</span>, <span style="color: #ff0000;">&quot;Wednesday&quot;</span>, <span style="color: #ff0000;">&quot;Thursday&quot;</span>, <span style="color: #ff0000;">&quot;Friday&quot;</span>, <span style="color: #ff0000;">&quot;Saturday&quot;</span>, <span style="color: #ff0000;">&quot;Sunday&quot;</span><span style="color: #080;">&#41;</span>
    days <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">structure</span><span style="color: #080;">&#40;</span>daysValue, .<span style="">Names</span> <span style="color: #080;">=</span> tr<span style="color: #080;">&#40;</span>daysValue<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
    selectInput<span style="color: #080;">&#40;</span>inputId   <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;weekdays&quot;</span>,
                label     <span style="color: #080;">=</span> tr<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Selection:&quot;</span><span style="color: #080;">&#41;</span>,
                choices   <span style="color: #080;">=</span> days,
                multiple  <span style="color: #080;">=</span> TRUE<span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p>To make things easier for the translators, the dictionary is stored as a plain-text file (<TT>dictionary.csv</TT>, tab-separated despite its extension), which is easy to edit. A small <TT>R</TT> script turns it into the expected 2D list and saves it in a binary file, to avoid re-processing the file every time the user decides to switch language.</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># update the processed translation file translation.bin</span>
<span style="color: #228B22;"># run this every time dictionary.csv is updated </span>
<span style="color: #228B22;"># it reads the look-up table in dictionary.csv and turns it into a 2D list</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
translationContent <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">delim</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;dictionary.csv&quot;</span>, header <span style="color: #080;">=</span> TRUE, sep <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span>, as.<span style="">is</span> <span style="color: #080;">=</span> TRUE<span style="color: #080;">&#41;</span> 
translation <span style="color: #080;">&lt;-</span> dlply<span style="color: #080;">&#40;</span>translationContent ,.<span style="color: #080;">&#40;</span>key<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>s<span style="color: #080;">&#41;</span> key <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">list</span></span><span style="color: #080;">&#40;</span>s<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">save</span><span style="color: #080;">&#40;</span>translation, <span style="color: #0000FF; font-weight: bold;">file</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;translation.bin&quot;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>
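
<p>For reference, the same nested list can also be built with base <TT>R</TT> only. This is a minimal sketch, assuming the file has a <TT>key</TT> column followed by one column per language (here <TT>en</TT> and <TT>fr</TT>); the data frame below is made up and stands in for the result of <TT>read.delim()</TT>.</p>

<pre class="rsplus" style="font-family:monospace;"># assumed layout of the dictionary: one "key" column, then one column per language
translationContent &lt;- data.frame(
  key = c("greetings", "youhaveselected"),
  en  = c("Hello", "You have selected:"),
  fr  = c("Bonjour", "Vous avez sélectionné:"),
  stringsAsFactors = FALSE
)

# one entry per key, each holding its row as a named list
translation &lt;- lapply(split(translationContent, translationContent$key), as.list)

translation[["greetings"]][["fr"]]  # same access pattern as before</pre>
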

<p>You can consult the whole code on <a href="https://github.com/chrislad/multilingualShinyApp">the GitHub repository</a> and run it directly from <TT>R</TT> using:</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">shiny<span style="color: #080;">::</span><span style="">runGitHub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;multilingualShinyApp&quot;</span>,<span style="color: #ff0000;">&quot;chrislad&quot;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2014/11/another-take-on-building-a-multi-lingual-shiny-app/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to pick up 3 numbers from a uniform distribution in a transparent manner?</title>
		<link>http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/</link>
		<comments>http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/#comments</comments>
		<pubDate>Mon, 07 Jul 2014 13:38:38 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[lexicographic order]]></category>
		<category><![CDATA[puzzle]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[selection]]></category>
		<category><![CDATA[strategy]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=1089</guid>
		<description><![CDATA[Over in my previous post, I&#8217;m giving away 3 copies of my video course on ggplot2 and shiny. To win a copy, you just need to leave a comment and I will select 3 winners among the n participants at &#8230; <a href="http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Over in my <a href="http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/">previous post</a>, I&#8217;m giving away 3 copies of my video course on <strong>ggplot2</strong> and <strong>shiny</strong>. To win a copy, you just need to leave a comment and I will select 3 winners among the <em>n</em> participants at random after a deadline.</p>
<p>But how do I pick 3 winners such that:
<ul>
<li>all players are equally likely to win.</li>
<li>no-one can contest the fairness of the selection.</li>
</ul>
<p>The first thing that comes to mind is to run <code>sample(n,3, replace = FALSE)</code> and report the winners. But how do you know I actually ran this code? I could have decided on the winners well in advance and just pretended to run the code.</p>
<p>A way to approach this issue could be to set the random seed to some value so that anyone suspecting foul play can run the code themselves and get the same answer as me:</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span>someSeed<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>n, <span style="color: #ff0000;">3</span>, <span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p>I see at least two problems with it: 1) I could still have selected a seed that gives me the sample I eventually want, and 2) even deriving the seed from a function (<em>e.g.</em> of <em>n</em>, the number of participants) doesn&#8217;t guarantee that every player is equally likely to win.</p>
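
<p>The first problem is easy to illustrate. In this sketch, the number of participants and the player I would like to favour are made-up values; a brute-force search over seeds quickly finds one that makes the &#8220;reproducible&#8221; draw include that player.</p>

<pre class="rsplus" style="font-family:monospace;">n &lt;- 50       # hypothetical number of participants
target &lt;- 7   # hypothetical player to favour

seed &lt;- 1
repeat{
  set.seed(seed)
  if(target %in% sample(n, 3, replace = FALSE)) break
  seed &lt;- seed + 1
}

seed                           # a convenient seed, found by brute force
set.seed(seed)
sample(n, 3, replace = FALSE)  # reproducible, and contains the target</pre>

<p>Anyone re-running the last two lines gets the same winners, yet the choice of seed was anything but fair.</p>
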
<p>I came up with a plan which I think addresses both the uniform distribution over the players, and the incontestability of the selection. </p>
<p>First, I reduce the problem of selecting 3 winners among <em>n</em> participants to selecting 1 integer from a uniform distribution. This is easy: instead of choosing 3 items among <em>n</em>, I&#8217;m selecting 1 of the <code>choose(n,3)</code> possible combinations. Once I&#8217;ve sampled 1 number <em>i</em>, I simply use <code>combn(n,3)</code> to generate all the combinations and pick the <em>i</em>th item:<br />
<code>combn(n,3, simplify=FALSE)[[i]]</code>.</p>
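
<p>As a toy illustration of this step, here it is with <code>n = 5</code> and an arbitrarily chosen index <code>i = 7</code> (both values are picked for the example only):</p>

<pre class="rsplus" style="font-family:monospace;">n &lt;- 5
choose(n, 3)                        # 10 possible groups of 3 winners
i &lt;- 7                              # an index between 1 and choose(n, 3)
combn(n, 3, simplify = FALSE)[[i]]  # 2 3 4</pre>
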
<p>Second, I have a way to pick a number from a uniform distribution that&#8217;s completely transparent. To do this, I pick a number uniformly at random from a much bigger set (1:N, with <code>N &gt;&gt; choose(n,3)</code>) and project this number back onto the interval I&#8217;m interested in (<code>1:choose(n,3)</code>). That is, once I have a number <em>j</em> between 1 and N, I use</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">i <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">ceiling</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span>n,<span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> j <span style="color: #080;">/</span>N<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p>to find a random value uniformly distributed from 1 to <code>choose(n,3)</code>.</p>
<p>Ideally you&#8217;d want N to be a multiple of <code>choose(n,3)</code> for every outcome to be exactly equally likely, but if N is much larger than <code>choose(n,3)</code>, the difference in probability between outcomes is negligible (of the order of <code>1/N</code>).</p>
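
<p>A quick way to convince yourself is to tabulate the projection on toy values. In this sketch, <code>N = 1000</code> and <code>M = 7</code> are made-up sizes, with <code>M</code> standing in for <code>choose(n,3)</code>:</p>

<pre class="rsplus" style="font-family:monospace;">N &lt;- 1000  # size of the big set (toy value)
M &lt;- 7     # size of the target set, standing in for choose(n, 3)

# how many values of j land on each outcome i
counts &lt;- table(ceiling(M * (1:N) / N))
counts

range(counts / N)  # probabilities of the least and most likely outcomes</pre>

<p>Since 1000 is not a multiple of 7, each outcome is hit by either 142 or 143 values of <em>j</em>, so the probabilities differ by <code>1/N</code> at most.</p>
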
<p>Now, how do I pick a number from a bigger set in a transparent manner? We saw that using the seed is fraught with difficulty, so I need something else. I&#8217;m going to use something over which neither I nor the players have any control: the UK national lottery, whose result is a combination of 7 integers from the set {1,&#8230;,49}. More precisely, I&#8217;m doing this:</p>
<ul>
<li>declare in advance which future lottery draw I&#8217;m using.</li>
<li>use the <em>lexicographic index</em> of the combination drawn as my number <em>j</em>; <em>j</em> comes from a uniform distribution between 1 and <code>N=choose(49,7)</code>, which is a relatively big number (85,900,584).</li>
</ul>
<p>And this is it: this way, I pick 3 winners among <em>n</em> and there is no way for me (or the players) to rig the selection. Here is the code I&#8217;ll run:</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">lotteryResult <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span> , , , , , , <span style="color: #080;">&#41;</span> <span style="color: #228B22;"># to be filled in by the actual lottery draw</span>
nPlayers <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">200</span> <span style="color: #228B22;"># to be updated with the number of participants</span>
nWinners <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">3</span>
index <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">ceiling</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span>nPlayers, nWinners<span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> lexicographicIndex<span style="color: #080;">&#40;</span>lotteryResult, <span style="color: #ff0000;">49</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">49</span>,<span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
winners <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">combn</span><span style="color: #080;">&#40;</span>nPlayers,nWinners,simplify <span style="color: #080;">=</span> FALSE<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>index<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
<span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span> The winners are:&quot;</span>, winners<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p>The deadline for the competition is Wednesday 9th July at midnight, UK time. The next lottery (UK Lotto) draw after that is on Saturday 12th July, and I&#8217;ll use that draw to decide on the winners, using the code I&#8217;ve presented here.</p>
<p>What do you think? Can you poke holes in my strategy? Can you come up with something simpler?</p>
<p><strong>Note about the lexicographic index</strong><br />
It is not terribly difficult to find the index of a combination without generating them all. All you need to do is count the number of combinations that come before it. For example, if the combination starts with 3, you know it comes after all the combinations that start with 1 or 2. Here is the code I wrote to go from a combination to its lexicographic index. There&#8217;s also a test function after it.</p>

<div class="my_syntax_box"><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># gives the index of the combination if sorted in lexicographic order, starting at 1</span>
<span style="color: #228B22;"># lexicographicIndex(c(2,3,4), 5) = 7</span>
<span style="color: #228B22;"># because [2,3,4] is the 7th item in the lexicographically ordered combinations of length 3 using 5 letters: </span>
<span style="color: #228B22;"># 1 2 3</span>
<span style="color: #228B22;"># 1 2 4</span>
<span style="color: #228B22;"># 1 2 5</span>
<span style="color: #228B22;"># 1 3 4</span>
<span style="color: #228B22;"># 1 3 5</span>
<span style="color: #228B22;"># 1 4 5</span>
<span style="color: #228B22;"># 2 3 4</span>
<span style="color: #228B22;"># 2 3 5</span>
<span style="color: #228B22;"># 2 4 5</span>
<span style="color: #228B22;"># 3 4 5</span>
<span style="color: #228B22;"># C. Ladroue</span>
<span style="color: #228B22;"># combination is a sequence of unique integers between 1 and alphabetSize</span>
&nbsp;
lexicographicIndex <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>combination, alphabetSize<span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  combination <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">sort</span><span style="color: #080;">&#40;</span>combination<span style="color: #080;">&#41;</span>
&nbsp;
  combinationLength <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>combination<span style="color: #080;">&#41;</span>
&nbsp;
  index <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">1</span> 
  <span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>p <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span>combinationLength<span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
&nbsp;
    starting  <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">ifelse</span><span style="color: #080;">&#40;</span>p <span style="color: #080;">==</span> <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">1</span> , combination<span style="color: #080;">&#91;</span>p<span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> <span style="color: #080;">+</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
    finishing <span style="color: #080;">&lt;-</span> combination<span style="color: #080;">&#91;</span>p<span style="color: #080;">&#93;</span> <span style="color: #080;">-</span> <span style="color: #ff0000;">1</span>
&nbsp;
    <span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span>starting <span style="color: #080;">&lt;=</span> finishing<span style="color: #080;">&#41;</span>
      index <span style="color: #080;">&lt;-</span> index <span style="color: #080;">+</span> <span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sapply</span><span style="color: #080;">&#40;</span>starting<span style="color: #080;">:</span>finishing, <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>j<span style="color: #080;">&#41;</span> <span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span>alphabetSize <span style="color: #080;">-</span> j, combinationLength <span style="color: #080;">-</span> p<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span>
  index
<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
lexicographicIndexTest <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  alphabetSize <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">10</span>
  combinationLength <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">3</span>
  x <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">combn</span><span style="color: #080;">&#40;</span>alphabetSize, combinationLength, simplify <span style="color: #080;">=</span> FALSE<span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span> test all combinations with alphabet size = &quot;</span>,alphabetSize,<span style="color: #ff0000;">&quot;and combination length = &quot;</span>,combinationLength,<span style="color: #ff0000;">&quot;: &quot;</span>,
      <span style="color: #0000FF; font-weight: bold;">all</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sapply</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>index<span style="color: #080;">&#41;</span> lexicographicIndex<span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>index<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>, alphabetSize<span style="color: #080;">&#41;</span> <span style="color: #080;">==</span> index <span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
      <span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span></pre></td></tr></table></div></div>

]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Win a free copy of a new video course on ggplot2 and Shiny!</title>
		<link>http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/</link>
		<comments>http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/#comments</comments>
		<pubDate>Tue, 01 Jul 2014 20:03:58 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[contest]]></category>
		<category><![CDATA[course]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[shiny]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=1073</guid>
		<description><![CDATA[Noticed all these posts on r-bloggers about ggplot2 and shiny? Do you want in? My course &#8220;Building Interactive Graphs with ggplot2 and Shiny&#8221; (published by Packt Publishing) covers those 2 packages in a series of 40 videos, each one dedicated &#8230; <a href="http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Noticed all these posts on r-bloggers about <strong>ggplot2</strong> and <strong>shiny</strong>? Do you want in? My course &#8220;<em>Building Interactive Graphs with ggplot2 and Shiny</em>&#8221; (published by Packt Publishing) covers those 2 packages in a series of 40 videos, each one dedicated to one concept at a time. After this course, you&#8217;ll be able to make fancy customised plots and publish them on the internet on some cool interactive webpages to be shared with the world!</p>
<p>See the content of the course <a href="http://bit.ly/1kEqYFZ">here</a>, and my <a href="http://chrisladroue.com/2014/06/a-two-hour-online-course-on-ggplot2-and-shiny/">previous post</a> about it for an introduction and some reviews.</p>
<p><a href="http://bit.ly/1kEqYFZ"><img alt="" src="http://dgdsbygo8mp3h.cloudfront.net/sites/default/files/imagecache/videoproduct_large/bookretailers/4337OS_Video.jpg" title="ggplot2 Shiny course" class="aligncenter" width="372" height="214" /></a></p>
<p>Now, I am pleased to announce that I have teamed up with Packt Publishing to organise a <strong>giveaway</strong>! Three (3) of you beautiful readers are going to get this course for free!</p>
<p>Here&#8217;s how it works:</p>
<ul>
<li>Simply leave a comment below (<em>e.g.</em> &#8220;I want it!&#8221;, &#8220;You rock!&#8221; or whatever takes your fancy) together with your email address*.</li>
<li>After the deadline (Wed. 09th July at midnight, UK time), I&#8217;ll pick 3 winners at random.</li>
</ul>
<p>Winners will be contacted by email, so be sure to use your real email address when you comment!</p>
<p>(By the way here&#8217;s a question for you to ponder over: how do you pick three numbers between <i>1</i> and <i>n</i> at random, transparently so that no-one can contest the fairness of the selection?)</p>
<p>Good luck!</p>
<p>* Use the field &#8220;email&#8221; in the form; the address won&#8217;t show in the comment.</p>
<p>EDIT: Don&#8217;t worry if I don&#8217;t approve your comment straight away. I&#8217;ll get to it as soon as I can. Thanks.</p>
<p><strong>UPDATE:</strong> The comments are now closed. Thanks for the great response! I&#8217;ll announce the 3 winners early next week, after selecting them with the strategy explained <a href="http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/">here</a>.</p>
<p><strong>SECOND UPDATE: We have the winners!</strong></p>
<p>I have followed the protocol I designed <a href="http://chrisladroue.com/2014/07/how-to-pick-up-3-numbers-from-a-uniform-distribution-in-a-transparent-manner/">here</a> and selected 3 lucky winners. I got the list of unique participants sorted by their posting date directly from the blog&#8217;s database and ran the following code:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">players <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">csv2</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;wp_comments.csv&quot;</span>, sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;,&quot;</span>, header<span style="color: #080;">=</span>FALSE, col.<span style="">names</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;ID&quot;</span>,<span style="color: #ff0000;">&quot;PlayerName&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
lotteryResult <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>,<span style="color: #ff0000;">19</span>,<span style="color: #ff0000;">23</span>,<span style="color: #ff0000;">28</span>,<span style="color: #ff0000;">37</span>,<span style="color: #ff0000;">43</span>,<span style="color: #ff0000;">34</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># UK national lottery results on the 12th July 2014 https://www.national-lottery.co.uk/player/lotto/results/results.ftl</span>
nPlayers <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>players<span style="color: #080;">&#41;</span>
nWinners <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">3</span>
&nbsp;
allCombinations <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">combn</span><span style="color: #080;">&#40;</span>nPlayers,nWinners,simplify <span style="color: #080;">=</span> FALSE<span style="color: #080;">&#41;</span> 
index <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">ceiling</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span>nPlayers, nWinners<span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> lexicographicIndex<span style="color: #080;">&#40;</span>lotteryResult, <span style="color: #ff0000;">49</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">choose</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">49</span>,<span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
winners <span style="color: #080;">&lt;-</span> allCombinations<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>index<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span>
<span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span> The winners are:&quot;</span>, winners<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;<span style="color: #000099; font-weight: bold;">\n</span> that is:&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>players<span style="color: #080;">&#91;</span>winners,<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span></pre></div></div></div>
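
<p>The snippet above calls <code>lexicographicIndex()</code>, which isn&#8217;t defined in this post. As a rough illustration of the idea only (and not necessarily the function that was actually used), here is a minimal sketch of how the rank of a combination in lexicographic order can be computed. The name <code>lexRankSketch</code> and the number of players are hypothetical.</p>

```r
# Hypothetical helper (not from the original post): 1-based rank of the
# combination x among all length(x)-subsets of 1..n, in lexicographic order.
lexRankSketch <- function(x, n) {
  x <- sort(x)
  k <- length(x)
  rank <- 1
  prev <- 0
  for (i in seq_len(k)) {
    # values that could sit at position i but come before x[i]
    skipped <- prev + seq_len(x[i] - prev - 1)
    # each skipped value heads choose(n - value, k - i) earlier combinations
    rank <- rank + sum(choose(n - skipped, k - i))
    prev <- x[i]
  }
  rank
}

# Sanity check against combn(), which enumerates combinations in the same order
combos <- combn(5, 3, simplify = FALSE)
ranks <- sapply(combos, lexRankSketch, n = 5)
stopifnot(all(ranks == seq_along(combos)))

# Mapping a lottery draw onto one of the possible winner triplets,
# with a made-up number of players
nPlayers <- 10
draw <- c(3, 19, 23, 28, 37, 43, 34)
index <- ceiling(choose(nPlayers, 3) * lexRankSketch(draw, 49) / choose(49, 7))
stopifnot(index >= 1, index <= choose(nPlayers, 3))
```

<p>Scaling the rank by <code>choose(nPlayers, nWinners) / choose(49, 7)</code> and rounding up simply squeezes the (much larger) space of lottery draws onto the space of possible winner triplets.</p>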

<p>So the winners are:
<ul>
<li><a href='http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/#comment-4130'>Eduardo García (ID: 4130)</a></li>
<li><a href='http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/#comment-4219'>David (ID: 4219)</a></li>
<li><a href='http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/#comment-4276'>Jeff (ID: 4276)</a></li>
</ul>
<p>Eduardo, David and Jeff: Packt will contact you soon about this.</p>
<p>Thanks again everyone for playing, it was wonderful to see such a great response to the competition!</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2014/07/win-a-free-copy-of-a-new-video-course-on-ggplot2-and-shiny/feed/</wfw:commentRss>
		<slash:comments>173</slash:comments>
		</item>
		<item>
		<title>A two-hour online course on ggplot2 and Shiny</title>
		<link>http://chrisladroue.com/2014/06/a-two-hour-online-course-on-ggplot2-and-shiny/</link>
		<comments>http://chrisladroue.com/2014/06/a-two-hour-online-course-on-ggplot2-and-shiny/#comments</comments>
		<pubDate>Wed, 11 Jun 2014 08:06:13 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[packt]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[shiny]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=1040</guid>
		<description><![CDATA[I&#8217;ve just published a video course with Packt Publishing about ggplot2 and Shiny! In just two hours, you&#8217;ll get to learn the popular R packages ggplot2 and Shiny, as well as how to put them together to build interactive webpages. &#8230; <a href="http://chrisladroue.com/2014/06/a-two-hour-online-course-on-ggplot2-and-shiny/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://bit.ly/1kEqYFZ"><img alt="" src="http://dgdsbygo8mp3h.cloudfront.net/sites/default/files/imagecache/videoproduct_large/bookretailers/4337OS_Video.jpg" title="ggplot2 Shiny course" class="alignleft" width="372" height="214" /></a><br />
I&#8217;ve just published <a href="http://bit.ly/1kEqYFZ">a video course</a> with Packt Publishing about <strong>ggplot2</strong> and <strong>Shiny</strong>!<br />
In just two hours, you&#8217;ll get to learn the popular R packages <strong>ggplot2</strong> and <strong>Shiny</strong>, as well as how to put them together to build interactive webpages. And all that from R.</p>
<p>The course consists of short videos (around 2 or 3 minutes) that explain one concept at a time. Each video comes with the relevant code, and pointers to go further in your own time.</p>
<p>It&#8217;s divided into 8 chapters:
<ol>
<li>Getting Started with ggplot2 [15:00 minutes] </li>
<li>Understanding Basic Plots [11:33 minutes]</li>
<li>Using Conditional Plots [09:32 minutes]</li>
<li>Using Statistics in Our Plot [09:49 minutes]</li>
<li>Customizing Your Graphs [11:18 minutes]</li>
<li>Shiny – Part 1 [14:39 minutes]</li>
<li>Shiny – Part 2 [12:25 minutes]</li>
<li>Putting Everything Together [12:17 minutes]</li>
</ol>
<p>In the last chapter, we build a multi-page dashboard, with adaptive controls, showing some ggplot2 graphs produced on the fly.</p>
<p>Here is a sample video &#8220;Customizing the Color Palette for Continuous Variables&#8221; with ggplot2:<br />
<iframe width="640" height="360" src="https://www.youtube.com/embed/D0dGwXnhJvM?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>If you&#8217;ve been meaning to learn ggplot2 (for making nice looking graphs) or Shiny (for building interactive websites using R alone), but never got round to it, this is the course for you!</p>
<p>You can buy it from <a href="http://bit.ly/1kEqYFZ">here</a>. Download the scripts by clicking &#8216;<code>Support</code>&#8216; on <a href="http://bit.ly/1kEqYFZ">this page</a>.</p>
<p>PS: Arthur Zubarev, from compudicted, wrote a review of the whole course here: <a href="https://compudicted.wordpress.com/2014/05/11/building-interactive-graphs-with-ggplot2-and-shiny-r-by-christophe-ladroue-packt-publishing-video-review/">https://compudicted.wordpress.com/2014/05/11/building-interactive-graphs-with-ggplot2-and-shiny-r-by-christophe-ladroue-packt-publishing-video-review/</a>.</p>
<p>PPS: More reviews:
<ol>
<li>By  Mark van der Loo: <a href="http://www.r-bloggers.com/review-of-building-interactive-graphs-with-ggplot2-and-shiny/">Review of “Building interactive graphs with ggplot2 and shiny”</a></li>
<li>By MilanoR: <a href="http://www.milanor.net/blog/?p=1436">Building Interactive Graphs with ggplot2 and Shiny</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2014/06/a-two-hour-online-course-on-ggplot2-and-shiny/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>A very quick introduction to ggplot2</title>
		<link>http://chrisladroue.com/2012/10/a-very-quick-introduction-to-ggplot2/</link>
		<comments>http://chrisladroue.com/2012/10/a-very-quick-introduction-to-ggplot2/#comments</comments>
		<pubDate>Sun, 14 Oct 2012 15:39:08 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[birminghamR]]></category>
		<category><![CDATA[deck.js]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[slides]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=1004</guid>
		<description><![CDATA[I gave a very brief 10mn introduction to ggplot2 at the Birmingham R user group meeting on Monday. The aim was to give a headstart to R users who&#8217;ve heard of ggplot2 but never got around to trying it. I &#8230; <a href="http://chrisladroue.com/2012/10/a-very-quick-introduction-to-ggplot2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://chrisladroue.com/wp-content/uploads/2012/10/ggplot2Introduction.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/10/ggplot2Introduction.png" alt="" title="ggplot2Introduction" width="250" height="194" class="alignleft size-full wp-image-1007" /></a>I gave a very brief 10-minute introduction to ggplot2 at the <a href="http://BirminghamR.org">Birmingham R user group</a> meeting on Monday. The aim was to give a head start to R users who&#8217;ve heard of ggplot2 but never got around to trying it.</p>
<p>I made the talk with <a href="http://imakewebthings.com/deck.js/">deck.js</a>, with a couple of CSS customisations. You can browse the slides <a href="http://chrisladroue.com/extra/ggplot2Intro/introductionGGplot2/">here</a>. Use the right and left arrows to navigate and &#8216;<em>m</em>&#8217; to get an overview. Most code snippets are clickable and will show the resulting plot. Click on the plot to make it disappear.</p>
<p>Download all the files in one go with this <a href='http://chrisladroue.com/wp-content/uploads/2012/10/ggplot2Intro.zip'>zip file</a> if you&#8217;re interested. Unzip and open <code>ggplot2Intro/introductionGGplot2/index.html</code> with a modern browser.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/10/a-very-quick-introduction-to-ggplot2/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>An exercise in R using local open data</title>
		<link>http://chrisladroue.com/2012/06/an-exercise-in-r-using-local-open-data/</link>
		<comments>http://chrisladroue.com/2012/06/an-exercise-in-r-using-local-open-data/#comments</comments>
		<pubDate>Sun, 17 Jun 2012 08:55:27 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[choropleth]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[spreadsheet]]></category>
		<category><![CDATA[uk]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=942</guid>
		<description><![CDATA[Last week I went to the &#8220;Government Open Data Hack Day&#8221; ([@,#]godhd on twitter) in Birmingham (UK), organised by Gavin Broughton and James Catell. The idea was to get hold of local open data and try and make use of &#8230; <a href="http://chrisladroue.com/2012/06/an-exercise-in-r-using-local-open-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://chrisladroue.com/wp-content/uploads/2012/06/westmidlandsShadow.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/06/westmidlandsShadow.png" alt="" title="West Midlands" width="332" height="224" class="alignleft size-full wp-image-960" /></a>Last week I went to the &#8220;<a href="http://en.wikipedia.org/wiki/User:Jacattell/sandbox">Government Open Data Hack Day</a>&#8221; (<a href="https://twitter.com/#!/GODHackDay">[@,#]godhd</a> on twitter) in Birmingham (UK), organised by <a href="http://twitter.com/ukgav">Gavin Broughton</a> and <a href="http://twitter.com/jacattell">James Catell</a>. The idea was to get hold of local open data and try and make use of them in just one day. You can see some of the work done on that day presented <a href="http://www.youtube.com/results?search_type=videos&#038;search_query=%23GODHD&#038;uni=3">here</a>. It was good fun and I&#8217;ve learned a few tricks and resources in the process, which I&#8217;m going to go through in this post. I&#8217;ll refrain from any data analysis because I know next to nothing about this type of data. Rather, I&#8217;m going to explain how to go from the raw format (in this case an Excel sheet) to something useful and exploitable.</p>
<p><small>(all files <a href='http://chrisladroue.com/wp-content/uploads/2012/06/godhackday.zip'>here</a>)</small></p>
<p>The data I was given come from <a href="https://www.nomisweb.co.uk/">nomis</a> and consist of job vacancies in the West Midlands for the years 2011 and 2012, broken down by job types. The spreadsheet lists 353 job types for 59 constituencies, one after the other:</p>
<p><a href="http://chrisladroue.com/wp-content/uploads/2012/06/XL.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/06/XL-300x145.png" alt="" title="XL" width="300" height="145" class="aligncenter size-medium wp-image-963" /></a></p>
<p><strong>From the spreadsheet to an R data frame</strong><br />
The first thing to do is to turn this into an R data frame for easier manipulation and reshaping. Luckily, each dataset follows the exact same pattern and it&#8217;s easy to extract the name of each constituency and each job type into two files with a simple <TT>grep</TT>, and combine both files into a data frame from R. The lines starting with &#8220;area name&#8221; contain the name of the constituency; those starting with 4 digits contain the job type and the numbers we want. (nomis_jobs_wm_2011_2012.csv is the tab-separated version of the spreadsheet.)</p>
<p>In a terminal:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #ff0000;">&quot;^area name&quot;</span> nomis_jobs_wm_2011_2012.csv <span style="color: #000000; font-weight: bold;">&gt;</span> areaname.csv
<span style="color: #c20cb9; font-weight: bold;">grep</span> <span style="color: #ff0000;">&quot;^[0-9]\{4\}&quot;</span> nomis_jobs_wm_2011_2012.csv <span style="color: #000000; font-weight: bold;">&gt;</span> jobtypes.csv</pre></div></div></div>

<p>In R:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># reading data</span>
areas<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;areaname.csv&quot;</span>,header<span style="color: #080;">=</span>FALSE,sep<span style="color: #080;">=</span><span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span><span style="color: #080;">&#41;</span>
jobs<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;jobtypes.csv&quot;</span>,header<span style="color: #080;">=</span>FALSE,sep<span style="color: #080;">=</span><span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># combine them</span>
areas<span style="color: #080;">&lt;-</span>areas$V2
jobs$region<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>areas,each<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>jobs<span style="color: #080;">&#41;</span><span style="color: #080;">/</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>areas<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>jobs<span style="color: #080;">&#41;</span><span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;JobType&quot;</span>,<span style="color: #ff0000;">&quot;Vacancies2011&quot;</span>,<span style="color: #ff0000;">&quot;Vacancies2012&quot;</span>,<span style="color: #ff0000;">&quot;change&quot;</span>,<span style="color: #ff0000;">&quot;changePercent&quot;</span>,<span style="color: #ff0000;">&quot;region&quot;</span><span style="color: #080;">&#41;</span>
jobs<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">subset</span><span style="color: #080;">&#40;</span>jobs,select<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;JobType&quot;</span>,<span style="color: #ff0000;">&quot;Vacancies2011&quot;</span>,<span style="color: #ff0000;">&quot;Vacancies2012&quot;</span>,<span style="color: #ff0000;">&quot;region&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># A subtlety here: Excel formatted the number with a comma e.g. 1,234 for 1234. </span>
<span style="color: #228B22;"># So the comma has to be removed for the cast to work properly</span>
jobs$Vacancies2011<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;,&quot;</span>,<span style="color: #ff0000;">&quot;&quot;</span>,jobs$Vacancies2011<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
jobs$Vacancies2012<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">gsub</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;,&quot;</span>,<span style="color: #ff0000;">&quot;&quot;</span>,jobs$Vacancies2012<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></div></div></div>


<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"> <span style="color: #c20cb9; font-weight: bold;">head</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">jobs</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
                                                       JobType Vacancies2011 Vacancies2012              region
<span style="color: #000000;">1</span>               <span style="color: #000000;">1111</span> : Senior officials <span style="color: #000000; font-weight: bold;">in</span> national government             <span style="color: #000000;">0</span>             <span style="color: #000000;">0</span> Aldridge-Brownhills
<span style="color: #000000;">2</span> <span style="color: #000000;">1112</span> : Directors and chief executives of major organisations             <span style="color: #000000;">0</span>             <span style="color: #000000;">0</span> Aldridge-Brownhills
<span style="color: #000000;">3</span>                  <span style="color: #000000;">1113</span> : Senior officials <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #7a0874; font-weight: bold;">local</span> government             <span style="color: #000000;">0</span>             <span style="color: #000000;">0</span> Aldridge-Brownhills
<span style="color: #000000;">4</span>    <span style="color: #000000;">1114</span> : Senior officials of special interest organisations             <span style="color: #000000;">0</span>             <span style="color: #000000;">0</span> Aldridge-Brownhills
<span style="color: #000000;">5</span>            <span style="color: #000000;">1121</span> : Production, works and maintenance managers            <span style="color: #000000;">10</span>             <span style="color: #000000;">4</span> Aldridge-Brownhills
<span style="color: #000000;">6</span>                              <span style="color: #000000;">1122</span> : Managers <span style="color: #000000; font-weight: bold;">in</span> construction            <span style="color: #000000;">47</span>             <span style="color: #000000;">9</span> Aldridge-Brownhills</pre></div></div></div>
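
<p>For what it&#8217;s worth, the <TT>grep</TT> and <code>rep()</code> steps above can also be sketched entirely in R. This is only an illustration: the sample lines below are invented, and the five-column layout is assumed from the column names used above. Each job line is tagged with the most recent &#8220;area name&#8221; line, so the blocks don&#8217;t need to have the same length.</p>

```r
# Invented miniature of the tab-separated export (not the real nomis data)
lines <- c(
  "area name\tAldridge-Brownhills",
  "1111 : Senior officials in national government\t0\t0\t0\t-",
  "1121 : Production, works and maintenance managers\t10\t4\t-6\t-60.0",
  "area name\tBirmingham, Edgbaston",
  "1111 : Senior officials in national government\t1\t0\t-1\t-100.0",
  "1121 : Production, works and maintenance managers\t1,234\t987\t-247\t-20.0"
)

# Same two patterns as the grep calls
isArea <- grepl("^area name", lines)
isJob  <- grepl("^[0-9]{4}", lines)

areas <- sub("^area name\t", "", lines[isArea])
jobs <- read.table(text = lines[isJob], sep = "\t", header = FALSE,
                   quote = "", stringsAsFactors = FALSE,
                   col.names = c("JobType", "Vacancies2011", "Vacancies2012",
                                 "change", "changePercent"))

# cumsum(isArea) counts how many "area name" lines have been seen so far,
# i.e. the index of the constituency each job line belongs to
jobs$region <- areas[cumsum(isArea)[isJob]]

# Remove the thousands separator before converting to numbers
jobs$Vacancies2011 <- as.numeric(gsub(",", "", jobs$Vacancies2011))
jobs$Vacancies2012 <- as.numeric(gsub(",", "", jobs$Vacancies2012))
```

<p>With a real file, <code>lines</code> would come from <code>readLines("nomis_jobs_wm_2011_2012.csv")</code> instead of being typed in.</p>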

<p>Now that we have the data in one single data frame, it&#8217;s much easier to do something with it.</p>
<p><strong>Aggregating the job types</strong><br />
There are 353 job types in total, which is too fine a granularity for us. It turns out that the 4-digit numbers that were so useful for parsing the data are from the SOC (Standard Occupational Classification) code and follow a hierarchical pattern, with 1xxx meaning &#8220;Managers and Senior Officials&#8221;, 2xxx &#8220;Professional Occupations&#8221;, etc. Somewhere on the internet (I can&#8217;t remember where) I tracked down an exploitable list (as in, not a b. pdf!) of those SOC numbers, which I promptly turned into a tab-separated file.</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">soc<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;soc.csv&quot;</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span>,header<span style="color: #080;">=</span>TRUE,stringsAsFactors<span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">head</span><span style="color: #080;">&#40;</span>soc<span style="color: #080;">&#41;</span>
  Major.<span style="">Group</span> Sub.<span style="">Major</span>.<span style="">Group</span> Minor.<span style="">Group</span> Unit...<span style="">Group</span>                              Group.<span style="">Title</span>
<span style="color: #ff0000;">1</span>           <span style="color: #ff0000;">1</span>              NA          NA           NA MANAGERS, DIRECTORS AND SENIOR OFFICIALS
<span style="color: #ff0000;">2</span>          NA              <span style="color: #ff0000;">11</span>          NA           NA         CORPORATE MANAGERS AND DIRECTORS
<span style="color: #ff0000;">3</span>          NA              NA         <span style="color: #ff0000;">111</span>           NA    Chief Executives and Senior Officials
<span style="color: #ff0000;">4</span>          NA              NA          NA         <span style="color: #ff0000;">1115</span>    Chief executives and senior officials
<span style="color: #ff0000;">5</span>          NA              NA          NA         <span style="color: #ff0000;">1116</span>     Elected officers and representatives
<span style="color: #ff0000;">6</span>          NA              NA         <span style="color: #ff0000;">112</span>           NA        Production Managers and Directors</pre></div></div></div>

<p>Now that we have the description of each level in the SOC code, we can aggregate the 353 jobs into, for example, the 9 job types of level 1 (&#8216;Major Group&#8217;).</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># select the job types in the major group</span>
level1<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">subset</span><span style="color: #080;">&#40;</span>soc,<span style="color: #080;">!</span><span style="color: #0000FF; font-weight: bold;">is.<span style="">na</span></span><span style="color: #080;">&#40;</span>Major.<span style="">Group</span><span style="color: #080;">&#41;</span>,select<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Major.Group&quot;</span>,<span style="color: #ff0000;">&quot;Group.Title&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># build a look-up table to go from a digit to a job type</span>
lookup<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>level1<span style="color: #080;">&#41;</span>
lookup<span style="color: #080;">&#91;</span>level1$Major.<span style="">Group</span><span style="color: #080;">&#93;</span><span style="color: #080;">&lt;-</span>level1$Group.<span style="">Title</span>
lookup<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span>lookup,<span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">=</span>lookup<span style="color: #080;">&#41;</span></pre></div></div></div>


<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> lookup
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> MANAGERS, DIRECTORS AND SENIOR OFFICIALS         PROFESSIONAL OCCUPATIONS                        
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span> ASSOCIATE PROFESSIONAL AND TECHNICAL OCCUPATIONS ADMINISTRATIVE AND SECRETARIAL OCCUPATIONS      
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span> SKILLED TRADES OCCUPATIONS                       CARING, LEISURE AND OTHER SERVICE OCCUPATIONS   
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">7</span><span style="color: #080;">&#93;</span> SALES AND CUSTOMER SERVICE OCCUPATIONS           PROCESS, PLANT AND MACHINE OPERATIVES           
<span style="color: #080;">&#91;</span><span style="color: #ff0000;">9</span><span style="color: #080;">&#93;</span> ELEMENTARY OCCUPATIONS                          
<span style="color: #ff0000;">9</span> Levels<span style="color: #080;">:</span> MANAGERS, DIRECTORS AND SENIOR OFFICIALS PROFESSIONAL OCCUPATIONS ... <span style="">ELEMENTARY</span> OCCUPATIONS</pre></div></div></div>


<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># add a column 'level1' to jobs which contains one of the 9 possible job titles</span>
jobs$level1<span style="color: #080;">&lt;-</span>lookup<span style="color: #080;">&#91;</span>
  <span style="color: #0000FF; font-weight: bold;">as.<span style="">numeric</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sapply</span><span style="color: #080;">&#40;</span>jobs$JobType,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>s<span style="color: #080;">&#41;</span> <span style="color: #0000FF; font-weight: bold;">substr</span><span style="color: #080;">&#40;</span>s,<span style="color: #ff0000;">1</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,USE.<span style="">NAMES</span><span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>
<span style="color: #228B22;"># Build a new data frame byLevel1, the aggregated data</span>
byLevel1<span style="color: #080;">&lt;-</span>ddply<span style="color: #080;">&#40;</span>jobs,.<span style="color: #080;">&#40;</span>region,level1<span style="color: #080;">&#41;</span>,summarise,Vacancies2011<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>Vacancies2011<span style="color: #080;">&#41;</span>,Vacancies2012<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>Vacancies2012<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></div></div></div>


<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">head</span><span style="color: #080;">&#40;</span>byLevel1<span style="color: #080;">&#41;</span>
               region                                           level1 Vacancies2011 Vacancies2012
<span style="color: #ff0000;">1</span> Aldridge<span style="color: #080;">-</span>Brownhills         MANAGERS, DIRECTORS AND SENIOR OFFICIALS           <span style="color: #ff0000;">173</span>           <span style="color: #ff0000;">134</span>
<span style="color: #ff0000;">2</span> Aldridge<span style="color: #080;">-</span>Brownhills                         PROFESSIONAL OCCUPATIONS            <span style="color: #ff0000;">97</span>           <span style="color: #ff0000;">100</span>
<span style="color: #ff0000;">3</span> Aldridge<span style="color: #080;">-</span>Brownhills ASSOCIATE PROFESSIONAL AND TECHNICAL OCCUPATIONS           <span style="color: #ff0000;">548</span>           <span style="color: #ff0000;">190</span>
<span style="color: #ff0000;">4</span> Aldridge<span style="color: #080;">-</span>Brownhills       ADMINISTRATIVE AND SECRETARIAL OCCUPATIONS           <span style="color: #ff0000;">288</span>           <span style="color: #ff0000;">202</span>
<span style="color: #ff0000;">5</span> Aldridge<span style="color: #080;">-</span>Brownhills                       SKILLED TRADES OCCUPATIONS           <span style="color: #ff0000;">693</span>          <span style="color: #ff0000;">1470</span>
<span style="color: #ff0000;">6</span> Aldridge<span style="color: #080;">-</span>Brownhills    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS           <span style="color: #ff0000;">477</span>           <span style="color: #ff0000;">566</span></pre></div></div></div>

<p>We now have a smaller data frame, with 59&#215;9=531 rows (constituencies &#215; job types). An obvious first graph is the distribution of vacancies across job types in each constituency:</p>

<div class="my_syntax_box"><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># reverse the order of the constituency levels (re-ordering them, not relabelling the rows), to have them listed alphabetically from top to bottom in the graph</span>
byLevel1$region<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span>byLevel1$region,<span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">rev</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">&#40;</span>byLevel1$region<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>byLevel1<span style="color: #080;">&#41;</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>geom_bar<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>region,y<span style="color: #080;">=</span>Vacancies2011,fill<span style="color: #080;">=</span>level1<span style="color: #080;">&#41;</span>,stat<span style="color: #080;">=</span><span style="color: #ff0000;">'identity'</span>,position<span style="color: #080;">=</span><span style="color: #ff0000;">'fill'</span><span style="color: #080;">&#41;</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>coord_flip<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>scale_fill_brewer<span style="color: #080;">&#40;</span>type<span style="color: #080;">=</span><span style="color: #ff0000;">'qual'</span>,<span style="color: #0000FF; font-weight: bold;">palette</span><span style="color: #080;">=</span><span style="color: #ff0000;">'Set1'</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><a href="http://chrisladroue.com/wp-content/uploads/2012/06/JobTypeDistribution2011.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/06/JobTypeDistribution2011-1024x500.png" alt="" title="Job Type Distribution for 2011" width="640" height="312" class="aligncenter size-large wp-image-966" /></a></p>
<p>This representation shows the relative proportion of job types within each constituency. It would be misleading to compare the number of vacancies between constituencies, since they do not necessarily have the same population, for example. I don&#8217;t have that data, so I can&#8217;t normalise in a sensible manner.</p>
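<p>As a minimal sketch of what <TT>position='fill'</TT> displays, the shares can also be computed by hand. The data below is a made-up toy example (the region names and counts are not from the real dataset) and <TT>share</TT> is a column name chosen for illustration:</p>

<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Toy data: two hypothetical constituencies, three job types each
toy &lt;- data.frame(
  region = rep(c("A", "B"), each = 3),
  level1 = rep(c("managers", "professionals", "sales"), times = 2),
  Vacancies2011 = c(10, 30, 60, 200, 100, 100)
)
# Share of each job type within its own constituency
toy$share &lt;- toy$Vacancies2011 / ave(toy$Vacancies2011, toy$region, FUN = sum)
print(toy)
# The shares sum to 1 within each constituency, whatever its total
print(tapply(toy$share, toy$region, sum))</pre></div></div></div>

<p>In this toy example, constituency B advertises four times as many vacancies as A (400 against 100), yet both bars would have the same length, which is why this kind of plot says nothing about totals.</p>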
<p><strong>Maps!</strong><br />
Since we&#8217;re dealing with regional data, wouldn&#8217;t it be cool to plot that on a map? <TT>geom_map</TT> from <TT>ggplot2</TT> can help with that, but we first need to find the boundaries of all the 59 constituencies to get started. My office mate helpfully pointed me to <a href="http://mapit.mysociety.org/">mapit</a>, a great service from <a href="http://www.mysociety.org">mysociety.org</a>. If you know the id of an area, mapit can give you its boundaries in a JSON object, which you can easily turn into a data frame with the package <TT>rjson</TT>. Here&#8217;s how I did it:</p>

<div class="my_syntax_box"><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>rjson<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># list of all the areas we need</span>
areas<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">read.<span style="">table</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;arename.csv&quot;</span>,header<span style="color: #080;">=</span>FALSE,sep<span style="color: #080;">=</span><span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\t</span>'</span><span style="color: #080;">&#41;</span>
areas<span style="color: #080;">&lt;-</span>areas$V2
areas<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span><span style="color: #080;">&#40;</span>areas<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># All UK Parliament Constituencies</span>
WMC<span style="color: #080;">&lt;-</span>fromJSON<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">file</span><span style="color: #080;">=</span><span style="color: #ff0000;">'http://mapit.mysociety.org/areas/WMC'</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># Extract name and id</span>
constituencies<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">sapply</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>WMC<span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>id<span style="color: #080;">&#41;</span> WMC<span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>id,<span style="color: #ff0000;">'name'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># Select only those we need</span>
constituencies<span style="color: #080;">&lt;-</span>constituencies<span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">which</span><span style="color: #080;">&#40;</span>constituencies <span style="color: #080;">%</span>in<span style="color: #080;">%</span> areas<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>
<span style="color: #228B22;"># id and name</span>
areas<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>group<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">names</span><span style="color: #080;">&#40;</span>constituencies<span style="color: #080;">&#41;</span>,region<span style="color: #080;">=</span>constituencies<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># boundaries to all West Midlands constituencies</span>
WestMidlands<span style="color: #080;">&lt;-</span>ddply<span style="color: #080;">&#40;</span>areas,.<span style="color: #080;">&#40;</span>group,region<span style="color: #080;">&#41;</span>,.<span style="">fun</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">row</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  x<span style="color: #080;">&lt;-</span>fromJSON<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">file</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'http://mapit.mysociety.org/area/'</span>,<span style="color: #0000FF; font-weight: bold;">row</span>$group,<span style="color: #ff0000;">'.geojson'</span>,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  x<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">unlist</span><span style="color: #080;">&#40;</span>x$coordinates<span style="color: #080;">&#41;</span>
  n<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span><span style="color: #080;">/</span><span style="color: #ff0000;">2</span>
  <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>long<span style="color: #080;">=</span>x<span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span><span style="color: #080;">*</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>,lat<span style="color: #080;">=</span>x<span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span><span style="color: #080;">*</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>  
<span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p>(You can ignore the warnings: they are all due to a missing final end-of-line in the downloaded files.)</p>

<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">head</span><span style="color: #080;">&#40;</span>WestMidlands<span style="color: #080;">&#41;</span>
  group                region      long      lat
<span style="color: #ff0000;">1</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.044652</span> <span style="color: #ff0000;">52.66554</span>
<span style="color: #ff0000;">2</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.044531</span> <span style="color: #ff0000;">52.66568</span>
<span style="color: #ff0000;">3</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.044481</span> <span style="color: #ff0000;">52.66573</span>
<span style="color: #ff0000;">4</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.044355</span> <span style="color: #ff0000;">52.66585</span>
<span style="color: #ff0000;">5</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.044110</span> <span style="color: #ff0000;">52.66605</span>
<span style="color: #ff0000;">6</span> <span style="color: #ff0000;">65563</span> Shrewsbury and Atcham <span style="color: #080;">-</span><span style="color: #ff0000;">3.043950</span> <span style="color: #ff0000;">52.66621</span></pre></div></div></div>
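
<p>To make the coordinate handling above concrete, here is a small sketch on a hand-written list shaped roughly like a parsed GeoJSON polygon (the three points are invented, and the exact nesting returned by <TT>rjson</TT> is an assumption). <TT>unlist</TT> flattens the nested list into long, lat, long, lat, and so on, so the odd positions hold longitudes and the even positions hold latitudes:</p>

<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Hypothetical parsed GeoJSON polygon: one ring of three (long, lat) pairs
x &lt;- list(coordinates = list(list(c(-3.04, 52.66), c(-3.03, 52.67), c(-3.02, 52.68))))
# unlist() flattens every level: long, lat, long, lat, ...
flat &lt;- unlist(x$coordinates)
n &lt;- length(flat) / 2
# odd positions are longitudes, even positions are latitudes
boundary &lt;- data.frame(long = flat[2 * (1:n) - 1], lat = flat[2 * (1:n)])
print(boundary)</pre></div></div></div>

<p>Because everything is flattened into one sequence of points, an area made of several polygons (or one with holes) would end up as a single path; this sketch does not handle that case.</p>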

<p>Let&#8217;s look at the relative change in vacancies for each constituency and job title:</p>

<div class="my_syntax_box"><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">&nbsp;
<span style="color: #228B22;"># Compute relative change</span>
byLevel1$percentChange<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">100</span><span style="color: #080;">*</span><span style="color: #080;">&#40;</span>byLevel1$Vacancies2012<span style="color: #080;">-</span>byLevel1$Vacancies2011<span style="color: #080;">&#41;</span><span style="color: #080;">/</span>byLevel1$Vacancies2011
&nbsp;
<span style="color: #228B22;"># Connect map and data</span>
p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>byLevel1, aes<span style="color: #080;">&#40;</span>map_id <span style="color: #080;">=</span> region<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> 
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>geom_map<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>fill <span style="color: #080;">=</span> percentChange<span style="color: #080;">&#41;</span>, map <span style="color: #080;">=</span> WestMidlands<span style="color: #080;">&#41;</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>expand_limits<span style="color: #080;">&#40;</span>x <span style="color: #080;">=</span> WestMidlands$long, y <span style="color: #080;">=</span> WestMidlands$lat<span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># Colour scale from red to blue, cropped at -100 and 100. &quot;Lab&quot; is nicer on the eyes than RGB.</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>scale_fill_gradient2<span style="color: #080;">&#40;</span>limits<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #080;">-</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>,name<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;% change&quot;</span>,space<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Lab&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># mercator</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>coord_map<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># one plot per job level</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>facet_wrap<span style="color: #080;">&#40;</span>~level1<span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># remove grid etc.</span>
p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>theme<span style="color: #080;">&#40;</span>
  axis.<span style="">title</span>.<span style="">x</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  axis.<span style="">title</span>.<span style="">y</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  panel.<span style="">grid</span>.<span style="">major</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  panel.<span style="">grid</span>.<span style="">minor</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  axis.<span style="">text</span>.<span style="">x</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  axis.<span style="">text</span>.<span style="">y</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>,
  axis.<span style="">ticks</span><span style="color: #080;">=</span>element_blank<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span> <span style="color: #228B22;"># might take some time!</span></pre></td></tr></table></div></div>

<p><a href="http://chrisladroue.com/wp-content/uploads/2012/06/westmidlands.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/06/westmidlands-941x1024.png" alt="" title="Job vacancies in West Midlands 2011-2012" width="640" height="696" class="aligncenter size-large wp-image-994" /></a></p>
<p>That is rather nice for a first go. Some areas appear gray because they&#8217;re off the scale: their relative change is over 100%. This hard limit is completely arbitrary, but raising it to 220 (and so getting rid of the gray areas) results in very low contrast for the rest of the plot. One could fix that with a capping colour scale, for example.</p>
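
<p>One way to get such a capping scale, as a sketch rather than the code used for the plot above, is to clamp the values to the limits before plotting, so that anything beyond &#177;100% is drawn with the colour at the end of the scale instead of gray. The name <TT>cappedChange</TT> is made up for illustration, and the vector only holds a handful of values copied from the outputs in this post:</p>

<div class="my_syntax_box"><div class="my_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># A few values of percentChange copied from the outputs in this post
percentChange &lt;- c(-74.58, -18.2, 2.732, 26.78, 112.1212, 218.617)
# Clamp to [-100, 100]: anything beyond a limit is replaced by that limit
cappedChange &lt;- pmax(-100, pmin(100, percentChange))
print(cappedChange)</pre></div></div></div>

<p>Filling the map with the capped values instead of <TT>percentChange</TT> would colour those areas with the end of the scale rather than gray, at the price of a legend whose extremes mean &#8220;100% or more&#8221;. Recent versions of <TT>ggplot2</TT> can do the same inside the scale through the <TT>oob</TT> argument (for example <TT>oob=scales::squish</TT>), if your version supports it.</p>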

<div class="my_syntax_box"><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">summary</span><span style="color: #080;">&#40;</span>byLevel1$percentChange<span style="color: #080;">&#41;</span>
   Min. 1st Qu.  <span style="">Median</span>    Mean 3rd Qu.    <span style="">Max</span>. 
<span style="color: #080;">-</span><span style="color: #ff0000;">74.580</span> <span style="color: #080;">-</span><span style="color: #ff0000;">18.200</span>   <span style="color: #ff0000;">2.732</span>   <span style="color: #ff0000;">7.292</span>  <span style="color: #ff0000;">26.780</span> <span style="color: #ff0000;">218.600</span> 
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">subset</span><span style="color: #080;">&#40;</span>byLevel1,percentChange<span style="color: #080;">&gt;</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span>
                      region                                           level1 Vacancies2011 Vacancies2012 percentChange
<span style="color: #ff0000;">5</span>                Wyre Forest                       SKILLED TRADES OCCUPATIONS           <span style="color: #ff0000;">693</span>          <span style="color: #ff0000;">1470</span>      <span style="color: #ff0000;">112.1212</span>
<span style="color: #ff0000;">38</span>  Wolverhampton North East                         PROFESSIONAL OCCUPATIONS            <span style="color: #ff0000;">63</span>           <span style="color: #ff0000;">173</span>      <span style="color: #ff0000;">174.6032</span>
<span style="color: #ff0000;">83</span>                    Warley                         PROFESSIONAL OCCUPATIONS            <span style="color: #ff0000;">91</span>           <span style="color: #ff0000;">209</span>      <span style="color: #ff0000;">129.6703</span>
<span style="color: #ff0000;">114</span>               The Wrekin    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS           <span style="color: #ff0000;">872</span>          <span style="color: #ff0000;">2236</span>      <span style="color: #ff0000;">156.4220</span>
<span style="color: #ff0000;">129</span>                 Tamworth ASSOCIATE PROFESSIONAL AND TECHNICAL OCCUPATIONS           <span style="color: #ff0000;">370</span>           <span style="color: #ff0000;">743</span>      <span style="color: #ff0000;">100.8108</span>
<span style="color: #ff0000;">135</span>                 Tamworth                           ELEMENTARY OCCUPATIONS           <span style="color: #ff0000;">634</span>          <span style="color: #ff0000;">1331</span>      <span style="color: #ff0000;">109.9369</span>
<span style="color: #ff0000;">190</span>   Stoke<span style="color: #080;">-</span>on<span style="color: #080;">-</span>Trent Central         MANAGERS, DIRECTORS AND SENIOR OFFICIALS           <span style="color: #ff0000;">188</span>           <span style="color: #ff0000;">599</span>      <span style="color: #ff0000;">218.6170</span>
<span style="color: #ff0000;">195</span>   Stoke<span style="color: #080;">-</span>on<span style="color: #080;">-</span>Trent Central    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS           <span style="color: #ff0000;">667</span>          <span style="color: #ff0000;">1427</span>      <span style="color: #ff0000;">113.9430</span>
<span style="color: #ff0000;">204</span>  Staffordshire Moorlands    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS           <span style="color: #ff0000;">363</span>           <span style="color: #ff0000;">730</span>      <span style="color: #ff0000;">101.1019</span>
<span style="color: #ff0000;">312</span>       Mid Worcestershire    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS           <span style="color: #ff0000;">734</span>          <span style="color: #ff0000;">1869</span>      <span style="color: #ff0000;">154.6322</span>
<span style="color: #ff0000;">315</span>       Mid Worcestershire                           ELEMENTARY OCCUPATIONS          <span style="color: #ff0000;">1245</span>          <span style="color: #ff0000;">2744</span>      <span style="color: #ff0000;">120.4016</span>
<span style="color: #ff0000;">380</span>             Dudley North                         PROFESSIONAL OCCUPATIONS            <span style="color: #ff0000;">97</span>           <span style="color: #ff0000;">246</span>      <span style="color: #ff0000;">153.6082</span>
<span style="color: #ff0000;">384</span>             Dudley North    CARING, LEISURE AND OTHER SERVICE OCCUPATIONS          <span style="color: #ff0000;">1465</span>          <span style="color: #ff0000;">3117</span>      <span style="color: #ff0000;">112.7645</span>
<span style="color: #ff0000;">389</span>           Coventry South                         PROFESSIONAL OCCUPATIONS            <span style="color: #ff0000;">90</span>           <span style="color: #ff0000;">193</span>      <span style="color: #ff0000;">114.4444</span>
<span style="color: #ff0000;">521</span>    Birmingham, Edgbaston            PROCESS, PLANT AND MACHINE OPERATIVES          <span style="color: #ff0000;">1245</span>          <span style="color: #ff0000;">2548</span>      <span style="color: #ff0000;">104.6586</span></pre></td></tr></table></div></div>

<p>We can spot a 200% increase in managerial positions in Stoke-on-Trent from 2011 to 2012! I&#8217;ll leave it to the professionals to explain those numbers.</p>
<p>I&#8217;m stopping here, but there&#8217;s obviously a lot more you can do with this data; it all depends on what question you want to ask. Again, this data is available for free, which is rather nice, and crossed with other datasets (like the geographical boundaries here) we can do quite a lot, after some preprocessing work, in a few lines of code. See for example what <a href="https://twitter.com/#!/AndyPryke">Andy Pryke</a> and <a href="https://twitter.com/#!/snim2">Sarah Mount</a> did with similar datasets on that day: <a href="http://www.youtube.com/watch?v=onskC-tPTH4">video</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/06/an-exercise-in-r-using-local-open-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>spam evolution</title>
		<link>http://chrisladroue.com/2012/04/spam-evolution/</link>
		<comments>http://chrisladroue.com/2012/04/spam-evolution/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 15:56:33 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=918</guid>
		<description><![CDATA[Despite some rather modest protection (like a simple captcha), I still receive spammy comments on this blog every now and again. They&#8217;re easily spotted and actually never appear on the website. There&#8217;s obviously an incentive for the spammer to post &#8230; <a href="http://chrisladroue.com/2012/04/spam-evolution/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><div style="width: 146px" class="wp-caption alignleft"><img alt="" src="http://farm1.staticflickr.com/143/400569507_d84d1d8250_o.gif" title="Spam, by celesteh@flickr" width="136" height="191" /><p class="wp-caption-text"><a href='http://www.flickr.com/photos/celesteh/400569507/'>Spam, by celesteh@flickr</a></p></div>Despite some rather modest protection (like a simple captcha), I still receive spammy comments on this blog every now and again. They&#8217;re easily spotted and actually never appear on the website.</p>
<p>There&#8217;s obviously an incentive for the spammer to post something as convincing as possible: either you&#8217;re taken in and think it&#8217;s a genuine comment, or it takes you so long to decide whether it&#8217;s genuine that you just give up. To that end, I&#8217;ve noticed a new generation of comments that simply copy text from somewhere on the web. Such text is more readable than a Markov-chain generated blurb and thus more taxing for the blogger to identify. It comes with a twist though: there&#8217;s usually one word that seems deliberately misspelt. Here is an example:</p>
<blockquote><p> Hi Louis apparently my honstig company have had a few issues today. As far as I can see, the images are there now. Have they returned for you as well? If not, I can try tweaking a few things and seeing what happens</p></blockquote>
<p>I wondered why the spelling mistake was introduced. My current, unsubstantiated guess is that it&#8217;s a way for the spammer to detect which comments have gone through, and so identify blogs that are weak on security.</p>
<p>Today I&#8217;ve started receiving an even more pernicious kind of spammy comment on my blog: the comments are genuine comments copied from R-related blogs and thus even more difficult to spot, since they seem, at least superficially, somewhat related to the post they&#8217;re posted under. Here is an example:</p>
<blockquote><p>Lattice and ggplot add a lot of value in that they <strong>pruocde</strong> objects with which you can do things. Also, the whole reason lattice (trellis) was created in the first place was to provide a powerful system that takes care of a lot of tedious things. For example, if you want a histogram conditional on some categorical variable, you’ve got it immediately. Just because it also works in the simple case presented above does not mean it is an equivalent alternative to hist(). I would say that having many options does not make R look like legacy at all. If you need something simple, use something simple (like hist()). If you need something more powerful and flexible, use that.</p></blockquote>
<p>It threw me at first because my original post was indeed about ggplot, but the comment was completely off-topic and I got suspicious. I found its origin on a <a href="http://www.theusrus.de/blog/babylonian-confusion-or-flexibility/">2009 blog post</a>. Notice that the spelling mistake does not appear in the original (?) comment.</p>
<p>I filed the comment as spam, slightly amused by the attempt, and what do you know? A few hours later, I received another spammy comment, which is exactly the reply to that comment in the original thread.</p>
<blockquote><p>to whom it may concern I was never in doubt, that havnig graphic objects and conditioning is an advantage (sorry, when I was unclear at this point) but as you already pointed out, there are already two packages which are mostly equivalent from an ordinary user’s perspective.My concern regards havnig many packages in parallel with very much overlap and little structured and coordinated progress.</p></blockquote>
<p>Again, with added misspelt words. This type of spam definitely requires more time to identify and I guess it&#8217;s achieving its purpose. I wonder how widespread this is. One unintended consequence of this might be fewer off-topic comments though!</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/04/spam-evolution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A graphical overview of your MySQL database</title>
		<link>http://chrisladroue.com/2012/03/a-graphical-overview-of-your-mysql-database/</link>
		<comments>http://chrisladroue.com/2012/03/a-graphical-overview-of-your-mysql-database/#comments</comments>
		<pubDate>Sun, 18 Mar 2012 21:48:07 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=875</guid>
		<description><![CDATA[If you use MySQL, there&#8217;s a default schema called &#8216;information_schema&#8216; which contains lots of information about your schemas and tables among other things. Recently I wanted to know whether a table I use for storing the results of a large &#8230; <a href="http://chrisladroue.com/2012/03/a-graphical-overview-of-your-mysql-database/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://chrisladroue.com/wp-content/uploads/2012/03/tablesUsageThumb.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/03/tablesUsageThumb.png" alt="" title="tablesUsageThumb" width="174" height="250" class="alignleft size-full wp-image-905" /></a>If you use MySQL, there&#8217;s a default schema called &#8216;<tt>information_schema</tt>&#8217; which contains lots of information about your schemas and tables among other things. Recently I wanted to know whether a table I use for storing the results of a large number of experiments was anywhere near maxing out. To cut a brief story even shorter, the answer was &#8220;<em>not even close</em>&#8221; and could be found in &#8216;<tt>information_schema.TABLES</tt>&#8217;. Not being one to avoid any opportunity to procrastinate, I went on to write a short script to produce a global overview of the entire database.</p>
<p><tt>information_schema.TABLES</tt> contains the following fields: TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS, DATA_LENGTH, AVG_ROW_LENGTH and MAX_DATA_LENGTH (and a few others). We can first have a look at the relative sizes of the schemas with the MySQL query &#8220;<tt>SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA</tt>&#8221;.</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;ggplot2&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># You'll need ggplot2 0.9 for this.</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;reshape2&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;RMySQL&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
connection<span style="color: #080;">&lt;-</span>dbConnect<span style="color: #080;">&#40;</span>MySQL<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>, user<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;username&quot;</span>, password<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;XXXXXX&quot;</span>,host<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;127.0.0.1&quot;</span>,port<span style="color: #080;">=</span><span style="color: #ff0000;">3306</span>,dbname<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span>
&nbsp;
  query<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">&quot;SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA&quot;</span>
  result<span style="color: #080;">&lt;-</span>dbGetQuery<span style="color: #080;">&#40;</span>connection,query<span style="color: #080;">&#41;</span>
  result$TABLE_SCHEMA<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">reorder</span><span style="color: #080;">&#40;</span>result$TABLE_SCHEMA,result$SCHEMA_LENGTH<span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>result<span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_bar<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>TABLE_SCHEMA,y<span style="color: #080;">=</span>SCHEMA_LENGTH<span style="color: #080;">&#41;</span>,stat<span style="color: #080;">=</span><span style="color: #ff0000;">'identity'</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>coord_flip<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>xlab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ylab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Size&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># labels follow the aesthetics, not the flipped axes</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>ggtitle<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Schemas' size&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># ggtitle() (ggplot2 0.9.2 or later) replaces the defunct opts(title=...)</span>
  <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><a href="http://chrisladroue.com/wp-content/uploads/2012/03/schemas.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/03/schemas-269x300.png" alt="" title="schemas" width="269" height="300" class="aligncenter size-medium wp-image-897" /></a></p>
<p>And for the whole overview, let&#8217;s break each schema down by tables:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code"><pre class="rsplus" style="font-family:monospace;">  query<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">&quot;SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'&quot;</span>
  result<span style="color: #080;">&lt;-</span>dbGetQuery<span style="color: #080;">&#40;</span>connection,query<span style="color: #080;">&#41;</span>
  result<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">within</span><span style="color: #080;">&#40;</span>result,TABLE_NAME<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span>TABLE_NAME,<span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sort</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">unique</span><span style="color: #080;">&#40;</span>TABLE_NAME<span style="color: #080;">&#41;</span>,decreasing<span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>result<span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_bar<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>TABLE_NAME,y<span style="color: #080;">=</span>DATA_LENGTH<span style="color: #080;">&#41;</span>,stat<span style="color: #080;">=</span><span style="color: #ff0000;">'identity'</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>coord_flip<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>facet_wrap<span style="color: #080;">&#40;</span>~TABLE_SCHEMA,scales<span style="color: #080;">=</span><span style="color: #ff0000;">'free'</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>xlab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ylab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Size&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># labels follow the aesthetics, not the flipped axes</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>ggtitle<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Tables' size&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># ggtitle() (ggplot2 0.9.2 or later) replaces the defunct opts(title=...)</span>
  <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><a href="http://chrisladroue.com/wp-content/uploads/2012/03/tables.png"><img class="aligncenter size-medium wp-image-896" title="tables" src="http://chrisladroue.com/wp-content/uploads/2012/03/tables-300x298.png" alt="" width="300" height="298" /></a><br />
Also, using the AVG_ROW_LENGTH and MAX_DATA_LENGTH and assuming a relatively constant row length, we can derive the maximum number of rows that a table can use, which gives us an estimate of how much space there is left:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
</pre></td><td class="code"><pre class="rsplus" style="font-family:monospace;">  query<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">&quot;SELECT TABLE_SCHEMA,TABLE_NAME,100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'&quot;</span>
<span style="color: #228B22;">#   query&lt;-&quot;SELECT TABLE_SCHEMA,TABLE_NAME,RAND(42)*100 AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'&quot;</span>
&nbsp;
  result<span style="color: #080;">&lt;-</span>dbGetQuery<span style="color: #080;">&#40;</span>connection,query<span style="color: #080;">&#41;</span>
  result$LEFTOVER<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">100</span><span style="color: #080;">-</span>result$USED
  result<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">within</span><span style="color: #080;">&#40;</span>result,TABLE_NAME<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span>TABLE_NAME,<span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sort</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">unique</span><span style="color: #080;">&#40;</span>TABLE_NAME<span style="color: #080;">&#41;</span>,decreasing<span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  result<span style="color: #080;">&lt;-</span>melt<span style="color: #080;">&#40;</span>result,id.<span style="">vars</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;TABLE_SCHEMA&quot;</span>,<span style="color: #ff0000;">&quot;TABLE_NAME&quot;</span><span style="color: #080;">&#41;</span>,variable.<span style="">name</span><span style="color: #080;">=</span><span style="color: #ff0000;">'TYPE'</span>,value.<span style="">name</span><span style="color: #080;">=</span><span style="color: #ff0000;">'PROPORTION'</span>,na.<span style="">rm</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>result<span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>geom_bar<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>TABLE_NAME,y<span style="color: #080;">=</span>PROPORTION,fill<span style="color: #080;">=</span>TYPE<span style="color: #080;">&#41;</span>,stat<span style="color: #080;">=</span><span style="color: #ff0000;">'identity'</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>coord_flip<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>facet_wrap<span style="color: #080;">&#40;</span>~TABLE_SCHEMA,scales<span style="color: #080;">=</span><span style="color: #ff0000;">'free'</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>scale_fill_manual<span style="color: #080;">&#40;</span>values<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;USED&quot;</span><span style="color: #080;">=</span><span style="color: #ff0000;">'#DD0000'</span>,LEFTOVER<span style="color: #080;">=</span><span style="color: #ff0000;">'#AAAAAA'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>xlab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ylab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ggtitle<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Tables' usage&quot;</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span>
&nbsp;
  query<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">&quot;SELECT TABLE_SCHEMA, MAX(100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA&quot;</span>
<span style="color: #228B22;">#   query&lt;-&quot;SELECT TABLE_SCHEMA, MAX(100*RAND(42)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA&quot;</span>
&nbsp;
  result<span style="color: #080;">&lt;-</span>dbGetQuery<span style="color: #080;">&#40;</span>connection,query<span style="color: #080;">&#41;</span>
  result$LEFTOVER<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">100</span><span style="color: #080;">-</span>result$USED
  result$TABLE_SCHEMA<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">reorder</span><span style="color: #080;">&#40;</span>result$TABLE_SCHEMA,result$USED<span style="color: #080;">&#41;</span>
  result<span style="color: #080;">&lt;-</span>melt<span style="color: #080;">&#40;</span>result,id.<span style="">vars</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;TABLE_SCHEMA&quot;</span><span style="color: #080;">&#41;</span>,variable.<span style="">name</span><span style="color: #080;">=</span><span style="color: #ff0000;">'TYPE'</span>,value.<span style="">name</span><span style="color: #080;">=</span><span style="color: #ff0000;">'PROPORTION'</span>,na.<span style="">rm</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>ggplot<span style="color: #080;">&#40;</span>result<span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>geom_bar<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>TABLE_SCHEMA,y<span style="color: #080;">=</span>PROPORTION,fill<span style="color: #080;">=</span>TYPE<span style="color: #080;">&#41;</span>,stat<span style="color: #080;">=</span><span style="color: #ff0000;">'identity'</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>coord_flip<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>scale_fill_manual<span style="color: #080;">&#40;</span>values<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;USED&quot;</span><span style="color: #080;">=</span><span style="color: #ff0000;">'#DD0000'</span>,LEFTOVER<span style="color: #080;">=</span><span style="color: #ff0000;">'#AAAAAA'</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  p<span style="color: #080;">&lt;-</span>p<span style="color: #080;">+</span>xlab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ylab<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>ggtitle<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Largest Usage&quot;</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span>
dbDisconnect<span style="color: #080;">&#40;</span>connection<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><a href="http://chrisladroue.com/wp-content/uploads/2012/03/schemasUsage.png"><img class="aligncenter size-medium wp-image-894" title="schemasUsage" src="http://chrisladroue.com/wp-content/uploads/2012/03/schemasUsage-300x298.png" alt="" width="300" height="298" /></a></p>
<p><a href="http://chrisladroue.com/wp-content/uploads/2012/03/tablesUsage.png"><img class="aligncenter size-medium wp-image-895" title="tablesUsage" src="http://chrisladroue.com/wp-content/uploads/2012/03/tablesUsage-300x274.png" alt="" width="300" height="274" /></a>Unless you are using very large tables, those last two graphs should come out pretty much all gray. You can check that the colouring works by using the commented out queries instead, which use random values for the estimates.</p>
<p>About <tt>dbConnect()</tt>: I left it here to make things easier to replicate but I normally call a simple function which is just a wrapper for it, with my username and password in. This way my credentials are in one single place instead of all over my scripts.</p>
<p>PS: This is my first anniveRsary! I&#8217;ve been using R for a year now. And I&#8217;m certainly planning to carry on.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/03/a-graphical-overview-of-your-mysql-database/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Parallelising plink (or anything else) the easy way</title>
		<link>http://chrisladroue.com/2012/03/parallelising-plink-or-anything-else-the-easy-way/</link>
		<comments>http://chrisladroue.com/2012/03/parallelising-plink-or-anything-else-the-easy-way/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 16:18:57 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[gwas]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[plink]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=851</guid>
		<description><![CDATA[plink is the swiss-army knife of genome association studies. Its impressive tool set can be seen here. I am currently running some experiments for which I need to compute associations between 30&#8217;000 SNPs and 130 assays. This calculation is only &#8230; <a href="http://chrisladroue.com/2012/03/parallelising-plink-or-anything-else-the-easy-way/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><img alt="" src="http://www.gnu.org/software/parallel/logo-gray+black300.png" title="GNU Parallel" class="alignright" width="300" height="191" /><a href="http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml">plink</a> is the Swiss-army knife of genome association studies. Its impressive tool set can be seen <a href="http://pngu.mgh.harvard.edu/~purcell/plink/reference.shtml">here</a>. I am currently running some experiments for which I need to compute associations between 30&#8217;000 SNPs and 130 assays. This calculation is only the first step of the experiments, which I want to run as many times as possible. So to save time, the most direct approach is to try and parallelise the whole process.</p>
<p>Enter <a href="http://www.gnu.org/software/parallel/">GNU parallel</a>, an amazing Unix command which makes parallelisation a piece of cake. The best way to learn it is to go through <a href="http://www.gnu.org/software/parallel/man.html">the numerous examples</a>. As you can see, it&#8217;s used pretty much like a normal <TT>xargs</TT> (see this recent post about <a href="http://gettinggeneticsdone.blogspot.com/2012/03/find-xargs-like-boss.html">xargs</a> on Getting Genetics Done).</p>
<p>To compute the associations, I created a <TT>.phen</TT> file which contains all the assays, as well as each subject&#8217;s family and ID. This is just a long tab-separated text file. Its header starts with <TT>FID    IID    nameAssay1    nameAssay2 etc.</TT>.</p>
<p>A normal use of <TT>plink</TT> would look like this:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="bash" style="font-family:monospace;">plink <span style="color: #660033;">--manyOptions</span> <span style="color: #660033;">--pheno</span> allAssays.phen <span style="color: #660033;">--all-pheno</span> <span style="color: #660033;">--linear</span> <span style="color: #660033;">--out</span> analyses</pre></td></tr></table></div></div>

<p>This will calculate a linear regression for each (SNP, assay) pair and store the results in files whose names start with <TT>analyses</TT>. Time it took in my case: about 80 minutes.</p>
<p>Using GNU <TT>parallel</TT> however, the change is minimal. I just need:</p>
<ul>
<li>to parse the header in order to extract all the assays&#8217; names.
</li>
<li>to tell plink which phenotype I want to process. This is done with <TT>--pheno-name</TT>.</li>
</ul>
<p>The first bit is done with a simple combination of usual unix tools:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">head</span> <span style="color: #660033;">-n1</span> allAssays.phen <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">cut</span> <span style="color: #660033;">-f</span> <span style="color: #000000;">3</span>- <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #ff0000;">'s/\t/\n/g'</span></pre></td></tr></table></div></div>

<p>This will produce the list of assays.</p>
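<p>To check the pipeline on a toy example, here is a minimal sketch with a made-up header (the names <TT>assayA</TT> and <TT>assayB</TT>, and the helper <TT>list_assays</TT>, are invented for illustration). It uses <TT>tr</TT> rather than <TT>sed</TT>, on the assumption that portability matters: <TT>\t</TT> and <TT>\n</TT> in a <TT>sed</TT> expression are GNU extensions that other implementations may not honour.</p>

<pre class="bash" style="font-family:monospace;"># Hypothetical helper: read a .phen file on stdin and print one assay name per line.
list_assays() {
  # Keep the header line, drop the FID and IID columns, turn tabs into newlines.
  head -n1 | cut -f 3- | tr '\t' '\n'
}

# Toy input: a header with two assays followed by one subject.
printf 'FID\tIID\tassayA\tassayB\n1\t1\t0.5\t1.2\n' | list_assays</pre>

<p>This prints <TT>assayA</TT> and <TT>assayB</TT> on separate lines, which is the form <TT>parallel</TT> expects on its standard input.</p>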
<p>Now combine this with <TT>--pheno-name</TT> and <TT>parallel</TT>:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">head</span> <span style="color: #660033;">-n1</span> allAssays.phen <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">cut</span> <span style="color: #660033;">-f</span> <span style="color: #000000;">3</span>- <span style="color: #000000; font-weight: bold;">|</span><span style="color: #c20cb9; font-weight: bold;">sed</span> <span style="color: #ff0000;">'s/\t/\n/g'</span><span style="color: #000000; font-weight: bold;">|</span>parallel plink <span style="color: #660033;">--manyOptions</span> <span style="color: #660033;">--pheno</span> allAssays.phen <span style="color: #660033;">--pheno-name</span> <span style="color: #7a0874; font-weight: bold;">&#123;</span><span style="color: #7a0874; font-weight: bold;">&#125;</span> <span style="color: #660033;">--linear</span> <span style="color: #660033;">--out</span> analyses<span style="color: #000000; font-weight: bold;">/</span>experimentID.<span style="color: #7a0874; font-weight: bold;">&#123;</span><span style="color: #7a0874; font-weight: bold;">&#125;</span></pre></td></tr></table></div></div>

<p>And this is it! I&#8217;ve just piped the list of assays to <TT>parallel plink</TT>. This now runs <TT>#cores</TT> copies of plink, each processing one phenotype. Each instance of <TT>{}</TT> is replaced by what is piped in, in this case, the name of a phenotype. You really can&#8217;t make it easier. How satisfying it is to do an <TT>htop</TT> and watch all processors being used!<br />
The whole thing is now done in 10-15 minutes, with very little extra effort to make it work.</p>
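<p>Before launching the real run, it can help to preview what will be executed. A minimal sketch, assuming GNU <TT>parallel</TT> is the <TT>parallel</TT> on your path: <TT>--dry-run</TT> prints each command instead of running it, and <TT>-j</TT> caps the number of simultaneous jobs. The assay names and the value 4 are arbitrary, and <TT>--manyOptions</TT> is the same placeholder as above.</p>

<pre class="bash" style="font-family:monospace;"># Print (without executing) the plink command generated for each assay name.
# Once --dry-run is removed, at most 4 jobs run at a time.
printf 'assayA\nassayB\n' | parallel --dry-run -j 4 plink --manyOptions --pheno allAssays.phen --pheno-name {} --linear --out analyses/experimentID.{}</pre>

<p>Each printed line should be a complete <TT>plink</TT> call with <TT>{}</TT> replaced by one assay name. Since <TT>--out</TT> points inside <TT>analyses/</TT>, it is safest to create that directory before the real run.</p>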
<p><strong>Installation</strong><br />
The official website provides the <a href="http://www.gnu.org/software/parallel/">sources</a> and some binaries for it. If you use Ubuntu, there&#8217;s a PPA available <a href="https://launchpad.net/~ieltonf/+archive/ppa">here</a> and it&#8217;s straightforward to install. Note that there&#8217;s an Ubuntu package called &#8216;moreutils&#8217; which contains a parallel command, but it&#8217;s different from GNU parallel.</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/03/parallelising-plink-or-anything-else-the-easy-way/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>polar histogram: pretty and useful</title>
		<link>http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/</link>
		<comments>http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 16:41:39 +0000</pubDate>
		<dc:creator><![CDATA[CL]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[histogram]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://chrisladroue.com/?p=803</guid>
		<description><![CDATA[Do you have tens of histograms to show but no room to put them all on the page? As I was reading this paper in Nature Genetics, I came across a simple and clever way of packing all this information &#8230; <a href="http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><a href="http://chrisladroue.com/wp-content/uploads/2012/02/g11948.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramIcon.png" alt="" title="polarHistogramIcon" width="150" height="200" class="alignleft size-full wp-image-804" /></a>Do you have tens of histograms to show but no room to put them all on the page? As I was reading <a href="http://dx.doi.org/10.1038/ng.1073">this paper</a> in Nature Genetics, I came across a simple and clever way of packing all this information in a small space: arrange them all around a circle, and add some guides to help their cross-comparison.</p>
<p>It didn&#8217;t look too difficult to implement in <TT>ggplot2</TT> thanks to polar coordinates and after a busy Saturday afternoon I ended up with the following image with my data (*) (and a <a href="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged.pdf">poster-ready pdf</a>, after 2 seconds of prettying up with <a href="http://inkscape.org/">Inkscape</a>):</p>
<p><a href="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged.png"><img src="http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogramFudged-300x266.png" alt="" title="polarHistogramFudged" width="300" height="266" class="aligncenter size-medium wp-image-806" /></a></p>
<p>The graph shows the proportion of some SNP scores (&#8216;first&#8217;, &#8216;second&#8217; and &#8216;third&#8217;) for a number of phenotypes, which are grouped by themes. I&#8217;m quite happy with the result. It&#8217;s pretty and useful: it&#8217;s very easy to compare one histogram with any of the other 60. </p>
<p>The code is still a bit rough around the edges; a few things are not terribly elegant or are hard-coded. An improved version will be shipped with our graphical package next month. In the mean-time, <a href='http://chrisladroue.com/wp-content/uploads/2012/02/polarHistogram.R.zip'>here it is</a>, if you want to try it with your own data. </p>
<p>It returns a <TT>ggplot</TT> object containing the graph. You can display it with <TT>print()</TT>, save it as a pdf with <TT>ggsave(&quot;myPlot.pdf&quot;)</TT>, or modify it with the usual <TT>ggplot2</TT> commands. I&#8217;ve called it a polar histogram, which, I think, is self-explanatory. If you know what it&#8217;s actually called, please let me know. <small>(No, I will not call it polR histogram.)</small></p>
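<p>As a rough sketch of those three options, here is the same workflow on a stand-in plot. The <TT>mtcars</TT> scatter plot, the output path and the title are only placeholders for illustration; any <TT>ggplot</TT> object, including the one returned by <TT>polarHistogram()</TT>, should behave the same way.</p>

<pre class="rsplus" style="font-family:monospace;">library(ggplot2)

# Stand-in plot (an assumption for this sketch): any ggplot object
# can be handled like the one returned by polarHistogram().
p &lt;- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# 1. display it
print(p)

# 2. save it as a pdf (written to a temporary folder here)
outFile &lt;- file.path(tempdir(), "myPlot.pdf")
ggsave(outFile, plot = p, width = 7, height = 7)

# 3. modify it with the usual ggplot2 commands
p2 &lt;- p + ggtitle("A placeholder title")
</pre>
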
<p>And here is some fake data to get you going:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># fake data for polarHistogram()</span>
<span style="color: #228B22;"># Christophe Ladroue</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
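<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>reshape<span style="color: #080;">&#41;</span> <span style="color: #228B22;"># provides melt() with the variable_name argument used below</span>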
<span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;polarHistogram.R&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># a little helper that generates random names for families and items.</span>
randomName<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>n<span style="color: #080;">=</span><span style="color: #ff0000;">1</span>,syllables<span style="color: #080;">=</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  vowels<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;a&quot;</span>,<span style="color: #ff0000;">&quot;e&quot;</span>,<span style="color: #ff0000;">&quot;i&quot;</span>,<span style="color: #ff0000;">&quot;o&quot;</span>,<span style="color: #ff0000;">&quot;u&quot;</span>,<span style="color: #ff0000;">&quot;y&quot;</span><span style="color: #080;">&#41;</span>
  consonants<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">setdiff</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">letters</span>,vowels<span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">replicate</span><span style="color: #080;">&#40;</span>n,
            <span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span>
              <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>consonants,syllables,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>,
                    <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span>vowels,syllables,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
              sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span>,collapse<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span>
            <span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">42</span><span style="color: #080;">&#41;</span>
&nbsp;
nFamily<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">20</span>
nItemPerFamily<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">6</span>,nFamily,<span style="color: #0000FF; font-weight: bold;">replace</span><span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
nValues<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">3</span>
&nbsp;
df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>
  <span style="color: #0000FF; font-weight: bold;">family</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>randomName<span style="color: #080;">&#40;</span>nFamily<span style="color: #080;">&#41;</span>,nItemPerFamily<span style="color: #080;">&#41;</span>,
  item<span style="color: #080;">=</span>randomName<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>nItemPerFamily<span style="color: #080;">&#41;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,<span style="color: #0000FF; font-weight: bold;">as.<span style="">data</span>.<span style="">frame</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">matrix</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">*</span>nValues<span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">ncol</span><span style="color: #080;">=</span>nValues<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
df<span style="color: #080;">&lt;-</span>melt<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,<span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;family&quot;</span>,<span style="color: #ff0000;">&quot;item&quot;</span><span style="color: #080;">&#41;</span>,variable_name<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;score&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># from wide to long</span>
p<span style="color: #080;">&lt;-</span>polarHistogram<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span>,familyLabel<span style="color: #080;">=</span>FALSE<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">print</span><span style="color: #080;">&#40;</span>p<span style="color: #080;">&#41;</span></pre></td></tr></table></div></div>

<p><strong>Options:</strong><br />
Many defaults can already be changed; look at the code for the complete list. The two you are most likely to change are <TT>familyLabels</TT> (logical), which controls whether the name of each group is displayed as well, and <TT>direction</TT>, which is either &#8216;inwards&#8217; or &#8216;outwards&#8217;. </p>
<p><strong>Coding notes:</strong><br />
It wasn&#8217;t terribly difficult but it did take me a bit longer than expected, for a few reasons:
<ol>
<LI><TT>coord_polar()</TT> doesn&#8217;t affect the orientation of <TT>geom_text()</TT>, so it had to be calculated manually.</LI>
<LI>You&#8217;ll notice that the label orientation flips between 6 and 9 o&#8217;clock; otherwise the labels would end up upside down and be difficult to read.</LI>
<p><LI>There are some scoping issues with <a href="https://github.com/hadley/plyr/issues/3"><TT>plyr</TT></a> and <TT>ggplot2</TT> which can be a bit annoying once you encapsulate your code in a function. For example:</p>

<div class="my_syntax_box"><span class="my_syntax_selecall"><a href="javascript:;" onclick="selectCode(this); return false;">Select All</a> </span><span class="my_syntax_Bar">Code:</span><div class="my_syntax"><table><tr><td class="code"><pre class="rsplus" style="font-family:monospace;">df<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>
  x<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span>,
  y<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
z<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">10</span>
ggplot<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>x<span style="color: #080;">+</span>z,y<span style="color: #080;">=</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># works</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">rm</span><span style="color: #080;">&#40;</span>z<span style="color: #080;">&#41;</span>
fakeFunction<span style="color: #080;">&lt;-</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#123;</span>
  z<span style="color: #080;">&lt;-</span><span style="color: #ff0000;">10</span>
  ggplot<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>geom_point<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>x<span style="color: #080;">+</span>z,y<span style="color: #080;">=</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  <span style="color: #080;">&#125;</span>
&nbsp;
fakeFunction<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">df</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># error</span></pre></td></tr></table></div></div>

<p></LI>
</ol>
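<p>The label rotation from the first two notes can be sketched roughly as follows. This is not the code from <TT>polarHistogram.R</TT>: <TT>radialLabels()</TT> is a made-up helper, and the sketch assumes a full circle that starts at 12 o&#8217;clock and runs clockwise (the <TT>coord_polar()</TT> defaults), with the labels flipped on the whole left half of the circle.</p>

<pre class="rsplus" style="font-family:monospace;">library(ggplot2)

# Hypothetical helper (not part of polarHistogram.R): text angle and
# justification for labels at positions x on a circle spanning [0, n].
radialLabels &lt;- function(x, n) {
  theta &lt;- 360 * x / n      # degrees clockwise from 12 o'clock
  flip  &lt;- theta &gt; 180      # left half: text would be upside down
  data.frame(
    # geom_text() angles are counter-clockwise, hence 90 - theta;
    # flipped labels are turned by a further 180 degrees
    angle = ifelse(flip, 270 - theta, 90 - theta),
    # flipped labels are right-aligned so they still point outwards
    hjust = ifelse(flip, 1, 0)
  )
}

# Twelve labels spread around the circle
n &lt;- 12
labels &lt;- data.frame(x = seq(0.5, n - 0.5), name = month.abb)
labels &lt;- cbind(labels, radialLabels(labels$x, n))

p &lt;- ggplot(labels, aes(x = x, y = 1, label = name)) +
  geom_text(aes(angle = angle, hjust = hjust)) +
  scale_x_continuous(limits = c(0, n)) +
  scale_y_continuous(limits = c(0, 1.5)) +
  coord_polar()
print(p)
</pre>

<p>Right-aligning the flipped labels keeps every label growing away from the centre, whichever side of the circle it sits on.</p>
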
<p>Happy plotting!</p>
<p><small>(*) The numbers are fudged, don&#8217;t spend time reverse-engineering them.</small></p>
<p><b>Update (24/03/2012):</b><br />
Christos Hatzis has modified my original code to plot a collection of un-normalised bar charts, like <a href='http://chrisladroue.com/wp-content/uploads/2012/02/Ex1.pdf'>this</a>.</p>
<p>He&#8217;s happy to share his code here: <a href='http://chrisladroue.com/wp-content/uploads/2012/02/PolarBarchart.zip'>PolarBarchart.zip</a>, together with a test file.</p>
<p><b>Update (02/06/2012):</b><br />
You can find a better version in my R package &#8216;<a href="http://chrisladroue.com/phorest/">phorest</a>&#8217;.</p>
<p><b>Update (24/04/2015):</b><br />
I&#8217;ve finally updated the code for the new version of ggplot2. It&#8217;s here:<br />
&#8216;<a href="https://github.com/chrislad/phenotypicForest/">https://github.com/chrislad/phenotypicForest/</a>&#8217;.<br />
Untested and provided as is. Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://chrisladroue.com/2012/02/polar-histogram-pretty-and-useful/feed/</wfw:commentRss>
		<slash:comments>39</slash:comments>
		</item>
	</channel>
</rss>

