<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Psychwire</title>
	
	<link>http://www.psychwire.co.uk</link>
	<description>Hayward Godwin's website</description>
	<lastBuildDate>Thu, 16 May 2013 12:39:29 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/psychwire_blog" /><feedburner:info uri="psychwire_blog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:browserFriendly></feedburner:browserFriendly><item>
		<title>Blog has Moved!!</title>
		<link>http://www.psychwire.co.uk/2011/05/blog-has-moved/</link>
		<comments>http://www.psychwire.co.uk/2011/05/blog-has-moved/#comments</comments>
		<pubDate>Wed, 11 May 2011 19:33:14 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=822</guid>
		<description><![CDATA[My site will remain here but my blog has migrated to the address below: http://psychwire.wordpress.com/ Thanks!]]></description>
				<content:encoded><![CDATA[<p>My site will remain here but my blog has migrated to the address below:</p>
<p><a href="http://psychwire.wordpress.com/">http://psychwire.wordpress.com/</a></p>
<p>Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/05/blog-has-moved/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Charting the Defeat of AV using R (and some ggplot2 and merge operations on top)</title>
		<link>http://www.psychwire.co.uk/2011/05/charting-the-defeat-of-av-using-r-and-some-ggplot2-and-merge-operations-on-top/</link>
		<comments>http://www.psychwire.co.uk/2011/05/charting-the-defeat-of-av-using-r-and-some-ggplot2-and-merge-operations-on-top/#comments</comments>
		<pubDate>Sun, 08 May 2011 14:08:28 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scatterplot]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=778</guid>
		<description><![CDATA[In this post, I&#8217;ll be graphing some results from a recent referendum held here in the UK and combining it with the results of a set of local elections that were held at the same time. I&#8217;ll give some examples of graphing stuff using ggplot2 and will also show some info regarding merging datasets. At [...]]]></description>
				<content:encoded><![CDATA[<p>In this post, I&#8217;ll be graphing some results from a recent referendum held here in the UK and combining it with the results of a set of local elections that were held at the same time. I&#8217;ll give some examples of graphing stuff using ggplot2 and will also show some info regarding merging datasets.</p>
<p><span style="text-decoration: underline;"><strong>At the outset, I want to point out that this isn&#8217;t intended to be a &#8216;using stats to be political&#8217; post. I just like playing around with data. Don&#8217;t for a second assume that I&#8217;m trying to say anything meaningful here. It&#8217;s for entertainment purposes only.</strong></span></p>
<p><span style="font-size: large;">The Vote, and the Alternative Vote</span></p>
<p>We have a coalition government in the UK, between the Conservatives and the Liberal Democrats. One thing the Lib Dems have pushed for, and in a rare instance of getting their own way have actually achieved, is having a referendum on changing the voting system here. They wanted to institute Alternative Voting. The vote was cast last week. AV was crushed.</p>
<p>At the same time, votes were cast for the local councils. The people who voted, myself included, were handed two exciting bits of paper to scribble on at the polling station. The Lib Dems lost the most ground in the past 30 years.</p>
<p><span style="font-size: large;">Charting the defeat of AV using R</span></p>
<p>So I saw on the Guardian website that they were offering a spreadsheet of the AV results broken down by different areas in the UK. I played around with it a bit, and then thought that it might be interesting to compare the AV results to the local council election results. Yes, there&#8217;s a load of correlation not implying causation from that idea, as people who voted in the AV referendum may not have necessarily voted in the local council elections, and furthermore, people who did vote in both may not have voted consistently with the party that they were supporting. In other words, many people may have voted Lib Dem, the party which favours AV the most, and then voted against AV. Still, cross-comparing the results from the referendum and the local elections should, at an overall level, give some basic indication of the feeling and political vibe in different areas. Again, remember, this is all for fun. I&#8217;m more than happy to admit that I&#8217;m not an expert (or even a novice) in political science, if that&#8217;s what you even call this whole &#8220;running stats on votes&#8221; thing that I&#8217;m doing here.</p>
<p>I took the spreadsheet regarding AV from <a href="http://www.guardian.co.uk/news/datablog/2011/may/06/av-referendum-results-map">the Guardian</a> and then headed off in search of a similar spreadsheet containing local election results. The closest I could find was on the <a href="http://www.telegraph.co.uk/news/politics/local-elections/8497868/Local-elections-2011-council-by-council-results.html">Telegraph website</a>. I think this one only covers England. Most sites give a breakdown of the local election results in a format that isn&#8217;t easy to put into a spreadsheet (i.e., I&#8217;d have to sit here for hours cross-tabulating the ones that are missing), so I&#8217;m going with what I can get. I was surprised to find that our dear old government doesn&#8217;t retain a centralised copy of the results and put them on a website.</p>
<p>With two datasets in hand, one called <strong>av</strong> and one called <strong>les</strong> (<em>Local council ElectionS</em>), I was ready to start. I ran a merge of the two to get started:</p>
<pre>combined_base = merge(av, les)</pre>
<p>In both datasets, there is a column called <strong>name</strong> which is used to match everything up. By default, <strong>merge</strong> keeps only the rows whose <strong>name</strong> appears in both datasets. As my AV dataset contains more rows than the local council elections dataset, I end up with only those areas in the AV dataset that also appeared in the local council elections dataset. This gave me 279 rows.</p>
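<p>To make the matching behaviour concrete, here is a minimal sketch using two tiny made-up data frames (<strong>av_demo</strong> and <strong>les_demo</strong> are hypothetical stand-ins, not the real spreadsheets). It assumes that <strong>name</strong> is the shared column. Spelling out <strong>by</strong> guards against any other column names the two datasets might happen to share, and the default of keeping only rows present in both is what drops the unmatched areas:</p>
<pre># Hypothetical toy data standing in for the real spreadsheets
av_demo = data.frame(name=c("A", "B", "C"), yes_perc=c(30, 35, 40))
les_demo = data.frame(name=c("B", "C", "D"), winner=c("C", "Lab", "LD"))

# Inner join on name: only areas present in both datasets survive
both = merge(av_demo, les_demo, by="name")
nrow(both)

# all.x=TRUE would instead keep every AV area, filling the gaps with NA
all_av = merge(av_demo, les_demo, by="name", all.x=TRUE)
nrow(all_av)</pre>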
<p>Next up: select only the local councils where the Conservatives, Labour or Lib Dems gained overall council control (indicated by the <strong>winner </strong>column). Then create a new column called <strong>win_label </strong>which is a textual version of the shortened names (these are C, Lab and LD) listed in <strong>winner</strong>.</p>
<pre>combined = combined_base[combined_base$winner=="C" | combined_base$winner=="Lab" | combined_base$winner=="LD",]</pre>
<pre>combined$win_label[combined$winner=="C"] = "Conservative"</pre>
<pre>combined$win_label[combined$winner=="Lab"] = "Labour"</pre>
<pre>combined$win_label[combined$winner=="LD"] = "Liberal Democrat"</pre>
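<p>As an aside, the same relabelling can be sketched with a named lookup vector, which avoids one assignment line per party. This is just an alternative under the same assumptions (a <strong>winner</strong> column holding C, Lab and LD); <strong>party_names</strong> and <strong>demo</strong> are made-up names for illustration:</p>
<pre># Hypothetical toy data with the same winner codes
demo = data.frame(winner=c("C", "LD", "Lab", "C"))

# Named vector: names are the short codes, values are the full labels
party_names = c(C="Conservative", Lab="Labour", LD="Liberal Democrat")

# Index the lookup by code; as.character guards against factor columns
demo$win_label = unname(party_names[as.character(demo$winner)])
demo</pre>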
<p>Next we can do a histogram of the number of councils where each party were victorious, compared to the proportion of the electorate in those councils who voted YES to AV:</p>
<pre>library(ggplot2)</pre>
<pre>ggplot(combined)+</pre>
<pre>aes(x=yes_perc)+</pre>
<pre>geom_histogram()+</pre>
<pre>scale_x_continuous("Percentage of YES to AV votes")+</pre>
<pre>scale_y_continuous("Number of Local Councils")+</pre>
<pre>facet_wrap(facet=~win_label)</pre>
<p>The code gives us the following:</p>
<p><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/av_hists.png"><img class="size-large wp-image-781 alignnone" title="av_hists" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/av_hists-927x1024.png" alt="" width="600" height="662" /></a></p>
<p>From the histograms, the defeat of the Lib Dems in the local elections is very clear. They hardly won anything.</p>
<p>Ok, so let&#8217;s take a look at it from a different angle. We have information available in the datasets regarding the percentage of people who voted in each area. Here&#8217;s the R code:</p>
<pre>ggplot(combined)+</pre>
<pre>aes(x=yes_perc, y=turnout_perc, colour=win_label)+</pre>
<pre>geom_point(size=4)+</pre>
<pre>scale_colour_manual(values = c("blue","red", "orange")) +</pre>
<pre>scale_x_continuous("Percentage of YES to AV votes")+</pre>
<pre>scale_y_continuous("Percentage of Electorate who Voted")</pre>
<p>Note the use of <strong>scale_colour_manual </strong>there to set each of the parties to their respective colours. I also resized the points within the <strong>geom_point</strong> command because the Lib Dem orange points were hard to see with the smaller default size.</p>
<p><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/av_scat.png"><img class="size-large wp-image-785 alignnone" title="av_scat" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/av_scat-927x1024.png" alt="" width="600" height="662" /></a></p>
<p>Aside from the one rare instance where there was a high YES to AV vote and also a Lib Dem council being voted in (i.e., what would be expected), it seems there is a strong clustering towards a low proportion of YES votes.</p>
<p>One other point about this graph stands out: the councils where Labour (red) were voted in tend to fall in areas where less of the electorate voted. When 45% or more voted, the Conservatives dominated, except for three Lib Dem wins.</p>
<p><span style="font-size: large;">Summary Stats</span></p>
<p>Finally, let&#8217;s look at some descriptive stats as a summary. Here&#8217;s the code.</p>
<pre>library(plyr)</pre>
<pre>ddply(combined, c("Council"), summarize,</pre>
<pre>"Yes Percentage (mean)"=mean(yes_perc),</pre>
<pre>"Turnout Percentage (mean)"=mean(turnout_perc))</pre>
<p>And here&#8217;s the table:</p>
<p><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/tabl.png"><img class="alignnone size-full wp-image-788" title="tabl" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/tabl.png" alt="" width="570" height="120" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/05/charting-the-defeat-of-av-using-r-and-some-ggplot2-and-merge-operations-on-top/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Is R an ideal language to teach the fundamentals of programming to researchers with no experience in programming?</title>
		<link>http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/</link>
		<comments>http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/#comments</comments>
		<pubDate>Fri, 06 May 2011 11:00:00 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=760</guid>
		<description><![CDATA[UPDATE: I&#8217;ve modified the title of this post a bit to clarify what I was really thinking when I wrote it. What I was really thinking was which programming language to choose to teach some fellow researchers how to get into the absolute basics of programming, out of the very limited set of languages I [...]]]></description>
				<content:encoded><![CDATA[<p>UPDATE: I&#8217;ve modified the title of this post a bit to clarify what I was really thinking when I wrote it. What I was really thinking was which programming language to choose to teach some fellow researchers how to get into the absolute basics of programming, out of the very limited set of languages I know. The tasks they need to do need only a minimal understanding of programming, and of R, so many of the issues that can be experienced won&#8217;t even come up for them. To put things into context, it only took two days for me to work out how to do everything that I need to do in R going from scratch, so it&#8217;s not as if I&#8217;m writing packages or doing anything particularly fancy myself, and these people who I will be teaching will be doing stuff that is less complicated than what I needed to do.</p>
<p>That being said, I&#8217;d like to thank those who commented for pointing out why R isn&#8217;t a great language for people starting out with programming. I&#8217;m still new to using R, so obviously don&#8217;t have the depth of experience with potential problems that others do, so it&#8217;s helpful to learn from others&#8217; experience (or should that be &#8220;misery&#8221;?)! Python, Pascal and Ruby all sound like great options for getting into programming. I&#8217;m going to leave my initial post, with all its inaccuracies, intact below: first, because I think it&#8217;s good to have a record of what I have said so I can look back at how daft I was in the future, and second because, as people took the time to post comments, I don&#8217;t want the time they spent making comments and correcting me to have gone to waste. If I deleted most of what I said or removed the post, then their comments would seem odd or incorrect.</p>
<p>&#8212;</p>
<p><del><em>I&#8217;m helping out some colleagues learn programming from having zero experience with it in any shape or form. It&#8217;s quite a daunting task in some senses, because, well, it may not be easy! They are researchers, so they&#8217;ll need it for processing data and generating output, and perhaps processing BIG DATA at some point too.</em></del></p>
<p><del><em>After some debate about the best way to go ahead, I&#8217;ve settled with R as being my weapon of choice to train these lucky individuals. The choices were as follows &#8211; note that I don&#8217;t know that many programming languages, so it&#8217;s not a huge list. I thought it would be worth sharing the pros and cons of each.</em></del></p>
<p><del><em><span style="font-size: large;"><a href="http://www.php.net/">PHP</a></span></em></del></p>
<p><del><em><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/php-med-trans-light.gif"><img class="size-full wp-image-766 alignright" title="php-med-trans-light" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/php-med-trans-light.gif" alt="" width="95" height="51" /></a>Pros: Dead easy to use. Nice and easy integration with databases which can be used to deal with data processing. Can be extended to, for example, generate images (a plus for these people who study visual cognition, so often need to make pretty pictures to show to participants in experiments). There&#8217;s also an immense number of tutorials and guides on the net, and people who aren&#8217;t into research can help you out just by knowing their PHP.</em></del></p>
<p>&nbsp;</p>
<p><del><em>Cons: Probably overkill. Running a webserver all the time can be a pain, even if XAMPP is used. It&#8217;s not easy (or even possible, as far as I am aware) to run statistical tests using PHP or any classes that can be added in.</em></del></p>
<p><span style="color: #c0c0c0;"><del><em><span style="font-size: large;"><a href="http://www.python.org/">Python</a></span></em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em><span style="font-size: large;"><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/python-logo.png"><img class="size-full wp-image-763 alignright" title="python-logo" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/python-logo.png" alt="" width="211" height="71" /></a></span></em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em>Pros: Forces users to write clean code, and again it&#8217;s very easy to use. Possible to integrate with databases to churn through datasets. Like PHP, it can be used to generate images for use in experiments (pygame), and again there are plenty of examples and tutorials. Plenty of extensions to do stats and plot graphs (NumPy and Matplotlib). Oh, and it&#8217;s named after Monty Python. Ni.</em></del></span></p>
<p>&nbsp;</p>
<p><span style="color: #c0c0c0;"><del><em>Cons: again, probably overkill. Forcing people to worry about indentation can get horribly confusing when they are barely aware of what they are doing, and they can get tripped up. Just a personal issue I guess, but I&#8217;ve not quite managed to get to grips with OOP in python. Maybe that&#8217;s because I did it first in PHP and never could do more than crash my computer when trying to learn Java. Ho hum.</em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em><span style="font-size: large;">Javascript</span></em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em>Pros: Easy syntax, and its power is growing with the new HTML 5 specifications. I mention it because I recently saw <a href="http://lifehacker.com/5736011/learn-how-to-code-part-i-variables-and-basic-data-types">this illustration</a> of basic programming and it seemed worth considering. There&#8217;s no need to compile anything which is often good for beginners too.</em></del></span></p>
<p>&nbsp;</p>
<p><span style="color: #c0c0c0;"><del><em>Cons: not really intended for churning big datasets and the kind of things I have in mind. Quite a bit of the decent libraries out there need to be paid for to be used.</em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em><span style="font-size: large;"><a href="http://www.r-project.org/">R</a></span></em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/Rlogo.jpg"><img class="size-full wp-image-767 alignright" title="Rlogo" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/Rlogo.jpg" alt="" width="100" height="76" /></a>Pros: syntax is very simple, with few gotchas present in other languages (e.g., ending lines with a semicolon or forcing tabs in lines and so on). As it&#8217;s loosely typed, this can be both a blessing and a curse. It&#8217;s a blessing because users don&#8217;t have to worry about declaring variables. It&#8217;s a curse because they can slip into bad habits and not understand variable types properly. Oh, and I don&#8217;t need to say that it can work on all sorts of databases, churn through data very rapidly, generate images, run statistical tests and plot graphs that are of publication quality.</em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em>Cons: Had to really think about this, but I guess that R is a nightmare to google for any kind of help when you&#8217;re stuck. I think it&#8217;s a fundamental issue relating to the fact that calling something a letter of the alphabet probably doesn&#8217;t help SEO rankings all that much. The official documentation would benefit from being a bit more like the <a href="http://uk2.php.net/manual/en/function.mysql-result.php">PHP documentation</a> (though maybe there is a site like that for R, I&#8217;ve just not found it), with users able to comment and give better examples than those provided initially. That being said, there are more blogs on R than you can shake even a very large proverbial stick at, which more than make up for it. I always search the legendary <a href="http://www.r-bloggers.com/">R-bloggers.com </a>search box before googling anything to do with R now. I&#8217;ve never had to look any further than that.</em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em><span style="font-size: large;">Is R an ideal language to teach the fundamentals of programming to beginners?</span></em></del></span></p>
<p><span style="color: #c0c0c0;"><del><em>I think the answer is &#8220;yes&#8221;. The beginners I have in mind are researchers and have specific needs regarding data processing, and it would benefit them to learn how to run stats in R, opening up future possibilities as well (e.g., LMEs). I&#8217;ve not mentioned Matlab, which I know is a favourite for researchers, because (1) it&#8217;s a gigantic monster to download and install, (2) I don&#8217;t know it that well and (3) it&#8217;s prohibitively expensive. I was also tempted to evaluate the use of <a href="http://en.wikipedia.org/wiki/LOLCODE">LOLCODE</a> to see if there was any mileage in using it (&#8220;IM IN YR LOOP UPPIN YR VAR TIL BOTH SAEM VAR&#8221;).</em></del></span></p>
<p>&nbsp;</p>
<p><span style="color: #c0c0c0;"><del><em>I myself first dabbled in programming back when I had a <a href="http://en.wikipedia.org/wiki/File:ZX_Spectrum_Plus2.jpeg">Sinclair</a> back in the old days, and we did some very basic <a href="http://en.wikipedia.org/wiki/BASIC">BASIC</a> at primary school. Later on, I used BASIC to make emulators that mimicked my friends&#8217; phrases and behaviour. Some of them were spot on! I guess I&#8217;ve always been trying to model human behaviour. I&#8217;ll post up the material I use to teach my colleagues to help them out and have a permanent copy of the material we go through.</em></del></span></p>
<p>&nbsp;</p>
<p><span style="color: #c0c0c0;"><del><em>That&#8217;s it for now, please feel free to share any other languages you may have found to be good for beginners. I&#8217;m sure there are some things that I have missed.</em></del></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/05/is-r-an-ideal-language-to-teach-the-fundamentals-of-programming-to-beginners/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Loops, Conditionals and Variables: A Basic Simulation in R</title>
		<link>http://www.psychwire.co.uk/2011/05/loops-conditionals-and-variables-a-basic-simulation-in-r/</link>
		<comments>http://www.psychwire.co.uk/2011/05/loops-conditionals-and-variables-a-basic-simulation-in-r/#comments</comments>
		<pubDate>Tue, 03 May 2011 12:00:23 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Eye Tracking]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[eye movements]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=746</guid>
		<description><![CDATA[In this post, I&#8217;m going to go over some basics of using conditionals and loops in R. I&#8217;ll expand on the example I use here in future posts. The conditionals and loops will be used to create some dummy eye movement data. Background Before I get into the actual code itself, I should probably explain [...]]]></description>
				<content:encoded><![CDATA[<p>In this post, I&#8217;m going to go over some basics of using conditionals and loops in R. I&#8217;ll expand on the example I use here in future posts. The conditionals and loops will be used to create some dummy eye movement data.</p>
<p><span style="font-size: large;">Background</span></p>
<p>Before I get into the actual code itself, I should probably explain what eye movements are all about. It&#8217;s a pretty big topic but basically, the easy way to think about it is to consider that your eyes look at things in the environment that you&#8217;re interested in. Put in more scientific terminology, your eyes fixate (point to) objects or areas of the environment containing information that your brain and cognitive systems are trying to process in detail. This happens because, though you don&#8217;t realise it, the quality of the visual input from your eyes is actually very poor. You only have colour vision in the dead centre of your visual field (though you never, ever notice it), and beyond the centre of your visual field, the input not only goes from being in colour to being in black-and-white, but the clarity and resolution drop off significantly as well.</p>
<p>The solution to the limited quality of visual input is to (1) utilise a load of systems that make you feel comfortable and safe, with everything neatly in colour and crystal clear and (2) to move your eyes around. A lot. You make 5-6 eye movements every second that you are awake, and, though you can of course have conscious control of them, most of the time, you let your eyes scoop up information in the outside world that is relevant so you can basically just get on with your life.</p>
<p>Your eyes are never truly still, though there are periods when they are relatively still. These periods are called <em>fixations</em> and are interesting because this is when you take in information. The movements between fixations are called <em>saccades</em> (French for &#8216;jumps&#8217;), and, though you don&#8217;t realise it, you&#8217;re blind during these saccades. Saccades are short (around 60ms, though this depends on the task), and fixations are much longer (this varies considerably, but here we&#8217;re talking about 200ms).</p>
<p><span style="font-size: large;">The Simulation</span></p>
<p>Here I&#8217;m going to simulate people looking at four different objects: a square, a circle, a star and a triangle. Imagine a display is drawn out in front of a participant, with these four objects present. The participant&#8217;s job is to locate the circle. Once they have done that, they press a button and the trial ends. However, if they don&#8217;t find the circle, they can also give up, but they won&#8217;t do that straight away.</p>
<p>Let&#8217;s begin!</p>
<p><span style="font-size: large;">The Code</span></p>
<p>We begin by creating a dataframe called <strong>fix_table</strong>. It has 10,000 rows, given by the <strong>seq</strong> function.</p>
<pre><span style="font-size: medium;">fix_table &lt;-data.frame(seq(1:10000))</span></pre>
<p>Next we set some default values and create some columns. <strong>Trial </strong>is the simulated trial number that we&#8217;re in. <strong>Object </strong>is the object the participant is looking at, be it the square, circle, star or triangle. <strong>Fix_index</strong> is the current index of the fixation &#8211; this gets reset to 1 at the start of each trial.</p>
<pre><span style="font-size: medium;">fix_table$trial = 0</span></pre>
<pre><span style="font-size: medium;">fix_table$object = "null"</span></pre>
<pre><span style="font-size: medium;">fix_table$fix_index = 0</span></pre>
<p>Now we set some defaults before running the main loop of the code. <strong>Objects</strong> is the list of different objects presented in each trial to look at. <strong>Fix_index</strong> begins at 1 because it marks the first fixation in a trial. <strong>Trial_index</strong> starts at 1 for the first trial:</p>
<pre><span style="font-size: medium;">objects = list("circle", "square", "star", "triangle")</span></pre>
<pre><span style="font-size: medium;">fix_index = 1</span></pre>
<pre><span style="font-size: medium;">trial_index = 1</span></pre>
<p><span style="font-size: large;">The Simulation Loop</span></p>
<p>Now for the actual loop that does the simulation itself. It&#8217;s a <strong>for </strong>loop that goes through each row of the <strong>fix_table </strong>dataframe, starting at 1 (the first row) and ending at the final row, determined by the number of rows function or <strong>nrow(fix_table)</strong>.</p>
<pre><span style="font-size: medium;">for (row in 1:nrow(fix_table)) {</span></pre>
<pre><span style="color: #ff0000;"><em><span style="font-size: medium;">... code goes here...</span></em></span></pre>
<pre><span style="font-size: medium;">}</span></pre>
<p>So, what code do we want to go into the loop? We begin by setting the basic information for that row, updating the fixation index and trial index values, like this:</p>
<pre><span style="font-size: medium;">fix_table[row,"fix_index"] = fix_index</span></pre>
<pre><span style="font-size: medium;">fix_index = fix_index + 1</span></pre>
<pre><span style="font-size: medium;">fix_table[row, "trial"] = trial_index</span></pre>
<p>Next we randomly sample one of the objects to be looked at by the participant:</p>
<pre><span style="font-size: medium;">current = sample(objects,1)[1]</span></pre>
<pre><span style="font-size: medium;">fix_table[row, "object"] = current</span></pre>
<p><strong>Sample</strong> randomly selects 1 object from the <strong>objects</strong> list, and the result gets assigned to <strong>current</strong>. We then update the dataframe called <strong>fix_table</strong> with the name of the current object being looked at.</p>
<p>After this, we need to decide whether a trial is going to end with the current fixation:</p>
<pre><span style="font-size: medium;">p_end &lt;-rnorm(1, mean=1/fix_index, sd=0.3)</span></pre>
<pre><span style="font-size: medium;">if (p_end&gt;1 | current=="circle") { trial_end=TRUE } else { trial_end=FALSE }</span></pre>
<p>This is some made-up code that first of all creates a sort-of random value that decides whether the participant gives up. We generate a normally distributed random number with a mean of 1/fix_index and a standard deviation of 0.3, and if this value is greater than 1, the trial will end. Note that, because the mean gets smaller as fix_index grows, giving up this way actually becomes less likely as more fixations are made, rather than more likely. Alternatively, as participants are searching for a circle, if they look at the circle, the trial will end. This is determined by the use of the <strong>or</strong> conditional, signified by the vertical pipe, <strong>|</strong>. Otherwise, the trial continues.</p>
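<p>To see what this rule implies, the chance of drawing a value above 1 can be worked out directly with <strong>pnorm</strong> rather than by simulation. This is a rough sketch, assuming the same mean of 1/fix_index and standard deviation of 0.3 as in the code above (<strong>p_give_up</strong> is a made-up name). Remember that <strong>fix_index</strong> has already been increased by one at this point in the loop, so the first fixation of a trial uses a value of 2:</p>
<pre># Probability that rnorm(1, mean=1/k, sd=0.3) exceeds 1, for k = 2 to 6
k = 2:6
p_give_up = 1 - pnorm(1, mean=1/k, sd=0.3)
round(p_give_up, 3)</pre>
<p>The values shrink as <strong>k</strong> grows, since the mean moves further below 1. With these settings, the rule fires less often later in a trial, and finding the circle (a 1 in 4 chance per fixation) ends far more trials.</p>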
<p>If the trial is set to end, we need to reset some important values for the new trial to begin. We do this via the following:</p>
<pre><span style="font-size: medium;">if (trial_end==TRUE){</span></pre>
<pre><span style="font-size: medium;">trial_index = trial_index + 1</span></pre>
<pre><span style="font-size: medium;">fix_index = 1</span></pre>
<pre><span style="font-size: medium;">}</span></pre>
<p>Simple!</p>
<p><span style="font-size: large;">The Full Code</span></p>
<p>Here we go:</p>
<pre><span style="font-size: medium;">fix_table &lt;-data.frame(seq(1:10000))</span></pre>
<pre><span style="font-size: medium;">fix_table$trial = 0</span></pre>
<pre><span style="font-size: medium;">fix_table$object = "null"</span></pre>
<pre><span style="font-size: medium;">fix_table$fix_index = 0</span></pre>
<pre><span style="font-size: medium;">objects = list("circle", "square", "star", "triangle")</span></pre>
<pre><span style="font-size: medium;">fix_index = 1</span></pre>
<pre><span style="font-size: medium;">trial_index = 1</span></pre>
<pre><span style="font-size: medium;">for (row in 1:nrow(fix_table)) {</span></pre>
<pre><span style="color: #ff0000;"><em><span style="font-size: medium;"># add basics</span></em></span></pre>
<pre><span style="font-size: medium;">fix_table[row,"fix_index"] = fix_index</span></pre>
<pre><span style="font-size: medium;">fix_index = fix_index + 1</span></pre>
<pre><span style="font-size: medium;">fix_table[row, "trial"] = trial_index</span></pre>
<pre><span style="color: #ff0000;"><em><span style="font-size: medium;"># decide which object we are on this time</span></em></span></pre>
<pre><span style="font-size: medium;">current = sample(objects,1)[1]</span></pre>
<pre><span style="font-size: medium;">fix_table[row, "object"] = current</span></pre>
<pre><span style="color: #ff0000;"><em><span style="font-size: medium;"># determine if trial ends!</span></em></span></pre>
<pre><span style="font-size: medium;">p_end &lt;-rnorm(1, mean=1/fix_index, sd=0.3)</span></pre>
<pre><span style="font-size: medium;">if (p_end&gt;1 | current=="circle") { trial_end=TRUE }</span></pre>
<pre><span style="font-size: medium;">else {trial_end=FALSE}</span></pre>
<pre><span style="color: #ff0000;"><em><span style="font-size: medium;"># if the trial ends, reset values</span></em></span></pre>
<pre><span style="font-size: medium;">if (trial_end==TRUE) {</span></pre>
<pre><span style="font-size: medium;">trial_index = trial_index + 1</span></pre>
<pre><span style="font-size: medium;">fix_index = 1   }</span></pre>
<pre><span style="font-size: medium;">}</span></pre>
<p>Finally, let&#8217;s make a histogram of how long it takes for a trial to end:</p>
<pre><span style="font-size: medium;">library(plyr)</span></pre>
<pre><span style="font-size: medium;">summary_table &lt;-ddply(fix_table, c("trial"), summarise, max=max(fix_index))</span></pre>
<pre><span style="font-size: medium;">hist(summary_table$max)</span></pre>
<p>Which gives the following:</p>
<p><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/05/trial_length.png"><img class="size-full wp-image-751 alignleft" title="trial_length" src="http://www.psychwire.co.uk/wp-content/uploads/2011/05/trial_length.png" alt="" width="500" height="400" /></a></p>
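<p>If plyr isn&#8217;t to hand, much the same per-trial summary can probably be had from base R with <strong>tapply</strong>. Here is a minimal sketch on a small hand-made stand-in for fix_table (the name <em>toy_table</em> and its values are invented):</p>
<pre><span style="font-size: medium;"># small stand-in for fix_table (invented values)
toy_table &lt;- data.frame(trial = c(1, 1, 1, 2, 3, 3),
                        fix_index = c(1, 2, 3, 1, 1, 2))
# highest fixation index reached within each trial
trial_length &lt;- tapply(toy_table$fix_index, toy_table$trial, max)
hist(trial_length)</span></pre>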
<p>That&#8217;s it for now! More complex aspects will be added later.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/05/loops-conditionals-and-variables-a-basic-simulation-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Aggregation in R: plyr, sqldf and data.table</title>
		<link>http://www.psychwire.co.uk/2011/04/data-aggregation-in-r-plyr-sqldf-and-data-table/</link>
		<comments>http://www.psychwire.co.uk/2011/04/data-aggregation-in-r-plyr-sqldf-and-data-table/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 11:12:03 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[data.table]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[sqldf]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=719</guid>
		<description><![CDATA[I&#8217;ve previously put up a couple of posts about aggregating data in R. In this post, I&#8217;m going to be trying some other alternative methods for aggregating the dataset. Before I begin, I&#8217;d like to thank Matthew Dowle for highlighting these to me. It&#8217;s a bit daunting at first, deciding which method of aggregating data is best. [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve previously put up a <a href="http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/" target="_blank">couple</a> of <a href="http://www.psychwire.co.uk/2011/04/aggregate-function-in-r-making-your-life-easier-one-mean-at-a-time/" target="_blank">posts</a> about aggregating data in R. In this post, I&#8217;m going to be trying some other alternative methods for aggregating the dataset. Before I begin, I&#8217;d like to thank Matthew Dowle for highlighting these to me. It&#8217;s a bit daunting at first, deciding which method of aggregating data is best. So I decided to give them all a go to see what they were like. Let&#8217;s go for it!</p>
<p>For this post, I&#8217;m going to be using the <strong>lexdec </strong>dataset that comes with the <strong>languageR </strong>package. For information see <a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=languageR:lexdec" target="_blank">here</a>. I&#8217;ve called it full_list here, in order to play around with it. The details of the dataset are not that important; it&#8217;s just a case of getting hold of some data from human subjects (i.e., what I&#8217;m used to!).</p>
<p><span style="font-size: large;">The Target</span></p>
<p>Before we get into the functions themselves, let&#8217;s take a look at the aggregated data that we want. It has the mean, median and standard error of the <strong>RT</strong> variable (RT stands for Response Time, or time taken to press a button). I want to get these summary statistics for every participant (<strong>Subject </strong>column) and, within each participant, for every level of the <strong>Class </strong>column. So, ultimately, the target is the following summary table (note that I&#8217;ve truncated this as there are lots of participants):</p>
<p><a href="http://www.psychwire.co.uk/wp-content/uploads/2011/04/target.png"><img class="aligncenter size-full wp-image-737" title="target" src="http://www.psychwire.co.uk/wp-content/uploads/2011/04/target.png" alt="" width="500" height="400" /></a></p>
<p><span style="font-size: large;">A tool by any other name: <em><strong>plyr</strong></em></span></p>
<p>Let&#8217;s begin with <strong><a href="http://had.co.nz/plyr/" target="_blank">plyr</a></strong>. The power of plyr comes from the fact that it splits up data, runs a function on the split-up data, and then sticks it all back together. It has a wide variety of useful aggregation functions, but here I&#8217;m going to use <strong>ddply</strong>. This function takes as its input a <strong>dataframe </strong>and gives as output another <strong>dataframe</strong>. The plyr functions are written in the syntax of <em>XYply </em>where X is the input object type and Y is the output object type. In this case, both ds of ddply stand for dataframe. Let&#8217;s look at some initial code:</p>
<pre>ddply(full_list, c("Subject","Class"), function(df)mean(df$RT))</pre>
<p>This is fine, and gives us mean RT values for each subject and class. But there&#8217;s a problem. The &#8220;mean&#8221; column is labelled V1, which isn&#8217;t that helpful, especially if we have multiple columns computed (i.e., ending up with V1, V2, V3 makes it hard to remember which column is which). So let&#8217;s get the column labelled:</p>
<pre>ddply(full_list, c("Subject","Class"), function(df) return(c(AVERAGE=mean(df$RT))))</pre>
<p>Great! Now let&#8217;s add some more columns to output:</p>
<pre>ddply(full_list, c("Subject","Class"), function(df) return(c(AVERAGE=mean(df$RT),</pre>
<pre>MEDIAN=median(df$RT),SE=sqrt(var(df$RT)/length(df$RT)))))</pre>
<p>This then gives us the target aggregated table pictured above.</p>
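<p>The SE term in that call is just the standard error of the mean. As a quick sanity check, here is a minimal sketch with a made-up helper called <em>se</em> and an invented vector of response times (neither is part of the lexdec data):</p>
<pre># hypothetical helper: standard error of the mean
se &lt;- function(x) sqrt(var(x) / length(x))
# invented response times, purely for illustration
rt_example &lt;- c(6.1, 6.4, 6.2, 6.7)
se(rt_example)
# the same quantity written the more familiar way
sd(rt_example) / sqrt(length(rt_example))</pre>
<p>Both lines should print the same number, since sd() is simply the square root of var().</p>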
<p><span style="font-size: large;">It needs no sequel: <strong><em>sqldf</em></strong></span></p>
<p>Next up is <strong><a href="http://code.google.com/p/sqldf/" target="_blank">sqldf</a></strong>. The name gives it away slightly: it&#8217;s a library for running SQL statements on data frames. SQL stands for Structured Query Language, and is used to query data stored in tables in a database. There are a number of SQL database types, which are all reasonably similar, and sqldf uses the incredibly popular <strong>SQLite</strong> by default. To get the target aggregated data using this, it&#8217;s a case of running a simple query:</p>
<pre>sqldf("SELECT SUBJECT, CLASS, AVG(RT) AS AVERAGE, MEDIAN(RT) AS MEDIAN,</pre>
<pre>SQRT((VARIANCE(RT)/COUNT(RT))) AS SE</pre>
<pre>FROM full_list</pre>
<pre>GROUP BY SUBJECT, CLASS")</pre>
<p>Note that to get the number of rows involved, we need to use <strong>COUNT </strong>rather than <strong>LENGTH</strong>. Easy!</p>
<p><span style="font-size: large;">How the tables have turned: <strong><em>data.table</em></strong></span></p>
<p>The last library to look at here is <a href="http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table">data.table</a>. This has the benefit of being considered the <em>roadrunner </em>of aggregation functions. It&#8217;s damn fast! This can be achieved as follows:</p>
<pre>dps_dt = data.table(full_list)
dps_dt[,list(AVERAGE=mean(RT), MEDIAN=median(RT),</pre>
<pre>SE= sqrt(var(RT)/length(RT))),by=list(Subject,Class)]</pre>
<p>Note that the <em>data.table() </em>call takes our data.frame called <em>full_list </em>and casts it as a data.table object type. Here, two lists are used to do two things: (1) create the column names and (2) group the data by Subject and Class. The first list call sets up the column names and the calculations that need to be run. The second list gets fed to the <strong>by </strong>argument, which then aggregates by Subject and Class.</p>
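<p>For anyone who wants to try the same pattern without the lexdec data, here is a small sketch on an invented table. The names <em>toy_dt</em> and <em>toy_summary</em> and all the values are made up; <strong>.N</strong> is data.table&#8217;s built-in count of rows in each group, used here in place of length(RT).</p>
<pre>library(data.table)
# invented data: two subjects, two classes, two responses per combination
toy_dt &lt;- data.table(Subject = rep(c("A1", "A2"), each = 4),
                     Class = rep(c("animal", "plant"), times = 4),
                     RT = c(6.1, 6.5, 6.3, 6.7, 6.0, 6.4, 6.2, 6.6))
# same two-list pattern: calculations first, grouping columns second
toy_summary &lt;- toy_dt[, list(AVERAGE = mean(RT), MEDIAN = median(RT),
                                SE = sqrt(var(RT) / .N)),
                         by = list(Subject, Class)]
toy_summary</pre>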
<p><span style="font-size: large;">Summary</span></p>
<p>So, there we have three additional ways to aggregate data using R, to be added to <a href="http://www.psychwire.co.uk/2011/04/aggregate-function-in-r-making-your-life-easier-one-mean-at-a-time/" target="_blank">tapply()</a> and <a href="http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/" target="_blank">aggregate()</a> which I have covered previously. Whichever one you end up using will probably depend on your own experience with using them (or, for example, whether you are familiar with SQL in the case of sqldf), what needs you have, and how fast you need your aggregation processing to be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/data-aggregation-in-r-plyr-sqldf-and-data-table/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Further Adventures in Visualisation with ggplot2</title>
		<link>http://www.psychwire.co.uk/2011/04/further-adventures-in-visualisation-with-ggplot2/</link>
		<comments>http://www.psychwire.co.uk/2011/04/further-adventures-in-visualisation-with-ggplot2/#comments</comments>
		<pubDate>Mon, 25 Apr 2011 12:00:52 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[boxplot]]></category>
		<category><![CDATA[correlation]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[scatterplot]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=706</guid>
		<description><![CDATA[So I previously took a look at some data of player performance from a computer game. In this post, I&#8217;m going to do some further visualisations using ggplot2. The data consists of different types of player character, different roles for those characters, and their overall damage output (the unit here is damage per second, or [...]]]></description>
				<content:encoded><![CDATA[<p>So I <a href="http://www.psychwire.co.uk/2011/04/sexy-geeky-graphs-using-ggplot2-in-r/" target="_blank">previously</a> took a look at some data of player performance from a computer game. In this post, I&#8217;m going to do some further visualisations using ggplot2. The data consists of different types of player character, different roles for those characters, and their overall damage output (the unit here is damage per second, or DPS). To obtain the data, I took the top 40 highest scores from <a href="http://www.worldoflogs.com/" target="_blank">this website</a> and pasted them into a spreadsheet (i.e., I didn&#8217;t try to kill their server by scraping the data, I copied it all by hand. How nice!).</p>
<p>So let&#8217;s begin. First, I want to take a look at some boxplots. But I don&#8217;t want them to be ordinary boxplots: I want them to be ordered by how well the players were able to score. So, I begin by sorting them by their median, and then plotting them.</p>

<div class="wp_codebox"><table><tr id="p7065"><td class="code" id="p706code5"><pre class="rsplus" style="font-family:monospace;">ordered_spec <span style="color: #080;">=</span> <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/with.html"><span style="color: #0000FF; font-weight: bold;">with</span></a><span style="color: #080;">&#40;</span>full_list, <span style="color: #0000FF; font-weight: bold;">reorder</span><span style="color: #080;">&#40;</span>spec, DPS, <span style="color: #0000FF; font-weight: bold;">median</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
ggplot<span style="color: #080;">&#40;</span>full_list, aes<span style="color: #080;">&#40;</span>ordered_spec, DPS, fill <span style="color: #080;">=</span> <a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/class.html"><span style="color: #0000FF; font-weight: bold;">class</span></a><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
geom_boxplot<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
opts<span style="color: #080;">&#40;</span>axis.<span style="">text</span>.<span style="">x</span> <span style="color: #080;">=</span> theme_text<span style="color: #080;">&#40;</span>angle <span style="color: #080;">=</span> <span style="color: #ff0000;">90</span>, hjust <span style="color: #080;">=</span> <span style="color: #ff0000;">0</span>, size<span style="color: #080;">=</span><span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The boxplot is produced from the simple <strong>geom_boxplot() </strong>command. To order the data, I used the <strong>reorder </strong>command, which reorders the <strong>spec </strong>factor according to the <strong>median </strong>of <strong>DPS</strong>. This then gets applied to the <strong>aes</strong>thetic mappings of the <strong>ggplot() </strong>command to reorder the output.</p>
<p>A quick note: initially when trying to reorder factors and output for plots, I tried to do this using ggplot itself. This was a mistake, as it&#8217;s not easy to do so. After much hunting around, I saw that it&#8217;s better to reorder your factors <em>before </em>you put them into ggplot, then the output will come out in the right order.</p>
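<p>To see what <strong>reorder </strong>does on its own, here is a tiny sketch with invented specs and DPS values (the data frame <em>toy_list</em> is made up and has nothing to do with the real rankings):</p>
<pre>toy_list &lt;- data.frame(spec = factor(c("arms", "arms", "frost", "frost", "fury", "fury")),
                       DPS = c(10, 12, 3, 5, 7, 9))
levels(toy_list$spec)    # alphabetical by default
toy_ordered &lt;- with(toy_list, reorder(spec, DPS, median))
levels(toy_ordered)      # now sorted by median DPS, lowest first</pre>
<p>Handing the reordered factor to aes() is what makes the boxes come out in that order.</p>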
<p>Anyway, here&#8217;s the graph:</p>
<p><a title="wow_boxplots by haywardgodwin, on Flickr" href="http://www.flickr.com/photos/62000959@N05/5643491411/"><img src="http://farm6.static.flickr.com/5047/5643491411_985f76d7c0.jpg" alt="wow_boxplots" width="383" height="500" /></a></p>
<p>You can see that there&#8217;s quite a range of performance. The poorer-performing groups are, for the most part, those who have other roles so shouldn&#8217;t be high on DPS. That is, all apart from <em>subtlety</em>, which is not doing so well. That spec, too, really has another role, but it&#8217;s surprising to see it so low (I remember when it was quite good for DPS, though that was about five years ago now).</p>
<p>Next, let&#8217;s take a look at something slightly different. In the data, we also have the <strong>seconds </strong>column, which lets us know how many seconds a player was active for. Perhaps it&#8217;s the case that players get tired, so a plot of their performance by how long they were active for might be revealing. It may alternatively be the case that a shorter period of time will benefit players because they can use special abilities which increase their damage output &#8211; though these abilities can only be used every few minutes. This could mean that a player who uses all of their special abilities and then dies (so their time stops) may have a high DPS output.</p>

<div class="wp_codebox"><table><tr id="p7066"><td class="code" id="p706code6"><pre class="rsplus" style="font-family:monospace;">ggplot<span style="color: #080;">&#40;</span>full_list_dps<span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>seconds, y<span style="color: #080;">=</span>DPS, colour<span style="color: #080;">=</span><a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/class.html"><span style="color: #0000FF; font-weight: bold;">class</span></a><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
geom_point<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
scale_colour_hue<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>Here, we just need to specify the <strong>x </strong>and <strong>y </strong>axis values. The points are plotted using the <strong>geom_point() </strong>command. Colours are added using <strong>scale_colour_hue()</strong>. There are a wide variety of colour options that can be used. Here&#8217;s the graph:</p>
<p><a title="scatter_all_by_time by haywardgodwin, on Flickr" href="http://www.flickr.com/photos/62000959@N05/5643491583/"><img src="http://farm6.static.flickr.com/5044/5643491583_ce0ec50af8.jpg" alt="scatter_all_by_time" width="383" height="500" /></a></p>
<p>There appears to be a large clustering together, though I guess it seems like there is a weak downwards trend. Let&#8217;s just run a correlation for the sake of it, shall we?</p>

<div class="wp_codebox"><table><tr id="p7067"><td class="code" id="p706code7"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">cor.<span style="">test</span></span><span style="color: #080;">&#40;</span>full_list_dps$seconds, full_list_dps$DPS<span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The output says there is a significant (<em>p</em>&lt;.0001) negative correlation of -0.39.</p>
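<p>The object returned by <strong>cor.test </strong>can also be stored, so the numbers can be pulled out rather than read off the printout. Here is a minimal sketch on simulated data (the variables below are invented and are not the real player data):</p>
<pre>set.seed(1)
# invented data with a built-in downward trend
toy_seconds &lt;- 1:20
toy_dps &lt;- 100 - 2 * toy_seconds + rnorm(20)
result &lt;- cor.test(toy_seconds, toy_dps)
result$estimate   # the correlation coefficient
result$p.value    # the p-value</pre>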
<p>Finally, let&#8217;s break it down and <strong>facet</strong> the output, so we can look at each class individually.</p>

<div class="wp_codebox"><table><tr id="p7068"><td class="code" id="p706code8"><pre class="rsplus" style="font-family:monospace;">ggplot<span style="color: #080;">&#40;</span>full_list_dps<span style="color: #080;">&#41;</span><span style="color: #080;">+</span>aes<span style="color: #080;">&#40;</span>x<span style="color: #080;">=</span>seconds, y<span style="color: #080;">=</span>DPS, colour<span style="color: #080;">=</span>spec<span style="color: #080;">&#41;</span><span style="color: #080;">+</span>
facet_wrap<span style="color: #080;">&#40;</span>facet<span style="color: #080;">=</span>~<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/class.html"><span style="color: #0000FF; font-weight: bold;">class</span></a><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>
geom_point<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span> scale_colour_hue<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>That gives us this:</p>
<p><a title="scatter_by_time_faceted by haywardgodwin, on Flickr" href="http://www.flickr.com/photos/62000959@N05/5644059436/"><img src="http://farm6.static.flickr.com/5226/5644059436_e7b42795d9.jpg" alt="scatter_by_time_faceted" width="383" height="500" /></a></p>
<p>That&#8217;s all for now &#8211; up next will be some methods for summarising the data, followed by statistical tests (starting with ANOVAs, then moving onto LMEs). Again, just note that this is for fun, and not intended to be an accurate account of player performance by any remote stretch of the imagination.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/further-adventures-in-visualisation-with-ggplot2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sexy, Geeky Graphs using ggplot2 in R</title>
		<link>http://www.psychwire.co.uk/2011/04/sexy-geeky-graphs-using-ggplot2-in-r/</link>
		<comments>http://www.psychwire.co.uk/2011/04/sexy-geeky-graphs-using-ggplot2-in-r/#comments</comments>
		<pubDate>Fri, 22 Apr 2011 08:00:15 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=691</guid>
		<description><![CDATA[So I&#8217;ve been looking for some data to play with while learning R, other than the data I&#8217;m analysing for various experiments and papers I&#8217;m working on. I thought to myself, &#8220;Hey, this R stuff is pretty geeky. Can I engage in a higher level of geekiness?&#8221; And I think I&#8217;ve found a way: using [...]]]></description>
				<content:encoded><![CDATA[<p>So I&#8217;ve been looking for some data to play with while learning R, other than the data I&#8217;m analysing for various experiments and papers I&#8217;m working on. I thought to myself, &#8220;Hey, this R stuff is pretty geeky. Can I engage in a higher level of geekiness?&#8221; And I think I&#8217;ve found a way: using R to analyse player performance in a computer game.</p>
<p><span style="font-size: medium;">Background: The Data</span></p>
<p>The game in question is the epic cash cow known as <a href="http://us.battle.net/wow/en/" target="_blank">World of Warcraft</a> (otherwise known as WoW to some, or pronounced &#8220;Woo&#8221; by hilarious people), made by dear old Blizzard Entertainment. I&#8217;ve been a long-time player of Blizzard games, starting with a demo of Warcraft 2 that came on a CD with a magazine (hey, CDs, remember when games came on CDs?). Since then it&#8217;s been the works&#8230; Warcraft 3, Starcraft (1 and a tiny bit of 2), Diablo 2 (for far too long). I also have in my house a copy of Lost Vikings on the SNES (my other half&#8217;s, she&#8217;s as bad with this stuff as I am, though it does mean we have two SNES machines). Sadly, I don&#8217;t get time to play games these days &#8211; though I did used to raid a lot when I was an undergrad, I don&#8217;t really have time now.</p>
<div class="wp-caption alignnone" style="width: 510px"><a title="WoWScrnShot_082810_130129 by haywardgodwin, on Flickr" href="http://www.flickr.com/photos/62000959@N05/5641328349/"><img src="http://farm6.static.flickr.com/5056/5641328349_7427404ec0.jpg" alt="WoWScrnShot_082810_130129" width="500" height="380" /></a><p class="wp-caption-text">Fighting the good fight: taking on a Pome Wraith - a zombie with an apple on its head</p></div>
<p>Anyway, in plain English &#8211; for those of you who haven&#8217;t heard of this game before now- the point of a large part of the game is to take your character that you have control of and go and bash large, unpleasant creatures on the noggin. After a while, those creatures die and leave you with shiny prizes and loot. It might sound a bit simplistic, but actually it gets quite complex: there are a large number of decisions you need to make in order to maximise your performance, you need to be very fast to react to changing circumstances in the environment, and you need to work with a set of other people in parallel to get the job done. For an example of stuff people need to learn in order to do a decent job, take a look <a href="http://fluiddruid.net/the-fluid-druid-4-0-1-cataclysm-feral-dps-guide/" target="_blank">here</a>.</p>
<p>All of this (and the fact that there is an enormous player base numbering many millions across the globe) has meant that there has been a drive to get the most out of what players can do. There&#8217;s a sizeable community of players who run various models and simulations to work out the best ways to do things. This has often made me wonder if player performance could also benefit from being analysed in a post-hoc manner. Rather than using models and simulations, why not take actual player performance and see how people fare?</p>
<p>Well, there are problems with that: not everyone is very good at the game. Plus, that would involve a lot of data collection (which I assume Blizzard do in some shape or form, by the way, from comments they have made at various times). So, let&#8217;s go for a different approach. Let&#8217;s pick the <em>best </em>players and see how they manage. These best players will serve as an approximation to the <em>ideal </em>maximum of what can be achieved. Now, here&#8217;s where you may be thinking &#8220;hrmmmm&#8221;, but please, stick with it. This is more meant to be an entertaining illustration to what various functions in R can do, rather than a set of data being analysed that I intend to stand by and be certain can be trusted. It&#8217;s all a bit of fun.</p>
<p>Fortunately, there&#8217;s an easy way to get the best scores that players have achieved: <a href="http://www.worldoflogs.com/" target="_blank">World of Logs</a> has a ranking system for the best scores on various fights in the game. So, I went there, found an encounter, and started copying and pasting the ranks into a spreadsheet. I picked the top 40 scores for <a href="http://www.wowhead.com/npc=41376#." target="_blank">Nefarian</a>. He&#8217;s a big dragon who was killed in a previous version of the game, but is back now with a headache or something. Actually, I remember him toasting me a few times (I was a rogue back then, and our tank didn&#8217;t understand the whole &#8216;rotate the giant puppy&#8217; part of the rogue class call).</p>
<p><span style="font-size: medium;">Getting into ggplot2</span></p>
<p>Now that I have my data set up, I&#8217;m going to do some basic graphs using ggplot2. Now, if you&#8217;re like me and have seen some examples of what ggplot2 can do, you might have thought &#8220;oh my, that looks sexy!&#8221;. And then you tried to work out how to make nice-looking graphs and became somewhat unstuck. Trust me, though, it&#8217;s worth persevering with, because ggplot&#8217;s power comes from its flexibility. I used to make my graphs using Sigmaplot, but now I have a graph format that I like, it&#8217;s a case of copy and pasting things around to get very nice graphs instantly.</p>
<p>I initially started trying to use the qplot() function, but, as I understand, it is limited in various ways compared to what the mighty ggplot() function can do. So let&#8217;s stick with ggplot(), or else you&#8217;ll have to learn how to do things twice, and that&#8217;s no fun at all.</p>
<p>The basic way that ggplot() works is very similar to a number of other programming languages when it comes to putting together images (e.g., pygame images and image creation in PHP &#8211; I&#8217;m sure it&#8217;s similar to others too, but those are all I&#8217;ve used). Essentially, you stack a set of options and commands on top of a blank canvas. So you start with nothing, then you say, &#8220;right, let&#8217;s make a plot&#8221;, then you start building things into it. You want points drawn? Stack them on the canvas. You want error bars? Stack them on, too. If you don&#8217;t tell it what to do, it will, in some cases, make assumptions about what you want, and go with the defaults. For some functions and programs, the defaults are horrible. This is not the case with ggplot: the defaults are <em>awesome</em>.</p>
<p>So here I&#8217;m just going to do something very simple to illustrate how you can build up options and commands to make a set of graphs. I&#8217;m basing this on an example from the <a href="http://had.co.nz/ggplot2/" target="_blank">ggplot documentation</a>. I&#8217;m going to make a series of density plots of player Damage Per Second (DPS, the standard indicator of performance, and the more the better!) and compare the various <em>specialisations </em>(specs) which are sub-components of the various <em>classes </em>in the game. Depending on what you want to do, you might choose one spec over another. Similarly, depending on what you want to do, you might pick one specific class. Say you want to turn into a bear: you&#8217;d be a druid. If you want to be a skirt-wearing magician: you&#8217;d be a mage. And so on! Anyway, on with the code:</p>

<div class="wp_codebox"><table><tr id="p69110"><td class="code" id="p691code10"><pre class="rsplus" style="font-family:monospace;">ggplot<span style="color: #080;">&#40;</span>full_list, aes<span style="color: #080;">&#40;</span>DPS, fill <span style="color: #080;">=</span> spec<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
facet_wrap<span style="color: #080;">&#40;</span>facet<span style="color: #080;">=</span>~<a href="http://astrostatistics.psu.edu/su07/R/html/graphics/html/class.html"><span style="color: #0000FF; font-weight: bold;">class</span></a><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>
geom_density<span style="color: #080;">&#40;</span>alpha <span style="color: #080;">=</span> <span style="color: #ff0000;">0.2</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
scale_x_continuous<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Damage Per Second (DPS)&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">+</span>
opts<span style="color: #080;">&#40;</span>axis.<span style="">text</span>.<span style="">x</span> <span style="color: #080;">=</span> theme_text<span style="color: #080;">&#40;</span>angle <span style="color: #080;">=</span> <span style="color: #ff0000;">90</span>, hjust <span style="color: #080;">=</span> <span style="color: #ff0000;">0</span>, size<span style="color: #080;">=</span><span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>Note the &#8220;+&#8221; symbol at the end of each line. The + adds further options to the ggplot command. If you are running the code from a script and have spread the options over several lines, each + must go at the end of a line, not at the start of the next one, or it won&#8217;t run: R treats a line that is already a complete expression as finished, so a + at the start of the following line is no longer attached to the plot. That took me a while to work out!</p>
<p>Anyway, the first line tells ggplot to use the dataset I have called <strong>full_list</strong>. The next command, <strong>aes</strong>, outlines the aesthetic mappings for the plot to use. Here I define my x-axis by entering <strong>DPS</strong>. Next I tell it to colour the different plots by <strong>spec </strong>by using the <strong>fill </strong>argument.</p>
<p>Next comes <strong>facet_wrap</strong>, which splits the graphs up by the <strong>class </strong>factor, much as the <strong>lattice </strong>package does. This will produce one graph for each class.</p>
<p>The third line adds a <strong>geom_density </strong>or density plot element. The transparency (<strong>alpha</strong>) is set to 0.2 to enable you to see how the density plots overlap.</p>
<p>The fourth line sets the x-axis title using the <strong>scale_x_continuous </strong>command. Note that if your x-axis is a factor you need to use <strong>scale_x_discrete </strong>instead.</p>
<p>Finally we have the <strong>opts</strong> or options. There are a huge number of options, the best list of which I&#8217;ve found is <a href="https://github.com/hadley/ggplot2/wiki/%2Bopts%28%29-List" target="_blank">here</a>. Here I&#8217;ve set the x-axis text to be angled and therefore easier to read without overlapping.</p>
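<p>A note for anyone running this on a newer version of ggplot2: <strong>opts </strong>and <strong>theme_text </strong>were later replaced by <strong>theme </strong>and <strong>element_text</strong>. The following is a minimal sketch of the same plot in that newer style. The <strong>full_list </strong>data frame here is a made-up stand-in with random numbers (the class and spec names are just placeholders), not the real data used in this post.</p>

<pre class="rsplus" style="font-family:monospace;">library(ggplot2)

# Hypothetical stand-in for full_list: random DPS values, two classes, four specs
set.seed(1)
full_list = data.frame(
  class = rep(c("druid", "mage"), each = 100),
  spec  = rep(c("feral", "balance", "fire", "frost"), each = 50),
  DPS   = rnorm(200, mean = 10000, sd = 1500)
)

# Same layers as in the post, with each + at the end of its line
p = ggplot(full_list, aes(DPS, fill = spec)) +
  facet_wrap(~class) +
  geom_density(alpha = 0.2) +
  scale_x_continuous("Damage Per Second (DPS)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0, size = 7))
p</pre>

<p>Storing the plot in <strong>p </strong>and then printing it makes it easy to add further layers later with another +.</p>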
<p>Now, let&#8217;s take a look at the output:</p>
<div class="wp-caption alignnone" style="width: 393px"><a title="wow by haywardgodwin, on Flickr" href="http://www.flickr.com/photos/62000959@N05/5641328541/"><img src="http://farm6.static.flickr.com/5264/5641328541_9f42243cd5.jpg" alt="wow" width="383" height="500" /></a><p class="wp-caption-text">The full set of plots. There may be too many specs and colours here!</p></div>
<p>You can see that some specs of different classes do better than others. Some aren&#8217;t supposed to do much damage, as they have other roles (e.g., the ones with &#8220;prot&#8221; in the name). Again, please don&#8217;t take this as a serious attempt at comparing the specs and classes; it&#8217;s just some data to play around with and explore for illustrative purposes.</p>
<p>The next steps will be to try out various ways of summarising the data (e.g. data.table, aggregate, plyr), after which I&#8217;ll start running some statistical tests.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/sexy-geeky-graphs-using-ggplot2-in-r/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Aggregate Function in R: Making your life easier, one mean at a time</title>
		<link>http://www.psychwire.co.uk/2011/04/aggregate-function-in-r-making-your-life-easier-one-mean-at-a-time/</link>
		<comments>http://www.psychwire.co.uk/2011/04/aggregate-function-in-r-making-your-life-easier-one-mean-at-a-time/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 08:00:40 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=677</guid>
		<description><![CDATA[I previously posted about calculating medians using R. I used tapply to do it, but I&#8217;ve since found something that feels easier to use (at least to me). aggregated_output = aggregate&#40;DV ~ IV1 * IV2, data=data_to_aggregate, FUN=median&#41; aggregated_output The above code saves an aggregated dataset to aggregated_output and gives you the median in [...]]]></description>
				<content:encoded><![CDATA[<p>I <a href="http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/">previously posted</a> about calculating medians using R. I used <strong>tapply </strong>to do it, but I&#8217;ve since found something that feels easier to use (at least to me).</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p677code13'); return false;">View Code</a> RSPLUS</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p67713"><td class="code" id="p677code13"><pre class="rsplus" style="font-family:monospace;">aggregated_output <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">aggregate</span><span style="color: #080;">&#40;</span>DV ~ IV1 <span style="color: #080;">*</span> IV2,
                <span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>data_to_aggregate, FUN<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">median</span><span style="color: #080;">&#41;</span>
aggregated_output</pre></td></tr></table></div>

<p>The above code saves an aggregated dataset to <strong>aggregated_output </strong>and gives you the median in a column. The median (or mean, or whatever function you want to apply) is specified by <strong>FUN=</strong>. The value to create a median for is specified by <strong>DV </strong>(dependent variable).</p>
<p>The aggregate function also gives additional columns for each <strong>IV </strong>(independent variable). You can have as many of these as you like. Here, I have two, and these are specified by <strong>IV1 * IV2</strong>.</p>
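<p>To see what comes out, here is a small sketch with made-up data. The values are invented purely for illustration; the column names <strong>DV</strong>, <strong>IV1 </strong>and <strong>IV2 </strong>simply follow the ones used above.</p>

<pre class="rsplus" style="font-family:monospace;"># Hypothetical toy data: two grouping columns and one value column
data_to_aggregate = data.frame(
  IV1 = c("a", "a", "a", "b", "b", "b"),
  IV2 = c("x", "x", "y", "x", "y", "y"),
  DV  = c(10, 20, 30, 40, 50, 70)
)

# One row per IV1/IV2 combination, with the median of DV in the last column
aggregated_output = aggregate(DV ~ IV1 * IV2,
                data=data_to_aggregate, FUN=median)
aggregated_output</pre>

<p>The result is an ordinary data frame with the columns <strong>IV1</strong>, <strong>IV2 </strong>and <strong>DV</strong>, so it can be passed straight on to plotting or further analysis.</p>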
<p>Those of you who are familiar with relational databases will see immediately that this function is somewhat similar to <strong>GROUP BY </strong>(in MySQL). The bonus is that you don&#8217;t need to <strong>SELECT </strong>the IV columns that you want to be provided; those are done automatically. For example, take a look at this query, which uses AVG and so gives the mean rather than the median for each group:</p>

<div class="wp_codebox_msgheader wp_codebox_hide"><span class="right"><sup><a href="http://www.ericbess.com/ericblog/2008/03/03/wp-codebox/#examples" target="_blank" title="WP-CodeBox HowTo?"><span style="color: #99cc00">?</span></a></sup></span><span class="left"><a href="javascript:;" onclick="javascript:showCodeTxt('p677code14'); return false;">View Code</a> MYSQL</span><div class="codebox_clear"></div></div><div class="wp_codebox"><table><tr id="p67714"><td class="code" id="p677code14"><pre class="mysql" style="font-family:monospace;"><a href="http://search.mysql.com/search?site=refman-%35%31&amp;q=SELECT"><span style="color: #990099; font-weight: bold;">SELECT</span></a> IV1<span style="color: #000033;">,</span> IV2<span style="color: #000033;">,</span> <a href="http://dev.mysql.com/doc/refman/%35%2E%31/en/group-by-functions-and-modifiers.html"><span style="color: #000099;">AVG</span></a><span style="color: #FF00FF;">&#40;</span>DV<span style="color: #FF00FF;">&#41;</span> <a href="http://search.mysql.com/search?site=refman-%35%31&amp;q=FROM"><span style="color: #990099; font-weight: bold;">FROM</span></a> data_to_aggregate <a href="http://search.mysql.com/search?site=refman-%35%31&amp;q=GROUP%20BY"><span style="color: #990099; font-weight: bold;">GROUP BY</span></a> IV1<span style="color: #000033;">,</span> IV2</pre></td></tr></table></div>

<p>There is apparently more than one way to skin a cat (even if it&#8217;s a cat that&#8217;s made of data).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/aggregate-function-in-r-making-your-life-easier-one-mean-at-a-time/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>RStudio, Revolution Analytics and Deducer: A Tale of Three GUIs</title>
		<link>http://www.psychwire.co.uk/2011/04/rstudio-revolution-analytics-and-deducer-a-tale-of-three-guis/</link>
		<comments>http://www.psychwire.co.uk/2011/04/rstudio-revolution-analytics-and-deducer-a-tale-of-three-guis/#comments</comments>
		<pubDate>Tue, 19 Apr 2011 11:59:58 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=668</guid>
		<description><![CDATA[I&#8217;m in the process of moving from SPSS to R at the moment. It&#8217;s not been the easiest of rides, but then learning how to do a core part of your job never really should be. It&#8217;s been fun, though &#8211; don&#8217;t get me wrong &#8211; it&#8217;s definitely been an adventure!! Here I&#8217;m going to [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m in the process of moving from SPSS to R at the moment. It&#8217;s not been the easiest of rides, but then learning how to do a core part of your job never really should be. It&#8217;s been fun, though &#8211; don&#8217;t get me wrong &#8211; it&#8217;s definitely been an adventure!! Here I&#8217;m going to review my (limited) experience with some of the GUIs available for R. Don&#8217;t shout at me if I haven&#8217;t fully tested them &#8211; these are the views of a newbie (n00b). This is by no means intended to be a fully-detailed or fully-researched account of the programs here. I actually think they are all great and have been using them interchangeably during the learning process. Best of all, as they are all free, it has meant that changing between the three as I learn has cost me nothing, and I&#8217;ve picked up bits and pieces of new ideas from each of them. I&#8217;m writing this in the hope that others will give them all a go and learn something too.</p>
<p>One of the interesting parts of my time learning R has been the increasing realisation that it&#8217;s turning into something new. When I first tried it out several years back, the first load of the default R installation was, well, not pleasant, consisting of little more than the most basic of interfaces, coupled with a console that was basically worthless to a beginner. It was about as much fun as trying to install Linux about 10 years ago: inevitably, you wish you had stayed at home, so scurry off and hide.</p>
<p><span style="font-size: medium;">And then came Revolution&#8230;</span></p>
<p>But things have changed since that time. R is turning into a new beast, full of potential and possibilities. Imagine my surprise &#8211; nay, joy &#8211; when I discovered Revolution Analytics. It&#8217;s a powerful beast built on the R code base. Users are presented with an IDE that actually makes life considerably easier. The IDE contains a console and a scripting window, which means you do get the best of all worlds &#8211; code, console, and pretty buttons which make life easier. Great. Some of the chief people behind Revolution Analytics were heavily involved in SPSS before moving to R &#8211; so they know what they are doing. I&#8217;ve seen users give the company some grief over the fact that it&#8217;s not open source. That&#8217;s not a debate I want to get into, though I am pleased that they have a free academic license, which is definitely a good thing. I think they charge business users for it, particularly for its optimisations for churning through large datasets.</p>
<p>&lt;aside&gt;Maybe big businesses like using it to predict how much money they are going to make, or what is going to happen in the future. I don&#8217;t know. All I remember from having normal jobs in the past is that making more money compared to previous time points is a Big Thing to management. I have memories of endless early-morning pep-talks about how much money the store was bringing in during the current week compared to (1) the previous week and (2) the same week the previous year. The management types would get upset if &#8220;we&#8221; (as though &#8220;we&#8221; had anything to do with it) brought in less money than the same week in the previous year. I&#8217;d then pipe up and start asking about the error / variance in their predictions, and they&#8217;d get confused. It was fun being a dick.&lt;/aside&gt;</p>
<p>Anyway, I digress. <a href="http://www.revolutionanalytics.com/" target="_blank">Revolution Analytics</a> is great. Download it and give it a go.</p>
<p><span style="font-size: medium;">What the hell is a Deducer?</span></p>
<p>Deducer is a slightly different beast to Revolution Analytics. The point of Deducer, it seems, is to replace the functionality of full-GUI statistical packages (hello, SPSS&#8230; PASW&#8230; or whatever you are called now). This is a brilliant goal and Deducer is making masses of headway in terms of becoming an awesome package. It has built-in functionality and buttons for producing sexy graphs using ggplot2. Keep an eye on this one. It can also do some forms of analyses already, and I&#8217;d predict that it won&#8217;t be long before it can do pretty much anything.</p>
<p>Deducer is also great  - <a href="http://www.deducer.org/pmwiki/index.php?n=Main.DownloadingAndInstallingDeducer" target="_blank">download it and also give it a go</a>. It has a Data View and Variable View (like in SPSS) which eliminates the usual annoyances of R assuming what is a factor and what is a number.</p>
<p>I guess I should have been calling it DeduceR. Should I? No ideaR. Oh my, this R stuff is getting out of contRol&#8230;</p>
<p><span style="font-size: medium;">Back to the RStudio</span></p>
<p>The other GUI I&#8217;ve been using is RStudio. This is my personal favourite. It&#8217;s the fastest to install, and the fastest to load out of the three I&#8217;m reviewing. I know, I know, loading times don&#8217;t matter, right? If something takes 10 seconds to load, that just means you&#8217;ll spend ten less seconds on Facebook, surely? Well, maybe. Loading times are often a good sign of how much bloat there is in a program, as well as how much effort has been put into optimising the program to make it obscenely fast.</p>
<p>There are several reasons why RStudio is my current favourite. It has options for re-colouring the editor window to a dark colour scheme (plus points for me, I like dark schemes). As with Deducer, it is easy to import files into datasets. Packages are easy to manage, graphics are easier to take a look at and export, and datasets are easy to inspect (though you can&#8217;t edit variable types when viewing datasets, at least as far as I know). Together, these three points make it feel like a qualitative and quantitative shift towards something where you can still learn how to do the headache-inducing scripting stuff, but without the kind of headaches that drive you back to SPSS. Oh, and it can comment out multiple lines in a script with a single click of a button. How cool is that?</p>
<p><a href="http://www.rstudio.org/" target="_blank">Go and give RStudio a go, now, now!</a></p>
<p><span style="font-size: medium;">Summary</span></p>
<p>That&#8217;s my experience so far &#8211; I&#8217;m sure it will change as I learn more! Beyond the differences between these various GUIs, there is a clear point that needs to be considered. The fact that many different people are now working to turn R into something that can be used more widely can only be a good thing (TM). These GUIs, and others like them, will encourage developers to work harder to produce even better alternatives to the R base installation, so I expect, even a year from now, the landscape will be entirely different.</p>
<p>UPDATE: Thanks to <a href="http://www.r-statistics.com/">Tal Galili</a>, it&#8217;s just been pointed out to me that Deducer and RStudio can be used together. It&#8217;s just a case of running library(Deducer) then JGR(), after which you call Deducer again from within the JGR console (this ensures everything is installed for Deducer and ready to go). I had assumed from the documentation for Deducer that it wasn&#8217;t possible to do this (no idea why I didn&#8217;t try it, my bad). Great!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/rstudio-revolution-analytics-and-deducer-a-tale-of-three-guis/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Pivot Tables and Medians in R</title>
		<link>http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/</link>
		<comments>http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/#comments</comments>
		<pubDate>Sat, 16 Apr 2011 18:46:27 +0000</pubDate>
		<dc:creator>hayward</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[median]]></category>
		<category><![CDATA[pivot tables]]></category>

		<guid isPermaLink="false">http://www.psychwire.co.uk/?p=627</guid>
		<description><![CDATA[Pivot Tables are a useful way of aggregating data into the format that you&#8217;re after. In this example, I&#8217;m going to be using R to pivot some data and calculate medians for me. This is useful because Excel can calculate medians (the =MEDIAN(values) function), but what it can&#8217;t do is calculate medians for Pivot Tables. [...]]]></description>
				<content:encoded><![CDATA[<p>Pivot Tables are a useful way of aggregating data into the format that you&#8217;re after. In this example, I&#8217;m going to be using R to pivot some data and calculate medians for me. This is useful because Excel can calculate medians (the =MEDIAN(values) function), but what it <em>can&#8217;t </em>do is calculate medians for Pivot Tables. I assume that it can&#8217;t do this because calculating the median of large groups of aggregated data can be very computationally intensive, and may take longer than you would expect.</p>
<p>The good news, however, is that R can do this without problems. Say that you have run an experiment and are left with the following:</p>
<table border="0">
<tbody>
<tr style="background-color: #c6c6c7;">
<td>participant</td>
<td>condition</td>
<td>score</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>95</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>90</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>105</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>110</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>64</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>80</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>90</td>
</tr>
<tr>
<td>etc.</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>But that&#8217;s not what you want &#8211; instead, say that you want the following:</p>
<table border="0">
<tbody>
<tr style="background-color: #c6c6c7;">
<td>Participant</td>
<td>condition_1</td>
<td>condition_2</td>
</tr>
<tr>
<td>1</td>
<td>median of score</td>
<td>median of score</td>
</tr>
<tr>
<td>2</td>
<td>median of score</td>
<td>median of score</td>
</tr>
<tr>
<td>etc.</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>Here&#8217;s the code I used to sort this out:</p>
<div id='stb-box-6084' class='stb-grey_box' style="background-image: url(none); min-height: 20px; padding-left: 5px; ">
<p>datafile = read.table(file.choose(), header = TRUE)</p>
<p>median_output &lt;- tapply(as.numeric(as.character(datafile$score)), list(datafile$participant, datafile$condition), median)</p>
<p>write.table(median_output, file.choose())</p></div>
<p>Using <strong>file.choose()</strong> presents you with a pop-up window asking which file to load in to use as your <strong>datafile </strong>and also asks you, at the end of the script, where you want to save your pivoted data. At this point, you can call it a text file (e.g., &#8220;medians.txt&#8221;) and save it to wherever you want.</p>
<p>To Pivot more complex datasets, all you need to do is add more columns from your dataset to the <strong>list </strong>function. You&#8217;ll then get the fully pivoted data out instead.</p>
<p>Don&#8217;t forget that you can run this using functions other than the median (e.g., mean) &#8211; just replace <strong>median </strong>with whatever you need.</p>
<p>Note finally that I ran <strong>as.numeric() </strong>on the <strong>score </strong>column. This was done because, when reading in the raw data, R sometimes assumes that the column is a <strong>factor </strong>rather than a <strong>numeric </strong>column. If it&#8217;s assumed the wrong thing, you&#8217;ll probably get an error saying &#8220;Error in tapply&#8230;  arguments must have same length&#8221;. If this happens, make sure that all of your columns which should be a factor <em>are </em>a factor and all of your columns which should be numeric <em>are </em>numeric. Be aware, though, that calling <strong>as.numeric() </strong>directly on a factor returns its internal level codes rather than the values you see, so for a column that has been read in as a factor, use as.numeric(as.character(...)) to get the original numbers back.</p>
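<p>If you want to try the pivot without loading a file, here is a minimal sketch that types in the first seven rows of the example table by hand (so participant 2 has only one score in condition 2). The last few lines are an extra illustration of what <strong>as.numeric() </strong>does to a factor; the name <strong>score_as_factor </strong>is made up for this example.</p>

<pre class="rsplus" style="font-family:monospace;"># The first seven rows of the example table, entered by hand
datafile = data.frame(
  participant = c(1, 1, 1, 1, 2, 2, 2),
  condition   = c(1, 1, 2, 2, 1, 1, 2),
  score       = c(95, 90, 105, 110, 64, 80, 90)
)

# One row per participant, one column per condition, medians in the cells
median_output = tapply(datafile$score,
                       list(datafile$participant, datafile$condition),
                       median)
median_output

# What happens when a score column has been read in as a factor
score_as_factor = factor(c("95", "90", "105"))
as.numeric(score_as_factor)                # internal level codes, not the scores
as.numeric(as.character(score_as_factor))  # the original values</pre>

<p>For participant 1 in condition 1 the cell holds the median of 95 and 90, which is 92.5.</p>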
]]></content:encoded>
			<wfw:commentRss>http://www.psychwire.co.uk/2011/04/pivot-tables-and-medians-in-r/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
