<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0" xml:base="http://www.programmingr.com">
<channel>
 <title>Programming R</title>
 <link>http://www.programmingr.com</link>
 <description />
 <language>en</language>
<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/ProgrammingR" /><feedburner:info uri="programmingr" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>ProgrammingR</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
 <title>Animations in R</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/hpJnjpWR1rA/animations-r</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;Animated charts can be very helpful in illustrating concepts or discovering relationships, which makes them valuable in teaching and exploratory research. Fortunately, creating animated graphs in R is fairly straightforward once you have the right tools and understand a few basic principles about how the animations are created.&lt;/p&gt;
&lt;p&gt;In this article I'll provide an example of how to use the &lt;span style="font-family: courier new,courier;"&gt;animation&lt;/span&gt; package to create an animated chart with a couple of bells and whistles.&lt;/p&gt;
&lt;p&gt;The package installs out-of-the-box with several animations that are tailored for instruction. The examples are of varying complexity ranging from a simple coin flip simulation to illustrations of mathematical problems such as Buffon's needle problem. In most scenarios, however, you'll want to create your own animations, so let's look at how to do that. &lt;/p&gt;
&lt;p&gt;First, there are several different formats in which you can create your animations - GIF, HTML, LaTeX, SWF and mp4. The &lt;span style="font-family: courier new,courier;"&gt;saveGIF()&lt;/span&gt; function call below illustrates the generic format for each of the calls:&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;saveGIF({&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i in 1:10) plot(runif(10), ylim = 0:1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;}) &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Understanding that the package creates animations by generating and then compiling many graphs is central to creating polished custom animations. As you can see, the syntax looks a little unfamiliar at first because the inside of the function call is a custom loop that creates the individual graphs. (Note: If you're familiar with the way the &lt;span style="font-family: courier new,courier;"&gt;boot()&lt;/span&gt; function works, this is somewhat similar.) Once those individual graphs are created, the function compiles the images in the format specified by the function call. As you might have guessed, most of the animation types require that you install third-party software for R to be able to do the compilations. The installation of this software is covered in the package help.&lt;/p&gt;
&lt;p&gt;Basic use of the animation functions is covered in the package help, but the application of the functions to novel tasks can still be a little difficult. As a result, I've created an example that illustrates how to use the functions to create animations with a couple of bells and whistles. &lt;/p&gt;
&lt;p&gt;This animation plots the estimated density of each of 150 draws of 100 values from a normally distributed random variable. To make things a little more interesting (i.e., make the distribution move), a constant that varies based on the iteration count is added to the 100 values. The chart also includes a slightly stylized frame tracker (or draw counter) along the top of the chart and vertical lines in the density chart that mark the current position and previous two positions of the sample mean. Finally, the foreground color of the chart changes based on the mean of the distribution. &lt;/p&gt;
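&lt;p&gt;As a rough illustration of the constant that varies with the iteration count, the sketch below tabulates the shift used in the code that follows, &lt;span style="font-family: courier new,courier;"&gt;sqrt(abs(i - 51))&lt;/span&gt;, at a few iterations. The function name &lt;span style="font-family: courier new,courier;"&gt;shift&lt;/span&gt; is only for illustration and does not appear in the original code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Shift added to each draw: zero at iteration 51, growing toward both ends
shift &amp;lt;- function(i) sqrt(abs(i - 51))

# Approximate center of the distribution at selected frames
round(shift(c(1, 26, 51, 100, 150)), 2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under this formula the distribution starts near 7, drifts down to 0 by iteration 51, and then moves back out to roughly 10 by the final frame.&lt;/p&gt;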
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;###################################&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;library(animation)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;#Set delay between frames when replaying&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;ani.options(interval=.05)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;# Set up a vector of colors for use below &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;col.range &amp;lt;- heat.colors(15)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;# Begin animation loop&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;# Note the brackets within the parentheses&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;saveGIF({&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; # For the most part, it's safest to start with graphical settings in &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; # the animation loop, as the loop adds a layer of complexity to &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; # manipulating the graphs. For example, the layout specification needs to &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; # be within animation loop to work properly.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; layout(matrix(c(1, rep(2, 5)), 6, 1))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; # Adjust the margins a little&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; par(mar=c(4,4,2,1) + 0.1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Begin the loop that creates the 150 individual graphs&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i in 1:150) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span 
style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Pull 100 observations from a normal distribution&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # and add a constant based on the iteration to move the distribution&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chunk &amp;lt;- rnorm(100)+sqrt(abs((i)-51))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Reset the color of the top chart every time (so that it doesn't change as the &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # bottom chart changes)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; par(fg=1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set up the top chart that keeps track of the current frame/iteration&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Dress it up a little just for fun&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; plot(-5, xlim = c(1,150), ylim = c(0, .3), axes = F, xlab = "", ylab = "", main = "Iteration")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
abline(v=i, lwd=5, col = rgb(0, 0, 255, 255, maxColorValue=255))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; abline(v=i-1, lwd=5, col = rgb(0, 0, 255, 50, maxColorValue=255))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; abline(v=i-2, lwd=5, col = rgb(0, 0, 255, 25, maxColorValue=255))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Bring back the X axis&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; axis(1)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set the color of the bottom chart based on the distance of the distribution's mean from 0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; par(fg = col.range[mean(chunk)+3])&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set up the bottom chart&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; plot(density(chunk), main = "", xlab = "X Value", xlim = c(-5, 15), ylim = c(0, .6))&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Add a line that indicates the mean of the distribution. Add additional lines to track&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; # previous means&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; abline(v=mean(chunk), col = rgb(255, 0, 0, 255, maxColorValue=255))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (exists("lastmean")) {abline(v=lastmean, col = rgb(255, 0, 0, 50, maxColorValue=255)); prevlastmean &amp;lt;- lastmean;}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (exists("prevlastmean")) {abline(v=prevlastmean, col = rgb(255, 0, 0, 25, maxColorValue=255))}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; #Fix last mean calculation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; lastmean &amp;lt;- mean(chunk)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new,courier;"&gt;})&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-family: courier new,courier;"&gt;########################&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;And the final product:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://imgur.com/GLByH"&gt;&lt;img title="Hosted by imgur.com" src="http://i.imgur.com/GLByH.gif" alt="" /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A couple of closing notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Because there are external programs involved (e.g., SWF Tools, ImageMagick, FFmpeg), the setup for this package is slightly more difficult than the average package and things will likely seem less polished than normal. Things may also not work as well; you'll need to be prepared to be flexible with your animation formats and graph layouts.&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Animation works exceptionally well when smaller numbers of individual graphs are being compiled, but as the number of individual graphs grows, so does your likelihood of hitting a problem. E.g., although GIF is a very exportable and transportable format, and therefore ideal for many situations, I found that animations with more than ~500 source graphs just didn't compile. The limit for HTML was similar. Your mileage may vary, but again, be prepared to be flexible.&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;If you do not need to transport your animation and it will have fewer than a few hundred individual images, you can avoid installing third-party software by using the &lt;span style="font-family: courier new,courier;"&gt;saveHTML()&lt;/span&gt; function. This output also includes an interface that allows you to pause and move within the animation easily.&lt;/li&gt;
&lt;/ul&gt;
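&lt;p&gt;A minimal sketch of the &lt;span style="font-family: courier new,courier;"&gt;saveHTML()&lt;/span&gt; route mentioned above, assuming the animation package is installed. The image prefix "density" and the file name "density.html" are placeholders chosen for this example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(animation)

# Write the individual frames plus an HTML page with playback controls
saveHTML({
  for (i in 1:10) plot(runif(10), ylim = 0:1)
}, img.name = "density", htmlfile = "density.html")
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The loop is the same one used in the GIF example; only the wrapping function and its output arguments change.&lt;/p&gt;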
&lt;ul&gt;
&lt;li&gt;As mentioned in the code above, if you're having trouble getting a particular graphical parameter to work, make sure that it is in the internal loop. For efficiency, you want to keep the loop as clean as possible of course, but some things need to be specified each time a new chart is plotted, and therefore need to be inside the loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;Animations aren't very common in research presentations, but can provide extensive insight beyond static images. Given R's advanced graphing capabilities, it's possible to create very nice animations without needing to learn a completely different software package.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you've created an animation you'd like to share or have additional tips, feel free to add them to the comments.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/oF2WD_M2Gqw0WP_6DrQetKNFxEc/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/oF2WD_M2Gqw0WP_6DrQetKNFxEc/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/oF2WD_M2Gqw0WP_6DrQetKNFxEc/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/oF2WD_M2Gqw0WP_6DrQetKNFxEc/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/hpJnjpWR1rA" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/animations-r#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <pubDate>Tue, 24 Apr 2012 15:02:14 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">8456 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/animations-r</feedburner:origLink></item>
<item>
 <title>RStudio Development Environment</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/k41Xctaf6PU/rstudio-development-environment</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;a href="http://imgur.com/CMBrt"&gt;&lt;img src="http://i.imgur.com/CMBrtl.jpg" title="RStudio Layout" alt="RStudio Layout" style = "float: right; padding: 3px;" height = "200" /&gt;&lt;/a&gt;Compared to many other languages of equal popularity, there are realtively few development environments for R. In fact, the total number of production ready R IDEs could probably be counted on one hand. That deficiency is a small price to pay to use R and if you're not already accustomed to using IDEs for other languages, you probably haven't missed it too much. But RStudio goes a long way toward providing a full-featured R development platform, that, once you've used it, quickly becomes hard to give up again. &lt;br /&gt;&lt;br /&gt;

RStudio has some nice graphical features and the layout is clean and logical for the most part. Functionally, some of the best features are:
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Plot caching (allows you to flip back through previous graphs without rerunning them, making it much easier to review your graphical output)&lt;/li&gt;
&lt;li&gt;Function, object and &lt;em&gt;parameter&lt;/em&gt; listing and completion that works even with user-defined functions&lt;/li&gt;
&lt;li&gt;Shortcuts for quickly drilling down into functions&lt;/li&gt;
&lt;/ul&gt;&lt;br /&gt;
&lt;a href="http://imgur.com/o2QAF"&gt;&lt;img src="http://i.imgur.com/o2QAF.jpg" title="RStudio paramater completion" alt="RStudio paramater completion" alt="" height = "150" /&gt;&lt;/a&gt;
&lt;br /&gt;&lt;br /&gt;
RStudio also provides version control integration (Git, SVN) which could prove to be very helpful, but I haven't yet tested it. I can't speak to how well it works, just that it is available. 
&lt;br /&gt;&lt;br /&gt;
In addition to these positives, RStudio has an active support system with developer participation via the &lt;a href="http://support.rstudio.org/" title="RStudio Support" alt = "RStudio Support"&gt;RStudio support site&lt;/a&gt;. 
&lt;br /&gt;&lt;br /&gt;
Overall, I've been very impressed with RStudio over the past few weeks. If you haven't yet tested it, I suggest you give it a try. Given the growth of R over recent years, I think it's time we expected development tools to mature to the level that they have for other programming languages with similar levels of adoption. Sustainable, mature products will only emerge if there is constant demand in the market. 
&lt;br /&gt;&lt;br /&gt;
Already using something else? Feel free to mention your favorite R IDE in the comments.&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/YKemqJb16NqWKKbQl3cBscPseZM/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/YKemqJb16NqWKKbQl3cBscPseZM/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/YKemqJb16NqWKKbQl3cBscPseZM/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/YKemqJb16NqWKKbQl3cBscPseZM/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/k41Xctaf6PU" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/rstudio-development-environment#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <pubDate>Fri, 23 Mar 2012 17:55:55 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">8416 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/rstudio-development-environment</feedburner:origLink></item>
<item>
 <title>Installing quantstrat from R-forge and source</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/RYjWsNCd_Vs/installing-quantstrat-r-forge-and-source</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;R is used extensively in the financial industry; many of my recent clients have been working in or developing products for the financial sector. Common applications include analyzing market data and evaluating quantitative trading strategies. Custom solutions are almost always the best way to do this, but the &lt;code&gt;quantstrat&lt;/code&gt; package can make it easy to quickly get a high-level understanding of a strategy's potential. However, &lt;code&gt;quantstrat&lt;/code&gt; is still under development, and this, combined with a lack of documentation and the complex nature of the tasks involved, makes it difficult to work with. This article addresses one of the most basic issues with &lt;code&gt;quantstrat&lt;/code&gt;: getting it installed. &lt;code&gt;quantstrat&lt;/code&gt; and its required packages currently aren't available on CRAN; you have to get them from R-Forge. As a result, the installation is slightly less straightforward than for other packages and provides an opportunity to discuss how to install packages from R-Forge and locally from source. Although this article focuses on installing &lt;code&gt;quantstrat&lt;/code&gt;, these instructions will help with any R package that you need to build from source.&lt;br /&gt;
&lt;br /&gt;
If you're installing from R-Forge, the process is only slightly different from installing from CRAN; simply change the &lt;code&gt;install.packages&lt;/code&gt; command to point to the R-Forge repository:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
install.packages("FinancialInstrument", repos="http://R-Forge.R-project.org")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;install.packages("blotter", repos="http://R-Forge.R-project.org")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;install.packages("quantstrat", repos="http://R-Forge.R-project.org")&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Since the &lt;code&gt;FinancialInstrument&lt;/code&gt; and &lt;code&gt;blotter&lt;/code&gt; packages are dependencies for &lt;code&gt;quantstrat&lt;/code&gt;, you can download and install all three at once with just the last line.&lt;br /&gt;
&lt;br /&gt;
In some cases, you may need to build the packages yourself. You'll need to set your system up to compile R source code if it isn't already. To do so, follow steps 1-3 below. If your system is already set up to compile R source code, you can skip to step 4.&lt;br /&gt;
&lt;br /&gt;
# 1) Install &lt;code&gt;Rtools&lt;/code&gt; (must be done manually from &lt;a href="http://www.murdoch-sutherland.com/Rtools/" title="http://www.murdoch-sutherland.com/Rtools/"&gt;http://www.murdoch-sutherland.com/Rtools/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;# 2) Install LaTeX (the MiKTeX distribution) from &lt;a href="http://www.miktex.org/" title="www.miktex.org/"&gt;www.miktex.org/&lt;/a&gt; &lt;/p&gt;
&lt;p&gt;# 3) Install Inno Setup from &lt;a href="http://www.jrsoftware.org/" title="http://www.jrsoftware.org/"&gt;http://www.jrsoftware.org/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;# 4) Download the three package source files available from R-forge &lt;a href="http://r-forge.r-project.org/R/?group_id=316" title="http://r-forge.r-project.org/R/?group_id=316"&gt;http://r-forge.r-project.org/R/?group_id=316&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;# 5) Install the packages using the commands below (substituting the appropriate version numbers):&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
install.packages("C:/yourpath/FinancialInstrument_0.9.18.tar.gz", repos = NULL, type="source")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;install.packages("C:/yourpath/blotter_0.8.4.tar.gz", repos = NULL, type="source")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;install.packages("C:/yourpath/quantstrat_0.6.1.tar.gz", repos = NULL, type="source")&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Note that these directions are relevant until the packages are available on CRAN, after which you'll be able to download and install them like any other package (I'll make a note on this post once that happens). Also note that since these packages are under heavy development, you'll want to update them often.&lt;/p&gt;
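&lt;p&gt;If any dependencies are hosted on CRAN rather than R-Forge, it can help to list both repositories in a single call so that everything resolves at once. The following is a minimal sketch under two assumptions that are not from the original instructions: that R-Forge still hosts these packages, and that &lt;code&gt;https://cloud.r-project.org&lt;/code&gt; is an acceptable CRAN mirror for you.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Assumption: R-Forge first, then a CRAN mirror for any remaining dependencies
repos &amp;lt;- c("http://R-Forge.R-project.org", "https://cloud.r-project.org")
pkgs &amp;lt;- c("FinancialInstrument", "blotter", "quantstrat")

# Install only the packages that are not already available
is_installed &amp;lt;- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) install.packages(pkgs[!is_installed], repos = repos)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Rerunning the snippet after a successful install does nothing, which makes it convenient to keep at the top of a setup script.&lt;/p&gt;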
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/hg_Pxg7C6pHKhVHTRCWjoXeu_Ho/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/hg_Pxg7C6pHKhVHTRCWjoXeu_Ho/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/hg_Pxg7C6pHKhVHTRCWjoXeu_Ho/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/hg_Pxg7C6pHKhVHTRCWjoXeu_Ho/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/RYjWsNCd_Vs" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/installing-quantstrat-r-forge-and-source#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <pubDate>Tue, 10 Jan 2012 22:52:10 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">8356 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/installing-quantstrat-r-forge-and-source</feedburner:origLink></item>
<item>
 <title>Subscriptions Feature Added</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/NcFEEvEVrgA/subscriptions-feature-added</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;You can now subscribe to almost any content on the ProgrammingR website, including the job listings. &lt;/p&gt;
&lt;p&gt;To be notified of job listings as soon as they are posted, click the "R Jobs" link above and follow the instructions on that page to add the jobs feed to your feed reader. Because of this change, I will now be reposting jobs from the R Jobs listserv and other sources, so that you only need to subscribe to one service. If you have a source for R Jobs listings that you would like to see represented in the Job Listings section, send me a &lt;a href="http://www.programmingr.com/contact" title = "Contact page"&gt;message&lt;/a&gt; and I will attempt to include it.&lt;/p&gt;
&lt;p&gt;If you would like to submit a job or consultant listing, you can do so via the contact page.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/OaJ8e_UwjWiK7hYD2tjj-ZaJ9lg/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/OaJ8e_UwjWiK7hYD2tjj-ZaJ9lg/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/OaJ8e_UwjWiK7hYD2tjj-ZaJ9lg/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/OaJ8e_UwjWiK7hYD2tjj-ZaJ9lg/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/NcFEEvEVrgA" height="1" width="1"/&gt;</description>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <category domain="http://www.programmingr.com/category/stype/site-news">Site news</category>
 <pubDate>Wed, 07 Dec 2011 20:26:28 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">8342 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/subscriptions-feature-added</feedburner:origLink></item>
<item>
 <title>Bayesian Computation with R - Albert (2009)</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/u6qAIJcCZJI/bayesian-computation-r-albert-2009</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;div style="float:right; background-color: lightblue; border: 1px solid darkblue; padding:5px; margin: 2px; width: 325px;"&gt;&lt;strong&gt;Title:&lt;/strong&gt; Bayesian Computation with R&lt;br /&gt;&lt;strong&gt;Author(s):&lt;/strong&gt; Jim Albert&lt;br /&gt;&lt;strong&gt;Publisher/Date:&lt;/strong&gt; Springer/2009&lt;br /&gt;&lt;strong&gt;Statistics level:&lt;/strong&gt; High &lt;br /&gt;&lt;strong&gt;Programming level:&lt;/strong&gt; Low &lt;br /&gt;&lt;strong&gt;Overall recommendation:&lt;/strong&gt;  Recommended&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Bayesian Computation with R&lt;/em&gt; focuses primarily on providing the reader with a basic understanding of Bayesian thinking and the relevant analytic tools included in R. It does not explore either of those areas in detail, though it does hit the key points for both.&lt;/p&gt;
&lt;p&gt;As with many R books, the first chapter is devoted to an introduction to data manipulation and basic analyses in R. This introductory chapter focuses more heavily on analyses than many of the comparable chapters in other texts. The new R user who hasn't yet built up a library of these chapters will find it useful, but for experienced R users or those with multiple R texts, there is little new information.&lt;/p&gt;
&lt;p&gt;Albert's introduction to the foundational Bayesian concepts (e.g., Bayes' theorem) is concise and will be clear to those with a statistical background, but others may need to refresh their statistical knowledge before they can fully grasp the content in the second chapter. Those from programming backgrounds without extensive statistical knowledge may be better off beginning with a text that deals specifically with Bayesian analysis.&lt;/p&gt;
&lt;p&gt;Many of the topics discussed in this text have limited application, but possibly the most broadly applicable chapter deals with Bayesian regression. Those interested in learning how to run and diagnose Bayesian regression in R will find almost everything they need to know here.&lt;/p&gt;
&lt;p&gt;As with many R texts, &lt;em&gt;Bayesian Computation with R&lt;/em&gt; has an accompanying package of functions available on CRAN ("LearnBayes"). The functions in this package are focused mainly on teaching Bayesian analysis, but also include some useful basic implementations.&lt;/p&gt;
&lt;p&gt;This book straddles the line between introductory theory and intermediate-level statistical programming. Because of the omissions of information on each side of that line, the reader will get the most mileage from the text if he or she has access to resources (i.e., other texts, colleagues, or previous knowledge) that can fill in those omissions. For that reason, it would work well as a text for an upper-level course on Bayesian statistics and their application, but it is not well suited as a reference text, or as a guide for real-world analysis.&lt;/p&gt;
&lt;p&gt;Overall, I recommend this book, with the caveat that interested readers should review the sample pages available on the Springer website &lt;a href = "http://www.springerlink.com/content/978-0-387-92298-0" title = "" alt = "http://www.springerlink.com/content/978-0-387-92298-0"&gt;here&lt;/a&gt; and the functions in the "LearnBayes" package prior to purchasing. The text is currently available for approximately $50 in paperback and $40 for the Kindle version.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/-uBNuwVP4UYukDEYqDr0WrF1jTw/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/-uBNuwVP4UYukDEYqDr0WrF1jTw/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/-uBNuwVP4UYukDEYqDr0WrF1jTw/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/-uBNuwVP4UYukDEYqDr0WrF1jTw/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/u6qAIJcCZJI" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/bayesian-computation-r-albert-2009#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/book-reviews">Book reviews</category>
 <pubDate>Tue, 04 Oct 2011 20:30:52 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">8314 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/bayesian-computation-r-albert-2009</feedburner:origLink></item>
<item>
 <title>Building Scoring and Ranking Systems in R</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/Ght26Dct4IA/building-scoring-and-ranking-systems-r</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;&lt;em&gt;This guest article was written by &lt;a href="http://www.amazon.com/Enhanced-Indexing-Strategies-Utilizing-Performance/dp/0470259256" title = "Enhanced Indexing Strategies"&gt;author&lt;/a&gt; and &lt;a href="http://www.yates-mgt.com" title = "Yates Management"&gt;consultant&lt;/a&gt; Tristan Yates (see his bio below). It emphasizes R's data object manipulation and scoring capabilities via a detailed financial analysis example.&lt;/em&gt;&lt;br /&gt;
&lt;br /&gt;
Scoring and ranking systems are extremely valuable management tools.  They can be used to predict the future, make decisions, and improve behavior – sometimes all of the above.  Think about how the simple grade point average is used to motivate students and make admissions decisions.&lt;br /&gt;
&lt;br /&gt;
R is a great tool for building scoring and ranking systems.  It’s a programming language designed for analytical applications with statistical capabilities.  The capability to store and manipulate data in list and table form is built right into the core language.&lt;br /&gt;
&amp;nbsp;&lt;/p&gt;
&lt;!--break--&gt;&lt;!--break--&gt;&lt;p&gt;But there’s also some validity to the criticism that R provides too many choices and not enough guidance.  The best solution is to share your work with others, so in this article we show a basic design workflow for one such scoring and ranking system that can be applied to many different types of projects.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;The Approach&lt;/strong&gt;&lt;br /&gt;
For a recent article in Active Trader, we analyzed the risk of different market sectors over time with the objective of building less volatile investment portfolios.  Every month, we scored each sector by its risk, using its individual ranking within the overall population, and used these rankings to predict future risk.&lt;br /&gt;
&lt;br /&gt;
Here’s the workflow we used, and it can be applied to any scoring and ranking system that must perform over time (which most do):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Load in the historical data for every month and ticker symbol.&lt;/li&gt;
&lt;li&gt;Load in the performance data for every month and ticker symbol.&lt;/li&gt;
&lt;li&gt;Generate scores and rankings for every month and ticker symbol based upon their relative position in the population on various indicators.&lt;/li&gt;
&lt;li&gt;Review the summary and look for trends.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In these steps, we used four data frames, as shown below:&lt;br /&gt;
&lt;br &gt;&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;my.history&lt;/td&gt;
&lt;td&gt;historical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;my.scores&lt;/td&gt;
&lt;td&gt;scoring components, total scores, rankings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;my.perf&lt;/td&gt;
&lt;td&gt;performance data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;my.summary&amp;nbsp;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;summary or aggregate data&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;br /&gt;
One of my habits is to prefix my variables – it helps prevent collisions in the R namespace.&lt;br /&gt;
&lt;br /&gt;
Some people put all of their data in the same data.frame, but keeping it separate reinforces good work habits.  First, the historical data and performance data should never be manipulated, so it makes sense to keep them away from the more volatile scoring data.&lt;br /&gt;
&lt;br /&gt;
Second, it helps draw a clear distinction between what we know at one point in time – which is the historical data – and what we will know later – which is the performance data.  That’s absolutely necessary for the integrity of the scoring system.&lt;br /&gt;
&lt;br /&gt;
The my.history, my.scores, and my.perf data.frames are organized like this:&lt;br /&gt;
&lt;br &gt;&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;&amp;nbsp;yrmo&amp;nbsp;&lt;/th&gt;
&lt;th&gt;&amp;nbsp;ticker&amp;nbsp;&lt;/th&gt;
&lt;th&gt;&amp;nbsp;&amp;nbsp;var1&amp;nbsp;&amp;nbsp;&lt;/th&gt;
&lt;th&gt;&amp;nbsp;&amp;nbsp;var2&amp;nbsp;&amp;nbsp;&lt;/th&gt;
&lt;th&gt;&amp;nbsp;&amp;nbsp;etc...&amp;nbsp;&amp;nbsp;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200401&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&amp;nbsp;XLF&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;200401&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&amp;nbsp;XLB&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;etc...&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;br /&gt;
yrmo is the year and month and ticker is the item to be scored.   We maintain our own list of dates (in yrmo format) and items in my.dates and my.items.  Both these lists are called drivers, as they can help iterate through the data.frame, and we also have a useful data.frame called my.driver which has only the yrmo and ticker.&lt;br /&gt;
&lt;br /&gt;
One trick – we keep the order the same for all of these data.frames.  That way we can use indexes on one to query another.  For example, this works just fine:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;Vol.spy &lt;- my.history$vol.1[my.scores$r.vol.1==1]&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
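Here is a minimal sketch of why that works, using two small made-up data.frames that share the same row order (the names toy.history and toy.scores and all of the numbers are illustrative only, not results from our analysis):&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#toy data: row i describes the same ticker and month in both data.frames&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.history &lt;- data.frame(ticker=c("XLF","XLB","XLE"),yrmo=200401,vol.1=c(18.2,15.1,21.7))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.scores &lt;- data.frame(ticker=toy.history$ticker,yrmo=toy.history$yrmo,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;r.vol.1=rank(toy.history$vol.1))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#a logical index built from one data.frame selects the matching rows of the other&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.history$vol.1[toy.scores$r.vol.1==1]&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
The last line returns 15.1, the volatility of the ticker ranked first.  The shortcut depends entirely on the row order, so it stops being valid as soon as either data.frame is re-sorted on its own.&lt;br /&gt;
&lt;br /&gt;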
&lt;strong&gt;Loading Data&lt;/strong&gt;&lt;br /&gt;
First, we get our driver lists and my.driver data.frame set up. We select our date range and items from our population, and then build a data.frame using the rbind command.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#this is based on previous analysis&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.dates &lt;- m2$yrmo[13:(length(m2$yrmo)-3)]&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.items &lt;- ticker.list[2:10]&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#now the driver&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.driver &lt;- data.frame()&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;for (z.date in my.dates) {&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;my.driver &lt;- rbind(my.driver,data.frame(ticker=my.items,yrmo=z.date))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;}&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Next, let’s get our historical and performance data.  We can make a function that can be called once for each row in my.driver that then loads any data needed.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.seq &lt;- seq_len(nrow(my.driver))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.history &lt;- data.frame(ticker=my.driver$ticker,yrmo=my.driver$yrmo,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;vol.1=sapply(my.seq,calc.sd.fn,-1,-1))&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Each variable can be loaded by a function called with the sapply command.  The calc.sd.fn function first looks up the ticker and yrmo from my.driver using the index provided, and then returns the data.  You would have one function for each indicator that you want to load.  The my.perf data.frame, which holds the performance data, is built in exactly the same way.&lt;br /&gt;
&lt;br /&gt;
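The body of calc.sd.fn isn’t shown here, so the following is only a hypothetical sketch of the same pattern: a loader that takes a row index, looks up the ticker and yrmo in the driver, and returns one value.  The names toy.driver, toy.raw and load.ret.fn, and all of the numbers, are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#hypothetical driver and raw data source, deliberately stored in different orders&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.driver &lt;- data.frame(ticker=c("XLF","XLB","XLF","XLB"),&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;yrmo=c(200401,200401,200402,200402),stringsAsFactors=FALSE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.raw &lt;- data.frame(ticker=c("XLB","XLF","XLB","XLF"),&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;yrmo=c(200401,200401,200402,200402),ret=c(0.5,1.2,-0.3,0.8),stringsAsFactors=FALSE)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#look up the ticker and yrmo for one driver row, then return the matching value&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;load.ret.fn &lt;- function(z.index) {&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;z.hit &lt;- toy.raw$ticker==toy.driver$ticker[z.index] &amp; toy.raw$yrmo==toy.driver$yrmo[z.index]&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;toy.raw$ret[z.hit]&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;}&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#one sapply call per indicator, in driver order&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.perf &lt;- data.frame(toy.driver,ret.1=sapply(seq_len(nrow(toy.driver)),load.ret.fn))&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;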
Unfortunately, the rbind command is slow, but loading the historical and performance data only needs to be done once.&lt;br /&gt;
&lt;br /&gt;
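If the loop ever does become a bottleneck, base R’s expand.grid function can build the same kind of driver in a single call.  This is a sketch with made-up dates and tickers (toy.dates, toy.items and toy.driver are illustrative names), not the code used for the analysis:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.dates &lt;- c(200401,200402,200403)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.items &lt;- c("XLF","XLB","XLE")&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#the first argument varies fastest, so the rows come out grouped by yrmo&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.driver &lt;- expand.grid(ticker=toy.items,yrmo=toy.dates,stringsAsFactors=FALSE)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
This gives nine rows, with every ticker listed for the first month, then every ticker for the second month, and so on, which is the same order the rbind loop produces.&lt;br /&gt;
&lt;br /&gt;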
&lt;strong&gt;Scoring The Data&lt;/strong&gt;&lt;br /&gt;
This is where R really shines.  Let’s look at the highest level first.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.scores &lt;- data.frame()&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;for (z.yrmo in my.dates) {&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;my.scores &lt;- rbind(my.scores,calc.scores.fn(z.yrmo))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.scores$p.tot &lt;- (my.scores$p.vol.1)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Every indicator gets its own score, and those scores can then be combined in any conceivable way to create a total score.  In this very simple case, we’re only scoring one indicator, so we just use that score as the total score.&lt;br /&gt;
&lt;br /&gt;
For more complex applications, the ideal strategy is to use multiple indicators from multiple data sources to tell the same story.  Ignore those who advocate reducing variables and cross-correlations.  Instead, think like a doctor who wants to run just one more test and get that independent confirmation.&lt;br /&gt;
&lt;br /&gt;
Now the calc functions:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;scaled.score.fn &lt;- function(z.raw)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{pnorm(z.raw,mean(z.raw),sd(z.raw))*100}&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;scaled.rank.fn &lt;- function(z.raw) {rank(z.raw)}&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;calc.scores.fn &lt;- function(z.yrmo) {&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;z.df &lt;- my.history[my.history$yrmo==z.yrmo,]&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;z.scores &lt;- data.frame(ticker=z.df$ticker,yrmo=z.df$yrmo,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;p.vol.1=scaled.score.fn(z.df$vol.1),r.vol.1=scaled.rank.fn(z.df$vol.1))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;z.scores&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
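The calc.scores.fn function queries the data.frame to pull the population data for just a single point in time.  Then, each indicator is passed to the scaled.score.fn and scaled.rank.fn functions, which return vectors of scores and ranks.&lt;br /&gt;
&lt;br /&gt;
Here, we use the pnorm function to convert each raw value into a percentile between 0 and 100: it standardizes the value against the mean and standard deviation of that month’s population (its Z-score) and returns the corresponding cumulative normal probability.  Putting every indicator on the same scale is a good practice for ensuring that a scoring system isn’t dominated by a single indicator.&lt;br /&gt;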
&lt;br /&gt;
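As a quick worked example (the input values are made up), three raw values of 10, 20 and 30 have a mean of 20 and a standard deviation of 10, so they sit at -1, 0 and +1 standard deviations from the mean.  The same pnorm call that scaled.score.fn makes turns them into percentile-style scores:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.raw &lt;- c(10,20,30)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.score &lt;- pnorm(toy.raw,mean(toy.raw),sd(toy.raw))*100&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;round(toy.score,1)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#returns 15.9 50.0 84.1&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;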
&lt;strong&gt;Checking the Scores&lt;/strong&gt;&lt;br /&gt;
At this point, we create a new data.frame for summary analysis. We use the always useful and always confusing aggregate function and combine by rank.  Notice how easily we can combine data from my.history, my.scores and my.perf.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;data.frame(rank=1:9,p.tot=aggregate(my.scores$p.tot,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;list(rank=my.scores$r.vol.1),mean)$x,ret.1=aggregate(my.perf$ret.1,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;list(rank=my.scores$r.vol.1),mean)$x,sd.1=aggregate(my.perf$ret.1,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;list(rank=my.scores$r.vol.1),sd)$x,vol.1=aggregate(my.history$vol.1,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;list(rank=my.scores$r.vol.1),mean)$x,vol.p1=aggregate(my.history$vol.1,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;list(rank=my.scores$r.vol.1),mean)$x)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
Here’s the result.  We could check plots or correlations, but the trend – higher relative volatility in the past (vol.p1, p.tot) is more likely to mean higher relative volatility in the future (vol.1, sd.1) - is crystal clear.&lt;br /&gt;
&lt;br &gt;&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;rank&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;p.tot&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;ret.1&amp;nbsp;&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;sd.1 &amp;nbsp;&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;vol.1&amp;nbsp;&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;th&gt;&lt;FONT FACE='Courier' SIZE=1&gt;vol.p1&amp;nbsp;&amp;nbsp;&lt;/FONT&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;12.1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.131&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;4.03&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;16.5&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;13.8&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;2&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;19.4&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.0872&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;4.82&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;16.6&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;16.1&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;3&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;27.1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.2474&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;4.96&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;20.1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;18&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;4&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;35.6&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.4247&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;5.31&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;20.9&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;19.9&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;5&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;44.9&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.6865&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;5.98&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;22.1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;21.7&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;6&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;53&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.3235&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;5.84&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;21.5&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;23.2&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;7&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;65.1&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;1.019&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;5.86&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;24.6&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;25.4&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;8&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;78&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.7276&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;6.04&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;26.9&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;28.4&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;9&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;96.4&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;0.0837&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;9.34&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;35.2&lt;/FONT&gt;&lt;/td&gt;
&lt;td&gt;&lt;FONT FACE='Courier' SIZE=1&gt;38.3&lt;/FONT&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;br /&gt;
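Since aggregate can be confusing, here is a minimal sketch, on made-up numbers, of the pattern used in the summary code: it takes a vector of values, a named list of grouping vectors and a function, and returns a data.frame whose result column is called x.  The names toy.ret, toy.rank and toy.agg are illustrative only.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.ret &lt;- c(1,3,5,7)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.rank &lt;- c(1,2,1,2)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.agg &lt;- aggregate(toy.ret,list(rank=toy.rank),mean)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.agg$x&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#returns 3 5: the mean of 1 and 5 for rank 1, and of 3 and 7 for rank 2&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;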
In the case of our analysis, the scores aren’t really necessary – we’re only ranking nine items every month.  If we did have a larger population, we could use code like this to create subgroups (six groups shown here), and then use the above aggregate function with the new my.scores$group variable.&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;my.scores$group &lt;- cut(my.scores$p.tot,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;breaks=quantile(my.scores$p.tot,(0:6)/6),include.lowest=TRUE,labels=1:6)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
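To see what that produces, here is a sketch on twelve made-up scores, where toy.scores is a hypothetical stand-in for my.scores:&lt;br /&gt;
&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.scores &lt;- data.frame(p.tot=c(5,12,20,28,35,44,52,61,70,79,88,96))&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.scores$group &lt;- cut(toy.scores$p.tot,&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;breaks=quantile(toy.scores$p.tot,(0:6)/6),include.lowest=TRUE,labels=1:6)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#each of the six groups holds two of the twelve scores&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;table(toy.scores$group)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;#the group column then takes the place of the rank in aggregate&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;&amp;nbsp;&amp;nbsp;toy.summary &lt;- aggregate(toy.scores$p.tot,list(group=toy.scores$group),mean)&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;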
&lt;strong&gt;Wrap-up&lt;/strong&gt;&lt;br /&gt;
We ultimately only ended up scoring one variable, but it’s pretty easy to see how this framework could be expanded to dozens or more.  Even so, it’s an easy system to describe – we grade each item by its ranking within the population.  People don’t trust scoring systems that can’t be easily explained, and with good reason.&lt;br /&gt;
&lt;br /&gt;
There’s not a lot of code here, and that’s a testimony to R’s capabilities.  A lot of housekeeping work is done for you, and the list operations eliminate confusing nested loops.  It can be a real luxury to program in R after dealing with some other “higher level” language.&lt;br /&gt;
&lt;br /&gt;
We hope you find this useful and encourage you to share your own solutions as well.&lt;br /&gt;
&lt;br /&gt;
&lt;em&gt;Tristan Yates is the Executive Director of &lt;a href="http://www.yates-mgt.com" title = "Yates Management"&gt;Yates Management&lt;/a&gt;, a management and analytical consulting firm serving financial and military clients.  He is also the author of &lt;a href="http://www.amazon.com/Enhanced-Indexing-Strategies-Utilizing-Performance/dp/0470259256" title = "Enhanced Indexing Strategies"&gt;Enhanced Indexing Strategies&lt;/a&gt; and his writing and research have appeared in publications including the Wall Street Journal and Forbes/Investopedia.&lt;/em&gt;&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
</description>
 <comments>http://www.programmingr.com/content/building-scoring-and-ranking-systems-r#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <pubDate>Tue, 04 May 2010 00:17:40 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">67 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/building-scoring-and-ranking-systems-r</feedburner:origLink></item>
<item>
 <title>Data Manipulation with R - Spector (2008)</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/0DvfA9Ju-BA/data-manipulation-r-spector-2008</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;div style="float:right; background-color: lightblue; border: 1px solid darkblue; padding:5px; margin: 2px; width: 325px;"&gt;&lt;strong&gt;Title:&lt;/strong&gt; Data Manipulation with R&lt;br /&gt;&lt;strong&gt;Author(s):&lt;/strong&gt; Phil Spector&lt;br /&gt;&lt;strong&gt;Publisher/Date:&lt;/strong&gt; Springer/2008&lt;br /&gt;&lt;strong&gt;Statistics level:&lt;/strong&gt; N/A &lt;br /&gt;&lt;strong&gt;Programming level:&lt;/strong&gt; Intermediate &lt;br /&gt;&lt;strong&gt;Overall recommendation:&lt;/strong&gt;  Highly recommended&lt;/div&gt;
&lt;p&gt; If there is one book that every beginning R user coming from a &lt;strong&gt;programming&lt;/strong&gt; background should have, it is Spector’s &lt;em&gt;Data Manipulation with R&lt;/em&gt;.  New R users with &lt;strong&gt;analytic&lt;/strong&gt; backgrounds and experience with software packages such as SAS and SPSS will do well to start with Muenchen’s &lt;em&gt;R for SAS and SPSS Users&lt;/em&gt;, especially given that a free abbreviated version is &lt;a href = "http://rforsasandspssusers.com/" title = "http://rforsasandspssusers.com/" alt = "http://rforsasandspssusers.com/"&gt;available&lt;/a&gt;, but those users should also make &lt;em&gt;Data Manipulation with R&lt;/em&gt; a quick second addition to their library.&lt;/p&gt;
&lt;!--break--&gt;&lt;!--break--&gt;&lt;p&gt;The text of this book is as concise and to the point as its title. It covers almost every relevant data manipulation topic in R, from modes and classes, through accessing data via database connections, to complex reshaping and aggregating functions. It has copious examples and the text hits just the right level of sophistication for the individual who has some experience with programming, but little experience with R idioms and data manipulation techniques. &lt;/p&gt;
&lt;p&gt;My only critique of this book is that it skips over the basics of creating user-defined functions for data manipulation tasks. Spector addresses mapping functions to various data structures, but it seems likely that, at this level, the average R analyst would be better served by a discussion of how to simply create a function in R. Keep in mind that if you are looking for that type of information, you will need to look elsewhere. The same is true if you are looking for any sort of statistical instruction, as &lt;em&gt;Data Manipulation with R&lt;/em&gt; focuses almost exclusively on programming.&lt;/p&gt;
&lt;p&gt;Overall, I highly recommend this book. At around $45 USD, it is well worth the price. You’ll breeze through it on your first pass, but if you're new to R you will get your money’s worth out of it as a reference text.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
</description>
 <comments>http://www.programmingr.com/content/data-manipulation-r-spector-2008#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/book-reviews">Book reviews</category>
 <pubDate>Fri, 30 Apr 2010 17:31:40 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">65 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/data-manipulation-r-spector-2008</feedburner:origLink></item>
<item>
 <title>Webscraping using readLines and RCurl</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/TRcTSX9JBeY/webscraping-using-readlines-and-rcurl</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;There is a massive amount of data available on the web.  Some of it is in the form of precompiled, downloadable datasets which are easy to access.  But the majority of online data exists as web content such as blogs, news stories and cooking recipes.  With precompiled files, accessing the data is fairly straightforward; just download the file, unzip if necessary, and import into R.  For "wild" data, however, getting the data into an analyzable format is more difficult.  Accessing online data of this sort is sometimes referred to as "webscraping".  Two R facilities, &lt;code&gt;readLines()&lt;/code&gt; from the base package and &lt;code&gt;getURL()&lt;/code&gt; from the RCurl package, make this task possible.&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;h2&gt;readLines&lt;/h2&gt;
&lt;p&gt;For basic webscraping tasks the &lt;code&gt;readLines()&lt;/code&gt; function will usually suffice.  &lt;code&gt;readLines()&lt;/code&gt; allows simple access to webpage source data on non-secure servers.  In its simplest form, &lt;code&gt;readLines()&lt;/code&gt; takes a single argument - the URL of the web page to be read:  &lt;/p&gt;
&lt;p&gt;&lt;code&gt;web_page &lt;- readLines("http://www.interestingwebsite.com")&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;As an example of a (somewhat) practical use of webscraping, imagine a scenario in which we wanted to know the 10 most frequent posters to the R-help listserv for January 2009.  Because the listserv is on a secure site (i.e., it has https:// rather than http:// in the URL) we can't easily access the live version with &lt;code&gt;readLines()&lt;/code&gt;.  So for this example, I've posted a local copy of the list archives on this site.&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;One note: by itself, &lt;code&gt;readLines()&lt;/code&gt; can only acquire the data.  You'll need to use &lt;code&gt;grep(), gsub()&lt;/code&gt; or equivalents to parse the data and keep what you need.  &lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Get the page's source&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;web_page &lt;- readLines("http://www.programmingr.com/jan09rlist.html")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Pull out the appropriate line&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;author_lines &lt;- web_page[grep("&amp;lt;I&amp;gt;", web_page)]&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Delete unwanted characters in the lines we pulled out&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;authors &lt;- gsub("&amp;lt;I&amp;gt;", "", author_lines, fixed = TRUE)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Present only the ten most frequent posters&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;author_counts &lt;- sort(table(authors), decreasing = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;author_counts[1:10]&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;div style="text-align: center"&gt;&lt;a href="/images/webscrape1.jpg"&gt;&lt;img height="300" title="webscrape results" src="/images/webscrape1.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;We can see that Gabor Grothendieck was the most frequent poster to R-help in January 2009.&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
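&lt;p&gt;If you want to try the parsing steps without a network connection, the same calls work on any character vector.  The sketch below uses a few made-up lines that only imitate the archive's layout (the names are placeholders, and &lt;code&gt;toy_page&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;web_page&lt;/code&gt;).  It uses &lt;code&gt;head()&lt;/code&gt; in place of &lt;code&gt;[1:10]&lt;/code&gt; so that a list with fewer than ten authors is handled cleanly.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Made-up stand-in for the lines returned by readLines()&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_page &lt;- c("&amp;lt;LI&amp;gt;[R] first question", "&amp;lt;I&amp;gt;Alice Example", "&amp;lt;LI&amp;gt;[R] a reply", "&amp;lt;I&amp;gt;Bob Example", "&amp;lt;LI&amp;gt;[R] second question", "&amp;lt;I&amp;gt;Alice Example")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Keep only the author lines, then strip the tag&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_lines &lt;- grep("&amp;lt;I&amp;gt;", toy_page, fixed = TRUE, value = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_authors &lt;- sub("&amp;lt;I&amp;gt;", "", toy_lines, fixed = TRUE)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Count posts per author, most frequent first&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_counts &lt;- sort(table(toy_authors), decreasing = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;head(toy_counts, 10)&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;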
&lt;h2&gt;The RCurl package&lt;/h2&gt;
&lt;p&gt;To get more advanced http features such as POST capabilities and https access, you'll need to use the RCurl package.  To do webscraping tasks with the RCurl package use the &lt;code&gt;getURL()&lt;/code&gt; function.  After the data has been acquired via &lt;code&gt;getURL()&lt;/code&gt;, it needs to be restructured and parsed.  The &lt;code&gt;htmlTreeParse()&lt;/code&gt; function from the XML package is tailored for just this task.  Using &lt;code&gt;getURL()&lt;/code&gt; we can access a secure site so we can use the live site as an example this time. &lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Install the RCurl package if necessary&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;install.packages("RCurl", dependencies = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;library("RCurl")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Install the XML package if necessary&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;install.packages("XML", dependencies = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;library("XML")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Get the January 2009 archive&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;jan09 &lt;- getURL("https://stat.ethz.ch/pipermail/r-help/2009-January/date.html", ssl.verifypeer = FALSE)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;jan09_parsed &lt;- htmlTreeParse(jan09)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Continue on similar to above&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;...&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
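&lt;p&gt;One way to continue, sketched here on a tiny inline HTML string rather than the live archive so that it runs offline, is to parse the text with &lt;code&gt;htmlParse()&lt;/code&gt; and pull out the author nodes with &lt;code&gt;xpathSApply()&lt;/code&gt;, both from the XML package.  The markup and names below are made up, and the XPath expression assumes that authors sit in &lt;code&gt;&amp;lt;i&amp;gt;&lt;/code&gt; elements, as the &lt;code&gt;grep()&lt;/code&gt; pattern in the &lt;code&gt;readLines()&lt;/code&gt; example suggests.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;library("XML")&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Made-up stand-in for the text returned by getURL()&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_html &lt;- "&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;&amp;lt;i&amp;gt;Alice Example&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;&amp;lt;i&amp;gt;Bob Example&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;&amp;lt;i&amp;gt;Alice Example&amp;lt;/i&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;"&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Parse the text and extract the contents of every &amp;lt;i&amp;gt; element&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_doc &lt;- htmlParse(toy_html, asText = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_authors &lt;- trimws(xpathSApply(toy_doc, "//i", xmlValue))&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;# Tabulate as before&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;toy_counts &lt;- sort(table(toy_authors), decreasing = TRUE)&lt;/code&gt;&lt;br /&gt;
&lt;code&gt;head(toy_counts, 10)&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;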
&lt;p&gt;For basic webscraping tasks, &lt;code&gt;readLines()&lt;/code&gt; will be enough and avoids overcomplicating the task.  For more difficult procedures, or for tasks requiring other http features, &lt;code&gt;getURL()&lt;/code&gt; or other functions from the RCurl package may be required.  For more information on cURL visit the project page &lt;a href="http://curl.haxx.se/" title="cURL project page"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
</description>
 <comments>http://www.programmingr.com/content/webscraping-using-readlines-and-rcurl#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <pubDate>Tue, 14 Apr 2009 22:56:42 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">42 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/webscraping-using-readlines-and-rcurl</feedburner:origLink></item>
<item>
 <title>Helpful statistical references</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/AliH1X2F24c/helpful-statistical-references</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;In a &lt;a href="http://www.programmingr.com/content/online-r-programming-resources" title="R-programming resources"&gt;previous article&lt;/a&gt; I provided a list of R programming resources.  As a complement to that post, I've compiled a list of statistically oriented websites that colleagues and I have found useful below.  For the most part, these sites focus on statistics and quantitative research methods rather than programming.  &lt;/p&gt;
&lt;p&gt;This first grouping lists sites that are mostly one-stop shops for research design and analytical information.  The first two (and especially the UCLA website) are Tier I statistics/research methods sites.  They are indispensable.  The three remaining sites in this section cover less advanced topics and focus more on basics, but may be helpful for the R user who is more programmer than statistician.&lt;/p&gt;
&lt;p&gt;The second group of sites consists of technical references such as statistical dictionaries and notation guides.  The final section lists two sites that have detailed information and examples focused on running statistical analyses in R.  Note that the UCLA site also includes many examples using R.  &lt;/p&gt;
&lt;h3&gt;Comprehensive coverage&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.ats.ucla.edu/stat/" title="Statistical computing at UCLA"&gt;Statistical computing at UCLA&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://faculty.chass.ncsu.edu/garson/PA765/statnote.htm" title="Statnotes: Topics in Multivariate Analysis, by G. David Garson"&gt;Statnotes: Topics in Multivariate Analysis, by G. David Garson&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.psychstat.missouristate.edu/introbook/sbk00.htm" title="Introductory Statistics: Concepts, models, and applications"&gt;Introductory Statistics: Concepts, models, and applications&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.socialresearchmethods.net/kb/contents.php" title="Social Research Methods Knowledge Base"&gt;Social Research Methods Knowledge Base&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://mathworld.wolfram.com/" title="Wolfram MathWorld"&gt;Wolfram MathWorld&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;Technical References&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.statsoft.com/textbook/glosfra.html" title="StatSoft statistical glossary"&gt;StatSoft statistical glossary&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.mathpropress.com/notation/notation.html" title="Glossary of technical notation"&gt;Glossary of technical notation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.itl.nist.gov/div897/sqg/dads/" title="Dictionary of Algorithms and Data Structures"&gt;Dictionary of Algorithms and Data Structures&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;R-specific sites&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://www.jstatsoft.org/" title="Journal of Statistical Software"&gt;Journal of Statistical Software&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.statmethods.net/" title="QuickR"&gt;QuickR&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you know of another site for either R programming or statistics that I've missed, mention it in the comments below and I'll add it to the proper list. &lt;/p&gt;
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/CoxWRi4I5Jqa3d9lF2MU4nWmZHI/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CoxWRi4I5Jqa3d9lF2MU4nWmZHI/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/CoxWRi4I5Jqa3d9lF2MU4nWmZHI/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/CoxWRi4I5Jqa3d9lF2MU4nWmZHI/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/AliH1X2F24c" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/helpful-statistical-references#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <category domain="http://www.programmingr.com/category/stype/help">Help</category>
 <pubDate>Tue, 07 Apr 2009 12:49:21 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">38 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/helpful-statistical-references</feedburner:origLink></item>
<item>
 <title>Positioning charts with fig and fin</title>
 <link>http://feedproxy.google.com/~r/ProgrammingR/~3/36Z1c5fMvlY/positioning-charts-fig-and-fin</link>
 <description>&lt;!-- google_ad_section_start --&gt;&lt;p&gt;R offers several ways to arrange multiple graphs in a single graphing space.  The &lt;code&gt;layout()&lt;/code&gt; function and the &lt;code&gt;mfrow&lt;/code&gt;/&lt;code&gt;mfcol&lt;/code&gt; parameter settings are adequate solutions for many tasks and allow the graphing space to be broken up into tabular or matrix-based arrangements.  For more fine-grained manipulation, the &lt;code&gt;fig&lt;/code&gt; and &lt;code&gt;fin&lt;/code&gt; parameter settings are available.  This article illustrates the capabilities and use of &lt;code&gt;fig&lt;/code&gt; and &lt;code&gt;fin&lt;/code&gt;.&lt;br /&gt;
&lt;br /&gt;
First we'll create some simulation data to work with:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# create data: five columns of eight uniform random values between 0 and 100&lt;br /&gt;
sim.data &lt;- replicate(5, runif(8, min = 0, max = 100))&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
&lt;br /&gt;
The code above results in a matrix object with eight rows and five columns; the examples below use the first three columns.&lt;br /&gt;
&lt;br /&gt;
The &lt;code&gt;fig&lt;/code&gt; and &lt;code&gt;fin&lt;/code&gt; parameters affect the same graphing element, the figure region, via different units.  The &lt;code&gt;fig&lt;/code&gt; parameter takes normalized device coordinates (NDC), and &lt;code&gt;fin&lt;/code&gt; takes the width and height of the figure region in inches.  Because the &lt;code&gt;fig&lt;/code&gt; units are generally more user friendly, I will use them in the examples below; however, setting equivalent dimensions with &lt;code&gt;fin&lt;/code&gt; would size the figure region the same way.  Similar to other functions that use NDC to define graphing space, &lt;code&gt;fig&lt;/code&gt; takes a four-item vector wherein positions one and three define, as proportions of the device region, the starting points along the x and y axes, respectively, while positions two and four define the end points.  The default &lt;code&gt;fig&lt;/code&gt; setting is &lt;code&gt;(0, 1, 0, 1)&lt;/code&gt; and uses the entire device space.  The graph below illustrates the default settings of &lt;code&gt;fig&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# graph cases by first column using default fig&lt;br /&gt;
# settings of 0 1 0 1 (the full device width and height)&lt;br /&gt;
par(mar=c(2, 2, 1, 1), new = FALSE, cex.axis = .6, mgp = c(0, 0, 0))&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
#open plot&lt;br /&gt;
plot(c(0,100), c(-1,1), type = "n", ylab = "", yaxt = "n", xlab = "")&lt;br /&gt;
points(sim.data[,1], replicate(8, 0), pch = 19, col = 1:8, cex = 1.5)&lt;br /&gt;
# add center reference line&lt;br /&gt;
abline(0,0)&lt;br /&gt;
legend("bottomright", fill = c(1:8), legend = c(1:8), ncol = 4)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;div style="text-align: center"&gt;&lt;a href="/images/fig/fig1.jpg"&gt;&lt;img height="300" title="fig default" alt="fig default" src="/images/fig/fig1.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;&lt;br/&gt;&lt;br /&gt;
To make the horizontal dimensions of the graph smaller or to move the graph left or right, adjust the starting and ending x coordinates, given by the first and second positions of the &lt;code&gt;fig&lt;/code&gt; value vector.  To make the vertical dimensions of the graph smaller or to move the graph up or down, adjust the starting and ending y coordinates given in the third and fourth positions, as below.  &lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# decrease vertical span&lt;br /&gt;
par(fig=c(0, 1, .2, .8))&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
#open plot&lt;br /&gt;
plot(c(0,100), c(-1,1), type = "n", ylab = "", yaxt = "n", xlab = "")&lt;br /&gt;
points(sim.data[,1], replicate(8, 0), pch = 19, col = 1:8, cex = 1.5)&lt;br /&gt;
# add center reference line&lt;br /&gt;
abline(0,0)&lt;br /&gt;
legend("bottomright", fill = c(1:8), legend = c(1:8), ncol = 4)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;div style="text-align: center"&gt;&lt;a href="/images/fig/fig2.jpg"&gt;&lt;img height="300" title="fig thin" alt="fig thin" src="/images/fig/fig2.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;&lt;br/&gt;&lt;br /&gt;
It is possible to resize and move a single graph to any position on the graphing device using the approach above.  You can also use this method to add multiple graphs of various sizes to a single device:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# place graph one in the bottom left&lt;br /&gt;
par(fig=c(0, .25, 0, .25), mar=c(2,.5,1,.5), mgp=c(0, 1, 0))&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
#open plot&lt;br /&gt;
plot(c(0,100), c(-1,1), type = "n", ylab = "", yaxt = "n", xlab = "")&lt;br /&gt;
points(sim.data[,1], replicate(8, 0), pch = 19, col = 1:8)&lt;br /&gt;
# add center reference line&lt;br /&gt;
abline(0,0)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# place graph two in the top right&lt;br /&gt;
# set graphing parameters for next plot and set new parameter to TRUE&lt;br /&gt;
par(fig=c(.75, 1, .75, 1), new = TRUE)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
#open plot&lt;br /&gt;
plot(c(0,100), c(-1,1), type = "n", ylab = "", yaxt = "n", xlab = "")&lt;br /&gt;
points(sim.data[,2], replicate(8, 0), pch = 19, col = 1:8)&lt;br /&gt;
# add center reference line&lt;br /&gt;
abline(0,0)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
# place main graph in the center&lt;br /&gt;
# set graphing parameters for next plot and set new parameter to TRUE&lt;br /&gt;
par(fig=c(.25, .75, .25, .75), new = TRUE)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;br /&gt;
#open plot&lt;br /&gt;
plot(c(0,100), c(-1,1), type = "n", ylab = "", yaxt = "n", xlab = "")&lt;br /&gt;
points(sim.data[,3], replicate(8, 0), pch = 19, col = 1:8, cex = 1.5)&lt;br /&gt;
# add center reference line&lt;br /&gt;
abline(0,0)&lt;br /&gt;
legend("bottomright", fill = c(1:8), legend = c(1:8), ncol = 4)&lt;br /&gt;
&lt;/code&gt;&lt;/p&gt;
&lt;div style="text-align: center"&gt;&lt;a href="/images/fig/fig3.jpg"&gt;&lt;img height="300" title="fig multiple" alt="fig multiple" src="/images/fig/fig3.jpg"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;&lt;br/&gt;&lt;br /&gt;
For simplicity, I have mostly avoided labels and titles in these graphs; however, they can be added and manipulated as they would be without the use of &lt;code&gt;fig&lt;/code&gt; or &lt;code&gt;fin&lt;/code&gt;.&lt;/p&gt;
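&lt;p&gt;To illustrate adding titles, here is a minimal sketch that wraps the plotting calls used in this article in a small helper and passes a title through &lt;code&gt;main&lt;/code&gt;.  The helper &lt;code&gt;place.plot&lt;/code&gt;, its arguments, the off-screen &lt;code&gt;pdf(NULL)&lt;/code&gt; device, and the region coordinates are hypothetical choices for this example, not part of the original code.&lt;/p&gt;
&lt;pre&gt;
```r
# Hypothetical helper: draw one strip plot with a title in the region
# given by fig. Set add = FALSE for the first plot on a device.
place.plot = function(values, fig, title, add = TRUE) {
  par(fig = fig, new = add, mar = c(2, 0.5, 2, 0.5))
  plot(c(0, 100), c(-1, 1), type = "n", ylab = "", yaxt = "n", xlab = "",
       main = title, cex.main = 0.8)
  points(values, rep(0, length(values)), pch = 19, col = seq_along(values))
  abline(h = 0)          # center reference line
  invisible(par("fig"))  # return the region that was used
}

# Assumed off-screen device and regenerated data so the sketch runs alone.
pdf(NULL, width = 7, height = 7)
set.seed(1)
sim.data = replicate(3, runif(8, min = 0, max = 100))

# bottom-left quadrant first, then top-right quadrant on the same device
first = place.plot(sim.data[, 1], c(0, 0.5, 0, 0.5), "Column 1", add = FALSE)
second = place.plot(sim.data[, 2], c(0.5, 1, 0.5, 1), "Column 2")
invisible(dev.off())
```
&lt;/pre&gt;
&lt;p&gt;The top margin is widened to two lines here so the title has room; smaller regions may need smaller margins or a smaller &lt;code&gt;cex.main&lt;/code&gt;.&lt;/p&gt;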
&lt;!-- google_ad_section_end --&gt;
&lt;p&gt;&lt;a href="http://feedads.g.doubleclick.net/~a/ybWINVoMjkDzsKhofrXzhv7vnvI/0/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/ybWINVoMjkDzsKhofrXzhv7vnvI/0/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;br/&gt;
&lt;a href="http://feedads.g.doubleclick.net/~a/ybWINVoMjkDzsKhofrXzhv7vnvI/1/da"&gt;&lt;img src="http://feedads.g.doubleclick.net/~a/ybWINVoMjkDzsKhofrXzhv7vnvI/1/di" border="0" ismap="true"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/ProgrammingR/~4/36Z1c5fMvlY" height="1" width="1"/&gt;</description>
 <comments>http://www.programmingr.com/content/positioning-charts-fig-and-fin#comments</comments>
 <category domain="http://www.programmingr.com/category/stype/articles">Articles</category>
 <category domain="http://www.programmingr.com/category/stype/help">Help</category>
 <pubDate>Tue, 10 Feb 2009 03:14:19 +0000</pubDate>
 <dc:creator>bryan</dc:creator>
 <guid isPermaLink="false">24 at http://www.programmingr.com</guid>
<feedburner:origLink>http://www.programmingr.com/content/positioning-charts-fig-and-fin</feedburner:origLink></item>
</channel>
</rss>

