<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;DkMCQH0yeSp7ImA9WhFSEk8.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998</id><updated>2013-06-14T08:41:01.391-07:00</updated><category term="random forest" /><category term="packages" /><category term="time-series." /><category term="finance" /><category term="forecasting" /><category term="time series cross-validation" /><category term="intro" /><category term="model building" /><category term="Global Warming" /><category term="caret" /><category term="web scraping" /><category term="data acquisition" /><category term="time series" /><category term="kaggle" /><category term="Stocks" /><category term="error metrics" /><category term="Great Australian Sheep Decline" /><category term="meta" /><category term="R graphics" /><category term="Friday" /><category term="recessions" /><category term="time-series" /><category term="backtesting" /><category term="DEoptim" /><category term="cross-validation" /><category term="optimization" /><category term="pokeR" /><category term="big list" /><category term="predictive modeling" /><category term="feature selection" /><category term="heritage prize" /><category term="R" /><title>Modern Toolmaking</title><subtitle type="html">Practical tools for predictive modeling, data science, machine learning and web scraping</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/posts/default" 
/><link rel="alternate" type="text/html" href="http://moderntoolmaking.blogspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>31</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/ModernToolMaking" /><feedburner:info uri="moderntoolmaking" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry gd:etag="W/&quot;CUcBRX49fSp7ImA9WhBQFEQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-519105528515483607</id><published>2013-03-16T20:34:00.002-07:00</published><updated>2013-03-16T21:04:14.065-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-16T21:04:14.065-07:00</app:edited><title>caretEnsemble Classification example</title><content type="html">Here's a quick demo of how to fit a binary classification model with caretEnsemble. &amp;nbsp;Please note that I haven't spent as much time debugging caretEnsemble for&amp;nbsp;classification&amp;nbsp;models, so there's probably more bugs than my last post. &amp;nbsp;Also note that&amp;nbsp;multi class&amp;nbsp;models are not yet supported.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/zachmayer/5179418.js"&gt;&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Right now, this code fails for me if I try a model like a nnet or an SVM for stacking, so there are clearly bugs to fix.&lt;br /&gt;
&lt;br /&gt;
The greedy model&amp;nbsp;relies&amp;nbsp;100% on the gbm, which makes sense as the gbm has an AUC of 1 on the training set. &amp;nbsp;The linear model uses all of the models, and&amp;nbsp;achieves&amp;nbsp;an AUC of 0.5. &amp;nbsp;This is a little weird, as the gbm, rf, SVM, and knn all achieve an AUC of close to 1.0 on the&amp;nbsp;training&amp;nbsp;set, and I would have expected the linear model to focus on these predictions. I'm not sure if this is a bug, or a failure of my stacking model.&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/cx5p2XTEywk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/519105528515483607/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2013/03/caretensemble-classification-example.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/519105528515483607?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/519105528515483607?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/cx5p2XTEywk/caretensemble-classification-example.html" title="caretEnsemble Classification example" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>8</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2013/03/caretensemble-classification-example.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;D0MEQnkzeCp7ImA9WhBQEUU.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-7426198950183713714</id><published>2013-03-13T07:36:00.001-07:00</published><updated>2013-03-13T07:36:43.780-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-03-13T07:36:43.780-07:00</app:edited><title>New package for ensembling R models</title><content type="html">&lt;br /&gt;
I've written a new R package called&amp;nbsp;&lt;a href="https://github.com/zachmayer/caretEnsemble"&gt;caretEnsemble&lt;/a&gt;&amp;nbsp;for creating ensembles of&amp;nbsp;&lt;a href="http://cran.r-project.org/web/packages/caret/index.html"&gt;caret models&lt;/a&gt;&amp;nbsp;in R. &amp;nbsp;It currently works well for regression models, and I've written some preliminary support for binary classification models.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
At this point, I've got 2 different&amp;nbsp;algorithms&amp;nbsp;for combining models:&lt;br /&gt;
&lt;br /&gt;
1. Greedy stepwise ensembles (returns a weight for each model)&lt;br /&gt;
2. Stacks of caret models&lt;br /&gt;
&lt;br /&gt;
(You can also manually specify weights for a greedy&amp;nbsp;ensemble)&lt;br /&gt;
&lt;br /&gt;
The greedy algorithm&amp;nbsp;is based on the work of &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.2859&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;Caruana et al., 2004&lt;/a&gt;, and inspired by the &lt;a href="https://github.com/mewo2/medley"&gt;medley package&lt;/a&gt; here on github. &amp;nbsp;The stacking&amp;nbsp;algorithm&amp;nbsp;simply builds a second caret model on top of the existing models (using their predictions as input), and employs all of the&amp;nbsp;flexibility&amp;nbsp;of the caret package.&lt;br /&gt;
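To make the greedy idea concrete, here is a minimal sketch of stepwise selection with replacement, in the spirit of Caruana et al. It is written in plain Python rather than R and is not the caretEnsemble code: the function name greedy_weights, the RMSE criterion, and the fixed number of iterations are all assumptions made for illustration.&lt;br /&gt;

```python
def greedy_weights(preds, y, iterations=10):
    """Hypothetical helper: greedy ensemble selection with replacement.

    preds maps a model name to its out-of-sample predictions, y holds the
    observed values. Returns a weight per model (selection count / iterations).
    """
    def rmse(p):
        return (sum((a - b) ** 2 for a, b in zip(p, y)) / len(y)) ** 0.5

    counts = {name: 0 for name in preds}
    total = [0.0] * len(y)  # running sum of the selected models' predictions
    for step in range(1, iterations + 1):
        # Try adding each model once more and keep whichever addition gives
        # the averaged ensemble the lowest RMSE (ties go to the first model).
        best = min(
            preds,
            key=lambda m: rmse([(t + p) / step for t, p in zip(total, preds[m])]),
        )
        counts[best] += 1
        total = [t + p for t, p in zip(total, preds[best])]
    return {name: c / iterations for name, c in counts.items()}
```

Because a model can be picked more than once, the selection counts act as the weights. A model that never improves the averaged prediction ends up with a weight of zero.&lt;br /&gt;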
&lt;br /&gt;
All the models in the ensemble must use the same training/test folds. &amp;nbsp;Both&amp;nbsp;algorithms&amp;nbsp;use the out-of-sample predictions to find the weights and train the stack. Here's a brief script demonstrating how to use the package:&lt;br /&gt;&lt;script src="https://gist.github.com/zachmayer/5152157.js"&gt;&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Please feel free to submit any comments here or on github. &amp;nbsp;I'd also be happy to include any patches you feel like submitting. &amp;nbsp;In particular, I could use some help writing support for multi-class models, writing more tests, and fixing bugs.&lt;br /&gt;
&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/WGvyK22LOXY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/7426198950183713714/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2013/03/new-package-for-ensembling-r-models.html#comment-form" title="32 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7426198950183713714?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7426198950183713714?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/WGvyK22LOXY/new-package-for-ensembling-r-models.html" title="New package for ensembling R models" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>32</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2013/03/new-package-for-ensembling-r-models.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkUHRHs_fSp7ImA9WhNaEEQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-8800239507461677408</id><published>2013-01-24T16:22:00.001-08:00</published><updated>2013-01-24T22:10:35.545-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2013-01-24T22:10:35.545-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="time series" /><category scheme="http://www.blogger.com/atom/ns#" term="cross-validation" /><category scheme="http://www.blogger.com/atom/ns#" term="caret" /><category scheme="http://www.blogger.com/atom/ns#" term="time 
series cross-validation" /><title>Time series cross-validation 5</title><content type="html">The &lt;a href="http://cran.r-project.org/web/packages/caret/news.html"&gt;caret package for R&lt;/a&gt; now supports time series cross-validation! &amp;nbsp;(Look for&amp;nbsp;version 5.15-052 in the news file). &amp;nbsp;You can use the&amp;nbsp;createTimeSlices function to do time-series cross-validation with a fixed window, as well as a growing window. &amp;nbsp;This function generates a list of indexes for the training set, as well as a list of indexes for the test set, which you can then pass to the trainControl object.&lt;br /&gt;
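The fixed-window versus growing-window distinction can be sketched in a few lines. This is an illustrative Python reimplementation of the idea, not caret's createTimeSlices itself: the function name time_slices and its arguments are assumptions, and the indexes are 0-based where R's are 1-based.&lt;br /&gt;

```python
def time_slices(n, initial_window, horizon=1, fixed_window=True):
    """Hypothetical helper: build train/test index lists for n ordered points.

    With fixed_window=True every training set has initial_window points;
    with fixed_window=False the training set grows from the first point.
    """
    train, test = [], []
    # Each slice trains on points before `stop` and tests on the next
    # `horizon` points, so `stop` can go no further than n - horizon.
    for stop in range(initial_window, n - horizon + 1):
        start = stop - initial_window if fixed_window else 0
        train.append(list(range(start, stop)))
        test.append(list(range(stop, stop + horizon)))
    return train, test
```

For 6 observations, an initial window of 3 and a horizon of 2, this yields two slices. The second training set is [1, 2, 3] with a fixed window and [0, 1, 2, 3] with a growing one.&lt;br /&gt;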
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
Caret does not currently support univariate time series models (like arima, auto.arima and ets), but perhaps that functionality is&amp;nbsp;coming&amp;nbsp;in the future? &amp;nbsp;I'd also love to see someone write a timeSeriesSummary function for caret that calculates error at each horizon in the test set and a createTimeResamples function, perhaps using the&amp;nbsp;&lt;a href="http://cran.r-project.org/web/packages/meboot/index.html"&gt;Maximum Entropy Bootstrap&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Here's a quick demo of how you might use this new functionality:&lt;br /&gt;
&lt;script src="https://gist.github.com/4630129.js"&gt;&lt;/script&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/8y7CWAZOwpk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/8800239507461677408/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2013/01/time-series-cross-validation-5.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8800239507461677408?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8800239507461677408?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/8y7CWAZOwpk/time-series-cross-validation-5.html" title="Time series cross-validation 5" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>1</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2013/01/time-series-cross-validation-5.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Dk8NRHY-eip7ImA9WhJSFUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-9085482304525032851</id><published>2012-07-06T10:08:00.001-07:00</published><updated>2012-07-06T10:14:55.852-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-07-06T10:14:55.852-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="error metrics" /><category scheme="http://www.blogger.com/atom/ns#" term="predictive modeling" /><category scheme="http://www.blogger.com/atom/ns#" term="caret" /><category 
scheme="http://www.blogger.com/atom/ns#" term="kaggle" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Error metrics for multi-class problems in R: beyond Accuracy and Kappa</title><content type="html">The caret package for R provides a variety of error metrics for regression models and 2-class classification models, but only calculates Accuracy and Kappa for multi-class models. &amp;nbsp;Therefore, I wrote the following function to allow &lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=caret:train"&gt;caret:::train&lt;/a&gt; to calculate a wide variety of error metrics for multi-class problems:&lt;br /&gt;
&lt;br /&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/3061272.js?file=multiclass.R"&gt;
&lt;/script&gt;&lt;/div&gt;
&lt;div&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
This function was prompted by &lt;a href="http://stats.stackexchange.com/q/31579/2817"&gt;a question on Cross Validated&lt;/a&gt;, asking what the optimal value of k is for a knn model fit to the iris dataset. &amp;nbsp;I wanted to look at statistics besides accuracy and kappa, so I wrote a wrapper function for &lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=SDMTools:confusion.matrix"&gt;caret:::confusionMatrix&lt;/a&gt; and auc and logLoss from the Metrics package. &amp;nbsp;Use the following code to fit a knn model to the iris dataset, aggregate all of the metrics, and save a plot for each metric to a pdf file:&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/3061272.js?file=testit.R"&gt;
&lt;/script&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
This demonstrates that, depending on what metric you use, you will end up with a different model. &amp;nbsp;For example, Accuracy seems to peak around k = 17:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-FuVT0CPBKEY/T_catMEw6tI/AAAAAAAADsU/QiBbSiNc7PU/s1600/Accuracy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://2.bp.blogspot.com/-FuVT0CPBKEY/T_catMEw6tI/AAAAAAAADsU/QiBbSiNc7PU/s640/Accuracy.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
AUC and logLoss, on the other hand, seem to point to a value around k = 6:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-B6P3-Y3xjvk/T_ca1H_gY1I/AAAAAAAADsc/iqHbFMjovFs/s1600/ROC.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://3.bp.blogspot.com/-B6P3-Y3xjvk/T_ca1H_gY1I/AAAAAAAADsc/iqHbFMjovFs/s640/ROC.png" width="640" /&gt;&lt;/a&gt;&lt;a href="http://3.bp.blogspot.com/-Ol4wk03HDdI/T_ca1-b3bII/AAAAAAAADsk/J85bRONkgTU/s1600/logLoss.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://3.bp.blogspot.com/-Ol4wk03HDdI/T_ca1-b3bII/AAAAAAAADsk/J85bRONkgTU/s640/logLoss.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
You can also increase the number of cross-validation repeats, or use a different method of&amp;nbsp;re-sampling, such as&amp;nbsp;bootstrap&amp;nbsp;re-sampling.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/pDhlbhmMFT8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/9085482304525032851/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2012/07/error-metrics-for-multi-class-problems.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/9085482304525032851?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/9085482304525032851?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/pDhlbhmMFT8/error-metrics-for-multi-class-problems.html" title="Error metrics for multi-class problems in R: beyond Accuracy and Kappa" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-FuVT0CPBKEY/T_catMEw6tI/AAAAAAAADsU/QiBbSiNc7PU/s72-c/Accuracy.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2012/07/error-metrics-for-multi-class-problems.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;AkANQ3s8cCp7ImA9WhVaFE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-2439596149656717424</id><published>2012-06-11T11:15:00.000-07:00</published><updated>2012-06-11T11:19:52.578-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-11T11:19:52.578-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="forecasting" /><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="cross-validation" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="time-series" /><title>Time series cross-validation 4: forecasting the S&amp;P 500</title><content type="html">I finally got around to publishing my&amp;nbsp;&lt;a href="https://github.com/zachmayer/cv.ts"&gt;time series cross-validation package&lt;/a&gt; to github, and I plan to push it out to CRAN &amp;nbsp;shortly.&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
You can clone the repo using github for &lt;a href="http://mac.github.com/"&gt;mac&lt;/a&gt;,&amp;nbsp;for &lt;a href="https://github.com/blog/1127-github-for-windows"&gt;windows&lt;/a&gt;, or linux, and then run the following script to check it out:&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;This script downloads monthly data for S&amp;amp;P 500 (adjusted for splits and dividends), and, for each month from 1995 to the present, fits a&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Forecasting#Na.C3.AFve_approach"&gt;naive model&lt;/a&gt;, an&amp;nbsp;&lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=forecast:auto.arima"&gt;auto.arima()&lt;/a&gt;&amp;nbsp;model, and an&amp;nbsp;&lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=forecast:ets"&gt;ets()&lt;/a&gt;&amp;nbsp;model to the past 5 years' worth of data and uses those models to predict&amp;nbsp;S&amp;amp;P 500 prices for the next 12 months (note that the progress bar doesn't update if you register a parallel&amp;nbsp;backend. &amp;nbsp;I can't figure out how to fix this bug):
&lt;br /&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/2911611.js?file=cv.ts 4.R"&gt;
&lt;/script&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The naive model outperforms the arima and exponential smoothing models, both of which take into account seasonal patterns, trends, and mean-reversion! &amp;nbsp;Furthermore, we're not just using any arima/exponential smoothing model: at each step we're selecting the best model, based on the last 5 years' worth of data. &amp;nbsp;(The ets model slightly outperforms the naive model at the 3 month horizon, but not the 2 month or 4 month horizons).&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-J_lEZj7UzpY/T9Y1BJU-24I/AAAAAAAADr8/Rv_R7_SwxNM/s1600/Rplot03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/-J_lEZj7UzpY/T9Y1BJU-24I/AAAAAAAADr8/Rv_R7_SwxNM/s640/Rplot03.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Forecasting equities prices is hard!&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/0onDRU5p0U0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/2439596149656717424/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2012/06/time-series-cross-validation-4.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2439596149656717424?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2439596149656717424?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/0onDRU5p0U0/time-series-cross-validation-4.html" title="Time series cross-validation 4: forecasting the S&amp;P 500" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-J_lEZj7UzpY/T9Y1BJU-24I/AAAAAAAADr8/Rv_R7_SwxNM/s72-c/Rplot03.png" height="72" width="72" /><thr:total>2</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2012/06/time-series-cross-validation-4.html</feedburner:origLink></entry><entry gd:etag="W/&quot;C0EHQnw-eip7ImA9WhRUFU0.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-6811770712018337569</id><published>2012-01-23T06:53:00.000-08:00</published><updated>2012-01-25T06:27:13.252-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-01-25T06:27:13.252-08:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="packages" /><category scheme="http://www.blogger.com/atom/ns#" term="DEoptim" /><category scheme="http://www.blogger.com/atom/ns#" term="optimization" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>My first R package: parallel differential evolution</title><content type="html">UPDATE: a better parallel algorithm will be included in a future version of DEoptim, so I've removed my package from CRAN. &amp;nbsp;You can still use the code from this post, but keep Josh's comments in mind.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Last night I was working on a difficult optimization problem, using the wonderful &lt;a href="http://cran.r-project.org/web/packages/DEoptim/index.html"&gt;DEoptim&lt;/a&gt;&amp;nbsp;package for R.&amp;nbsp;Unfortunately, the optimization was taking a long time, so I thought I'd speed it up using a &lt;a href="http://cran.r-project.org/web/packages/foreach/index.html"&gt;foreach&lt;/a&gt;&amp;nbsp;loop, which resulted in the following function:&lt;br /&gt;
&lt;script src="https://gist.github.com/1663393.js?file=1.parDEoptim.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Here's what's going on: I divide the bounds for each parameter into n segments, and use a foreach loop to run DEoptim on each segment, collect the results of the loop, and then return the optimization results for the segment with the lowest value of the objective function. &amp;nbsp;Additionally, I defined a "parDEoptim" class to make it easier to combine the results during the foreach loop. &amp;nbsp;All of the work is still being done by the&amp;nbsp;&lt;a href="http://cran.r-project.org/web/packages/DEoptim/index.html"&gt;DEoptim&lt;/a&gt;&amp;nbsp;algorithm. &amp;nbsp;All I've done is split up the problem into several chunks.&lt;br /&gt;
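The split-and-pick-the-best pattern can be sketched as follows, for a one-dimensional problem and in plain Python. This is only an illustration under stated assumptions: a simple grid search (grid_minimize) stands in for DEoptim, the loop runs serially where the post uses foreach, and all three function names are hypothetical.&lt;br /&gt;

```python
def segment_bounds(lower, upper, n):
    """Split the interval [lower, upper] into n equal-width (lo, hi) segments."""
    width = (upper - lower) / n
    return [(lower + i * width, lower + (i + 1) * width) for i in range(n)]


def grid_minimize(f, lo, hi, points=101):
    """Stand-in optimizer (not DEoptim): evaluate f on an even grid.

    Returns a (best_value, best_x) tuple for the segment [lo, hi].
    """
    xs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return min((f(x), x) for x in xs)


def segmented_minimize(f, lower, upper, n):
    # Each segment is optimized independently, so this loop is the part
    # that a parallel backend could distribute across workers.
    results = [grid_minimize(f, lo, hi) for lo, hi in segment_bounds(lower, upper, n)]
    # Keep the segment result with the lowest objective value.
    return min(results)
```

Calling segmented_minimize with f(x) = (x - 1)^2 on the interval from -10 to 10 with 20 segments returns the minimum at x = 1, which sits on the shared boundary of two neighbouring segments.&lt;br /&gt;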
&lt;br /&gt;
Here is an example, straight out of the DEoptim documentation:&lt;br /&gt;
&lt;script src="https://gist.github.com/1663393.js?file=2.%20Example.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
In theory, on a 20-core machine, this should run a bit faster than the serial example. &amp;nbsp;Note that you may need to set itermax for the parallel run at a higher value than (itermax for the serial run)/(number of segments), as you want to make sure the&amp;nbsp;algorithm&amp;nbsp;can find the minimum of each segment. &amp;nbsp;Also note that, in this example, there are 20 segments on the interval c(-10,-10) to c(10,10), which means that 2 of the segments have&amp;nbsp;boundaries&amp;nbsp;at c(1,1), which is the global minimum of the function. &amp;nbsp;The DEoptim&amp;nbsp;algorithm&amp;nbsp;has no trouble finding a solution at the boundary of the parameter space, which is why it's so easy to parallelize.&lt;br /&gt;
&lt;br /&gt;
&lt;a href="http://stackoverflow.com/a/8967426/345660"&gt;Rumor has it that the next version of DEoptim&lt;/a&gt; will include foreach parallelization, but if you can't wait until then, &lt;a href="http://cran.r-project.org/web/packages/parDEoptim/index.html"&gt;I rolled up the above function into an R package&lt;/a&gt; and posted it to CRAN. &amp;nbsp;Let me know what you think!&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/MaPOQSNzRuQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/6811770712018337569/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2012/01/my-first-r-package-parallel.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/6811770712018337569?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/6811770712018337569?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/MaPOQSNzRuQ/my-first-r-package-parallel.html" title="My first R package: parallel differential evolution" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>1</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2012/01/my-first-r-package-parallel.html</feedburner:origLink></entry><entry 
gd:etag="W/&quot;AkANRng5eSp7ImA9WhVaFE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-2207986019344795590</id><published>2011-12-29T10:47:00.000-08:00</published><updated>2012-06-11T11:19:57.621-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-11T11:19:57.621-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="forecasting" /><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="cross-validation" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="time-series" /><title>Benchmarking time series models</title><content type="html">This is a quick post on the importance of benchmarking time-series forecasts. &amp;nbsp;First we need to reload the functions from my &lt;a href="http://moderntoolmaking.blogspot.com/2011/11/functional-and-parallel-time-series.html"&gt;last&lt;/a&gt; few &lt;a href="http://moderntoolmaking.blogspot.com/2011/11/time-series-cross-validation-2.html"&gt;posts&lt;/a&gt; on &lt;a href="http://moderntoolmaking.blogspot.com/2011/12/time-series-cross-validation-3.html"&gt;time-series cross-validation&lt;/a&gt;. &amp;nbsp;(I copied the relevant code at the bottom of this post so you don't have to find it).&lt;br /&gt;
&lt;br /&gt;
Next, we need to load data for the S&amp;amp;P 500. &amp;nbsp;To simplify things, and allow us to explore seasonality effects, I'm going to load monthly data, back to 1980.&lt;br /&gt;
&lt;script src="https://gist.github.com/1535323.js?file=2.%20Load%20Data.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;The object "Data" has monthly closing prices for the S&amp;amp;P 500 back to 1980. &amp;nbsp;Next, we cross validate 3 time series forecasting models: &amp;nbsp;&lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=forecast:auto.arima"&gt;auto.arima&lt;/a&gt;, from the &lt;a href="http://cran.r-project.org/web/packages/forecast/index.html"&gt;forecast package&lt;/a&gt;, &amp;nbsp;a mean forecast, that returns the mean value over the last year, and a naive forecast, which assumes the next value of the series will be equal to the present value. &amp;nbsp;These last 2 forecasts serve as benchmarks, to help determine if auto.arima would be useful for forecasting the S&amp;amp;P 500. &amp;nbsp;Also note that I'm using &lt;a href="http://en.wikipedia.org/wiki/Bayesian_information_criterion"&gt;BIC&lt;/a&gt; as a criterion for selecting arima models, and I have trace on so you can see the results of the model selection process.&lt;br /&gt;
&lt;script src="https://gist.github.com/1535323.js?file=3.%20Cross-Validate.R"&gt;
&lt;/script&gt;&lt;br /&gt;
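The two benchmark forecasts are simple enough to sketch directly. The following plain-Python illustration is not the cv.ts code from the gist: the function names, the 12-observation window for the mean forecast (one year of monthly data), and the use of mean absolute error are assumptions made for the example.&lt;br /&gt;

```python
def naive_forecast(history, horizon):
    """Naive benchmark: every future value equals the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon, window=12):
    """Mean benchmark: every future value equals the mean of the last `window` points."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


def mae_by_horizon(series, forecaster, min_train, horizon):
    """Rolling-origin evaluation: mean absolute error at each forecast horizon."""
    errors = [[] for _ in range(horizon)]
    for origin in range(min_train, len(series) - horizon + 1):
        # Fit on everything before the origin, then score each step ahead.
        forecast = forecaster(series[:origin], horizon)
        for h in range(horizon):
            errors[h].append(abs(series[origin + h] - forecast[h]))
    return [sum(e) / len(e) for e in errors]
```

Any candidate model can be scored with the same mae_by_horizon call and compared against these two benchmarks horizon by horizon.&lt;br /&gt;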
&lt;br /&gt;
After the 3 models finish cross-validating, it is useful to plot their forecast&amp;nbsp;errors&amp;nbsp;at different horizons. &amp;nbsp;As you can see, auto.arima performs much better than the mean model, but is&amp;nbsp;consistently&amp;nbsp;worse than the naive model. &amp;nbsp;This illustrates the importance of benchmarking forecasts. &amp;nbsp;If you can't&amp;nbsp;consistently&amp;nbsp;beat a naive forecast, there's no reason to waste processing power on a useless model.&lt;br /&gt;
&lt;script src="https://gist.github.com/1535323.js?file=4.%20Plot%20results"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-JaOZAidhkvc/Tvy1kMdT_jI/AAAAAAAACs4/0fLPE7MgBOs/s1600/Untitled.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="512" src="http://1.bp.blogspot.com/-JaOZAidhkvc/Tvy1kMdT_jI/AAAAAAAACs4/0fLPE7MgBOs/s640/Untitled.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, here is all the code in one place. &amp;nbsp;Note that you can parallelize the cv.ts function by loading your favorite foreach backend.&lt;br /&gt;
&lt;script src="https://gist.github.com/1535323.js?file=5.%20All%20together%20now.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;script src="https://gist.github.com/1535323.js?file=4.%20Plot%20results"&gt;
&lt;/script&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/dw8AiQYImkw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/2207986019344795590/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/12/benchmarking-time-series-models.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2207986019344795590?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2207986019344795590?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/dw8AiQYImkw/benchmarking-time-series-models.html" title="Benchmarking time series models" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-JaOZAidhkvc/Tvy1kMdT_jI/AAAAAAAACs4/0fLPE7MgBOs/s72-c/Untitled.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/12/benchmarking-time-series-models.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8EQHc-eyp7ImA9WhVaFE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-4938464697721256066</id><published>2011-12-12T08:37:00.000-08:00</published><updated>2012-06-11T11:20:01.953-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-11T11:20:01.953-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="forecasting" /><category 
scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="cross-validation" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="time-series" /><title>Time series cross-validation 3</title><content type="html">I've updated my &lt;a href="http://moderntoolmaking.blogspot.com/2011/11/time-series-cross-validation-2.html"&gt;time-series cross-validation&amp;nbsp;algorithm&lt;/a&gt;&amp;nbsp;to fix some bugs and allow for an optional xreg term. &amp;nbsp;This allows for cross-validation of multivariate models, so long as they are specified as a function with the following parameters: x (the series to model), xreg (independent&amp;nbsp;variables, optional), newxreg (xregs for the forecast), and h (the number of periods to forecast). &amp;nbsp;Note that h should equal the number of rows in the xreg matrix. &amp;nbsp;Also note that you need to forecast the xreg object BEFORE forecasting your x object. &amp;nbsp;For example, if you wish to forecast 12 months into the future, your xreg object should have 12 extra rows.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;Here is the source code for the new function:&lt;br /&gt;
&lt;script src="https://gist.github.com/1468089.js?file=cv.ts.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
And here is an example, using a linear model with xregs:&lt;br /&gt;
&lt;script src="https://gist.github.com/1468089.js?file=test.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
I am particularly excited about this code because it will allow me to apply&amp;nbsp;arbitrary&amp;nbsp;machine learning&amp;nbsp;algorithms&amp;nbsp;to forecasting problems. &amp;nbsp;For example, I could create an xreg matrix of lags and use a support vector machine, neural network, or random forest to make 1-step forecasts. &amp;nbsp;I am planning to release this code as a package on CRAN, once I finish the documentation. &amp;nbsp;I'm also planning to re-work the function a bit to return an S3 class, containing:&lt;br /&gt;
&lt;br /&gt;
&lt;ol&gt;
&lt;li&gt;Predicted values at each forecast horizon, including beyond the length of the input time series.&lt;/li&gt;
&lt;li&gt;Actual values at each forecast horizon, for easy comparison to #1.&lt;/li&gt;
&lt;li&gt;Matrix of average error at each horizon.&lt;/li&gt;
&lt;li&gt;The final model.&lt;/li&gt;
&lt;li&gt;Forecasts using the final model, from the last&amp;nbsp;observation&amp;nbsp;of x to the "max horizon".&lt;/li&gt;
&lt;li&gt;A print method that will show #3.&lt;/li&gt;
&lt;li&gt;A plot method that will plot #3.&lt;/li&gt;
&lt;/ol&gt;
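As a rough illustration of the "xreg matrix of lags" idea mentioned above, here is a minimal sketch using base R's embed(). &amp;nbsp;The series, the choice of 3 lags, and the use of lm() as a stand-in for a fancier learner are all assumptions made for the example.&lt;br /&gt;

&lt;pre&gt;
```r
# Build a matrix of lagged predictors with embed(); x is a made-up series.
x = c(5, 7, 6, 9, 12, 10, 13, 16, 14, 18)
n_lags = 3  # hypothetical choice

# embed(x, n_lags + 1): each row is (x[t], x[t-1], ..., x[t-n_lags])
lagged = embed(x, n_lags + 1)
y = lagged[, 1]                      # value to predict
xreg = lagged[, -1, drop = FALSE]    # columns are lag 1, lag 2, lag 3
colnames(xreg) = paste0("lag", 1:n_lags)

# Any regression learner could be fit here; lm() is only a stand-in.
fit = lm(y ~ ., data = data.frame(y = y, xreg))

# 1-step forecast: the newest observations become the lags.
newx = as.data.frame(t(rev(tail(x, n_lags))))
colnames(newx) = colnames(xreg)
predict(fit, newdata = newx)
```
&lt;/pre&gt;
Swapping lm() for a support vector machine or random forest would only change the fit and predict calls; the lag matrix stays the same.&lt;br /&gt;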
&lt;div&gt;
Let me know if you have any suggestions or spot any bugs!&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/-8PZ4LZdJoI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/4938464697721256066/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/12/time-series-cross-validation-3.html#comment-form" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4938464697721256066?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4938464697721256066?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/-8PZ4LZdJoI/time-series-cross-validation-3.html" title="Time series cross-validation 3" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>9</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/12/time-series-cross-validation-3.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8ESXc5eyp7ImA9WhVaFE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-7130418460671365820</id><published>2011-12-05T06:16:00.001-08:00</published><updated>2012-06-11T11:20:08.923-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-11T11:20:08.923-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="pokeR" /><title>A pure R poker hand evaluator</title><content type="html">There's already &lt;a href="http://www.codingthewheel.com/archives/poker-hand-evaluator-roundup"&gt;a 
lot of great&lt;/a&gt;&amp;nbsp;posts &lt;a href="http://www.suffecool.net/poker/evaluator.html"&gt;out there&lt;/a&gt; about poker hand evaluators, so I'll keep this short. &amp;nbsp;Kenneth J. Shackleton recently released a very slick 5-card and 7-card poker hand evaluator called &lt;a href="https://github.com/SpecialK/SpecialKEval"&gt;SpecialK&lt;/a&gt;. &amp;nbsp;This evaluator is&amp;nbsp;licensed&amp;nbsp;under &lt;a href="https://github.com/SpecialK/SpecialKEval/blob/master/GPLv3LICENSE.rtf"&gt;GPL 3&lt;/a&gt;, and is described in detail in 2 blog posts: &lt;a href="http://specialk-coding.blogspot.com/2010/04/texas-holdem-7-card-evaluator_23.html"&gt;part 1&lt;/a&gt; and &lt;a href="http://specialk-coding.blogspot.com/2011/02/texas-holdem-7-card-evaluator-part-ii.html"&gt;part 2&lt;/a&gt;. &amp;nbsp;Since the provided code is open source, I felt free to hack around with it a bit, and ported the Python source to R.&lt;br /&gt;
&lt;div&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;
You can &lt;a href="https://gist.github.com/1433677"&gt;download my code from github&lt;/a&gt;, and save it to&amp;nbsp;~/SpecialK/R. &amp;nbsp;Run the following script to initialize the&amp;nbsp;evaluator&amp;nbsp;and test it out. &amp;nbsp;Higher numbers=better hands.&lt;/div&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/1433677.js?file=test.R"&gt;
&lt;/script&gt;&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
My R code is almost identical to the Python source, as all I did was change Python classes to R lists and make the lists 1-indexed (R) rather than 0-indexed (Python). &amp;nbsp;I also used the wonderful &lt;a href="http://cran.r-project.org/web/packages/bitops/index.html"&gt;bitops package&lt;/a&gt; for bitwise&amp;nbsp;operations. &amp;nbsp;Obviously this code should be vectorized, but I don't have time to do that right now. &amp;nbsp;I also used the "compiler" package to speed things up somewhat, but the "SevenEval" file still takes a long time to load.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Right now, I'm getting about 18,000 hands per second on my Core i5 laptop, which is pretty crappy compared to the 250 million hands per second&amp;nbsp;Shackleton&amp;nbsp;reports for the C++ version. &amp;nbsp;Vectorization should further increase&amp;nbsp;performance, but I don't think pure R is ever going to approach C++ in terms of raw speed. &amp;nbsp;While this was a fun port to make, I think this code is an obvious&amp;nbsp;candidate&amp;nbsp;for a rewrite using the &lt;a href="http://dirk.eddelbuettel.com/code/rcpp.html"&gt;Rcpp package&lt;/a&gt;.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
I'm interested to see how far the pure R code can be optimized.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/al3b1oY6NcI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/7130418460671365820/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/12/pure-r-poker-hand-evaluator.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7130418460671365820?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7130418460671365820?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/al3b1oY6NcI/pure-r-poker-hand-evaluator.html" title="A pure R poker hand evaluator" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/12/pure-r-poker-hand-evaluator.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8ASXc6eip7ImA9WhVaFE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-2528227784416056842</id><published>2011-11-22T07:12:00.001-08:00</published><updated>2012-06-11T11:20:48.912-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2012-06-11T11:20:48.912-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="forecasting" /><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="cross-validation" /><category 
scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="time-series" /><title>Time series cross-validation 2</title><content type="html">In my &lt;a href="http://moderntoolmaking.blogspot.com/2011/11/functional-and-parallel-time-series.html"&gt;previous post&lt;/a&gt;, I shared a function for parallel time-series cross-validation, based on &lt;a href="http://robjhyndman.com/researchtips/tscvexample/"&gt;Rob Hyndman's code&lt;/a&gt;. &amp;nbsp;I thought I'd expand on that example a little bit, and share some additional wrapper functions I wrote to test other forecasting&amp;nbsp;algorithms. &amp;nbsp;Before you try this at home, be sure to load the&amp;nbsp;&lt;span class="Apple-style-span" style="background-color: white; font-family: 'Bitstream Vera Sans Mono', 'Courier New', monospace; font-size: 12px; line-height: 16px; white-space: pre;"&gt;cv.ts &lt;/span&gt;and&amp;nbsp;&lt;span class="Apple-style-span" style="background-color: white; font-family: 'Bitstream Vera Sans Mono', 'Courier New', monospace; font-size: 12px; line-height: 16px; white-space: pre;"&gt;tsSummary &lt;/span&gt;functions from my &lt;a href="http://moderntoolmaking.blogspot.com/2011/11/functional-and-parallel-time-series.html"&gt;last post&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;script src="https://gist.github.com/1383028.js?file=additional%20methods.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
These functions add &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=forecast:rw.f"&gt;random walk models&lt;/a&gt;, the &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=forecast:theta"&gt;theta method&lt;/a&gt;, &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=forecast:forecast.StructTS"&gt;structural time series&lt;/a&gt;, &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=stats:stl"&gt;seasonal decomposition&lt;/a&gt;, &amp;nbsp;and &lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=forecast:meanf"&gt;simple mean forecasts&lt;/a&gt;&amp;nbsp;to our cross-validation repertoire. &amp;nbsp;The following code fits each of these models to the example dataset, and charts their&amp;nbsp;accuracies&amp;nbsp;out to a forecast horizon of 12 months. &amp;nbsp;Note that none of this code runs in parallel, but if you wish, you can parallelize things by loading your favorite &lt;a href="http://cran.r-project.org/web/packages/foreach/index.html"&gt;foreach&lt;/a&gt;&amp;nbsp;backend. &amp;nbsp;I would suggest running meanf, rwf, thetaf, and the linear model before loading a parallel backend, as all of these methods run very fast and do not need parallelization.&lt;br /&gt;
&lt;script src="https://gist.github.com/1383028.js?file=additional%20demos.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Here is the resulting figure. &amp;nbsp;As you can see, the mean forecast is very&amp;nbsp;inaccurate, but provides a useful baseline. &amp;nbsp;The random walk forecast and the theta forecast are both improvements, but they ignore the series' seasonal component and have a clear seasonal error pattern. &amp;nbsp;StructTS and STL are clustered down at the bottom with accuracies similar to the linear model, arima model, and exponential smoothing model:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-OQwVO2fbV3c/Tsu-EtGQ1dI/AAAAAAAACsg/0pRx5fA6wdU/s1600/All.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://2.bp.blogspot.com/-OQwVO2fbV3c/Tsu-EtGQ1dI/AAAAAAAACsg/0pRx5fA6wdU/s400/All.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
If we ignore the mean, random walk, and theta forecasts, we get the following figure:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-JQKhzcTUjRM/Tsu-PiTZDbI/AAAAAAAACso/k5GdweSzpT4/s1600/small.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://3.bp.blogspot.com/-JQKhzcTUjRM/Tsu-PiTZDbI/AAAAAAAACso/k5GdweSzpT4/s400/small.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
As you can see, the structural time series is close to the exponential smoothing model in accuracy, while the two seasonal decomposition models are consistently worse. &amp;nbsp;The arima model still&amp;nbsp;outperforms&amp;nbsp;all other models, at every forecast horizon.&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
(Note that I've found a couple of bugs in my ts.cv function. &amp;nbsp;It seems to not be working when fixed=TRUE, and it also doesn't like being told to just look at 1-step forecasts. &amp;nbsp;I'll try to fix both bugs soon.)&lt;/div&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/QN9jbNQNO7U" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/2528227784416056842/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/11/time-series-cross-validation-2.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2528227784416056842?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2528227784416056842?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/QN9jbNQNO7U/time-series-cross-validation-2.html" title="Time series cross-validation 2" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/-OQwVO2fbV3c/Tsu-EtGQ1dI/AAAAAAAACsg/0pRx5fA6wdU/s72-c/All.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/11/time-series-cross-validation-2.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DkAFRXYzfyp7ImA9WhRSGEQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-7348824675472939652</id><published>2011-11-21T08:24:00.001-08:00</published><updated>2011-11-21T08:58:34.887-08:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-11-21T08:58:34.887-08:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="time-series." 
/><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Functional and Parallel time series cross-validation</title><content type="html">&lt;a href="http://robjhyndman.com/researchtips/tscvexample/"&gt;Rob Hyndman has a great post&lt;/a&gt; on his blog with an example of how to cross-validate a time series model. &amp;nbsp;The basic concept is simple: &amp;nbsp;You start with a minimum number of observations (k), and fit a model (e.g. an arima model) to those observations. &amp;nbsp;You then forecast out to a certain horizon (h), and compare your forecasts to the actual values for that series. &amp;nbsp;You then add the next observation (k+1), and repeat the process until you run out of data. &amp;nbsp;This gives you a matrix of forecast&amp;nbsp;accuracies&amp;nbsp;at various horizons (1 step ahead, 2 steps ahead, all the way to h steps ahead). &amp;nbsp;You then take the mean of each column of the matrix, and get the model's average accuracy at that horizon. &amp;nbsp;This method is&amp;nbsp;analogous&amp;nbsp;to leave-one-out&amp;nbsp;cross-validation.&lt;br /&gt;
&lt;br /&gt;
There are 2 variations to this method:&lt;br /&gt;
1. Use a fixed training window. &amp;nbsp;In this case, when you add an observation to your "training" series, you drop the first observation, keeping the training window fixed.&lt;br /&gt;
2. Increment by n at each step, rather than 1. &amp;nbsp;This is&amp;nbsp;analogous&amp;nbsp;to k-fold&amp;nbsp;cross-validation. &amp;nbsp;In this case, your forecast error is more unstable, and it's a good idea to average error across ALL horizons when evaluating the model.&lt;br /&gt;
&lt;br /&gt;
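The procedure and its two variations can be sketched in a few lines of base R. &amp;nbsp;This is a simplified, sequential illustration, not the parallel function shared later in this post: the name rolling_cv, its arguments, and the simulated random walk are all assumptions made for the example, and mean absolute error is just one possible error metric.&lt;br /&gt;

&lt;pre&gt;
```r
# Hypothetical helper illustrating rolling-origin cross-validation.
# FUN takes (x, h) and returns a vector of h point forecasts.
rolling_cv = function(x, FUN, min_obs, h = 1, step = 1, fixed = FALSE) {
  x = as.numeric(x)
  n = length(x)
  stopifnot(n - h >= min_obs)
  origins = seq(min_obs, n - h, by = step)  # last index of each training set
  abs_err = matrix(NA_real_, nrow = length(origins), ncol = h)
  for (i in seq_along(origins)) {
    k = origins[i]
    first = if (fixed) k - min_obs + 1 else 1  # fixed window drops old points
    train = x[first:k]
    actual = x[(k + 1):(k + h)]
    abs_err[i, ] = abs(FUN(train, h) - actual)
  }
  colMeans(abs_err)  # mean absolute error at horizons 1..h
}

# Simple forecaster to plug in: repeat the last observed value.
naive = function(x, h) rep(tail(x, 1), h)

set.seed(1)
walk = cumsum(rnorm(60))  # simulated random walk, not real data

rolling_cv(walk, naive, min_obs = 24, h = 6)               # growing window
rolling_cv(walk, naive, min_obs = 24, h = 6, fixed = TRUE) # fixed window
rolling_cv(walk, naive, min_obs = 24, h = 6, step = 12)    # increment by 12
```
&lt;/pre&gt;
Each row of the error matrix is one forecast origin and each column is one horizon, so the column means are the per-horizon accuracies described above.&lt;br /&gt;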
This technique is very useful, because it allows you to define a horizon of interest (say 1 month or 12 months), and then assess how well your model performs at that horizon. &amp;nbsp;Furthermore, you can use this data to compare various models, including different types of models, such as linear models vs. arima models vs. exponential smoothing models.&lt;br /&gt;
&lt;br /&gt;
However, time series cross-validation is very time consuming, particularly for arima and exponential smoothing models. &amp;nbsp;Therefore, I thought it would be a good idea to parallelize Hyndman's algorithm, using the &lt;a href="http://cran.r-project.org/web/packages/foreach/index.html"&gt;foreach&lt;/a&gt; package in R. &amp;nbsp;Furthermore, I wrapped the entire thing into a single function, which allows you to easily change the type of cross-validation by altering the minObs (k), stepSize (n), and fixed-length or growing window parameters. &amp;nbsp;My function takes an argument tsControl, which contains each of these parameters, as well as a summary function to calculate your error metric (such as MAE). &amp;nbsp;I've structured it similarly to the &lt;a href="http://cran.r-project.org/web/packages/caret/index.html"&gt;caret&lt;/a&gt;&amp;nbsp;package's &lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=caret:train"&gt;train function&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1383028.js?file=ts.cv.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
The FUN argument should be a function that takes 2 parameters: x and h. &amp;nbsp;x is a&amp;nbsp;univariate&amp;nbsp;time series, and h is a forecast horizon. &amp;nbsp;The function should build a model using x, and then return h forecasts. &amp;nbsp;Here are some examples of this function, for linear models, arima models, and exponential smoothing models. &amp;nbsp;As you can see, they all return the same output: a vector of h point forecasts.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1383028.js?file=forecast%20methods.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Here is my replication of&amp;nbsp;Hyndman's&amp;nbsp;example. &amp;nbsp;Note that I create a "tsControl" list which contains the parameters for the cross-validation&amp;nbsp;algorithm. &amp;nbsp;This example is not&amp;nbsp;parallelized, but you can easily change that by loading your favorite "foreach" backend, such as &lt;a href="http://cran.r-project.org/web/packages/doMC/"&gt;doMC&lt;/a&gt; or &lt;a href="http://cran.r-project.org/web/packages/doRedis/index.html"&gt;doRedis&lt;/a&gt;. &amp;nbsp;Note that&amp;nbsp;parallelization&amp;nbsp;introduces a lot of overhead, and actually seems to slow down the linear model. &amp;nbsp;I recommend only running the more complicated models (such as ets and Arima) in parallel.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1383028.js?file=example.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
This function produces the same results as&amp;nbsp;Hyndman's&amp;nbsp;example:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-lkttbcc3SWI/Tsp_YFethQI/AAAAAAAACsY/frG55hOOZTE/s1600/hyndmanGraph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://1.bp.blogspot.com/-lkttbcc3SWI/Tsp_YFethQI/AAAAAAAACsY/frG55hOOZTE/s400/hyndmanGraph.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
This code is very early stage, and could contain bugs. &amp;nbsp;Please comment if you find one, or if you think of any ways to improve this&amp;nbsp;algorithm. &amp;nbsp;Feel free to try it out with your own forecasting functions!&lt;/div&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/tA55YStURrM" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/7348824675472939652/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/11/functional-and-parallel-time-series.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7348824675472939652?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7348824675472939652?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/tA55YStURrM/functional-and-parallel-time-series.html" title="Functional and Parallel time series cross-validation" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-lkttbcc3SWI/Tsp_YFethQI/AAAAAAAACsY/frG55hOOZTE/s72-c/hyndmanGraph.png" height="72" width="72" /><thr:total>5</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/11/functional-and-parallel-time-series.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE4HRXkzeSp7ImA9WhdaEk0.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-7418395567813557597</id><published>2011-10-21T07:21:00.000-07:00</published><updated>2011-10-21T07:22:14.781-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-21T07:22:14.781-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="Stocks" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Backtesting Part 4: random strategies</title><content type="html">&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 14px; line-height: 15px;"&gt;&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 14px; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;Note: This post is NOT financial advice! &amp;nbsp;This is just a fun way to explore some of the capabilities R has for importing and manipulating data. &amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
In &lt;a href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-part-2-splits-dividends.html"&gt;part 2&lt;/a&gt;, we found that our 200-day high, hold 100 days strategy yielded average annual returns on the S&amp;amp;P 500 index of 7% in a backtest going back to 1950. &amp;nbsp;That's pretty good, but it's entirely possible that we got lucky and this was due to chance. &amp;nbsp;One way to test this is to compare our strategy to a strategy that randomly chooses long or short each day. &amp;nbsp;I tested just such a strategy on the S&amp;amp;P 500 index, again going back to 1950. &amp;nbsp;I also adjusted the returns series for splits and dividends, and assumed trade costs were 0.5%. &amp;nbsp;These are the exact same conditions I tested the "high-and-hold" strategy under.&lt;/div&gt;
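For a sense of how such a test can be set up, here is a minimal sketch on simulated returns. &amp;nbsp;It is not the code below the fold: the function name random_strategy, the simulated return series, and the choice to charge the 0.5% cost once per position change are all assumptions made for illustration.&lt;br /&gt;

&lt;pre&gt;
```r
set.seed(42)
n_days = 2520  # roughly 10 years of trading days
daily_ret = rnorm(n_days, mean = 0.0003, sd = 0.01)  # simulated, not index data
cost = 0.005   # 0.5%, assumed to be charged whenever the position changes

# Hypothetical helper: pick long (+1) or short (-1) at random each day and
# return the annualized return of the resulting strategy.
random_strategy = function(ret, cost) {
  pos = sample(c(-1, 1), length(ret), replace = TRUE)
  flips = c(1, abs(diff(pos)) / 2)  # 1 on day one and on every position change
  strat_ret = pos * ret - flips * cost
  prod(1 + strat_ret) ^ (252 / length(ret)) - 1
}

# Distribution of annualized returns across 200 random strategies
annual = replicate(200, random_strategy(daily_ret, cost))
summary(annual)
```
&lt;/pre&gt;
Calling hist(annual) on the result gives the same kind of picture as the histogram in this post, though the numbers will differ because the returns here are simulated.&lt;br /&gt;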
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Here is a histogram of average annual returns for these random strategies:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-_Qpdhn6idRk/TqF-9JQAU1I/AAAAAAAACr4/8m5KDMTtPyg/s1600/srtats.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://4.bp.blogspot.com/-_Qpdhn6idRk/TqF-9JQAU1I/AAAAAAAACr4/8m5KDMTtPyg/s400/srtats.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
As you can see, these strategies perform pretty dismally: the 0.5% trading cost quickly eats up all the available capital.&lt;/div&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
It would probably be a good idea to model a set dollar amount, and have trades cost a fixed amount, something like $7. &amp;nbsp;It might also be a good idea to make the random strategies&amp;nbsp;auto-correlated, which would probably better reflect actual investor&amp;nbsp;behavior. &amp;nbsp;The code for this test is below the fold, if anyone wants to modify it and make these improvements.&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/1302229.js?file=Random%20Strategies.R"&gt;
&lt;/script&gt;&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/Ze6NlfA0GtU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/7418395567813557597/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/10/backtesting-part-4-random-strategies.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7418395567813557597?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/7418395567813557597?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/Ze6NlfA0GtU/backtesting-part-4-random-strategies.html" title="Backtesting Part 4: random strategies" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-_Qpdhn6idRk/TqF-9JQAU1I/AAAAAAAACr4/8m5KDMTtPyg/s72-c/srtats.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/10/backtesting-part-4-random-strategies.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE4BQH4_fyp7ImA9WhdaEk0.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-2372180552790179827</id><published>2011-10-17T08:01:00.000-07:00</published><updated>2011-10-21T07:22:31.047-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-21T07:22:31.047-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Stocks" 
/><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Backtesting a Simple Stock Trading Strategy: Part 3</title><content type="html">&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;Note: This post is NOT financial advice! &amp;nbsp;This is just a fun way to explore some of the capabilities R has for importing and manipulating data. &amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;In a previous post, &lt;a href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html"&gt;I examined a simple stock trading strategy&lt;/a&gt;: Find the high point over the last 200 days, and buy the stock if it's been less than 100 days since that high. &amp;nbsp;Otherwise, have no position.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;What if we use different parameters than 200-day high and hold 100 days? &amp;nbsp;How will that affect our strategy? &amp;nbsp;First of all, we have to reload the data for the S&amp;amp;P 500 index and re-define the functions used to implement our strategy.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=1.%20Setup.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Next, we must decide the range of parameters we wish to test for our strategy. &amp;nbsp;I've decided to use a "grid search" to&amp;nbsp;thoroughly&amp;nbsp;examine the parameter space. Somewhat arbitrarily, I've decided to test the values from 5 to 500,&amp;nbsp;in steps of 5, for both parameters. &amp;nbsp;This gives us 100 possible values for each parameter, or 10,000 combinations in total. &amp;nbsp;Good thing the "daysSinceHigh" function is pretty fast!&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
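&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;As a rough sketch of this grid (the names n_high, n_hold and param_grid are illustrative and not taken from the gists), the parameter space could be enumerated in base R like this:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```r
# Candidate values for each parameter: 5, 10, ..., 500 (100 values each)
n_high = seq(5, 500, by = 5)
n_hold = seq(5, 500, by = 5)

# Every (nHigh, nHold) pair in the grid: 100 x 100 rows
param_grid = expand.grid(nHigh = n_high, nHold = n_hold)

dim(param_grid)
```
&lt;/pre&gt;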
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Because my processing power is limited, I'm only going to look at every 5th value in this parameter space. &amp;nbsp;The first order of business is to calculate a matrix containing each n-Day high series, where the first column is the number of days since the 5-day high, the second column is the number of days since the 10-day high, etc. &amp;nbsp;This matrix has 100 columns:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=2.%20DaysSinceHigh.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
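&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;The "daysSinceHigh" function itself lives in the setup gist and is not reproduced here. &amp;nbsp;The following is a slow but simple stand-in, assuming that "days since the n-day high" means the number of days since the most recent maximum within the trailing n-day window. &amp;nbsp;The function name days_since_high and the toy price vector are hypothetical:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```r
# Hypothetical stand-in for the post's daysSinceHigh function
days_since_high = function(x, n) {
  out = rep(NA_real_, length(x))
  if (n > length(x)) return(out)
  for (i in n:length(x)) {
    w = x[(i - n + 1):i]                   # trailing n-day window, today included
    out[i] = n - max(which(w == max(w)))   # 0 means the high is today
  }
  out
}

prices  = c(1, 3, 2, 5, 4, 4, 6, 2, 1, 0)  # toy prices, not real data
windows = c(3, 5)                          # tiny stand-in for 5, 10, ..., 500

# One column per window length, one row per day
dsh_matrix = sapply(windows, function(n) days_since_high(prices, n))
colnames(dsh_matrix) = paste0("nHigh_", windows)
dsh_matrix
```
&lt;/pre&gt;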
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Next, I make a list with 100 elements. &amp;nbsp;Each element represents a holding period, which I will apply to a copy of the "n-Day high matrix" from the previous step. &amp;nbsp;For example, the 1st element in the list is a matrix representing a 5-day holding period. &amp;nbsp;The first column in this matrix&amp;nbsp;represents&amp;nbsp;buying at the 5-day high, and holding for 5 days. &amp;nbsp;This is equivalent to buy-and-hold. &amp;nbsp;The second&amp;nbsp;column&amp;nbsp;represents buying at the 10-day high, and holding for 5 days. &amp;nbsp;The third&amp;nbsp;column&amp;nbsp;represents buying at the 15-day high and so on. &amp;nbsp;I repeat this process for each element in the 100-matrix list, which gives us an object representing every possible permutation of our strategy.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=3.%20Holding%20Periods.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;It is then relatively easy to calculate the returns associated with each&amp;nbsp;permutation&amp;nbsp;of the strategy, by using the "sweep" function to multiply each column of each matrix by the daily returns for our stock:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=4.%20Returns.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
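&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Here is a small sketch of the holding-period and "sweep" steps on toy data. &amp;nbsp;The numbers and variable names are made up, and the position rule (be long whenever fewer than nHold days have passed since the high) is assumed from the description of the strategy rather than copied from the gist:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```r
# Toy "days since high" matrix: rows are days, columns are nHigh settings
dsh = cbind(nHigh_5 = c(0, 1, 2, 3, 0), nHigh_10 = c(4, 5, 6, 0, 1))
daily_returns = c(0.01, -0.02, 0.03, 0.01, -0.01)  # made-up daily returns

hold_periods = c(2, 5)

# One position matrix per holding period: 1 = long, 0 = no position
# (a real backtest would also lag positions by one day)
positions = lapply(hold_periods, function(n_hold) (n_hold > dsh) * 1)
names(positions) = paste0("nHold_", hold_periods)

# Multiply every column by the daily returns (sweep along the rows)
strategy_returns = lapply(positions, function(p) sweep(p, 1, daily_returns, "*"))
strategy_returns$nHold_2
```
&lt;/pre&gt;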
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Now we have a list of&amp;nbsp;matrices&amp;nbsp;of returns. &amp;nbsp;Each column of a matrix represents the returns of our strategy, using a different set of parameters. &amp;nbsp;This allows us to calculate cumulative returns for each set of parameters, and make a nifty graph that shows the relationship between nHigh, nHold, and returns.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-or-Q8TbQDUU/TpxCejoh1sI/AAAAAAAACrw/BliNAyXyMbc/s1600/Returns+Surface.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="400" src="http://3.bp.blogspot.com/-or-Q8TbQDUU/TpxCejoh1sI/AAAAAAAACrw/BliNAyXyMbc/s400/Returns+Surface.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;
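&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;A sketch of the cumulative-return step, again on made-up numbers: compounding each column gives one number per parameter set, and collecting those numbers gives an nHigh-by-nHold surface like the one plotted above. &amp;nbsp;The object names are illustrative:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;pre&gt;
```r
# Made-up daily strategy returns: one matrix per holding period,
# one column per nHigh setting
returns_list = list(
  nHold_5  = cbind(nHigh_5 = c(0.01, -0.02, 0.03), nHigh_10 = c(0.00, 0.01, 0.02)),
  nHold_10 = cbind(nHigh_5 = c(0.01, 0.00, 0.03), nHigh_10 = c(0.02, 0.01, -0.01))
)

# Compound daily returns column by column: prod(1 + r) - 1
cum_return = function(m) apply(1 + m, 2, prod) - 1

# Rows are nHigh settings, columns are nHold settings
surface = sapply(returns_list, cum_return)
surface

# Parameter pair with the highest cumulative return in this toy example
best = which(surface == max(surface), arr.ind = TRUE)
rownames(surface)[best[1, "row"]]
colnames(surface)[best[1, "col"]]
```
&lt;/pre&gt;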
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;This graph uses a custom color ramp function, which was created by &lt;a href="http://stackoverflow.com/questions/7420281/create-a-rainbow-color-scale-based-on-a-vector-in-the-order-of-that-vector/7420959#7420959"&gt;Andrie on StackOverflow&lt;/a&gt;. &amp;nbsp;The color of each point in the chart corresponds to how high the returns are at that point. &amp;nbsp;The X axis is the number of days to use for nHigh, and the Y axis is the number of days to use for nHold. &amp;nbsp;As you can see, 100 days seems to be a solid holding period across many values of nHigh, but by using a different value of nHigh, we could increase returns substantially.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=5.%20Chart.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Of course, just because these values worked in the past doesn't mean they will work in the future. &amp;nbsp;Still, it's good to see that our arbitrary parameters (which performed well in the &lt;a href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html"&gt;last post&lt;/a&gt;) fall inside a wide range of parameters that yield a positive return for our strategy. &amp;nbsp;This brings up an interesting question: how DO we select parameters for our strategy? &amp;nbsp;How can we tell how well our parameter selection strategy would have performed in the past, given that we've optimized our selection based on our knowledge of the past?&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;For homework, think about how&amp;nbsp;overfitting&amp;nbsp;and cross-validation apply to this problem...&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;BONUS CODE: This creates some nifty 3D charts, using the rgl library.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1292761.js?file=6.%203D%20Chart.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/QlJOS7YTOSA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/2372180552790179827/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/10/backtesting-simple-stock-trading.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2372180552790179827?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2372180552790179827?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/QlJOS7YTOSA/backtesting-simple-stock-trading.html" title="Backtesting a Simple Stock Trading Strategy: Part 3" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-or-Q8TbQDUU/TpxCejoh1sI/AAAAAAAACrw/BliNAyXyMbc/s72-c/Returns+Surface.png" height="72" width="72" /><thr:total>1</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/10/backtesting-simple-stock-trading.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUQGQHk_fSp7ImA9WhdUGEk.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-8241986802389604748</id><published>2011-09-20T11:42:00.000-07:00</published><updated>2011-10-05T13:42:01.745-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-10-05T13:42:01.745-07:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="recessions" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Recession forecasting III: A Better Naive Forecast</title><content type="html">In &lt;a href="http://moderntoolmaking.blogspot.com/2011/08/recession-forecasting-ii-assessing.html"&gt;Recession Forecasting Part II&lt;/a&gt;, I compared the accuracy of Hussman's&amp;nbsp;recession&amp;nbsp;forecasts to the accuracy of a naive forecast that assumed the current state of the recession variable would continue next month.&lt;br /&gt;
&lt;br /&gt;
An&amp;nbsp;anonymous&amp;nbsp;commentator pointed out a great error in the naive forecast, which is that the NBER BACKDATES their recession indicator. &amp;nbsp;In other words, a recession may start, but you would not know about it for up to 6 months. &amp;nbsp;Therefore, assuming the naive forecast would know of the recession in the second month is akin to predicting the future.&lt;br /&gt;
&lt;br /&gt;
This creates the need for a naive forecast that only has access to the information available at a given point in time. &amp;nbsp;Fortunately, our&amp;nbsp;anonymous&amp;nbsp;commentator kindly provided this data, which he constructed using the &lt;a href="http://www.nber.org/cycles.html"&gt;recession announcement&lt;/a&gt; dates from the NBER. &amp;nbsp;Prior to 1980, he assumed a 6-month lag in identifying turning points. This data is available as an &lt;a href="http://dl.dropbox.com/u/7428423/Modified%20NBER%20Recession%20Data.xls"&gt;Excel spreadsheet&lt;/a&gt; (with commentary) and a &lt;a href="http://dl.dropbox.com/u/7428423/Modified%20NBER%20Recession%20Data.csv"&gt;csv file&lt;/a&gt; (suitable for import into R).&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;Using this modified dataset, we can construct a truly naive recession forecast, which assumes that the current known state of the recession variable will continue indefinitely.&lt;br /&gt;
&lt;script src="https://gist.github.com/1229899.js?file=Naive.R"&gt;
&lt;/script&gt;&lt;br /&gt;
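To see why the announcement lag matters, here is a toy comparison on a made-up monthly recession indicator (not the NBER data). &amp;nbsp;It assumes, for simplicity, a fixed 6-month announcement lag throughout; the variable names are illustrative and this is not the code from the gist:&lt;br /&gt;
&lt;pre&gt;
```r
# Made-up monthly indicator: 1 = recession, 0 = expansion
actual = c(rep(0, 24), rep(1, 12), rep(0, 24))

# Forecast with look-ahead: next month equals this month's true state
naive_lookahead = c(NA, head(actual, -1))

# Forecast without look-ahead: the latest state known last month is
# the true state from announce_lag months before that
announce_lag = 6
naive_known = c(rep(NA, announce_lag + 1), head(actual, -(announce_lag + 1)))

# Share of months where the forecast matched the true state
accuracy = function(forecast) mean(forecast == actual, na.rm = TRUE)
accuracy(naive_lookahead)  # 57 of 59 months correct
accuracy(naive_known)      # 39 of 53 months correct
```
&lt;/pre&gt;
In this toy series the forecast that respects the lag is wrong for several months around every turning point, which is why its accuracy drops.&lt;br /&gt;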
This naive forecast is 66.9% accurate, which is pretty good. &amp;nbsp;This number also makes a lot more intuitive sense than the 97.6% accuracy of my previous naive forecast. &amp;nbsp;Compared to this new metric, Hussman's accuracy of 81.6% is&amp;nbsp;actually&amp;nbsp;quite&amp;nbsp;good. &amp;nbsp;However, keep in mind that Hussman's "future" knowledge of past recessions may have biased his selection of variables and parameters.&lt;br /&gt;
&lt;br /&gt;
Finally, Hussman's model uses &lt;a href="http://finance.yahoo.com/q?s=^GSPC"&gt;Stock Prices&lt;/a&gt;, &lt;a href="http://research.stlouisfed.org/fred2/series/NAPM"&gt;PMI&lt;/a&gt;, the &lt;a href="http://research.stlouisfed.org/fred2/series/GS3M"&gt;yield curve&lt;/a&gt;, &lt;a href="http://research.stlouisfed.org/fred2/series/CPF3M"&gt;credit spreads&lt;/a&gt;, and &lt;a href="http://research.stlouisfed.org/fred2/series/PAYEMS"&gt;employment &lt;/a&gt;as inputs. &amp;nbsp;I was wondering if there are any&amp;nbsp;significant&amp;nbsp;lags in any of these variables that could introduce look-ahead bias into my implementation of Hussman's model.&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/yVqzkA_NnE4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/8241986802389604748/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/09/recession-forecasting-iii-better-naive.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8241986802389604748?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8241986802389604748?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/yVqzkA_NnE4/recession-forecasting-iii-better-naive.html" title="Recession forecasting III: A Better Naive Forecast" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" 
/></author><thr:total>1</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/09/recession-forecasting-iii-better-naive.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUAMRXsycSp7ImA9WhdVFk4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-1425504080898811512</id><published>2011-09-16T11:03:00.000-07:00</published><updated>2011-09-21T12:36:24.599-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-21T12:36:24.599-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Stocks" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Backtesting Part 2: Splits, Dividends, Trading Costs and Log Plots</title><content type="html">&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 14px; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;Note: This post is NOT financial advice! &amp;nbsp;This is just a fun way to explore some of the capabilities R has for importing and manipulating data. &amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 14px; line-height: 15px;"&gt;&lt;br /&gt;&lt;i&gt;UPDATE 9/21/2011:&amp;nbsp;Costas correctly points out that I should lag my strategy vector by one day, as today's returns are determined by the position we chose yesterday. &amp;nbsp;I updated the code, results, and charts to reflect this. &amp;nbsp;Please alert me to any similar errors in the future&lt;b&gt;.&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 14px; line-height: 15px;"&gt;&lt;i&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;
&lt;a href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html"&gt;In my last post&lt;/a&gt;, I demonstrated how to backtest a simple momentum-based stock trading strategy in R. &amp;nbsp;However, there were a few issues with my implementation: I ignored splits, dividends and transaction costs, all of which can have a large impact on a strategy. &amp;nbsp;I also came up with a better plot to help show how a given strategy compares to a benchmark, and wrapped everything together into one function.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;First of all, we need to reload our functions from the last post. &amp;nbsp;These functions define our strategy and analyze its&amp;nbsp;performance.&lt;br /&gt;
&lt;script src="https://gist.github.com/1222639.js?file=1.%20Functions.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
Next, we can test our strategy. &amp;nbsp;I've added a couple new indexes:&lt;br /&gt;
&lt;script src="https://gist.github.com/1222639.js?file=2.%20Analysis.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
This function does a lot of lifting:&lt;br /&gt;
1. It loads the data and adjusts the closing price for splits and dividends. &amp;nbsp;It uses the splits/dividends data from yahoo, but performs its own, more accurate calculations.&lt;br /&gt;
2. It determines a position series, based on the "daysSinceHigh" function. &amp;nbsp;This part is the same as in my last post.&lt;br /&gt;
3. It determines trades, which are defined as days when today's position is different from the previous day's position. &amp;nbsp;I assumed that transaction costs are 0.5% of equity, so on trading days I subtracted 0.005 from my returns.&lt;br /&gt;
4. It makes a plot. &amp;nbsp;This plot is different from the charts.PerformanceSummary we used last time. &amp;nbsp;The first plot shows&amp;nbsp;cumulative&amp;nbsp;returns of my strategy and the index, while the second plot shows the relative performance of my strategy over the benchmark (also known as alpha). &amp;nbsp;The third plot shows drawdowns.&lt;br /&gt;
5. It returns a data table of statistics, comparing the strategy to the benchmark.&lt;br /&gt;
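&lt;br /&gt;
Steps 2 and 3 can be sketched on a few made-up numbers. &amp;nbsp;The signal and return vectors are invented for the example, and charging the cost on the day the lagged position changes is an assumption about timing, not necessarily what the gist does:&lt;br /&gt;
&lt;pre&gt;
```r
returns = c(0.01, -0.02, 0.015, 0.005, -0.01, 0.02)  # made-up daily returns
signal  = c(1, 1, 0, 0, 1, 1)                         # 1 = long, 0 = flat

# Lag by one day: today's return is earned on yesterday's decision
position = c(0, head(signal, -1))

# A trade happens whenever the position differs from the previous day
trades = c(position[1] != 0, diff(position) != 0)

cost = 0.005  # 0.5% of equity per trade
strategy_returns = position * returns - trades * cost
strategy_returns
```
&lt;/pre&gt;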
&lt;br /&gt;
I tested this strategy on GSPC, FTSE,&amp;nbsp;DJI,&amp;nbsp;N225, EEM, EFA, and GLD. (The last 3 are ETFs.) &amp;nbsp;The strategy performs well on some indexes, and poorly on others. Here are the results of my backtest:&lt;br /&gt;
&lt;script src="https://gist.github.com/1222639.js?file=3.%20Results.txt"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, this strategy tends to reduce drawdowns, but it also sometimes reduces overall returns. &amp;nbsp;In some cases, you could leverage up the strategy, which would increase both returns and drawdowns, but that's the subject of another post.&lt;br /&gt;
&lt;br /&gt;
Here's a buncha charts:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-yiQT3m15DIA/Tno8d-tNbRI/AAAAAAAACrU/4_ay6kZVnas/s1600/SP500.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-yiQT3m15DIA/Tno8d-tNbRI/AAAAAAAACrU/4_ay6kZVnas/s1600/SP500.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-VPoVbPWUU0I/Tno8er7U1UI/AAAAAAAACrY/4LVAiKdDCLs/s1600/FTSE.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-VPoVbPWUU0I/Tno8er7U1UI/AAAAAAAACrY/4LVAiKdDCLs/s1600/FTSE.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-9uAmMi0H7Tc/Tno8fCu92TI/AAAAAAAACrc/8oty1PiadZ8/s1600/DJIA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-9uAmMi0H7Tc/Tno8fCu92TI/AAAAAAAACrc/8oty1PiadZ8/s1600/DJIA.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-jnEYHZKaFZY/Tno8fhxL7qI/AAAAAAAACrg/pKF208yMTJc/s1600/N225.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-jnEYHZKaFZY/Tno8fhxL7qI/AAAAAAAACrg/pKF208yMTJc/s1600/N225.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://2.bp.blogspot.com/-Ag_sVZc1Y4A/Tno8f_tX8wI/AAAAAAAACrk/NbUoRcBa0mU/s1600/EEM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-Ag_sVZc1Y4A/Tno8f_tX8wI/AAAAAAAACrk/NbUoRcBa0mU/s1600/EEM.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-FfIK9nKQKqY/Tno8gupei3I/AAAAAAAACro/KSTKyCg21jE/s1600/EFA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-FfIK9nKQKqY/Tno8gupei3I/AAAAAAAACro/KSTKyCg21jE/s1600/EFA.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-kQlWsAprwmY/Tno8hO6dBvI/AAAAAAAACrs/tf99D8Zzikc/s1600/GLD.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-kQlWsAprwmY/Tno8hO6dBvI/AAAAAAAACrs/tf99D8Zzikc/s1600/GLD.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/X-sOg7KF624" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/1425504080898811512/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-part-2-splits-dividends.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1425504080898811512?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1425504080898811512?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/X-sOg7KF624/backtesting-part-2-splits-dividends.html" title="Backtesting Part 2: Splits, Dividends, Trading Costs and Log Plots" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-yiQT3m15DIA/Tno8d-tNbRI/AAAAAAAACrU/4_ay6kZVnas/s72-c/SP500.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/09/backtesting-part-2-splits-dividends.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CE8GSXg6eSp7ImA9WhdVEU8.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-4437754718388220465</id><published>2011-09-15T14:35:00.000-07:00</published><updated>2011-09-15T14:40:28.611-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-15T14:40:28.611-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Stocks" 
/><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Correlations among US Stocks: Is it really time to fire your adviser?</title><content type="html">&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;Note: This post is NOT financial advice! &amp;nbsp;This is just a fun way to explore some of the capabilities R has for importing and manipulating data.&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18px;"&gt;&lt;b&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="background-color: white;"&gt;&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;The Financial Times says it's time to "&lt;/span&gt;&lt;a href="http://ftalphaville.ft.com/blog/2011/09/13/676156/fire-your-adviser/" style="line-height: 18px;"&gt;Fire your&amp;nbsp;Adviser&lt;/a&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;" because correlations among US stocks are at their highest levels since the financial crisis. &amp;nbsp;Unfortunately, they only provide data going back 3 months and it's in a boring table, rather than an awesome chart. &amp;nbsp;After reading this article, I&amp;nbsp;immediately&amp;nbsp;pulled out quantmod and PerformanceAnalytics in R. &amp;nbsp;I used quantmod to download the daily data (from Yahoo Finance) for each index listed in the article, and then used PerformanceAnalytics to analyze and graph it. After an hour or so of fiddling around, I ended up with this chart:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-UKu8ShpR98I/TnJtp6EVdDI/AAAAAAAACpI/d_wvy44HKUM/s1600/Corr.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-UKu8ShpR98I/TnJtp6EVdDI/AAAAAAAACpI/d_wvy44HKUM/s1600/Corr.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;/div&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Each line on this chart represents the rolling correlation between a major asset class and the S&amp;amp;P 500 (represented by SPY). &amp;nbsp;I used a 90-day window to calculate the correlations, so each point on each line is looking backwards at a 90-day timeframe. &amp;nbsp;As you can see, correlations have indeed gone up in the last few months, but there have been other periods in 2010 and 2011 with&amp;nbsp;similarly&amp;nbsp;high correlations.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;This next chart looks at just the major US sectors, and takes the average of each of their correlations with SPY. This shows the overall correlation of the US stock market since 2007. &amp;nbsp;Again, things are highly correlated right now, but we've been here before.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-YiWwgc-6Md4/TnJuooQerOI/AAAAAAAACpM/4VVZJ4cXHMA/s1600/corr2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-YiWwgc-6Md4/TnJuooQerOI/AAAAAAAACpM/4VVZJ4cXHMA/s1600/corr2.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;Here is the code I used to generate these charts. &amp;nbsp;Feel free to comment on my implementation, and I'll be happy to make any improvements and update the charts.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1220538.js?file=Chart%201.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;
&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 18px;"&gt;&lt;script src="https://gist.github.com/1220538.js?file=Chart%202.R"&gt;
&lt;/script&gt;&lt;/span&gt;&lt;/span&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/_BptinmwGI4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/4437754718388220465/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/09/correlations-among-us-stocks-is-it.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4437754718388220465?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4437754718388220465?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/_BptinmwGI4/correlations-among-us-stocks-is-it.html" title="Correlations among US Stocks: Is it really time to fire your adviser?" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-UKu8ShpR98I/TnJtp6EVdDI/AAAAAAAACpI/d_wvy44HKUM/s72-c/Corr.png" height="72" width="72" /><thr:total>5</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/09/correlations-among-us-stocks-is-it.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUcMSHg5eSp7ImA9WhdVFk4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-1174364426293325722</id><published>2011-09-13T06:06:00.000-07:00</published><updated>2011-09-21T12:24:49.621-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-21T12:24:49.621-07:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="Stocks" /><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="backtesting" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Backtesting a Simple Stock Trading Strategy</title><content type="html">&lt;b&gt;&lt;i&gt;Note: This post is NOT financial advice! &amp;nbsp;This is just a fun way to explore some of the capabilities R has for importing and manipulating data. &amp;nbsp;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
&lt;b&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
&lt;i&gt;UPDATE 9/21/2011:&amp;nbsp;Costas correctly points out that I should lag my strategy vector by one day, as today's returns are determined by the position we chose yesterday. &amp;nbsp;I updated the code, results, and charts to reflect this. &amp;nbsp;Please alert me to any similar errors in the future&lt;b&gt;.&lt;/b&gt;&lt;/i&gt;&lt;br /&gt;
&lt;b&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;
I recently read a &lt;a href="http://etfprophet.com/days-since-200-day-highs/"&gt;post on ETF Prophet&lt;/a&gt; that explored an interesting stock trading strategy in Excel. &amp;nbsp;The strategy is simple: Find the high point of the stock over the last 200 days, and count the number of days that have elapsed since that high. &amp;nbsp;If it's been less than 100 days, own the stock. &amp;nbsp;If it's been more than 100 days, don't own it. &amp;nbsp;This strategy is very simple, but it yields some impressive results.&amp;nbsp;(Note, however, that this example uses data that has not been adjusted for splits or dividends and could contain other errors. &amp;nbsp;Furthermore, we're ignoring trading costs and execution delays, both of which affect strategy performance.)&lt;br /&gt;
&lt;br /&gt;
Implementing this strategy in R is simple and has several advantages over Excel. &amp;nbsp;The main one is that pulling stock market data into R is easy, so we can test this strategy on a wide range of indexes with&amp;nbsp;relatively&amp;nbsp;little effort.&lt;br /&gt;
&lt;br /&gt;
First of all, we download data for &lt;a href="http://finance.yahoo.com/q?s=^GSPC"&gt;GSPC&lt;/a&gt;&amp;nbsp;using &lt;a href="http://www.quantmod.com/"&gt;quantmod&lt;/a&gt;. (GSPC is the ticker for the S&amp;amp;P 500 index.) Next, we construct a function to calculate the number of days since the n-day high in a time series, and a function to implement our trading strategy. &amp;nbsp;The latter function takes 2 parameters: the n-day high you wish to use, and the number of days past that high you will hold the stock. &amp;nbsp;The example is 200 and 100, but you could easily change this to the 500-day high and see what happens if you hold the stock 300 days past the high before bailing out. &amp;nbsp;Since this function is&amp;nbsp;parameterized, we can easily test many other versions of our strategy. &amp;nbsp;We pad the&amp;nbsp;beginning&amp;nbsp;of our strategy with zeros so it will be the same length as our input data. (If you wish for a more detailed explanation of the daysSinceHigh function, see &lt;a href="http://stackoverflow.com/questions/7354368/how-to-calculate-periods-since-200-period-high-of-a-stock"&gt;the discussion on Stack Overflow&lt;/a&gt;.)&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;script src="https://gist.github.com/1212547.js?file=1.%20Build%20Model.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
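If the embedded gist does not load, here is a rough base-R sketch of the same idea. &amp;nbsp;It is not the gist's implementation: daysSinceHighSketch and positionSketch are hypothetical names, and the tie-breaking rule (use the most recent high) and the exact padding are my assumptions.&lt;br /&gt;
&lt;pre&gt;
```r
# Days elapsed since the most recent high within the trailing n observations.
# Entries without a full n-day window stay NA.
daysSinceHighSketch = function(x, n) {
  out = rep(NA_real_, length(x))
  for (i in seq(n, length(x))) {
    win = x[(i - n + 1):i]
    out[i] = n - max(which(win == max(win)))
  }
  out
}

# Hold (1) while the high is fewer than nHold days old, otherwise stay out (0).
# Days without a full window are padded with 0 so the output matches the input length.
positionSketch = function(x, nHigh = 200, nHold = 100) {
  d = daysSinceHighSketch(x, nHigh)
  ifelse(is.na(d), 0, as.numeric(nHold > d))
}

# Toy price series: 3-day high, exit once the high is 2 or more days old
x = c(1, 2, 3, 2, 1, 0.5, 0.4)
daysSinceHighSketch(x, 3)   # NA NA 0 1 2 2 2
positionSketch(x, 3, 2)     # 0 0 1 1 0 0 0
```
&lt;/pre&gt;
On the toy series the position switches on when the series makes a new 3-day high and switches off two days after the peak, which is the behavior the strategy describes.&lt;br /&gt;
&lt;br /&gt;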
We multiply our position (0,1) vector by the returns from the index to get&amp;nbsp;our strategy's&amp;nbsp;returns. &amp;nbsp;Now we construct a function to return some statistics about a trading strategy, and compare our strategy to the benchmark. &amp;nbsp;Somewhat arbitrarily, I've decided to look at cumulative return, mean annual return, Sharpe ratio, winning %, mean annual volatility, max drawdown, and max drawdown length. &amp;nbsp;Other stats would be easy to implement.&lt;br /&gt;
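&lt;br /&gt;
As a minimal sketch of this step (not the gist's code), the snippet below lags a 0/1 position vector by one day, multiplies it by the returns, and computes a few of the statistics. &amp;nbsp;The returns and positions are simulated, stratStats is a hypothetical helper, and I am assuming simple daily returns, 252 trading days per year, and a zero risk-free rate for the Sharpe ratio.&lt;br /&gt;
&lt;pre&gt;
```r
set.seed(1)
ret = rnorm(1000, mean = 0.0003, sd = 0.01)   # hypothetical daily index returns
pos = rbinom(1000, 1, 0.6)                     # hypothetical 0/1 positions

# Lag by one day: today's return is earned on yesterday's position.
lagPos = c(0, head(pos, -1))
stratRet = lagPos * ret

# A few summary statistics for a vector of simple daily returns
stratStats = function(r, periods = 252) {
  equity = cumprod(1 + r)
  drawdown = equity / cummax(equity) - 1
  c(cumReturn = tail(equity, 1) - 1,
    annReturn = mean(r) * periods,
    annVol = sd(r) * sqrt(periods),
    sharpe = mean(r) / sd(r) * sqrt(periods),
    winPct = mean(r[r != 0] > 0),
    maxDrawdown = min(drawdown))
}

round(rbind(strategy = stratStats(stratRet), benchmark = stratStats(ret)), 4)
```
&lt;/pre&gt;
Without the lag, the strategy would be credited with returns on days whose closing price was used to choose the position, which is a look-ahead bias.&lt;br /&gt;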
&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1212547.js?file=2.%20Assess%20Peformance.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
Results:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://3.bp.blogspot.com/-dqYrt9kN2ms/Tno4tCwkYMI/AAAAAAAACrE/ZWMEybR5D_k/s1600/SP500.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-dqYrt9kN2ms/Tno4tCwkYMI/AAAAAAAACrE/ZWMEybR5D_k/s1600/SP500.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1212547.js?file=3.%20Performance.txt"&gt;
&lt;/script&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;br /&gt;
As you can see, this strategy compares favorably to the default "buy-and-hold" approach.&lt;br /&gt;
&lt;br /&gt;
Finally, we test our strategy on 3 other indexes: &lt;a href="http://uk.finance.yahoo.com/q?s=^FTSE"&gt;FTSE&lt;/a&gt;&amp;nbsp;which represents Ireland and the UK, the &lt;a href="http://research.stlouisfed.org/fred2/series/DJIA"&gt;Dow Jones Industrial Index&lt;/a&gt;, which goes back to 1896, and the &lt;a href="http://finance.yahoo.com/q?s=^N225"&gt;N225&lt;/a&gt;, which represents Japan. &amp;nbsp;I've&amp;nbsp;functionalized&amp;nbsp;the entire process, so you can test each new strategy with 1 line of code:&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1212547.js?file=4.%20Other%20Indexes.R"&gt;
&lt;/script&gt;
&lt;br /&gt;
Results:&lt;br /&gt;
&lt;script src="https://gist.github.com/1212547.js?file=5.%20Performance.txt"&gt;
&lt;/script&gt;
&lt;br /&gt;
FTSE:&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-qA9nC6CeiGA/Tno4_PcOKnI/AAAAAAAACrI/mcxfI0it51M/s1600/FTSE.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-qA9nC6CeiGA/Tno4_PcOKnI/AAAAAAAACrI/mcxfI0it51M/s1600/FTSE.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
DJIA:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://1.bp.blogspot.com/-pevm-m1VBl4/Tno5CkOBSgI/AAAAAAAACrM/Ed-c7R7hKQE/s1600/DJIA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-pevm-m1VBl4/Tno5CkOBSgI/AAAAAAAACrM/Ed-c7R7hKQE/s1600/DJIA.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;br /&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
N225:&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;a href="http://4.bp.blogspot.com/-wKm6YrbXxtQ/Tno5FmNzMqI/AAAAAAAACrQ/2_bR3a0cyBQ/s1600/N225.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-wKm6YrbXxtQ/Tno5FmNzMqI/AAAAAAAACrQ/2_bR3a0cyBQ/s1600/N225.png" /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: center;"&gt;
&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
The strategy out-performs&amp;nbsp;the other indexes as well. It even&amp;nbsp;performs&amp;nbsp;better than the N225 index, mainly by staying out of it.&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div class="separator" style="clear: both; text-align: left;"&gt;
Feel free to test this strategy with other parameters, or on other indexes. &amp;nbsp;For homework, think of possible ways that I have fooled myself in this backtest, and post them in the comments. &amp;nbsp;One example of this is that we haven't looked at transaction costs, which might be&amp;nbsp;significant...&lt;/div&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/lFRwskrIp34" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/1174364426293325722/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html#comment-form" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1174364426293325722?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1174364426293325722?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/lFRwskrIp34/backtesting-simple-stock-trading.html" title="Backtesting a Simple Stock Trading Strategy" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-dqYrt9kN2ms/Tno4tCwkYMI/AAAAAAAACrE/ZWMEybR5D_k/s72-c/SP500.png" height="72" width="72" /><thr:total>10</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/09/backtesting-simple-stock-trading.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0MERHg7fSp7ImA9WhdXFUk.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-5095348389576462727</id><published>2011-08-26T14:07:00.000-07:00</published><updated>2011-08-28T08:30:05.605-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-28T08:30:05.605-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="big list" /><category 
scheme="http://www.blogger.com/atom/ns#" term="data acquisition" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>25+ more ways to bring data into R</title><content type="html">The &lt;a href="http://blog.revolutionanalytics.com/2011/08/r-datamarket.html"&gt;rdatamarket&lt;/a&gt;&amp;nbsp;post on the Revolutions blog and &lt;a href="http://www.decisionstats.com/using-rstats-for-online-data-access/"&gt;this post&lt;/a&gt; on Decision Stats reminded me about my list of&amp;nbsp;&lt;a href="http://stats.stackexchange.com/questions/12670/data-apis-feeds-available-as-packages-in-r"&gt;Data APIs/feeds available as packages in R&lt;/a&gt; on &lt;a href="http://stats.stackexchange.com/"&gt;Cross-Validated&lt;/a&gt; (which is a great site that you all should use). &amp;nbsp;Many of these packages are from &lt;a href="http://www.omegahat.org/"&gt;Omega Hat&lt;/a&gt;, which is an awesome site.&lt;br /&gt;
&lt;br /&gt;
This is the most comprehensive list I'm aware of, so please alert me to any omissions:&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
Free Data:&lt;br /&gt;
Data Source - Package&lt;br /&gt;
&lt;a href="http://www.google.com/finance/historical?q=NASDAQ%3aMSFT"&gt;Google Finance historical data&lt;/a&gt; - &lt;a href="http://www.quantmod.com/"&gt;quantmod&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.google.com/finance?q=NASDAQ%3aMSFT&amp;amp;fstype=ii"&gt;Google Finance balance sheets&lt;/a&gt; - quantmod&lt;br /&gt;
&lt;a href="http://finance.yahoo.com/q/hp?s=MSFT%20Historical%20Prices"&gt;Yahoo Finance historical data&lt;/a&gt; - quantmod&lt;br /&gt;
Yahoo Finance historical data - &lt;a href="http://rss.acs.unt.edu/Rdoc/library/tseries/html/get.hist.quote.html"&gt;tseries&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://finance.yahoo.com/q/op?s=MSFT"&gt;Yahoo Finance current options chain&lt;/a&gt; - quantmod&lt;br /&gt;
&lt;a href="http://finance.yahoo.com/q/ud?s=MSFThttp://finance.yahoo.com/q/ud?s=MSFT"&gt;Yahoo Finance historical analyst estimates&lt;/a&gt; - &lt;a href="http://cran.r-project.org/web/packages/fImport/index.html"&gt;fImport&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://finance.yahoo.com/q/ks?s=MSFT"&gt;Yahoo Finance current key stats&lt;/a&gt; - fImport - seems to be broken&lt;br /&gt;
&lt;a href="http://cfe.cboe.com/Products/"&gt;CFE historic futures&lt;/a&gt; prices -&amp;nbsp; &lt;a href="https://r-forge.r-project.org/R/?group_id=1113"&gt;qmao&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.oanda.com/"&gt;OANDA&lt;/a&gt; historic exchange rates/metal prices - quantmod&lt;br /&gt;
&lt;a href="http://research.stlouisfed.org/fred2/"&gt;FRED &lt;/a&gt;historic macroeconomic indicators - quantmod&lt;br /&gt;
&lt;a href="http://data.worldbank.org/indicator"&gt;World Bank historic macroeconomic indicators&lt;/a&gt; - &lt;a href="http://cran.r-project.org/web/packages/WDI/index.html"&gt;WDI&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.google.com/trends?q=microsoft"&gt;Google Trends historic search volume data&lt;/a&gt; - &lt;a href="http://www.omegahat.org/RGoogleTrends/"&gt;RGoogleTrends&lt;/a&gt;&lt;br /&gt;
Google Docs - &lt;a href="http://www.omegahat.org/RGoogleDocs/"&gt;RGoogleDocs&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://twitter.com/#%21/search/microsoft"&gt;Twitter &lt;/a&gt;- &lt;a href="http://cran.r-project.org/web/packages/twitteR/index.html"&gt;twitteR&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.zillow.com/"&gt;Zillow &lt;/a&gt;- &lt;a href="http://www.omegahat.org/Zillow/"&gt;Zillow&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.nytimes.com/"&gt;New York Times&lt;/a&gt; - &lt;a href="http://www.omegahat.org/RNYTimes/"&gt;RNYTimes&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.census.gov/main/www/cen2000.html"&gt;US Census 2000&lt;/a&gt; - &lt;a href="http://cran.r-project.org/web/packages/UScensus2000/index.html"&gt;UScensus2000&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.infochimps.com/"&gt;infochimps &lt;/a&gt;- &lt;a href="http://cran.r-project.org/web/packages/infochimps/index.html"&gt;infochimps&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://datamarket.com/"&gt;datamarket &lt;/a&gt;- &lt;a href="http://cran.r-project.org/web/packages/rdatamarket/index.html"&gt;rdatamarket &lt;/a&gt;- requires free account&lt;br /&gt;
&lt;a href="http://www.factual.com/"&gt;Factual.com&lt;/a&gt; - &lt;a href="http://cran.r-project.org/web/packages/factualR/index.html"&gt;factualR&lt;/a&gt;&lt;br /&gt;
Geocode addresses - &lt;a href="http://thelogcabin.wordpress.com/2011/05/02/r-and-the-data-science-toolkit/"&gt;RDSTK&lt;/a&gt;&lt;br /&gt;
Map coordinates to political boundaries - RDSTK&lt;br /&gt;
&lt;a href="http://www.wunderground.com/"&gt;Weather Underground&lt;/a&gt; - &lt;a href="http://casoilresource.lawr.ucdavis.edu/drupal/node/991"&gt;Roll your own&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.google.com/finance/company_news?q=NASDAQ%3aMSFT"&gt;Google News&lt;/a&gt; -&lt;a href="http://stackoverflow.com/questions/5761576/improving-a-function-to-get-stock-news-data-from-google-in-r"&gt; Roll your own&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://opendap.deltares.nl/thredds/catalog/opendap/catalog.html"&gt;Earth Sciences&lt;/a&gt; netCDF Data - &lt;a href="http://public.deltares.nl/display/OET/KML+overview+of+OPeNDAP+data#KMLoverviewofOPeNDAPdata-AccessingnetCDF/OPeNDAPdatawithR"&gt;Roll your own&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://processtrends.com/RClimate.htm"&gt;Climate Data&lt;/a&gt; - &lt;a href="http://chartsgraphs.wordpress.com/2011/01/24/using-rclimate-to-retrieve-climate-series-data/"&gt;Roll your own&lt;/a&gt;&lt;br /&gt;
Public health data - &lt;a href="http://www.medepi.net/msamuel/R.Visual.Display.Samuel.10-2008.pdf"&gt;Roll your own&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.fishbase.org/"&gt;FishBase&lt;/a&gt; - &lt;a href="https://github.com/cboettig/rfishbase"&gt;rfishbase&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Paid Data:&lt;br /&gt;
&lt;a href="http://www.bloomberg.com/professional/software_support/"&gt;Bloomberg &lt;/a&gt;- &lt;a href="http://cran.r-project.org/web/packages/RBloomberg/index.html"&gt;RBloomberg&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.lim.com/"&gt;LIM &lt;/a&gt;- &lt;a href="http://www.lim.com/sites/default/files/R_package_LIMWS.pdf"&gt;LIM&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.nyxdata.com/Data-Products/Daily-TAQ"&gt;Trades and Quotes&lt;/a&gt; from NYSE - &lt;a href="http://cran.r-project.org/web/packages/RTAQ/index.html"&gt;RTAQ&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.interactivebrokers.com/en/p.php?f=marketData"&gt;Interactive Brokers&lt;/a&gt; - &lt;a href="http://cran.r-project.org/web/packages/IBrokers/index.html"&gt;IBrokers&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
Useful Tools:&lt;br /&gt;
&lt;a href="http://cran.r-project.org/web/packages/RCurl/index.html"&gt;RCurl&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://cran.r-project.org/web/packages/rjson/index.html"&gt;RJSON&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://www.omegahat.org/RJSONIO/"&gt;RJSONIO&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://cran.r-project.org/web/packages/XML/index.html"&gt;XML&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://cran.r-project.org/web/packages/scrapeR/index.html"&gt;scraper&lt;/a&gt;&lt;br /&gt;
&lt;a href="http://cran.r-project.org/web/packages/digitize/index.html"&gt;digitizer &lt;/a&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/qRagym6kTd4" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/5095348389576462727/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/25-more-ways-to-bring-data-into-r.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/5095348389576462727?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/5095348389576462727?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/qRagym6kTd4/25-more-ways-to-bring-data-into-r.html" title="25+ more ways to bring data into R" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>5</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/25-more-ways-to-bring-data-into-r.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYBQng_eCp7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-1697278634554894177</id><published>2011-08-26T08:26:00.000-07:00</published><updated>2011-09-16T11:19:13.640-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:19:13.640-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="Friday" /><category scheme="http://www.blogger.com/atom/ns#" term="Great Australian Sheep Decline" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Global Warming" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Because it's Friday: Spurious correlation edition</title><content type="html">If the &lt;a href="http://www.youtube.com/watch?v=B1BdQcJ2ZYY"&gt;Flight of the Conchords&lt;/a&gt; taught me anything, it's that &lt;a href="http://www.youtube.com/watch?v=zs_rXxi0zhM"&gt;you can't trust Australians&lt;/a&gt;.  This morning I was poking around the &lt;a href="http://datamarket.com/"&gt;DataMarket&lt;/a&gt; site, when I noticed something suspicious about Australian sheep production:&lt;br /&gt;
&lt;iframe frameborder="0" height="320" marginheight="0" marginwidth="0" scrolling="no" src="http://datamarket.com/data/embed/line.html?ds=1164|a2n=2u&amp;amp;api_key=d500e40f58194c2cb58a5db004fdefd8" width="320"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
I decided to investigate further: just what are the Australians doing with all those poor sheep?  How might this nefarious plot impact the rest of the world?  I was shocked when I noticed an apparent relationship with Global Warming:&lt;br /&gt;
&lt;iframe frameborder="0" height="320" marginheight="0" marginwidth="0" scrolling="no" src="http://datamarket.com/data/embed/line.html?ds=1cb1|r14=2&amp;amp;api_key=d500e40f58194c2cb58a5db004fdefd8" width="420"&gt;&lt;/iframe&gt;&lt;br /&gt;
&lt;br /&gt;
Hot on the trail of something big, I pulled both datasets into R, using the &lt;a href="http://cran.r-project.org/web/packages/rdatamarket/index.html"&gt;rdatamarket&lt;/a&gt; package, and ran a quick correlation, which confirmed my worst suspicions:&lt;br /&gt;
&lt;img alt="" src="http://i.imgur.com/H7Qn3.png" title="Hosted by imgur.com" /&gt;&lt;br /&gt;
&lt;br /&gt;
The Great Australian Sheep Decline is clearly the cause of Global Warming!  To quantify this effect, I built a simple linear model, which proved to be highly significant:&lt;br /&gt;
&lt;script src="https://gist.github.com/1173581.js?file=Linear%20Model.R"&gt;
&lt;/script&gt;&lt;br /&gt;
For every 100 Million Sheep that disappear from Australia, the Earth warms by 0.71 degrees C!  Given that there are still 85 million sheep left in Australia, the potential for further warming is gigantic!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=========================================================================================&lt;br /&gt;
In all seriousness, Australian sheep have about as much to do with Global Warming as &lt;a href="http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming"&gt;pirates&lt;/a&gt;. &amp;nbsp;Sheep numbers have been declining in &lt;a href="http://data.is/qtv6ey"&gt;New Zealand&lt;/a&gt;, &lt;a href="http://data.is/q8PMWl"&gt;Uruguay&lt;/a&gt;, &lt;a href="http://data.is/pDqBtE"&gt;Iceland&lt;/a&gt;, and many other countries as well.  If you wish to explore other spurious correlations, here's the R code I used for this post:&lt;br /&gt;
&lt;script src="https://gist.github.com/1173581.js?file=Sheep.R"&gt;
&lt;/script&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/oQLFpuOJ1AQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/1697278634554894177/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/because-its-friday-spurious-correlation.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1697278634554894177?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/1697278634554894177?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/oQLFpuOJ1AQ/because-its-friday-spurious-correlation.html" title="Because it's Friday: Spurious correlation edition" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>2</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/because-its-friday-spurious-correlation.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYCSXsyfip7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-9144625597229945720</id><published>2011-08-23T10:30:00.000-07:00</published><updated>2011-09-16T11:19:28.596-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:19:28.596-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="R graphics" /><title>Graphically analyzing variable interactions in R</title><content type="html">I studied Ecology as an undergraduate, which meant I spent a lot of time gathering 
and analyzing field data.  Two of the basic tools we used to look for relationships in a large set of variables were correlation matrices and scatterplot matrices.  Each of these requires a single line of code in R:&lt;br /&gt;
&lt;script src="https://gist.github.com/1165388.js?file=0.%20Plain.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;script src="https://gist.github.com/1165388.js?file=1.%20Plain%20result.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;img alt="" src="http://i.imgur.com/PTa9J.png" title="Hosted by imgur.com" /&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;The 'pairs' function in R contains a lot of additional options, which can be used to make very informative plots.  These options can get a little cumbersome, but fortunately several package authors have written wrapper functions that automatically enable some extra magic.  Two such packages are &lt;a href="http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=psych:pairs.panels"&gt;psych&lt;/a&gt; and &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=PerformanceAnalytics:chart.Correlation"&gt;PerformanceAnalytics&lt;/a&gt;.  I happen to prefer the 1 liner from PerformanceAnalytics, but it's a matter of personal taste:&lt;br /&gt;
&lt;script src="https://gist.github.com/1165388.js?file=2.%20Iris.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;img alt="" src="http://i.imgur.com/CD5gO.png" title="Hosted by imgur.com" /&gt;&lt;br /&gt;
This chart contains a LOT of information:  On the diagonal are the univariate distributions, plotted as histograms and kernel density plots.  On the right of the diagonal are the pair-wise correlations, with red stars signifying significance levels.  As the correlations get bigger, the font size of the coefficient gets bigger.  On the left side of the diagonal is the scatter-plot matrix, with loess smoothers in red to help illustrate the underlying relationship.  This is one of my favorite plots in R, because it combines a large amount of information into one command and one easy-to-follow plot.  In fact, this plot contains more information than is revealed by the first two commands in this post!&lt;br /&gt;
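&lt;br /&gt;
To give a feel for what such a wrapper does with the extra 'pairs' options, here is a rough base-R approximation. &amp;nbsp;It is only a sketch, not the PerformanceAnalytics implementation: panelHist, panelCor, and pairsSketch are hypothetical helpers adapted from the examples in the 'pairs' help page, I am assuming the built-in iris data, the smoother is lowess (via panel.smooth) rather than loess, and there are no significance stars.&lt;br /&gt;
&lt;pre&gt;
```r
# Plain correlation matrix for the four numeric iris columns
cm = cor(iris[, 1:4])
round(cm, 2)

# Diagonal panel: histogram scaled to fit inside the panel
panelHist = function(x, ...) {
  usr = par("usr")
  on.exit(par(usr = usr))
  par(usr = c(usr[1:2], 0, 1.5))
  h = hist(x, plot = FALSE)
  nB = length(h$breaks)
  y = h$counts / max(h$counts)
  rect(h$breaks[-nB], 0, h$breaks[-1], y, col = "grey")
}

# Upper panel: correlation coefficient, with bigger text for stronger correlations
panelCor = function(x, y, digits = 2, ...) {
  usr = par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r = cor(x, y)
  text(0.5, 0.5, format(r, digits = digits), cex = 1 + 2 * abs(r))
}

# Scatterplots with smoothers below the diagonal, correlations above it
pairsSketch = function(d) {
  pairs(d, lower.panel = panel.smooth, upper.panel = panelCor,
        diag.panel = panelHist)
}
```
&lt;/pre&gt;
Calling pairsSketch(iris[, 1:4]) draws the plot. &amp;nbsp;That is a fair amount of code to reproduce part of what the wrapper gives you in one line, which is the argument for using it.&lt;br /&gt;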
&lt;br /&gt;
Of course, you can use this command on data from domains other than ecology.  PerformanceAnalytics is intended for the analysis of financial data, so let's put it through its paces.  First we download some financial data (a stock index, a bond index, and a gold index) from Yahoo Finance using quantmod, and then combine the daily close series of those indexes into one dataframe.  I'm not 100% happy with the legend in the plot, but I wanted to show how the correlations between these indexes have changed over the years.  I also skipped red (color #2) in the plots and in the legend, because the loess smoother is also red.&lt;br /&gt;
&lt;script src="https://gist.github.com/1165388.js?file=3.%20Finance.R"&gt;
&lt;/script&gt;&lt;br /&gt;
&lt;img alt="" src="http://i.imgur.com/Q6mZs.png" title="Hosted by imgur.com" /&gt;&lt;br /&gt;
&lt;br /&gt;
Finally, I'd like to acknowledge &lt;a href="http://gettinggeneticsdone.blogspot.com/2011/07/scatterplot-matrices-in-r.html"&gt;Stephen Turner&lt;/a&gt; over on &lt;a href="http://stats.stackexchange.com/questions/14108/r-package-for-identifying-relationships-between-variables/14129#14129"&gt;cross-validated&lt;/a&gt; for inspiring this post.&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/dh64Vfqrp0w" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/9144625597229945720/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/graphically-analyzing-variable.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/9144625597229945720?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/9144625597229945720?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/dh64Vfqrp0w/graphically-analyzing-variable.html" title="Graphically analyzing variable interactions in R" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>7</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/graphically-analyzing-variable.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYMQ3Y4fip7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-4918221132049287527</id><published>2011-08-22T09:01:00.000-07:00</published><updated>2011-09-16T11:19:42.836-07:00</updated><app:edited 
xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:19:42.836-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="recessions" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Recession forecasting II: Assessing Hussman's Accuracy</title><content type="html">In my &lt;a href="http://moderntoolmaking.blogspot.com/2011/08/forecasting-recessions.html"&gt;last post on recessions&lt;/a&gt;, I implemented John Hussman's &lt;a href="http://www.hussmanfunds.com/wmc/wmc110801.htm"&gt;Recession Warning Composite&lt;/a&gt; in R.  In this post I will examine how well this index performs and discuss how we might improve it.  If you would like to follow along at home, be sure to run the code from the last post, before running anything from this post.&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
First of all, let's evaluate how predictive Hussman's index is of recessions next month:&lt;br /&gt;
&lt;script src="https://gist.github.com/1162717.js?file=1.%20Compare%201%20month.R"&gt;
&lt;/script&gt;&lt;br /&gt;
This code simply compares the current value of USREC (US Recessions) to last month's value of the recession warning composite.  By this measure, the recession warning composite is only 81.55% accurate, with a 95% confidence interval of [75%,87%].&lt;br /&gt;
&lt;br /&gt;
Next, let's compare a warning at ANY time in the last 6 months against the current value of USREC.&lt;br /&gt;
&lt;script src="https://gist.github.com/1162717.js?file=2.%20Compare%206%20months.R"&gt;
&lt;/script&gt;&lt;br /&gt;
By this measure, the forecast is even worse: the accuracy is 73.62% [66%,80%].  Interestingly, this measure has a very high 'Negative Predictive Value' (.9896), which indicates that if the recession warning composite has been 0 for the past 6 months, you can be reasonably sure there will be no recession this month.&lt;br /&gt;
&lt;br /&gt;
Finally, let's make a naive recession forecast, and predict that the current value of USREC will be equal to its previous value:&lt;br /&gt;
&lt;script src="https://gist.github.com/1162717.js?file=3.%20Naive"&gt;
&lt;/script&gt;&lt;br /&gt;
This forecast is 97.62% accurate! [94%,99%].  Therefore, I have to conclude that Hussman's recession warning composite, while interesting to implement, is not particularly useful for forecasting recessions. However, it may be that Hussman is primarily concerned with forecasting when recessions START and END.  Given that the current state of US recessions is highly predictive of the next state of US recessions, this might be a valid approach.  Still, I'm struggling to find a useful way of interpreting Hussman's index.&lt;br /&gt;
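The naive benchmark is easy to reproduce by hand.  Here is a minimal sketch on a made-up 0/1 series (not the real USREC data); using the exact binomial interval from binom.test is an assumption about how the intervals quoted above were computed:&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;
# Made-up recession indicator, for illustration only (1 = recession)
usrec = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0)

# Naive forecast: this month's value equals last month's value
naive = c(NA, head(usrec, -1))
keep = !is.na(naive)                 # the first month has no forecast

hits = sum(naive[keep] == usrec[keep])
accuracy = hits / sum(keep)
accuracy

# Exact binomial 95% confidence interval for the accuracy
binom.test(hits, sum(keep))$conf.int
&lt;/code&gt;&lt;/pre&gt;
The naive forecast is only wrong in the months where a recession starts or ends, which is why it is so hard to beat on raw accuracy.&lt;br /&gt;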
&lt;br /&gt;
If you have any ideas for using or improving Hussman's index, please leave a comment.&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/6mL3CFlvcqw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/4918221132049287527/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/recession-forecasting-ii-assessing.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4918221132049287527?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/4918221132049287527?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/6mL3CFlvcqw/recession-forecasting-ii-assessing.html" title="Recession forecasting II: Assessing Hussman's Accuracy" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>4</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/recession-forecasting-ii-assessing.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYNQH4zfip7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-8443418168420816427</id><published>2011-08-10T11:16:00.000-07:00</published><updated>2011-09-16T11:19:51.086-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:19:51.086-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Using the google prediction API from 
R</title><content type="html">Google has a "black box" prediction API that they provide for use with creating recommender systems or filtering spam. Furthermore, they provide an R package for interfacing with that API, but try as I might I cannot get it to work under Windows.  Here are the instructions for setting up the API to run in R under Linux.  I haven't tried this out yet, so let me know in the comments if it works, or if you can get it to run on Windows.&lt;br /&gt;
&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;
First we have to set up the Google Prediction API, as well as some dependencies:&lt;br /&gt;
1. Go to the &lt;a href="https://code.google.com/apis/console"&gt;Google APIs Console&lt;/a&gt;.  This is your home base for managing Google APIs.&lt;br /&gt;
2. In the upper left hand corner of the website (under the Google APIs logo) is a dropdown menu. &amp;nbsp;Use this to create a new project, called something informative like "R predictions."&lt;br /&gt;
3. Activate the Google storage API and turn it on. Activating may require opening a new page.&lt;br /&gt;
4. Activate the Google prediction API and turn it on. &amp;nbsp;Activating may require opening a new page.&lt;br /&gt;
5. Click on the "Billing" tab, and make sure billing is enabled. &amp;nbsp;You may have to enter your billing information. &amp;nbsp;Note that you get 5GB of free storage through the end of 2011, and there's a free quota on the prediction API for 5MB trained per day and 100 predictions per day, up to 20,000 total predictions.&lt;br /&gt;
6. Click the "Google Storage" tab, and make a note of the "x-goog-project-id."  You will need this when installing GSUtil.&lt;br /&gt;
&lt;br /&gt;
Next we have to install some software on our computer to enable communication between R and the&amp;nbsp;prediction&amp;nbsp;API:&lt;br /&gt;
1. Install &lt;a href="http://www.python.org/download/"&gt;python&lt;/a&gt;, if you do not already have it.&lt;br /&gt;
2. Make sure you can run python from the command prompt. &amp;nbsp;You may need to add python to your "path" or "environment" variables to do this. &amp;nbsp;On Windows, run the command prompt as administrator.&lt;br /&gt;
3. Install the R packages rjson and RCurl using install.packages() in R.&lt;br /&gt;
4. Make sure you can open .tar archives. &amp;nbsp;This is no problem on Mac/Linux systems, but on Windows you need &lt;a href="http://www.7-zip.org/"&gt;7zip&lt;/a&gt;.&lt;br /&gt;
5. Download &lt;a href="http://code.google.com/apis/storage/docs/gsutil_install.html"&gt;GSUtil&lt;/a&gt;, and follow the directions to install it on your system. This is the tricky part.&lt;br /&gt;
6. When you run GSUtil for the first time, make sure to use the&amp;nbsp;following&amp;nbsp;command:&amp;nbsp;&lt;span class="Apple-style-span" style="background-color: #efefef; color: #007000; font-family: monospace; font-size: 13px; line-height: 16px;"&gt;python gsutil config -b&lt;/span&gt;&amp;nbsp;to allow gsutil to open a web page and authorize access to your google storage account.&lt;br /&gt;
7. When prompted, enter the project ID you recorded in part 1.&lt;br /&gt;
8. Download the &lt;a href="https://code.google.com/p/google-prediction-api-r-client/downloads/list"&gt;googlepredictionapi&lt;/a&gt;&amp;nbsp;package.&lt;br /&gt;
9. Open R, and setwd() to the folder containing the downloaded package.&lt;br /&gt;
10. Install the R package from source using this command:&lt;br /&gt;
&lt;script src="https://gist.github.com/1137637.js?file=install.R"&gt;
&lt;/script&gt;&lt;br /&gt;
Now we're all set to start using the prediction API:&lt;br /&gt;
1. First we need to create a bucket to store our data. &amp;nbsp;Do this from the &lt;a href="https://sandbox.google.com/storage"&gt;Google Storage Web Console&lt;/a&gt;. Name your bucket something useful, like rdata. &amp;nbsp;Don't use capital letters or symbols.&lt;br /&gt;
2. Run the following script to test that everything works.  Note that you have to save your data frame as a .csv file before GSUtil can upload it to Google Storage for modeling:&lt;br /&gt;
&lt;script src="https://gist.github.com/1137637.js?file=Run.R"&gt;
&lt;/script&gt;&lt;br /&gt;
Good luck! &amp;nbsp;Here are some links for future reference:&lt;br /&gt;
1. &lt;a href="http://code.google.com/p/google-prediction-api-r-client/"&gt;Google directions for installing the googlepredictionapi package&lt;/a&gt; in R&lt;br /&gt;
2. &lt;a href="http://code.google.com/apis/storage/docs/gsutil_install.html"&gt;Google directions for installing gsutil&lt;/a&gt;&lt;br /&gt;
3. &lt;a href="https://code.google.com/apis/console/#project:70293466461:overview"&gt;Google API console&lt;/a&gt; for managing APIs and billing&lt;br /&gt;
4. &lt;a href="https://sandbox.google.com/storage/"&gt;Google storage console&lt;/a&gt; for managing buckets&lt;br /&gt;
5. &lt;a href="http://code.google.com/apis/storage/docs/getting-started.html"&gt;Google APIs overview&lt;/a&gt;/introduction&lt;br /&gt;
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/50Mdyf_l4eI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/8443418168420816427/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/using-google-prediction-api-from-r.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8443418168420816427?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/8443418168420816427?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/50Mdyf_l4eI/using-google-prediction-api-from-r.html" title="Using the google prediction API from R" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>2</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/using-google-prediction-api-from-r.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUYNRn8-eSp7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-448595406925634218</id><published>2011-08-10T06:36:00.000-07:00</published><updated>2011-09-16T11:19:57.151-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:19:57.151-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="web scraping" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Scraping web data in R</title><content type="html">In my &lt;a 
href="http://moderntoolmaking.blogspot.com/2011/08/forecasting-recessions.html"&gt;last post&lt;/a&gt;, I went through a lot of effort to scrape the &lt;a href="http://www.ism.ws/ISMReport/content.cfm?ItemNumber=10752"&gt;PMI index&lt;/a&gt; off the ISM website. &amp;nbsp;It turns out that was&amp;nbsp;unnecessary&amp;nbsp;effort, as commentator "senne" pointed out that this index is available from FRED, with the symbol NAPM. &amp;nbsp;I've updated my code, which now pulls all the data straight from FRED.&lt;br /&gt;
&lt;br /&gt;
However, it was&amp;nbsp;surprisingly&amp;nbsp;easy to scrape web data into R, using the &lt;a href="http://www.oga-lab.net/RGM2/func.php?rd_id=XML:readHTMLTable"&gt;readHTMLTable&lt;/a&gt;&amp;nbsp;function in the &lt;a href="http://cran.r-project.org/web/packages/XML/index.html"&gt;XML&lt;/a&gt;&amp;nbsp;package. &amp;nbsp;I thought I'd keep the code I used on my blog, as it's a good example of how easily you can pull web data into R.&lt;br /&gt;
&lt;br /&gt;
&lt;script src="https://gist.github.com/1136793.js?file=WebScrape.R"&gt;
&lt;/script&gt;&lt;br /&gt;
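For reference, the general pattern looks something like this.  It is a minimal sketch rather than the script above: the URL is a placeholder, and which table on the page holds the data (here the first one) is an assumption you would need to check for a real page:&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;
library(XML)

url = "http://www.example.com/page-with-table.html"   # placeholder URL

# Parse every HTML table on the page into a list of data frames
tables = readHTMLTable(url, stringsAsFactors = FALSE)

# Assumption: the table of interest is the first one on the page
dat = tables[[1]]
head(dat)
&lt;/code&gt;&lt;/pre&gt;
The columns come back as character strings, so numeric columns usually still need converting with as.numeric().&lt;br /&gt;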
&lt;br /&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/tHk9lBjlbpw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/448595406925634218/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/scraping-web-data-in-r.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/448595406925634218?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/448595406925634218?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/tHk9lBjlbpw/scraping-web-data-in-r.html" title="Scraping web data in R" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>1</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/scraping-web-data-in-r.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUUEQnozeCp7ImA9WhdVEUQ.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-912446651499378188</id><published>2011-08-09T09:29:00.000-07:00</published><updated>2011-09-16T11:20:03.480-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-09-16T11:20:03.480-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="finance" /><category scheme="http://www.blogger.com/atom/ns#" term="recessions" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Forecasting recessions</title><content type="html">John Hussman has a &lt;a 
href="http://www.hussmanfunds.com/wmc/wmc110801.htm"&gt;Recession Warning Composite&lt;/a&gt; that I am attempting to replicate/improve.  The underlying data seems to be easy enough to get from &lt;a href="http://research.stlouisfed.org/fred2/"&gt;FRED&lt;/a&gt; using the &lt;a href="http://www.quantmod.com/"&gt;quantmod&lt;/a&gt; package in R.  I don't quite understand the index Hussman is using for commercial paper, so I used the '3-month AA financial commercial paper index' from FRED.&lt;br /&gt;
&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
The PMI index requires &lt;a href="http://www.ism.ws/ISMReport/content.cfm?ItemNumber=10752"&gt;scraping a table&lt;/a&gt; from the ISM website, which is easy enough to do with the &lt;a href="http://cran.r-project.org/web/packages/XML/index.html"&gt;XML&lt;/a&gt; package. &lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
Here's my code so far, please leave a comment and let me know what you think:&lt;br /&gt;
&lt;div&gt;
&lt;br /&gt;&lt;/div&gt;
&lt;div&gt;
&lt;script src="https://gist.github.com/1134508.js?file=Recessions.R"&gt;
&lt;/script&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/oGbPSrnnfQw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/912446651499378188/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/08/forecasting-recessions.html#comment-form" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/912446651499378188?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/912446651499378188?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/oGbPSrnnfQw/forecasting-recessions.html" title="Forecasting recessions" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>4</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/08/forecasting-recessions.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIEQnczfSp7ImA9WhdXEE4.&quot;"><id>tag:blogger.com,1999:blog-1392191097696786998.post-2149491955473417757</id><published>2011-07-22T06:56:00.000-07:00</published><updated>2011-08-22T11:25:03.985-07:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2011-08-22T11:25:03.985-07:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="random forest" /><category scheme="http://www.blogger.com/atom/ns#" term="heritage prize" /><category scheme="http://www.blogger.com/atom/ns#" term="model building" /><category scheme="http://www.blogger.com/atom/ns#" term="R" /><title>Parallel random forests using 
foreach</title><content type="html">There's been some discussion on the &lt;a href="http://www.heritagehealthprize.com/c/hhp/forums/t/666/contribute-an-r-function"&gt;kaggle forums&lt;/a&gt; and on a few &lt;a href="http://anotherdataminingblog.blogspot.com/2011/07/pump-up-volume.html"&gt;blogs &lt;/a&gt;about various ways to parallelize random forests, so I thought I'd add my thoughts on the issue.&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Here's my version of the 'parRF' function, which is based on the elegant version in the &lt;a href="http://cran.r-project.org/web/packages/foreach/vignettes/foreach.pdf"&gt;foreach vignette&lt;/a&gt;:&lt;/div&gt;&lt;div&gt;&lt;script src="https://gist.github.com/1099499.js?file=multiRF.R"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;This function works very simply: you pass it a vector of mtry values, and it fits a random forest using each of those values and returns the combined result.  You can also pass any additional parameters you want (like ntree) to the randomForest function.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;I think this function provides 2 improvements over previous implementations.  #1 is you can use any parallel backend you want. &lt;a href="http://cran.r-project.org/web/packages/doRedis/index.html"&gt;doRedis &lt;/a&gt;is my current favorite, as it's cross-platform and fault-tolerant and lets me commandeer idle laptops around the house/office when a random forest is taking too long to fit.  #2 is the argument .inorder=FALSE in the foreach function, which provides a small performance improvement as it lets R combine the random forests as they finish, rather than forcing R to combine them in the order they start.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Let's say you want a random forest with 5000 trees.  The default value for ntree is 500, so we use rep(4,10) as the argument for the function: 10 forests of 500 trees each, all with mtry=4.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Maybe we're unsure of the optimal mtry value, and want to combine 2 ensembles of 2500 trees.  Then we use the argument c(rep(3,5),rep(4,5)).  This gives us 2500 trees with mtry=3 and 2500 with mtry=4.  I like to think of this as a sort of meta-ensemble of decision trees, but I've yet to see it improve my predictive accuracy.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
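The idea can be sketched along these lines.  This is a reconstruction from the description in this post and the foreach vignette, not necessarily identical to the gist above; the argument names are assumptions, and iris is only a stand-in dataset:&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;
library(foreach)
library(randomForest)

# Fit one forest per element of 'mtry' and merge them into a single forest
parRF = function(x, y, mtry, ...) {
  foreach(m = mtry, .combine = randomForest::combine,
          .packages = "randomForest", .inorder = FALSE) %dopar% {
    randomForest(x, y, mtry = m, ...)
  }
}

# 5 forests with mtry=3 plus 5 forests with mtry=4, 500 trees each by default
fit = parRF(iris[, 1:4], iris$Species, mtry = c(rep(3, 5), rep(4, 5)))
fit$ntree   # total number of trees in the combined forest
&lt;/code&gt;&lt;/pre&gt;
If no parallel backend has been registered, %dopar% falls back to running sequentially with a warning, so the same code works with or without doRedis.&lt;br /&gt;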
&lt;/div&gt;&lt;div&gt;At the very least, this can help with those damn 'out of memory' errors I've been getting on my laptop when fitting random forests to large datasets.&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/ModernToolMaking/~4/AI9GOarr5ok" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://moderntoolmaking.blogspot.com/feeds/2149491955473417757/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://moderntoolmaking.blogspot.com/2011/07/parallel-random-forests-using-foreach.html#comment-form" title="13 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2149491955473417757?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/1392191097696786998/posts/default/2149491955473417757?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/ModernToolMaking/~3/AI9GOarr5ok/parallel-random-forests-using-foreach.html" title="Parallel random forests using foreach" /><author><name>Zachary Mayer</name><uri>https://plus.google.com/108375494102265580837</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh4.googleusercontent.com/-xfGpbEdpNCo/AAAAAAAAAAI/AAAAAAAADuU/UXSnJAjKjUs/s512-c/photo.jpg" /></author><thr:total>13</thr:total><feedburner:origLink>http://moderntoolmaking.blogspot.com/2011/07/parallel-random-forests-using-foreach.html</feedburner:origLink></entry></feed>
