<h1>Statistics, plain and sample</h1>
<p>G. Jay Kerns (gkerns@ysu.edu), <a href="http://gjkerns.github.com/">http://gjkerns.github.com/</a></p>
<h1>Power and Sample Size for Repeated Measures ANOVA with R</h1>
<p>Posted 2012-01-20 at <a href="http://gjkerns.github.com/R/2012/01/20/power-sample-size">http://gjkerns.github.com/R/2012/01/20/power-sample-size</a></p>
<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">Background</h2>
<div class="outline-text-2" id="text-1">
<p>One of my colleagues is an academic physical therapist (PT), and he's working on a paper for his colleagues related to power, sample size, and navigating the thicket of trouble that surrounds those two things. We recently got together to walk through some of the issues, and I thought I would share some of the wildlife we observed along the way. If you just want the code and don't care about the harangue, see <a href="https://gist.github.com/1608265">this gist on GitHub</a>.
</p>
</div>
</div>
<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">The problem</h2>
<div class="outline-text-2" id="text-2">
<p>Suppose you are a PT, and you've come up with a brand new exercise method that you think will decrease neck pain, say. How can you demonstrate that your method is effective? Of course, you collect data and show that people using your method have significantly lower neck pain than those from a control group.
</p>
<p>
The standard approach in the PT literature to analyze said data is repeated measures ANOVA. (Yes, those guys should really be using mixed-effects models, but those haven't quite taken off yet.) There are two groups: the "Treatment" group does your new exercise method, and a "Sham" group does nothing (or just the placebo exercise method). For each Subject, you measure their pain at time 0, 15 minutes, 48 hours, and 96 hours. Pain is measured by an index (there are several); the one we're using is something called NDI, which stands for "Neck Disability Index". The index ranges from 0 to 100 (more on this later). There is some brief information about the index <a href="http://www.chiro.org/LINKS/OUTCOME/Painter_1.shtml">here</a>.
</p>
<p>
Now comes the question: how many people should you recruit for your study? The answer is: it depends. "On what?" Well, it depends on how good the statistical test is, and how good your method is, but more to the point, it depends on <i>the effect size</i>, that is, how far apart the two groups are, given that the method actually works.
</p>
<p>
I encounter some variant of this question a lot. I used to go look for research papers where somebody's worked out the <i>F</i>-test and sample sizes required, and pore over tables and tables. Then I resorted to online calculators (the proprietary versions were too expensive for my department!), which are fine, but they all use different notation and it takes a lot of time poring through documentation (which is often poor, pardon the pun) to recall how it works. And I was never really sure whether I'd got it right, or if I had screwed up with a parameter somewhere.
</p>
<p>
Some of the calculators advertise <i>Cohen's effect sizes</i>, which are usually stated something like "small", "medium", and "large", with accompanying numerical values. <a href="http://www.stat.uiowa.edu/~rlenth/Power/">Russell Lenth</a> calls these "T-shirt effect sizes". I agree with him.
</p>
<p>
Nowadays the fashionable people say, "Just run a simulation and estimate the power," but the materials available online are scant on detail. So my buddy and I worked it all out from start to finish for this simple example, in the hopes that by sharing this information people can get a better idea of how to do it the <b>right</b> way, the <b>first</b> time.
</p>
</div>
</div>
<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">How to attack it</h2>
<div class="outline-text-2" id="text-3">
<p>
<b>The avenue of attack is simple:</b> for a given sample size,
</p><ol>
<li>use prior research and practitioner experience to decide what difference would be "meaningful" to detect,
</li>
<li>simulate data consistent with the above difference and run the desired statistical test to see whether or not it rejected, and
</li>
<li>repeat step 2 hundreds of times. An estimate of the power (for that sample size) is the proportion of times that the test rejected.
</li>
</ol>
<p>
If the power isn't high enough, then increase the given sample size and start over. The value we get is just an <i>estimate</i> of the power, but we can increase the precision of our estimate by increasing the number of repetitions in step 3.
</p>
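<p>In R, the loop above can be sketched generically before we get to the repeated-measures specifics (a sketch; <code>estimatePower</code>, <code>simulateAndTest</code>, and <code>oneTest</code> are illustrative names, not part of the analysis below): write one function that simulates a data set and reports whether the test rejected, then average many replications of it.</p>

```r
# Generic power-by-simulation skeleton (a sketch; the names are illustrative).
# simulateAndTest should generate one data set of size n and return TRUE
# when the test rejects at the chosen significance level.
estimatePower <- function(simulateAndTest, n, nSim = 500) {
  mean(replicate(nSim, simulateAndTest(n)))  # proportion of rejections
}

# Toy illustration: power of a one-sample t-test of mu = 0 when the
# true mean is 0.5 and the standard deviation is 1, with n = 30.
oneTest <- function(n) t.test(rnorm(n, mean = 0.5))$p.value < 0.05
set.seed(1)
estimatePower(oneTest, n = 30)
```

If the estimated power comes back too low, you call <code>estimatePower</code> again with a larger <code>n</code>, exactly as described above.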
<p>
What you find when you start down this path is that there is a <b>lot</b> of information required to be able to answer the question. Of course, this information had been hiding behind the scenes all along, even with those old research papers and online calculators, but the other methods make it easy to gloss over the details, or they're so complicated that researchers will give up and fall back to something like Cohen's T-shirt effect sizes.
</p>
</div>
</div>
<div id="outline-container-4" class="outline-2">
<h2 id="sec-4">Now for the legwork</h2>
<div class="outline-text-2" id="text-4">
<p>
The details we need include: A) prior knowledge of how average pain decreases for people in the <code>Sham</code> group, B) some idea about the variability of scores, C) how scores would be correlated with one another over time, and D) how much better the <code>Treat</code> group would need to be in order for the new procedure to be considered clinically meaningful.
</p>
<p>
As a first step, the PT sat down and filled in the following table.
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup><col class="left" /><col class="right" /><col class="right" /><col class="right" /><col class="right" />
</colgroup>
<tbody>
<tr><td class="left"></td><td class="right">0 hrs</td><td class="right">15 min</td><td class="right">48 hrs</td><td class="right">96 hrs</td></tr>
<tr><td class="left">Treat</td><td class="right">37</td><td class="right">32</td><td class="right">20</td><td class="right">15</td></tr>
<tr><td class="left">Sham</td><td class="right">37</td><td class="right">32</td><td class="right">25</td><td class="right">22</td></tr>
</tbody>
</table>
<p>
All of the entries in the above table represent population mean NDI scores for people in the respective groups at the respective measurement times, and were filled in based on prior research and educated guesses by the PT. It was known from other studies that NDI scores have a standard deviation of around 12 at baseline, and that the standard deviations themselves tend to decrease over time.
</p>
<p>
<b>Note:</b> we could have assumed a simpler model for the means. For example, we could have assumed that mean NDI was linear, with possibly different slopes/intercepts for the Treat/Sham groups. Prior info available to the PT said that such an assumption wasn't reasonable for this example.
</p>
<p>
Repeated measures designs assume sphericity for the exact <i>F</i> tests to hold: the variance of the differences, \(\mathrm{Var}(X_{i} - X_{j})\), should be the same for all pairs of time points \(i\) and \(j\). As it turns out, this choice, together with the marginal standard deviations above, implicitly determines all of the remaining covariance structure. We set the common standard deviation of the differences to \(9\).
</p>
</div>
</div>
<div id="outline-container-5" class="outline-2">
<h2 id="sec-5">Finally we do some coding</h2>
<div class="outline-text-2" id="text-5">
<p>
We are now ready to turn on the computer. We first initialize the parameters we'll need, next we set up the independent variable data, then we do the simulation, and finally we rinse-and-repeat. Let's go.
</p>
<pre class="src src-R">set.seed(1)
nPerGroup <span style="color: #008b8b;"><-</span> 10
nTime <span style="color: #008b8b;"><-</span> 4
muTreat <span style="color: #008b8b;"><-</span> c(37, 32, 20, 15)
muSham <span style="color: #008b8b;"><-</span> c(37, 32, 25, 22)
stdevs <span style="color: #008b8b;"><-</span> c(12, 10, 8, 6)
stdiff <span style="color: #008b8b;"><-</span> 9
nSim <span style="color: #008b8b;"><-</span> 500
</pre>
<p>
All of the above should be self-explanatory. Next comes setting up the data - creatively named <code>theData</code> - for the independent variables. Just for the sake of argument I used code to generate the data frame, but we wouldn't have had to. We could have imported an external text file had we wished.
</p>
<pre class="src src-R">Subject <span style="color: #008b8b;"><-</span> factor(1:(nPerGroup*2))
Time <span style="color: #008b8b;"><-</span> factor(1:nTime, labels = c(<span style="color: #8b2252;">"0min"</span>, <span style="color: #8b2252;">"15min"</span>, <span style="color: #8b2252;">"48hrs"</span>, <span style="color: #8b2252;">"96hrs"</span>))
theData <span style="color: #008b8b;"><-</span> expand.grid(Time, Subject)
names(theData) <span style="color: #008b8b;"><-</span> c(<span style="color: #8b2252;">"Time"</span>, <span style="color: #8b2252;">"Subject"</span>)
tmp <span style="color: #008b8b;"><-</span> rep(c(<span style="color: #8b2252;">"Treat"</span>, <span style="color: #8b2252;">"Sham"</span>), each = nPerGroup * nTime)
theData$Method <span style="color: #008b8b;"><-</span> factor(tmp)
</pre>
<p>
Again, the above should be self-explanatory for the most part. The data are in "long" form, where each subject appears over multiple rows. In fact, let's take a look at the data frame to make sure it looks right.
</p>
<pre class="src src-R">head(theData)
</pre>
<pre class="example">
Time Subject Method
1 0min 1 Treat
2 15min 1 Treat
3 48hrs 1 Treat
4 96hrs 1 Treat
5 0min 2 Treat
6 15min 2 Treat
</pre>
<p>
Lookin' good. Now for the fun part. We generate the single remaining column, the NDI scores. The repeated measures model is multivariate normal. The population covariance matrix is a little bit tricky, but it's not too bad and to make things easy we'll assume both groups have the same covariance. See <a href="http://www.jstor.org/stable/2284340">the original paper by Huynh and Feldt</a> for details.
</p>
<pre class="src src-R"><span style="color: #b22222;"># </span><span style="color: #b22222;">to set up variance-covariance matrix</span>
ones <span style="color: #008b8b;"><-</span> rep(1, nTime)
A <span style="color: #008b8b;"><-</span> stdevs^2 %o% ones
B <span style="color: #008b8b;"><-</span> (A + t(A) + (stdiff^2)*(diag(nTime) - ones %o% ones))/2
</pre>
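<p>As a quick sanity check (not in the original post), we can verify that <code>B</code> does what we claimed: its diagonal recovers <code>stdevs^2</code>, and \(\mathrm{Var}(X_i - X_j) = B_{ii} + B_{jj} - 2B_{ij}\) comes out to <code>stdiff^2 = 81</code> for every pair of distinct time points:</p>

```r
# Sanity check: B has the marginal variances we specified, and the
# variance of every pairwise difference equals stdiff^2 (sphericity).
nTime  <- 4
stdevs <- c(12, 10, 8, 6)
stdiff <- 9
ones <- rep(1, nTime)
A <- stdevs^2 %o% ones
B <- (A + t(A) + (stdiff^2) * (diag(nTime) - ones %o% ones)) / 2

all.equal(diag(B), stdevs^2)   # TRUE: marginal variances as specified
vd <- outer(1:nTime, 1:nTime,  # Var(X_i - X_j) for every pair (i, j)
            function(i, j) diag(B)[i] + diag(B)[j] - 2 * B[cbind(i, j)])
all(abs(vd[upper.tri(vd)] - stdiff^2) < 1e-8)  # TRUE: all equal 81
```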
<p>
We simulate with the <code>mvrnorm</code> function from the <code>MASS</code> package.
</p>
<pre class="src src-R"><span style="color: #008b8b;">library</span>(MASS)
tmp1 <span style="color: #008b8b;"><-</span> mvrnorm(nPerGroup, mu = muTreat, Sigma = B)
tmp2 <span style="color: #008b8b;"><-</span> mvrnorm(nPerGroup, mu = muSham, Sigma = B)
theData$NDI <span style="color: #008b8b;"><-</span> c(as.vector(t(tmp1)), as.vector(t(tmp2)))
</pre>
<p>
Now that we have our data, we can run the test:
</p>
<pre class="src src-R">aovComp <span style="color: #008b8b;"><-</span> aov(NDI ~ Time*Method + Error(Subject/Time), theData)
summary(aovComp)
</pre>
<pre class="example">
Error: Subject
Df Sum Sq Mean Sq F value Pr(>F)
Method 1 157.9 157.9 1.499 0.237
Residuals 18 1896.2 105.3
Error: Subject:Time
Df Sum Sq Mean Sq F value Pr(>F)
Time 3 5082 1693.9 43.066 2.38e-14 ***
Time:Method 3 119 39.6 1.006 0.397
Residuals 54 2124 39.3
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</pre>
<p>
Terrific! For these data, we observe a highly significant <code>Time</code> effect (this should be obvious given our table above), a non-significant <code>Method</code> fixed effect, and a non-significant <code>Time:Method</code> interaction. If we think about our model and what we're interested in, it is the interaction that we care about and would like to detect. If our significance level had been \(\alpha = 0.05\), we would not have rejected this time, but who knows what would happen next time.
</p>
<p>
Now it's time to rinse-and-repeat, which we accomplish with the <code>replicate</code> function. Before we get there, though, let's look at a plot. There are several relevant ones, but in the interest of brevity let's satisfy ourselves with an <code>interaction.plot</code>:
</p>
<pre class="src src-R">with(theData, interaction.plot(Time, Method, NDI))
</pre>
<div id="fig-interaction-plot" class="figure">
<p>
<img src="/images/120120.png" alt="/images/120120.png" />
</p>
</div>
<p>
Everything is going according to plan. There is definitely a <code>Time</code> effect (the lines both slope downward) but there isn't any evidence of an interaction (the lines have similar slopes).
</p>
<p>
On to rinse-and-repeat, we first set up the function that runs the test once:
</p>
<pre class="src src-R"><span style="color: #0000ff;">runTest</span> <span style="color: #008b8b;"><-</span> <span style="color: #a020f0;">function</span>(){
tmp1 <span style="color: #008b8b;"><-</span> mvrnorm(nPerGroup, mu = muTreat, Sigma = B)
tmp2 <span style="color: #008b8b;"><-</span> mvrnorm(nPerGroup, mu = muSham, Sigma = B)
theData$NDI <span style="color: #008b8b;"><-</span> c(as.vector(t(tmp1)), as.vector(t(tmp2)))
aovComp <span style="color: #008b8b;"><-</span> aov(NDI ~ Time*Method + Error(Subject/Time), theData)
b <span style="color: #008b8b;"><-</span> summary(aovComp)$<span style="color: #8b2252;">'Error: Subject:Time'</span>[[1]][2,5]
b < 0.05
}
</pre>
<p>
and finally do the repeating:
</p>
<pre class="src src-R">mean(replicate(nSim, runTest()))
</pre>
<pre class="example">
[1] 0.372
</pre>
<p>
Whoa! The power is 0.372? That's pretty low. We recall that this is just an <i>estimate</i> of power - how precise is the estimate? The standard error of \(\hat{p}\) is approximately \(\sqrt{\hat{p}(1 - \hat{p})/n}\), so in our case, our estimate's standard error is approximately 0.022. That means we are approximately 95% confident that the true power at this particular alternative is covered by the interval \([0.329,0.415]\).
</p>
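<p>The arithmetic behind those numbers is easy to reproduce (a quick sketch using the estimate and simulation count from above; the interval in the text uses 2 standard errors):</p>

```r
# Standard error and approximate 95% interval for the estimated power
phat <- 0.372                           # estimated power from the simulation
nSim <- 500                             # number of simulation replicates
se <- sqrt(phat * (1 - phat) / nSim)
round(se, 3)                            # 0.022
round(phat + c(-2, 2) * se, 3)          # 0.329 0.415
```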
<p>
Standard practice is to shoot for a power of around 0.80 (that is, a Type II error rate of \(\beta = 0.20\)), so our power isn't even close to what we'd need. We can increase power by increasing sample size (the parameter <code>nPerGroup</code>). A larger sample size means a longer time needed to run the simulation. Below are some results of running the above script at assorted sample sizes.
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption></caption>
<colgroup><col class="right" /><col class="right" /><col class="right" />
</colgroup>
<tbody>
<tr><td class="right"><code>nPerGroup</code></td><td class="right">Power (estimate)</td><td class="right">SE (approx)</td></tr>
<tr><td class="right">10</td><td class="right">0.372</td><td class="right">0.022</td></tr>
<tr><td class="right">20</td><td class="right">0.686</td><td class="right">0.021</td></tr>
<tr><td class="right">30</td><td class="right">0.886</td><td class="right">0.014</td></tr>
</tbody>
</table>
<p>
Now we're talking. It looks like somewhere between 20 and 30 subjects per group would be enough to detect the clinically meaningful difference proposed above with a power of 0.80.
</p>
<p>
Unfortunately, the joke is on us. Because, as it happens, it's no small feat for a lone, practicing PT (around here) to snare 60 humans with neck pain for a research study. A person would need to be in (or travel to) a heavily populated area, and even <i>then</i> there would be dropout: people not showing up for subsequent appointments.
</p>
<p>
<b>So what can we do?</b>
</p><ol>
<li><b>Modify the research details.</b> If we take a closer look at the table, there isn't an expected difference in the means until 48 hours, so why not measure differently, say, at 0, 48, 96, and 144 hours? Is there something else about the measurement process we could change to decrease the variance?
</li>
<li><b>Use a different test.</b> We are going with boilerplate repeated-measures ANOVA here. Is that really the best choice? What would happen if we tried the mixed-effects approach?
</li>
<li><b>Take a second look at the model.</b> We should not only double-check our parameter choices, but rethink: is the repeated-measures model (multivariate normal) the most appropriate? Is it reasonable for the variance of differences at all time pairs to be identical? What about the covariance structure? There are others we could try, such as an autoregressive model (another arrow in the mixed-effects models' quiver).
</li>
</ol>
</div>
</div>
<div id="outline-container-6" class="outline-2">
<h2 id="sec-6">Other things to keep in mind</h2>
<div class="outline-text-2" id="text-6">
<ul>
<li>This example is simple enough to have done analytically; we didn't have to simulate anything at all.
</li>
<li>Even if the example hadn't been simple, we could still have searched for an <i>approximate</i> analytic solution which, if nothing else, might have given some insight into the power function's behavior.
</li>
<li>We could have adjusted all the means upward by 7 and nothing would have changed. We based our initial values on literature review and clinical expertise.
</li>
<li>We didn't bother with contrasts, functional means, or anything else. We just generated data consistent with our null and salient alternative and went on with our business.
</li>
<li>We could have used whatever test we liked yet the method of attack would have been the same. Multiple comparisons, nested tests, nonparametric tests, whatever. As long as we include the full procedure in <code>runTest</code>, we will get valid estimates of power for <i>that</i> procedure at <i>that</i> alternative.
</li>
<li>We need to be careful that the test we use (whatever it is) has its significance level controlled. This is easy to check in our example. We can set the means equal (<code>muTreat</code> = <code>muSham</code>) and run the simulation. We should get a rejection rate of about 0.05, the nominal significance level (within the margin of error). Go ahead, check yourself. In fact, since we only care about the interaction, we could vertically offset one group's means from the other's by any fixed amount, not necessarily zero.
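<p>A self-contained sketch of that check (condensing the earlier setup into one block; <code>runNullTest</code> is an illustrative name, and 200 replicates keeps it quick, if a bit noisy):</p>

```r
library(MASS)  # for mvrnorm

# Size check (a sketch): with identical means in both groups the null
# holds, so the Time:Method rejection rate should be near 0.05.
nPerGroup <- 10; nTime <- 4
mu <- c(37, 32, 25, 22)            # the same means for Treat and Sham
stdevs <- c(12, 10, 8, 6); stdiff <- 9
ones <- rep(1, nTime)
A <- stdevs^2 %o% ones
B <- (A + t(A) + (stdiff^2) * (diag(nTime) - ones %o% ones)) / 2

theData <- expand.grid(Time = factor(1:nTime),
                       Subject = factor(1:(nPerGroup * 2)))
theData$Method <- factor(rep(c("Treat", "Sham"), each = nPerGroup * nTime))

runNullTest <- function() {
  scores <- rbind(mvrnorm(nPerGroup, mu = mu, Sigma = B),
                  mvrnorm(nPerGroup, mu = mu, Sigma = B))
  theData$NDI <- as.vector(t(scores))
  fit <- summary(aov(NDI ~ Time * Method + Error(Subject/Time), theData))
  fit$'Error: Subject:Time'[[1]][2, 5] < 0.05  # p-value for Time:Method
}

set.seed(1)
mean(replicate(200, runNullTest()))  # should come out close to 0.05
```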
</li>
<li>Had we not been careful with our <code>stdevs</code>, our simulated NDIs would have gone negative, particularly at the later time points. That would not have been reasonable since NDI is nonnegative.
</li>
<li><b>Simulation is not a silver bullet.</b>
</li>
<li>Effective simulation requires substantial investment of thought into <b>both</b> the probability model and the parameter settings.
</li>
<li>Our model had 13 parameters, and we had 4 more we didn't even touch<sup><a class="footref" name="fnr.1" href="#fn.1">1</a></sup>. A person could be forgiven for wondering how in the world all of those parameters can be expressively spun into a T-shirt effect size. (They can't.)
</li>
<li>The complexity can get out of control quickly. Simulation run times can take forever. The more complicated the model/test the worse it gets.
</li>
<li>Informative simulation demands literature review and content expertise as a prerequisite. Some researchers are unable (due to lack of existing/quality studies) or unwilling (for all sorts of reasons, not all of which are good) to help the statistician fill in the details. For the statistician, this is a problem. If you don't know anything, then you can't say anything.
</li>
<li>We can address uncertainty in our parameter guesses with prior distributions on the parameters. This adds a layer of complexity to the simulation since we must first simulate the parameters before simulating the data. Sometimes there's no other choice.
</li>
<li>Theory tells us that the standard research designs (including our current one) can usually be re-parameterized by a single non-centrality parameter which ultimately determines the power at any particular alternative. Following our nose, it suggests that our problem is simpler than we're making it, that if we would just write down the non-centrality parameter (and the right numerator/denominator degrees of freedom), we'd be all set. Yep, we would. Good luck with all… that.
</li>
</ul>
</div>
</div>
<div id="outline-container-7" class="outline-2">
<h2 id="sec-7">References</h2>
<div class="outline-text-2" id="text-7">
<ul>
<li>See <a href="http://stats.stackexchange.com/questions/21237/calculating-statistical-power">this question</a> on <a href="http://stats.stackexchange.com/">CrossValidated</a> which came up while I was working on this document (I might not have answered so quickly otherwise). Thanks to all who contributed to that discussion.
</li>
<li><i>Conditions Under Which Mean Square Ratios in Repeated Measurements Designs Have Exact F-Distributions</i>. Huynh Huynh and Leonard S. Feldt, Journal of the American Statistical Association, Vol. 65, No. 332 (Dec., 1970), pp. 1582-1589, <a href="http://www.jstor.org/stable/2284340">stable link</a>.
</li>
<li>I found <a href="http://personality-project.org/r/r.anova.html">this website</a> while preparing for the initial meeting and got some inspiration from the discussion near the middle.
</li>
<li>There are several papers on <a href="http://www.stat.uiowa.edu/~rlenth/Power/">Russell Lenth</a>'s webpage which are good reading.
</li>
<li>I also like <a href="http://onlinelibrary.wiley.com/doi/10.1348/000711001159357/abstract">this paper</a>. Keselman, H. J., Algina, J. and Kowalchuk, R. K. (2001), <i>The analysis of repeated measures designs: A review.</i> British Journal of Mathematical and Statistical Psychology, 54: 1–20. doi: 10.1348/000711001159357
</li>
<li>Many of the concepts above are explained more formally in my <a href="https://github.com/gjkerns/STAT5840">Statistical Computing</a> course which you can get on GitHub with
<pre class="example">
git clone git://github.com/gjkerns/STAT5840.git
</pre>
</li>
<li>To learn more about Monte Carlo methods with R I recommend <a href="http://www.springer.com/statistics/computanional+statistics/book/978-1-4419-1575-7">Introducing Monte Carlo Methods with R</a> by Robert and Casella. I also like <a href="http://personal.bgsu.edu/~mrizzo/SCR.htm">Statistical Computing with R</a> by Rizzo which has a section about simulating power of statistical tests.
</li>
<li>For the record, here is my <code>sessionInfo</code>.
<pre class="example"> R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-16
</pre>
</li>
</ul>
<div id="footnotes">
<h2 class="footnotes">Footnotes: </h2>
<div id="text-footnotes">
<p class="footnote"><sup><a class="footnum" name="fn.1" href="#fnr.1">1</a></sup> John von Neumann once said, "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
</p></div>
</div>
</div>
</div>
<h1>Estimating a Normal Mean with a Cauchy Prior</h1>
<p>Posted 2011-08-24 at <a href="http://gjkerns.github.com/R/2011/08/24/estimate-normal-mean">http://gjkerns.github.com/R/2011/08/24/estimate-normal-mean</a></p>
<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">The setup</h2>
<div class="outline-text-2" id="text-1">
<p>When doing statistics the Bayesian way, we are sometimes bombarded with complicated integrals that do not lend themselves to closed-form solutions. This used to be a problem. Nowadays, not so much. This post illustrates how a person can use the Monte Carlo method (and R) to get a good estimate for an integral that might otherwise look unwieldy at first glance. Of course, in this example, the integral isn't very complicated. But the <i>method</i> works the same, regardless of the mess in which we find ourselves. The current example is derived from one in <a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-21239-5">Monte Carlo Statistical Methods</a> by Robert/Casella (in Chapter 4). For that matter, check out their <a href="http://www.springer.com/statistics/computanional+statistics/book/978-1-4419-1575-7">Introducing Monte Carlo Methods with R</a>.
</p>
<p>
Suppose we have one observation \( X \sim N(\theta,1) \) but we have a (robust) prior distribution on \(\theta\), namely, \( \theta \sim \mathrm{Cauchy}(0,1) \). We would like to update our beliefs about \(\theta\) based on the information provided by \(x\). So our likelihood is
\[
f(x|\theta) = \frac{1}{\sqrt{2\pi}}\exp \left[-\frac{1}{2}(x - \theta)^2 \right],
\]
and our prior is
\[
g(\theta) = \frac{1}{\pi}\frac{1}{(1 + \theta^{2})}.
\]
The posterior distribution is proportional to the likelihood times prior, that is,
\[
g(\theta|x) \propto \exp \left[-\frac{1}{2}(x - \theta)^2 \right] \frac{1}{(1 + \theta^{2})},
\]
with the proportionality constant being the reciprocal of
\[
C = \int \exp \left[-\frac{1}{2}(x - \theta)^2 \right] \frac{1}{(1 + \theta^{2})} \mathrm{d} \theta.
\]
Our point estimate (or best guess) for \(\theta\) will be just the <i>posterior mean</i>, given by
\[
E (\theta | \mbox{data}) = \frac{ \int \theta \exp \left[-\frac{1}{2}(x - \theta)^2 \right] \frac{1}{(1 + \theta^{2})} \mathrm{d} \theta }{C}.
\]
We notice that the integrand for \(C\) looks like <i>something</i> times a Cauchy PDF, where the <i>something</i> (let's call it \(h\)) is
\[
h(\theta) = \pi \exp \left[-\frac{1}{2}(x - \theta)^2 \right],
\]
so one way to use the Monte Carlo method follows.
</p><dl>
<dt>Procedure:</dt><dd>Given observed data \(X=x\),
<ol>
<li>Simulate a bunch of Cauchys, \(\theta_{1},\theta_{2},\ldots,\theta_{m}\), say.
</li>
<li>Estimate the integral in the denominator with
\[
\frac{1}{m}\sum_{i=1}^{m} \pi\exp\left[-\frac{1}{2}(x - \theta_{i})^2 \right].
\]
</li>
<li>Estimate the integral in the numerator with
\[
\frac{1}{m}\sum_{i=1}^{m} \pi\theta_{i} \exp \left[-\frac{1}{2}(x - \theta_{i})^2 \right].
\]
</li>
<li>Take the ratio, and we're done.
</li>
</ol>
</dd>
</dl>
<p>
The <a href="http://en.wikipedia.org/wiki/Law_of_large_numbers">Strong Law of Large Numbers</a> says that the averages in steps 2 and 3 both converge to where they should, so the ratio should converge to the right place as well.
</p>
</div>
</div>
<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">How to do it with R</h2>
<div class="outline-text-2" id="text-2">
<p>The following is an R script which does the above. For laughs, let's suppose that we observed \(X=3\).
</p>
<pre class="src src-R"><span style="color: #b22222;"># </span><span style="color: #b22222;">cauchyprior.R</span>
set.seed(1) <span style="color: #b22222;"># </span><span style="color: #b22222;">makes the experiment reproducible</span>
m <span style="color: #008b8b;"><-</span> 2000 <span style="color: #b22222;"># </span><span style="color: #b22222;">number of simulated values</span>
x <span style="color: #008b8b;"><-</span> 3 <span style="color: #b22222;"># </span><span style="color: #b22222;">observed data</span>
<span style="color: #b22222;"># </span><span style="color: #b22222;">Now simulate some random variables</span>
theta <span style="color: #008b8b;"><-</span> rcauchy(m) <span style="color: #b22222;"># </span><span style="color: #b22222;">simulate m standard Cauchys</span>
h <span style="color: #008b8b;"><-</span> pi * exp(-0.5*(x - theta)^2) <span style="color: #b22222;"># </span><span style="color: #b22222;">who wants to write this over and over</span>
Constant <span style="color: #008b8b;"><-</span> mean(h) <span style="color: #b22222;"># </span><span style="color: #b22222;">estimate normalizing constant</span>
post.mean <span style="color: #008b8b;"><-</span> mean(theta * h)/mean(h) <span style="color: #b22222;"># </span><span style="color: #b22222;">estimate posterior mean</span>
</pre>
</div>
</div>
<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">At the command prompt</h2>
<div class="outline-text-2" id="text-3">
<p>After copy-pasting the above into an R session we can see what the results were with two additional lines:
</p>
<pre class="src src-R">Constant
post.mean
</pre>
<p>
For this simple example we can actually calculate what the true values are (to machine precision) with the following: for the constant \(C\) we get
</p>
<pre class="src src-R"><span style="color: #0000ff;">f</span> <span style="color: #008b8b;"><-</span> <span style="color: #a020f0;">function</span>(x) exp(-0.5*(x - 3)^2)/(1 + x^2)
integrate(f, -<span style="color: #228b22;">Inf</span>, <span style="color: #228b22;">Inf</span>)
</pre>
<p>
so our estimate of \(C\) overshot the mark by about 0.03, and in the posterior mean case we get
</p>
<pre class="src src-R"><span style="color: #0000ff;">g</span> <span style="color: #008b8b;"><-</span> <span style="color: #a020f0;">function</span>(x) x * f(x)
integrate(g, -<span style="color: #228b22;">Inf</span>, <span style="color: #228b22;">Inf</span>)$value / integrate(f, -<span style="color: #228b22;">Inf</span>, <span style="color: #228b22;">Inf</span>)$value
</pre>
<p>
so our estimate of the posterior mean was around 0.05 too high. If we would like to get better estimates, we could increase the value of <code>m = 2000</code> to something higher (assuming these things are actually converging someplace).
</p>
</div>
</div>
<div id="outline-container-4" class="outline-2">
<h2 id="sec-4">Are we waiting long enough?</h2>
<div class="outline-text-2" id="text-4">
<p>Our estimates were a little bit off; we might like to take a look at a plot to see how we're doing – is this thing really converging like we'd expect? We can look at a running average plot to assess convergence. If the plot bounces around with no sign of settling down, that's bad, but if it settles down to a finite constant, that's better. Here's a quick way to check this out, starting with base R.
</p>
<pre class="src src-R">rc <span style="color: #008b8b;"><-</span> cumsum(h)/seq_along(h) <span style="color: #b22222;"># </span><span style="color: #b22222;">running mean of C</span>
rpm <span style="color: #008b8b;"><-</span> cumsum(h * theta)/cumsum(h) <span style="color: #b22222;"># </span><span style="color: #b22222;">running posterior mean</span>
</pre>
<p>
Now we plot the results.
</p>
<pre class="src src-R">A <span style="color: #008b8b;"><-</span> data.frame(iter = 1:m, rc = rc, rpm=rpm)
<span style="color: #008b8b;">library</span>(reshape)
<span style="color: #008b8b;">library</span>(ggplot2)
<span style="color: #008b8b;">library</span>(grid)  <span style="color: #b22222;"># </span><span style="color: #b22222;">for grid.newpage, pushViewport, viewport</span>
A.short <span style="color: #008b8b;"><-</span> melt(A[3:200, ], id=<span style="color: #8b2252;">"iter"</span>)
a <span style="color: #008b8b;"><-</span> ggplot(A.short, aes(iter, value, colour=variable)) + geom_line() +
opts(title = <span style="color: #8b2252;">"First 200"</span>)
A.long <span style="color: #008b8b;"><-</span> melt(A, id=<span style="color: #8b2252;">"iter"</span>)
b <span style="color: #008b8b;"><-</span> ggplot(A.long, aes(iter, value, colour=variable)) + geom_line() +
opts(title = <span style="color: #8b2252;">"All 2000 iterations"</span>)
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2, widths = unit(c(3,5),<span style="color: #8b2252;">"null"</span>))))
<span style="color: #0000ff;">vplayout</span> <span style="color: #008b8b;"><-</span> <span style="color: #a020f0;">function</span>(x, y)
viewport(layout.pos.row = x, layout.pos.col = y)
print(a, vp = vplayout(1, 1))
print(b, vp = vplayout(1, 2))
</pre>
<div id="fig-yplot" class="figure">
<p><img src="/images/110824.png" alt="/images/110824.png" /></p>
<p>Running averages for assessing convergence of the estimators</p>
</div>
<p>
In this example, the estimates look to be still unstable at around <code>m = 200</code>, but by the time we reach <code>m = 2000</code> they look to have pretty much settled down. Here we knew what the true values were, so we could tell immediately how well we were doing. On the battlefield we are not so lucky. In general, with Monte Carlo estimates like these it is wise to take a look at some plots to judge the behavior of our estimators. If our plot looks more like the one on the left, then we should consider increasing the sample size. If our plot looks more like the one on the right, then maybe we would be satisfied with "close enough". (We can always wait longer, tight purse-strings notwithstanding.)
</p>
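<p>
As an aside, the running-average diagnostic is easy to rig up in a toy setting where the truth is known. The sketch below is hypothetical (not part of the simulation above): it tracks the running mean of standard normal draws, whose true mean is 0, so we can watch the estimate settle down just as in the right-hand panel.
</p>
<pre class="src src-R">set.seed(1)
m <- 2000
x <- rnorm(m)                  # draws whose mean we are "estimating"
rmean <- cumsum(x)/seq_len(m)  # running average after each iteration
plot(rmean, type = "l", xlab = "iteration", ylab = "running mean")
abline(h = 0, lty = 2)         # the true value, known here
</pre>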
</div>
</div>
<div id="outline-container-5" class="outline-2">
<h2 id="sec-5">Other approaches</h2>
<div class="outline-text-2" id="text-5">
<p>When we were looking to estimate \(C\) we noticed that the integrand was <i>something</i> times a Cauchy density. If we look again, we can see that the same integrand is also a <i>normal</i> density times a different <i>something</i>. So, another approach would be to simulate a bunch of normals and average the new <i>somethings</i>. Do we get the same answer (in the limit)?
</p>
<p>
Yes, of course. It turns out that the approach simulating normals does a little better than the one simulating Cauchys, but they're really pretty close. Check out Chapter 4 of <a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-21239-5">Monte Carlo Statistical Methods</a> for a discussion of this.
</p>
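<p>
The original integrand isn't reproduced in this excerpt, so here is a hypothetical stand-in with the same two-faced structure, borrowed from the classic Cauchy-prior example: \(C = \int e^{-(x-\theta)^2/2}\,\frac{1}{\pi(1+\theta^2)}\,d\theta\) with \(x\) fixed. Read it as [something] times [Cauchy density], or as \(\sqrt{2\pi}\) times [normal density centered at \(x\)] times [Cauchy density], and both Monte Carlo routes drop out:
</p>
<pre class="src src-R">set.seed(2)
x <- 3
m <- 2000
theta1 <- rcauchy(m)                        # route 1: Cauchy draws
est1 <- mean(exp(-(x - theta1)^2/2))        # average the normal-kernel "something"
theta2 <- rnorm(m, mean = x)                # route 2: normal draws
est2 <- mean(sqrt(2*pi) * dcauchy(theta2))  # average the Cauchy-density "something"
c(est1, est2)  # two estimates of the same C
</pre>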
</div>
</div>
<div id="outline-container-6" class="outline-2">
<h2 id="sec-6">Where to find more…</h2>
<div class="outline-text-2" id="text-6">
<p>
The above is a variant of an example we did in <a href="https://github.com/gjkerns/STAT5840">STAT 5840, Statistical Computing</a>. The entire course is available online at <a href="https://github.com/gjkerns/STAT5840">GitHub</a>. Go to the Downloads page for a <code>.zip</code> or <code>.tar.gz</code> file. Or, if you have <a href="http://git-scm.com/">git</a> installed, you can get (git?) it all with
</p><pre class="example">
git clone git://github.com/gjkerns/STAT5840.git
</pre>
</div>
</div>
<img src="http://feeds.feedburner.com/~r/StatisticsPlainAndSample/~4/FOTPqQN9-cM" height="1" width="1" alt=""/>http://gjkerns.github.com/R/2011/08/24/estimate-normal-mean.htmlMaiden voyage2011-08-23T00:00:00-07:00http://gjkerns.github.com/2011/08/23/first-post<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">Who</h2>
<div class="outline-text-2" id="text-1">
<p><a href="http://people.ysu.edu/~gkerns/">Me</a>. I'm an associate professor of Statistics at <a href="http://web.ysu.edu/stem/math">Youngstown State University</a> in <a href="http://en.wikipedia.org/wiki/Youngstown,_Ohio">Youngstown, Ohio, USA</a>. I've been using <a href="http://www.r-project.org/">R</a> for about 7 years, <a href="http://www.gnu.org/software/emacs/">Emacs</a> about 3 years, <a href="http://git-scm.com/">git</a> about 1 year, and <a href="http://orgmode.org/">Org-Mode</a> for less than a year.
</p>
</div>
</div>
<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">What</h2>
<div class="outline-text-2" id="text-2">
<p>I want this blog to be about statistics, plain and sample. No frills, no tomfoolery, just bare-bones statistics from beginning to end. Plus Emacs, ESS, Org-Mode, and R, but that goes without saying.
</p>
</div>
</div>
<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">When</h2>
<div class="outline-text-2" id="text-3">
<p>I've wanted to do this for a long time, but had until now convinced myself that I didn't have time for it. A sabbatical, coupled with the renewed energy of a <a href="http://www.warwick.ac.uk/statsdept/user-2011/">useR! conference</a>, can change things considerably.
</p>
</div>
</div>
<div id="outline-container-4" class="outline-2">
<h2 id="sec-4">Emacs + Org mode + Jekyll + Github = Blog + R!</h2>
<div class="outline-text-2" id="text-4">
<p>After much fiddling and googling I have managed to figure out how to run a blog entirely through Emacs and Git. If you'd like to do the same I recommend reading <a href="http://orgmode.org/worg/org-tutorials/org-jekyll.html">here</a> and <a href="http://vitobotta.com/how-to-migrate-from-wordpress-to-jekyll/">here</a>, with liberal doses of <a href="http://blog.envylabs.com/2009/08/publishing-a-blog-with-github-pages-and-jekyll/">here</a> and <a href="https://github.com/mojombo/jekyll">here</a>. Ultimately, if you'd like to know how I do it then you can find the org-mode source code for this blog <a href="https://github.com/gjkerns/blog">here</a> and you can download the final result <a href="https://github.com/gjkerns/gjkerns.github.com">here</a> (which still is source code but is as close to final as possible).
</p>
<p>
The bottom line: with this setup I can effortlessly do R code like this:
</p>
<pre class="src src-R">rnorm(10)
</pre>
<pre class="example">
[1] -1.07503636 -0.11587837 0.64801870 -0.78416095 -0.05825559 -0.26152707
[7] 0.36192812 -0.63710301 -0.35185059 0.30624394
</pre>
<p>
And can include plots like this:
</p>
<div id="fig-yplot" class="figure">
<p><img src="/images/skiddaw.png" alt="/images/skiddaw.png" /></p>
<p>A plot to get things started</p>
</div>
<p>
all housed inside a simple, dynamic text file that I can edit with <a href="http://www.gnu.org/software/emacs/">Emacs</a> and can version-control with <a href="http://git-scm.com/">git</a>. On top of all this, I get LaTeX formatting in HTML via <a href="http://www.mathjax.org/">MathJax</a>. Life is good.
</p>
</div>
</div>
<img src="http://feeds.feedburner.com/~r/StatisticsPlainAndSample/~4/dE5O17FmbGs" height="1" width="1" alt=""/>http://gjkerns.github.com/2011/08/23/first-post.html