<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>All About Statistics</title>
	
	<link>http://www.statsblogs.com</link>
	<description />
	<lastBuildDate>Thu, 20 Jun 2013 08:38:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/statsblogs" /><feedburner:info uri="statsblogs" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId>statsblogs</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>SIMPSON’S PARADOX EXPLAINED</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/CNub_iPTwsU/</link>
		<comments>http://www.statsblogs.com/2013/06/20/simpsons-paradox-explained/#comments</comments>
		<pubDate>Thu, 20 Jun 2013 08:38:56 +0000</pubDate>
		<dc:creator>normaldeviate</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://normaldeviate.wordpress.com/?p=431</guid>
		<description><![CDATA[<p>SIMPSON&#8217;S PARADOX EXPLAINED Imagine a treatment with the following properties: The treatment is good for men (E1) The treatment is good for women (E2) The treatment is bad overall (E3) That&#8217;s the essence of Simpson&#8217;s paradox. But there is no such treatment. Statements (E1), (E2) and (E3) cannot all be true simultaneously. Simpson&#8217;s paradox occurs when [&#8230;]<img alt="" border="0" src="https://stats.wordpress.com/b.gif?host=normaldeviate.wordpress.com&#38;blog=36942929&#38;post=431&#38;subd=normaldeviate&#38;ref=&#38;feed=1" width="1" height="1" /></p><p>The post <a href="http://www.statsblogs.com/2013/06/20/simpsons-paradox-explained/">SIMPSON’S PARADOX EXPLAINED</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="https://normaldeviate.wordpress.com">Normal Deviate</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p><p align="center"> <b>SIMPSON&#8217;S PARADOX EXPLAINED</b> </p>
<p>
Imagine a treatment with the following properties:</p>
<p><p align="center">
<table align="center">
<tr>
<td align="left"> The treatment is good for men </td>
<td align="left"> (E1)</td>
</tr>
<tr>
<td align="left"> The treatment is good for women </td>
<td align="left"> (E2)</td>
</tr>
<tr>
<td align="left"> The treatment is bad overall </td>
<td align="left"> (E3) </td>
</tr>
</table>
<p>
That&#8217;s the essence of <a class="snap_noshots" href="http://en.wikipedia.org/wiki/Simpsons_paradox">Simpson&#8217;s paradox</a>. But there is no such treatment. Statements (E1), (E2) and (E3) cannot all be true simultaneously.</p>
<p>
Simpson&#8217;s paradox occurs when people equate three probabilistic statements (P1), (P2), (P3) described below, with the statements (E1), (E2), (E3) above. It turns out that (P1), (P2), (P3) can all be true. But, to repeat: (E1), (E2), (E3) cannot all be true.</p>
<p>
The paradox is NOT that (P1), (P2), (P3) are all true. The paradox only occurs if you mistakenly equate (P1-P3) with (E1-E3).</p>
<p>
<p><b>1. Details </b></p>
<p><p>
Throughout this post I&#8217;ll assume we have essentially an infinite sample size. The confusion about Simpson&#8217;s paradox is about population quantities so we needn&#8217;t focus on sampling error.</p>
<p>
Assume that <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y}' title='{Y}' class='latex' /> is binary. The key probability statements are:</p>
<p><p align="center">
<table align="center">
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y%3D1%7CX%3D1%2CZ%3D1%29+-+P%28Y%3D1%7CX%3D0%2CZ%3D1%29+%3E+0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y=1|X=1,Z=1) - P(Y=1|X=0,Z=1) &gt; 0}' title='{P(Y=1|X=1,Z=1) - P(Y=1|X=0,Z=1) &gt; 0}' class='latex' /> </td>
<td align="left"> (P1)</td>
</tr>
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y%3D1%7CX%3D1%2CZ%3D0%29+-+P%28Y%3D1%7CX%3D0%2CZ%3D0%29+%3E+0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y=1|X=1,Z=0) - P(Y=1|X=0,Z=0) &gt; 0}' title='{P(Y=1|X=1,Z=0) - P(Y=1|X=0,Z=0) &gt; 0}' class='latex' /> </td>
<td align="left"> (P2)</td>
</tr>
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y%3D1%7CX%3D1%29+-+P%28Y%3D1%7CX%3D0%29+%3C+0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y=1|X=1) - P(Y=1|X=0) &lt; 0}' title='{P(Y=1|X=1) - P(Y=1|X=0) &lt; 0}' class='latex' /> </td>
<td align="left"> (P3) </td>
</tr>
</table>
<p>
Here, <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y}' title='{Y}' class='latex' /> is the outcome (<img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY%3D1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y=1}' title='{Y=1}' class='latex' /> means success, <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY%3D0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y=0}' title='{Y=0}' class='latex' /> means failure), <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> is treatment (<img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%3D1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X=1}' title='{X=1}' class='latex' /> means treated, <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%3D0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X=0}' title='{X=0}' class='latex' /> means not-treated) and <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is sex (<img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%3D1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z=1}' title='{Z=1}' class='latex' /> means male, <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%3D0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z=0}' title='{Z=0}' class='latex' /> means female).</p>
<p>
It is easy to construct numerical examples where (P1), (P2) and (P3) are all true. The confusion arises if we equate the three probability statements (P1-P3) with the English sentences (E1-E3).</p>
<p>
To summarize: it is possible for (P1), (P2), (P3) to all be true. It is NOT possible for (E1), (E2), (E3) to all be true. The error is in equating (P1-P3) with (E1-E3).</p>
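<p>As a minimal sketch of such a numerical example, the Python snippet below uses hypothetical counts, chosen only for illustration and not taken from this post, and evaluates the three differences in (P1), (P2) and (P3). The names COUNTS, success_rate and simpson_differences are assumptions made for this sketch.</p>

```python
# Hypothetical counts, chosen only for illustration (not taken from the post).
# COUNTS[(x, z)] = (successes, patients); x = 1 treated / 0 not treated,
# z = 1 male / 0 female.
COUNTS = {
    (1, 1): (81, 87),
    (0, 1): (234, 270),
    (1, 0): (192, 263),
    (0, 0): (55, 80),
}


def success_rate(x, sexes=(0, 1)):
    """Empirical P(Y=1 | X=x, Z in sexes), pooling the listed strata."""
    successes = sum(COUNTS[(x, z)][0] for z in sexes)
    patients = sum(COUNTS[(x, z)][1] for z in sexes)
    return successes / patients


def simpson_differences():
    """Return the left-hand sides of (P1), (P2) and (P3)."""
    p1 = success_rate(1, (1,)) - success_rate(0, (1,))  # men
    p2 = success_rate(1, (0,)) - success_rate(0, (0,))  # women
    p3 = success_rate(1) - success_rate(0)              # everyone pooled
    return p1, p2, p3


p1, p2, p3 = simpson_differences()
print(f"P1 difference (men):   {p1:+.3f}")
print(f"P2 difference (women): {p2:+.3f}")
print(f"P3 difference (all):   {p3:+.3f}")
```

<p>With these counts the first two differences are positive and the third is negative. The reversal comes from the treated patients being concentrated in the stratum with the lower baseline success rate; it says nothing yet about (E1), (E2) and (E3).</p>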
<p>
To capture the English statements above, we need causal language, either counterfactuals or causal directed graphs. Either will do. I&#8217;ll use counterfactuals. (For an equivalent explanation using causal graphs, see Pearl 2000). Thus, we introduce <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%28Y_1%2CY_0%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{(Y_1,Y_0)}' title='{(Y_1,Y_0)}' class='latex' /> where <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_1}' title='{Y_1}' class='latex' /> is your outcome if treated and <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_0}' title='{Y_0}' class='latex' /> is your outcome if not treated. We observe
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++Y+%3D+X+Y_1+%2B+%281-X%29+Y_0.+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  Y = X Y_1 + (1-X) Y_0. ' title='&#92;displaystyle  Y = X Y_1 + (1-X) Y_0. ' class='latex' /></p>
<p> In other words, if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%3D1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X=1}' title='{X=1}' class='latex' /> we observe <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_1}' title='{Y_1}' class='latex' /> and if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%3D0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X=0}' title='{X=0}' class='latex' /> we observe <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_0}' title='{Y_0}' class='latex' />. We never observe both <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_1%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_1}' title='{Y_1}' class='latex' /> and <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY_0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y_0}' title='{Y_0}' class='latex' /> on any person. The correct translation of (E1), (E2) and (E3) is:</p>
<p><p align="center">
<table align="center">
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y_1%3D1%7CZ%3D1%29+-+P%28Y_0%3D1%7CZ%3D1%29+%3E0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y_1=1|Z=1) - P(Y_0=1|Z=1) &gt;0}' title='{P(Y_1=1|Z=1) - P(Y_0=1|Z=1) &gt;0}' class='latex' /> </td>
<td align="left"> (C1)</td>
</tr>
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y_1%3D1%7CZ%3D0%29+-+P%28Y_0%3D1%7CZ%3D0%29+%3E0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y_1=1|Z=0) - P(Y_0=1|Z=0) &gt;0}' title='{P(Y_1=1|Z=0) - P(Y_0=1|Z=0) &gt;0}' class='latex' /> </td>
<td align="left"> (C2)</td>
</tr>
<tr>
<td align="left"> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y_1%3D1%29+-+P%28Y_0%3D1%29+%3C0%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y_1=1) - P(Y_0=1) &lt;0}' title='{P(Y_1=1) - P(Y_0=1) &lt;0}' class='latex' /> </td>
<td align="left"> (C3) </td>
</tr>
</table>
<p>
These three statements cannot simultaneously be true. Indeed, if the first two statements hold then
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y_1%3D1%29+-+P%28Y_0%3D1%29+%26%3D%26+%5Csum_%7Bz%3D0%7D%5E1+%5BP%28Y_1%3D1%7CZ%3Dz%29+-+P%28Y_0%3D1%7CZ%3Dz%29%5D+P%28z%29%5C%5C+%26+%3E+%26+0.+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) - P(Y_0=1) &amp;=&amp; &#92;sum_{z=0}^1 [P(Y_1=1|Z=z) - P(Y_0=1|Z=z)] P(z)&#92;&#92; &amp; &gt; &amp; 0. &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) - P(Y_0=1) &amp;=&amp; &#92;sum_{z=0}^1 [P(Y_1=1|Z=z) - P(Y_0=1|Z=z)] P(z)&#92;&#92; &amp; &gt; &amp; 0. &#92;end{array} ' class='latex' /></p>
<p> Thus, (C1)+(C2) implies (not C3). If the treatment is good for men and good for women then of course it is good overall.</p>
<p>
To summarize, in general we have
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%28E1%29+%3D+%28C1%29+%5Cneq+%28P1%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  (E1) = (C1) &#92;neq (P1) ' title='&#92;displaystyle  (E1) = (C1) &#92;neq (P1) ' class='latex' /></p>
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%28E2%29+%3D+%28C2%29+%5Cneq+%28P2%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  (E2) = (C2) &#92;neq (P2) ' title='&#92;displaystyle  (E2) = (C2) &#92;neq (P2) ' class='latex' /></p>
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%28E3%29+%3D+%28C3%29+%5Cneq+%28P3%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  (E3) = (C3) &#92;neq (P3) ' title='&#92;displaystyle  (E3) = (C3) &#92;neq (P3) ' class='latex' /></p>
<p> and, moreover (E3) cannot hold if both (E1) and (E2) hold.</p>
<p>
The key is that, in general,
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y%3D1%7CX%3D1%2CZ%3D1%29+%26-%26+P%28Y%3D1%7CX%3D0%2CZ%3D1%29%5C%5C+%26+%5Cneq+%26+P%28Y_1%3D1%7CZ%3D1%29+-+P%28Y_0%3D1%7CZ%3D1%29+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y=1|X=1,Z=1) &amp;-&amp; P(Y=1|X=0,Z=1)&#92;&#92; &amp; &#92;neq &amp; P(Y_1=1|Z=1) - P(Y_0=1|Z=1) &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y=1|X=1,Z=1) &amp;-&amp; P(Y=1|X=0,Z=1)&#92;&#92; &amp; &#92;neq &amp; P(Y_1=1|Z=1) - P(Y_0=1|Z=1) &#92;end{array} ' class='latex' /></p>
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y%3D1%7CX%3D1%2CZ%3D0%29+%26-%26+P%28Y%3D1%7CX%3D0%2CZ%3D0%29%5C%5C+%26+%5Cneq+%26+P%28Y_1%3D1%7CZ%3D0%29+-+P%28Y_0%3D1%7CZ%3D0%29+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y=1|X=1,Z=0) &amp;-&amp; P(Y=1|X=0,Z=0)&#92;&#92; &amp; &#92;neq &amp; P(Y_1=1|Z=0) - P(Y_0=1|Z=0) &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y=1|X=1,Z=0) &amp;-&amp; P(Y=1|X=0,Z=0)&#92;&#92; &amp; &#92;neq &amp; P(Y_1=1|Z=0) - P(Y_0=1|Z=0) &#92;end{array} ' class='latex' /></p>
<p> and
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++P%28Y%3D1%7CX%3D1%29+-+P%28Y%3D1%7CX%3D0%29+%5Cneq+P%28Y_1%3D1%29+-+P%28Y_0%3D1%29.+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  P(Y=1|X=1) - P(Y=1|X=0) &#92;neq P(Y_1=1) - P(Y_0=1). ' title='&#92;displaystyle  P(Y=1|X=1) - P(Y=1|X=0) &#92;neq P(Y_1=1) - P(Y_0=1). ' class='latex' /></p>
<p> In other words, correlation (left hand side) is not equal to causation (right hand side).</p>
<p>
Now, if treatment is randomly assigned, then <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> is independent of <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%28Y_0%2CY_1%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{(Y_0,Y_1)}' title='{(Y_0,Y_1)}' class='latex' /> and
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++P%28Y%3D1%7CX%3D1%2CZ%3Dz%29+%3D+P%28Y_1%3D1%7CX%3D1%2CZ%3Dz%29+%3D+P%28Y_1%3D1%7CZ%3Dz%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  P(Y=1|X=1,Z=z) = P(Y_1=1|X=1,Z=z) = P(Y_1=1|Z=z) ' title='&#92;displaystyle  P(Y=1|X=1,Z=z) = P(Y_1=1|X=1,Z=z) = P(Y_1=1|Z=z) ' class='latex' /></p>
<p> and so we will not observe the reversal, even for the correlation statements. That is, when <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> is randomly assigned, (P1-P3) cannot all hold.</p>
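<p>The effect of randomization can be checked with a small exact calculation. The sketch below assumes a hypothetical population in which the potential outcomes depend only on sex and the treatment helps by 0.1 in each stratum; all numbers and names (P_Z, P_Y1, P_Y0, naive_difference) are illustrative assumptions, not quantities from this post.</p>

```python
# Hypothetical population, for illustration only: Z is sex, and the potential
# outcomes depend on Z alone, with a causal effect of +0.1 in each stratum.
P_Z = {0: 0.5, 1: 0.5}    # P(Z=z)
P_Y1 = {0: 0.4, 1: 0.8}   # P(Y_1=1 | Z=z)
P_Y0 = {0: 0.3, 1: 0.7}   # P(Y_0=1 | Z=z)


def causal_effect():
    """P(Y_1=1) - P(Y_0=1), averaging the stratum effects over P(z)."""
    return sum((P_Y1[z] - P_Y0[z]) * P_Z[z] for z in P_Z)


def naive_difference(p_treat):
    """P(Y=1|X=1) - P(Y=1|X=0) when P(X=1 | Z=z) = p_treat[z].

    Within each stratum of Z, treatment is assumed independent of (Y_0, Y_1).
    """
    treated = sum(P_Z[z] * p_treat[z] for z in P_Z)
    untreated = sum(P_Z[z] * (1 - p_treat[z]) for z in P_Z)
    y_treated = sum(P_Z[z] * p_treat[z] * P_Y1[z] for z in P_Z)
    y_untreated = sum(P_Z[z] * (1 - p_treat[z]) * P_Y0[z] for z in P_Z)
    return y_treated / treated - y_untreated / untreated


randomized = naive_difference({0: 0.5, 1: 0.5})  # coin flip, ignores sex
confounded = naive_difference({0: 0.9, 1: 0.1})  # women treated far more often
print("causal effect:     ", round(causal_effect(), 3))
print("naive, randomized: ", round(randomized, 3))
print("naive, confounded: ", round(confounded, 3))
```

<p>Under the coin-flip assignment the naive difference coincides with the causal effect, so no reversal can appear. Under the sex-dependent assignment the naive difference changes sign even though the treatment helps in both strata.</p>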
<p>
In the non-randomized case, we can only recover the causal effect by conditioning on all possible <em>confounding variables</em> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BW%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{W}' title='{W}' class='latex' />. (Recall a confounding variable is a variable that affects both <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> and <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BY%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Y}' title='{Y}' class='latex' />.) This is because <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> is independent of <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%28Y_0%2CY_1%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{(Y_0,Y_1)}' title='{(Y_0,Y_1)}' class='latex' /> conditional on <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BW%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{W}' title='{W}' class='latex' /> (that&#8217;s what it means to control for confounders) and we have
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y_1%3D1%29+%26%3D%26+%5Csum_w+P%28Y_1%3D1%7CW%3Dw%29+P%28W%3Dw%29%5C%5C+%26%3D%26+%5Csum_w+P%28Y_1%3D1%7CX%3D1%2CW%3Dw%29+P%28W%3Dw%29+%5C%5C+%26%3D%26+%5Csum_w+P%28Y%3D1%7CX%3D1%2CW%3Dw%29+P%28W%3Dw%29+%5C%5C+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;=&amp; &#92;sum_w P(Y_1=1|W=w) P(W=w)&#92;&#92; &amp;=&amp; &#92;sum_w P(Y_1=1|X=1,W=w) P(W=w) &#92;&#92; &amp;=&amp; &#92;sum_w P(Y=1|X=1,W=w) P(W=w) &#92;&#92; &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;=&amp; &#92;sum_w P(Y_1=1|W=w) P(W=w)&#92;&#92; &amp;=&amp; &#92;sum_w P(Y_1=1|X=1,W=w) P(W=w) &#92;&#92; &amp;=&amp; &#92;sum_w P(Y=1|X=1,W=w) P(W=w) &#92;&#92; &#92;end{array} ' class='latex' /></p>
<p> and similarly, <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BP%28Y_0%3D1%29+%3D+%5Csum_w+P%28Y%3D1%7CX%3D0%2CW%3Dw%29+P%28W%3Dw%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{P(Y_0=1) = &#92;sum_w P(Y=1|X=0,W=w) P(W=w)}' title='{P(Y_0=1) = &#92;sum_w P(Y=1|X=0,W=w) P(W=w)}' class='latex' /> and so
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y_1%3D1%29+%26-%26+P%28Y_0%3D1%29%5C%5C+%26+%3D+%26+%5Csum_w+%5BP%28Y%3D1%7CX%3D1%2CW%3Dw%29+-+P%28Y%3D1%7CX%3D0%2CW%3Dw%29+%5D+P%28W%3Dw%29+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;-&amp; P(Y_0=1)&#92;&#92; &amp; = &amp; &#92;sum_w [P(Y=1|X=1,W=w) - P(Y=1|X=0,W=w) ] P(W=w) &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;-&amp; P(Y_0=1)&#92;&#92; &amp; = &amp; &#92;sum_w [P(Y=1|X=1,W=w) - P(Y=1|X=0,W=w) ] P(W=w) &#92;end{array} ' class='latex' /></p>
<p> which reduces the causal effect into a formula involving only observables. This is usually called the adjusted treatment effect. Now, if it should happen that there is only one confounding variable and it happens to be our variable <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> then
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Cbegin%7Barray%7D%7Brcl%7D++P%28Y_1%3D1%29+%26-%26+P%28Y_0%3D1%29%5C%5C+%26%3D%26+%5Csum_z+%5BP%28Y%3D1%7CX%3D1%2CZ%3Dz%29+-+P%28Y%3D1%7CX%3D0%2CZ%3Dz%29+%5D+P%28Z%3Dz%29.+%5Cend%7Barray%7D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;-&amp; P(Y_0=1)&#92;&#92; &amp;=&amp; &#92;sum_z [P(Y=1|X=1,Z=z) - P(Y=1|X=0,Z=z) ] P(Z=z). &#92;end{array} ' title='&#92;displaystyle  &#92;begin{array}{rcl}  P(Y_1=1) &amp;-&amp; P(Y_0=1)&#92;&#92; &amp;=&amp; &#92;sum_z [P(Y=1|X=1,Z=z) - P(Y=1|X=0,Z=z) ] P(Z=z). &#92;end{array} ' class='latex' /></p>
<p> In this case we get the correct causal conclusion by conditioning on <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' />. That&#8217;s why people usually call the conditional answer correct and the unconditional statement misleading. But this is only true if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounding variable and, in fact, is the only confounding variable.</p>
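<p>The adjusted treatment effect can be computed directly from a table of counts. The following sketch uses hypothetical counts (illustrative only, not from this post) and assumes, as in the text, that Z is the only confounder; the helper names rate, adjusted_treatment_effect and unadjusted_difference are likewise assumptions.</p>

```python
# Hypothetical counts, for illustration only.
# COUNTS[(x, z)] = (successes, patients); x = 1 treated / 0 not treated,
# z = 1 male / 0 female.
COUNTS = {
    (1, 1): (81, 87),
    (0, 1): (234, 270),
    (1, 0): (192, 263),
    (0, 0): (55, 80),
}


def rate(x, z):
    """Empirical P(Y=1 | X=x, Z=z)."""
    successes, patients = COUNTS[(x, z)]
    return successes / patients


def adjusted_treatment_effect():
    """Sum over z of [P(Y=1|X=1,Z=z) - P(Y=1|X=0,Z=z)] * P(Z=z)."""
    total = sum(patients for _, patients in COUNTS.values())
    effect = 0.0
    for z in (0, 1):
        p_z = (COUNTS[(0, z)][1] + COUNTS[(1, z)][1]) / total
        effect += (rate(1, z) - rate(0, z)) * p_z
    return effect


def unadjusted_difference():
    """P(Y=1|X=1) - P(Y=1|X=0), ignoring Z."""
    def pooled(x):
        successes = sum(COUNTS[(x, z)][0] for z in (0, 1))
        patients = sum(COUNTS[(x, z)][1] for z in (0, 1))
        return successes / patients
    return pooled(1) - pooled(0)


print("adjusted:  ", round(adjusted_treatment_effect(), 4))
print("unadjusted:", round(unadjusted_difference(), 4))
```

<p>For these counts the adjusted effect is positive while the unadjusted difference is negative. Which of the two deserves a causal reading depends entirely on whether Z really is the only confounder.</p>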
<p>
<p><b>2. What&#8217;s the Right Answer? </b></p>
<p><p>
Some texts make it seem as if the conditional answers (P1) and (P2) are correct and the unconditional answer (P3) is wrong. This is not necessarily true. There are several possibilities:</p>
<p><ol>
<li> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounder and is the only confounder. Then (P3) is misleading and (P1) and (P2) are correct causal statements.</p>
<li> There is no confounder. Moreover, conditioning on <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> <em>causes confounding.</em> Yes, contrary to popular belief, conditioning on a non-confounder can sometimes cause confounding. (I discuss this more below.) In this case, (P3) is correct and (P1) and (P2) are misleading.
<p><li> <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounder but there are other unobserved confounders. In this case, none of (P1), (P2) or (P3) are causally meaningful.
</ol>
<p>
Without causal language&#8212; counterfactuals or causal graphs&#8212; it is impossible to describe Simpson&#8217;s paradox correctly. For example, Lindley and Novick (1981) tried to explain Simpson&#8217;s paradox using exchangeability. It doesn&#8217;t work. This is not meant to impugn Lindley or Novick&#8212; known for their important and influential work&#8212; but just to point out that you need the right language to correctly resolve a paradox. In this case, you need the language of causation.</p>
<p>
<p><b>3. Conditioning on Nonconfounders </b></p>
<p><p>
I mentioned that conditioning on a non-confounder can actually create confounding. Pearl calls this <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BM%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{M}' title='{M}' class='latex' />-bias. (For those familiar with causal graphs, this is basically the fact that conditioning on a collider creates dependence.)</p>
<p>
To elaborate, suppose I want to estimate the causal effect
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%3D+P%28Y_1%3D1%29+-+P%28Y_0%3D1%29.+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta = P(Y_1=1) - P(Y_0=1). ' title='&#92;displaystyle  &#92;theta = P(Y_1=1) - P(Y_0=1). ' class='latex' /></p>
<p> If <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounder (and is the only confounder) then we have the identity
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%3D+%5Csum_z+%5BP%28Y%3D1%7CX%3D1%2CZ%3Dz%29-P%28Y%3D1%7CX%3D0%2CZ%3Dz%29%5D+P%28Z%3Dz%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta = &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z) ' title='&#92;displaystyle  &#92;theta = &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z) ' class='latex' /></p>
<p> that is, the causal effect is equal to the adjusted treatment effect. Let us write
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%3D+g%5B+p%28y%7Cx%2Cz%29%2Cp%28z%29%5D+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta = g[ p(y|x,z),p(z)] ' title='&#92;displaystyle  &#92;theta = g[ p(y|x,z),p(z)] ' class='latex' /></p>
<p> to indicate that the formula for <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%5Ctheta%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{&#92;theta}' title='{&#92;theta}' class='latex' /> is a function of the distributions <img src='https://s-ssl.wordpress.com/latex.php?latex=%7Bp%28y%7Cx%2Cz%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{p(y|x,z)}' title='{p(y|x,z)}' class='latex' /> and <img src='https://s-ssl.wordpress.com/latex.php?latex=%7Bp%28z%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{p(z)}' title='{p(z)}' class='latex' />.</p>
<p>
But, if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is not a confounder, does the equality still hold? To simplify the discussion assume that there are no other confounders. Either <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounder or there are no confounders. What is the correct identity for <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%5Ctheta%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{&#92;theta}' title='{&#92;theta}' class='latex' />? Is it
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%3D+P%28Y%3D1%7CX%3D1%29+-+P%28Y%3D1%7CX%3D0%29+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta = P(Y=1|X=1) - P(Y=1|X=0) ' title='&#92;displaystyle  &#92;theta = P(Y=1|X=1) - P(Y=1|X=0) ' class='latex' /></p>
<p> or
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%3D+%5Csum_z+%5BP%28Y%3D1%7CX%3D1%2CZ%3Dz%29-P%28Y%3D1%7CX%3D0%2CZ%3Dz%29%5D+P%28Z%3Dz%29%3F+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta = &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z)? ' title='&#92;displaystyle  &#92;theta = &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z)? ' class='latex' /></p>
<p> The answer (under these assumptions) is this: if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is not a confounder then the first identity is correct and if <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' /> is a confounder then the second identity is correct. In the first case <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%5Ctheta+%3D+g%5Bp%28y%7Cx%29%5D%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{&#92;theta = g[p(y|x)]}' title='{&#92;theta = g[p(y|x)]}' class='latex' />.</p>
<p>
Now, when there are no confounders, the first identity is correct. But is the second actually incorrect or will it give the same answer as the first? The answer is: sometimes they give the same answer but it is possible to construct situations where
<p align="center"><img src='https://s-ssl.wordpress.com/latex.php?latex=%5Cdisplaystyle++%5Ctheta+%5Cneq+%5Csum_z+%5BP%28Y%3D1%7CX%3D1%2CZ%3Dz%29-P%28Y%3D1%7CX%3D0%2CZ%3Dz%29%5D+P%28Z%3Dz%29.+&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='&#92;displaystyle  &#92;theta &#92;neq &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z). ' title='&#92;displaystyle  &#92;theta &#92;neq &#92;sum_z [P(Y=1|X=1,Z=z)-P(Y=1|X=0,Z=z)] P(Z=z). ' class='latex' /></p>
<p> (This is something that Judea Pearl has often pointed out.) In these cases, the correct formula for the causal effect is the first one and it does not involve conditioning on <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' />. Put simply, conditioning on a non-confounder can (in certain situations) actually cause confounding.</p>
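<p>One way to construct such a situation is sketched below as an exact calculation. It assumes a hypothetical collider structure with two independent unobserved binary causes: U1 affects X and Z, U2 affects Z and Y, and X affects Y. The variables U1 and U2, all probabilities, and the function name are assumptions made for this illustration; they do not come from the post.</p>

```python
from itertools import product


def m_structure_effects():
    """Return (theta, unadjusted, adjusted) for a hypothetical collider setup.

    Assumed structure: U1 -> X, U1 -> Z, U2 -> Z, U2 -> Y, X -> Y, with U1 and
    U2 independent fair coins. Nothing confounds X and Y, and Z is a collider.
    """
    # joint[(x, z)] = [P(X=x, Z=z), P(Y=1, X=x, Z=z)]
    joint = {(x, z): [0.0, 0.0] for x in (0, 1) for z in (0, 1)}
    theta = 0.0
    for u1, u2 in product((0, 1), repeat=2):
        p_u = 0.25                 # P(U1=u1, U2=u2)
        z = max(u1, u2)            # Z = 1 when either hidden cause is present
        p_y1 = 0.3 + 0.6 * u2      # P(Y_1=1 | U2=u2)
        p_y0 = 0.2 + 0.6 * u2      # P(Y_0=1 | U2=u2)
        theta += p_u * (p_y1 - p_y0)
        p_x1 = 0.2 + 0.6 * u1      # P(X=1 | U1=u1)
        for x, p_x, p_y in ((1, p_x1, p_y1), (0, 1.0 - p_x1, p_y0)):
            joint[(x, z)][0] += p_u * p_x
            joint[(x, z)][1] += p_u * p_x * p_y

    def p_success(x, strata):
        """P(Y=1 | X=x, Z in strata)."""
        mass = sum(joint[(x, z)][0] for z in strata)
        hits = sum(joint[(x, z)][1] for z in strata)
        return hits / mass

    unadjusted = p_success(1, (0, 1)) - p_success(0, (0, 1))
    adjusted = 0.0
    for z in (0, 1):
        p_z = joint[(0, z)][0] + joint[(1, z)][0]
        adjusted += (p_success(1, (z,)) - p_success(0, (z,))) * p_z
    return theta, unadjusted, adjusted


theta, unadjusted, adjusted = m_structure_effects()
print("theta:     ", round(theta, 4))
print("unadjusted:", round(unadjusted, 4))
print("adjusted:  ", round(adjusted, 4))
```

<p>In this setup the unadjusted difference equals the causal effect, while the Z-adjusted formula does not even have the right sign. Conditioning on Z links U1 and U2, which creates a spurious association between X and Y.</p>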
<p>
<p><b>4. Continuous Version </b></p>
<p><p>
A continuous version of Simpson&#8217;s paradox, sometimes called the ecological fallacy, looks like this:</p>
<p><a href="http://normaldeviate.files.wordpress.com/2013/06/splot.png"><img src="http://normaldeviate.files.wordpress.com/2013/06/splot.png?w=800&#038;h=800" alt="splot" width="800" height="800" class="aligncenter size-medium wp-image-439" /></a></p>
<p>
Here we see that increasing doses of drug <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BX%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{X}' title='{X}' class='latex' /> lead to poorer outcomes (left plot). But when we separate the data by sex (<img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' />) the drug shows better outcomes for higher doses for both males and females.</p>
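<p>The same reversal is easy to reproduce with made-up numbers. The sketch below uses hypothetical, noise-free dose and outcome values (not the data behind the plot above) and a hand-written least-squares slope; the lists women and men and the function slope are assumptions for this illustration.</p>

```python
def slope(points):
    """Least-squares slope of outcome on dose for (dose, outcome) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data, for illustration only: within each sex the outcome rises
# by one unit per unit of dose, but men get higher doses and start lower.
women = [(dose, 20 + dose) for dose in range(1, 6)]
men = [(dose, 5 + dose) for dose in range(6, 11)]

print("slope, women: ", slope(women))
print("slope, men:   ", slope(men))
print("slope, pooled:", slope(women + men))
```

<p>Both within-group slopes are positive, yet the pooled slope is negative because the group receiving the higher doses has the lower baseline outcome.</p>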
<p>
<p><b>5. A Blog Argument Resolved? </b></p>
<p><p>
We saw that in some cases <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%5Ctheta%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{&#92;theta}' title='{&#92;theta}' class='latex' /> is a function of <img src='https://s-ssl.wordpress.com/latex.php?latex=%7Bp%28y%7Cx%2Cz%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{p(y|x,z)}' title='{p(y|x,z)}' class='latex' /> but in other cases it is only a function of <img src='https://s-ssl.wordpress.com/latex.php?latex=%7Bp%28y%7Cx%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{p(y|x)}' title='{p(y|x)}' class='latex' />. This fact led to an interesting exchange between Andrew Gelman and Judea Pearl and, later, Pearl&#8217;s student Elias Bareinboim. See, for example, <a class="snap_noshots" href="http://andrewgelman.com/2012/07/23/examples-of-the-use-of-hierarchical-modeling-to-generalize-to-new-settings/">here</a> and <a class="snap_noshots" href="http://andrewgelman.com/2012/07/16/long-discussion-about-causal-inference-and-the-use-of-hierarchical-models-to-bridge-between-different-inferential-settings/">here</a> <a class="snap_noshots" href="http://andrewgelman.com/2009/07/23/pearls_and_gelm/">and here</a>.</p>
<p>
As I recall (<b>Warning! my memory could be wrong</b>), Pearl and Bareinboim were arguing that in some cases, the correct formula for the causal effect was the first one above which does not involve conditioning on <img src='https://s-ssl.wordpress.com/latex.php?latex=%7BZ%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{Z}' title='{Z}' class='latex' />. Andrew was arguing that conditioning was a good thing to do. This led to a heated exchange.</p>
<p>
But I think they were talking past each other. When they said that one should not condition, they meant that the formula for the causal effect <img src='https://s-ssl.wordpress.com/latex.php?latex=%7B%5Ctheta%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{&#92;theta}' title='{&#92;theta}' class='latex' /> does not involve the conditional distribution <img src='https://s-ssl.wordpress.com/latex.php?latex=%7Bp%28y%7Cx%2Cz%29%7D&amp;bg=ffffff&amp;fg=000000&amp;s=0' alt='{p(y|x,z)}' title='{p(y|x,z)}' class='latex' />. Andrew was talking about conditioning as a tool in data analysis. They were each using the word conditioning but they were referring to two different things. At least, that&#8217;s how it appeared to me.</p>
<p>
<p><b>6. References </b></p>
<p><p>
For numerical examples of Simpson&#8217;s paradox, see the <a class="snap_noshots" href="http://en.wikipedia.org/wiki/Simpsons_paradox">Wikipedia article</a>.</p>
<p>
Lindley, D. V. and Novick, M. R. (1981). The role of exchangeability in inference. <em>The Annals of Statistics</em>, 9, 45-58.</p>
<p>
Pearl, J. (2000). <em>Causality: Models, Reasoning, and Inference</em>. Cambridge University Press.</p>
<p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/normaldeviate.wordpress.com/431/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/normaldeviate.wordpress.com/431/" /></a> <img alt="" border="0" src="https://stats.wordpress.com/b.gif?host=normaldeviate.wordpress.com&#038;blog=36942929&%23038;post=431&%23038;subd=normaldeviate&%23038;ref=&%23038;feed=1" width="1" height="1" />
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="https://normaldeviate.wordpress.com/2013/06/20/simpsons-paradox-explained/"><b>Normal Deviate</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/20/simpsons-paradox-explained/">SIMPSON’S PARADOX EXPLAINED</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/CNub_iPTwsU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/20/simpsons-paradox-explained/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="https://0.gravatar.com/avatar/37312c618a28c7d016d4bbe4060f23b1?s=96&amp;amp;d=identicon&amp;amp;r=G" length="" type="" />
<enclosure url="http://normaldeviate.files.wordpress.com/2013/06/splot.png?w=300" length="" type="" />
		<feedburner:origLink>http://www.statsblogs.com/2013/06/20/simpsons-paradox-explained/</feedburner:origLink></item>
		<item>
		<title>Useful Softwares for Research</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/dU_O8TarjHk/</link>
		<comments>http://www.statsblogs.com/2013/06/20/useful-softwares-for-research/#comments</comments>
		<pubDate>Thu, 20 Jun 2013 07:15:07 +0000</pubDate>
		<dc:creator>vinuct</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[academic research]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[productivity]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://blog.vinux.in/?p=294</guid>
		<description><![CDATA[<p><p>I took a session yesterday on &#8216;Useful software for research&#8217; for the new batch of research students. I hope sharing the slides may be useful to other research students. The software suggestions are prepared based on my experience and my &#8230; <a href="http://blog.vinux.in/softwares-research/">Continue reading <span>&#8594;</span></a></p><p>The post <a href="http://blog.vinux.in/softwares-research/">Useful Softwares for Research</a> appeared first on <a href="http://blog.vinux.in">Fiddling with data and code</a>.</p></p><p>The post <a href="http://www.statsblogs.com/2013/06/20/useful-softwares-for-research/">Useful Softwares for Research</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://blog.vinux.in">Fiddling with data and code</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p style="text-align: justify;">I took a session yesterday on &#8216;Useful software for research&#8217; for the new batch of research students. I hope sharing the slides will be useful to other research students. The software suggestions are based on my own experience and that of my friends. The slides are prepared in SVG (<a href="http://code.google.com/p/jessyink/">jessyink</a> template) and are best viewed in Firefox or Chrome.</p>
<p style="text-align: justify;">Let me know if you have any suggestions. I could incorporate those for future sessions.</p>
<p><a href="http://viz.vinux.in/texr_soft.svg"><img title="Useful softwares for research" alt="Useful softwares for research" src="http://viz.vinux.in/texr_soft.svg" width="614" height="461" /></a></p>
<p>The post <a href="http://blog.vinux.in/softwares-research/">Useful Softwares for Research</a> appeared first on <a href="http://blog.vinux.in">Fiddling with data and code</a>.</p>
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://blog.vinux.in/softwares-research/"><b>Fiddling with data and code</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/20/useful-softwares-for-research/">Useful Softwares for Research</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/dU_O8TarjHk" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/20/useful-softwares-for-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/20/useful-softwares-for-research/</feedburner:origLink></item>
		<item>
		<title>Stanley Young: better p-values through randomization in microarrays</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/QJeYLdnH97g/</link>
		<comments>http://www.statsblogs.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 23:54:37 +0000</pubDate>
		<dc:creator>Mayo</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[microarrays]]></category>
		<category><![CDATA[p values]]></category>
		<category><![CDATA[randomization]]></category>
		<category><![CDATA[Statistical assumptions]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://errorstatistics.com/?p=10843</guid>
		<description><![CDATA[<p>I wanted to locate some uncluttered lounge space for one of the threads to emerge in comments from 6/14/13. Thanks to Stanley Young for permission to post this.   S. Stanley Young, PhD Assistant Director for Bioinformatics National Institute of Statistical Sciences Research Triangle Park, NC There is a relatively unknown problem with microarray experiments, in [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=errorstatistics.com&#38;blog=30994953&#38;post=10843&#38;subd=errorstatistics&#38;ref=&#38;feed=1" width="1" height="1" /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/">Stanley Young: better p-values through randomization in microarrays</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://errorstatistics.com">Error Statistics Philosophy » Statistics</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p><em><span style="color:#993300;">I wanted to locate some uncluttered lounge space for one of the threads to emerge in comments <a href="http://errorstatistics.com/2013/06/14/p-values-cant-be-trusted-except-when-used-to-argue-that-p-values-cant-be-trusted/%20Thanks%20to%20Stanley%20Young%20for%20letting%20me%20post."><span style="color:#993300;">from 6/14/13. </span></a>Thanks to Stanley Young for permission to post this. </span></em></p>
<p><a href="http://errorstatistics.files.wordpress.com/2013/03/youngphoto2008.jpg"><img class="alignleft  wp-image-9753" alt="YoungPhoto2008" src="http://errorstatistics.files.wordpress.com/2013/03/youngphoto2008.jpg?w=108&#038;h=144" width="108" height="144" /></a><strong> S. Stanley Young, PhD</strong><br />
Assistant Director for Bioinformatics<br />
National Institute of Statistical Sciences<br />
Research Triangle Park, NC</p>
<blockquote><p>There is a relatively unknown problem with microarray experiments, in addition to the multiple testing problems. Samples should be randomized over important sources of variation; otherwise p-values may be flawed. Until relatively recently, the microarray samples were not sent through assay equipment in random order. Clinical trial statisticians at GSK insisted that the samples go through assay in random order. Rather amazingly the data became less messy and p-values became more orderly. The story is given here:<br />
<a href="http://blog.goldenhelix.com/?p=322"><br />
http://blog.goldenhelix.com/?p=322<br />
</a><br />
Essentially all the microarray data pre-2010 is unreliable. For another example, mass spec data was analyzed by Petricoin. The samples were not randomized, and claims with very small p-values failed to replicate. See K.A. Baggerly et al., &#8220;Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments,&#8221; Bioinformatics, 20:777-85, 2004. So often the problem is not with p-value technology, but with the design and conduct of the study.</p>
<p><a href="http://errorstatistics.files.wordpress.com/2013/06/experim_design6.jpg"><img class="aligncenter size-thumbnail wp-image-10867" alt="experim_design6" src="http://errorstatistics.files.wordpress.com/2013/06/experim_design6.jpg?w=150&#038;h=132" width="150" height="132" /></a></p></blockquote>
<p><span style="color:#993300;">Please check other comments on microarrays <em><a href="http://errorstatistics.com/2013/06/14/p-values-cant-be-trusted-except-when-used-to-argue-that-p-values-cant-be-trusted/%20Thanks%20to%20Stanley%20Young%20for%20letting%20me%20post."><span style="color:#993300;">from 6/14/13.</span></a></em></span></p>
<br />Filed under: <a href='http://errorstatistics.com/category/p-values/'>P-values</a>, <a href='http://errorstatistics.com/category/statistics/'>Statistics</a> Tagged: <a href='http://errorstatistics.com/tag/microarrays/'>microarrays</a>, <a href='http://errorstatistics.com/tag/randomization/'>randomization</a>, <a href='http://errorstatistics.com/tag/statistical-assumptions/'>Statistical assumptions</a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=errorstatistics.com&#038;blog=30994953&%23038;post=10843&%23038;subd=errorstatistics&%23038;ref=&%23038;feed=1" width="1" height="1" />
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://errorstatistics.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/"><b>Error Statistics Philosophy » Statistics</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/">Stanley Young: better p-values through randomization in microarrays</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/QJeYLdnH97g" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://errorstatistics.files.wordpress.com/2013/03/youngphoto2008.jpg" length="" type="" />
<enclosure url="http://errorstatistics.files.wordpress.com/2013/06/experim_design6.jpg?w=150" length="" type="" />
<enclosure url="http://0.gravatar.com/avatar/00528d540b993fde500821ec3a02441f?s=96&amp;amp;d=http://0.gravatar.com/avatar/ad516503a11cd5ca435acc9bb6523536?s=96&amp;amp;r=PG" length="" type="" />
		<feedburner:origLink>http://www.statsblogs.com/2013/06/19/stanley-young-better-p-values-through-randomization-in-microarrays/</feedburner:origLink></item>
		<item>
		<title>Infographics and ISOTYPE and NSIs</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/kaPszV1Z4V8/</link>
		<comments>http://www.statsblogs.com/2013/06/19/infographics-and-isotype-and-nsis/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 14:25:12 +0000</pubDate>
		<dc:creator>Armin Grossenbacher</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[031 Data visualization]]></category>
		<category><![CDATA[033 Statistical literacy]]></category>
		<category><![CDATA[Alan Smith]]></category>
		<category><![CDATA[Isotype]]></category>
		<category><![CDATA[ons]]></category>
		<category><![CDATA[UNECE]]></category>
		<category><![CDATA[United Kingdom]]></category>

		<guid isPermaLink="false">http://blogstats.wordpress.com/?p=6298</guid>
		<description><![CDATA[<p>ISOTYPE Good infographics for statistical matters do not only need diagrams (like histograms or bar charts) but also lots of &#8230;<p><a href="http://blogstats.wordpress.com/2013/06/19/infographics-and-isotype-and-nsis/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blogstats.wordpress.com&#38;blog=214834&#38;post=6298&#38;subd=blogstats&#38;ref=&#38;feed=1" width="1" height="1" /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/infographics-and-isotype-and-nsis/">Infographics and ISOTYPE and NSIs</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://blogstats.wordpress.com">Blog about Stats</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<h3>ISOTYPE</h3>
<p>Good infographics for statistical matters do not only need diagrams (like histograms or bar charts) but also lots of icons and symbols helping to illustrate the topic.</p>
<p>A pioneer in this field was Otto Neurath.</p>
<p><a href="http://blogstats.files.wordpress.com/2013/06/2013-06-15_neurath.jpg"><img class="aligncenter size-full wp-image-6317" alt="2013-06-15_neurath" src="http://blogstats.files.wordpress.com/2013/06/2013-06-15_neurath.jpg?w=529"   /></a></p>
<p>&#8216;ISOTYPE &#8211; the International System Of TYpographic Picture Education &#8211; &#8230; was an early infographical form, originated in the 1930s by Austrian philosopher and curator Otto Neurath &#8220;as a symbolic way of representing quantitative information via easily interpretable icons&#8221;.&#8217; (<a title="Isotype" href="http://www.informationisbeautiful.net/2011/vintage-infoporn-no-1/" >Information is beautiful</a>)</p>
<p><a href="http://blogstats.files.wordpress.com/2013/06/animals-life.jpg"><img class="aligncenter size-full wp-image-6323" alt="Animals life" src="http://blogstats.files.wordpress.com/2013/06/animals-life.jpg?w=529&#038;h=750" width="529" height="750" /></a></p>
<p>More about transforming data into visualisations can be found in Marie Neurath&#8217;s &#8216;<em>The Transformer: Principles of Making Isotype Charts&#8217; </em></p>
<h3><a href="http://blogstats.files.wordpress.com/2013/06/2013-06-16_marieneurathtransformer.jpg"><img class="aligncenter size-medium wp-image-6319" alt="2013-06-16_marieneurathtransformer" src="http://blogstats.files.wordpress.com/2013/06/2013-06-16_marieneurathtransformer.jpg?w=290&#038;h=300" width="290" height="300" /></a>Today: ONS as an example of good visualisations for statistics</h3>
<p>Statistical agencies use visualisations in their daily information work. During the <a title="UNECE Berlin 2013" href="http://www.unece.org/stats/documents/2013.05.dissemination.html" >UNECE Work Session on the Communication of Statistics</a> in Berlin (27-29 May 2013), Alan Smith OBE, Office for National Statistics (ONS, UK), gave a short overview of, and some insight into, this topic.</p>
<p>Two statements from his paper &#8216;<a title="ONS Visualisation" href="http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.45/2013/Session_2_-_Alan_SmithFINAL.pdf" >Data Visualisation for the Citizen User: Making Better Graphics Quicker</a>&#8216;:</p>
<p><span style="font-style:inherit;line-height:1.625;">&#8216;Data visualisation appeals to National Statistics Institutes (NSIs) because of its ability to engage users and increase the potential outreach of official statistics. But data visualisation is a broad field, with content types ranging from simple infographics through to sophisticated tools for exploratory data analysis. &#8230;&#8230;</span></p>
<p>More broadly, data visualisation offers NSIs an opportunity to exploit their expertise in formats which boost user engagement and readership. It also carries with it the highly desirable side effects of boosting relationships with the media and reputational benefits virtually everywhere else. A final note of caution, however, is that these visualisations should be centred on the expertise of the NSI, not based on a notion of style over content – others do that better.&#8217;</p>
<p><a title="UK trade interactive" href="http://www.neighbourhood.statistics.gov.uk/HTMLDocs/ITIS/index.html#UK,nat,to" ><img class="aligncenter size-medium wp-image-6326" alt="2013-06-16_UKTrade" src="http://blogstats.files.wordpress.com/2013/06/2013-06-16_uktrade.jpg?w=300&#038;h=184" width="300" height="184" /></a>.</p>
<p><a title="UK well-being interactive" href="http://www.ons.gov.uk/ons/interactive/well-being-wheel-of-measures/index.html" ><img class="aligncenter size-medium wp-image-6327" alt="2013-06-16_ukwellbeing" src="http://blogstats.files.wordpress.com/2013/06/2013-06-16_ukwellbeing.jpg?w=300&#038;h=184" width="300" height="184" /></a></p>
<br />Filed under: <a href='http://blogstats.wordpress.com/category/031-data-visualization/'>031 Data visualization</a>, <a href='http://blogstats.wordpress.com/category/033-statistical-literacy/'>033 Statistical literacy</a>, <a href='http://blogstats.wordpress.com/category/unece/'>UNECE</a>, <a href='http://blogstats.wordpress.com/category/united-kingdom/'>United Kingdom</a> Tagged: <a href='http://blogstats.wordpress.com/tag/alan-smith/'>Alan Smith</a>, <a href='http://blogstats.wordpress.com/tag/isotype/'>Isotype</a>, <a href='http://blogstats.wordpress.com/tag/ons/'>ons</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/blogstats.wordpress.com/6298/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/blogstats.wordpress.com/6298/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blogstats.wordpress.com&#038;blog=214834&%23038;post=6298&%23038;subd=blogstats&%23038;ref=&%23038;feed=1" width="1" height="1" />
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://blogstats.wordpress.com/2013/06/19/infographics-and-isotype-and-nsis/"><b>Blog about Stats</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/infographics-and-isotype-and-nsis/">Infographics and ISOTYPE and NSIs</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/kaPszV1Z4V8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/19/infographics-and-isotype-and-nsis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://1.gravatar.com/avatar/79f40fd78ef244c0fb71d62c5f74aab9?s=96&amp;amp;d=identicon&amp;amp;r=G" length="" type="" />
<enclosure url="http://blogstats.files.wordpress.com/2013/06/2013-06-15_neurath.jpg" length="" type="" />
<enclosure url="http://blogstats.files.wordpress.com/2013/06/animals-life.jpg" length="" type="" />
<enclosure url="http://blogstats.files.wordpress.com/2013/06/2013-06-16_marieneurathtransformer.jpg?w=290" length="" type="" />
<enclosure url="http://blogstats.files.wordpress.com/2013/06/2013-06-16_uktrade.jpg?w=300" length="" type="" />
<enclosure url="http://blogstats.files.wordpress.com/2013/06/2013-06-16_ukwellbeing.jpg?w=300" length="" type="" />
		<feedburner:origLink>http://www.statsblogs.com/2013/06/19/infographics-and-isotype-and-nsis/</feedburner:origLink></item>
		<item>
		<title>“Behind a cancer-treatment firm’s rosy survival claims”</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/1-9zsF5ok44/</link>
		<comments>http://www.statsblogs.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 13:55:10 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Public Health]]></category>

		<guid isPermaLink="false">http://andrewgelman.com/?p=18690</guid>
		<description><![CDATA[<p><p>Brett Keller points to a recent news article by Sharon Begley and Robin Respaut: A lot of doctors, hospitals and other healthcare providers in the United States decline to treat people who can&#8217;t pay, or have inadequate insurance, among other reasons. What sets CTCA [Cancer Treatment Centers of America] apart is that rejecting certain patients [...]</p><p>The post <a href="http://andrewgelman.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/">&#8220;Behind a cancer-treatment firm&#8217;s rosy survival claims&#8221;</a> appeared first on <a href="http://andrewgelman.com">Statistical Modeling, Causal Inference, and Social Science</a>.</p></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/">“Behind a cancer-treatment firm’s rosy survival claims”</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://andrewgelman.com">Statistical Modeling, Causal Inference, and Social Science</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p>Brett Keller points to a recent <a href="http://www.reuters.com/article/2013/03/06/us-usa-cancer-ctca-idUSBRE9250L820130306">news article</a> by Sharon Begley and Robin Respaut:</p>
<blockquote><p>A lot of doctors, hospitals and other healthcare providers in the United States decline to treat people who can&#8217;t pay, or have inadequate insurance, among other reasons. What sets CTCA [Cancer Treatment Centers of America] apart is that rejecting certain patients and, even more, culling some of its patients from its survival data lets the company tout in ads and post on its website patient outcomes that look dramatically better than they would if the company treated all comers. These are the rosy survival numbers . . .</p></blockquote>
<p>Details:</p>
<blockquote><p>CTCA reports on its website that the percentage of its patients who are alive after six months, a year, 18 months and longer regularly tops national figures. For instance, 60 percent of its non-small-cell lung cancer patients are alive at six months, CTCA says, compared to 38 percent nationally. And 64 percent of its prostate cancer patients are alive at three years, versus 38 percent nationally.</p>
<p>Such claims are misleading, according to nine experts in cancer and medical statistics whom Reuters asked to review CTCA&#8217;s survival numbers and its statistical methodology.</p>
<p>The experts were unanimous that CTCA&#8217;s patients are different from the patients the company compares them to, in a way that skews their survival data. It has relatively few elderly patients, even though cancer is a disease of the aged. It has almost none who are uninsured or covered by Medicaid &#8211; patients who tend to die sooner if they develop cancer and who are comparatively numerous in national statistics. . . . Accepting only selected patients and calculating survival outcomes from only some of them &#8220;is a huge bias and gives an enormous advantage to CTCA,&#8221; said biostatistician Donald Berry . . .</p></blockquote>
<p>What I really like about this article is how it combines quantitative information with qualitative interviews:</p>
<blockquote><p>Carolyn Holmes, a former CTCA oncology information specialist in Tulsa, Oklahoma, said she and others routinely tried to turn away people who &#8220;were the wrong demographic&#8221; because they were less likely to have an insurance policy that CTCA preferred. Holmes said she would try to &#8220;let those people down easy.&#8221; . . .</p>
<p>The ads also challenge viewers to &#8220;compare our treatment results to national averages.&#8221; Doing so, on the company&#8217;s website, shows that CTCA&#8217;s reported survival outcomes regularly beat those averages.</p>
<p>Experts in medical data who reviewed CTCA&#8217;s claims for Reuters say those claims are suspect because of what they called deviations from best practices in statistics &#8211; in particular, comparing its carefully selected patients to those nationwide.</p>
<p>&#8220;It makes their data look better than it is,&#8221; said Robert Strawderman, professor and chairman of biostatistics at the University of Rochester. &#8220;So the comparisons used to suggest that CTCA has better survival rates are pretty meaningless.&#8221;</p></blockquote>
<p>This is horrible!  Of course, we do something similar at Columbia&#8212;we accept only the very best students&#8212;but that&#8217;s a bit different, I think, in that we are providing an educational experience that is designed to work with the most-prepared students.  In contrast, I doubt that there&#8217;s any particular reason for  CTCA to restrict its cancer treatments to the least-sick patients.</p>
<p>The post <a href="http://andrewgelman.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/">&#8220;Behind a cancer-treatment firm&#8217;s rosy survival claims&#8221;</a> appeared first on <a href="http://andrewgelman.com">Statistical Modeling, Causal Inference, and Social Science</a>.</p>
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://andrewgelman.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/"><b>Statistical Modeling, Causal Inference, and Social Science</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/">“Behind a cancer-treatment firm’s rosy survival claims”</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/1-9zsF5ok44" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/19/behind-a-cancer-treatment-firms-rosy-survival-claims/</feedburner:origLink></item>
		<item>
		<title>Hard work pays off</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/JAUl1QJbxgU/</link>
		<comments>http://www.statsblogs.com/2013/06/19/hard-work-pays-off/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 12:01:00 +0000</pubDate>
		<dc:creator>junkcharts</dc:creator>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[Animation]]></category>
		<category><![CDATA[Bubble chart]]></category>
		<category><![CDATA[Crime]]></category>
		<category><![CDATA[Current Affairs]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[interactive]]></category>
		<category><![CDATA[Map]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://www.statsblogs.com/?guid=acc2fd3cb51c69990d60bbba1149fca7</guid>
		<description><![CDATA[<p>At the NY Tech Meetup, Andrei Scheinkman showed off some work his team at Huffington Post did relating to gun violence in America. Interactive version is here. The animation shows day by day, where the victims of gun violence were...</p><p>The post <a href="http://www.statsblogs.com/2013/06/19/hard-work-pays-off/">Hard work pays off</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://junkcharts.typepad.com/junk_charts/">Junk Charts</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>

<div><p>At the NY Tech Meetup, Andrei Scheinkman showed off some work his team at Huffington Post did relating to gun violence in America.</p>
<p>
<a class="asset-img-link" href="http://junkcharts.typepad.com/.a/6a00d8341e992c53ef0191038224fd970c-pi" style="display: inline;"><img alt="Huff_gunviolencemap" class="asset  asset-image at-xid-6a00d8341e992c53ef0191038224fd970c" src="http://junkcharts.typepad.com/.a/6a00d8341e992c53ef0191038224fd970c-450wi" style="width: 450px;" title="Huff_gunviolencemap"></a></p>
<p> </p>
<p>The interactive version is <a href="http://data.huffingtonpost.com/2013/03/gun-deaths"  title="link to Huffington Post">here</a>. The animation shows, day by day, where the victims of gun violence were located. The table below it contains the details of each victim and links to the news story covering the event.</p>
<p>***</p>
<p>What is not seen on the chart is even more impressive. Andrei described how they looked around for databases that would provide them with the raw materials for creating this chart, but no timely source exists. This means that a team of 15 (if I heard correctly) spent a month or so manually collecting all the data in a spreadsheet. </p>
<p>It&#039;s also the reason why they cannot continue the map indefinitely, as people have other things to do.</p>
<p>Andrei also contrasted this visualization with a text article that describes the state of gun violence in words. You guessed it, the visual presentation is hands-down more compelling.</p></div>

<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://junkcharts.typepad.com/junk_charts/2013/06/hard-work-pays-off.html"><b>Junk Charts</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/hard-work-pays-off/">Hard work pays off</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/JAUl1QJbxgU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/19/hard-work-pays-off/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/19/hard-work-pays-off/</feedburner:origLink></item>
		<item>
		<title>Macros and loops in the SAS/IML language</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/3dP285aJQUA/</link>
		<comments>http://www.statsblogs.com/2013/06/19/macros-and-loops-in-the-sasiml-language/#comments</comments>
		<pubDate>Wed, 19 Jun 2013 09:26:37 +0000</pubDate>
		<dc:creator>Rick Wicklin</dc:creator>
				<category><![CDATA[SAS]]></category>
		<category><![CDATA[SAS Programming]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blogs.sas.com/content/iml/?p=8494</guid>
		<description><![CDATA[<p>I am not a big fan of the macro language, and I try to avoid it when I write SAS/IML programs. I find that the programs with many macros are hard to read and debug. Furthermore, the SAS/IML language supports loops and indexing, so many macro constructs can be replaced [...]</p><p>The post <a href="http://www.statsblogs.com/2013/06/19/macros-and-loops-in-the-sasiml-language/">Macros and loops in the SAS/IML language</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://blogs.sas.com/content/iml">The DO Loop</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p>
I am not a big fan of the macro language, and I try to avoid it when I write SAS/IML programs. I find that the programs with many macros are hard to read and debug. Furthermore, the SAS/IML language supports loops and indexing, so many macro constructs can be replaced by standard SAS/IML syntax.
</p>
<p>
Nevertheless, many SAS customers use macro constructs as part of their daily SAS programming tasks, and that practice often continues when they write SAS/IML programs.  A customer recently asked a question about the macro language that required knowledge of the way that macro variables are handled within a SAS/IML loop. This post shares my response.
</p><p>
Here's the crux of the customer's question. Run the following SAS/IML program and see if you can understand why it behaves as it does:
</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">proc iml;
i = 7;
call symputx(&quot;j&quot;, i);    /* 1. Put value of i into macro variable j */
y1 = &amp;j;                 /* 2. Assign y1 the value of &amp;j            */
print y1;                /* success! */
&nbsp;
y = j(1,4,.);
do i = 1 to ncol(y);     /* 3. Start processing the DO block of statements */
   call symputx(&quot;j&quot;, i); /* 4. Put value of i into macro variable j */
   y[i] = &amp;j;            /* 5. Hmmmm, what does this do inside the loop? */
end;
print y;                 /* Not what you might expect? */</pre></div></div>




<img src="http://blogs.sas.com/content/iml/files/2013/06/macroloop.png" alt="" width="76" height="128" class="aligncenter size-full wp-image-8502" />

<p>
As you can see from the output, the first use of the macro variable (outside the DO loop) works as expected, but the second does not. The customer wanted to know why the elements of <tt>y</tt> are not set to 1, 2, 3, 4 within the loop.
</p><p>
The key point to remember about macro variables is that SAS code never sees them. Macro variables are evaluated by the macro preprocessor <em>at parse time</em>, not at run time. The SAS/IML code never sees &amp;j, only the constant value that the preprocessor substitutes for &amp;j.
</p><p>
It is also important to remember that PROC IML is an interactive procedure. (The "I" in IML stands for interactive!) Each statement or block of statements is parsed as it is encountered, as opposed to the DATA step, which parses the entire program before beginning execution.
</p><p>
Let's examine the program step-by-step to understand why the first construct works but the second does not. The following steps refer to the numbers in the program comments:
</p>
<ol>
<li>The value of the SAS/IML scalar <tt>i</tt> is copied (as text) into the macro variable <tt>j</tt>.</li>
<li>The statement is encountered. The value of the macro variable <tt>j</tt> is substituted by the macro preprocessor. Then the statement is executed. The SAS/IML variable <tt>y1</tt> is assigned the value 7.</li>
<li>A DO loop is encountered by the SAS/IML parser. The parser finds the matching END statement and proceeds to parse the <em>entire</em> body of the loop in order to check for syntax errors.  This parsing phase occurs exactly one time.  Because the block of statements contains a macro variable, the macro preprocessor substitutes the value of the macro variable <tt>j</tt>, which is 7.
</li>
<li>For each iteration, the value of the SAS/IML scalar <tt>i</tt> is copied (as text) into the macro variable <tt>j</tt>.</li>
<li>For each iteration, the <em>i</em>th element of the <tt>y</tt> vector is assigned the value 7.  In particular, this statement does not contain a reference to the macro variable <tt>j</tt>.</li>
</ol>

<p>
To the casual reader of the program, it looks like &amp;j will have a different value during each step of the iteration. But it doesn't. The expression &amp;j is resolved at <em>parse time</em>. SAS/IML parses the entire body of the DO loop once, before any execution occurs, and at parse time the expression &amp;j is 7.
</p>
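<p>
The same parse-time versus run-time distinction can be imitated outside of SAS. The following Python sketch is only an analogy, not SAS/IML code: Python has no macro preprocessor, so <code>string.Template</code> stands in for it, <code>$j</code> plays the role of the macro reference, and the names <code>symbols</code>, <code>loop_body</code> and <code>parsed_once</code> are illustrative assumptions. It contrasts a statement whose text is substituted once before the loop runs with a value that is looked up on every iteration.
</p>

```python
from string import Template

# Stand-in for the macro symbol table; j holds 7 before the loop, as in the post.
symbols = {"j": "7"}

# Source text of the loop body; $j plays the role of the macro reference.
loop_body = Template("y[i] = $j")

# "Parse time": the text is substituted exactly once, before the loop runs.
parsed_once = loop_body.substitute(symbols)      # the text "y[i] = 7"

y_parse_time = []
for i in range(1, 5):
    symbols["j"] = str(i)                        # like CALL SYMPUTX in the loop
    # The parsed statement no longer mentions $j; it carries the constant 7.
    y_parse_time.append(int(parsed_once.split("=")[1]))

symbols["j"] = "7"                               # reset to the starting value
y_run_time = []
for i in range(1, 5):
    symbols["j"] = str(i)
    y_run_time.append(int(symbols["j"]))         # value looked up at run time

print(y_parse_time)   # [7, 7, 7, 7]
print(y_run_time)     # [1, 2, 3, 4]
```

<p>
Updating <code>symbols</code> inside the first loop has no visible effect, because the statement text was fixed before the loop started. Only a lookup performed during each iteration sees the new value.
</p>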
<p>
There is a way to get what the customer wants. The <a href="http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#n08h8unph3lz0un1ap3kqru4iym0.htm">SYMGET function</a> retrieves the value of a macro variable at run time. Therefore the following statements fill the vector <tt>y</tt> with the values 1 through 4:
</p>


<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">do i = 1 to ncol(y);
   call symputx(&quot;j&quot;, i);
   y[i] = num(symget(&quot;j&quot;));  /* get macro value at run time */
end;
print y;                     /* Yes! This is what we want! */</pre></div></div>




<img src="http://blogs.sas.com/content/iml/files/2013/06/macroloop2.png" alt="" width="77" height="59" class="aligncenter size-full wp-image-8501" />

<p>
For me, this blog post emphasizes three facts:
</p>
<ul>
<li>Always remember that macro substitution is done by a preprocessor, which operates at parse time.</li>
<li>The SAS/IML language parses an entire block of statements (between the DO and END statements) one time before executing the block.</li>
<li>Mixing macro code and SAS/IML statements can be confusing and hard to debug.  When you have the option, use SAS/IML language features instead of relying on macro language constructs.</li>
</ul>
<div class="entry-utility"><span class="tag-links">tags: <a href="http://blogs.sas.com/content/iml/tag/sasprogramming/">SAS Programming</a></span></div><div class="feedflare">
</div>
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://feedproxy.google.com/~r/TheDoLoop/~3/8bjPF-m8ENc/"><b>The DO Loop</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/19/macros-and-loops-in-the-sasiml-language/">Macros and loops in the SAS/IML language</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/3dP285aJQUA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/19/macros-and-loops-in-the-sasiml-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/19/macros-and-loops-in-the-sasiml-language/</feedburner:origLink></item>
		<item>
		<title>Le Monde puzzle [#825]</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/q5nSbCeo29o/</link>
		<comments>http://www.statsblogs.com/2013/06/18/le-monde-puzzle-825/#comments</comments>
		<pubDate>Tue, 18 Jun 2013 22:13:43 +0000</pubDate>
		<dc:creator>xi'an</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[heapsort]]></category>
		<category><![CDATA[Kids]]></category>
		<category><![CDATA[Le Monde]]></category>
		<category><![CDATA[mathematical puzzle]]></category>
		<category><![CDATA[pictures]]></category>
		<category><![CDATA[pracma]]></category>
		<category><![CDATA[quicksort]]></category>

		<guid isPermaLink="false">http://xianblog.wordpress.com/?p=20927</guid>
		<description><![CDATA[<p>Yet another puzzle which first part does not require R programming, even though it is a programming question in essence: Given five real numbers x1,&#8230;,x5, what is the minimal number of pairwise comparisons needed to rank them? Given 33 real numbers, what is the minimal number of pairwise comparisons required to find the three largest ones? [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xianblog.wordpress.com&#38;blog=5051449&#38;post=20927&#38;subd=xianblog&#38;ref=&#38;feed=1" width="1" height="1" /></p><p>The post <a href="http://www.statsblogs.com/2013/06/18/le-monde-puzzle-825/">Le Monde puzzle [#825]</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://xianblog.wordpress.com">Xi'an's Og » R</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p style="text-align:justify;"><strong><img class="aligncenter" style="margin-top:4px;margin-bottom:4px;" alt="" src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRsQd3v7efgSlmsNPKddzl-oQS9fZOF8a68V6KPmf5dnmI_EJ66rw" width="273" height="185" />Y</strong>et another puzzle whose first part does not require R programming, even though it is a programming question in essence:</p>
<blockquote>
<p style="text-align:justify;"><em>Given five real numbers x<sub>1</sub>,&#8230;,x<sub>5</sub>, what is the minimal number of </em><em>pairwise comparisons needed to rank them? Given 33 real numbers, what is the minimal number of pairwise comparisons required to find the three largest ones?</em></p>
</blockquote>
<p style="text-align:justify;"><strong>I</strong> see no alternative to treating the first question as finding the quickest possible sort of a real sample. Using either <a href="http://en.wikipedia.org/wiki/Quicksort">quicksort</a> or <a href="http://en.wikipedia.org/wiki/Heapsort">heapsort</a>, I manage to sort the 5 numbers in exactly 6 comparisons for any order of the initial sample. <em>(Now, there may be an even faster way based on comparing partial sums first&#8230; I just do not see how!)</em> <strong>Update:</strong> Oops! I realised that I had based my reasoning on a reasonable case rather than the worst one; the correct answer is indeed 7!!!<em><br />
</em></p>
<p style="text-align:justify;"><strong><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Max-Heap.svg/240px-Max-Heap.svg.png"><img class="aligncenter" alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Max-Heap.svg/240px-Max-Heap.svg.png" width="240" height="178" /></a>F</strong>or the second part, let us start from the remark that 32 comparisons are needed to find the largest number, then at most 31 for the second largest, and at most 30 for the third largest (since we can take advantage of the partial ordering resulting from the determination of the largest number). This is poor. If I instead use a heap algorithm, I need O(n log n) comparisons to build this binary tree whose parents are always larger than their children, as in the above example. <em>(I can produce a sort of heap structure, although non-binary, in an average of 16&#215;2.5=40 steps, and a maximum of 16&#215;3=48 steps.)</em> The resulting tree provides the largest number (100 in the above example) and at least the second largest number (36 in the above). To get the third largest number, I first need a comparison between the one-before-last terms of the heap (19 vs. 36 in the above), and one or two extra comparisons (25 vs. 19 and maybe 25 vs. 1 in the above). <em>(This would induce an average of 1.5 extra comparisons and a maximum of 2 extra comparisons, resulting in a total of 41.5 average and 50 maximum comparisons with my sub-optimal heap construction.)</em>  Once again, using comparisons of sums may help in speeding up the process, for instance comparing numbers by groups of 3, but I did not pursue this solution&#8230;</p>
<p style="text-align:justify;"><strong><a href="http://xianblog.files.wordpress.com/2013/06/quick3.jpg"><img class="aligncenter size-full wp-image-20934" title="histogram of the number of comparisons involved in the quick3 execution, figure obtained with the par(bg=&quot;black&quot;,col.axis=&quot;wheat&quot;,col.lab=&quot;wheat&quot;) options and 10000 simulations" alt="quick3" src="http://xianblog.files.wordpress.com/2013/06/quick3.jpg?w=450&#038;h=450" width="450" height="450" /></a>I</strong>f instead I try to adapt quicksort to this problem, I can have a dynamic pivot that always keeps at most two terms above it, providing the three numbers as a final result. Here is some R code to check its performance:</p>
<pre class="brush: r; gutter: false; title: ; notranslate">
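quick3=function(x){
  # search for the three largest terms of x, counting pairwise comparisons
  comp=0
  lower=upper=NULL
  pivot=x[1]

  for (i in 2:length(x)){
    if (x[i]&lt;pivot){
      lower=c(lower,x[i])       # term below the current pivot
    }else{
      upper=c(upper,x[i])
      # one more comparison to place the new term within upper
      if (length(upper)&gt;1) comp=comp+1
    }
    comp=comp+1                 # comparison of x[i] with the pivot

    if (length(upper)==3){
      # three terms above the pivot: the smallest becomes the new pivot
      pivot=min(upper)
      upper=sort(upper)[-1]
    }
  }

  # the pivot joins upper (if x[1] is one of the two largest
  # values, fewer than three terms are returned)
  if (length(upper)&lt;3) upper=c(pivot,upper)

  list(top=upper,cost=comp)
}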
</pre>
<p style="text-align:justify;">When running this R code on 10⁴ random sequences of 33 terms, I obtained the following statistics on the computing costs</p>
<pre class="brush: r; gutter: false; title: ; notranslate">&gt; summary(costs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  32.00   36.00   38.00   37.89   40.00   49.00
</pre>
<p style="text-align:justify;">and the associated histogram represented above. Interestingly, the minimum is the number of comparisons needed to produce the maximum!</p>
<p style="text-align:justify;"><strong>R</strong>eading the solution in <a href="http://www.lemonde.fr/defis-mathematiques/">Le Monde</a> in the train to <a title="morning run" href="http://xianblog.wordpress.com/2013/02/07/morning-run/">London</a> and <a title="Bayes 250 in London" href="http://xianblog.wordpress.com/2013/03/20/bayes-250-in-london/">Bayes 250</a>, I discovered that the author suggests a 7 comparison solution in the first case <strong><em>[compare A and B, C and D = 2 comparisons; if A&gt;B and C&gt;D, compare A and C and get, say A&gt;C&gt;D = 1 comparison; insert E in this ordering by comparing with C and then A or D = 2 comparisons, obtaining e.g. A&gt;E&gt;C&gt;D; conclude by inserting B by first comparing with C then with D or E = 2 comparisons]</em></strong> and a 41 comparison solution in the second case. <del>He is (or I am!)</del> I was clearly mistaken in the first case while the quick3 algorithm does 41 or less most of the time (90%)  but not always.</p>
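<p style="text-align:justify;">The 7-comparison scheme quoted above can be checked by brute force. The following Python sketch is not part of the original post (which uses R), and the names <code>ComparisonCounter</code>, <code>binary_insert</code> and <code>sort5</code> are illustrative. Assuming five distinct numbers, it implements that scheme, counts the comparisons over all 120 possible orderings, and compares the worst case with the information-theoretic lower bound, the base-2 logarithm of 5! rounded up.</p>

```python
from itertools import permutations
from math import ceil, factorial, log2
from operator import lt


class ComparisonCounter:
    """Performs pairwise comparisons and counts how many were made."""

    def __init__(self):
        self.count = 0

    def less(self, x, y):
        self.count += 1
        return lt(x, y)


def binary_insert(chain, x, cmp):
    """Insert x into the ascending list chain by binary search."""
    lo, hi = 0, len(chain)
    while lo != hi:
        mid = (lo + hi) // 2
        if cmp.less(x, chain[mid]):
            hi = mid
        else:
            lo = mid + 1
    return chain[:lo] + [x] + chain[lo:]


def sort5(values, cmp):
    """Sort five distinct values in ascending order with at most 7 comparisons."""
    a, b, c, d, e = values
    # Comparisons 1 and 2: order each of the two pairs.
    if cmp.less(a, b):
        a, b = b, a              # now b is below a
    if cmp.less(c, d):
        c, d = d, c              # now d is below c
    # Comparison 3: compare the pair winners and relabel so that c is below a.
    if cmp.less(a, c):
        a, b, c, d = c, d, a, b
    # Known so far: d is below c, c is below a, and b is below a.
    # Comparisons 4 and 5: insert e into the chain d, c, a.
    chain = binary_insert([d, c, a], e, cmp)
    # Comparisons 6 and 7: b is below a, so only the part under a matters.
    pos_a = chain.index(a)
    return binary_insert(chain[:pos_a], b, cmp) + chain[pos_a:]


worst = 0
for perm in permutations(range(5)):
    counter = ComparisonCounter()
    assert sort5(list(perm), counter) == [0, 1, 2, 3, 4]
    worst = max(worst, counter.count)

lower_bound = ceil(log2(factorial(5)))
print(worst, lower_bound)   # 7 7
```

<p style="text-align:justify;">Since 5! = 120 orderings must be told apart and six yes/no comparisons can distinguish at most 64 outcomes, no scheme can guarantee fewer than 7 comparisons, which is why the worst case of 7 is optimal.</p>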
<br />Filed under: <a href='http://xianblog.wordpress.com/category/books/'>Books</a>, <a href='http://xianblog.wordpress.com/category/kids/'>Kids</a>, <a href='http://xianblog.wordpress.com/category/pictures/'>pictures</a>, <a href='http://xianblog.wordpress.com/category/statistics/r-statistics/'>R</a> Tagged: <a href='http://xianblog.wordpress.com/tag/heapsort/'>heapsort</a>, <a href='http://xianblog.wordpress.com/tag/le-monde/'>Le Monde</a>, <a href='http://xianblog.wordpress.com/tag/mathematical-puzzle/'>mathematical puzzle</a>, <a href='http://xianblog.wordpress.com/tag/pracma/'>pracma</a>, <a href='http://xianblog.wordpress.com/tag/quicksort/'>quicksort</a>, <a href='http://xianblog.wordpress.com/tag/r/'>R</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/xianblog.wordpress.com/20927/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/xianblog.wordpress.com/20927/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=xianblog.wordpress.com&#038;blog=5051449&%23038;post=20927&%23038;subd=xianblog&%23038;ref=&%23038;feed=1" width="1" height="1" />
<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://xianblog.wordpress.com/2013/06/19/le-monde-puzzle-825/"><b>Xi'an's Og » R</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/18/le-monde-puzzle-825/">Le Monde puzzle [#825]</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/q5nSbCeo29o" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/18/le-monde-puzzle-825/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://xianblog.files.wordpress.com/2013/06/quick3.jpg" length="" type="" />
<enclosure url="http://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Max-Heap.svg/240px-Max-Heap.svg.png" length="" type="" />
<enclosure url="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRsQd3v7efgSlmsNPKddzl-oQS9fZOF8a68V6KPmf5dnmI_EJ66rw" length="" type="" />
<enclosure url="http://2.gravatar.com/avatar/ba847ef5873101769043f6260d57282a?s=96&amp;amp;d=http://s0.wp.com/i/mu.gif" length="" type="" />
		<feedburner:origLink>http://www.statsblogs.com/2013/06/18/le-monde-puzzle-825/</feedburner:origLink></item>
		<item>
		<title>Creating a wordpress single or multisite install using cloudformation and ansible</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/9H4sl32xlwA/</link>
		<comments>http://www.statsblogs.com/2013/06/18/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/#comments</comments>
		<pubDate>Tue, 18 Jun 2013 22:00:00 +0000</pubDate>
		<dc:creator>Vik Paruchuri</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://vikparuchuri.com/blog/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible</guid>
		<description><![CDATA[<p>Intro

I recently had to create some sites quickly.  After evaluating a few options, setting up a wordpress multisite seemed like a good option.

In order to make this change, I setup a wordpress multisite installation with domain mapping.  A multisite...</p><p>The post <a href="http://www.statsblogs.com/2013/06/18/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/">Creating a wordpress single or multisite install using cloudformation and ansible</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://vikparuchuri.com">Vik's Blog</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<h2>Intro</h2>

<p>I recently had to create some sites quickly.  After evaluating a few options, setting up a wordpress multisite seemed like a good option.</p>

<p>To do this, I set up a wordpress multisite installation with domain mapping.  A multisite installation is one where a single wordpress install runs multiple websites.  I like multisite because it lets me manage multiple websites flexibly, with less duplicated effort than a separate wordpress installation for each website would require.</p>

<p>Wordpress multisite normally works with subdomains (ie mail.google.com), but I combined the multisite mode with <a href="http://wordpress.org/plugins/wordpress-mu-domain-mapping/">domain mapping</a> to enable top-level domains to be used for each sub-site (so, we have vikparuchuri.com).</p>

<p>I am not a sysadmin by trade (what am I by trade, anyways?), but some new tools make it really simple to build repeatable configurations.  I can&rsquo;t stress the repeatable part enough.  If you setup an installation &ldquo;by hand&rdquo; and run a lot of manual system commands, it will be extremely hard to reproduce if you need to run another site, or if you want to backup and re-initialize your site with different hardware.  It may take more initial work to make something repeatable, but it is well worth doing.</p>

<p>Feel free to contact me if you have any questions about this process.  Note that these steps have been tested only on Ubuntu 12.10.  They will most likely work on Windows, but some steps may need to be modified.</p>

<h2>High level questions</h2>

<h3>What will this post help me do?</h3>

<ul>
<li>Setup your own wordpress server for your personal blog.</li>
<li>Setup your own wordpress server for several of your own websites.</li>
<li>Setup a wordpress server that allows users to register their own sites automatically.</li>
</ul>


<h3>Why setup your own wordpress server?</h3>

<ul>
<li>Often cheaper than using a hosted solution.</li>
<li>You get more control over your data.</li>
<li>Can install plugins/themes as you want.</li>
<li>Quicker to make system-level changes such as package installation, switching domains, etc.</li>
<li>Easier and more customizable scaling.</li>
</ul>


<h3>Why use multisite?</h3>

<p>These instructions can be used to setup a single site or a multisite, but there are some advantages to using a multisite rather than setting up several single sites.</p>

<ul>
<li>More time to set up initially, but much easier to set up each individual site after that.</li>
<li>Central plugin and backup management.</li>
<li>Easier to scale and manage.</li>
</ul>


<h2>Technologies Used</h2>

<p>To enable the configuration steps to be repeatable, I used the following technologies:</p>

<ul>
<li><a href="http://aws.amazon.com/ec2/">EC2</a> is an Amazon service that lets you create or shut down servers on demand.  It is great for deploying websites in seconds.</li>
<li><a href="http://aws.amazon.com/route53/">Route 53</a> is Amazon&rsquo;s DNS service.  It makes it incredibly simple to set up DNS records.</li>
<li><a href="http://aws.amazon.com/cloudformation/">CloudFormation</a> is another Amazon service that lets you use templates to create resource &ldquo;stacks&rdquo;.  These resources can be servers (EC2), databases (RDS), and so on.  This lets you easily create and manage configurable resources.  In our case, it allows us to create our server and associated resources very easily.</li>
<li><a href="http://www.ansibleworks.com/">Ansible</a> is an open source project that allows for idempotent commands to be run by sshing into a server or group of servers.  In our case, it allows us to very easily configure our wordpress server.</li>
</ul>
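<p>Idempotent means that running a command a second time leaves the system exactly as the first run left it, which is what makes a configuration safe to re-apply.  The short Python sketch below only illustrates that property; it is not Ansible code, and the function <code>ensure_line</code> and the settings in it are made up for the example.</p>

```python
def ensure_line(lines, wanted):
    """Return a copy of lines that contains wanted, adding it only if missing."""
    if wanted in lines:
        return list(lines)
    return list(lines) + [wanted]


config = ["port=80"]                          # hypothetical starting state
once = ensure_line(config, "multisite=true")
twice = ensure_line(once, "multisite=true")   # second run changes nothing

print(once == twice)   # True
```

<p>A plain &ldquo;append this line&rdquo; command would add a duplicate on every run, so the result would depend on how many times it had been executed.</p>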


<h2>Getting the code</h2>

<p>To get started, we first need to grab the wordpress-deployment repository:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='python'><span class='line'><span class="n">git</span> <span class="n">clone</span> <span class="n">git</span><span class="nd">@github.com</span><span class="p">:</span><span class="n">VikParuchuri</span><span class="o">/</span><span class="n">wp</span><span class="o">-</span><span class="n">deployment</span><span class="o">.</span><span class="n">git</span>
</span></code></pre></td></tr></table></div></figure>


<p>Git is a version control system, and github is a social coding tool.  If you haven&rsquo;t used git before, <a href="https://help.github.com/articles/set-up-git">github</a> has some good tutorials.</p>

<p>The git clone command will create a new directory called wp-deployment inside the directory where you ran it.</p>

<h2>Starting the cloudformation stack</h2>

<p>After you have cloned the repository, you will be able to find the cloudformation template at <code>wp-deployment/cloudformation/wordpress.json</code> .</p>

<p>We will now need to login to an existing AWS account to use this template.  See <a href="http://aws.amazon.com/">Amazon AWS</a> for details on making an account.</p>

<p>Once you log in, you should be at the <a href="https://console.aws.amazon.com/console/home">management console</a>.  The management console allows you to interact with AWS resources.  In this case, we care about the <a href="https://console.aws.amazon.com/cloudformation/home">cloudformation section</a>.</p>

<p>Once you are in the cloudformation console, you will be able to click on &ldquo;create stack&rdquo;.  <strong>Note that creating a stack will cost money.</strong>  You can use <a href="http://calculator.s3.amazonaws.com/calc5.html">this tool</a> to estimate your cost.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/cf_console.png" alt="create_stack" /></p>

<p>After clicking on &ldquo;create stack&rdquo;, you will need to choose to upload the wordpress.json template.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/cf_template.png" alt="upload template" /></p>

<p>Once you fill in the stack name and click &ldquo;next&rdquo;, you will come to the &ldquo;specify parameters&rdquo; screen.  This is where the extensibility of cloudformation comes to the fore.  This template lets you specify a few different variables.</p>

<ul>
<li>SSHLocation &ndash; If you want to restrict SSH access, set this.  Otherwise, leave it at the default.</li>
<li>EnvironmentTag &ndash; If you want a stage or sandbox instance to test with, change this.  Otherwise, leave at prod.  This will not change anything with the instance, it will only change how the instance is tagged.  Ansible finds instances by how they are tagged (more on this later), so this is only if you want multiple types of servers with different configurations.</li>
<li>ApplicationTag &ndash; Again, this is only a tag, so it only affects how the instance is discovered.  I would recommend leaving it as the default.</li>
<li>KeyName &ndash; In your EC2 setup, you will need to specify keypairs that can SSH into your instances.  See the <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/generating-a-keypair.html">user guide</a> for more info on this.  The KeyName is the name of the keypair that you want to use.  You will need to have it registered with EC2, and the private key will need to be on your computer.</li>
<li>InstanceType &ndash; How large of an <a href="http://aws.amazon.com/ec2/instance-types/#instance-details">EC2 instance</a> you want to create.  I recommend not using the t1.micro, as it has not been fully tested (but if you are adventurous, feel free to test!).</li>
</ul>


<p>Once you have set these, you will need to check the box that says &ldquo;I acknowledge that this template may create IAM resources.&rdquo;  <a href="http://aws.amazon.com/iam/">IAM resources</a> are an Amazon feature that allow for multiple user roles with various access permissions.  In this case, we are creating a user for our wordpress server that has limited access.</p>

<p>You can now hit &ldquo;continue&rdquo; through the next two screens (add tags and review).  Amazon will now get to work creating your stack!</p>
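<p>If you would rather script this step than click through the console, the same inputs can be collected in code.  The Python sketch below only builds a request in the shape that the CloudFormation CreateStack API uses (<code>StackName</code>, <code>TemplateBody</code>, <code>Parameters</code>, <code>Capabilities</code>); it does not contact AWS.  The helper name <code>stack_request</code>, the stack name, and every parameter value except <code>prod</code> are placeholders made up for illustration, and the empty template body stands in for the contents of <code>wordpress.json</code>.</p>

```python
import json


def stack_request(stack_name, template_body, params):
    """Build keyword arguments shaped like a CloudFormation CreateStack request."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(params.items())
        ],
        # Counterpart of ticking the IAM acknowledgement box in the console.
        "Capabilities": ["CAPABILITY_IAM"],
    }


# Placeholder values: only "prod" comes from the text above.
params = {
    "SSHLocation": "0.0.0.0/0",
    "EnvironmentTag": "prod",
    "ApplicationTag": "wordpress",
    "KeyName": "my-keypair",
    "InstanceType": "m1.small",
}

# In real use the body would be read from cloudformation/wordpress.json.
request = stack_request("wordpress-prod", "{}", params)
print(json.dumps(request["Parameters"], indent=2))
```

<p>With the boto3 library installed and AWS credentials configured, a dictionary like this could be passed as <code>boto3.client("cloudformation").create_stack(**request)</code>.  That call would create real resources and cost money, just as the console route does.</p>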

<h2>What is in the stack?</h2>

<p>While the stack is being created, let&rsquo;s talk about what is actually being made.</p>

<h3>IAM User</h3>

<p>As I alluded to before, cloudformation is making an IAM user.  IAM gives us greater security, because we are not keeping our main AWS credentials on the server we are making.  In this case, we are making a user that can access Amazon S3 (file storage) and send email via Amazon SES.  Access to S3 allows the instance to be &ldquo;bootstrapped&rdquo; with some needed applications when it is created: the instance downloads some basic applications from S3 and does some initial configuration, which makes things simpler for us down the line.  Access to S3 will also let us back up our wordpress installation (if we enable it), and access to SES will let our wordpress instance send email to users (again, if we enable it).</p>

<h3>EC2 Server</h3>

<p>The template will also create a server.  This server will run wordpress, once we set everything up.  It will need to have access to a database (or run a database locally).  It will run the Ubuntu OS.</p>

<h3>ELB</h3>

<p>An ELB is an elastic load balancer.  It basically redirects from an external facing URL to a group of servers.  It is generally intended to balance load between several machines.  In our case, it serves two purposes.  Amazon Route 53 (which we will go into later) does not allow naked domain redirection (ie vikparuchuri.com instead of www.vikparuchuri.com) unless we use an ELB. The ELB also allows us to swap servers in/out on the backend however we want later.</p>

<h3>Security Group</h3>

<p>A security group determines who can access our server, and from what ports.</p>

<h2>The database</h2>

<p>You may have noticed that I did not include a database in the previous description of the stack.  This is because I wanted to leave the choice of database as flexible as possible.  Also, including the database in a stack is risky, because you could lose all of your data if you accidentally delete the stack.</p>

<p>You can either create a database locally on the EC2 server you have just made, or make a separate database server using <a href="http://aws.amazon.com/rds/">Amazon RDS</a>.  In my case, I used Amazon RDS because it makes it more flexible to add/remove EC2 servers without losing data.</p>

<p>Feel free to use either option.  For the purposes of this tutorial, I will go with a local database, because RDS is a bit more complex to setup.</p>

<h2>Stack outputs</h2>

<p>The stack should now have finished coming up.  If you see a red light and the status &ldquo;ROLLBACK_COMPLETE&rdquo;, you should check &ldquo;Events&rdquo; under the stack detail to find the reason for the rollback.  Often, it is something like a mistyped configuration setting.</p>

<p>Now that the stack has been created, you can click on the stack in the console and check on the outputs.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/cf_outputs.png" alt="outputs" /></p>

<p>The outputs will tell you the ELB address (referred to from hereon as ELBAddress) and the EC2 server address (referred to from hereon as ServerAddress).  Save both of these somewhere: you can access them here anytime, but we will need them in several places.</p>

<h2>Setting up a local database</h2>

<p>We can now setup a local database for the wordpress instance.  Feel free to skip this if you will be using an external database.  Regardless of what database server you are using, it is still nice to have a user just for wordpress with limited access, though (this will automatically be setup later on).</p>

<h3>Install mysql server</h3>

<p>Let&rsquo;s ssh into our instance:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='python'><span class='line'><span class="n">ssh</span> <span class="n">ubuntu</span><span class="nd">@ServerAddress</span>
</span></code></pre></td></tr></table></div></figure>


<p>We will need to install mysql:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='python'><span class='line'><span class="n">sudo</span> <span class="n">apt</span><span class="o">-</span><span class="n">get</span> <span class="n">install</span> <span class="n">mysql</span><span class="o">-</span><span class="n">server</span>
</span></code></pre></td></tr></table></div></figure>


<p>Set whatever root password you want, but make sure you save it somewhere.</p>

<h3>Setup database/user/permissions</h3>

<p><strong>The following steps in this section (creating a database, a user, and granting the relevant permissions) will be done automatically.  Only do them yourself if you want to understand the process or debug.</strong></p>

<p>Now, let&rsquo;s create a database, a database user, and give the user the right permissions.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class='code'><pre><code class='python'><span class='line'><span class="n">mysql</span> <span class="o">-</span><span class="n">u</span> <span class="n">root</span> <span class="o">-</span><span class="n">p</span>
</span></code></pre></td></tr></table></div></figure>


<p>Type in the password you set during the installation of mysql-server when it asks for a password.</p>

<p>Once you are in the mysql shell:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">CREATE</span> <span class="n">DATABASE</span> <span class="n">wordpress</span><span class="p">;</span>
</span><span class='line'><span class="n">CREATE</span> <span class="n">USER</span> <span class="s">&#39;wordpress&#39;</span><span class="err">@</span><span class="s">&#39;localhost&#39;</span> <span class="n">IDENTIFIED</span> <span class="n">BY</span> <span class="s">&#39;[insert password here]&#39;</span><span class="p">;</span>
</span><span class='line'><span class="n">GRANT</span> <span class="n">SELECT</span><span class="p">,</span><span class="n">INSERT</span><span class="p">,</span><span class="n">UPDATE</span><span class="p">,</span><span class="n">DELETE</span> <span class="n">ON</span> <span class="n">wordpress</span><span class="o">.*</span> <span class="n">TO</span> <span class="s">&#39;wordpress&#39;</span><span class="err">@</span><span class="s">&#39;localhost&#39;</span><span class="p">;</span>
</span></code></pre></td></tr></table></div></figure>


<p>Save your wordpress user password somewhere.</p>

<p>We are now done setting up the database, and we can move on to the other steps.  Use <code>exit</code> to leave the mysql shell, and <code>exit</code> again to terminate the ssh connection.</p>

<h2>Setup Route53 to forward our domain to the ELB</h2>

<p>Route53 is Amazon&rsquo;s DNS management system, as I mentioned earlier.  It will allow us to point users who go to www.oursite.com to our ELB.  <strong>Feel free to skip this section if you don&rsquo;t need it, or if you plan to manage your DNS some other way.  You will always be able to access your site via ELBAddress.</strong></p>

<p>To use Route53, first go to the <a href="https://console.aws.amazon.com/route53/home">Route53 Console</a>.</p>

<p>Then, click on &ldquo;Create Hosted Zone&rdquo;.</p>

<p>This will pop up a box on the right that lets you enter your domain name. Enter your domain name without the www, and then click create hosted zone at the bottom.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/53_hosted.png" alt="hosted zones" /></p>

<p>Your domain will now appear in the center panel.  Select your domain, and then look at &ldquo;delegation sets&rdquo; at the right.
<img src="http://vikparuchuri.com/blog/images/wordpress-install/53_delegation.png" alt="dns entries" /></p>

<p>These are your nameservers, and you will need to set them as the nameservers for your domain with your registrar.  The AWS help at the top right will give you information on this if you have questions.</p>

<p>Once you have set up the nameserver configuration, you will need to set up the records to point at your wordpress ELB (which will point to your server).</p>

<p>To do this, you will need to set up two alias records, one for the &ldquo;naked&rdquo; domain (vikparuchuri.com), and one for the full domain (www.vikparuchuri.com).</p>

<p>Select your domain in the Route 53 control panel, and then click on &ldquo;go to record sets&rdquo; at the top left.  This will bring you to a detail view of the record set.</p>

<p>Now, you can click on &ldquo;Create record set&rdquo;, which will pop up a box at the right.</p>

<p>We will be making an alias record that points at our ELBAddress.
<img src="http://vikparuchuri.com/blog/images/wordpress-install/53_ndomain.png" alt="naked domain redirect" /></p>

<p>Click create record set when you are done, and then do the same for the full domain.
<img src="http://vikparuchuri.com/blog/images/wordpress-install/53_domain.png" alt="full domain redirect" /></p>
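
<p>If you would rather script these two records than click through the console, here is a minimal python sketch of the data involved.  It only builds the &ldquo;change batch&rdquo; structure that the Route 53 ChangeResourceRecordSets API accepts; it does not contact AWS.  The function name, the example domain, the ELB DNS name, and the zone id are all placeholders that I made up for illustration.  Note that the alias target takes the hosted zone id of the ELB itself, not the id of your own hosted zone.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='python'>import json


def alias_change_batch(domain, elb_dns_name, elb_hosted_zone_id):
    """Build a change batch with alias A records for the naked and www domains."""
    changes = []
    for name in (domain, "www." + domain):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name + ".",
                "Type": "A",
                "AliasTarget": {
                    # Hosted zone id of the ELB itself, not of your domain.
                    "HostedZoneId": elb_hosted_zone_id,
                    "DNSName": elb_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        })
    return {"Comment": "Point domain at the wordpress ELB", "Changes": changes}


if __name__ == "__main__":
    # All three values are placeholders, not real resources.
    batch = alias_change_batch(
        "example.com",
        "my-elb-123.us-east-1.elb.amazonaws.com",
        "ZEXAMPLEELBZONE",
    )
    print(json.dumps(batch, indent=2))
</code></pre></div></figure>

<p>The printed JSON could then be handed to whichever AWS client you prefer as the change batch for your hosted zone.  The console steps above achieve the same result.</p>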

<p>We now have everything set up that we need for the wordpress install.</p>

<h2>Setup for wordpress deployment</h2>

<p>We are now ready to do our basic wordpress configuration with ansible.  This will set up a basic single-site wordpress install.</p>

<h3>Setup ansible</h3>

<p>On our local machine, let&rsquo;s go to the wp-deployment folder that we created earlier with git clone:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">cd</span> <span class="n">wp</span><span class="o">-</span><span class="n">deployment</span>
</span></code></pre></td></tr></table></div></figure>


<p>You will need to be running python 2.6+ before doing the following step.  See <a href="http://askubuntu.com/questions/101591/how-do-i-install-python-2-7-2-on-ubuntu">this link</a> for how to install python on ubuntu.  You will also need <code>pip</code>, which you can get by doing <code>easy_install pip</code> after python is installed.  See <a href="http://myadventuresincoding.wordpress.com/2011/09/11/python-upgrading-python-with-easy_install-pip-and-virtualenv-on-a-mac/">this link</a> for a description of how to do this on a mac, and <a href="http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows">this link</a> for how to do this on windows.</p>

<p>Now, using the python package manager pip, let&rsquo;s install the requirements.  Feel free to use a virtualenv or not use one for this.  A <a href="http://www.virtualenv.org/en/latest/">virtualenv</a> is a python tool that allows you to isolate environments for each of your applications.  I highly recommend looking at <a href="http://virtualenvwrapper.readthedocs.org/en/latest/">virtualenvwrapper</a> if you choose to use a virtualenv.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">r</span> <span class="n">requirements</span><span class="o">.</span><span class="n">txt</span>
</span></code></pre></td></tr></table></div></figure>


<h3>Set secret variables</h3>

<p>Ansible has a concept called playbooks.  Each playbook will match a certain subset of the potential hosts (in this case, the hosts are our ec2 instances, and subsets are the ones with the correct tags, which is why it was important to set tags earlier).</p>

<p>Playbooks will run certain commands on a certain set of hosts.  In this case, we want to run the commands that set up wordpress correctly.</p>

<p>Get the template file from <code>secrets/vars/wordpress_prod_vars.yml.template</code>.  Then, edit the values to reflect what you need.</p>

<p>The values from <code>auth_key</code> to <code>nonce_salt</code> are wordpress internal secret variables (salts).  You can generate random ones <a href="https://api.wordpress.org/secret-key/1.1/salt/">here</a>.  When you insert them into the template, make sure that you put them between the quotes.  The single quotes need to be there.</p>
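
<p>If you would rather generate the salts locally instead of using the link, something like the following python sketch should work.  This is my own assumption about a reasonable approach, not part of the wp-deployment repository: it uses the python 3 standard library <code>secrets</code> module, and the names <code>make_salt</code> and <code>salt_lines</code> are made up for this example.  It leaves out the single quote (which would end the quoted value) and, conservatively, backslashes and curly braces.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='python'>import secrets
import string

SALT_NAMES = [
    "auth_key", "secure_auth_key", "logged_in_key", "nonce_key",
    "auth_salt", "secure_auth_salt", "logged_in_salt", "nonce_salt",
]

# Letters, digits, and punctuation, minus the single quote (it would end the
# quoted value) and, conservatively, the backslash and curly braces.
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits + string.punctuation
    if c not in "'\\{}"
)


def make_salt(length=64):
    """Return a random string drawn from ALPHABET using a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def salt_lines():
    """Return one single-quoted 'name: value' line per wordpress salt."""
    return ["{}: '{}'".format(name, make_salt()) for name in SALT_NAMES]


if __name__ == "__main__":
    print("\n".join(salt_lines()))
</code></pre></div></figure>

<p>Running the script prints eight lines that can be pasted over the corresponding lines in the template.</p>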

<p>Your final template should look something like this (<strong>don&rsquo;t use these salts, generate your own!</strong>).</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">database_user</span><span class="p">:</span> <span class="n">wordpress</span>
</span><span class='line'><span class="n">database_password</span><span class="p">:</span> <span class="n">INSERT_PASSWORD_HERE</span>
</span><span class='line'>
</span><span class='line'><span class="n">database_host</span><span class="p">:</span> <span class="n">localhost</span>
</span><span class='line'><span class="n">database_root_user</span><span class="p">:</span> <span class="n">root</span>
</span><span class='line'><span class="n">database_root_password</span><span class="p">:</span> <span class="n">INSERT_ROOT_PASSWORD_HERE</span>
</span><span class='line'>
</span><span class='line'><span class="n">auth_key</span><span class="p">:</span> <span class="s">&#39;!UCVyMfA4q6~GOt]hr9!{9H/Ec8*rs9,9Ow|~n0pEbRacmLnD~Bb]GC9DW9/n+;k&#39;</span>
</span><span class='line'><span class="n">secure_auth_key</span><span class="p">:</span> <span class="s">&#39;:rg-V/}0h`0]Sx]/tR6YByYwulzT&lt;TXd_tD^&amp;CBY$+b$H`yxA{*`Bv! -Mmwqp (&#39;</span>
</span><span class='line'><span class="n">logged_in_key</span><span class="p">:</span> <span class="s">&#39;/QA}nvBwLjVIY&gt;CPH(.}FQ%&amp;)@e((yCn_V`RrVg&gt;@[YiyX]DX9{q@&amp;@H|!O7PiE@&#39;</span>
</span><span class='line'><span class="n">nonce_key</span><span class="p">:</span> <span class="s">&#39;O!,(tiA;C,$]v&amp;:,N6^60wT&lt;-hq/p sF])0)Y9(pQ`u0!.J,;KYU]n</span><span class="si">%o</span><span class="s">TD]$o{Oq&#39;</span>
</span><span class='line'><span class="n">auth_salt</span><span class="p">:</span> <span class="s">&#39;&gt;@E7siI)e|;rc@ qwo^9GX}D O+9DEh6@hd%PifC/yyvaH?c8)+7swV-D9=%WJF1&#39;</span>
</span><span class='line'><span class="n">secure_auth_salt</span><span class="p">:</span> <span class="s">&#39;YLKK+W&amp;Lrx-/4@r&lt;1[AW9&gt;v&amp;Sg|HnZ. c)N`NNvBe!gc=`5[bqjSisslF+:L1x G&#39;</span>
</span><span class='line'><span class="n">logged_in_salt</span><span class="p">:</span> <span class="s">&#39;@AzQQr#}5;QD&lt;iDSu#)|wM(UQ7?LV#I|F` ]=:LnFN`-!)1i!!I&gt;a96iaqE*y[1+&#39;</span>
</span><span class='line'><span class="n">nonce_salt</span><span class="p">:</span> <span class="s">&#39;39OSa0v~!=vh0j[YXOmR?,tq-G]x:uphoNaO)Hj..&amp;|2Dg@G:S|#}QZ@49+b46j5&#39;</span>
</span><span class='line'>
</span><span class='line'><span class="n">elb_address</span><span class="p">:</span> <span class="n">INSERT_ELBAddress_HERE</span>
</span></code></pre></td></tr></table></div></figure>


<p>A note on <code>elb_address</code>: this is the address that your site will be accessed from.  If you have a domain that you just set up with Route 53 (e.g. vikparuchuri.com) and want to access your site from it, then use that.  For multisite, make it the primary address you will be accessing the site from.  If you don&rsquo;t have a domain, using the ELBAddress is fine.</p>

<p>Once your template looks good, you can save it without the .template extension (<code>wordpress_prod_vars.yml</code>).</p>

<h2>Boto configuration</h2>

<p>Ansible needs to find your server, which it does via boto, a python library for connecting to AWS.  To configure boto, you will need to make a <code>.boto</code> file with your AWS credentials.</p>

<p>Please see <a href="http://boto.readthedocs.org/en/latest/boto_config_tut.html">these instructions</a> for more information on configuring boto.</p>
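
<p>As a rough illustration of what that file contains, here is a small python 3 sketch that renders a minimal boto config with a <code>[Credentials]</code> section.  The function name <code>render_boto_config</code> and the key values are placeholders of mine; the linked instructions remain the authority on which other options are available.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='python'>import configparser
import io


def render_boto_config(access_key_id, secret_access_key):
    """Return the text of a minimal .boto file holding AWS credentials."""
    # Interpolation is turned off so the key values are written out verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["Credentials"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values; substitute your own keys and save the output as ~/.boto
    print(render_boto_config("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"))
</code></pre></div></figure>

<p>Since the file holds your secret key, it is worth making it readable only by your own user.</p>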

<p>If boto is set up correctly, you will be able to run:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">cd</span> <span class="n">wp</span><span class="o">-</span><span class="n">deployment</span><span class="o">/</span><span class="n">playbooks</span>
</span><span class='line'><span class="o">./</span><span class="n">ec2</span><span class="o">.</span><span class="n">py</span>
</span></code></pre></td></tr></table></div></figure>


<p>This should show you all your running EC2 servers.</p>
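
<p>The output of <code>ec2.py</code> is JSON, so it can be checked with a few lines of python.  The sketch below is only an illustration under two assumptions that you should verify against your own output: that tagged hosts appear under group names of the form <code>tag_KEY_VALUE</code>, and that each group maps to a plain list of hosts.  The sample data and the helper name <code>hosts_with_tag</code> are made up.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='python'>import json


def hosts_with_tag(inventory_json, tag_key, tag_value):
    """Return the hosts listed under the group for one EC2 tag, or an empty list."""
    inventory = json.loads(inventory_json)
    # Assumed group naming: tag_KEY_VALUE. Check your own ec2.py output.
    group = "tag_{}_{}".format(tag_key, tag_value)
    return inventory.get(group, [])


if __name__ == "__main__":
    # Made-up sample output; real output comes from running ./ec2.py
    sample = json.dumps({
        "tag_Name_wordpress": ["ec2-203-0-113-10.compute-1.amazonaws.com"],
        "us-east-1": ["ec2-203-0-113-10.compute-1.amazonaws.com"],
    })
    print(hosts_with_tag(sample, "Name", "wordpress"))
</code></pre></div></figure>

<p>An empty list for the tag you expect is a hint that the instance was tagged differently from what the playbook is looking for.</p>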

<h2>Deploying wordpress via ansible</h2>

<p>These instructions will set up a single wordpress site, which can then be extended to be a multisite.</p>

<h3>Inspecting the playbooks</h3>

<p>We will need to first go to the playbooks directory.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">cd</span> <span class="n">wp</span><span class="o">-</span><span class="n">deployment</span><span class="o">/</span><span class="n">playbooks</span>
</span></code></pre></td></tr></table></div></figure>


<p>If you look at the folder, you will see that there are .yml files, which are the playbooks, and a roles folder, which contains the code that the playbooks actually run on the hosts.</p>

<p>If you look at the roles folder, you will see two roles: &ldquo;mu&rdquo; and &ldquo;wp&rdquo;.  The wp role will make a wordpress installation on a site, and the mu role will convert it to be a multisite installation.</p>

<p>Both roles have associated variables in their vars folder that you can edit if you want, but there is no real reason not to use the defaults.</p>

<h3>Running the playbook</h3>

<p>We can now run <code>ansible-playbook -vvv --user=ubuntu  wp_prod.yml -i ./ec2.py  -c ssh</code>, and it will connect to our EC2 server and configure it with wordpress.</p>

<p>If the command produces an error, you may have done the cloudformation configuration incorrectly and tagged your machines improperly.  You can look at the tags and fix this in the &ldquo;hosts&rdquo; section of the <code>wp_prod.yml</code> playbook.</p>

<p>Once the playbook finishes running, you can go to <code>YOUR_SERVER_NAME/wp-admin/install.php</code> on your server to begin wordpress installation.  <code>YOUR_SERVER_NAME</code> should be what you entered for <code>elb_address</code> in the template.  In this example, I set up route 53 to point to my ELB, and used the route 53 domain.</p>

<h2>Setting up wordpress</h2>

<p>Once you can get to <code>YOUR_SERVER_NAME/wp-admin/install.php</code>, you can set up your site through the interface.  One note of caution: <strong>do not use admin as your username.</strong>  There are people who scan for the admin username and try to guess the password through brute force attacks.</p>

<p>Once you finish the setup screens, congratulations!  You have set up a single-site wordpress install.</p>

<h2>Activating multisite mode</h2>

<h3>Activation</h3>

<p>Now, to activate multisite mode for your wordpress install, you will have to go to the wp-admin, and then click on tools/network setup.  You can also go to the url <code>YOUR_SERVER_NAME/wp-admin/network.php</code>.</p>

<p>You can then type in some settings to set up network mode.  Make sure you select sub-domains!</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/wp_network.png" alt="activate network mode" /></p>

<h3>Post activation</h3>

<p>After you hit the install button, you will come to a screen that asks you to do some configuration.  You can ignore those steps for now, as we will be doing that through an ansible playbook.</p>

<p>If you see a &ldquo;Warning! Wildcard DNS may not be configured correctly!&rdquo; in red at the top, this means that you have not set up <code>*.YOUR_SERVER_ADDRESS.com</code> to point to the ELB.  You can do this using Route 53 alias records (see the previous section on this), or you can skip it for now.  Skipping it just means that whenever you make another site in the multisite network, you will need to point <code>MULTISITE_NAME.YOUR_SERVER_ADDRESS.com</code> at the ELB (you need to do this even if you are using separate domain names for each of your sites on multisite).</p>
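
<p>One quick way to see whether a wildcard record is in place is to look up a subdomain that you never created.  The python 3 sketch below does that with a random name; the function name <code>wildcard_resolves</code> is my own, and the <code>resolve</code> parameter exists only so the lookup can be swapped out.  It only tells you that the name resolves, not that it resolves to your ELB, and some networks answer for every name, so treat the result as a hint.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><code class='python'>import socket
import uuid


def wildcard_resolves(base_domain, resolve=socket.gethostbyname):
    """Return True if a random, never-created subdomain of base_domain resolves."""
    probe = "{}.{}".format(uuid.uuid4().hex[:12], base_domain)
    try:
        resolve(probe)
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return False
    return True


if __name__ == "__main__":
    # Replace example.com with the address you used for your install.
    print(wildcard_resolves("example.com"))
</code></pre></div></figure>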

<h2>Configuring multisite mode</h2>

<p>After activating multisite mode, you can then run the mu playbook to set up multisite.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class='python'><span class='line'><span class="n">cd</span> <span class="n">wp</span><span class="o">-</span><span class="n">deployment</span><span class="o">/</span><span class="n">playbooks</span>
</span><span class='line'><span class="n">ansible</span><span class="o">-</span><span class="n">playbook</span> <span class="o">-</span><span class="n">vvv</span> <span class="o">--</span><span class="n">user</span><span class="o">=</span><span class="n">ubuntu</span>  <span class="n">mu_prod</span><span class="o">.</span><span class="n">yml</span> <span class="o">-</span><span class="n">i</span> <span class="o">./</span><span class="n">ec2</span><span class="o">.</span><span class="n">py</span>  <span class="o">-</span><span class="n">c</span> <span class="n">ssh</span>
</span></code></pre></td></tr></table></div></figure>


<p>You will now be able to go to wp-admin on your site and log in.  Multisite mode is now active, and you will be able to make a network of sites with their own domain names.</p>

<h2>Making new sites</h2>

<h3>Set up a new site with a subdomain</h3>

<p>To make a new site, go to <code>YOUR_SERVER_ADDRESS/wp-admin/network/site-new.php</code>, and make a new site with a subdomain.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/wp_addsite.png" alt="new site" /></p>

<p>Set up the DNS for that subdomain to redirect to your ELB, even if you are not using Route 53.  If you already have a wildcard redirect, then you can skip this.</p>

<h3>Using a top level domain</h3>

<p>If you want to use a top level domain, such as test.com, for the site:</p>

<p>Go to domain mapping in <code>YOUR_SERVER_ADDRESS/wp-admin/network/settings.php?page=dm_domains_admin</code> and set the id of the site (you can get this from the url when you click on a site in sites), and the domain.  Make sure that primary is checked.</p>

<p><img src="http://vikparuchuri.com/blog/images/wordpress-install/wp_adddomain.png" alt="new domain" /></p>

<p>You will have to redirect the domain (www.test.com) and the apex domain (test.com) to your ELB.  You can do this via Route 53, and the instructions are above.</p>

<h2>Modifications/Contributions</h2>

<p>The code for the cloudformation template and the ansible playbooks is on <a href="https://github.com/vikparuchuri/wp-deployment">github</a>, so please feel free to fork and submit a pull request if you want to change something.</p>

<h2>Future post topics</h2>

<p>There are some other useful things that I did to make wordpress easier to manage, and these include:</p>
<ul>
<li>Setting up email via Amazon SES</li>
<li>Automated backups via xcloner</li>
<li>Migrating from blogger to wordpress and setting up redirects</li>
<li>Setting up a database using Amazon RDS</li>
<li>Setting up email/calendar using Google Apps</li>
</ul>

<p>I intend to post about these down the line, as time permits.  Please let me know if any of them are particularly interesting.</p>

<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://vikparuchuri.com/blog/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/"><b>Vik's Blog</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/18/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/">Creating a wordpress single or multisite install using cloudformation and ansible</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/9H4sl32xlwA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/18/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/18/creating-a-wordpress-single-or-multisite-install-using-cloudformation-and-ansible/</feedburner:origLink></item>
		<item>
		<title>re:log: Tracking the Movements of Conference Attendees via WiFi</title>
		<link>http://feedproxy.google.com/~r/statsblogs/~3/m899dlWcU3Y/</link>
		<comments>http://www.statsblogs.com/2013/06/18/relog-tracking-the-movements-of-conference-attendees-via-wifi/#comments</comments>
		<pubDate>Tue, 18 Jun 2013 20:44:26 +0000</pubDate>
		<dc:creator>information aesthetics</dc:creator>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[collection]]></category>

		<guid isPermaLink="false">http://infosthetics.com/archives/2013/06/relog_tracking_the_movements_of_conference_attendees_via_wifi.html</guid>
		<description><![CDATA[<p>
re:log [opendatacity.de] by German data designers OpenDataCity reveals the movements of about 6,700 different electronic devices during re:publica 2013, a prestigious European conference on the topic of Digital Society.

A dynamic map of the conferenc...</p><p>The post <a href="http://www.statsblogs.com/2013/06/18/relog-tracking-the-movements-of-conference-attendees-via-wifi/">re:log: Tracking the Movements of Conference Attendees via WiFi</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p>]]></description>
			<content:encoded><![CDATA[<p class="syndicated-attribution">(This article was originally published at <a href="http://infosthetics.com/">information aesthetics</a>, and syndicated at <a href="http://www.statsblogs.com">StatsBlogs</a>.)
<br /></p>
<p><img alt="relog_wifi_location.jpg" src="http://infosthetics.com/archives/relog_wifi_location.jpg" width="600" height="300" class="mt-image-none" style="" /><br />
<a href="http://apps.opendatacity.de/relog/">re:log</a> [opendatacity.de] by German data designers <a href="http://www.opendatacity.de/">OpenDataCity</a> reveals the movements of about 6,700 different electronic devices during <a href="http://www.re-publica.de/">re:publica 2013</a>, a prestigious European conference on the topic of Digital Society.</p>

<p>A dynamic map of the conference location shows the approximate locations of the devices when they were connected to the local WiFi hotspots. An interactive timeline underneath allows users to explore the dynamic changes over time, while a rectangular area can be drawn to more specifically highlight and follow a smaller number of dots.</p>

<p>The visualization was based on tracking the MAC addresses of the devices according to the WiFi hotspot they were connected to. This data, which can be downloaded, was fully anonymized, yet the authors mention their desire to allow people to look up their own MAC address in the future.</p>

<p class="syndicated-attribution"><br />
<br />
<font color=#8c1717><b>Please comment on the article here:</b></font> <a href="http://feeds.infosthetics.com/~r/infosthetics/~3/y-4zIUvMROY/relog_tracking_the_movements_of_conference_attendees_via_wifi.html"><b>information aesthetics</b></a>
<br />
<br /></p><p>The post <a href="http://www.statsblogs.com/2013/06/18/relog-tracking-the-movements-of-conference-attendees-via-wifi/">re:log: Tracking the Movements of Conference Attendees via WiFi</a> appeared first on <a href="http://www.statsblogs.com">All About Statistics</a>.</p><img src="http://feeds.feedburner.com/~r/statsblogs/~4/m899dlWcU3Y" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.statsblogs.com/2013/06/18/relog-tracking-the-movements-of-conference-attendees-via-wifi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.statsblogs.com/2013/06/18/relog-tracking-the-movements-of-conference-attendees-via-wifi/</feedburner:origLink></item>
	</channel>
</rss>
