<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Statistical Modeling, Causal Inference, and Social Science</title>
    <link rel="alternate" type="text/html" href="http://www.stat.columbia.edu/~cook/movabletype/mlm/" />
    
    <id>tag:www.stat.columbia.edu,2008-11-24:/~cook/movabletype/mlm/1</id>
    <updated>2009-07-06T18:57:16Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.25</generator>

<link rel="self" href="http://feeds.feedburner.com/StatisticalModelingCausalInferenceAndSocialScience" type="application/atom+xml" /><entry>
    <title>This week's New York R Meetup is about Bayes</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/VopmOYP9RGE/this_weeks_new.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2549</id>

    <published>2009-07-06T18:48:10Z</published>
    <updated>2009-07-06T18:57:16Z</updated>

    <summary>This Thursday at 7pm Jake Hofman and Suresh Velagapundi will present a session at New York R Statistical Programming Meetup at NYU - Silver Center (100 Washington Square East, Room 401). Here's the outline: Background: Conditional probability &amp; Bayes' Rule...</summary>
    <author>
        <name>Aleks Jakulin</name>
        <uri>http://stat.columbia.edu/~jakulin</uri>
    </author>
    
        <category term="Bayesian Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;This Thursday at 7pm &lt;a href="http://www.jakehofman.com/"&gt;Jake Hofman&lt;/a&gt; and Suresh Velagapundi will present a session at &lt;a href="http://www.meetup.com/nyhackr/calendar/10644414/"&gt;New York R Statistical Programming Meetup&lt;/a&gt; at NYU - Silver Center (100 Washington Square East, Room 401). Here's the outline:&lt;/p&gt;

&lt;p&gt;
&lt;blockquote&gt;
&lt;strong&gt;Background:&lt;/strong&gt;&lt;ul&gt;
    &lt;li&gt;Conditional probability &amp; Bayes' Rule&lt;/li&gt;
    &lt;li&gt;Treating parameters as random variables &amp; putting distributions on them&lt;/li&gt;
    &lt;li&gt;Bayesian inference: from priors &amp; likelihoods to posteriors&lt;/li&gt;
&lt;/ul&gt;

&lt;strong&gt;From Principles to Practice:&lt;/strong&gt;&lt;ul&gt;
    &lt;li&gt;Simple plan; difficult to execute (normalization)&lt;/li&gt;
    &lt;li&gt;Resort to approximation methods (variational &amp; MCMC)&lt;/li&gt;
    &lt;li&gt;Model selection / complexity control a la Bayes (time permitting)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;&lt;/p&gt;
        
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/this_weeks_new.html</feedburner:origLink></entry>

<entry>
    <title>Does Medicare actually have higher administrative costs than private insurers?</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/qIAqPN2e_jk/does_medicare_a.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2548</id>

    <published>2009-07-06T13:15:46Z</published>
    <updated>2009-07-06T21:04:54Z</updated>

    <summary>Greg Mankiw links to an article that illustrates the challenges of interpreting raw numbers causally. This would really be a great example for your introductory statistics or economics classes, because the article, by Robert Book, starts off by identifying a...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Causal Inference" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Decision Theory" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Greg Mankiw &lt;a href="http://gregmankiw.blogspot.com/2009/07/does-medicare-have-lower-administrative.html"&gt;links to&lt;/a&gt; an article that illustrates the challenges of interpreting raw numbers causally.  This would really be &lt;strong&gt;a great example for your introductory statistics or economics classes&lt;/strong&gt;, because &lt;a href="http://www.heritage.org/Research/HealthCare/wm2505.cfm"&gt;the article&lt;/a&gt;, by Robert Book, starts off by identifying a statistical error and then goes on to make a nearly identical error of its own!  Fun stuff.&lt;/p&gt;
        &lt;p&gt;Book sets up the story:&lt;/p&gt;

&lt;blockquote&gt;Many advocates . . . claim that a public health plan will save money compared to private health insurance because "everyone knows" that the largest government health program, Medicare, has lower administrative costs than private insurance. . . . Advocates of a public plan assert that Medicare has administrative costs of 3 percent (or 6 to 8 percent if support from other government agencies is included), compared to 14 to 22 percent for private employer-sponsored health insurance . . .&lt;/blockquote&gt;

&lt;p&gt;He then points out the problem with this raw comparison:&lt;/p&gt;

&lt;blockquote&gt;Medicare patients are by definition elderly, disabled, or patients with end-stage renal disease, and as such have higher average patient care costs, so expressing administrative costs as a percentage of total costs gives a misleading picture of relative efficiency. Administrative costs are incurred primarily on a fixed or per-beneficiary basis; this approach spreads Medicare's costs over a larger base of patient care cost.&lt;/blockquote&gt;

&lt;p&gt;Excellent point.  Don't forget about the denominator, as we always tell our students.&lt;/p&gt;

&lt;p&gt;The next step, I'd think, is to compare costs for different groups of potential patients, characterized by age, health status, and socioeconomic and demographic background variables.&lt;/p&gt;

&lt;p&gt;But that's not what Book does--instead he just compares average administrative costs per patient:  $509 per primary beneficiary for Medicare, $453 for private insurers.  But this can't be right:  of course, Medicare patients, who are older, sicker, and are going to the doctor and hospital more often, will have higher administrative costs!  It seems silly to jump all over the first set of unadjusted numbers and then take the second set of unadjusted numbers at face value, leading to this claim:&lt;/p&gt;

&lt;blockquote&gt;If recent cost history is any guide, switching the more than 200 million Americans with private insurance to a public plan will not save money but will actually increase health care administrative costs by several billion dollars.&lt;/blockquote&gt;

&lt;p&gt;I don't buy it--for essentially the same reason that I find Book's first argument persuasive.  It would seem to make more sense to compare comparable groups of people.  (But see Book's comment below, defending his calculations.)&lt;/p&gt;
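
&lt;p&gt;To see how both unadjusted comparisons can mislead at once, here is a minimal sketch in Python.  Every number in it is made up for illustration (the fixed and per-claim administrative costs, the claim counts, the cost of care per claim), and the admin_metrics helper is hypothetical; none of it comes from Book's or Hacker's data.  Two plans are given identical administrative efficiency and differ only in how often their enrollees use care:&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical numbers, chosen only to illustrate the denominator problem.
FIXED_ADMIN_PER_PERSON = 200.0   # enrollment, record keeping, etc. (assumed)
ADMIN_PER_CLAIM = 20.0           # cost of processing one claim (assumed)


def admin_metrics(claims_per_person, care_cost_per_claim):
    """Return (admin cost per person, admin share of total cost)."""
    admin = FIXED_ADMIN_PER_PERSON + ADMIN_PER_CLAIM * claims_per_person
    care = care_cost_per_claim * claims_per_person
    return admin, admin / (admin + care)


# Two plans with identical administrative efficiency but different enrollees.
older_sicker = admin_metrics(claims_per_person=15, care_cost_per_claim=600.0)
younger_healthier = admin_metrics(claims_per_person=4, care_cost_per_claim=600.0)

for label, (per_person, share) in [
    ("older, sicker", older_sicker),
    ("younger, healthier", younger_healthier),
]:
    print(f"{label:20s} admin per person = ${per_person:.0f}, admin share = {share:.1%}")
```
&lt;/pre&gt;

&lt;p&gt;Under these assumed numbers, the plan with older, sicker enrollees looks better on administrative share of total cost and worse on administrative cost per person, even though both plans handle each claim at exactly the same cost.  Neither raw number isolates efficiency, which is the argument for comparing comparable groups.&lt;/p&gt;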

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm no expert in health policy.  These are just my impressions as a teacher of statistics.  It's great to find such examples that are so relevant to policy.  I was surprised to see Mankiw quote the above article without criticism; but I'm pretty sure he's studied these issues in a lot more detail than I have, and so perhaps he has additional knowledge that makes him confident in the substance of Book's reasoning.&lt;/p&gt;

&lt;p&gt;In particular, I expect that Mankiw has spent some time talking with the faculty at Harvard's world-class &lt;a href="http://www.hcp.med.harvard.edu/research_menu/4/102"&gt;Department of Health Care Policy&lt;/a&gt;.  I don't know if any of their professors are Eagle Scouts, but they do have &lt;a href="http://www.hcp.med.harvard.edu/people/hcp_core_faculty/joseph_newhouse"&gt;this guy&lt;/a&gt;, who was the founding editor of the Journal of Health Economics, a member of the editorial board of the New England Journal of Medicine, vice chair of the Medicare Payment Advisory Commission, etc etc.  He's also on the board of directors of Aetna, so it looks like he has experience on both sides.  Perhaps Newhouse or one of his colleagues has done a more detailed study that supports Book's conclusions.&lt;/p&gt;

&lt;p&gt;P.S.  Krugman &lt;a href="http://krugman.blogs.nytimes.com/2009/07/06/administrative-costs/"&gt;links&lt;/a&gt; to &lt;a href="http://institute.ourfuture.org/files/Jacob_Hacker_Public_Plan_Choice.pdf"&gt;an article&lt;/a&gt; by political scientist Jacob Hacker that questions Book's numbers.  I doubt Mankiw reads my blog very often, but I expect he'll respond to Krugman, so perhaps he'll supply some more data to rebut this.  Could be interesting.&lt;/p&gt;

&lt;p&gt;P.P.S.  Book &lt;a href="http://krugman.blogs.nytimes.com/2009/07/06/administrative-costs/"&gt;responds&lt;/a&gt; on Krugman's blog, explaining why in his opinion the Hacker report does not shoot down Book's numbers.  I'm still skeptical of his claim that, "The appropriate measure is administrative cost per person."  Perhaps he can address this.&lt;/p&gt;

&lt;p&gt;P.P.P.S.  Book does address the cost-per-person issue in a comment below.  At this point, I'll just have to say that the answer depends on specifics of billing, administrative costs, etc.  I don't think I have anything further to add.  But I still think it will be a great teaching example.&lt;/p&gt;
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/does_medicare_a.html</feedburner:origLink></entry>

<entry>
    <title>Resolving disputes between J. Pearl and D. Rubin on causal inference</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/rOYJypy7a8A/disputes_about.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2547</id>

    <published>2009-07-05T16:32:15Z</published>
    <updated>2009-07-05T18:58:11Z</updated>

    <summary>This is a pretty long one. It's an attempt to explore some of the differences between Judea Pearl's and Don Rubin's approaches to causal inference, and is motivated by a recent article by Pearl. Pearl sent me a link to this...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Causal Inference" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;This is a pretty long one.  It's an attempt to explore some of the differences between Judea Pearl's and Don Rubin's approaches to causal inference, and is motivated by a recent article by Pearl.&lt;/p&gt;

&lt;p&gt;Pearl sent me a link to &lt;a href="http://www.cs.ucla.edu/~kaoru/r348.pdf"&gt;this piece&lt;/a&gt; of his, writing:&lt;/p&gt;

&lt;blockquote&gt;I [Pearl] would like to encourage a blog-discussion on the main points raised there. For example:

&lt;p&gt;Whether graphical methods are in some way "less principled" than  other methods of analysis.&lt;/p&gt;

&lt;p&gt;Whether confounding bias can only decrease by conditioning on a new covariate.&lt;/p&gt;

&lt;p&gt;Whether the M-bias, when it occurs, is merely a mathematical curiosity, unworthy of researchers attention. &lt;/p&gt;

&lt;p&gt;Whether Bayesianism instructs us to condition on all available measurements.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I've never been able to understand Pearl's notation:  notions such as a "collider of an M-structure" remain completely opaque to me.  I'm not saying this out of pride--I expect I'd be a better statistician if I understood these concepts--but rather to give a sense of where I'm coming from.  I was a student of Rubin and have used his causal ideas for a while, starting with &lt;a href="http://www.stat.columbia.edu/~gelman/research/published/ajps1990.pdf"&gt;this article&lt;/a&gt; from 1990 on estimating the incumbency advantage in politics.  I'm pleased to see these ideas gaining wider acceptance.  In many areas (including studying incumbency, in fact), I think the most helpful feature of Rubin's potential-outcome framework is to get you, as a researcher, to think hard about what you are in fact trying to estimate.  In much of the current discussion of identification strategies, regression discontinuities, differences in differences, and the like, I think there's too much focus on technique and not enough thought put into what the estimates are really telling you.  That said, it makes sense that other theoretical perspectives such as Pearl's could be useful too.&lt;/p&gt;

&lt;p&gt;To return to the article at hand:  Pearl is clearly frustrated by what he views as Rubin's bobbing and weaving to avoid a direct settlement of their technical dispute.  From the other direction, I think Rubin is puzzled by Pearl's approach and is not clear what the point of it all is.&lt;/p&gt;

&lt;p&gt;I can't resolve the disagreements here, but maybe I can clarify some technical issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Controlling for pre-treatment and post-treatment variables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Much of Pearl's discussion turns upon notions of "bias," which in a Bayesian context is tricky to define.  We certainly aren't talking about the classical-statistical "unbiasedness," in which E(theta.hat | theta) = theta for all theta, an idea that breaks down horribly in all sorts of situations (see page 248 of Bayesian Data Analysis).  Statisticians are always trying to tell people, Don't do this, Don't do that, but the rules for saying this can be elusive.  This is not just a problem for Pearl:  my own work with Rubin suffers from similar problems.  In chapter 7 of Bayesian Data Analysis (a chapter that is pretty much my translation of Rubin's ideas), we talk about how you can't do this and you can't do that.  We avoid the term "bias," but then it can be a bit unclear what our principles are.  For example, we recommend that your model should, if possible, include all variables that affect the treatment assignment.  This is good advice, but really we could go further and just recommend that an appropriate analysis should include &lt;em&gt;all variables&lt;/em&gt; that are potentially relevant, to avoid omitted-variable bias (or the Bayesian equivalent).  Once you've considered a variable, it's hard to go back to the state of innocence in which that information was never present.&lt;/p&gt;

&lt;p&gt;If I'm reading his article correctly, Pearl is making two statistical points, both in opposition to Rubin's principle that a Bayesian analysis (and, by implication, any statistical analysis) should condition on all available information:&lt;/p&gt;

&lt;p&gt;1.  When it comes to causal inference, Rubin says not to control for &lt;em&gt;post&lt;/em&gt;-treatment variables (that is, intermediate outcomes), which seems to contradict Rubin's more general advice as a Bayesian to condition on everything.&lt;/p&gt;

&lt;p&gt;2.  Rubin (and his collaborators such as Paul Rosenbaum) state unequivocally that a model should control for all &lt;em&gt;pre&lt;/em&gt;-treatment variables, even though including such variables, in Pearl's words, "may create spurious associations between treatment and outcome and this, in turns, may increase or decrease confounding bias."&lt;/p&gt;

&lt;p&gt;Let me discuss each of these criticisms, as best as I can understand them.  Regarding the first point, a Bayesian analysis can control for intermediate outcomes--that's ok--but then the causal effect of interest won't be summarized by a single parameter--a "beta"--from the model.  In our book, Jennifer and I recommend not controlling for intermediate outcomes, and a few years ago I heard Don Rubin make a similar point in a public lecture (giving an example where the great R. A. Fisher made this mistake).  Strictly speaking, though, you can control for anything; you just then should suitably postprocess your inferences to get back to your causal inferences of interest.&lt;/p&gt;

&lt;p&gt;I don't fully understand Pearl's second critique, in which he says that it's not always a good idea to control for pre-treatment variables.  My best reconstruction is that Pearl is thinking about a setting where you could estimate a causal effect in a messy observational setting in which there are some important unobserved confounders, and it could well happen that controlling for a particular pre-treatment variable happens to make the confounding worse.  The idea, I think, is that if you have an analysis where various problems cancel each other out, then fixing one of these problems (by controlling for one potential confounder) could result in a net loss.  I can believe this could happen in practice, but I'm wary of setting this up as a principle.  I'd rather control for all the pre-treatment predictors that I can, and then make adjustments if necessary to attempt to account for remaining problems in the model.  Perhaps Pearl's position and mine are not so far apart, however, if his approach of not controlling for a covariate could be seen as an approximation to a fuller model that controls for it while also adjusting for other, unobserved, confounders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The sum of unidentifiable components can be identifiable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At other points, Pearl seems to be displaying a misunderstanding of Bayesian inference (at least, as I see it).  For example, he writes:&lt;/p&gt;

&lt;blockquote&gt;For example, if we merely wish to predict whether a given person is a smoker, and we have data on the smoking behavior of seat-belt users and non-users, we should condition our prior probability P(smoking) on whether that person is a "seat-belt user" or not. Likewise, if we wish to predict the causal effect of smoking for a person known to use seat-belts, and we have separate data on how smoking affects seat-belt users and non-users, we should use the former in our prediction. . . . However, if our interest lies in the average causal effect over the entire population, then there is nothing in Bayesianism that compels us to do the analysis in each subpopulation separately and then average the results. The class-specific analysis may actually fail if the causal effect in each class is not identifiable.&lt;/blockquote&gt;

&lt;p&gt;I think this discussion misses the point in two ways.&lt;/p&gt;

&lt;p&gt;First, at the technical level, yes you definitely can estimate the treatment effect in two separate groups and then average.  Pearl is worried that the two separate estimates might not be identifiable--in Bayesian terms, that they will individually have large posterior uncertainties.  But, if the study really is being done in a setting where the average treatment effect is identifiable, then the uncertainties in the two separate groups should cancel out when they're being combined to get the average treatment effect.  If the uncertainties &lt;em&gt;don't&lt;/em&gt; cancel, it sounds to me like there must be some additional ("prior") information that you need to add.&lt;/p&gt;

&lt;p&gt;The second way that I disagree with Pearl's example is that I don't think it makes sense to estimate the smoking behavior separately for seat-belt users and non-users.  This just seems like a weird thing to be doing.  I guess I'd have to see more about the example to understand why someone would do this.  I have a lot of confidence in Rubin, so if he actually did this, I expect he had a good reason.  But I'd have to see the example first.&lt;/p&gt;
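
&lt;p&gt;The first, technical point can be illustrated with a toy conjugate-normal calculation in Python.  This is a sketch under assumptions chosen purely for illustration, not a reconstruction of Pearl's or Rubin's example: two components a and b get independent normal priors with mean 0 and standard deviation tau, and the data are n observations centered at a + b with standard deviation sigma, so the data inform only the sum.  The values of tau, sigma, and n are arbitrary.&lt;/p&gt;

&lt;pre&gt;
```python
import math

# Toy model (an assumption for illustration): the data inform only a + b.
tau = 10.0     # prior sd of a and of b (independent, mean zero)
sigma = 1.0    # sd of each observation around a + b
n = 100        # number of observations

p = 1.0 / tau ** 2      # prior precision of each component
q = n / sigma ** 2      # precision the data contribute to the sum

# The posterior precision matrix is [[p + q, q], [q, p + q]]; invert it by hand.
det = (p + q) ** 2 - q ** 2
var_a = (p + q) / det            # posterior variance of a (same for b)
cov_ab = -q / det                # posterior covariance of a and b
var_sum = 2 * var_a + 2 * cov_ab # posterior variance of a + b

print(f"posterior sd of a alone: {math.sqrt(var_a):.2f}")
print(f"posterior sd of a + b:   {math.sqrt(var_sum):.2f}")
```
&lt;/pre&gt;

&lt;p&gt;With these assumed values, each component on its own keeps a posterior standard deviation of roughly 7, while the sum is pinned down to roughly 0.1.  The strong negative posterior correlation between a and b is what makes the individual uncertainties cancel when the components are combined.&lt;/p&gt;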

&lt;p&gt;&lt;strong&gt;Final thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hal Stern once told me the real division in statistics was not between the Bayesians and non-Bayesians, but between the modelers and the non-modelers.  The distinction isn't completely clear--for example, where does the "Bell Labs school" of Cleveland, Hastie, Tibshirani, etc. fall?--but I like the idea of sharing a category with all the modelers over the years--even those who have not felt the need to use Bayesian methods.&lt;/p&gt;

&lt;p&gt;Reading Pearl's article, however, reminded me of another distinction, this time between discrete models and continuous models.  I have a taste for continuity and always like setting up my model with smooth parameters.  I'm just about never interested in testing whether a parameter equals zero; instead, I'd rather infer about the parameter in a continuous space.  To me, this makes particular sense in the sorts of social and environmental statistics problems where I work.  For example, is there an interaction between &lt;a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/who_wants_schoo.html"&gt;income, religion, and state of residence&lt;/a&gt; in predicting one's attitude toward school vouchers?  Yes.  I knew this ahead of time.  Nothing is zero, everything matters to some extent.  As discussed in chapter 6 of Bayesian Data Analysis, I prefer continuous model expansion to discrete model averaging.&lt;/p&gt;

&lt;p&gt;In contrast, Pearl, like many other Bayesians I've encountered, seems to prefer discrete models and procedures for finding conditional independence.  In some settings, this can't matter much:  if a source of variation is small, then maybe not much is lost by setting it to zero.  But it changes one's focus, pointing Pearl toward goals such as "eliminating bias" and "covariate selection" rather than toward the goals of modeling the relations between variables.  I think graphical models are a great idea, but given my own preferences toward continuity, I'm not a fan of the sorts of analyses that attempt to discover whether variables X and Y really have a link between them in the graph.  My feeling is, if X and Y &lt;em&gt;might&lt;/em&gt; have a link, then they &lt;em&gt;do&lt;/em&gt; have a link.  The link might be weak, and I'd be happy to use Bayesian multilevel modeling to estimate the strength of the link, partially pool it toward zero, and all the rest--but I don't get much out of statistical procedures that seek to estimate whether the link is there or not.&lt;/p&gt;

&lt;p&gt;Finally, I'd like to steal something I &lt;a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2007/10/how_bayesian_am.html"&gt;wrote&lt;/a&gt; a couple years ago regarding disputes over statistical methodology:&lt;/p&gt;

&lt;blockquote&gt;Different statistical methods can be used successfully in applications--there are many roads to Rome--and so it is natural for anyone (myself included) to believe that our methods are particularly good for applications. For example, Adrian Raftery does excellent applied work using discrete model averaging, whereas I don't feel comfortable with that approach. Brad Efron has used bootstrapping to help astronomers solve their statistical problems. Etc etc. I don't think that Adrian's methods are particularly appropriate to sociology, or Brad's to astronomy--these are just powerful methods that can work in a variety of fields. Given that we each have successes, it's unsurprising that we can each feel strongly in the superiority of our own approaches. And I certainly don't feel that the approaches in Bayesian Data Analysis are the end of the story. In particular, nonparametric methods such as those of David Dunson, Ed George, and others seem to have a lot of advantages.&lt;/blockquote&gt;

&lt;p&gt;Similarly, Pearl has achieved a lot of success and so it would be silly for me to argue, or even to think, that he's doing everything all wrong.  I think this expresses some of Pearl's frustration as well:  Rubin's ideas have clearly been successful in applied work, so it would be awkward to argue that Rubin is actually doing the wrong thing in the problems he's worked on.  It's more that any theoretical system has holes, and the expert practitioners in any system know how to work around these holes.&lt;/p&gt;
        
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/disputes_about.html</feedburner:origLink></entry>

<entry>
    <title>Confusing reliability with validity</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/iVQEcuFWngo/this_note_by_st.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2546</id>

    <published>2009-07-03T21:11:55Z</published>
    <updated>2009-07-03T22:28:39Z</updated>

    <summary>This note by Steve Hsu on the history of the Wranglers (winners of a mathematics competition held each year from 1753-1909 at Cambridge University) reminded me of my experience in the U.S. math olympiad training program in high school. At...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;&lt;a href="http://infoproc.blogspot.com/2009/07/wranglers.html"&gt;This note&lt;/a&gt; by Steve Hsu on the history of the Wranglers (winners of a mathematics competition held each year from 1753-1909 at Cambridge University) reminded me of my experience in the U.S. math olympiad training program in high school.  At the time, it seemed clear that we were ordered by ability (with my position somewhere between 15th and 20th out of 24!).  In retrospect, I think there are a lot of tricks to solving and writing up solutions to "Olympiad problems," and I didn't know a lot of these tricks.&lt;/p&gt;

&lt;p&gt;It was the usual paradox of measurement:  I was confusing reliability with validity, as they say in the psychometric literature.&lt;/p&gt;
        &lt;p&gt;&lt;a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2008/12/could_we_publis.html"&gt;In retrospect&lt;/a&gt;, it worked out well for me to learn (even if falsely) that there were 15 or 20 kids my age better than me in math.  This made me realize that a career as a "mathematician" (to the extent I understood what this meant, based on my experiences up to the age of 16) was not for me.  Given what I know now, I think I would've wanted to be a statistician even if I'd been the #1 kid at the Olympiad.  Luckily this didn't happen to me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And now for the most important part&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the tricks that I didn't know about in the math olympiad training program is to plod along without giving up.  Sometimes the direct approach works, solving a problem by eliminating all alternatives.  That's a "trick" that's useful in a lot of areas of academic life.  For years I've been trying to get this message out to students:  If you get stuck right away, don't just stare at your desk and give up.  Instead, work actively.  This is a point I made in chapter 19 of the ARM book (in particular, the graph on page 416).  My theory is that students at top universities have succeeded pretty well by being able to solve problems quickly; they haven't really needed to develop the tools to solve problems systematically by brute force, the way I like to do it.&lt;/p&gt;

&lt;p&gt;I think they did try to explain this principle to us at the olympiad program (How to Solve It, and all that), but I didn't ever get the point, partly I think because the problems were so artificial that there only seemed to be a point to solving them if it could be done easily or through some clever trick.&lt;/p&gt;
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/this_note_by_st.html</feedburner:origLink></entry>

<entry>
    <title>How does statistical analysis differ when analyzing the entire population rather than a sample?</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/rj2wuuWhHDQ/how_does_statis.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2442</id>

    <published>2009-07-03T11:46:10Z</published>
    <updated>2009-07-03T11:50:13Z</updated>

    <summary>Daljit Dhadwal writes: On the Ask Metafilter site, someone asked the following: How does statistical analysis differ when analyzing the entire population rather than a sample? I need to do some statistical analysis on legal cases. I happen to have...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Daljit Dhadwal writes:&lt;/p&gt;

&lt;blockquote&gt;On &lt;a href="http://ask.metafilter.com/122797/Statistics-on-the-entire-population"&gt;the Ask Metafilter site&lt;/a&gt;, someone asked the following:

&lt;p&gt;How does statistical analysis differ when analyzing the entire population rather than a sample? I need to do some statistical analysis on legal cases. I happen to have the entire population rather than a sample. I'm basically interested in the relationship between case outcomes and certain features (e.g., time, the appearance of certain words or phrases in the opinion, the presence or absence of certain issues). Should I do anything different than I would if I were using a sample? For example, is a p-value meaningful in this kind of case?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;My reply:&lt;/p&gt;

&lt;p&gt;This is a question that comes up a lot.  For example, what if you're running a regression on the 50 states?  These aren't a sample from a larger number of states; they're the whole population.&lt;/p&gt;

&lt;p&gt;To get back to the question at hand, it might be that you're thinking of these cases as a sample from a larger population that includes future cases as well.  Or, to put it another way, maybe you're interested in making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation.  That's what we did when estimating the seats-votes curve:  we set up a hierarchical model with year-to-year variation estimated from a separate analysis.  (Original model is &lt;a href="http://www.stat.columbia.edu/~gelman/research/published/electoral2.pdf"&gt;here&lt;/a&gt;, later version is &lt;a href="http://www.stat.columbia.edu/~gelman/research/published/unified2.pdf"&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;So, one way of framing the problem is to think of your "entire population" as a sample from a larger population, potentially including future cases.  Another frame is to think of there being an underlying probability model.  If you're trying to understand the factors that predict case outcomes, then the implicit full model includes unobserved factors (related to the notorious "error term") that contribute to the outcome.  If you set up a model including a probability distribution for these unobserved factors, standard errors will emerge.&lt;/p&gt;
        
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/rj2wuuWhHDQ" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/how_does_statis.html</feedburner:origLink></entry>

<entry>
    <title>Statistics on hiring statisticians</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/1ONeqKOHjws/statistics_on_h.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2545</id>

    <published>2009-07-02T21:30:01Z</published>
    <updated>2009-07-02T21:36:55Z</updated>

    <summary>Via Business Insider:...</summary>
    <author>
        <name>Aleks Jakulin</name>
        <uri>http://stat.columbia.edu/~jakulin</uri>
    </author>
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Via &lt;a href="http://www.businessinsider.com/where-businesses-are-hiring-for-tech-jobs-2009-7#statisticians-13"&gt;Business Insider&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;span class="mt-enclosure mt-enclosure-image" style="display: inline;"&gt;&lt;img alt="statisticians.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/statisticians.jpg" width="400" height="300" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /&gt;&lt;/span&gt;&lt;/p&gt;
        
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/1ONeqKOHjws" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/statistics_on_h.html</feedburner:origLink></entry>

<entry>
    <title>Arthur Jensen:  "the possible indicators of g are of unlimited diversity . . ."</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/mdTW-zTenqE/the_arthur_jens.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2540</id>

    <published>2009-07-02T13:05:21Z</published>
    <updated>2009-07-02T14:06:56Z</updated>

    <summary>After finding the Howard Wainer interview, I looked up the entire series of Profiles in Research published by the Journal of Educational and Behavioral Statistics. I don't have much to say about most of these interviews: some of these people...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Miscellaneous Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Miscellaneous Statistics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;After finding the &lt;a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/he_shared_an_of.html"&gt;Howard Wainer interview&lt;/a&gt;, I looked up the entire series of &lt;a href="https://www.aera.net/publications/Default.aspx?menu_id=40&amp;id=6198"&gt;Profiles in Research&lt;/a&gt; published by the Journal of Educational and Behavioral Statistics.  I don't have much to say about most of these interviews:  some of these people I'd never heard of, and I don't really have much research overlap with the others.  Probably I have the most overlap with R. D. Bock, who's done a lot of work on multilevel modeling, but, for whatever reason, his stories didn't grab my interest.&lt;/p&gt;

&lt;p&gt;But I was curious about the interview with Arthur Jensen.  I've never met him--he gave a talk at the Berkeley statistics department once when I was there, but for some reason I wasn't able to attend the talk.  But I've heard of him.  As the interviewers (Daniel Robinson and Howard Wainer) state:&lt;/p&gt;
        &lt;blockquote&gt;Dr. Jensen has authored over 435 articles, books, and book chapters and is perhaps best known for his controversial 123-page article that appeared in the Harvard Educational Review in 1969. In the article, Dr. Jensen concluded that the differences between Whites and Blacks on IQ tests were attributable to inherent intellectual differences between the two races. In 1980, his Bias in Mental Testing book concluded that intelligence tests were not biased against Blacks, resulting in even more controversy.&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="http://jeb.sagepub.com/cgi/reprint/31/3/327.pdf?ijkey=jrQgD7VSKj3RU&amp;keytype=ref&amp;siteid=spjeb"&gt;The interview&lt;/a&gt; had some interesting bits.  First, something on education and individual differences:&lt;/p&gt;

&lt;blockquote&gt;The problems of schooling illustrate the first and second laws of individual differences. I call them laws because they are demonstrated without exception both in the psychological laboratory and in "real life." Unfortunately, they happen to contradict the popular faith in education as the "great leveler." The first law is that individual differences in learning and performance increase as task complexity increases. The second law is that individual differences in performance increase with continuing practice and experience, unless the particular task imposes an artificially low ceiling on proficiency.

&lt;p&gt;One notable consequence of these laws is that successful attempts to raise performance by improving methods and amounts of instruction raises the overall mean of the treated group but at the same time widens the distribution of individual differences. The very same effect also applies to group differences. A benefit of raising the overall educational level of the whole population is that it moves a greater proportion of the population above the threshold levels of knowledge and skill required for gainful employment. The downside is the resulting increase in individual and group differences.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Then there are his ruminations on "g":&lt;/p&gt;

&lt;blockquote&gt;The educated public today knows of Newton's law of gravitation, Darwin's natural selection, and Einstein's equivalence of mass and energy. They should also know about Spearman's g. Discovered in 1904, g is an essential concept for understanding variation in human abilities. . . . At the top of the factor hierarchy is g, the most general factor. Every cognitive ability that shows individual differences is loaded on the g factor. Tests differ in their g loadings, but their g loadings are not related to any particular knowledge or skills assessed by the various tests. So the possible indicators of g are of unlimited diversity. . . . 

&lt;p&gt;It is also important to understand what g is not. It is not a mixture or average of a number of diverse tests representing many different abilities. Rather, it is a distillate, representing the single factor that all different manifestations of cognition have in common. In fact, g is not really an ability at all. It does not reflect the tests' contents per se, or any particular kind of performance. &lt;em&gt;It defies description in psychological terms.&lt;/em&gt; [italics added] Actually, it reflects some properties of the brain that cause diverse forms of cognitive activity to be positively correlated, not only in psychometric tests but in all of life's mental demands. IQ scores are an attempt to estimate g. But because IQ is just a vehicle for g, it inevitably reflects other broad factors as well, such as verbal, numerical, and spatial abilities, and the specific properties of the particular IQ test. Yet, g is the sine qua non of all IQ tests.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;A bit over the top, no?  I mean, I'm a political scientist and I think party id and ideology are important, and I even talk about our conceptual model in which each person has a position on a left-right scale and can get shifted by valence issues etc etc--but we know not to take that stuff too seriously!&lt;/p&gt;

&lt;p&gt;The most interesting part of the interview, from a historical perspective, was Jensen's discussion of the reaction to his papers.  He tells a story in which he was giving a professional lecture in Chicago that was disrupted by 100 protesters who had infiltrated themselves into the crowd--and then he was rescued from the demonstration by a group of 10 police officers who had infiltrated themselves into the infiltrators.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/mdTW-zTenqE" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/the_arthur_jens.html</feedburner:origLink></entry>

<entry>
    <title>More on the median voter</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/rmnn0ywvIXg/more_on_the_med.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2542</id>

    <published>2009-07-01T20:03:43Z</published>
    <updated>2009-07-01T19:58:59Z</updated>

    <summary>A correspondent read my recent note on the limited influence of the median voter and writes: My understanding of median voter theorem is that each election has its own median voter, and that the median voter's influence is limited to...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;A correspondent read my recent note on &lt;a href="http://www.fivethirtyeight.com/2009/06/limited-influence-of-median-voter.html"&gt;the limited influence of the median voter&lt;/a&gt; and writes:&lt;/p&gt;

&lt;blockquote&gt;My understanding of median voter theorem is that each election has its own median voter, and that the median voter's influence is limited to the outcome of that election only. I don't understand, then, why the graph in your post is evidence that the median voter has little influence. It seems to me that there are two elections being considered in that graph, with two different median voters. The graph appears to consider "moderation" to be having a moderate voting record in Congress, but it seems to me that the median voter in Congress is likely quite different from the median voter in any particular Congressional district. The power of the median voter in Congress, it seems to me, is to affect the outcome of Congressional votes, not to improve his own chances for re-election, which are determined by his proximity to the median voter in his district. Thus, I'm not sure why we would expect moderation, as measured by the median Congressional voter, to translate into electoral success, which we would expect to be determined by the median district voter.&lt;/blockquote&gt;

&lt;p&gt;My reply:&lt;/p&gt;
        &lt;p&gt;Yes, there are two medians:  the median congressmember (or maybe the 60th-most-liberal senator), and the median voter in any congressional district or state.&lt;/p&gt;

&lt;p&gt;I definitely agree with your point about the median congressmember.  As I wrote in the blog entry you cited, "Certainly the median congressmember is important: by definition, it's that marginal vote you need to get a majority. But where do the median congressmember's positions come from?"&lt;/p&gt;

&lt;p&gt;What our graph showed was that it's not as important as you might think for a congressmember to be near the median voter in his or her congressional district.  This was the point that I was focusing on, because this was the point being made by various pundits:  Ben Nelson can't be too liberal because he's representing the people of Nebraska; or, Many Democrats in Congress represent moderate-to-conservative districts, so therefore they can't be too liberal; or, There's no way Olympia Snowe can get away with voting against Obama all the time, given that Maine is a strongly Democratic state; etc.  These arguments have some force--Ben Nelson, Olympia Snowe, etc., certainly could lose their seats--but the evidence shows that the benefits from moderation aren't huge.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/rmnn0ywvIXg" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/more_on_the_med.html</feedburner:origLink></entry>

<entry>
    <title>Should Mark Sanford resign?</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/Ewgm7XyJ5RA/should_mark_san.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2543</id>

    <published>2009-07-01T03:20:14Z</published>
    <updated>2009-07-01T03:36:31Z</updated>

    <summary>At our sister blog, Tom Schaller says no: Is Sanford a cad for bolting his family on Father's Day weekend? Of course, but that is a private, moral failing, rather than a failure of public duty. . . . I...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;At our sister blog, Tom Schaller says &lt;a href="http://www.fivethirtyeight.com/2009/06/should-sanford-resign.html"&gt;no&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;Is Sanford a cad for bolting his family on Father's Day weekend? Of course, but that is a private, moral failing, rather than a failure of public duty. . . .
&lt;p&gt;
I [Schaller] oppose most of what Mr. Sanford stands for politically. His showy rejection of federal stimulus money targeted for his state was a crass publicity stunt designed to garner national attention for Mr. Sanford at the expense of his constituents, many of whom are struggling economically. . . . Should Mr. Sanford's ambitions founder on the shoals of a personal scandal, however, yet another opportunity will be lost to establish the long-overdue separation between private comportment and public service. So here's hoping he doesn't resign or, if he does, it is a matter of personal choice rather than him bowing to political pressure.&lt;/blockquote&gt;

&lt;p&gt;I see where Schaller is coming from.  Lots of people have complicated personal lives, and it's not clear at all that these difficulties have much if anything to do with governing.  But I don't know if I agree with him on the wall of separation between private comportment and public service.&lt;/p&gt;

&lt;p&gt;Consider the Sanford case.  Schaller's a Democrat, so he can evaluate Sanford on his policies.  But if Schaller were a Republican, he might very well want Sanford out of there because he tarnishes the brand, makes the party a laughingstock, etc.  Also makes it harder for Sanford to convincingly follow a "family values" agenda which Schaller (if he were a Republican) might want.  These are legitimate concerns for a Republican to have.  Even if you don't think Sanford's personal indiscretions are important, you might want him gone and replaced by a more effective Republican.  Just as, from the other direction, a Democrat would've preferred a zipped-fly version of Bill Clinton.&lt;/p&gt;
        &lt;p&gt;But the first thing I noticed in Schaller's otherwise excellent post was the ugly pie charts.  Boy are they ugly.  Damn!  Some quick points:&lt;br /&gt;
- The wedges aren't labeled directly.  Instead, the reader has to go back and forth, back and forth, between the chart and the legend.&lt;br /&gt;
- The color schemes are a mess.  The top graph goes from blue to purple to yellow to green??&lt;br /&gt;
- The responses are ordered, and the pie obscures this by being circular.  For example, in the top graph, the natural order is More, Same, Less (with Don't Know as a separate category); in the second graph, Yes, Not Sure, No.&lt;br /&gt;
- The goofy orientation of the second graph makes it hard to see that the blue area ("Yes") is larger than the red area ("No").&lt;br /&gt;
- On the plus side, the charts are reasonably sized (not too large, not too small), have clear titles, are unambiguously labeled, and are not tilted or 3-D (thus, areas actually do represent proportions).&lt;/p&gt;

&lt;p&gt;These aren't hard-and-fast rules.  The real point is that it's hard for me to just look at the pie charts and see what's going on.  There are too many colors, legends, numbers, etc., floating around.  When all is said and done, I guess the charts aren't horrible, but they're the graphical equivalent of meandering, hard-to-follow paragraphs.&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/Ewgm7XyJ5RA" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/should_mark_san.html</feedburner:origLink></entry>

<entry>
    <title>Visualizing correlations circularly</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/M6FyqgjWx1Y/visualizing_tab.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2372</id>

    <published>2009-06-30T14:29:34Z</published>
    <updated>2009-06-30T14:37:26Z</updated>

    <summary>Some time ago FlowingData had an article on visualizing tables - which really is about visualizing spreadsheets in terms of correlations between columns. While Circos generates very colorful displays: Today I was impressed by a much cleaner and Tuftier variant...</summary>
    <author>
        <name>Aleks Jakulin</name>
        <uri>http://stat.columbia.edu/~jakulin</uri>
    </author>
    
        <category term="Statistical graphics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Some time ago FlowingData had an article on &lt;a href="http://flowingdata.com/2009/04/21/visual-representation-of-tabular-information-how-to-fix-the-uncommunicative-table/"&gt;visualizing tables&lt;/a&gt; - which really is about visualizing spreadsheets in terms of correlations between columns. While &lt;a href="http://srs.bcgsc.bc.ca/circos/"&gt;Circos&lt;/a&gt; generates very colorful displays:&lt;/p&gt;

&lt;p&gt;&lt;span class="mt-enclosure mt-enclosure-image" style="display: inline;"&gt;&lt;img alt="circos.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/circos.png" width="379" height="368" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Today I was impressed by a much cleaner and Tuftier variant on the theme by Mike Bostock, called &lt;a href="http://cs.stanford.edu/people/mbostock/iv/dependency-tree.html"&gt;Dependency Tree&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;span class="mt-enclosure mt-enclosure-image" style="display: inline;"&gt;&lt;a href="http://cs.stanford.edu/people/mbostock/iv/dependency-tree.html"&gt;&lt;img alt="dependency-tree.png" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/dependency-tree.png" width="450" height="343" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Click on the link; it's interactive. &lt;a href="http://jheer.org/"&gt;Jeff Heer&lt;/a&gt; and Bostock also have a new JavaScript visualization toolkit out, &lt;a href="http://vis.stanford.edu/protovis/"&gt;ProtoVis&lt;/a&gt;, which simplifies the creation of such stuff. The computer scientist in me finds this development very cool. But I still like my &lt;a href="http://www.stat.columbia.edu/~jakulin/Politics/matrix.png"&gt;correlation matrices&lt;/a&gt;.&lt;/p&gt;
        
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/M6FyqgjWx1Y" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/visualizing_tab.html</feedburner:origLink></entry>

<entry>
    <title>"A paved United States in our day"</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/trVjt77grwo/good_roads_ever_1.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2538</id>

    <published>2009-06-30T12:07:38Z</published>
    <updated>2009-06-30T12:17:56Z</updated>

    <summary>Sometimes you hear discussion of how the red states get more from the government than they pay in taxes while the blue states get less and pay more. This is slightly misleading because the blue states are richer and rich...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Sometimes you hear discussion of how the red states get more from the government than they pay in taxes while the blue states get less and pay more.  This is slightly misleading because the blue states are richer and rich people pay a higher rate of income tax, but it does raise the interesting question of the regionally distributive effects of national taxing and spending policies.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.stat.columbia.edu/~cook/movabletype/mlm/6a00d8341c6d6753ef010536249841970b-800wi.jpg"&gt;&lt;span class="mt-enclosure mt-enclosure-image" style="display: inline;"&gt;&lt;img alt="minimap.jpg" src="http://www.stat.columbia.edu/~cook/movabletype/mlm/minimap.jpg" width="400" height="274" class="mt-image-center" style="text-align: center; display: block; margin: 0 auto 20px;" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For some perspective on where this is coming from:  In our office is a map from 1924 titled "Good Roads Everywhere" that shows a proposed system of highways spanning the country, "to be built and forever maintained by the United States Government." The map, made by the National Highways Association, also includes the following explanation for the proposed funding system:  "Such a system of National Highways will be paid for out of general taxation.  The 9 rich densely populated northeastern States will pay over 50 per cent of the cost. They can afford to, as they will gain the most.  Over 40 per cent will be paid for by the great wealthy cities of the Nation. . . . The farming regions of the West, Mississippi Valley, Southwest and South will pay less than 10 per cent of the cost and get 90 per cent of the mileage." Beyond its quaint slogans ("A paved United States in our day") and ideas that time has passed by ("Highway airports"), the map gives a sense of the potential for federal taxing and spending to transfer money between states and regions.&lt;/p&gt;

&lt;p&gt;P.S.  Yes, I posted this last year, but without the pretty map image (click on it for higher resolution, which unfortunately still isn't quite good enough to make out the text).&lt;/p&gt;
        
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/trVjt77grwo" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/good_roads_ever_1.html</feedburner:origLink></entry>

<entry>
    <title>He shared an office with an assortment of mops and brooms</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/SMjiFjbY-Ss/he_shared_an_of.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2539</id>

    <published>2009-06-29T20:19:32Z</published>
    <updated>2009-06-29T20:27:30Z</updated>

    <summary>The Howard Wainer story. One of the fun parts is this story from his days as an assistant professor:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Teaching" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;&lt;a href="http://www.stat.columbia.edu/~gelman/stuff_for_blog/wainer.pdf"&gt;The Howard Wainer story&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the fun parts is this story from his days as an assistant professor:&lt;/p&gt;
        &lt;blockquote&gt;Soon after I [Wainer] arrived, a parade of students and faculty came to my door asking for help. . . . It wasn't long before every spare minute was used up doing analyses for others. I felt useful, but a bit overwhelmed. About mid-year I was back in Princeton having lunch with Harold [Gulliksen], and when he asked about my research, I grimaced and told him that there was no time. He asked what was taking it all up, and I explained. His advice was sage and practical. He told me that I should remember that my goal was not to help the students get their projects done, but rather help them learn something. He suggested a 4-step solution:
&lt;p&gt;1. Ask all who come for a consultation to prepare first a one-paragraph description of their problem and give it to me a day or two in advance, so I might be able to think about it (this alone cut back on the line by 30-50%).&lt;/p&gt;

&lt;p&gt;2. Prepare an annotated bibliography.&lt;/p&gt;

&lt;p&gt;3. Check off the appropriate reading on the bibliography and give that to the student.&lt;/p&gt;

&lt;p&gt;4. Only if I didn't know an appropriate reading should I meet face-to-face with the student.&lt;/p&gt;

&lt;p&gt;I [Wainer] followed this advice and found that, once students realized that they would have to do something themselves, the torrent of help-seekers shrank to a trickle.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;But the "prepare an annotated bibliography" step seems like a lot of work!  How did he find the time to do that?&lt;/p&gt;
    &lt;img src="http://feeds.feedburner.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~4/SMjiFjbY-Ss" height="1" width="1"/&gt;</content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/he_shared_an_of.html</feedburner:origLink></entry>

<entry>
    <title>Casey Mulligan is consistent</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/0rnUHDu64hA/casey_mulligan.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2537</id>

    <published>2009-06-29T04:12:23Z</published>
    <updated>2009-06-29T04:12:54Z</updated>

    <summary>Back in April, in an article about partisan perceptions of the economy, John Sides and I wrote:...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Political Science" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;Back in April, in an article about partisan perceptions of the economy, John Sides and I &lt;a href="http://www.fivethirtyeight.com/2009/04/red-and-blue-economies.html"&gt;wrote&lt;/a&gt;:&lt;/p&gt;
        &lt;blockquote&gt;Democrats feel better about the economy when Democrats are in power, and Republicans feel better when their party rules. What's striking, though, is how quickly these perceptions can change.

&lt;p&gt;For example, in mid-September, John McCain notoriously said, "The fundamentals of our economy are still strong." But then in early March, he said that the American people "want to know how we got into this ditch--the worst economic crisis since the Great Depression." Based on these two statements, the slide into the ditch apparently occurred sometime between September 16 and March 3.&lt;/p&gt;

&lt;p&gt;Similarly, University of Chicago economist Casey Mulligan spent the end of 2008 arguing that the economy is just not that bad, but then changed course in March, writing that "the crash of 2008 did not bother me" but "the crash of 2009 is more worrisome . . . So far productivity has been good in this recession, but 2009's stock market could well see that changing."&lt;/p&gt;

&lt;p&gt;It's no surprise that John McCain and Casey Mulligan's views on the economy differ from those of Rahm Emanuel and Paul Krugman, or for that matter Barack Obama, who just last week was beginning to see "glimmers of hope" in the economy. . . .&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I don't know about John McCain and Barack Obama, but I recently checked on Casey Mulligan, and I'm pleased to report that he does &lt;em&gt;not&lt;/em&gt; seem to have shown a partisan tack in his statements about the economy.  For example, some recent posts:&lt;/p&gt;

&lt;blockquote&gt;&lt;a href="http://caseymulligan.blogspot.com/2009/06/real-disposable-personal-income-per_26.html"&gt;Real Disposable Personal Income per Capita Higher than Ever&lt;/a&gt;
The BEA reported that real disposable personal income was $2478 per person in May 2009. The only month in U.S. history higher than that was May 2008 ($2499). Based on the recent trends, I [Mulligan] expect that June 2009 (which is almost over) will have the highest real disposable personal income ever.&lt;/blockquote&gt;

&lt;blockquote&gt;&lt;a href="http://caseymulligan.blogspot.com/2009/06/592-per-person.html"&gt;$592 per person&lt;/a&gt;
Compared to a world in which real GDP remained at the (thusfar) all time high (achieved in 2008 Q2), the BEA's report this morning shows that through 2009 Q1 the U.S. economy had lost $181 billion (measured at 2008 Q4 prices).  $181 billion is equivalent to:
&lt;blockquote&gt;
    - $592 per person, which is equivalent to

&lt;p&gt;    - 4.6 days of GDP (that is, we are producing like we took 3 weeks of vacation per year, instead of two)&lt;/p&gt;&lt;/blockquote&gt;&lt;/blockquote&gt;

&lt;p&gt;Mulligan clearly has a partisan perspective, but his take on the economy--that things aren't going so badly--has been broadly consistent since the fall of 2008.  He does not seem to have changed this view or applied any partisan filters in response to the change in power in Washington.&lt;/p&gt;

&lt;p&gt;I know nothing about macroeconomics--even less than I know about the EM algorithm (that's an inside joke; Xiao-Li can explain it to you)--and I am not trying in any way to agree or disagree with Mulligan's analyses (not that my position on this matter would mean anything, anyway).  I just wanted to follow up on my earlier offhand remark that had implied that Mulligan had changed course following the change in administration.&lt;/p&gt;
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/casey_mulligan.html</feedburner:origLink></entry>

<entry>
    <title>A scary thought</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/lPZ7aZ_MbR4/a_scary_thought.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2388</id>

    <published>2009-06-29T03:51:54Z</published>
    <updated>2009-06-28T23:10:29Z</updated>

    <summary>A colleague and I were talking the other day about how much we pay our research assistants. It turns out that she pays much more. In fact, sometimes I don't get around to paying my research assistants at all, but...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Economics" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;A colleague and I were talking the other day about how much we pay our research assistants.  It turns out that she pays much more.  In fact, sometimes I don't get around to paying my research assistants at all, but she pays hers a decent amount.&lt;/p&gt;

&lt;p&gt;My colleague, who's an untenured professor, said that was understandable because she makes less money than I do, so she can better relate to the students' lifestyles.  That's a pretty scary thought--it should really go the other way, right?  I get paid more so I should be able to afford to be more generous.  But maybe she's right; if so, it's a sobering insight.&lt;/p&gt;
        
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/a_scary_thought.html</feedburner:origLink></entry>

<entry>
    <title>Some NIH-funded projects are less than earthshaking</title>
    <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/StatisticalModelingCausalInferenceAndSocialScience/~3/Yyc_p9RlFEs/some_nih-funded.html" />
    <id>tag:www.stat.columbia.edu,2009:/~cook/movabletype/mlm//1.2535</id>

    <published>2009-06-27T21:59:17Z</published>
    <updated>2009-06-27T22:30:24Z</updated>

    <summary>One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . . I was on an NIH panel a couple of...</summary>
    <author>
        <name>Andrew Gelman</name>
        <uri>http://www.stat.columbia.edu/~gelman</uri>
    </author>
    
        <category term="Sociology" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.stat.columbia.edu/~cook/movabletype/mlm/">
        &lt;p&gt;&lt;a href="http://www.nytimes.com/2009/06/28/health/research/28cancer.html?_r=1&amp;partner=rss&amp;emc=rss"&gt;One major impediment, scientists agree, is the grant system itself. It has become a sort of jobs program, a way to keep research laboratories going year after year . . .&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was on an NIH panel a couple of years ago with about 25 other scientists, reviewing something like 90 grants.  It was pointless.  25 people is just too many to make a decision.  What happened was that there were 3 or 4 people who were experienced in the process, who ended up guiding the entire discussion.&lt;/p&gt;

&lt;p&gt;The highlight--or, I should say, lowlight--was when we were reviewing a proposal involving the study of the carcinogenic effects of hookah (water pipe) smoking.  I asked if this was really such a big deal, and one of the panel members told me that smoking tobacco through a hookah is something like 10 times worse than smoking a cigarette.  If so, the public health consequences could be pretty serious, even if not so many people did it.  I said this sounded like a reasonable point to me.  Then this guy across the table from me spoke up and said that he knew somebody who was 80 years old, had been smoking a hookah all his life, and was none the worse for it.  At this point, I blew up.  I couldn't believe that the "my elderly aunt smokes and she didn't get cancer" argument could be brought up at an NIH panel!&lt;/p&gt;
        
    </content>
<feedburner:origLink>http://www.stat.columbia.edu/~cook/movabletype/archives/2009/06/some_nih-funded.html</feedburner:origLink></entry>

</feed>
