The post Inventor of Arxiv speaks at Columbia this Tues 4pm appeared first on Statistical Modeling, Causal Inference, and Social Science.

I [Ginsparg] will give a very brief sociological overview of the current metastable state of scholarly research communication, and then a technical discussion of the practical implications of literature and usage data considered as computable objects, using arXiv as exemplar. Some of these algorithms scale to larger data sets.

The post Forget about pdf: this looks much better, it makes all my own papers look like kids’ crayon drawings by comparison. appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Which of these classes should he take? appeared first on Statistical Modeling, Causal Inference, and Social Science.

For many years I wanted to pursue medicine, but after recently completing a master of public health, I caught the statistics bug. I need to complete the usual minimum prerequisites for graduate study in statistics (calculus through multivariable calculus plus linear algebra) but want to take additional math courses, as highly competitive stats and biostats programs either require or highly recommend more than the minimum. I could of course end up earning a whole other bachelor’s degree in math if I tried to take all the recommended courses. Could you please rank the following courses according to importance/practical utility in working in statistics and in applying for a competitive stats PhD program? This would greatly assist me in prioritising which courses to complete.

1. Mathematical modeling

2. Real analysis

3. Complex analysis

4. Numerical analysis

My quick advice:

– “Mathematical modeling”: I don’t know what’s in this class. But, from the title, it seems very relevant to statistics.

– “Real analysis”: Not so relevant to real-world statistics but important for PhD applications because it’s a way to demonstrate that you understand math. And understanding math _is_ important to real-world statistics. Thus, the point of a “real analysis” class for a statistician is not so much that you learn real analysis, which is pretty irrelevant for most things, but that it demonstrates that you can do real analysis.

– “Complex analysis”: A fun topic but you’ll probably never ever need it, so no need to take this one.

– “Numerical analysis”: I don’t know what’s in this class. You could take it but it’s not really necessary.

The post “The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong.” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Continuing from yesterday’s quotation of my 2012 article in Epidemiology:

Like many Bayesians, I have often represented classical confidence intervals as posterior probability intervals and interpreted one-sided p-values as the posterior probability of a positive effect. These are valid conditional on the assumed noninformative prior but typically do not make sense as unconditional probability statements.

The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong. At first this may sound paradoxical, that a noninformative or weakly informative prior yields posteriors that are too forceful—and let me deepen the paradox by stating that a stronger, more informative prior will tend to yield weaker, more plausible posterior statements.

How can it be that adding prior information weakens the posterior? It has to do with the sort of probability statements we are often interested in making. Here is an example from Gelman and Weakliem (2009). A sociologist examining a publicly available survey discovered a pattern relating attractiveness of parents to the sexes of their children. He found that 56% of the children of the most attractive parents were girls, compared to 48% of the children of the other parents, and the difference was statistically significant at p<0.02. The assessments of attractiveness had been performed many years before these people had children, so the researcher felt he had support for a claim of an underlying biological connection between attractiveness and sex ratio.

The original analysis by Kanazawa (2007) had multiple comparisons issues, and after performing a regression rather than selecting the most significant comparison, we get a p-value closer to 0.2 rather than the stated 0.02. For the purposes of our present discussion, though, in which we are evaluating the connection between p-values and posterior probabilities, it will not matter much which number we use. We shall go with p=0.2 because it seems like a more reasonable analysis given the data.
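To see how much that kind of selection can matter, here is a small simulation sketch (the number of candidate comparisons, 5, is an arbitrary assumption for illustration, not Kanazawa’s actual count): if you run several comparisons with zero true effect and report only the most significant one, a nominal 5% test fires far more often than 5% of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_sims = 5, 100_000   # K candidate comparisons: an assumed, illustrative number

# z-statistics for K comparisons, all with zero true effect
z = rng.standard_normal((n_sims, K))

# fraction of simulations where the *best* of the K null comparisons
# clears the two-sided 5% threshold (|z| > 1.96)
rate = (np.abs(z).max(axis=1) > 1.96).mean()
print(rate)   # close to 1 - 0.95**5, i.e. about 0.23, not 0.05
```

Selecting the most significant of five null comparisons yields a “significant” headline roughly a quarter of the time, which is the flavor of the 0.02-versus-0.2 correction.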

Let θ be the true (population) difference in sex ratios of attractive and less attractive parents. Then the data under discussion (with a two-sided p-value of 0.2), combined with a uniform prior on θ, yields a 90% posterior probability that θ is positive. Do I believe this? No. Do I even consider this a reasonable data summary? No again. We can derive these No responses in three different ways, first by looking directly at the evidence, second by considering the prior, and third by considering the implications for statistical practice if this sort of probability statement were computed routinely.
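For anyone who wants to check the arithmetic, the 90% figure takes two lines of Python (standard library only): under a flat prior the posterior probability that θ > 0 is just one minus the one-sided p-value.

```python
from statistics import NormalDist

nd = NormalDist()

# A two-sided p-value of 0.2 puts the estimate about 1.28 standard
# errors above zero; under a flat prior the posterior for theta is
# centered at the estimate, so Pr(theta > 0) is the normal CDF there.
p_two_sided = 0.2
z = nd.inv_cdf(1 - p_two_sided / 2)
post_prob = nd.cdf(z)   # equals 1 - p_two_sided/2 by construction
print(round(z, 2), round(post_prob, 2))   # 1.28 0.9
```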

First off, a claimed 90% probability that θ>0 seems too strong. Given that the p-value (adjusted for multiple comparisons) was only 0.2—that is, a result that strong would occur a full 20% of the time just by chance alone, even with no true difference—it seems absurd to assign a 90% belief to the conclusion. I am not prepared to offer 9 to 1 odds on the basis of a pattern someone happened to see that could plausibly have occurred by chance, nor for that matter would I offer 99 to 1 odds based on the original claim of the 2% significance level.

Second, the prior uniform distribution on θ seems much too weak. There is a large literature on sex ratios, with factors such as ethnicity, maternal age, and season of birth corresponding to difference in probability of girl birth of less than 0.5 percentage points. It is a priori implausible that sex-ratio differences corresponding to attractiveness are larger than for these other factors. Assigning an informative prior centered on zero shrinks the posterior toward zero, and the resulting posterior probability that θ>0 moves to a more plausible value in the range of 60%, corresponding to the idea that the result is suggestive but not close to convincing.
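Here is that calculation sketched as a conjugate normal-normal update. The numbers are illustrative assumptions, not the exact figures from Gelman and Weakliem (2009): a standard error of 4.3 percentage points (so that an estimate 1.28 SEs from zero reproduces the two-sided p = 0.2 above) and a zero-centered prior with sd 0.85 points, chosen to be in the spirit of the sex-ratio literature.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

s = 4.3           # assumed standard error, in percentage points
y = 1.2816 * s    # estimate implied by a two-sided p-value of 0.2
tau = 0.85        # assumed sd of the zero-centered informative prior

# Conjugate normal-normal update (prior mean 0):
post_var = 1 / (1 / tau**2 + 1 / s**2)
post_mean = post_var * y / s**2
post_prob = nd.cdf(post_mean / sqrt(post_var))   # Pr(theta > 0 | data, prior)
print(round(post_prob, 2))   # about 0.6: suggestive, not convincing
```

The same data that gave 90% under the flat prior now give roughly 60%: the informative prior doesn’t change the data, it just stops the summary from over-claiming.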

Third, consider what would happen if we routinely interpreted one-sided p-values as posterior probabilities. In that case, an experimental result that is 1 standard error from zero—that is, exactly what one might expect from chance alone—would imply an 83% posterior probability that the true effect in the population has the same direction as the observed pattern in the data at hand. It does not make sense to me to claim 83% certainty—5 to 1 odds—based on data that not only could occur by chance but in fact represent an expected level of discrepancy. This system-level analysis accords with my criticism of the flat prior: as Greenland and Poole note in their article, the effects being studied in epidemiology typically range from -1 to 1 on the logit scale, hence analyses assuming broader priors will systematically overstate the probabilities of very large effects and will overstate the probability that an estimate from a small sample will agree in sign with the corresponding population quantity.
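The arithmetic behind those roughly 5-to-1 odds, for anyone who wants to check it:

```python
from statistics import NormalDist

nd = NormalDist()
# An estimate exactly 1 standard error from zero is about what one
# would expect from noise alone, yet read as a posterior probability
# under a flat prior it becomes:
prob = nd.cdf(1.0)
print(round(prob, 2))   # 0.84: roughly the 5-to-1 odds discussed above
```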

Rather than relying on noninformative priors, I prefer the suggestion of Greenland and Poole to bound posterior probabilities using real prior information.

OK, I did discuss some buffoonish research here. But, look, no mockery! I was using the silly stuff as a lever to better understand some statistical principles. And that’s ok.

The post There are 6 ways to get rejected from PLOS: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, (4) keeping a gambling addict away from the casino, (5) chapter 11 bankruptcy proceedings, and (6) having no male co-authors appeared first on Statistical Modeling, Causal Inference, and Social Science.

[The author] and her colleague have appealed to the unnamed journal, which belongs to the PLoS family . . .

I thought PLOS published just about everything! This is not a slam on PLOS. Arxiv publishes everything too, and Arxiv is great.

The funny thing is, I do think there are cases where having both male and female coauthors gives a paper more credibility, sometimes undeserved. For example, if you take a look at those papers on ovulation and voting, and ovulation and clothing, and fat arms and political attitudes, you’ll see these papers have authors of both sexes, which insulates them from the immediate laugh-them-out-of-the-room reaction that they might get were they written by men only. Having authors of both sexes does not of course exempt them from direct criticisms of the work; I just think that a paper on “that time of the month” written by men would, for better or worse, get a more careful review.

**P.S.** Also, one thing I missed in my first read of this story: the reviewer wrote:

Perhaps it is not so surprising that on average male doctoral students co-author one more paper than female doctoral students, just as, on average, male doctoral students can probably run a mile race a bit faster than female doctoral students . . . And it might well be that on average men publish in better journals . . . perhaps simply because men, perhaps, on average work more hours per week than women, due to marginally better health and stamina.

“Marginally better health and stamina”—that’s a laff and a half! Obviously this reviewer is no actuary and doesn’t realize that men die at a higher rate than women at every age.

On the plus side, it’s pretty cool that James Watson is still reviewing journal articles, giving something back to the community even in retirement. Good on ya, Jim! Don’t let the haters get you down.

The post Good, mediocre, and bad p-values appeared first on Statistical Modeling, Causal Inference, and Social Science.

In theory the p-value is a continuous measure of evidence, but in practice it is typically trichotomized approximately into strong evidence, weak evidence, and no evidence (these can also be labeled highly significant, marginally significant, and not statistically significant at conventional levels), with cutoffs roughly at p=0.01 and 0.10.

One big practical problem with p-values is that they cannot easily be compared. The difference between a highly significant p-value and a clearly non-significant p-value is itself not necessarily statistically significant. . . . Consider a simple example of two independent experiments with estimates ± standard error of 25 ± 10 and 10 ± 10. The first experiment is highly statistically significant (two and a half standard errors away from zero, corresponding to a (normal-theory) p-value of about 0.01) while the second is not significant at all. Most disturbingly here, the difference is 15 ± 14, which is not close to significant . . .
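The arithmetic, as a quick check in Python (standard library only):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def two_sided_p(est, se):
    """Normal-theory two-sided p-value for an estimate with given SE."""
    return 2 * (1 - nd.cdf(abs(est) / se))

print(round(two_sided_p(25, 10), 3))   # 0.012: highly significant
print(round(two_sided_p(10, 10), 3))   # 0.317: not significant at all

# ...but the difference between the two experiments is 15 with
# standard error sqrt(10**2 + 10**2), about 14:
diff, se_diff = 25 - 10, sqrt(10**2 + 10**2)
print(round(two_sided_p(diff, se_diff), 2))   # 0.29: nowhere near significant
```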

In short, the p-value is itself a statistic and can be a noisy measure of evidence. This is a problem not just with p-values but with any mathematically equivalent procedure, such as summarizing results by whether the 95% confidence interval includes zero.

**Good, mediocre, and bad p-values**

For all their problems, p-values sometimes “work” to convey an important aspect of the relation of data to model. Other times a p-value sends a reasonable message but does not add anything beyond a simple confidence interval. In yet other situations, a p-value can actively mislead. Before going on, I will give examples of each of these three scenarios.

**A p-value that worked.** Several years ago I was contacted by a person who suspected fraud in a local election (Gelman, 2004). Partial counts had been released throughout the voting process and he thought the proportions for the different candidates looked suspiciously stable, as if they had been rigged ahead of time to aim for a particular result. Excited to possibly be at the center of an explosive news story, I took a look at the data right away. After some preliminary graphs—which indeed showed stability of the vote proportions as they evolved during election day—I set up a hypothesis test comparing the variation in the data to what would be expected from independent binomial sampling. When applied to the entire dataset (27 candidates running for six offices), the result was not statistically significant: there was no less (and, in fact, no more) variance than would be expected by chance. In addition, an analysis of the 27 separate chi-squared statistics revealed no particular patterns. I was left to conclude that the election results were consistent with random voting (even though, in reality, voting was certainly not random—for example, married couples are likely to vote at the same time, and the sorts of people who vote in the middle of the day will differ from those who cast their ballots in the early morning or evening), and I regretfully told my correspondent that he had no case.

In this example, we did not interpret a non-significant result as a claim that the null hypothesis was true or even as a claimed probability of its truth. Rather, non-significance revealed the data to be compatible with the null hypothesis; thus, my correspondent could not argue that the data indicated fraud.
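Here is a sketch of that kind of test in Python, with made-up batch counts (these are not the actual election data): compare the observed chi-squared statistic for variation in one candidate’s vote share across batches to its distribution under independent binomial sampling. Note that suspiciously *small* values, not just large ones, would be the signature of rigging.

```python
import numpy as np

rng = np.random.default_rng(1)

n = np.array([400, 350, 500, 450, 300, 600])   # hypothetical ballots per batch
x = np.array([180, 140, 205, 195, 120, 247])   # hypothetical votes for one candidate

p_hat = x.sum() / n.sum()
chi_sq = (((x - n * p_hat) ** 2) / (n * p_hat * (1 - p_hat))).sum()

# Reference distribution under independent binomial sampling,
# re-estimating the pooled proportion in each simulated election:
sims = rng.binomial(n, p_hat, size=(20_000, len(n)))
p_sims = sims.sum(axis=1, keepdims=True) / n.sum()
chi_sims = (((sims - n * p_sims) ** 2) / (n * p_sims * (1 - p_sims))).sum(axis=1)

frac_less = (chi_sims <= chi_sq).mean()
# A value near 0 would mean suspiciously stable proportions; a value
# near 1, suspiciously volatile ones. These made-up counts land
# comfortably in between: consistent with binomial sampling.
print(round(frac_less, 2))
```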

**A p-value that was reasonable but unnecessary.** It is common for a research project to culminate in the estimation of one or two parameters, with publication turning on a p-value being less than a conventional level of significance. For example, in our study of the effects of redistricting in state legislatures (Gelman and King, 1994), the key parameters were interactions in regression models for partisan bias and electoral responsiveness. Although we did not actually report p-values, we could have: what made our paper complete was that our findings of interest were more than two standard errors from zero, thus reaching the p<0.05 level. Had our significance level been much greater (for example, estimates that were four or more standard errors from zero), we would doubtless have broken up our analysis (for example, separately studying Democrats and Republicans) in order to broaden the set of claims that we could confidently assert. Conversely, had our regressions not reached statistical significance at the conventional level, we would have performed some sort of pooling or constraining of our model in order to arrive at some weaker assertion that reached the 5% level. (Just to be clear: we are not saying that we would have performed data dredging, fishing for significance; rather, we accept that sample size dictates how much we can learn with confidence; when data are weaker, it can be possible to find reliable patterns by averaging.)

In any case, my point here is that in this example it would have been just fine to summarize our results via p-values even though we did not happen to use that formulation.

**A misleading p-value.** Finally, in many scenarios p-values can distract or even mislead: either a non-significant result is wrongly interpreted as a confidence statement in support of the null hypothesis, or a significant p-value is taken as proof of an effect. A notorious example of the latter is the recent paper of Bem (2011), which reported statistically significant results from several experiments on ESP. At first glance, it seems impressive to see multiple independent findings that are statistically significant (and combining the p-values using classical rules would yield an even stronger result), but with enough effort it is possible to find statistical significance anywhere (see Simmons, Nelson, and Simonsohn, 2011).

The focus on p-values seems both to have weakened the study (by encouraging the researcher to present only some of his data so as to draw attention away from non-significant results) and to have led reviewers to inappropriately view a low p-value (indicating a misfit of the null hypothesis to data) as strong evidence in favor of a specific alternative hypothesis (ESP) rather than other, perhaps more scientifically plausible alternatives such as measurement error and selection bias.
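For the record, the “classical rules” in question include Fisher’s method, which sums -2 log p over the k experiments and refers the total to a chi-squared distribution with 2k degrees of freedom. A sketch with made-up p-values (not Bem’s actual numbers) shows how quickly modest results combine into an apparently overwhelming one:

```python
import numpy as np
from math import exp, factorial

p = np.array([0.04, 0.03, 0.05, 0.02])   # hypothetical per-experiment p-values
stat = -2 * np.log(p).sum()              # Fisher's combined statistic

# The chi-squared survival function for even df = 2k has a closed form:
# sf(x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!
k = len(p)
combined_p = exp(-stat / 2) * sum((stat / 2) ** j / factorial(j) for j in range(k))
print(combined_p)   # about 6e-4: far "stronger" than any single experiment
```

The combination is mathematically fine; the danger is that selective reporting of which experiments (and which measures) to include feeds straight into it.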

I’ve written on these issues in many other places but the questions keep coming up so I thought it was worth reposting.

Tomorrow I’ll highlight another part of this article, this time dealing with Bayesian inference.

The post Carl Morris: Man Out of Time [reflections on empirical Bayes] appeared first on Statistical Modeling, Causal Inference, and Social Science.

When Carl Morris came to our department in 1989, I and my fellow students were so excited. We all took his class. The funny thing is, though, the late 1980s might well have been the worst time to be Carl Morris, from the standpoint of what was being done in statistics at that time—not just at Harvard, but in the field in general. Carl has made great contributions to statistical theory and practice, developing ideas which have become particularly important in statistics in the last two decades. In 1989, though, Carl’s research was not in the mainstream of statistics, or even of Bayesian statistics.

When Carl arrived to teach us at Harvard, he was both a throwback and ahead of his time.

Let me explain. Two central aspects of Carl’s research are the choice of probability distribution for hierarchical models, and frequency evaluations in hierarchical settings where both Bayesian calibration (conditional on inferences) and classical bias and variance (conditional on unknown parameter values) are relevant. In Carl’s terms, these are “NEF-QVF” and “empirical Bayes.” My point is: both of these areas were hot at the beginning of Carl’s career and they are hot now, but somewhere in the 1980s they languished.

In the wake of Charles Stein’s work on admissibility in the late 1950s there was an interest, first theoretical but with clear practical motivations, to produce lower-risk estimates, to get the benefits of partial pooling while maintaining good statistical properties conditional on the true parameter values, to produce the Bayesian omelet without cracking the eggs, so to speak. In this work, the functional form of the hierarchical distribution plays an important role—and in a different way than had been considered in statistics up to that point. In classical distribution theory, distributions are typically motivated by convolution properties (for example, the sum of two gamma distributions with a common shape parameter is itself gamma), or by stable laws such as the central limit theorem, or by some combination or transformation of existing distributions. But in Carl’s work, the choice of distribution for a hierarchical model can be motivated based on the properties of the resulting partially pooled estimates. In this way, Carl’s ideas are truly non-Bayesian because he is considering the distribution of the parameters in a hierarchical model not as a representation of prior belief about the set of unknowns, and not as a model for a population of parameters, but as a device to obtain good estimates.

So, using a Bayesian structure to get good classical estimates. Or, Carl might say, using classical principles to get better Bayesian estimates. I don’t know that they used the term “robust” in the 1950s and 1960s, but that’s how we could think of it now.

The interesting thing is, if we take Carl’s work seriously (and we should), we now have two principles for choosing a hierarchical model. In the absence of prior information about the functional form of the distribution of group-level parameters, and in the absence of prior information about the values of the hyperparameters that would underlie such a model, we should use some form with good statistical properties. On the other hand, if we _do_ have good prior information, we should of course use it—even R. A. Fisher accepted Bayesian methods in those settings where the prior distribution is known.

But, then, what do we do in those cases in between—the sorts of problems that arose in Carl’s applied work in health policy and other areas? I learned from Carl to use our prior information to structure the model, for example to pick regression coefficients, to decide which groups to pool together, to decide which parameters to model as varying, and then to use robust hierarchical modeling to handle the remaining, unexplained variation. This general strategy wasn’t always so clear in the theoretical papers on empirical Bayes, but it came through in Carl’s applied work, as well as that of Art Dempster, Don Rubin, and others, much of which flowered in the late 1970s—not coincidentally, a few years after Carl’s classic articles with Brad Efron that put hierarchical modeling on a firm foundation that connected with the edifice of theoretical statistics, gradually transforming these ideas from a parlor trick into a way of life.

In a famous paper, Efron and Morris wrote of “Stein’s paradox in statistics,” but as a wise man once said, once something is understood, it is no longer a paradox. In un-paradoxing shrinkage estimation, Efron and Morris finished the job that Gauss, Laplace, and Galton had begun.

So far, so good. We’ve hit the 1950s, the 1960s, and the 1970s. But what happened next? Why do I say that, as of 1989, Carl’s work was “out of time”? The simplest answer would be that these ideas were a victim of their own success: once understood, no longer mysterious. But it was more than that. Carl’s specific research contribution was not just hierarchical modeling but the particular intricacies involved in the combination of data distribution and group-level model. His advice was not simply “do Bayes” or even “do empirical Bayes” but rather had to do with a subtle examination of this interaction. And, in the late 1980s and early 1990s, there wasn’t so much interest in this in the field of statistics. On one side, the anti-Bayesians were still riding high in their rejection of all things prior, even in some quarters a rejection of probability modeling itself. On the other side, a growing number of Bayesians—inspired by applied successes in fields as diverse as psychometrics, pharmacology, and political science—were content to just fit models and not worry about their statistical properties.

Similarly with empirical Bayes, a term which in the hands of Efron and Morris represented a careful, even precarious, theoretical structure intended to capture classical statistical criteria in a setting where the classical ideas did not quite apply, a setting that mixed estimation and prediction—but which had devolved to typically just be shorthand for “Bayesian inference, plugging in point estimates for the hyperparameters.” In an era where the purveyors of classical theory didn’t care to wrestle with the complexities of empirical Bayes, and where Bayesians had built the modeling and technical infrastructure needed to fit full Bayesian inference, hyperpriors and all, there was not much of a market for Carl’s hybrid ideas.

This is why I say that, at the time Carl Morris came to Harvard, his work was honored and recognized as pathbreaking, but his actual research agenda was outside the mainstream.

As noted above, though, I think things have changed. The first clue—although it was not at all clear to me at the time—was Trevor Hastie and Rob Tibshirani’s lasso regression, which was developed in the early 1990s and which has of course become increasingly popular in statistics, machine learning, and all sorts of applications. Lasso is important to me partly as the place where Bayesian ideas of shrinkage or partial pooling entered what might be called the Stanford school of statistics. But for the present discussion what is most relevant is the centrality of the functional form. The point of lasso is not just partial pooling, it’s partial pooling with a double-exponential (Laplace) prior. As I said, I did not notice the connection with Carl’s work and other Stein-inspired work back when lasso was introduced—at that time, much was made of the shrinkage of certain coefficients all the way to zero, which indeed is important (especially in practical problems with large numbers of predictors), but my point here is that the ideas of the late 1950s and early 1960s again become relevant. It’s not enough just to say you’re partial pooling—it matters _how_ this is being done.
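For an orthogonal design with unit-variance noise, both estimators have closed forms, which makes the point concrete (a sketch with arbitrary made-up coefficients and an arbitrary regularization strength): a normal prior (ridge) shrinks every coefficient proportionally, while the lasso’s double-exponential prior soft-thresholds, pooling the small coefficients all the way to zero.

```python
import numpy as np

z = np.array([-3.0, -0.4, 0.1, 0.8, 2.5])   # least-squares coefficients
lam = 1.0                                    # arbitrary regularization strength

ridge = z / (1 + lam)                                 # normal prior: proportional shrinkage
lasso = np.sign(z) * np.maximum(np.abs(z) - lam, 0)   # Laplace prior: soft thresholding

print(ridge)   # every coefficient shrunk, none exactly zero
print(lasso)   # small coefficients set exactly to zero: -2, 0, 0, 0, 1.5
```

Same data, same amount of “pooling” in a loose sense, yet qualitatively different estimates: that is the sense in which the functional form of the prior is central.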

In recent years there’s been a flood of research on prior distributions for hierarchical models, for example the work by Nick Polson and others on the horseshoe distribution, and the issues raised by Carl in his classic work are all returning. I can illustrate with a story from my own work. A few years ago some colleagues and I published a paper on penalized marginal maximum likelihood estimation for hierarchical models using, for the group-level variance, a gamma prior with shape parameter 2, which has the pleasant feature of keeping the point estimate off of zero while allowing it to be arbitrarily close to zero if demanded by the data (a pair of properties that is not satisfied by the uniform, lognormal, or inverse-gamma distributions, all of which had been proposed as classes of priors for this model). I was (and am) proud of this result, and I linked it to the increasingly popular idea of weakly informative priors. After talking with Carl, I learned that these ideas were not new; indeed, they were closely related to the questions that Carl has been wrestling with for decades in his research, as they relate both to the technical issue of the combination of prior and data distributions, and the larger concerns about default Bayesian (or Bayesian-like) inferences.
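A toy version of the boundary-avoidance phenomenon (my own illustrative numbers, not the example from that paper): with group-level sd tau and within-group variance 1, data less spread out than the sampling noise alone push the marginal maximum likelihood estimate of tau to exactly zero, while a shape-2 gamma penalty keeps the estimate strictly positive without forcing it to be large.

```python
import numpy as np

# Toy model: y_j ~ N(theta_j, 1), theta_j ~ N(0, tau^2),
# so marginally y_j ~ N(0, 1 + tau^2).
y = np.array([0.3, -0.5, 0.2, -0.1, 0.4, -0.3])   # hypothetical group estimates

tau = np.linspace(1e-6, 2, 20_000)                # grid search over tau
v = 1 + tau**2
log_lik = (-0.5 * np.log(v) - (y**2)[:, None] / (2 * v)).sum(axis=0)
mmle = tau[log_lik.argmax()]

# Gamma(shape=2) penalty: log p(tau) = log(tau) - tau/A + const.
# The log density is -inf at tau = 0, so the penalized mode cannot sit
# on the boundary, yet the prior allows tau arbitrarily close to zero.
A = 1.0   # arbitrary illustrative scale
penalized = log_lik + np.log(tau) - tau / A
tau_pen = tau[penalized.argmax()]

print(round(mmle, 3))      # 0.0: the raw estimate is stuck at the boundary
print(round(tau_pen, 3))   # strictly positive (around 0.36 with these numbers)
```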

In short: in the late 1980s, it was enough to be Bayesian. Or, perhaps I should say, Bayesian data analysis was in its artisanal period, and we tended to be blissfully ignorant about the dependence of our inferences on subtleties of the functional forms of our models. Or, to put a more positive spin on things: when our inferences didn’t make sense, we changed our models, hence the methods we used (in concert with the prior information implicitly encoded in that innocent-sounding phrase, “make sense”) had better statistical properties than one would think based on theoretical analysis alone. Real-world inferences can be superefficient, as Xiao-Li Meng might say, because they make use of tacit knowledge.

In recent years, however, Bayesian methods (or, more generally, regularization, thus including lasso and other methods that are only partly in the Bayesian fold) have become routine, to the extent that we need to think of them as defaults, which means we need to be concerned about . . . their frequency properties. Hence the re-emergence of truly empirical Bayesian ideas such as weakly informative priors, and the re-emergence of research on the systematic properties of inferences based on different classes of priors or regularization. Again, this all represents a big step beyond the traditional classification of distributions: in the robust or empirical Bayesian perspective, the relevant properties of a prior distribution depend crucially on the data model to which it is linked.

So, over 25 years after taking Carl’s class, I’m continuing to see the centrality of his work to modern statistics: ideas from the early 1960s that were in many ways ahead of their time.

Let me conclude with the observation that Carl seemed to us to be a “man out of time” on the personal level as well. In 1989 he seemed ageless to us both physically and in his personal qualities, and indeed I still view him that way. When he came to Harvard he was not young (I suppose he was about the same age as I am now!) but he had, as the saying goes, the enthusiasm of youth, which indeed continues to stay with him. At the same time, he has always been even-tempered, and I expect that, in his youth, people remarked upon his maturity. It has been nearly fifty years since Carl completed his education, and his ideas remain fresh, and I continue to enjoy his warmth, humor, and insights.

The post What’s the most important thing in statistics that’s not in the textbooks? appeared first on Statistical Modeling, Causal Inference, and Social Science.

As I wrote a couple years ago:

Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model variation, but it’s not necessary. Nor is it necessary to have “true” randomness (of the dice-throwing or urn-sampling variety) in order to have a useful probability model.

For my money, the #1 neglected topic in statistics is **measurement**.

In most statistics texts that I’ve seen, there’s a lot on data analysis and some stuff on data collection—sampling, random assignment, and so forth—but nothing at all on measurement. Nothing on reliability and validity but, even more than that, nothing on the *concept* of measurement, the idea of considering the connection between the data you gather and the underlying object of your study.

It’s funny: the data model (the “likelihood”) is central to much of the theory and practice of statistics, but the steps that are required to make this work—the steps of measurement and assessment of measurements—are hidden.

When it comes to the question of how to take a sample or how to randomize, or the issues that arise (nonresponse, spillovers, selection, etc.) that interfere with the model, statistics textbooks take the practical issues seriously—even an intro statistics book will discuss topics such as blinding in experiments and self-selection in surveys. But when it comes to measurement, there’s silence, just an implicit assumption that the measurement is what it is, that it’s valid and that it’s as reliable as it needs to be.

**Bad things happen when we don’t think seriously about measurement**

And then what happens? Bad, bad things.

In education—even statistics education—we don’t go to the trouble of accurately measuring what students learn. Why? Part of it is surely that measurement takes effort, and we have other demands on our time. But it’s more than that. I think a large part is that we don’t carefully think about evaluation as a measurement issue and we’re not clear on what we want students to learn and how we can measure this. Sure, we have vague ideas, but nothing precise. In other aspects of statistics we aim for precision, but when it comes to measurement, we turn off our statistics brain. And I think this is happening, in part, because the topic of measurement is tucked away in an obscure corner of statistics and is then forgotten.

And in research too, we see big problems. Consider all those “power = .06” experiments, these “Psychological Science”-style papers we’ve been talking so much about in recent years. A common thread in these studies is sloppy, noisy, biased measurement. Just a lack of seriousness about measurement and, in particular, a resistance to the sort of within-subject designs which much more directly measure the within-person variation that is often of interest in such studies.
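Here’s a little simulation of what noisy measurement does to a real effect (the numbers are my own, invented for illustration, not taken from any of the studies discussed): a genuine underlying relationship is there, but a sloppy measure attenuates the observed correlation toward zero.

```python
import random

random.seed(1)

# A true trait x drives an outcome y, but we only observe x with
# measurement error. The observed correlation is attenuated by the
# square root of the measure's reliability.
n = 100_000
sd_noise = 2.0  # measurement noise twice the trait's own sd

x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]             # true relation
x_obs = [xi + random.gauss(0, sd_noise) for xi in x]  # noisy measure

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

print(f"with the true trait: {corr(x, y):.2f}")       # near 0.71
print(f"with the noisy measure: {corr(x_obs, y):.2f}")  # near 0.32
```

Same underlying effect, less than half the observed correlation: that’s what a bad measure buys you before any selection or interpretation biases even enter.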

Measurement, measurement, measurement. It’s central to statistics. It’s central to how we learn about the world.

The post What’s the most important thing in statistics that’s not in the textbooks? appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Eccentric mathematician appeared first on Statistical Modeling, Causal Inference, and Social Science.

What I liked about Wilkinson’s article is how it captured Zhang’s eccentricities with affection but without condescension. Zhang is not like the rest of us, but from reading the article, I get the sense of him as an individual, not defined by his mathematical abilities.

At one level, sure, duh: each of us is an individual. I’m an unusual person myself so maybe it’s a bit rich for me to put the “eccentric” label on some mathematician I’ve never met.

But I think there’s more to it than that. For one thing, I think the usual way to frame an article about someone like this is to present him as a one-of-a-kind genius, to share stories about how brilliant he is. Here, though, you get the idea that Zhang is a top mathematician but not that he has some otherworldly brilliance. Similarly, he solved a really tough problem but we don’t have to hear all about how he’s the greatest of all time. Rather, I get the idea from Wilkinson that Zhang’s life is worth living even if he hadn’t done this great work. Of course, without that, the idea for the article never would’ve come up in the first place, but still.

Here’s a paragraph. I don’t know if it conveys the feeling I’m trying to share but here goes:

Zhang met his wife, to whom he has been married for twelve years, at a Chinese restaurant on Long Island, where she was a waitress. Her name is Yaling, but she calls herself Helen. A friend who knew them both took Zhang to the restaurant and pointed her out. “He asked, ‘What do you think of this girl?'” Zhang said. Meanwhile, she was considering him. To court her, Zhang went to New York every weekend for several months. The following summer, she came to New Hampshire. She didn’t like the winters, though, and moved to California, where she works at a beauty salon. She and Zhang have a house in San Jose, and he spends school vacations there.

So gentle, both on the part of Zhang and of Wilkinson. New Yorker, E. B. White-style, and I mean that in a good way here. It could’ve come straight out of Charlotte’s Web. And it’s such a relief to read after all the Erdos-Feynman-style hype, not to mention all the latest crap about tech zillionaires. I just wish I could’ve met Stanislaw Ulam.


The post On deck this week appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Tues:** What’s the most important thing in statistics that’s not in the textbooks?

**Wed:** Carl Morris: Man Out of Time [reflections on empirical Bayes]

**Thurs:** “The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong.”

**Fri:** Good, mediocre, and bad p-values

**Sat:** Which of these classes should he take?

**Sun:** Forget about pdf: this looks much better, it makes all my own papers look like kids’ crayon drawings by comparison.


The post This year’s Atlantic Causal Inference Conference: 20-21 May appeared first on Statistical Modeling, Causal Inference, and Social Science.

The conference will take place May 20-21 (with a short course on May 19th) and the web site for the conference is here. The deadline for submitting a poster title for the poster session is this Friday. Junior researchers (graduate students, postdoctoral fellows, and assistant professors) whose poster demonstrates exceptional research will also be considered for the Thomas R. Ten Have Award, which recognizes “exceptionally creative or skillful research on causal inference.” The two award winners will be invited to speak at the 2016 Atlantic Causal Inference Conference.

We held the first conference in this series ten years ago at Columbia, and I’m glad to see it’s still doing well.


The post Statistical analysis on a dataset that consists of a population appeared first on Statistical Modeling, Causal Inference, and Social Science.

Donna Towns writes:

I am wondering if you could help me solve an ongoing debate?

My colleagues and I are discussing (disagreeing) on the ability of a researcher to analyze information on a population. My colleagues are sure that a researcher is unable to perform statistical analysis on a dataset that consists of a population, whereas I believe that statistical analysis is appropriate if you are testing future outcomes. For example, a group of inmates in a detention centre receive a new program. As it would contravene ethics, all offenders receive the program. Therefore, a researcher would need to compare a group of inmates prior to the introduction of the program. Assuming, or after confirming, that these two populations are similar, are we able to apply statistical analysis to compare the outcomes of these two populations (such as time to return to detention)? If so, what would be the methodologies used? Do you happen to know of any articles that discuss this issue?

I replied with a link to this post from 2009, which concludes:

If you set up a model including a probability distribution for these unobserved outcomes, standard errors will emerge.
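To make that concrete, here’s a toy version of the inmates example (every number below is invented for illustration): treat each cohort’s outcomes as draws from an underlying process, and a standard error for the before/after comparison emerges directly from the model, even though each cohort is a complete “population.”

```python
import math

# Hypothetical months-to-return for every inmate in each cohort.
# We observe the whole population, but we model each outcome as a
# draw from an underlying superpopulation process.
before = [14.2, 9.8, 22.1, 7.4, 18.0, 11.3, 25.6, 8.9]
after  = [16.5, 12.0, 28.3, 10.1, 21.7, 15.2, 30.0, 13.4]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):  # sample variance
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Difference in means and its model-based standard error
diff = mean(after) - mean(before)
se = math.sqrt(var(before) / len(before) + var(after) / len(after))
print(f"difference {diff:.2f} months, model-based se {se:.2f}")
```

The standard error here is not quantifying sampling from a larger finite group; it quantifies uncertainty about the process that generated these outcomes, which is what you need if the question is about future cohorts.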


The post Statistical significance, practical significance, and interactions appeared first on Statistical Modeling, Causal Inference, and Social Science.

I thought about this today (OK, a couple months ago, what with our delay) when reading a post by Dan Kopf on the exaggeration of small truths. Or, to put it another way, statistically significant but not practically significant.

I’ll give you Kopf’s story and then explain how everything falls into place when we think about interactions.

Here’s Kopf:

Immediately after an album is released, the critics descend. The period from the first to last major review for an album typically falls between 1 and 6 months. As time passes, the average review gets slightly worse. Assuming my methodology is appropriate and the data is accurate and representative, this is very likely a statistical truth.

But is this interesting? . . .

My result about album reviews worsening over the review periods is “statistically significant.” The p-value is so small it risks vanishing. My initial response to the finding was excitement and to begin armchair psychologizing on what could be causing this. I even wrote an extensive article on my speculations.

But I [Kopf] was haunted by this image:

Each point is a review. Red ones are above average for that album, and green ones below. . . . With so many data points, it can be difficult for the human eye to determine correlation, but your eyes don’t deceive you. There is not much going on here. Only 1% of the variation of an album’s rating is explained by knowing when in the order of reviews an album fell. . . .

Is 1% worth considering? It depends on the subject matter, but in this case, it’s probably not. When you combine the large sample sizes that come with big data and the speed of modern computing, it is relatively easy to find patterns in data that are statistically significant . . . . But many of these patterns will be uninteresting and/or meaningless for decision making. . . .

But here’s the deal. What does it mean for the pattern to be tiny but still statistically significant? There are lots of albums that get reviewed. Each set of reviews has a time trend. Some trends go up, some go down. Is the *average* trend positive or negative? Who cares? The average trend is a mixture of + and – trends, and whether the avg is + or – for any given year depends on the population of albums for that year.

So I think the answer is the secret weapon (or, to do it more efficiently, a hierarchical model). Slice the data a bunch of ways. If the trend is negative for every album, or for 90% of albums, then this is notable, if puzzling: how exactly would that be, that the trend is almost always negative, but the aggregate pattern is so weak?

More likely, the trend is positive for some, negative for others, and you could try to understand that variation.
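Here’s a sketch of what that slicing might look like, with simulated data standing in for the album reviews (the trend sizes are made up, not estimated from Kopf’s data): fit a least-squares slope for each album and look at the whole distribution of slopes, rather than one pooled average.

```python
import random

random.seed(0)

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# 500 hypothetical albums: a small average decline in review scores
# over the review order, with much larger album-to-album variation.
slopes = []
for _ in range(500):
    trend = random.gauss(-0.02, 0.10)
    xs = list(range(10))                         # 10 reviews in order
    ys = [trend * x + random.gauss(0, 1) for x in xs]
    slopes.append(slope(xs, ys))

neg = sum(s < 0 for s in slopes) / len(slopes)
print(f"average slope {sum(slopes)/len(slopes):.3f}, {neg:.0%} negative")
```

The pooled average slope is slightly negative (and, with enough albums, “significantly” so), but barely more than half the individual albums trend downward: the interesting object is the variation, not the single parameter.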

The key is to escape from the trap of trying to estimate a single parameter. Also to point out the near-meaninglessness of statistical significance in the context of varying patterns.


The post Political Attitudes in Social Environments appeared first on Statistical Modeling, Causal Inference, and Social Science.

Their article is scheduled to be published in Behavioral and Brain Sciences with several discussions, including one by Neil Gross and myself.

Here’s our abstract:

We agree with the authors that it is worthwhile to study professions’ political alignments. But we have seen no evidence to support the idea that social science fields with more politically diverse workforces generally produce better research. We also think that when considering ideological balance, it is useful to place social psychology within a larger context of the prevailing ideologies of other influential groups within society, such as military officers, journalists, and business executives.

And here’s the rest of our discussion:

Although we appreciate several things about the Duarte et al. essay, “Political Diversity Will Improve Social Psychological Science,” including its insistence that social scientists should work to minimize the impact of their political views on their research and its sensitivity to political threats to social science funding, we find their central argument unpersuasive. Contrary to the assertion of the authors, we have seen no evidence that social science fields with more politically diverse workforces have higher evidentiary standards, are better able to avoid replication failures, or generally produce better research. As there are no standardized ways to measure these outcomes at the disciplinary or subdisciplinary level, and as reliable data on researcher politics at the disciplinary and subdisciplinary level are scarce, there have never been—to our knowledge—any systematic attempts to examine the relationship between epistemic quality and variation in the political composition of the social-scientific community. The authors are thus calling for major changes in policy and practice based on sheer speculation. The authors cite some evidence of the benefits of “viewpoint diversity” in collaboration, but there is a scale mismatch between these studies (of small groups) and the field-level generalizations the authors make. In point of fact, research on the history and sociology of social science suggests that scientific/intellectual movements that bundle together political commitments and programs for research—movements of the sort the authors believe to have weakened social and personality psychology—have arisen under a wide range of political conditions, as have countermovements calling for greater objectivity. Until we know more about these and related dynamics, it would be premature to tinker with organizational machineries for knowledge production in the social sciences, however much one may worry, alongside the authors, about certain current trends.

In addition we think it is helpful to consider the Duarte et al. argument in a broader context by considering other fields that lean strongly to the left or to the right. The cleanest analogy, perhaps, is between college professors (who are disproportionately liberal Democrats) and military officers (mostly conservative Republicans; see the research of political scientist Jason Dempsey, 2009). In both cases there seems to be a strong connection between the environment and the ideology. Universities have (with some notable exceptions) been centers of political radicalism for centuries, just as the military has long been a conservative institution in most places (again, with some exceptions). And this is true even though many university professors are well-paid, live well, and send their children to private schools, and even though the U.S. military has been described as one of the few remaining bastions of socialism in the 21st century. Another example of a liberal-leaning profession is journalism (with its frequently-cited dictum to “comfort the afflicted and afflict the comfortable,” and again the relative liberalism of that profession has been confirmed by polls of journalists, for example Weaver et al., 2003), while business executives represent an important, and influential, conservative group in American society. There has been some movement to balance out the liberal bias of journalism in the United States, but it is not clear what would be done to balance political representation among military officers or corporate executives.

In short, we applaud the work of Duarte et al. in exploring the statistics and implications of political attitudes among social researchers. The psychology profession is, like the military, an all-volunteer force, and it is not clear to us that the purported benefits of righting the ideological balance among social psychologists (or among military officers, or corporate executives) are worth the efforts that would be involved in such endeavors. But these sorts of ideological what-ifs make interesting thought experiments.

Regular readers of this blog will know that I have problems with a lot of the social psychology research that gets published and publicized. And I certainly feel that political conservatives should feel free to contribute to this field. It’s not at all clear to me that a change in the mix of political attitudes among psychology researchers has much to do, one way or another, with scientific reform in this area. But it’s a question worth raising, just as it’s worth raising in the context of journalism, business, the military, and other institutions within our society.


The post A message from the vice chairman of surgery at Columbia University: “Garcinia Camboja. It may be the simple solution you’ve been looking for to bust your body fat for good.” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Should Columbia University fire this guy just cos he says things like this:

“You may think magic is make believe but this little bean has scientists saying they’ve found the magic weight loss cure for every body type—it’s green coffee extract.”

“I’ve got the No. 1 miracle in a bottle to burn your fat. It’s raspberry ketones.”

“Garcinia Camboja. It may be the simple solution you’ve been looking for to bust your body fat for good.”

Probably not. Exaggerating or even lying, trading on your university affiliation, I don’t think that’s a firing offense. Even the possibly “outrageous conflicts of interest,” maybe there’s no hard evidence there. And it might be that in the classes he sticks to the more standard material, or labels his speculations as such.

Or maybe they should just reduce his salary and give him a very tiny office in a faraway building, and schedule his classes for Sundays at 3 in the morning? I have no idea.

Having this sort of joker on the faculty is embarrassing for Columbia, sure, but firing or even reprimanding him could be even worse. After all, where do you draw the line? Should faculty be canned for plagiarizing, or for making up interviews in ethnographic studies, or for expressing noxious political or legal opinions, or for refusing to retract or correct the errors in their published work?

Probably Columbia has to just take the reputational hit, which means they have to continue seeing this sort of thing in the press:

Astoundingly, Dr. Oz is the vice chairman and professor of surgery at Columbia University College of Physicians and Surgeons.

Astoundingly, indeed.

Just like Cornell with Daryl Bem: it’s all an embarrassment, but Bem’s Cornell affiliation is a currency of diminishing value. When his study first got publicity, Bem benefited from the Ivy League affiliation, but now his work is evaluated on its own terms.

Dr. Oz is different, maybe, because he remains in the news. If Columbia does decide they want to get rid of the guy, I don’t think they’d fire him. They’d just make his working conditions worse and worse until he quits of his own accord.

Or maybe Columbia will go on the offensive and fight for the Vice-Chairman’s right to party—ketones style!

But not just any ketones. It’s gotta be *raspberry* ketones.

Hey, I eat celery almost every day but I don’t go all TV about it.

**P.S.** I’m thinking we should add Oz to the scripts for “Second Chance U” and “The New Dirty Dozen”. And, hey, graphics designers: I’d still like some movie posters for these!


The post Instead of worrying about multiple hypothesis correction, just fit a hierarchical model. appeared first on Statistical Modeling, Causal Inference, and Social Science.

Pejman Mohammadi writes:

I’m concerned with a problem in multiple hypothesis correction and, despite having read your article [with Jennifer and Masanao] on not being concerned about it, I was hoping I could seek your advice.

Specifically, I’m interested in the multiple hypothesis testing problem in cases when the test is done with a discrete finite distribution. For example, when doing many tests using a binomial distribution. This is an important problem as it appears in more and more places in bioinformatics nowadays, such as differential gene expression testing, allele-specific expression testing, and pathway enrichment analysis.

What seems to be clear is that the current correction methods are too conservative for such tests, and it’s also straightforward to show that such finite test distributions produce fewer false positives than one would expect from the null distribution. My understanding is that there’s not a clear way to correct for multiple hypotheses in this type of situation. I was wondering if I could have your advice on the issue.

My response:

Instead of picking one comparison and doing a multiple comparisons correction, I suggest you fit a hierarchical model including all comparisons; then there will be no need for such corrections.

Mohammadi followed up:

I’m not sure if making a hierarchical model would be a possibility for all the cases, and anyways most of these methods are done in a frequentist way. At the moment I work around it by correcting for unique tests only but that seems not necessarily a good idea.

To which I replied:

“Frequentist” is a word for the way in which inferences are evaluated. It is fine to do a hierarchical model from a frequentist perspective.
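To illustrate the hierarchical-model alternative, here’s a toy partial-pooling computation (my sketch, with simulated numbers, not anything from Mohammadi’s bioinformatics setting): instead of adjusting p-value thresholds, shrink each noisy comparison toward the grand mean by an amount that depends on its standard error.

```python
import random

random.seed(2)

# 50 hypothetical comparisons: small true effects, large standard errors.
J = 50
truth = [random.gauss(0, 0.5) for _ in range(J)]   # true effects
se = [random.uniform(0.5, 1.5) for _ in range(J)]  # per-comparison s.e.
est = [t + random.gauss(0, s) for t, s in zip(truth, se)]

# Crude empirical-Bayes step: grand mean and a moment estimate of the
# between-comparison variance tau2 (floored to stay positive).
mu = sum(est) / J
tau2 = max(sum((y - mu) ** 2 for y in est) / (J - 1)
           - sum(s * s for s in se) / J, 0.01)

# Partial pooling: noisier estimates get shrunk harder toward mu.
pooled = [mu + (tau2 / (tau2 + s * s)) * (y - mu)
          for y, s in zip(est, se)]

raw_err = sum((y - t) ** 2 for y, t in zip(est, truth)) / J
pool_err = sum((p - t) ** 2 for p, t in zip(pooled, truth)) / J
print(f"mse raw {raw_err:.2f}, mse partially pooled {pool_err:.2f}")
```

The pooled estimates beat the raw ones on mean squared error, and the shrinkage does automatically what a multiplicity correction does by hand: it keeps you from taking the most extreme of many noisy comparisons at face value.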


The post The feather, the bathroom scale, and the kangaroo appeared first on Statistical Modeling, Causal Inference, and Social Science.

Here’s something I wrote in the context of one of those “power = .06” studies:

My criticism of the ovulation-and-voting study is ultimately quantitative. Their effect size is tiny and their measurement error is huge.

My best analogy is that they are trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down.

At some point, a set of measurements is so noisy that biases in selection and interpretation overwhelm any signal and, indeed, nothing useful can be learned from them. I assume that the underlying effect size in this case is not zero—if we were to look carefully, we would find some differences in political attitude at different times of the month for women, also different days of the week for men and for women, and different hours of the day, and I expect all these differences would interact with everything—not just marital status but also age, education, political attitudes, number of children, size of tax bill, etc etc. There’s an endless number of small effects, positive and negative, bubbling around.

I like the weighing-a-feather-while-the-kangaroo-is-jumping analogy. It includes measurement accuracy and also the idea that there are huge biases that are larger than the size of the main effect.
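A quick simulation shows why the feather-and-kangaroo situation is so hopeless (the effect size and noise level here are invented, chosen to give power in the neighborhood of .05–.06): conditional on getting a statistically significant result, the estimate is enormously exaggerated and quite often has the wrong sign.

```python
import random

random.seed(3)

# The feather and the kangaroo: a tiny true effect, huge noise.
true_effect, se = 0.1, 1.0
sims = 100_000

# Keep only the "statistically significant" estimates (|est| > 1.96 se).
signif = [est for _ in range(sims)
          if abs(est := random.gauss(true_effect, se)) > 1.96 * se]

power = len(signif) / sims
exaggeration = sum(abs(e) for e in signif) / len(signif) / true_effect
wrong_sign = sum(e < 0 for e in signif) / len(signif)
print(f"power ~ {power:.2f}, avg exaggeration {exaggeration:.0f}x, "
      f"wrong sign {wrong_sign:.0%}")
```

That’s the kangaroo in numbers: the rare “significant” findings are off by an order of magnitude, and a substantial fraction point the wrong way entirely.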


The post Online predictions from ipredict appeared first on Statistical Modeling, Causal Inference, and Social Science.

It’s based in New Zealand so you can bet on wacky propositions such as, “David Carter to be next High Commissioner from New Zealand to the United Kingdom.” They also have political events in the U.S. and other countries and science items such as, “NASA to announce the discovery of extraterrestrial life before 1 Jan 2017.” Whaaa….? They give this one a 9% chance of happening. Could that really be? Hmmm, maybe I should bet a few thousand on that. But I guess that’s the point, that the payoff would be low: betting $10,000 to win $1,000 isn’t so exciting. Also I’m guessing the market is not so liquid, so I probably couldn’t get much of a bet on this one in any case.
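For the record, the arithmetic behind that “betting $10,000 to win $1,000” remark, assuming the usual prediction-market convention that each winning share pays out $1:

```python
# If the market prices "yes" at 9%, a "no" share costs $0.91 and
# pays $1.00 if no discovery is announced by the deadline.
price_yes = 0.09
stake = 10_000                       # dollars wagered on "no"
shares = stake / (1 - price_yes)     # number of "no" shares bought
profit_if_right = shares - stake     # payout minus stake
print(f"win about ${profit_if_right:.0f} on a ${stake} stake")
```

About $990 in profit on a $10,000 stake, which is where the unexciting 10-to-1 risk/reward comes from.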


The post New Book: Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan appeared first on Statistical Modeling, Causal Inference, and Social Science.

Fränzi and Tobias’s book is now real:

Fränzi Korner-Nievergelt, Tobias Roth, Stefanie von Felten, Jérôme Guélat, Bettina Almasi, and Pius Korner-Nievergelt (2015) *Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan*. Academic Press.

This is based in part on the in-person tutorials that they and the other authors have been giving on statistical modeling for ecology.

The book starts at the beginning with an introduction to R, regression and ANOVA, discusses maximum likelihood estimation, then generalized linear models including “mixed effects” models, and then proceeds to Bayesian modeling with MCMC computation for inference, and winds up with some case studies involving BUGS and Stan. Everything works up from simple “hello world” type programs through real examples, which I really appreciate myself in computational examples.

Stan’s primarily showcased in three fully worked out examples (which I also really appreciate as a reader), all of which appear in Chapter 14, “Advanced Ecological Models”:

(14.2) zero-inflated Poisson mixed model for analyzing breeding success,

(14.3) occupancy model to measure species distribution, and

(14.5) analyzing survival based on mark-recapture data.
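As a taste of the first of these models, here’s a bare-bones zero-inflated Poisson likelihood with a crude grid search for the maximum (a Python sketch of mine with invented counts, not the book’s R/BUGS/Stan code): each count is zero with probability pi (say, total nest failure), and otherwise Poisson(lam).

```python
import math

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson model."""
    ll = 0.0
    for y in counts:
        if y == 0:
            # structural zero, or a Poisson draw that happened to be zero
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += (math.log(1 - pi) - lam + y * math.log(lam)
                   - math.lgamma(y + 1))
    return ll

# Made-up fledgling counts: half the broods failed completely.
counts = [0, 0, 0, 2, 3, 0, 4, 1, 0, 3, 2, 0]

# Crude grid search over (pi, lam) for the maximum-likelihood point.
best = max(((p / 20, l / 4) for p in range(1, 20) for l in range(1, 40)),
           key=lambda pl: zip_loglik(counts, pl[0], pl[1]))
print("approx MLE (pi, lam):", best)
```

A plain Poisson fit to these counts would badly mispredict the share of zeros; splitting the zeros into a mixture is the whole point of the model, and in the book this is done properly with Stan rather than a grid.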


The post On deck this week appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Tues:** The feather, the bathroom scale, and the kangaroo

**Wed:** Instead of worrying about multiple hypothesis correction, just fit a hierarchical model.

**Thurs:** Political Attitudes in Social Environments

**Fri:** Statistical significance, practical significance, and interactions

**Sat:** Statistical analysis on a dataset that consists of a population

**Sun:** An amusing window into folk genetics

The post Go to PredictWise for forecast probabilities of events in the news appeared first on Statistical Modeling, Causal Inference, and Social Science.

C’mon. “27.4%”? Who are you kidding?? (See here for explication of this point.)


The post Diederik Stapel in the news, *again* appeared first on Statistical Modeling, Causal Inference, and Social Science.

OK, that’s just ethnic profiling of Dutch people. I think they’re just gonna put the whole country on lockdown.


The post How do data and experiments fit into a scientific research program? appeared first on Statistical Modeling, Causal Inference, and Social Science.

Anyway, my point here is not to get into another argument with Satoshi Kanazawa or Daryl Bem or whoever. They’re doing their research, I’m doing mine, and at this point I don’t think they’re planning to change their methods.

Instead, accept for a moment my premise that these research programs, as implemented, are dead ends. Accept my premise that these researchers are chasing noise, that they’re in the position of the “50 shades of gray” guys but without the self-awareness. They think they’re doing research and making discoveries but they’re just moving in circles.

OK, fine, but then the question arises: what *is* the role of data and experimental results in these research programs?

Here’s what I think. First, when it comes to the individual research articles, I think the data add nothing, indeed the data can even be a minus if they lead other researchers to conclude that a certain pattern holds in the general population.

From this perspective, if these publications have value, it’s *in spite of*, not because of, their data. If the theory is valuable (and it could be), then it could (and, I think, should) stand alone. It would be good if the theory also came with quantitative predictions that were consistent with the rest of available scientific understanding, which would in turn motivate a clearer understanding of what can be learned from noisy data in such situations—but let’s set that aside, let’s accept that these people are working within their own research paradigm.

So what is that paradigm? By which I mean, not What is their paradigm of evolutionary psychology or paranormal perception or whatever, but What is their paradigm of how research proceeds? How will their careers end up, and how will these strands of research go forward?

I think (but certainly am not sure) that these scientists think of themselves as operating in Popperian fashion, coming up with scientific theories that imply testable predictions, then designing measurements and experiments to test their hypotheses, rejecting when “p less than .05” and moving forward. Or, to put it slightly more loosely, they believe they are establishing stylized facts, little islands of truth in our sea of ignorance, and jumping from island to island, building a pontoon bridge of knowledge . . . ummmm, you get the picture. The point is that, from their point of view, they’re doing classic science. I don’t think this is what’s happening, though, for reasons I discussed here a few months ago.

But, if these researchers are not following the Karl Popper playbook, what *are* they doing?

A harsh view, given all I’ve written above, is that they’re just playing in a sandbox with no connection to science or the real world.

But I don’t take this harsh view. I accept that theorizing is an important part of science, and I accept that the theorizing of Daryl Bem, or Sigmund Freud, or the himmicanes and hurricanes people, or the embodied cognition researchers, etc etc etc., is science, even if these researchers do not have a realistic sense of the sort of measurement accuracy it would take to test and evaluate these theories.

Now we’re getting somewhere. What I think is that anecdotes, or case studies, or even data so noisy as to be essentially random numbers, can be a helpful stimulus, in that they can motivate some theorizing.

Take, for example, that himmicanes and hurricanes study. The data analysis was a joke (no more so than a lot of other published data analyses, of course), and the authors of the paper made a big mistake to double down on their claims rather than accepting the helpful criticism from outside—but maybe there’s something to their idea that the name of a weather event affects how people react to it. It’s quite possible that, if there is such an effect, it goes in the opposite direction from what was claimed in that notorious article—but the point is that their statistical analyses may have jogged them into an interesting theory.

It’s the same way, I suppose, that Freud came up with and refined his theories of human nature, based on his contacts with individual patients. In this case, researchers are looking at individual datasets, but it’s the same general idea.

Anyway, here’s my point. To the extent that research of Bem, or Kanazawa, or the ovulation-and-voting people, or the himmicanes-and-hurricanes people, or whatever, has value, I think the value comes from the theories, not from the data and certainly not from whatever happens to show up as statistically significant in some power=.06 study. And, once we recognize that the value comes in the theories, it suggests that the role of the data is to throw up random numbers that will tickle the imagination of theorists. Even if they don’t realize that’s what they’re doing.

Sociologist Jeremy Freese came up with the term Columbian Inquiry to describe scientists’ search for confirmation of a vague research hypothesis: “Like brave sailors, researchers simply just point their ships at the horizon with a vague hypothesis that there’s eventually land, and perhaps they’ll have the rations and luck to get there, or perhaps not. Of course, after a long time at sea with no land in sight, sailors start to get desperate, but there’s nothing they can do. Researchers, on the other hand, have a lot more longitude—I mean, latitude—to terraform new land—I mean, publishable results—out of data . . .”

What I’ve attempted to do in the above post is, accepting that a lot of scientists do proceed via Columbian Inquiry, try to understand where this leads. What happens if you spend a 40-year scientific career using low-power studies to find support for, and modify, vague research hypotheses? What will happen is that you’ll move in a sort of directed random walk, finding one thing after another, one interaction after another (recall that we’ve looked at studies that find interactions with respect to relationship status, or weather, or parents’ socioeconomic status—but never in the same paper), but continuing to stay in the main current of your subfield. There will be a sense of progress, and maybe real progress (to the extent that the theories lead to useful insights that extend outside your subfield), even if the data aren’t quite playing the role that you think they are.

For example, Satoshi Kanazawa, despite what he might think, is not discovering anything about variation in the proportion of girl births. But, by spending years thinking of explanations for the patterns in his noisy data, he’s coming up with theory after theory, and this all fits into his big-picture understanding of human nature. Sure, he could do all this without ever seeing data at all—indeed, the data are, in reality, so noisy as to have no bearing on his theorizing—but the theories could still be valuable.

**P.S.** I’m making no grand claims for my own research. Much of my political science work falls in a slightly different tradition in which we attempt to identify and resolve “puzzles” or stylized facts that do not fit the current understanding. We do have some theories, I guess—Gary and I talked about “enlightened preferences” in our 1993 paper—but we’re a bit closer to the ground. Also we tend to study large effects with large datasets so I’m not so worried that we’re chasing noise.

The post How do data and experiments fit into a scientific research program? appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Gigerenzer on logical rationality vs. ecological rationality appeared first on Statistical Modeling, Causal Inference, and Social Science.

The “half-empty” versus “half-full” explanation of the differences between Kahneman and us misses the essential point: the difference is about the nature of the glass of rationality, not the level of the water. For Kahneman, rationality is logical rationality, defined as some content-free law of logic or probability; for us, it is ecological rationality, loosely speaking, the match between a heuristic and its environment. For ecological rationality, taking into account contextual cues (the environment) is the very essence of rationality; for Kahneman it is a deviation from a logical norm and thus a deviation from rationality. In Kahneman’s philosophy, simple heuristics could never predict better than rational models; in our research we have shown systematic less-is-more effects.

Gigerenzer pointed to his paper with Henry Brighton, “Homo Heuristicus: Why Biased Minds Make Better Inferences,” and then he continued:

Please also note that Kahneman and his followers accept rational choice theory as the norm for behavior, and so does almost all of behavioral economics. They put the blame on people, not the model.

This makes sense, in particular the less-is-more idea seems like a good framing.

That said, I think some of the power of Kahneman and Tversky’s cognitive illusions, as with the visual illusions with which we are all familiar, is that there is often a shock of recognition when our intuitive, “heuristic” response is revealed, upon deeper reflection, to be incorrect.

To put it in Gigerenzer’s framework, our environment is constantly changing, and we spend much of our time in an environment that is much different than the savanna where our ancestors spent so many thousands of years.

From this perspective, rational choice is not an absolute baseline of correctness but in many ways it works well in our modern society which includes written records, liquid and storable money, and various other features for which rationality is well adapted.


The post Perhaps the most contextless email I’ve ever received appeared first on Statistical Modeling, Causal Inference, and Social Science.

Subject: Sample Stats Question

From: ** <**@gmail.com>

Hello,

I hope all is well and trust that you are having a great day so far. I hate to bother you but I have a stats question that I need help with: How can you tell which group has the best readers when they have the following information: Group A-130, 140, 170,170, 190, 200, 215, 225, 240, 250

Group B- 188, 189, 193, 193, 193, 194, 194, 195, 195, 196

Group A-mean (193), median (195), mode (170)

Group B- mean (193), median(193.5), mode (193)

Why?

This is for my own personal use and understanding of this subject matter so anything you could say and redirect me would be greatly appreciated.

Any feedback that you could give me to help understand this better would be greatly appreciated.

Thanks,


The post Item-response and ideal point models appeared first on Statistical Modeling, Causal Inference, and Social Science.

– Estimating the direction and the magnitude of the discrimination parameters.

– How to tell when your data don’t fit the model.

– When does ideal-point modeling make a difference? Comparing ideal-point estimates to simple averages of survey responses.
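As a toy version of that last comparison (the setup and numbers here are my own illustration, not from the class): when items discriminate in different directions, the raw average of responses washes out, while an average that accounts for the sign of the discrimination recovers the ideal points.

```python
import numpy as np

# Simulated 2PL-style responses: Pr(y_ij = 1) = logit^-1(a_j * (theta_i - b_j)).
rng = np.random.default_rng(0)
n_people, n_items = 500, 20
theta = rng.normal(size=n_people)              # ideal points
a = np.repeat([1.0, -1.0], n_items // 2)       # half the items discriminate in reverse
b = rng.normal(size=n_items)                   # item difficulties

p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
y = rng.binomial(1, p)

raw_avg = y.mean(axis=1)                           # ignores discrimination signs
flipped = np.where(a > 0, y, 1 - y).mean(axis=1)   # sign-corrected average

corr_raw = np.corrcoef(theta, raw_avg)[0, 1]
corr_flipped = np.corrcoef(theta, flipped)[0, 1]
print(corr_raw, corr_flipped)  # raw correlation is weak; sign-corrected is strong
```

When all the discrimination parameters have the same sign and similar magnitudes, the simple average does about as well; the payoff from ideal-point modeling comes when they don’t.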

**P.S.** Unlike the previous post, this time I really *am* referring to the class we had this morning.


The post A message I just sent to my class appeared first on Statistical Modeling, Causal Inference, and Social Science.

And, sure, that’s true, I’ve seen lots of examples of bad studies that get tons of publicity. But that shouldn’t really be the #1 point you get from my class.

This is how I want you to think about today’s class:

Consider 3 different ways in which you will be using sample surveys:

1. Conducting your own survey;

2. Performing your own analysis of existing survey data;

3. Reading and interpreting a study that was performed by others.

The key statistical message of today’s lecture was that if the underlying comparison of interest in the population (what I was calling the “effect size,” but that is somewhat misleading, as we could be talking about purely descriptive comparisons with no direct causal interpretation) is small, and if measurements are poor (high bias, high variance, or both), then it can be essentially impossible to learn anything statistical from your data.

The point of the examples I discussed is not so much that they’re dumb, but that they are settings where the underlying difference or effect in the population is small, and where measurements are noisy, or biased, or both.

What does this imply for your own work? Consider the 3 scenarios listed above:

1. If you’re conducting your own survey: Be aware of what your goal is, what you’re trying to estimate. And put lots of effort into getting valid and reliable measurements. If you’re estimating a difference which in truth is tiny, or if your measurements are crap, you’re drawing dead (as they say in poker).

2. If you’re performing your own analysis of existing survey data: Same thing. Consider what you’re estimating and how well it’s being measured. Don’t fall into the trap of thinking that something that’s statistically significant is likely to accurately represent a truth in the general population.

3. If you’re reading and interpreting a study that was performed by others: Same thing. Even if the claim does not seem foolish, think about the size of the underlying comparison or effect and how accurately it’s being estimated.

To put it another way, one thing I’m pushing against is the attitude that statistical significance is a “win.” From that perspective, it’s ok to do a noisy study of a small effect if the cost is low, because you might get lucky and get that “p less than .05.” But that is a bad attitude, because if you’re really studying a small effect with a noisy measurement, anything that happens to be statistically significant could well be in the wrong direction and is certain to be an overestimate. In the long run, finding something statistically significant in this way is not a win at all, it’s a loss in that it can waste your time and other researchers’ time.
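That claim about sign and magnitude errors is easy to check by simulation. Here’s a sketch with illustrative numbers (a true effect 0.3 standard errors in size, which gives power close to .06 in a two-sided test):

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect, se, n_sims = 0.3, 1.0, 100_000

# Each simulated study reports a noisy estimate of the same small true effect.
estimates = rng.normal(true_effect, se, n_sims)
significant = np.abs(estimates) > 1.96 * se      # the "p less than .05" filter

power = significant.mean()
exaggeration = np.abs(estimates[significant]).mean() / true_effect
wrong_sign = (estimates[significant] < 0).mean()

print(power, exaggeration, wrong_sign)
# power is about .06; the significant estimates are roughly 8x too large on
# average, and roughly 20% of them have the wrong sign
```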

This is all some serious stuff to think about in a methods class, but it’s important to think a bit about the endgame.

**P.S.** (in case this is confusing anyone who was in class today): I wrote the above message a couple months ago. Most of the posts on this blog are on delay.


The post “For better or for worse, academics are fascinated by academic rankings . . .” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Here’s what I wrote:

For better or for worse, academics are fascinated by academic rankings, perhaps because most of us reached our present positions through a series of tournaments, starting with course grades and standardized tests and moving through struggles for the limited resource of publication space in top journals, peer-reviewed grant funding, and finally, the unpredictable process of citation and reputation. As statisticians we are acutely aware of the failings of each step of the process and we find ourselves torn between the desire to scrap the whole system, Arxiv-style, or to reform it as suggested in the present paper. In this article, Varin, Catelan, and Firth argue that quantitative assessment of scientific and scholarly publication is here to stay, so we might as well try to reduce the bias and variance of such assessments as much as possible.

As the above paragraph indicates, I have mixed feelings about this sort of effort and as a result I feel too paralyzed to offer any serious comments on the modeling. Instead I will offer some generic, but I hope still useful, graphics advice: Table 2 is essentially unreadable to me and is a (negative) demonstration of the principle that, just as we should not publish sentences that we do not want to be read, we also should avoid publishing numbers that will not be of any use to a reader. Does anyone care, for example, that AoS has exactly 1663 citations? This sort of table cries out to be replaced by a graph (which it should be possible to construct taking up no more space than the original table; see Gelman, Pasarica, and Dodhia, 2002). Figure 1 violates a fundamental principle of graphics by wasting one of its axes, following what Wainer (2001) has called the Alabama first ordering. Figure 2 has most of its words upside down, which is a result of an unfortunate choice to present a vertical display as horizontal, thus requiring me to rotate my computer 90 degrees to read it. Table 4 represents one of the more important outputs of the research being discussed, but it too is hard to read, requiring me to try to track different acronyms across the page. It would be so natural to display these results as a plot with one line per journal.

I will stop at this point and conclude by recognizing that these comments are trivial compared to the importance of the subject, but as noted above I was too torn by this topic to offer anything more.

And here are X’s reactions.


The post Why do we communicate probability calculations so poorly, even when we know how to do it better? appeared first on Statistical Modeling, Causal Inference, and Social Science.

I thought to do some reading in psychology on why Bayesian probability seems so counterintuitive, making it difficult for many to learn and apply. Indeed, that is the finding of considerable research in psychology. It turns out that it is counterintuitive because of the way it is presented, following no doubt the way the textbooks are written. The theorem is usually expressed first with probabilities instead of frequencies, or “natural numbers” – counts in the binomial case.

The literature is considerable, starting at least with a seminal piece by David Eddy (1982). “Probabilistic reasoning in clinical medicine: problems and opportunities,” in Judgment under Uncertainty: Heuristics and Biases, eds D. Kahneman, P. Slovic and A. Tversky. Also much cited are Gigerenzer and Hoffrage (1995) “How to improve Bayesian reasoning without instruction: frequency formats” Psychol. Rev, and also Cosmides and Tooby, “Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty”, Cognition, 1996.

This literature has amply demonstrated that people actually can readily and accurately reason in Bayesian terms if the data are presented in frequency form, but have difficulty if the data are given as percentages or probabilities. Cosmides and Tooby argue that this is so for evolutionary reasons, and their argument seems compelling.

So taking a look at my several texts (not a random sample of course), including Andrew’s well-written text, I wanted to know how many authors introduce the widely used Bayesian example of determining the posterior probability of breast cancer after a positive mammography in numerical frequency terms or counts first, before shifting to probabilities. None do, although some do provide an example in frequency terms later.

Assuming that my little convenience sample is somewhat representative, it raises the question of why the psychologists’ recommendations are not adopted.

This is a missed opportunity, as the psychological findings indicate that the frequency approach makes Bayesian logic instantly clear, making it easier to comprehend the theorem in probability terms.

Since those little medical inference problems are very compelling, it would make the lives of a lot of students a lot easier and increase acceptance of the approach. One can only imagine how much sooner the sometimes acrimonious debates between frequentists and Bayesians would have diminished if not ended. So there is a clear lesson here for instructors and textbook writers.

Here is an uncommonly clear presentation of the breast cancer example: http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/. And there are numerous comments from beginning statistics students noting this clarity.

My response:

I agree, and in a recent introductory course I prepared, I did what you recommend and started right away with frequencies, Gigerenzer-style.

Why has it taken us so long to do this? I dunno, force of habit, I guess? I am actually pretty proud of chapter 1 of BDA (especially in the 3rd edition with its new spell-checking example, but even all the way back to the 1st edition in 1995) in that we treat probability as a quantity that can be measured empirically, and we avoid what I see as the flaw of seeking a single foundational justification for probability. Probability is a mathematical model with many different applications, including frequencies, prediction, betting, etc. There’s no reason to think of any one of these applications as uniquely fundamental.

But, yeah, I agree it would be better to start with the frequency calculations: instead of “1% probability,” talk about 10 cases out of 1000, etc.
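For instance, here’s the mammography example computed both ways, using the stock numbers from that literature (1% prevalence, 80% sensitivity, 9.6% false-positive rate; treat these as illustrative):

```python
# Probability form: Bayes' theorem directly.
prevalence, sensitivity, false_pos_rate = 0.01, 0.80, 0.096
p_positive = prevalence * sensitivity + (1 - prevalence) * false_pos_rate
posterior = prevalence * sensitivity / p_positive

# Frequency form: the same arithmetic as counts out of 1000 women.
women = 1000
with_cancer = women * prevalence                      # 10 women have cancer
true_pos = with_cancer * sensitivity                  # 8 of them test positive
false_pos = (women - with_cancer) * false_pos_rate    # ~95 others also test positive
posterior_freq = true_pos / (true_pos + false_pos)

print(round(posterior, 3), round(posterior_freq, 3))  # both about 0.078
```

Same answer, but “8 of the roughly 103 positives actually have cancer” is immediately graspable in a way that the probability version is not.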

**P.S.** It’s funny that Goddard cited a paper by Cosmides and Tooby, as they’re coauthors on that notorious fat-arms-and-political-attitudes paper, a recent gem in the garden-of-forking-paths, power=.06 genre. Nobody’s perfect, I guess. In particular, it’s certainly possible for people to do good research on the teaching and understanding of statistics, even while being confused about some key statistical principles themselves. And even the legendary Kahneman has been known, on occasion, to overstate the strength of statistical evidence.


The post “Another bad chart for you to criticize” appeared first on Statistical Modeling, Causal Inference, and Social Science.

From a statistical perspective, this graph is a disaster in that the circular presentation destroys the two-way structure (countries x topics) which has to be central to any understanding of these data. In addition, to the extent that you’d want to get something out of the graph, you’ll end up having to perform mental divisions of line widths.

At this point I’d usually say something like: On the plus side, this is a thought-provoking display (given its tentacle-like appearance, one might even call it “grabby”) that draws viewers’ attention to the subject matter. But I can’t really even say *that*, because the subject of the graph—nationalities of Nobel Prize winners—is one of the more overexposed topics out there, and really the last thing we need is one more display of these numbers. Probably the only thing we need less of is further analysis of the Titanic survivors data. (Sorry, Bruno: 5 papers on that is enough!)


The post Another stylized fact bites the dust appeared first on Statistical Modeling, Causal Inference, and Social Science.

In a seminal paper, Camerer, Babcock, Loewenstein, and Thaler (1997) find that the wage elasticity of daily hours of work of New York City (NYC) taxi drivers is negative and conclude that their labor supply behavior is consistent with target earning (having reference dependent preferences). I replicate and extend the CBLT analysis using data from all trips taken in all taxi cabs in NYC for the five years from 2009-2013. The overall pattern in my data is clear: drivers tend to respond positively to unanticipated as well as anticipated increases in earnings opportunities. This is consistent with the neoclassical optimizing model of labor supply and does not support the reference dependent preferences model.

I explore heterogeneity across drivers in their labor supply elasticities and consider whether new drivers differ from more experienced drivers in their behavior. I find substantial heterogeneity across drivers in their elasticities, but the estimated elasticities are generally positive and only rarely substantially negative. I also find that new drivers with smaller elasticities are more likely to exit the industry while drivers who remain learn quickly to be better optimizers (have positive labor supply elasticities that grow with experience).

It’s good to get that one out of the way.


The post A silly little error, of the sort that I make every day appeared first on Statistical Modeling, Causal Inference, and Social Science.

Ummmm, running Stan, testing out a new method we have that applies EP-like ideas to perform inference with aggregate data—it’s really cool, I’ll post more on it once we’ve tried everything out and have a paper that’s in better shape—anyway, I’m starting with a normal example, a varying-intercept, varying-slope model where the intercepts have population mean 50 and sd 10, and the slopes have population mean -2 and sd 0.5 (for simplicity I’ve set up the model with intercepts and slopes independent), and the data standard deviation is 5. Fit the model in Stan (along with other stuff, the real action here’s in the generated quantities block but that’s a story for another day), here’s what we get:

                mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
    mu_a[1]    49.19    0.01 0.52 48.14 48.85 49.20 49.53 50.20  2000    1
    mu_a[2]    -2.03    0.00 0.11 -2.23 -2.10 -2.03 -1.96 -1.82  1060    1
    sigma_a[1]  2.64    0.02 0.50  1.70  2.31  2.62  2.96  3.73   927    1
    sigma_a[2]  0.67    0.00 0.08  0.52  0.61  0.66  0.72  0.85   890    1
    sigma_y     4.97    0.00 0.15  4.69  4.86  4.96  5.06  5.27  2000    1

We’re gonna clean up this output—all these quantities are ridiculous, also I’m starting to think we shouldn’t be foregrounding the mean and sd as these can be unstable; median and IQR would be better, maybe—but that’s another story too.

Here’s the point. I looked at the above output and noticed that the sigma_a parameters are off: the sd of the intercept is too low (it’s around 2 and it should be 10) and the sd of the slopes is too high (it’s around 0.6 and it should be 0.5). The correct values aren’t even in the 95% intervals.

OK, it could just be this one bad simulation, so I re-ran the code a few times. Same results. Not exactly, but the parameter for the intercepts was consistently underestimated and the parameter for the slopes was consistently overestimated.

What up? OK, I do have a flat prior on all these hypers, so this must be what’s going on: there’s something about the data where intercepts and slopes trade off, and somehow the flat prior allows inferences to go deep into some zone of parameter space where this is possible.

Interesting, maybe ultimately not too surprising. We do know that flat priors cause problems, and here we are again.

What to do? I’d like something weakly informative, this prior shouldn’t boss the inferences around but it should keep them away from bad places.

Hmmm . . . I like that analogy: the weakly informative prior (or, more generally, model) as a permissive but safe parent who lets the kids run around in the neighborhood but sets up a large potential-energy barrier to keep them away from the freeway.

Anyway, to return to our story . . . I needed to figure out what was going on. So I decided to start with a strong prior focused on the true parameter values. I just hard-coded it into the Stan program, setting normal priors for mu_a[1] and mu_a[2]. But then I realized, no, that’s not right, the problem is with sigma_a[1] and sigma_a[2]. Maybe put in lognormals?

And then it hit me: in my R simulation, I’d used sd rather than variance. Here’s the offending code:

a <- mvrnorm(J, mu_a, diag(sigma_a))

That should've been diag(sigma_a^2). Damn! Going from univariate to multivariate normal, the notation changed.
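The same trap, translated into Python for illustration: like mvrnorm, NumPy’s multivariate_normal expects a covariance matrix, so the diagonal needs the variances, not the sds.

```python
import numpy as np

rng = np.random.default_rng(0)
J = 10_000
mu_a = np.array([50.0, -2.0])
sigma_a = np.array([10.0, 0.5])   # group-level standard deviations

# Buggy: passing sds where a covariance matrix is expected
# (the analogue of mvrnorm(J, mu_a, diag(sigma_a))).
a_bad = rng.multivariate_normal(mu_a, np.diag(sigma_a), size=J)

# Correct: the covariance matrix carries the variances on its diagonal.
a_good = rng.multivariate_normal(mu_a, np.diag(sigma_a**2), size=J)

print(a_bad.std(axis=0))   # roughly sqrt(sigma_a) = [3.16, 0.71], not sigma_a
print(a_good.std(axis=0))  # roughly [10, 0.5]
```

The buggy simulation produces intercepts with sd sqrt(10) ≈ 3.2 and slopes with sd sqrt(0.5) ≈ 0.7, which is exactly the pattern in the first batch of Stan estimates: one sd too low, the other too high.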

On the plus side, there was nothing wrong with my Stan code. Here's what happens after I fixed the testing code in R:

                mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
    mu_a[1]    48.17    0.11 1.62 45.08 47.07 48.12 49.23 51.38   211 1.02
    mu_a[2]    -2.03    0.00 0.10 -2.22 -2.09 -2.02 -1.97 -1.82  1017 1.00
    sigma_a[1] 10.98    0.05 1.18  8.95 10.17 10.87 11.68 13.55   496 1.01
    sigma_a[2]  0.57    0.00 0.09  0.42  0.51  0.56  0.63  0.75   826 1.00
    sigma_y     5.06    0.00 0.15  4.78  4.95  5.05  5.16  5.35  2000 1.00

Fake-data checking. That's what it's all about.

<rant>

And that's why I get so angry at bottom-feeders like Richard Tol, David Brooks, Mark Hauser, Karl Weick, and the like. *Every damn day* I'm out here working, making mistakes, and tracking them down. I'm not complaining; I like my job. I like it a lot. But it really *is* work, it's hard work sometimes. So to encounter people who just don't seem to care, who just don't give a poop whether the things they say are right or wrong, ooohhhhh, that just burns me up.

There's nothing I hate more than those head-in-the-clouds bastards who feel deep in their bones that they're right. Whether it's an economist fudging his numbers, or a newspaper columnist lying about the price of a meal at Red Lobster, or a primatologist who won't share his videotapes, or a b-school professor who twists his stories to suit his audience---I just can't stand it, and what I really can't stand is that it doesn't even seem to matter to them when people point out their errors. Especially horrible when they're scientists or journalists, people who are paid to home in on the truth and have the public trust to do that.

A standard slam against profs like me is that we live in an ivory tower, and indeed my day-to-day life is far removed from the sort of Mametian reality, that give-and-take of fleshy wants and needs, that we associate with "real life." But, y'know, a true scholar cares about the details. Take care of the pennies and all that.

</rant>


The post Mistaken identity appeared first on Statistical Modeling, Causal Inference, and Social Science.

The person XX [pseudonym redacted] who posts on your blog is almost certainly YY [name redacted]. So he is referencing his own work and trying to make it sound like it is a third party endorsing it. Not sure why but it bugs me. He is an ass as well so pretty much every thing he does bugs me. . . .

OK, fair enough. I was curious so I searched in the blog archives for commenter XX. It turns out he was not a very frequent commenter but he had a few, and he did refer to the work of YY. But I’m almost certain that XX is *not* YY. I’m no Sherlock Holmes when it comes to the internet but I checked the URLs, and XX appears to be coming from a different country than the location of YY. And, looking at the comments themselves, I can’t believe this is some elaborate attempt at deception.

No big deal. But it’s an interesting example of how it’s possible to be so sure of oneself and happen to be wrong.


The post New research in tuberculosis mapping and control appeared first on Statistical Modeling, Causal Inference, and Social Science.

Jon Zelner informs us about two ongoing research projects:

1. TB Hotspot Mapping: Over the summer, I [Zelner] put together a really simple R package to do non-parametric disease mapping using the distance-based mapping approach developed by Caroline Jeffery and Al Ozonoff at Harvard. The package is available here. And I also mention it on my website. We’ve been using this package to map hotspots of multidrug-resistant (MDR) TB and spatial clustering of specific TB strains in Lima. Here’s a paper from a few years ago by Justin Manjourides (he’s one of the coauthors on our new paper that’s currently under review) that used this approach to do something similar using administrative data from Lima and has some figures that do a good job of demonstrating the application of this approach to the MDR problem in Lima. I think it’s a pretty cool way to do this kind of mapping, and it has the virtue of being less sensitive to irregular point patterns than KDE and better able to deal with really large datasets than a GP smoother.

The package is pretty simple and easy to use and there’s a tutorial on the README.md for the github repo. The only caveat is that the current version of the master branch only uses a fixed-bandwidth smoother, but I’m hoping to push a version with the variable bandwidth smoothing at some point (the version I have of that for my own work relies on a hacky combination of R and Julia and isn’t quite fit for public consumption yet; hoping to re-implement the slower bits using Rcpp for an R-only version soon).
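For readers who haven’t seen it, the generic fixed-bandwidth idea looks something like this (a textbook kernel-smoothing sketch, not the API of Zelner’s package; the data and names here are made up):

```python
import numpy as np

def kernel_smooth(points, outcomes, grid, bandwidth):
    """Fixed-bandwidth Gaussian-kernel estimate of P(outcome) at grid locations."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * bandwidth ** 2))        # same bandwidth everywhere
    return (w * outcomes).sum(axis=1) / w.sum(axis=1)

# Fake case data: a "hotspot" of positive outcomes in the region x > 0.5.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(500, 2))
y = (pts[:, 0] > 0.5).astype(float)

grid = np.array([[0.25, 0.5], [0.75, 0.5]])
est = kernel_smooth(pts, y, grid, bandwidth=0.1)
print(est)  # low at (0.25, 0.5), high at (0.75, 0.5)
```

A variable-bandwidth version would let the bandwidth shrink where points are dense, which is the extension Zelner mentions.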

Ummm . . . I’d say do all the damn smoothing in Stan and then you don’t have to be hacky at all.

And here’s Zelner’s other tuberculosis project:

2. Developing targeted interventions for TB control: In this group of analyses (the first is in this paper from the American Journal of Epidemiology), we used data from a large household-based cohort study in Lima, Peru, that my coauthors collected to estimate age-specific rates of TB from exposure in the community and to household cases. In this paper, we tried to tackle the question of whether screening for latent TB infection and providing preventive therapy to the household contacts of TB cases that presented at community health centers could be effective for individuals older than 5 (which is the current WHO cutoff for this type of screening and preventive therapy). What we found is that it looks like there are a good number of new infections from this type of exposure up to about 15 years. And then a second paper of ours, which came out in the American Journal of Respiratory and Critical Care Medicine last year, suggested that preventive therapy provided to these younger individuals was very effective at preventing them from developing TB disease during the year following enrollment into the study, which is encouraging. What makes this analysis cool, I think, is how we used differences in the infectivity of different types of household index cases (the ones that showed up in the community health centers) to estimate the age-specific rates of community and household transmission. This kind of thing is relatively new in TB epidemiology, where we typically rely on more broad-brush kinds of policies.

I think there are also some selection issues that we tried to deal with around what kind of households get what kind of index cases. But I’ve been thinking there’s probably a post-stratification style solution to this issue that would be more elegant than what we did in the paper (basically a lot of sensitivity analysis). I’m actually working on a Stan-based extension to the AJE paper right now to see how robust our conclusions are to spatial variation in community infection rates and was hoping to ping you about a better way of tackling the potential selection problem at some point.

Poststratification good. Stan good.

This is not an application area that I know anything about but I wanted to share this interesting stuff with you.


The post How can teachers of (large) online classes use text data from online learners? appeared first on Statistical Modeling, Causal Inference, and Social Science.

Dustin Tingley sends along a recent paper (coauthored with Justin Reich, Jetson Leder-Luis, Margaret Roberts, and Brandon Stewart), which begins:

Dealing with the vast quantities of text that students generate in a Massive Open Online Course (MOOC) is a daunting challenge. Computational tools are needed to help instructional teams uncover themes and patterns as MOOC students write in forums, assignments, and surveys. This paper introduces to the learning analytics community the Structural Topic Model, an approach to language processing that can (1) find syntactic patterns with semantic meaning in unstructured text, (2) identify variation in those patterns across covariates, and (3) uncover archetypal texts that exemplify the documents within a topical pattern. We show examples of computationally-aided discovery and reading in three MOOC settings: mapping students’ self-reported motivations, identifying themes in discussion forums, and uncovering patterns of feedback in course evaluations.

This sounds like it could be useful, especially if the data collection and analysis is all automatic. I’m sure the model will have a lot of problems—all models do—but that’s ok. The instructor could run this program, look at the results, see what makes sense, and see what doesn’t make sense. Ideally the program would come with some feedback options so that Reich et al., as developers of the software, can improve the model and make it more useful. Thus, a system with its own built-in mechanism for improvement. Perhaps my posting here can start that process going.


The post Comparison of Bayesian predictive methods for model selection appeared first on Statistical Modeling, Causal Inference, and Social Science.

We mention the problem of bias induced by model selection in A survey of Bayesian predictive methods for model assessment, selection and comparison, in Understanding predictive information criteria for Bayesian models, and in BDA3 Chapter 7, but we haven’t had a good answer how to avoid that problem (except by not selecting any single model, but integrating over all them).

We (Juho Piironen and I) recently arxived a paper, Comparison of Bayesian predictive methods for model selection, which I can finally recommend as giving a useful practical answer for how to do model selection with greatly reduced bias and overfitting. We write:

The results show that the optimization of a utility estimate such as the cross-validation score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. Better and much less varying results are obtained by incorporating all the uncertainties into a full encompassing model and projecting this information onto the submodels. The reference model projection appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.

Our experiments were done in Matlab, but we are working on Stan+R code, which should be available in a few weeks.
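The selection-induced overfitting described in the quoted abstract can be seen in a toy simulation (a NumPy sketch with simulated pure-noise data, not the paper’s method or experiments):

```python
import numpy as np

# Toy illustration of selection-induced overfitting: with scarce data and
# many candidate predictors, maximizing a noisy utility estimate over the
# candidates picks up noise, and the winning score does not replicate.
rng = np.random.default_rng(0)
n, p = 40, 200                      # few observations, many candidate models
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # y is pure noise: no predictor is useful

# "Utility estimate" for each single-predictor model: squared correlation.
scores = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(p)])
best = int(np.argmax(scores))

# Evaluate the selected predictor on fresh data from the same process.
X_new = rng.normal(size=(n, p))
y_new = rng.normal(size=n)
score_new = np.corrcoef(X_new[:, best], y_new)[0, 1] ** 2

print(f"best in-search score: {scores[best]:.3f}")
print(f"same model, fresh data: {score_new:.3f}")
```

Keeping some data entirely outside the search, as the paper recommends with cross-validation, or projecting a full encompassing model onto the submodels, avoids treating the optimized score as an honest estimate of predictive performance.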

The post Outside pissing in appeared first on Statistical Modeling, Causal Inference, and Social Science.

Coral Davenport writes in the New York Times:

Mr. Tribe, 73, has been retained to represent Peabody Energy, the nation’s largest coal company, in its legal quest to block an Environmental Protection Agency regulation that would cut carbon dioxide emissions from the nation’s coal-fired power plants . . . Mr. Tribe likened the climate change policies of Mr. Obama to “burning the Constitution.”

But can we really trust a reporter with this name on the topic of global warming? Coral is, after all, on the front line of climate change risk.

So what’s happened to Laurence “ten-strike” Tribe?

Last we heard from him, he was asking Obama for a “newly created DOJ position dealing with the rule of law.” Maybe if Tribe had gotten the damn job, he’d keep it on the down low about the whole “burning the Constitution” thing.

The noted left-leaning Harvard Law professor has gone rogue! And, like our favorite rogue economist, seems to have become disillusioned with our Hawaiian-born leader.

The news article also contains this juicy bit:

In addition to the brief, Mr. Tribe wrote a lengthy public comment on the climate rules that Peabody submitted to the E.P.A. Mr. Tribe’s critics note that his comment, which he echoed in an op-ed article in The Wall Street Journal in December, includes several references to the virtues of coal, calling it “a bedrock component of our economy.”

The comment also has phrases frequently used by the coal industry. . . .

Laurence Tribe using phrases written by others! That could never happen, right?

**P.S.** I was curious so I googled *Laurence Tribe bedrock* which took me to a legal document with this wonderful phrase:

This bedrock principle, one familiar to anyone who has taken an elementary civics class in any halfway adequate high school . . .

Damn! I knew that was my problem. My high school didn’t offer a civics class.

If this case ever gets to the Supreme Court, I expect Tribe will have some difficulty explaining these concepts to Sotomayor. As he put it so eloquently in his job-seeking letter to the C-in-C, she’s not nearly as smart as she seems to think she is. Elementary civics might be a bit beyond her.

The post And . . . our featured 2015 seminar speaker is . . . Thomas HOBBES!!!!! appeared first on Statistical Modeling, Causal Inference, and Social Science.

Just in case you’ve forgotten where this all came from:

This came in the departmental email awhile ago:

CALL FOR APPLICATIONS: LATOUR SEMINAR — DUE DATE AUGUST 11 (extended)

The Brown Institute for Media Innovation, Alliance (Columbia University, École Polytechnique, Sciences Po, and Panthéon-Sorbonne University), The Center for Science and Society, and The Faculty of Arts and Sciences are proud to present

BRUNO LATOUR AT COLUMBIA UNIVERSITY, SEPTEMBER 22-25

You are invited to apply for a seminar led by Professor Bruno Latour on Tuesday, September 23, 12-3pm. Twenty-five graduate students from throughout the university will be selected to participate in this single seminar given by Prof. Latour. Students will organize themselves into a reading group to meet once or twice in early September for discussion of Prof. Latour’s work. They will then meet to continue this discussion with a small group of faculty on September 15, 12-2pm. Students and a few faculty will meet with Prof. Latour on September 23. A reading list will be distributed in advance.

If you are interested in this 3-4 session seminar (attendance at all 3-4 sessions is mandatory), please send

Name:

Uni:

Your School:

Your Department:

Year you began your terminal degree at Columbia:

Thesis or Dissertation title or topic:

Name of main advisor:

In one short, concise paragraph tell us what major themes/keywords from Latour’s work are most relevant to your own work, and why you would benefit from this seminar. Please submit this information via the site

http://brown.submittable.com/submit

The due date for applications is August 11 and successful applicants will be notified in mid-August.

This is the first time I’ve heard of a speaker who’s so important that you have to apply to attend his seminar! And, don’t forget, “attendance at all 3-4 sessions is mandatory.”

At this point you’re probably wondering what exactly is it that Bruno Latour does. Don’t worry—I googled him for you. Here’s the description of his most recent book, “An Inquiry Into Modes of Existence”:

The result of a twenty five years inquiry, it offers a positive version to the question raised, only negatively, with the publication, in 1991, of ”We have never been modern”: if ”we” have never been modern, then what have ”we” been? From what sort of values should ”we” inherit? In order to answer this question, a research protocol has been developed that is very different from the actor-network theory. The question is no longer only to define ”associations” and to follow networks in order to redefine the notion of ”society” and ”social” (as in ”Reassembling the Social”) but to follow the different types of connectors that provide those networks with their specific tonalities. Those modes of extension, or modes of existence, account for the many differences between law, science, politics, and so on. This systematic effort for building a new philosophical anthropology offers a completely different view of what the ”Moderns” have been and thus a very different basis for opening a comparative anthropology with the other collectives – at the time when they all have to cope with ecological crisis. Thanks to a European research council grant (2011-2014) the printed book will be associated with a very original purpose built digital platform allowing for the inquiry summed up in the book to be pursued and modified by interested readers who will act as co-inquirers and co-authors of the final results. With this major book, readers will finally understand what has led to so many apparently disconnected topics and see how the symmetric anthropology begun forty years ago can come to fruition.

Huh? I wonder if this is what they mean by “one short, concise paragraph” . . .

Update: We just got an announcement in the mail. The due date has been extended a second time, this time to Aug 18. This seems like a good sign, suggesting that fewer Columbia grad students than expected wanted to jump through the hoops to participate in this seminar.

**The ultimate bracket**

So . . . I had the idea that we could do better, and I gathered 64 potential speakers, eight current or historical figures from each of the following eight categories:

– Philosophers

– Religious Leaders

– Authors

– Artists

– Founders of Religions

– Cult Figures

– Comedians

– Modern French Intellectuals.

And Paul Davidson put them in a bracket, which, as of a few days ago, looked like this:

And yesterday we had the final round, which was won by Hobbes based on this positive argument from X:

from “Hobbes’s State of Nature: A Modern Bayesian Game-Theoretic Analysis” by Hun Chung:

I personally think that applying game theory to political theory is misguided only when one tries to apply the wrong model; and, not all game-theoretic models are wrong. This is why I believe conserving the details of Hobbes’s logic is important. I believe that the model provided in this paper is the correct game-theoretic model that represents Hobbes’s state of nature in a way that Hobbes had originally intended it to be.

We need to know what Hobbes thinks of Chung’s Bayesian analysis!

And this negative argument from an anonymous commenter:

I think Dick is bowing out of the competition with this quote:

Probability, Joe said to himself. A science in itself. Bernoulli’s theorem, the Bayes-Laplace theorem, the Poisson Distribution, Negative Binomial Distribution…coins and cards and birthdays, and at last random variables. And, hanging over it all, the brooding specter of Rudolf Carnap and Hans Reichenbach, the Vienna Circle of philosophy and the rise of symbolic logic. A muddy world, in which he did not quite care to involve himself.

If Dick does not care to involve himself with probability, I don’t care to involve myself with him!

Best of all was this comment from Jonathan:

[Hobbes] got off this scatological sally directed at Wallis, arguing the superiority of graphics to equations. (Note: Pappus was a 4th-century geometer who proved things with pictures.)

“When did you see any man but yourselves publish his Demonstrations by signs not generally received, except it were not with intent to demonstrate, but to teach the use of Signes? Had Pappus no Analytiques? Or wanted he the wit to shorten his reckoning by Signes? Or has he not proceeded Analytically in a hundred Problems (particularly in his seventh Book), and never used Symboles? Symboles are poor unhandsome (though necessary) scaffolds of Demonstration; and ought no more appear in publique, than the most deformed necessary business which you do in your Chambers.

Poop jokes *and* an argument that graphs are better than tables. Plus he’s a political scientist. Thomas Hobbes is my man.

What a great way to end our tournament, demonstrating that the earlier rounds were all worth it to lead up to this point.

Thank you all for participating!

The post The championship! Thomas Hobbes vs. Philip K. Dick appeared first on Statistical Modeling, Causal Inference, and Social Science.

OK, since this is the final round, you’re allowed to make as many Calvin and Hobbes jokes and as many dick jokes as you want.

These two guys have been through a lot together. To get here, Hobbes defeated, in order, Larry David, Leo Tolstoy, Chris Rock, Ed Wood, and Miguel de Cervantes; while Dick got by Jean Baudrillard, Grandma Moses, Mohandas Gandhi, Jane Austen, and George Carlin.

For this last match I’m thinking Hobbes has the edge, as ultimately he was a much more versatile thinker and writer than Dick. Also, both of them had problems with the government, but Hobbes’s issues were real, whereas with Dick it was more paranoia. Finally, Dick dabbled in philosophy and is famous for that, but Hobbes did the real thing.

Still, there’s that indelible image of the missing pull cord . . .

So let me know what you think!

P.S. And, for the last time, here’s the background, and here are the rules.

The post But when you call me Bayesian, I know I’m not the only one appeared first on Statistical Modeling, Causal Inference, and Social Science.

Textbooks on statistics emphasize care and precision, via concepts such as reliability and validity in measurement, random sampling and treatment assignment in data collection, and causal identification and bias in estimation. But how do researchers decide what to believe and what to trust when choosing which statistical methods to use? How do they decide the credibility of methods? Statisticians and statistical practitioners seem to rely on a sense of anecdotal evidence based on personal experience and on the attitudes of trusted colleagues. Authorship, reputation, and past experience are thus central to decisions about statistical procedures.

The above paragraph is the abstract for the article, Convincing Evidence, by Keith O’Rourke and myself, which appeared in the just-published volume, “Roles, Trust, and Reputation in Social Media Knowledge Markets,” edited by Sorin Matei and Elisa Bertino.

Here’s how we begin:

The rules of evidence as presented in statistics textbooks are not the same as the informal criteria that statisticians and practitioners use in deciding what methods to use.

According to the official rules, statistical decisions should be based on careful design of data collection, reliable and valid measurement, and something approximating unbiased or calibrated estimation. The first allows both some choice of the assumptions and an opportunity to increase their credibility, the second tries to avoid avoidable noise and error, and the third tries to restrict to methods that are seemingly fair. This may be fine for evaluating psychological experiments, or medical treatments, or economic policies, but we as statisticians do not generally follow these rules when considering improvements in our teaching nor when deciding what statistical methods to use.

Did Fisher decide to use maximum likelihood because he evaluated its performance and the method had a high likelihood? Did Neyman decide to accept a hypothesis testing framework for statistics because it was not rejected at a 5% level? Did Jeffreys use probability calculations to determine there were high posterior odds of Bayesian inference being correct? Did Tukey perform a multiple comparisons analysis to evaluate the effectiveness of his multiple comparisons procedure? Did Rubin use matching and regression to analyze the efficacy of the potential-outcome framework for causal inference? Did Efron perform a bootstrap of existing statistical analyses to demonstrate the empirical effectiveness of resampling? Do the authors of textbooks on experimental design use their principles to decide what to put in their books? No, no, no, no, no, no, and no. . . .

We continue:

How, then, do we gain our knowledge about how to analyze data? This is a question that arises over and over as we encounter new sources of data that are larger and more structured than ever before. . . .

I don’t have all the answers, but I think these are important questions.

The post Philip K. Dick (2) vs. George Carlin (2); Hobbes advances appeared first on Statistical Modeling, Causal Inference, and Social Science.

For someone who wasn’t writing in English, he [Cervantes] is credited with a fair number of English catchphrases.

Let’s bring the old guy back to life and see what else he’s got.

When thou art at Rome, do as they do at Rome.

A closed mouth catches no flies.

He who loses wealth loses much; he who loses a friend loses more; but he that loses his courage loses all.

No fathers or mothers think their own children ugly.

The gratification of wealth is not found in mere possession or in lavish expenditure, but in its wise application.

Forewarned, forearmed; to be prepared is half the victory.

Jonathan provides some good Hobbes quotes, but do we really want to end up with a lecture on publication bias at the end of this? Can we do no better?

But then Adam came in with this late entry:

I vote for Hobbes because he understood irony:

I often observe the absurdity of [my] dreams, but never dream of the absurdities of my waking thoughts.

With Cervantes, we’d just have more tilting at windmills. Isn’t that what usually happens in an academic seminar?

Good point.

And, for today, we have a battle of two people who really really don’t trust the government. Either one would be great, but I have to admit I’d be more psyched if it were Jane Austen vs. John Waters. . .

P.S. As always, here’s the background, and here are the rules.

The post This is why I’m a political scientist and not a psychologist appeared first on Statistical Modeling, Causal Inference, and Social Science.

The most recent example is a horrifying news article by Rachel Aviv about a group of ultra-religious Jews in Brooklyn who rallied around to support a child molester who was also a powerful member of their community. It’s just hard for me to get my head around the story. Various parts of it make sense: I can understand that, when a popular person is accused of molesting kids, people don’t want to believe it. But the amount of money and effort that was spent to protect this guy . . . it’s an incredible story. I mean, what were these people thinking???

The story also includes some sleazy lawyers and politicians, but that part doesn’t surprise me so much—in the U.S. we have an adversarial system of justice and I understand that many lawyers seem to believe that it’s their job to get their clients off the hook by all means necessary; and, when it comes to politicians, you can always feel that they’re balancing the harm to the few against the good of the many (and, for a typical politician, I guess that reelection corresponds to the good of the many). For these guys, it’s all business. An ugly business in this case, but it’s what you can expect. But, for the people in that religious community, it seems different, more like they were making personal sacrifices of time, money, and honor in order to, well, to keep a child molester out on the street, and also to intimidate other victims from exposing other child molesters. That’s the part that’s hard for me to fathom: what were they getting out of it? It can’t just be fear of reprisal (the sort of thing that, presumably, motivated so many of those denouncers in the old Soviet system): if that were the concern, I assume these guys could’ve just stayed aloof and let the system take its course. There was hard-core string-pulling, payoffs, all sorts of things (again, see the linked article for details).

In some sort of abstract sense, I suppose I understand this based on psychological principles, cognitive dissonance and all that. Also I guess there’s some sociology here, that these people felt that they had to do this based on their roles in the community.

And then there’s one other bit, which I guess is cultural (whatever that means). Here’s Aviv:

Molestation was rarely discussed in the community, and it didn’t seem to Kellner [the father of one of the molested children] that any of the prohibitions in the Ten Commandments explicitly related to it. The most relevant sins—adultery and coveting a neighbor’s belongings—didn’t capture the depth of the violation. . . .

In a community where non-procreative sex is considered shameful, molestation tends to be regarded in roughly the same light as having an affair. When children complain about being molested, the council almost never notifies the police. Instead, it devises its own punishments for offenders: sometimes they are compelled to apologize, pay restitution, or move to Israel.

When I read this, I was like, huh? Haven’t these people heard about the concept of the repeat offender? You’re gonna send them to Israel so they can do it again over there???

But maybe the real point here is I need to think not like a psychologist but like an anthropologist. From my standpoint as a modern middle-class American, child molestation is this horrible crime. But, in other cultures, it’s no worse than, I dunno, whatever offense it is that you could get off the hook by apologizing, paying restitution, or moving to the Middle East. Without really thinking about it, my reasoning had gone like this: (a) We’re talking about an extremely socially conservative community with traditional sex roles, and so (b) They must really really hate child molesters, even more than the rest of us do. Actually, though, the logic seems to be closer to the following: (a) We’re talking about an extremely socially conservative community with traditional sex roles, so (b) They really really hate any suggestion that the traditional system of roles has any problem. Child molestation is viewed in some cultures as not such a big deal.

Again, to return to the title of this post: I feel that I can understand just about any position when it comes to issue attitudes, partisanship, and voting (and I think that this openness, along with the collaboration of four excellent colleagues, gave our Red State Blue State book a broad perspective on American politics), but when it comes to this sort of personal decision making in a social environment—in this case, people going out of their way to keep a child molester on the street—I get stuck. I can go around and around and try to understand, but any explanation still seems to me to be imposed on the system, it doesn’t seem to flow naturally. I think this is a weakness on my part as a social scientist.

The post The Final Four: Cervantes, Hobbes, Dick, Carlin! Today’s semifinal: Miguel de Cervantes (2) vs. Thomas Hobbes appeared first on Statistical Modeling, Causal Inference, and Social Science.

He said, ‘No, that’s where I draw the line. I’ll beat ‘em up, but I don’t want to kill ‘em.’ And the government said, ‘Well, if you won’t kill people, we won’t let you beat ‘em up.’

And from Rick, a quote on probability and the base-rate fallacy, and from Jonathan, a simple model for a simple purpose.

And today we have two oldsters. I’m rooting for Hobbes because it would be fun to have a political scientist in the final—Raymond Aron never made it past the first round—but you’d be a fool to bet against Cervantes, a man who had, arguably, the toughest route of anyone to the Final Four, having defeated, in order, Joan Crawford, Henry David Thoreau, Albert Camus, and Mark Twain. Damn, that’s impressive.

P.S. As always, here’s the background, and here are the rules.

The post Stan meetup in NYC on Tuesday appeared first on Statistical Modeling, Causal Inference, and Social Science.

Free. Registration required: http://www.meetup.com/Stan-Users-NYC/events/221561385/

P.S. Boston, Stan meetups are coming your way.

The post “Thinking about the possibility of spurious correlation isn’t a matter of liking—it should be pretty much automatic.” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Here’s the full paragraph:

Krugman says, “you can, if you like, try to argue that this relationship is spurious, maybe not causal.” Actually, I [Weakliem] liked his original figure, since I agree with Krugman on economic policy. But thinking about the possibility of spurious correlation isn’t a matter of liking—it should be pretty much automatic.

And the full story is here.

That last bit (“it isn’t a matter of liking—it should be pretty much automatic”) holds for a lot of statistical issues, not just spurious correlation. I think Weakliem is illustrating a general issue: when someone does a statistical analysis and goes into “story time,” they’ll sometimes bring up a potential objection (in this case, spurious correlation) but not take it seriously. Better to bring it up than to not mention it at all. But better still would be to more fully recognize the limitations of one’s analysis. So good catch by Weakliem on this one.
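Weakliem’s point can be made concrete with a toy simulation (hypothetical data; nothing to do with Krugman’s actual series): two variables that share a common cause are strongly correlated, and the correlation disappears once the common cause is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)              # common cause (the lurking variable)
x = z + 0.5 * rng.normal(size=n)    # x depends only on z plus noise
y = z + 0.5 * rng.normal(size=n)    # y depends only on z plus noise

raw = np.corrcoef(x, y)[0, 1]       # strong "spurious" correlation

# Adjust for z: correlate the residuals of x and y after regressing out z.
rx = x - z * (x @ z) / (z @ z)
ry = y - z * (y @ z) / (z @ z)
partial = np.corrcoef(rx, ry)[0, 1] # roughly zero once z is controlled

print(f"raw correlation: {raw:.2f}")
print(f"partial correlation given z: {partial:.2f}")
```

Nothing in the raw correlation tells you whether you are in this situation; that is why checking for a common cause should be automatic rather than optional.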

The post John Updike vs. George Carlin (2); Hobbes and Dick advance appeared first on Statistical Modeling, Causal Inference, and Social Science.

Nasty, brutish and short — and not necessarily human.

Pretty good, but now that Hobbes has made it into the Final Four, the competition’s stiffer. He’ll need something better than “nasty, brutish, and short” to get past Cervantes and make it into the final.

And today we fill out the Final Four with a contest between an author who Helen DeWitt can’t stand, and a comedian who shares a name with two of my colleagues.

P.S. As always, here’s the background, and here are the rules.

The post A rare topical post appeared first on Statistical Modeling, Causal Inference, and Social Science.

Every year at Passover, I struggle to peel two dozen hard-boiled eggs and search the web to see if there isn’t a trick to do it better. But all the hits say the same thing: put the eggs in cold water, then bring to a boil.

But this guy [J. Kenji López-Alt] says to do it the other way: First boil the water, and then put in the eggs. He claims the eggs are much easier to peel this way.

I tried it twice, with no comparison (except memory), and it works great. The shells come off amazingly easily.

So for students who have access to a kitchen, this seems like a great project: compare ease of shelling eggs put into cold water and heating vs. put into boiling water. Lots of issues with experimental design (how many eggs, randomization, blinding, standardizing other variables like how long to cool, how to score the result) and some decisions about how to analyze the data. All doable in an evening (if they have a kitchen). Or even doable in class with a couple of hot plates.
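For the analysis step, a permutation test is about the right scale for a kitchen experiment; here is a sketch with made-up peel-ease scores (the numbers below are invented for illustration):

```python
import numpy as np

# Hypothetical peel-ease scores (0 = impossible, 6 = slides right off)
# for eggs randomized to cold-start vs. boiling-start.
cold = np.array([3, 2, 4, 3, 2, 3])
hot = np.array([5, 4, 5, 6, 4, 5])

obs = hot.mean() - cold.mean()      # observed difference in mean ease

# Permutation test: under the null, group labels are exchangeable,
# so reshuffle the labels and see how often the difference is this big.
rng = np.random.default_rng(0)
pooled = np.concatenate([cold, hot])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[6:].mean() - perm[:6].mean()
    if diff >= obs:
        count += 1
p_value = count / n_perm

print(f"observed difference: {obs:.2f}, one-sided p = {p_value:.4f}")
```

The permutation approach sidesteps distributional assumptions about the scores, which is handy since “ease of peeling” is a rough subjective scale to begin with.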

I haven’t taught a stats course in over a decade, so won’t have a chance to try this, so am passing it along…

I followed Motulsky’s link and, ironically, he’s an author of a book he describes as “not a cookbook.” Make of that what you will.

The post Round of 8 bracket; Ed Wood (3) vs. Thomas Hobbes; Philip K. Dick (2) vs. Jane Austen appeared first on Statistical Modeling, Causal Inference, and Social Science.

Paul Davidson sends along the updated bracket, along with the comment that the writers seem to be doing very well. True! The 8 remaining contestants include 5 writers (along with one comedian, one philosopher, and one cult figure).

Yesterday we had no competition because I was afraid that people would not take it seriously on April Fools’ Day.

In deciding on the winner from 2 days ago, I started with this anonymous comment:

If Quixote comes by you would need a really good translator that knows how to speak Spanish from 500 years ago. That means less cookies for everybody at the seminar.

But this is NYC, we have a lot of Spanish speakers around. If necessary we could find a translator through the U.N. So I’m not so worried about that.

But there was also this other anonymous comment:

Cervantes was foundational and still relevant. That’s a long time to be a candidate for best comic of all time.

Ultimately I had to go with this, from David:

Assuming Hobbes advances, a Cervantes win would setup a final four that pits giants of the 16th century vs. greats of the 20th. Hobbes Cervantes would also be interesting as it represents the inflection point away from Spanish dominance to English.

And, the deciding factor:

Twain would be a no show. He’d be too concerned about the fallout from his autobiography whose publication he wanted delayed for 100 years since he “named names”.

We already knocked Jesus out for being a probable no show, and I guess Twain deserves no better treatment than the Prince of Peace.

And today we have a doubleheader; we’re playing catch-up because this entire competition is timed to end on the same day as the NCAA tournament. First we have a famous cross-dresser vs. Mr. Nasty, Brutish, and Short; and second we have a battle of the sexes, with the quintessential intellectual writer in the very male genre of sci-fi up against the founder of chick-lit.

How do I handicap these? For the first one, I’d think Hobbes would have a lot more to say, but you can’t count out Ed Wood, a true Cinderella who somehow managed to get by Alan Turing (along with some lesser opponents). For the second contest, both candidates have strengths. Dick’s explorations of the nature of reality fit well with statistics, but Austen speaks to the social scientists in the audience. Dick is fresh off a bout with Gandhi, while Austen finished off Stewart Lee, with Karl Popper before that.

Please comment on both of today’s matches.

P.S. As always, here’s the background, and here are the rules.

The post There are 5 ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, (4) keeping a gambling addict away from the casino, and (5) chapter 11 bankruptcy proceedings appeared first on Statistical Modeling, Causal Inference, and Social Science.

Schrage: What do you like to tell your academic colleagues about the challenges of real-world experimentation and innovation?

Loveman: Honestly, my only surprise is that it is easier than I would have thought. I remember back in school how difficult it was to find rich data sets to work on. In our world, where we measure virtually everything we do, what has struck me is how easy it is to do this. I’m a little surprised more people don’t do this.

We next heard about Caesars in June 2012 from Sarah Koenig on This American Life. She reported on the elaborate efforts that Caesars puts in to bring gambling addicts back to the casino, again and again until there was nothing left to squeeze from them, busting them out (as Henry Hill might say). So that’s where all that data-driven research goes: to target people’s weaknesses and destroy them.

And then I opened today’s paper and saw this news article by William Alden:

Caesars Unit Files for Chapter 11 Bankruptcy Protection

The casino operator Caesars Entertainment was locked in a bitter standoff with some creditors on Thursday after it put its largest unit into bankruptcy. Caesars — which has labored under a mountain of debt since it was acquired by two private equity firms for about $30 billion in 2008 — filed to put the unit into Chapter 11 bankruptcy protection . . . But before it could begin in earnest, the Chicago bankruptcy process was halted by a judge in Wilmington, Del., who is overseeing a rival attempt by the creditors to force the Caesars unit into a separate bankruptcy on their terms. . . .

Whatever the outcome, the bankruptcy of the unit — which owns Caesars Palace in Las Vegas as well as numerous other casinos and hotels around the country — provides a vivid demonstration of the risks of buying companies by loading them up with debt. Caesars, whose buyout was emblematic of private equity’s golden age before the financial crisis, has since become a symbol of that industry’s excesses. . . .

But there’s this:

The company plans to keep its operations open as normal.

I wonder if they’re still doing the aggressive hooking of addicts, or whether they’re winding that down now that they’re going into bankruptcy.

P.S. I looked up Michael Schrage and found this:

Schrage has advised segments of the national security community on cyberconflict and cybersecurity issues. . . . Schrage helped launch a series of workshops sponsored by the Department of Defense on federal complex systems procurement.

What could possibly go wrong???

P.P.S. Some commenters seemed to think I was implying that the sleaze practiced by the Caesars management was the cause of the bankruptcy. I wasn’t implying this at all. My impression from my occasional reading on the topic is that the company was in trouble before the business analytics gurus got involved. So it seems most likely that they did not sink the ship; rather, as the ship was sinking, they got on board and picked the pockets of a bunch of passengers before it was all over.

I am a bit disturbed that someone who’s involved in Defense Department procurement has a sideline in glorifying scammy business practices. And, as an MIT graduate, I really hate to see this sort of thing in Technology Review.

The post Enough with the replication police appeared first on Statistical Modeling, Causal Inference, and Social Science.

Can’t those shameless little bullies just let scientists do their research in peace?

If a hypothesis test is statistically significant and a result is published in a real journal, that should be enough for any self-styled skeptic.

Can you imagine what might happen if any published result could be questioned—by anybody? You’d have serious psychology research being grilled by statisticians, and biology research being called into question by . . . political scientists?

Where did that come from? I don’t ask my hairdresser to check my math calculations, I don’t ask my house cleaner to repair my TV, and I sure as heck wouldn’t trust a political scientist to vet a biology paper.

When something’s published in the Journal of Theoretical Biology, pal, it’s theoretical biology. It’s not baseball statistics. Don’t send an amateur to do a pro’s job.

Simply put, peer review is a method by which scientists who are experts in a particular field examine another scientist’s work to verify that it makes a valid contribution to the evidence base. With that assurance, a scientist can report his or her work to the public, and the public can trust the work.

And then there’s multiple comparisons. Or should I say, “p hacking.” Christ, what a load of bull. You replication twits are such idiots. If I publish a paper with 9 statistically significant results, do you know what the probability of this is, if the null hypothesis were true? It’s (1/20) to the 9th power. Actually much lower than that: you’d get (1/20)^9 if all the p-values were .05, but actually some of these p-values are even lower, like .01 or .001. Anyway, even if it’s just (1/20)^9, do you know how low that is?

Probably not, you innumerate git.

So let me tell you, it’s 1.953125e-12, that’s 0.000000000001953125. Got that? No way any amount of multiple comparisons can cover that one. If I find 9 statistically significant results, my result is real. Period. I don’t care how many people can’t replicate it. If they can’t replicate it, it’s their problem.

p < 0.000000000001953125. You can take that one to the bank, pal.
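Don’t take my word for it. Here’s a two-line Python sanity check of that arithmetic (assuming, as any reasonable person would, that the nine tests are independent — an assumption, pal, that I’m stating right here):

```python
# Probability that nine independent tests all come up "significant"
# at the p = .05 threshold, if every null hypothesis were true.
# (Assumes independence across the nine results.)
p_single = 0.05          # significance threshold per result
p_all_nine = p_single ** 9
print(p_all_nine)        # roughly 1.95e-12
```

Run it yourself if your hairdresser will let you borrow a computer.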

OK, let’s be systematic. Suppose I do a study and it is statistically significant and I publish it—it’s hard to get a paper published, dontcha know?—and then some little Dutch second-stringers raise some pissant objection on some blog, and then they sandbag me with some lame-a$$ “replication.” OK, fine. There are two possibilities, then:

1. My study replicates. Good. So shut the f^&#!@ up. Or,

2. The so-called replication fails. This doesn’t mean squat. All it tells us is that the world is complicated. We already knew that.

Science is about exploration, not criticism. Let’s be open-minded. Personally, I’m open-minded enough to believe that women’s political preferences change by 20 percentage points during their monthly cycle. Why not? What are you, anti-science? OK, ok, I’m not so sure that Daryl Bem found ESP—but I think we’re a damn sight better off giving him the benefit of the doubt, than censoring any result that doesn’t fit our high-and-mighty idea of what is proper science.

Jean Piaget never did a preregistered replication. Nor, for that matter, did B. F. Skinner or Sigmund Freud or Barbara Fredrickson or that Dianetics guy or all the other leading psychologists of the last two centuries.

What did Piaget and the rest of those guys do? They did what all the best scientists did: they ran questionnaires on Mechanical Turk, they found p<.05, and they published in Psychological Science. If it was good enough for Jean Piaget and B. F. Skinner and William James and Daryl Bem and Satoshi Kanazawa, it’s good enough for me.

So take those replications and stick ’em where the sun don’t shine, then crawl back under the rock where you came from, you little twerp. The rest of us won’t even notice. Why? Cos, while you’re sniping and criticizing and replicating and blogging, we’re busy in our labs. Doing science.

P.S. Someone asked me where some of the above quotes came from. Here are some sources:

“Replication police” and “shameless little bullies” here

“Simply put, peer review is a method by which scientists who are experts in a particular field examine another scientist’s work to verify that it makes a valid contribution to the evidence base. With that assurance, a scientist can report his or her work to the public, and the public can trust the work” here

“Second stringers” here

“Little twerp” here (ok, it’s nothing to do with the topic at hand, but the phrase fit in so well that I included it).

Other lines above are generally based on things I’ve read but are not exact quotes.
