The post Comedy book with surefire can’t-miss formula, misses appeared first on Statistical Modeling, Causal Inference, and Social Science.

I checked it out and . . . jeez was it boring. It’s hard to imagine you could interview a bunch of comedians and come up with something so blah. This book is only a million times worse than The Last Laugh, which I blurbed here.

I did a quick web search . . . hmmm, the book was reviewed by the New York Times, the Boston Globe, featured on NPR . . . yeah, that all makes sense, as it *sounds* like a great idea for a book.

Here’s the Times review, which is pretty much what I would say. The book might be interesting to historians but it’s surprisingly non-entertaining.

Or, to be fair, maybe I should flip it around: The book is not entertaining but it might be interesting to historians. Who says books have to be entertaining? Bayesian Data Analysis is not particularly entertaining, so why do I ask that of others?

Still, if you’re interested in the recent history of comedy, I recommend you start with The Last Laugh.

The post We got mooks appeared first on Statistical Modeling, Causal Inference, and Social Science.

Columbia University’s Data Science Institute is releasing some mooks, and I’m part of it. I’ll first give the official announcement and then share some of my thoughts.

The official announcement:

The Data Science Institute at Columbia University is excited to announce the launch of its first online-education series, Data Science and Analytics in Context, on Dec. 14. Available through the edX platform, the three-course series will run through April, featuring lectures, engaging exercises and community discussion.

The first course, Statistical Thinking for Data Science and Analytics, teaches the statistical foundations for analyzing large datasets. You will learn how data scientists design the data collection process, gain insights from visualizing data, find supporting evidence for data-based decisions and construct models for predicting future trends.

The second course, Machine Learning for Data Science and Analytics, is an introduction to machine learning and algorithms. In this course, you will develop a basic understanding of machine learning principles and how to find practical solutions using predictive analytics. We will also examine why algorithms play an essential role in Big Data analysis.

The third course, Enabling Technologies for Data Science and Analytics, explores the major components of the Internet of Things, including data gathering sensors. You will develop an understanding of how software is able to analyze events, recognize faces, interpret sentiment on social media, and how this information is fed into the decision-making process.

Learn from leading data scientists at Columbia University with guidance provided by Columbia graduate assistants during each course. Watch the video trailer for the series online at ColumbiaX and enroll today!

Link for video – https://www.youtube.com/watch?v=ahvuPvm-1YU

Link to enroll – https://www.edx.org/xseries/data-science-analytics-context

My perspective:

The mooks were organized by a group at our new Data Science Institute, including Prof. Tian Zheng, a friend and colleague of mine in the statistics department. I prepared two lectures, one on Bayesian data analysis and one on exploratory data analysis and visualization. The content was not super-organized; I just used some material I had around, including some of my favorite recent stories such as the Xbox polls and the age-adjusted death rates. I’m not sure how well they went because I hate looking at videos of myself. I did see clips from some of the other lectures and they looked pretty good.

Last year I prepared an intro stat course for the School of International and Public Affairs. I taped twelve 40-minute lectures, and along with each were R sessions with Ben Goodrich. These taped lectures were super-smooth; I actually ended up writing scripts for all of them because I sounded too awkward when I simply spoke as if I were giving a usual talk. In contrast, these new mooks are more like classroom lectures; it’s a different feel entirely.

Anyway, I hope this goes well. Organizing a remote course on data science seems like a real challenge, and it seems like a reasonable starting point to get different people to give different lectures on their areas of expertise. I suppose much will depend on the homework assignments and the student feedback. I was happy to contribute my parts, small as they were.

The post You’ll never believe what this girl wrote in her diary (NSFW) appeared first on Statistical Modeling, Causal Inference, and Social Science.

This made me think that it may be that a diary is more useful for statistics than for psychology. I don’t know why that would be, but it’s a thought. It also could be that this sort of introspective assignment is par for the course in psychology but unusual in statistics.

Arber responded:

That’s an intriguing idea. Students in this class (Perspectives on Human Nature) may have seen the diary as just another “assignment” whereas students in your class may have treated it as an “activity.”

Your claim, if I understand it correctly, makes an interesting prediction, that students in, say, English or Comparative Literature classes would find the diary the least useful (and, perhaps, students in set theory or particle physics courses would find it the most useful).

Maybe so. From another perspective, statistics and psychology have the feature that they connect to many different aspects of everyday life. A particle-physics or set-theory diary might end up being a bit more inward-focusing, although that could be good in its own right.

Anyway, this is just speculation based on small and non-representative samples. . . .

The post “iPredict to close after Govt refuses anti-money laundering law exemption” appeared first on Statistical Modeling, Causal Inference, and Social Science.

The site, run by Victoria University of Wellington’s commercialisation arm, VicLink, issued a statement to its website and on Twitter on Thursday.

According to the iPredict statement, Associate Justice Minister Simon Bridges refused to grant it an exemption from the Anti-Money Laundering and Countering Financing of Terrorism Act, declaring that it was a “legitimate money laundering risk” because of the lack of customer due diligence. . . .

Geoff Todd, managing director of VicLink, said the website had been caught in a legal loophole which had caused problems globally.

“Predictions markets aren’t financial markets, and they’re not gambling, but the legislation is very binary. You’re either gambling or you’re a financial market.” . . .

The purpose of the website was to bring private information into the public domain, Todd said, meaning it was unreasonable to ask users to give their identities. A number of staff around Parliament, as well as MPs, are known to have used iPredict. . . .

The post Boston Stan meetup 1 Dec appeared first on Statistical Modeling, Causal Inference, and Social Science.

Using Stan for variational inference, plus a couple lightning talks

Dustin Tran will give a talk on using Stan for variational inference, then we’ll have a couple of lightning (5-minute-ish) talks on projects. David Sparks will talk, I will talk about some of my work, and we’re looking for 1-2 more volunteers. If you have a project, or an idea for a project, using Stan that you’d like to talk about, let me know!

Thanks to RStudio for sponsorship of the meeting.

Cool!

The post Gary McClelland agrees with me that dichotomizing continuous variables is a bad idea. He also thinks my suggestion of dividing a variable into 3 parts is also a mistake. appeared first on Statistical Modeling, Causal Inference, and Social Science.

I remain convinced that discretizing a continuous variable, especially for multiple regression, is the road to perdition.

Here I explain my concerns. First, I don’t buy the motivation that discretized analyses are easier to explain to lay citizens and the press. Second, I believe there is an error in your logic for computing the relative efficiency for splitting into three groups. Third, and most importantly, dichotomizing or trichotomizing two or more continuous variables in a multiple regression is an especially bad idea. In such cases, the loss of efficiency is irrelevant because the discrete predictor variables have a different correlation than the continuous variables. As a consequence, the parameter estimates from the discrete analysis are biased. I’ll explain all three issues in some detail.

1. I just don’t buy the motivating issue—that the essence of a regression analysis can’t be explained to lay people. Explaining a regression result in terms of differences between averages is fine with me, but that doesn’t require a dichotomized analysis. We assume there is some true population difference between the average case in the top group (whether that be the top half, top third, top 27%, top quarter) and the average case in the bottom group. Let’s call those two population means muH and muL (for high and low). Our goal is to estimate that population mean difference. We, as statisticians, have two (at least) ways to estimate that mean difference muH – muL.

a. We do the split and compute the corresponding averages ybarH and ybarL and our estimate of muH – muL is ybarH – ybarL.

b. We regress y on x, as originally measured, to obtain yhat = b0 + b1 x. Then we estimate muH – muL using (b0 + b1 xbarH) – (b0 + b1 xbarL) = b1(xbarH – xbarL).

Both are unbiased estimates of muH – muL and both can be described as “our data estimate that the difference between the average person in the top group in the population and the average person in the bottom group in the population is …” The only difference between the two methods is that the variance of the estimate in (a) is greater than the variance of the estimate in (b). That implies that there will be many times when the estimate in (a) is either higher or lower than the estimate in (b). Hence, the two analyses will seldom agree on the magnitude of the raw effect. That gives researchers another degree of freedom to report the estimate that better fits their argument. We should use the more precise regression estimate (b) and explain it in terms of a mean difference between high and low groups. If we are communicating to a lay group we should give them our best estimate, and that is b1(xbarH – xbarL). We don’t need to explain to them how we got our estimate of muH – muL unless they ask. And even then the explanation isn’t that difficult: “We compared our prediction for a person with an average score in the top group to our prediction for a person with an average score in the low group.”
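As a quick check of this variance comparison, here is a simulation sketch of my own (not part of the correspondence): it forms both estimates of muH – muL under a median split, with an assumed slope of 0.5 and unit noise, and compares their variances across replications.

```python
import random
import statistics

random.seed(1)

def one_rep(n=200, b=0.5):
    # simulate y = b*x + e, then form both estimates of muH - muL
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [b * xi + random.gauss(0, 1) for xi in x]
    med = statistics.median(x)
    hi = [i for i in range(n) if x[i] > med]
    lo = [i for i in range(n) if x[i] <= med]
    # (a) raw difference of group means after the split
    est_a = statistics.fmean(y[i] for i in hi) - statistics.fmean(y[i] for i in lo)
    # (b) fit the regression, then evaluate it at the two group means of x
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    est_b = b1 * (statistics.fmean(x[i] for i in hi)
                  - statistics.fmean(x[i] for i in lo))
    return est_a, est_b

reps = [one_rep() for _ in range(4000)]
var_a = statistics.variance([r[0] for r in reps])
var_b = statistics.variance([r[1] for r in reps])
# both estimates center on the same target, but (a) is noticeably noisier
```

With these settings the two estimates agree on average while the split-means estimate (a) has substantially larger variance, which is exactly the extra researcher degree of freedom described above.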

2. Using extreme groups in a three-way split of a single continuous predictor: The error in your mathematical analysis, I believe, is the assumption that the residual variance remains constant and is therefore the same in Eq. 4 and Eq. 5 in Gelman & Park (2008). It is easy to disprove that assumption. The residual variance is V(e) = V(Y)(1-r^2). Discretizing changes the correlation between X and Y. Furthermore, restricting the values of Y to cases that have extreme values of X will necessarily increase V(Y). The exception is that when r = 0, V(Y) will be unchanged. Hence, your claims about the relative efficiency of extreme groups apply if and only if r = 0. In an attached Mathematica notebook (a pdf is also included if you don’t use Mathematica) and an attached R simulation file, I did a detailed analysis of the relative efficiency for different values of b in the model Y = b X + e. This graph summarizes my results:

The curves represent the relative efficiency (ratio of the variances of the estimates of the slopes) for, top to bottom, slopes of b = 0, 0.25, 0.5, 1, 2, and 3. Given the assumption that V(e) = 1 in the full data, these correspond to correlations, respectively, of r = 0, 0.24, 0.45, 0.71, 0.89, and 0.95. The top curve corresponds to your efficiency curve for the normal distribution in your Figure 3. And, as you claim, using an extreme-groups split (whether the keep fraction is 0.2, 0.25, 0.27, or 0.333) is superior to a median split at all degrees of relationship between X and Y. However, relative efficiency declines as the strength of the X,Y relationship increases. Note also that the optimal fraction to keep shifts lower as the strength of the relationship increases.

Are these discrepancies important? For me and my colleagues in the social sciences, I decided the differences were of interest to geeky statisticians like you and me but probably not of practical importance. Within the range of most social science correlations (abs(r) < 0.5), the differences in the efficiency curves are trivial. And if a social scientist felt compelled for reasons of explainability to discretize the analysis, then I certainly agree that doing an extreme-groups analysis is preferable to doing a median split. However, if a researcher studying a physical industrial process (where the correlation is likely very high and high precision is desired) were tempted to do an extreme-groups analysis because it would be easier to explain to upper management, I would strongly advise against it. The relative efficiency is likely to be extremely low. On the right axis I’ve indexed the factor by which the sample size would need to be increased to compensate for the loss of efficiency. The price to be paid is extremely high.

3. When two or more correlated variables are analyzed via multiple regression, discretizing a continuous variable is a particularly bad idea not only because of reduced efficiency, but more importantly because discretizing changes the correlational structure of the predictors, and that leads to bias in the parameter estimates. Most of the discussion in the set of median-split papers in JCP concerned whether one could get away with splitting a single continuous variable that was to be analyzed in multiple regression with another continuous variable, or as a covariate in an ANCOVA design. We thought both the considerable loss of power and the induced bias as a function of the predictor correlation were adequate reasons to reject such dichotomizations. I will be interested to see what your take is on that.

However, I believe that doing an analysis with two discretized variables, whether by median splits or by “thirds,” is a terrible idea because of the bias it induces. For median splits of two predictors with a bivariate normal distribution with correlation rho = 0.5, I can show analytically that the correlation between the dichotomized predictors will be 0.33, resulting in a confounding of the estimated slopes. Specifically, the expected coefficient estimates are b1* = (5 b1 + b2)/6 and b2* = (b1 + 5 b2)/6, where b1 and b2 are the true slopes. That is not good science. In the case of trichotomizing the two predictors and then using the extreme four corners of the 3 x 3 cells, I can show analytically that the predictor correlation INCREASES from 0.5 to 0.7. You can see why the correlation is enhanced in the bivariate distribution with correlation rho = 0.5 in this contour plot:

Using only the four extreme cells makes the correlation appear stronger.

I haven’t yet worked through analytically the bias this will cause, but I have experimented with simulations and observed that there is an enhancement bias for the coefficients. If one coefficient is larger than the other, then the value of the larger coefficient is unbiased but the value of the smaller coefficient is increased (I’ve been working with all positive coefficients). For example, when predictors x and z are from a bivariate normal distribution with correlation 0.5 and the model is y = x + 2 z + e, the cut analysis yields coefficient estimates of 1.21 and 2.05. The 21% enhancement of the smaller coefficient isn’t just bad science, it isn’t science at all. The source of the problem can be seen in the comparison of two tables. The first table is the predicted means using the regression equation for the full model applied to the actual cell means for x and z.

-3.7   1.2
-1.2   3.7

The following table is the mean y values for each cell (equivalently, the model derived from the cut variables):

-4.0   1.0
-1.0   4.0

In other words, the cut analysis exaggerates the differences in the cell means. This arises because the cut analysis forces a false orthogonal design. This is confounding in the same sense that bad experimental designs confound effect estimates.

A particularly disturbing example is for the model y = x + 0*z + e, the coefficients for the cut analysis are 0.96 and 0.11, a spurious effect for z. This can be seen in the table of means for the four groups:

-1.3  -1.1
 1.0   1.3

In fact, the columns should have been identical, as in

-1.22  -1.23
 1.22   1.21

consistent with the null effect for z. This spurious effect is akin to the problem of spurious effects due to dichotomizing two variables identified by Maxwell & Delaney (1993).
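McClelland's spurious-effect phenomenon is easy to reproduce qualitatively. The sketch below is my own and uses simple median splits of both predictors rather than his extreme-corners analysis, so the numbers differ from his 0.96 and 0.11, but the point is the same: the continuous analysis recovers the true model y = x + 0*z + e while the cut analysis produces a clearly positive coefficient for z.

```python
import math
import random

random.seed(3)
n = 100_000
rho = 0.5
# bivariate normal predictors with correlation rho; true model: y = x + 0*z + e
x = [random.gauss(0, 1) for _ in range(n)]
z = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]
y = [xi + random.gauss(0, 1) for xi in x]

def ols2(u, v, w):
    # two-predictor least squares via the centered 2x2 normal equations
    mu, mv, mw = sum(u) / n, sum(v) / n, sum(w) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suw = sum((a - mu) * (c - mw) for a, c in zip(u, w))
    svw = sum((b - mv) * (c - mw) for b, c in zip(v, w))
    det = suu * svv - suv ** 2
    return ((svv * suw - suv * svw) / det, (suu * svw - suv * suw) / det)

b_cont = ols2(x, z, y)                # continuous analysis: close to (1, 0)
dx = [1 if a > 0 else -1 for a in x]  # median split of each predictor
dz = [1 if b > 0 else -1 for b in z]
b_cut = ols2(dx, dz, y)               # cut analysis: spurious positive z effect
```

The spurious z coefficient appears because splitting weakens the dx,dz correlation relative to the x,z correlation, so the cut regression cannot fully credit x for the variation that z merely shares with it.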

In short, discretizing two continuous predictors has no place in the responsible data analyst’s toolbox. At the end of section 2.8 you describe doing the appropriate full analysis as an option. I strongly disagree this is optional—it is a requirement.

You present bivariate analyses across years both for a continuous analysis (Figure 5) and an extreme-groups analysis (Figure 6). If both ways of estimating the effects were unbiased and equally efficient, we would expect the rank order of a given effect across years to remain the same as well as the rank order of the three effects for a given year to remain constant. Neither seems to be the case. The differences are not large relative to the standard error so perhaps these differences are just due to the increased variability of the discretized estimates. However, if religious attendance and income are correlated and especially if the degree of this correlation changes over the years, then I suspect that some of the differences between Figures 5 and 6 are due to bias induced by using discretized correlated predictors. I think the logits of Figure 5 transformed back to probability differences would have been more appropriate and no more difficult to explain.

I also am attaching a 5th paper in the JCP sequence—our effort at a rebuttal of their rebuttal that we posted on SSRN.

For the quick version, here’s McClelland’s note, which begins:

Gelman & Park (2008) argue that splitting a single continuous predictor into extreme groups and omitting the middle category produces an unbiased estimate of the difference and, although less efficient than using the continuous predictor, is less destructive than the popular median split. In this note I show that although their basic argument is essentially true, they overstate the efficiency of the extreme splits. Also, their claims about optimal fractions for each distribution ignore a dependency of the optimal fraction on the magnitude of the correlation between X and Y.

In their Equations 4 and 5, Gelman & Park assume that the residual variance of Y is constant. It is easy to show that is not the case when discretizing a continuous variable, especially when using extreme groups. . . .

I don’t have time to look at this right now, but let me quickly say that I prefer to model the continuous data, and I consider the discretization to just be a convenience. I’ll have to look at McClelland’s notes more carefully to see what’s going on: is he right that we were overstating the efficiency of the comparison that uses the discretized variable? Stay tuned for updates.

**P.S.** I don’t want to make a big deal about this, but . . . this is the way to handle it when someone says you made a mistake in a published paper: you give them a fair hearing, you don’t dismiss their criticisms out of hand. And it’s more than that: if you have a reputation for listening to criticism, this motivates people to make such criticisms openly. Everybody wins.

The post Beyond the median split: Splitting a predictor into 3 parts appeared first on Statistical Modeling, Causal Inference, and Social Science.

My reply:

I think it does not generally make sense to break a variable into 2 parts. Breaking into 3 parts is better; see this paper from 2008. We recommend splitting a predictor into three parts and then coding the trichotomized variable as -1, 0, 1. This allows the ease of interpretation of the median split (you can just compare the upper to lower parts) but at a great increase in efficiency.
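Here is a minimal sketch of what that looks like in practice. The data are hypothetical (a unit slope and unit noise are my assumptions); the -1, 0, 1 coding and the upper-versus-lower comparison follow the recommendation above.

```python
import random
import statistics

random.seed(4)
n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]  # assumed slope of 1, for illustration

# cut at the empirical tertiles and code the thirds as -1, 0, 1
xs = sorted(x)
c1, c2 = xs[n // 3], xs[2 * n // 3]
code = [-1 if xi < c1 else (1 if xi > c2 else 0) for xi in x]

# the trichotomized comparison: mean of y in the upper third vs. the lower third
upper = statistics.fmean(yi for yi, c in zip(y, code) if c == 1)
lower = statistics.fmean(yi for yi, c in zip(y, code) if c == -1)
diff = upper - lower
```

The difference of the two group means is the whole analysis: no regression output to explain, just "the average outcome in the top third minus the average in the bottom third."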

Key figure:

For pure efficiency it’s still best to keep the continuous information, of course, but if you have a goal of clarity in exposition and you are willing to dichotomize, I say: trichotomize instead.

We used this trick in Red State Blue State to compare incomes of rich and poor voters, and rich and poor states, without having to run regressions (which I felt would add an unnecessary level of complexity in exposition). Indeed, David Park and I performed the analysis that led to this paper as a side project during the writing of Red State Blue State.

I had the impression this idea of trichotomizing was well known in psychometrics (when we got our paper back from the journal reviewers, they pointed us to a paper by Cureton from 1957 and papers from Kelley in 1928 and 1939!) so I was surprised to see the above papers which were all about dichotomizing. In many applications, though, I’ve seen people dichotomize in ways that seem wasteful and inappropriate. So I guess it’s necessary for experts to continue to write papers explaining why dichotomizing is generally a bad idea.

I suspect that part of the motivation for dichotomizing is that people like to think deterministically. For example, instead of giving people a continuous depression score and recognizing that different people have different symptoms at different times, just set a hard threshold: then you can characterize people deterministically and not have to think so hard about uncertainty and variation.

As Howard Wainer might have said at some point or another: people will work really hard to avoid having to think.

The median split indeed: what a joke! Truly a crime to throw away data like that, unless you really have way more than you need.

The post I already know who will be president in 2016 but I’m not telling appeared first on Statistical Modeling, Causal Inference, and Social Science.

One debate in political science right now concerns how the economy influences voters. Larry Bartels argues that Q14 and Q15 impact election outcomes the most. Doug Hibbs argues that all 4 years matter, with later growth being more important. Chris Wlezien claims that the first two years don’t influence elections but the second two do.

After 2000, Larry Bartels and John Zaller used Bayesian model averaging (BMA) to assess why election models overestimated the Gore vote in their pre-election forecast. Erikson, Bafumi and Wilson (2002) had some beefs with their arguments and conclusions. But both groups agreed BMA is useful for understanding elections. Do you think BMA could help evaluate these arguments and the uncertainties surrounding them?

My reply:

There are interesting debates on how the economy influences voters and also how presidents influence the economy, and in both cases the question of timing comes up. With regard to the particular question above, I haven’t looked at the details myself, but my guess is that the data at hand would not be enough to decide among these various theories. Ultimately, “N” is not large, and you have to use outside criteria to decide what model to go with. I doubt that Bayesian model averaging will give you much here. But I guess it won’t hurt.

The post Top 9 questions to ask a statistician appeared first on Statistical Modeling, Causal Inference, and Social Science.

I am a student at . . . We have been given an assignment that requires us to interview a professional in the criminal justice field who performs, or has performed, statistical analyses on social science related data. . . . We are supposed to collect information pertaining to job description, job responsibilities, typical research projects involved in, etc.

Would you be willing to allow me to interview you via email with questions pertaining to those concepts? It would only take a few minutes of your time and would be greatly appreciated!

I said sure. Here are the questions I got:

1. Name, title and contact information

2. Job Description, Salary (completely optional)

3. Job Responsibilities

4. Typical Activities

5. Types of Research Activities Involved in

6. Types of Statistical Analyses you engage in

7. Any recommendations to be made to students interested in obtaining a similar job

8. Any education or training that can best prepare students for a similar job

9. Aspects of your job that you most enjoy/least enjoy

And here was my reply:

1. Info is on my webpage

2. Research, teaching and service

3. Teaching classes, advising students, participating in curriculum design, doing research

4. Computing, writing, teaching, meetings with collaborators

5. Social science, public health, statistical methods

6. Fitting models, graphing data, graphing fitted models

7. Develop useful skills, work on real problems and when you get stuck, ask lots of questions. Accept that your work will be imperfect. Criticize your own work mercilessly and always work on doing better.

8. Training in statistics or related fields such as quantitative political science, economics, or sociology

9. Most enjoy working with colleagues on interesting projects. Nothing that I least enjoy!

The post If a study is worth a mention, it’s worth a link appeared first on Statistical Modeling, Causal Inference, and Social Science.

Can’t the NYT provide a link or an explicit reference to the JAMA Internal Medicine article underlying this OpEd? A reader could then access the original piece and judge its credibility for himself.

I replied: Yes, very tacky of the author not to even mention the authors of the study, nor to give a title or a link. And poor practice of the NYT editors not to demand this. Also ironic that one of his policy recommendations is to “require that doctors provide patients with data about a procedure, including its rate of success, complications and the like, before every major intervention”—but he can’t be bothered even to provide a link to the study.

P.S. I’m not saying that every fact or even every obscure reference in an article needs to be referenced. Readers can always use Google and Wikipedia on their own. But when the entire basis of a column is a published study, it’s poor form not to link, and even poorer form not to say who did the study.

P.P.S. Sometime after I posted this, the NYT editors slipped in a link to the original study, which is by Anupam Jena, Vinay Prasad, Dana Goldman, and John Romley. See here for the original version of the op-ed which had no link. The newspaper version of course had no link, nor did it name any of the study’s authors.

The post Flatten your abs with this new statistical approach to quadrature appeared first on Statistical Modeling, Causal Inference, and Social Science.

Philipp Hennig, Michael Osborne, and Mark Girolami write:

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. . . . We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

“A call to arms,” huh? That sounds very British indeed. In any case, this all seems very important, the idea of treating computing problems as statistical problems. Worth a look.

The post Benford lays down the Law appeared first on Statistical Modeling, Causal Inference, and Social Science.

A few months ago I received in the mail a book called An Introduction to Benford’s Law by Arno Berger and Theodore Hill. I eagerly opened it but I lost interest once I realized it was essentially a pure math book. Not that there’s anything wrong with math, it just wasn’t what I wanted to read.

But, hey, the book got reviewed by Frank Benford himself (well, actually, Frank Benford the grandson of Frank Benford himself), and the review has some math too. Should be of interest to some of you.
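For reference, Benford’s law says the leading digit d should appear with probability log10(1 + 1/d), so digit 1 appears about 30% of the time. A quick check using powers of 2, a standard example of a Benford-distributed sequence (my illustration, not from the book or the review):

```python
import math
from collections import Counter

# Benford's expected first-digit probabilities: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# first digits of 2^1 .. 2^5000 (Python handles the big integers exactly)
N = 5000
counts = Counter(int(str(2 ** k)[0]) for k in range(1, N + 1))
freqs = {d: counts[d] / N for d in range(1, 10)}

# the empirical frequencies track the Benford probabilities closely
max_gap = max(abs(freqs[d] - benford[d]) for d in range(1, 10))
```

The fraud-detection idea in the paper below is the same comparison run in reverse: if reported figures deviate too far from the Benford probabilities, something may be off.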

**P.S.** On the applied end, Ethan Rouen sends along a recent paper, Financial statement errors: evidence from the distributional properties of financial statement numbers, in which Dan Amiram, Zahn Bozanic, and Rouen “use Benford’s Law to measure the amount of error contained in a firm’s financial statements and show how the law can be used to help predict fraud.”

Here’s an example:

Which somehow reminds me of this classic plot:

The post 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences appeared first on Statistical Modeling, Causal Inference, and Social Science.

The University of California at Riverside is hiring 4 open rank positions in Design-Based Statistical Inference in the Social Sciences. I [Cummins] think this is a really exciting opportunity for researchers doing all kinds of applied social science statistical work, especially work that cuts across traditional disciplinary boundaries.

Relevant disciplines include, but are not limited to, Business, Economics, Education, Medicine (Epidemiology and Public Health), Political Science, Public Policy, Sociology, and Statistics/Biostatistics. We seek candidates who excel at developing, testing, and applying cutting-edge research designs and statistical methods for causal identification. Successful candidates might make theoretical and methodological contributions to causal inference; develop novel experimental designs; or conduct Bayesian meta-analysis, program evaluation, applied econometrics, or political methodology. They should also show an interest in working across traditional disciplines and an ability to attract extramural funds. Review of the applications will begin January 8, 2016 and will continue until the position is filled. Senior application materials should be submitted to https://aprecruit.ucr.edu/apply/JPF00468. Junior application materials should be submitted to https://aprecruit.ucr.edu/apply/JPF00469.

UCR is embarking on a major new hiring initiative (http://clusterhiring.ucr.edu) that will add 300 tenured and tenure-track faculty in 33 cross-disciplinary areas and invest in research infrastructure to support their work. This initiative will build critical mass in vital and emerging fields of scholarship, foster truly cross-disciplinary work and further diversify the faculty at one of America’s most diverse research universities.

Here’s the job posting.

The post 4 California faculty positions in Design-Based Statistical Inference in the Social Sciences appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Stan Puzzle 2: Distance Matrix Parameters appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>**Part I: Constrained Parameter Definition**

Define a Stan program with a transformed matrix parameter d that is constrained to be a K by K distance matrix. Recall that a distance matrix must satisfy the definition of a metric for all i, j:

* *positivity* : d[i,j] >= 0

* *self distance* : d[i,j] = 0 iff i = j

* *symmetry* : d[i,j] = d[j,i]

* *triangle inequality* : d[i,j] <= d[i,k] + d[k,j] for all k
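To pin down the definition, here is a small Python checker for the four conditions. This is an illustration only — the puzzle itself asks for a Stan transform — but it’s handy for verifying a candidate transform’s output numerically:

```python
def is_distance_matrix(d, tol=1e-9):
    """Check that a square matrix (list of lists) satisfies the metric
    conditions: positivity, self distance, symmetry, triangle inequality."""
    K = len(d)
    for i in range(K):
        if abs(d[i][i]) > tol:                 # self distance: d[i,i] = 0
            return False
        for j in range(K):
            if d[i][j] < -tol:                 # positivity
                return False
            if i != j and d[i][j] <= tol:      # d[i,j] = 0 only if i = j
                return False
            if abs(d[i][j] - d[j][i]) > tol:   # symmetry
                return False
            for k in range(K):                 # triangle inequality
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    return False
    return True

# Pairwise distances of points on a line always form a distance matrix
pts = [0.0, 1.0, 3.0]
d = [[abs(a - b) for b in pts] for a in pts]
print(is_distance_matrix(d))  # True
```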

**Part II: Modeling Assumptions**

Now suppose there are noisy measurements y[n] of the distance between points ii[n] and jj[n]. This corresponds to a Stan program with the following data block.

data {
  int<lower=0> K;  // num dims
  int N;           // num observations
  int ii[N];       // first point of observation n
  int jj[N];       // second point of observation n
  real y[N];       // distance measurements (could bound below by 0)
}

A likelihood that assumes independent normal noise on the measurements is defined as follows.

model {
  for (n in 1:N)
    y[n] ~ normal(d[ii[n], jj[n]], sigma);
}

This assumes a positive-constrained parameter sigma for the noise scale.

parameters {
  real<lower=0> sigma;  // measurement noise scale
  ...

Feel free to give it a prior taking into account the distance scales involved in the problem and the measurement process.

There are then (K choose 2) free parameters, which may be (a) left unconstrained, (b) constrained to be positive, or (c) transformed to a proper distance matrix satisfying the distance metric conditions. What is the effect on the posterior of these three modeling assumptions?

**Part III: Complexity**

What’s the complexity of evaluating the likelihood with and without the distance-matrix constraint on the parameters?

**Hints**

*Hint 1 (I)* : There should be (K choose 2) [i.e., K * (K - 1) / 2] “raw” parameters (either unconstrained or constrained to be positive) which then need to be scaled.

*Hint 2 (I)* : See the manual chapter “Transformation of Constrained Variables” and in particular the section on lower and upper bounded variables if you want hints on the Jacobians necessary.

*Hint 3 (I)* : Order the transform so the Jacobian’s easy to compute.

*Hint 4 (II)* : The only way I know how to do this is by simulation, but a bunch of the other Stan developers probably just know the answer.

*Hint 5 (III)* : The answer should be an expression involving N and K, such as O(f(N, K)) for some function f.

*Hint 6 (III)* : Count the number of expressions that get evaluated in the transform and in the model block.

The post Stan Puzzle 2: Distance Matrix Parameters appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Tip o’ the iceberg to ya appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Paul Alper writes:

The Washington Post ran this article by Fred Barbas with an interesting quotation:

“Every day, on average, a scientific paper is retracted because of misconduct,” Ivan Oransky and Adam Marcus, who run Retraction Watch, wrote in a New York Times op-ed in May.

But, can that possibly be true, just for misconduct alone and not including honest mistakes? Science researchers would then seem to belong in the same league as used car dealers.

My reply: I don’t give two poops about Dr. Anil Potti but, sure, with over a million scientific papers published a year, I’d think that we should be seeing tens of thousands of retractions a year. Just as a point of reference, I’ve published a few hundred research papers and retracted two of them (or I’ve run corrections which I consider the equivalent of retraction). I think I’m a pretty good scientist so I’m guessing that other people’s retraction rates should be higher than my approx 1/2 of 1%.

If tens of thousands of papers should be retracted each year, that comes to something like 100 a day. So if only 1 paper a day is retracted, my guess is that something like 99% of the papers that should be retracted, aren’t.
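The back-of-the-envelope arithmetic, spelled out in a few lines of Python (the one-million and tens-of-thousands figures are the rough numbers from above):

```python
papers_per_year = 1_000_000          # roughly a million scientific papers per year
gelman_rate = 0.005                  # Gelman's own retraction rate, ~1/2 of 1%
print(papers_per_year * gelman_rate) # 5000.0 per year at that rate; "tens of
                                     # thousands" if others' rates are higher

should_be_per_year = 36_500          # i.e., ~100 per day, the post's round figure
observed_per_year = 365              # ~1 misconduct retraction per day
print(1 - observed_per_year / should_be_per_year)  # ~0.99: the "99%" guess
```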

The post Tip o’ the iceberg to ya appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post I like the Monkey Cage appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>For better or worse, political science does not have a high profile in the news media or, I think, in the public policy world. We sometimes get quoted on some technical matters regarding campaigns, polls, and elections, but, other than that, political scientists’ inputs into the public discourse are sporadic. Sure, there was “bowling alone,” and “broken windows,” and “the clash of civilizations,” but I don’t get a sense that, when political issues come up, political scientists are the go-to sources for explanation and commentary.

That might be just fine. Maybe the contributions of political science are pretty much irrelevant to politics. It’s not my place to say that news coverage of these issues should change.

But one thing this all *does* mean is that when political scientists *do* enter the discussion, it can be in a weird, contextless way, not representative of the field of political science and not necessarily making much sense at all.

One problem, I suspect, is that because poli sci is not on journalists’ radar screen, they have no way of evaluating the input they get from political scientists. It’s as if they were reviewing literature in translation from some distant northern land.

I was thinking about this after coming across a horrible op-ed in the New York Times the other day, written by someone named Gerard Alexander who is labeled as a political scientist.

I have nothing against Gerard Alexander either as a person or a political scientist. I did not happen to be familiar with his work but I see on Google that it’s been cited many times, and it might be excellent. He perhaps was having a bad day when writing that op-ed.

But the editors at the NYT had no way of evaluating this piece. It was by a political scientist, and . . . who knows, really?

I took the opportunity on the Monkey Cage to extract something useful (I hope) out of Alexander’s op-ed, as to me it was an interesting, if unwitting, example of political polarization.

And this brings me to the point of today’s post. In the Monkey Cage, political science is presented in context. I’m not saying every post on the Cage is perfect, far from it. Some recent Monkey Cage claims that I’ve publicly disagreed with include “Liberals smell better to other liberals than to conservatives,” “Here’s how a cartoon smiley face punched a big hole in democratic theory,” and “Could non-citizens decide the November election?” But that’s fine: research is messy, and we also have room to give second opinions. But there is a context, there’s a steady stream of political science, rather than what we see in the NYT op-ed page, which is random things that happen to pop in.

I’m hoping the Monkey Cage will have a positive impact, not just at the Washington Post, but also at other media outlets, as editors start to sense that political science is a field of research and scholarship and not just an excuse for people to polemicize.

The post I like the Monkey Cage appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post First, second, and third order bias corrections (also, my ugly R code for the mortality-rate graphs!) appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>As an applied statistician, I don’t do a lot of heavy math. I did prove a true theorem once (with the help of some collaborators), but that was nearly twenty years ago. Most of the time I walk along pretty familiar paths, just hoping that other people will do the mathematical work necessary for me to fit my models (for example, taking care of all the intricacies of implementing differential equation models in Stan, or developing the mathematical tools necessary to derive algorithms to sample from difficult distributions).

Every once in a while, though, I’m reminded that a baseline level of mathematical expertise allows me (and others with similar training) to see problems from a distance and resolve them as necessary. This sort of mathematical skill can be nearly invisible while it is being applied, and even afterward it’s not always apparent what was being done.

Mathematical understanding can be used not just to *solve* a well-formulated problem; it also helps us decide what problems are worth solving in the first place.

I thought of this general point after some back-and-forth regarding a recently published article by Anne Case and Angus Deaton on trends in death rates. If you haven’t been following this story on the blog, you can read my recent Slate article for some background.

The study was first summarized as an increase in death rates for 45-54-year-old non-Hispanic white Americans (see, for example, Ross Douthat and Paul Krugman), but after “age adjustment”—that is, correcting for the change in age distribution, standardizing to a common distribution of ages—the pattern looks much different. We then learned more by looking at other ages and breaking up the data for men and women. The biggest part of the story is a comparison to mortality trends in other countries, but I won’t get into that now. Here I’ll be focusing on the U.S. data.

**First-order correction**

What I want to talk about is the value of a mathematical understanding of different sorts of bias correction, a kind of thinking that is known by many statisticians but is rarely part of the formal curriculum—we learn it “on the street,” as it were.

Let’s start with a first-order bias. Here’s a graph of #deaths among 45-54-year-old non-Hispanic whites in the U.S., based on data taken directly from the CDC website:

But that’s just raw number of deaths. The population is increasing too. Let’s take a look:

Hey—the population increased and then decreased in this age group! That’s the baby boomers entering and leaving the 45-54 category. Anyway, this population pattern tracks the #deaths pattern pretty closely.

Looking at trends in number of deaths without adjusting for population is like looking at nominal price trends without adjusting for inflation. It’s a first-order bias, and (almost) everyone knows not to do it.

**Second-order correction**

So the natural step is to look at changes in mortality *rate*, #deaths/#people in this group:

But then we have to worry about another bias. As noted above, the baby boom generation was moving through, and so we’d expect the average age among 45-54-year-olds to be increasing, which all by itself would lead to an increase in mortality rate via the aging of the group.

Let’s check:

As expected, the 45-54-year olds are getting older. But what’s happening with 2001? Is that for real? Let’s just double-check by pulling off ages from another dataset:

Yup, it seems real. Just quickly, let’s consider 2001. 2001-55=1946, and the jumpiness of the lines at the start of the above graph is tracking corresponding jumps in the number of babies born each year during the 1940s.

OK, the next question is: How would the change in age distribution affect the death rate in the 45-54 category? In other words, what is the bias in the above raw mortality curve, due to age composition?

We can do a quick calculation by taking the death rates by single year of age in 1999 and using them, along with each year’s age distribution, to track the mortality rate in the 45-54 group *if there were no change in underlying death rates by age*. Thus, all the changes in the graph below represent the statistical artifact of age composition:
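In code, that counterfactual looks like this. The rates and age distributions below are made up for illustration (the real ones come from the CDC data):

```python
ages = list(range(45, 55))

# Hypothetical 1999 death rates by single year of age (per 100,000);
# rates rise with age, as in the real data
rate_1999 = {a: 300 + 25 * (a - 45) for a in ages}

def artifact_rate(age_distribution):
    """Aggregate 45-54 death rate implied by FIXED 1999 age-specific rates
    and a given age distribution (proportions summing to 1)."""
    return sum(rate_1999[a] * p for a, p in age_distribution.items())

young = {a: (10 - (a - 45)) / 55 for a in ages}  # cohort skewed toward 45
old = {a: ((a - 45) + 1) / 55 for a in ages}     # cohort skewed toward 54

# Aging of the cohort alone pushes the aggregate rate up
print(artifact_rate(young) < artifact_rate(old))  # True
```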

Now let’s line up this curve with the changes in raw death rate:

About half the change can be attributed to aggregation bias.

We can sharpen this comparison by anchoring the expected-trend-in-death-rate-just-from-changing-age-composition graph at 2013, the end of the time series, instead of 1999. Here’s what we get:

And here it’s clear: since 2003, *all* the changes in raw death rate in this group can be explained by changes in age composition.

The much-heralded increase in death rates among middle-aged non-Hispanic white Americans happened entirely in the first part of the series.

In summary so far: this adjustment for changes in age composition is a *second-order* bias correction, less important than the first-order correction for raw population changes but large enough to qualitatively change the trend story.

**Third-order correction**

Now that we’ve identified the bias, we can correct by producing age-adjusted death rates: for each year in time, we take the death rates by year of age and average them, thus computing the death rate that would’ve been observed had the population distribution of 45-54-year-olds been completely flat each year.
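The adjustment itself is just a weighted average of the age-specific rates under a fixed standard age distribution — uniform weights here. A Python sketch with hypothetical rates:

```python
def age_adjusted_rate(rates_by_age, standard=None):
    """Direct age standardization: average the age-specific rates under a
    fixed standard age distribution (uniform by default)."""
    ages = sorted(rates_by_age)
    if standard is None:
        standard = {a: 1 / len(ages) for a in ages}
    return sum(rates_by_age[a] * standard[a] for a in ages)

# Hypothetical age-specific death rates (per 100,000) for one year
rates = {a: 300 + 25 * (a - 45) for a in range(45, 55)}
print(age_adjusted_rate(rates))  # uniform-weights average of 300..525
```

Swapping in a different standard distribution (say, the 1999 or 2013 age mix) just changes the weights, which is the third-order choice discussed below.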

The age-adjusted numbers show an increasing death rate until 2003-2005 and then a steady state since then:

But this is only one way to perform the age adjustment. Should we be concerned, with Anne Case, that “there are a very large number of ways someone can age-adjust this cohort” and that each method comes “with its own implicit assumptions, and that each answers a different question”?

The answer is no, we need not be so concerned with exactly how the age adjustment is done in this case. I’ll show this empirically and then discuss more generally.

First the empirics. I performed three age adjustments to these data: first assuming a uniform distribution of ages 45-54, as shown above; second using the distribution of ages in 1999, which is skewed toward the younger end of the 45-54 group; and third using the 2013 age distribution, which is skewed older.

Here’s what we found:

The results don’t differ much, with no change in the qualitative trends and not much change in the numbers either.

It’s important to do *some* age adjustment, but it doesn’t matter so much exactly *how* you do the age adjustment. In math jargon, age-adjustment corrects a second-order bias, while the choice of age adjustment represents a third-order correction.

That’s why, when I did my analysis a week or so ago, I performed a simple age adjustment. Based on my statistical experience and general mathematical understanding, I had a sense that the choice of age adjustment was a third-order decision that really wouldn’t have any practical implications. So I didn’t even bother to check. I did it here just for the purpose of teaching this general concept, and also in response to Case’s implication that the whole age-adjustment thing was too assumption-laden to trust. Case was making the *qualitative* point that any adjustment requires assumptions; I’m making a *quantitative* analysis of how much of a difference these assumptions make.

**Lateral thinking**

So far I’ve been focusing entirely on the headline trends in mortality among 45-54-year-old non-Hispanic whites. But there’s nothing stopping us from grabbing the data separately for men and women:

These separate age-adjusted trends tell a new and interesting story. All the bias correction in the world won’t get you there; you have to pull in new data.

To put it another way: Age adjustment was a necessary first step. But now that we’ve dealt with that, we can move forward and really start learning from the data.

We can also look at other ages and other groups; see here for some graphs.

**Concerns about data quality**

When I first heard about Case and Deaton’s paper, I didn’t think about age adjustment at all; I was alerted to the age aggregation bias by an anonymous commenter. More recently this commenter has raised skepticism regarding the ethnic categories in the CDC data. I haven’t checked this out at all but it seems worth looking into. Changes in categorization could affect observed trends.

Turtles all the way down.

**Asking the question is the most important step**

As I wrote the other day, the point of bias correction and data inspection is not “gotcha!” Rather, the point of correcting biases and questioning the data is that the original researchers are studying something interesting and important, and we want to help them do better.

**And here’s the R script**

I put my R code and all my data files here. You should be able to run the script and create all the graphs I’ve blogged.

Warning: the code is ugly. Don’t model your code after my practices! If any of you want to make a statistics lesson out of this episode, I recommend you clean the code. Meanwhile, perhaps the very ugliness of the script can give you a feeling of possibility, that even a clunky programmer like me can perform an effective graphical data analysis.

I have the feeling that Hadley could’ve done all of this analysis in about an hour using something like 20 lines of code.

There’s lots more that can be done; I’ve only looked at a small part of the available data. The numbers are public; feel free to do your own analyses.

The post First, second, and third order bias corrections (also, my ugly R code for the mortality-rate graphs!) appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Pareto smoothed importance sampling and infinite variance (2nd ed) appeared first on Statistical Modeling, Causal Inference, and Social Science.

Last week Xi’an blogged about an arXiv paper by Chatterjee and Diaconis which considers the proper sample size in an importance sampling setting with infinite variance. I commented on Xi’an’s post, and the end result was my guest post on Xi’an’s og.

I made an additional figure, below, to summarise how the Pareto shape parameter and Pareto smoothed importance sampling relate to this. You can read more in our arXived paper with Andrew.

When MCMC got popular, the reasons IS became less popular were (1) the problem of finding good proposal distributions and (2) the possibility of infinite variance of the weight distribution (which leads to infinite variance of the IS estimate). If the variance is finite, the central limit theorem holds and people have (mostly) assumed that everything is going to be fine; if the variance is infinite, people have assumed that all hope is lost.

Chen and Shao (2004) showed that the rate of convergence to normality is faster when more moments exist, so it’s useful to examine the existence of higher moments, too. Koopman, Shephard, and Creal (2009) proposed making a sample-based estimate of the existence of the moments, using a generalized Pareto distribution fitted to the tail of the weight distribution. The number of existing moments is less than 1/*k* (when *k*>0), where *k* is the shape parameter of the generalized Pareto distribution. Koopman, Shephard, and Creal (2009) focused on a hypothesis test of whether the variance is finite.

The following figure shows why it is useful to look at the continuous *k* value instead of the discrete number of moments (and why Pareto smoothing is great). The proposal distribution was Exp(1) and the target distribution was Exp(θ) for varying values of θ. The figure shows results with basic IS (blue) and with Pareto smoothed IS (yellow). The vertical axis is the estimated mean divided by the true mean (i.e., values close to 1 are good). The violin plots present the distributions of the results from 1000 repeated simulations (with different random number seeds), each using 1000 draws from the proposal distribution. For each case the estimated Pareto shape value *k* is shown.

The figure shows that when the variance is finite the errors are smaller, but the Pareto shape value *k* gives additional information about the distribution of errors in both the finite- and infinite-variance cases (since we are using a finite number of samples, we don’t in practice observe infinite variance, but it is still a useful concept for describing asymptotic properties). The figure also shows that Pareto smoothing can reduce the error in all cases and that we can get useful estimates even when *k*≥1/2, although it can’t work miracles when the proposal distribution is too narrow compared to the target distribution. The PSIS estimate, as described in our paper, always has finite variance, but when *k*≥1/2 it will have some bias, which seems to be small for *k*<0.7.
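If you want to reproduce the flavor of this simulation, here is a minimal pure-Python version of the basic-IS estimates only (no Pareto smoothing). For the Exp(1) proposal and Exp(θ) target, the importance weights have finite variance exactly when θ > 1/2:

```python
import math
import random

def is_estimate(theta, n=20000, seed=0):
    """Self-normalized importance-sampling estimate of the mean of an
    Exp(theta) target, using draws from an Exp(1) proposal."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]
    # importance weight = target density / proposal density
    ws = [theta * math.exp((1 - theta) * x) for x in xs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# Finite-variance case (theta = 2 > 1/2): stable, close to true mean 1/2
print(round(is_estimate(2.0), 3))
# Infinite-variance case (theta = 0.3): much noisier from run to run
print(round(is_estimate(0.3), 3))  # true mean is 1/0.3, about 3.33
```

Repeating the θ = 0.3 case with different seeds shows the heavy-tailed behavior the violin plots summarize; PSIS would stabilize those estimates at the cost of a small bias.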

Our PSIS-LOO paper with Andrew and Jonah shows one example of the benefit of PSIS and we’ll soon publish other examples where PSIS improves the computation.

The post Pareto smoothed importance sampling and infinite variance (2nd ed) appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post “David Brooks And Jeremy Paxman To Judge The Golden Giraffes” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I don’t think I have much of a chance here, not because of the judging—I’d trust Brooks and Paxman to recognize good writing—but because the competition includes some heavy hitters, including Dan Davies with a meta-blog-post called The Verjus Manifesto, Sara Paretsky on The Detective As Speech, and Charles Pierce with . . . well, it almost doesn’t matter what, since anything by Charles Pierce is gonna be good. Hey—isn’t it cheating to include a professional writer on this list? It would be like Robin Williams coming on to the Gong Show to win a quick $516.32.

Other notable items on the list include Peter McCleery’s gimmicky-but-well-executed Thank You For Calling Mamet’s Appliance Centre and Evan Ratliff’s grabbily-titled-but-disappointing My Wife Found My Email In The Ashley Madison Database. (Spoiler: it was someone else who used his email address.)

Also a post by blog favorite Scott Adams. Not about Charlie Sheen, though.

Unfortunately it’s hard to quickly get a sense of all hundred articles cos you have to click through to 10 separate pages.

My own article on the list is a post from January on Psychology And Behavioural Economics, which begins stirringly:

I’ve been coming across these issues from several different directions lately, and I wanted to get the basic idea down without killing myself in the writing of it. So consider this a sketchy first draft.

Charles Pierce, it ain’t. Seriously, though, I appreciate being appreciated, and I appreciate that they’ll consider something that’s exploratory and bloggy, that’s not so polished. I like The Browser; they published two 5books interviews with me and one with my sister.

Top 10 vote getters get to be judged by David Brooks! You can vote here.

The post “David Brooks And Jeremy Paxman To Judge The Golden Giraffes” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Just Filling in the Bubbles appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Collin Hitt writes:

I study wrong answers, per your blog post today. My research focuses mostly on surveys of schoolchildren. I study the kids who appear to be just filling in the bubbles, who by accident actually reveal something of use for education researchers.

Here’s his most recent paper, “Just Filling in the Bubbles: Using Careless Answer Patterns on Surveys as a Proxy Measure of Noncognitive Skills,” which concludes:

Using two national longitudinal datasets, I [Hitt] show that careless answer patterns from adolescent respondents are negatively predictive of later educational attainment, independent of cognitive ability and other traditionally-measured noncognitive skills. I posit that careless answers, as I have quantified them, proxy as a behavioral measure of a negative noncognitive trait.

No graphs, though. Gotta work on the graphs. Need some graphs. graphs graphs graphs

The post Just Filling in the Bubbles appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Is 8+4 less than 3? 11% of respondents say Yes! appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Obviously, some people respond randomly. For open ended questions, it is pretty easy to determine the fraction who do so. In some research I did with online surveys, “asdf” was the most common and “your mama” was 9th. This fraction is small (maybe 1-2%). But the fraction of random responses is harder to identify (and is likely higher) for items with binary and multichotomous response options, since many respondents must realize their random responding can go undetected. Hence, you can’t use the random response rate from open ended questions to assess this. You can do other things to try to estimate it (like ask them “Is 8+4 less than 3?” YES NO). But two problems remain: the fraction saying YES is a blend of random and perverse responding and both of these things vary across items. Dramatically.

I put up a few questions on Google Consumer Surveys with large samples. Random + perverse response rates differ dramatically:

Do you have a fraternal twin? YES NO

4% Yes. *Pretty close to truth*

Do you have an identical twin? YES NO

8% Yes. *Pretty far from truth, but funnier to lie about?*

Is 8+4 less than 3? YES NO

11% Yes. *Profound innumeracy, confusion, or just fucking with me?*

Were you born on the planet Neptune? YES NO

17% Yes. *Perhaps using it metaphorically, as in “My friends say I’m a weird guy”?*

In a recently published paper I [Frederick] averred that you could just multiply the number of people who endorse something crazy by the number of response options to estimate the fraction of random responders. But this is obviously wrong.

So, basically, I’m not sure what to do. You could look at response latencies or something, but then you end up imposing some arbitrary thresholds which are unsatisfying, much like removing outliers without any good justification that the responses are not sincere.

My reply: These responses are hilarious. I believe there is some literature on this sort of thing but I’m not the expert on it. I’ve looked a bit into list experiments (you can search my blog; I have a post with a title like, A list of reasons not to trust list experiments), but there seems to be a lot of information in the actual responses. Maybe you could learn something by regressing these on demographics, and also by seeing whether the same people who give wrong answers for some of these give wrong answers for others.

The post Is 8+4 less than 3? 11% of respondents say Yes! appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Asking the question is the most important step appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>In statistics, the glamour often comes to those who perform a challenging data analysis that extracts signal from noise, as in Aki Vehtari’s decomposition of the famous birthday data which led to the stunning graphs on the cover of BDA3.

But, from a social-science point of view, the biggest credit has to go to whoever asked the question in the first place. For the birthday example, that credit goes to Becca Levy, Pil Chung, and Martin Slade who were the ones who noticed the pattern of excess births on Valentine’s day and fewer on Halloween. From a statistics perspective, the challenge began there. But nothing would’ve been done without the question being asked.

I’ve been thinking about this recently in the context of the recent discussion of trends in mortality rates among middle-aged whites:

I worked super-hard to make the graph above, along with lots of other displays like this grid that helped me understand what was going on in the data. And people such as this anonymous commenter have been looking carefully into issues of data quality.

All of this work is useful and relevant, I do believe. But, from the social science perspective, what’s far more important is asking the question in the first place, which is what Case and Deaton did in their recent article. The data have been out there for any of us to grab and graph and analyze. But we didn’t. Case and Deaton did, and that’s what got the ball rolling. (And, to be fair, they also rolled the ball most of the way.) I’m happy to have refined their analyses and, as noted yesterday, I wasn’t so thrilled by one of Case’s offhand remarks, but let me emphasize that all this discussion is predicated on their effort, on their knowing what to look at, which in turn derives from their justly well-respected research on public health and economic development.

That’s the big picture. Acknowledging a statistical bias correction is fine, and statisticians such as myself have our place in the research ecosystem, but all the bias correction and modeling and clever graphics in the world won’t help you if you don’t know what to look at. And in this particular example, I had no idea of looking at any of this until I was pointed to Case and Deaton’s work. Aggregation bias was an entry point for me into this problem, just as analyzing all 366 days was an entry point for Aki into the birthday problem. If we can correct and improve and expand existing analyses, that’s great, but in these cases none of our contributions could’ve happened without the work by the original authors.

It’s not Us vs. Them. It’s never Us vs. Them. It’s Us and Them. Or, perhaps more accurately, THEM followed by a little bit of us. And that’s one reason I want them to respect and understand us, not to fear us and be defensive. We want to be useful, which we can do by building upon their work and motivating them (not just the original researchers, but the whole field) to do more.

The point of a bias correction is not “gotcha!” Rather, the point of the bias correction is that *the original researchers are studying something interesting and important*, and we want to help them do better.

The post Asking the question is the most important step appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Why is it so hard for them to acknowledge a correction? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Anne Case (as quoted by Jesse Singal):

We spent a year working on this paper, sweating out every number, sweating out over what we were doing, and then to see people blogging about it in real time — that’s not the way science really gets done. . . . And so it’s a little hard for us to respond to all of the blog posts that are coming out. . . . And if this is all people shooting from the hip, I don’t think that’s any way to move science forward, to move the research forward.

Angus Deaton (as quoted by Kate Murphy):

There’s been a tremendous amount of recalculating of our numbers in the media and blogs. It’s all happening at the speed of light. You do worry whether it’s being done appropriately. But we’re not complaining.

*“We’re not complaining”*? That’s the best you can do???

How bout this:

We spent a year working on this paper, sweating out every number, sweating out over what we were doing, and we’re happy to see people blogging about it in real time.

We very much appreciate the effort put in by Laudan Aron, Lisa Dubay, Elaine Waxman, and Steven Martin; by Philip Cohen; and by Andrew Gelman to uncover the aggregation bias in our analysis, to correct for that bias, and to explore subtleties that we did not have a chance to get into in our paper. As Gelman noted, these corrections are in no way a debunking of our work—our comparisons of non-Hispanic American whites to groups in other countries and other ethnic groups still stand.

We think it’s great that, after our paper was published in PNAS, it was possible to get rapid feedback. Had it not been for bloggers, we’d still be in the awkward situation of people trying to explain an increase in death rates which isn’t actually happening. We join Paul Krugman and Ross Douthat in thanking these bloggers for their unpaid efforts on behalf of everyone interested in this research. We count ourselves lucky to live in an era in which mistakes can be corrected rapidly, so that we and others do not have to wait months or even years for published corrections which themselves could contain further errors.

As economists, we recognize that research work is always provisional, and that anyone studying the real world of human interactions has to accept that mistakes are part of the process. It is only through the efforts of our entire research community—publishing in journals, publishing in blogs, through informal conversations, whatever—that we move toward the truth. We always considered our PNAS paper to be just a single step in this process and we are glad that others have taken the trouble to correct some of our biases and omissions.

Again, we thank the many researchers who have taken a careful look at our analyses. It’s good to know that our main findings are not affected by the corrections, we welcome further research in this area, and we hope that future discussion of our work, both in the scientific literature and in the popular press, make use of the corrected, age-adjusted trends.

– Sincerely, Anne Case and Angus Deaton

P.S. We have heard some people criticize the researchers noted above because they published their work in blogs rather than in peer-reviewed journals. We would never make such a silly, uninformed criticism. Since appearing in print, our work has received a huge amount of publicity. And, to the extent that we made mistakes or did not happen to explain ourselves clearly enough, it is the responsibility of others to publish their corrections and explanations as rapidly as possible. Blogs are a great way to do this. Blogs, unlike newspaper interviews, allow unlimited space to develop arguments and to present graphs of data. And we are of course aware that peer-reviewed journals make mistakes too. We published our paper in the Proceedings of the National Academy of Sciences, a journal that last year published a notorious paper on himmicanes and hurricanes, another discredited paper claiming certain behavior by people whose ages end in 9, and another paper on demographics which neglected to apply a basic age adjustment. So, yes, publication in journals is fine, but we very much welcome researchers who are willing to stick out their necks and correct the record in real time on blogs.

See, that wasn’t that hard!

Case and Deaton did great work. No need for them to get so defensive and attack-the-messenger-y about it. Indeed, I had an email exchange with Angus Deaton a few days ago and he was completely polite and reasonable, directly pointing me to the data set that they had used in their paper. In my opinion, Case and Deaton have acted as exemplary scientists in this entire episode, with only the very minor exception of a difficulty in graciously handling public corrections of their work. Hence my disappointment. Really only a small thing in the grand scheme of things, but it still bugs me, hence this post.

And, contra Case, this sort of give-and-take, whether it appears in blogs or wherever, is *exactly* the way science really gets done. Read your Lakatos.

**P.P.S.** Regarding the line above, “trying to explain an increase in death rates which isn’t actually happening”: As I’ve emphasized throughout, the comparison to other countries remains interesting, as the age-aggregation bias is small compared to the observed declines of 20% in mortality rates in other countries and among other groups. But in the U.S. press there’s been a lot of explanation of why things are getting worse here, not just in a comparative but in an absolute scale. And, as the age-adjusted data show, the mortality rate in this group has been flat since 2005—increasing for women, decreasing for men, with an average change of just about zero. If people want to explain *that*, fine, but Case and Deaton have all of us to thank for the fact that people can now be focused on explaining this pattern, not an artifactual pattern of a steady increase in death rates which is what everyone was talking about before.

Or, as discussed in the comments, Case and Deaton don’t have to thank me, or Laudan Aron, Lisa Dubay, Elaine Waxman, and Steven Martin, or Philip Cohen, at all. But could they please refrain from slamming us? I’d like for Case and Deaton to show our work the same respect and consideration that we are showing theirs.

The post Why is it so hard for them to acknowledge a correction? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post “Should Prison Sentences Be Based On Crimes That Haven’t Been Committed Yet?” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Criminal sentencing has long been based on the present crime and, sometimes, the defendant’s past criminal record. In Pennsylvania, judges could soon consider a new dimension: the future.

Is that really new? I thought that the question of whether someone will re-offend was already considered in sentencing. For example, this bit from Errol Morris: “You can’t predict future violence, you can’t predict human behavior except in one instance: What those two doctors, Dr. Grigson and Dr. [John] Holbrook, would say in the penalty phase of a capital murder trial. They’ll say the defendant will kill again. And as a result, the jury imposes the death penalty.” So I’m not quite sure why Barry-Jester et al. say this is new. Maybe it’s just new for Pennsylvania? (Morris was talking about Texas.)

Anyway, here’s what they write:

As early as next year, judges [in Pennsylvania] could receive statistically derived tools known as risk assessments to help them decide how much prison time — if any — to assign. . . . The tools try to predict recidivism — repeat offending or breaking the rules of probation or parole — using statistical probabilities based on factors such as age, employment history and prior criminal record. They are now used at some stage of the criminal justice process in nearly every state. . . . But Pennsylvania is about to take a step most states have until now resisted for adult defendants: using risk assessment in sentencing itself. A state commission is putting the finishing touches on a plan that, if implemented as expected, could allow some offenders considered low risk to get shorter prison sentences than they would otherwise or avoid incarceration entirely. Those deemed high risk could spend more time behind bars.

I think the whole “behind bars” thing is overdone, but let’s set that aside. Conditional on our current policy (misguided, in my opinion) of prison time as the standard punishment, I don’t see anything wrong with sentences that are based on forecasts.

Barry-Jester et al. quote an opponent of the policy:

Sonja Starr, a University of Michigan law professor who has been a leading opponent of risk assessment, says it isn’t fair. “These instruments aren’t about getting judges to individually analyze life circumstances of a defendant and their particular risk,” she said. “It’s entirely based on statistical generalizations.”

But ultimately the decisions are based on individual information that goes into the forecast, thus “life circumstances,” no? To me, the title, “Should Prison Sentences Be Based On Crimes That Haven’t Been Committed Yet?”, is misleading, as the prison sentences would be based on past information, it’s just information that is statistically correlated with recidivism.

**P.S.** As an aside, I’m baffled by the political angle of the post. Barry-Jester et al. write:

The approach has bipartisan appeal: Among some conservatives, risk assessment appeals to the desire to spend tax dollars on locking up only those criminals who are truly dangerous to society. And some liberals hope a data-driven justice system will be less punitive overall and correct for the personal, often subconscious biases of police, judges and probation officers.

This baffles me. Liberals too want to spend less tax dollars on prisons, and liberals also want to emphasize locking up criminals who are truly dangerous to society. And conservatives also are concerned about biases of police, judges and probation officers. Bipartisan is fine, but I don’t understand this take on it.

The post “Should Prison Sentences Be Based On Crimes That Haven’t Been Committed Yet?” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Inference from an intervention with many outcomes, not using “statistical significance” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I have been reading your papers “Type S error rates for classical…” and “Why We (Usually) Don’t Have to Worry…” with great interest and would be grateful for your views on the appropriateness of a potentially related application.

I have a non-hierarchical dataset of 28 individuals who participated in a randomized control trial (half treated, half control) and I am testing for treatment effects on 11 different outcome variables. I would like to adjust for multiple inference while maximizing power on my few observations. I find the Type S error rate a very intuitive and appropriate concept for the policy question at hand of whether the data suggest that the 14 treated individuals generally performed better than the controls.

In the multiple outcomes section of “Don’t Worry” you mention that disparate domains is a bit of a stretch. In my data, the 11 outcomes divide into 4 domains, however I think it could be reasonable to view all of them as reflecting a general underlying level of effort. My main question is whether you think it natural to adapt the multi-level model approach to multiple outcomes measured on non-hierarchical data, and if so, if you had any guidance about how to implement this or additional references you could point me towards.

My reply: I don’t have great references on this: in short, I talk the talk but I haven’t really walked the walk! But here are a few thoughts:

1. I think of Type S and Type M errors as a byproduct of statistical procedures (in addition to the two papers linked above, I also recommend my recent paper with Carlin), not as a goal in themselves. That is, I would not set a procedure to aim for a particular Type S or Type M error rate; rather, I would fit a model and perform Bayesian inference, then I could do an analysis to estimate the type S and type M error rates for various posterior summaries, under various assumptions.

2. Yes, I do think it makes sense to fit a multilevel model to the outcomes, using indicators for domains.

3. More and more, I’m thinking the key is to use informative priors.
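To make point 1 concrete, here’s a minimal sketch of that kind of after-the-fact analysis: given an assumed true effect size and the standard error of the estimate, simulate hypothetical replications and read off the power, the Type S (wrong sign) rate, and the Type M (exaggeration) factor among the estimates that reach statistical significance. This follows the general “retrodesign” idea in my paper with Carlin; the specific numbers below (a true effect of 0.1 on a standard error of 0.5) are illustrative, not from any real study.

```python
import numpy as np
from scipy import stats

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Power, Type S rate, and Type M (exaggeration) factor for a classical
    test, given an assumed true effect size and standard error.
    A sketch of the retrodesign calculation in Gelman and Carlin (2014)."""
    rng = np.random.default_rng(seed)
    threshold = stats.norm.ppf(1 - alpha / 2) * se   # significance cutoff
    est = rng.normal(true_effect, se, n_sims)        # hypothetical replications
    signif = np.abs(est) > threshold                 # which reach significance
    power = signif.mean()
    type_s = np.mean(est[signif] * np.sign(true_effect) < 0)  # wrong sign
    type_m = np.mean(np.abs(est[signif])) / abs(true_effect)  # exaggeration
    return power, type_s, type_m

# A small effect measured noisily: significance is rare, and the estimates
# that do clear the bar are badly exaggerated and sign-unreliable.
power, type_s, type_m = retrodesign(true_effect=0.1, se=0.5)
```

The point of running it with numbers like these is to see how bad things get in a noisy design: power is tiny, and conditioning on significance gives estimates that are an order of magnitude too large, with a nontrivial chance of the wrong sign.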

In summary, though, yes, I do think a Bayesian multilevel approach is the way to go. Otherwise I think you’re doing all sorts of tricks and dodges to avoid overfitting.

Short term, here’s some quick advice:

a. Try looking at an aggregate, summing up all the 11 outcomes.

b. Don’t demand or even expect to get statistical significance. That’s a dead end. In a noisy setting, statistical significance doesn’t tell you anything at all. Conversely, *not* finding statistical significance should not be a discouragement. If you set statistical significance as your goal, you’re setting yourself up for all sorts of trouble. So I recommend making clear from the start that this isn’t your goal. I think this is better than the tempting strategy of hoping for statistical significance and then thinking you’ll solve that problem down the road.
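Here’s a minimal sketch of advice (a) plus the partial-pooling idea from point 2, on simulated data standing in for the real dataset (28 people, 14 treated, 11 outcomes). The effect sizes and the crude empirical-Bayes shrinkage below are placeholders for a full multilevel fit, not an analysis of anyone’s actual data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup mirroring the question: 28 people, half treated,
# 11 standardized outcomes sharing an underlying treatment effect.
n, k = 28, 11
treat = np.repeat([0, 1], n // 2)
true_effects = rng.normal(0.3, 0.15, k)          # assumed, for illustration
y = rng.normal(0, 1, (n, k)) + np.outer(treat, true_effects)

# Advice (a): aggregate the outcomes, giving one comparison instead of eleven.
agg = y.mean(axis=1)
agg_effect = agg[treat == 1].mean() - agg[treat == 0].mean()

# Per-outcome estimates and squared standard errors (difference in means).
est = y[treat == 1].mean(0) - y[treat == 0].mean(0)
se2 = y[treat == 1].var(0, ddof=1) / 14 + y[treat == 0].var(0, ddof=1) / 14

# Crude empirical-Bayes partial pooling toward the grand mean: shrink each
# estimate in proportion to its noise relative to the between-outcome
# variance (a stand-in for fitting the full multilevel model).
mu = est.mean()
tau2 = max(est.var(ddof=1) - se2.mean(), 0.01)   # between-outcome variance
shrunk = mu + (tau2 / (tau2 + se2)) * (est - mu)
```

The pooled estimates are less variable than the raw per-outcome comparisons, which is the point: with 14 people per arm, the individual outcome estimates are too noisy to take at face value.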

The post Inference from an intervention with many outcomes, not using “statistical significance” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Hanging Chad appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Wattenberg writes:

Some more highlights from Willson’s roster of custom-named clients:

Rad Fulton

Cal Bolder

Rand Saxon

Race Gentry

Chance Nesbitt

Dack Rambo

Van Williams

Dare Harris

Trax Colton

Today, the Willson names sound like clichés. They’re the kind of formula-hunk names skewered by The Simpsons’ “actor Troy McClure” and mimicked by countless gay porn stars. But it was Willson who defined that formula, and parents responded to its allure.

It’s a gay world. We just live in it.

The post Hanging Chad appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Bayesian Computing: Adventures on the Efficient Frontier appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Bayesian Computing: Adventures on the Efficient Frontier appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Pass the popcorn appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Rodney Sparapini writes:

I got this in my inbox today. I thought this might be of interest to you and your blog readers.

It’s not at all of interest to me but it might interest some of my readers. I’m posting it here because there’s something amazing about seeing this intense dispute about something I’ve never heard of.

OK, here it is:

From: ELM Exposed ELM.exposed@gmail.com

Subject: The ELM Scandal: What You May Not Know about the Extreme Learning Machines

Date: Tue, 4 Aug 2015 04:09:35 -0700

Dear Researcher,

The objective of launching this homepage (http://elmorigin.wix.com/originofelm) is to present the evidences regarding the tainted origins of the extreme learning machines (ELM). As we would like all readers to verify the facts within a short period of time (perhaps 10 to 20 minutes), we have uploaded a dozen of PDF files with highlights and annotations clearly showing the following:

1. The kernel (or constrained-optimization-based) version of ELM (ELM-Kernel, Huang 2012) is identical to kernel ridge regression (for regression and single-output classification, Saunders ICML 1998, as well as the LS-SVM with zero bias; for multiclass multi-output classification, An CVPR 2007).

2. ELM-SLFN (the single-layer feedforward network version of the ELM, Huang IJCNN 2004) is identical to the randomized neural network (RNN, with omission of bias, Schmidt 1992) and another simultaneous work, i.e., the random vector functional link (RVFL, with omission of direct input-output links, Pao 1994).

3. ELM-RBF (Huang ICARCV 2004) is identical to the randomized RBF neural network (Broomhead-Lowe 1988, with a performance-degrading randomization of RBF radii or impact factors).

4. In all three cases above, Huang got his papers published after excluding a large volume of very closely related literature.

5. Hence, all 3 “ELM variants” have absolutely no technical originality, promote unethical research practices among researchers, and steal citations from original inventors.

For easy verifications on the origins of the ELM, with annotated PDF files, please visit: http://elmorigin.wix.com/originofelm

Please forward this message to your contacts so that others can also study the materials presented at this website and take appropriate actions, if necessary.

ELM: The Sociological Phenomenon

Since the invention of the name “extreme learning machines (ELM)” in 2004, the number of papers and citations on the ELM has been increasing exponentially. How can this be imaginable for the ELM comprising of 3 decade-old algorithms published by authors other than the ELM inventor? This phenomenon would not have been possible without the support and participation of researchers on the fringes of machine learning. Some (unknowingly and a few knowingly) love the ELM for various reasons:

• Some authors love the ELM, because it is always easy to publish ELM papers in an ELM conference or an ELM special issue. For example, one can simply take a decade-old paper on a variant of RVFL, RBF or kernel ridge regression and re-publish it as a variant of the ELM, after paying a small price of adding 10s of citations on Huang’s “classic ELM papers”.

• A couple of editor-in-chiefs (EiCs) love the ELM and offer multiple special issues/invited papers, because the ELM conference & special issues will bring a flood of papers, many citations and therefore high impact factors to their low quality journals. The EiCs can claim to have faithfully worked within the peer-review system, i.e. the ELM submissions are all rigorously reviewed by ELM experts.

• A few technical leaders, e.g. some IEEE society officers, love the ELM, because it rejuvenates the community by bringing in more activities and subscriptions.

• A couple of funding agencies love the ELM, because they would rather fund a new sexy name, than any genuine research.

One may ask: how can something loved by so many be wrong?

A leading cause of the current Greek economic crisis was that a previous government showered its constituents with jobs and lucrative compensations, in order to gain their votes, thereby raising the debt to an unsustainable level. At that time, the government behavior was welcome by many, but led to severe consequences. Another example of popularity leading to a massive disaster can be found in WW II as Hitler was elected by popular votes.

The seemingly small price to pay in the case of the ELM is the diminished publishing ethics, which, in a long run, will fill the research literature with renamed junk, thereby making the research community and respected names, such as IEEE, Thomson Reuters, Springer and Elsevier, laughing stocks. Similar to that previous Greek government and its supporting constituents, the ELM inventor and his supporters are “borrowing” from the future of the entire research community for their present enjoyment! It is time to wake up to your consciousness.

Our beloved peer-review system was grossly abused and failed spectacularly in the case of the ELM. It is time for the machine learning experts and leaders to investigate the allegations presented here and to take corrective actions soon.

5 Easy but Proven Steps to Fame

1. The Brink of Genius: Take a paper published about 20 years ago (so that the original authors have either passed away, retired, or are too well-established/generous to publicly object. Unfortunately, pioneers like Broomhead and Pao have passed away). Introduce a very minor variation, for example, by fixing one of the tunable parameters at zero (who cares if this makes the old method worse, as long as you can claim it is now different and faster). Rewrite the paper in such a way that plagiarism software cannot detect the similarity, so that you are not in any of the “IEEE 5 levels of plagiarism”. Give a completely new sensational name (hint: the word “extreme” sounds extremely sexy).

2. Publication: Submit your paper(s) to a poor quality conference or journal without citing any related previous works.

3. Salesmanship: After publishing such a paper, now it is time to sell the stolen goods! Never blush. Don’t worry about ethics. Get your friends/colleagues to use your “big thing”. Put up your Matlab program for download. Organize journal special issues, conferences, etc. to promote these unethical research practices among junior researchers who would just trust your unethical publications without bothering to read the original works published in the 1980s or 1990s. Of course, the pre-requisite for a paper to be accepted in your special issues/conferences is 10s of citations for your unethically created name and publications. Invite big names to be associated with your unethically created name as advisory board members, keynote speakers, or co-authors. These people may be too busy to check the details (with a default assumption that your research is ethical) and/or too nice to say no. But, once “infected” with your unethically created name, they will be obliged to defend it for you.

4. The Smoke Screen: Should others point out the original work, you claim not to know the literature while pointing to a minor variation that you introduced in the first place. Instead of accepting that your work was almost the same as the literature and reverting back to the older works, you promote your work by: (1) repeating the tiny variation; (2) excluding the almost identical works in the list of references or citing and describing them incorrectly; (3) excluding thorough experimental comparisons with nearly identical works in the literature so that worse performance of your minute variations will not be exposed; (4) making negative statements about competing methods and positive statements about your unethically created name without solid experimental results using words like “may” or “analysis”; (5) comparing with apparently different methods. You can copy the theories and proofs derived for other methods and apply to your method (with tiny variation from those in the old literature) claim that your method has got a lot of theories while others do not have.

5. Fame: Declare yourself as a research leader so that junior researchers can follow your footsteps. Enjoy your new fortune, i.e., high citations, invited speeches, etc. You don’t need to be on the shoulders of giants, because you are a giant! All you have to do to get there is to follow these easy steps!

One can call the above steps “IP” (Intelligent Plagiarism), as opposed to stupid (verbatim) plagiarism specified by the IEEE in “5 levels”. The machine learning community should feel embarrassed if “IP” (Intelligent Plagiarism) was originally developed and/or grandiosely promoted by this community, while the community is supposed to create other (more ethical) intelligent algorithms to benefit the mankind.

In mid-July 2015, G.-B. Huang posted an email on his ELM@mlist.ntu.edu.sg emailing list. This email was forwarded to ELM.exposed@gmail.com for our responses. As usual, this email was meaningless and our remarks are attached.

And also this pdf, which you can read if you’re not tired of this yet.

I just have a few comments about the above message:

1. Hitler never received much more than a third of the vote in a fair election.

2. I thought Elsevier was already a laughing stock?

3. I’d hardly call this a path to fame, given that I’d never heard of this Huang character.

4. There’s nothing wrong with putting up a Matlab program for download, right?

5. I’m kinda doubting that invited speeches will lead to fortune. Free flights, sure, but probably not much more than that.

The post Pass the popcorn appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Taleb’s Precautionary Principle: Should we be scared of GMOs? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I was wondering if you could (or had already) weigh(ed) in on Nassim Taleb’s Precautionary Principle as it applies to GMOs?

I’ve attached his working paper with Rupert Read, Raphael Douady, Joseph Norman, and Yaneer Bar-Yam. It can also be found at his site, fooledbyrandomness.com.

See also his response to a critique from a biologist.

A search for ‘Taleb’ on your site brought up reviews of his books, but I found no mention of the Precautionary Principle.

My reply: I don’t agree with everything Taleb writes but I’m generally sympathetic to his perspective.

I liked this bit from Taleb’s response linked to above:

Many of the citations you are asking for fall within the “carpenter fallacy” that we present in the text, i.e. that discussions about carpentry are not relevant to and distract from identifying the risks associated with gambling, even though the construction of a roulette wheel involves carpentry.

This is not to say that Trevor Charles is wrong here and that Nassim Taleb is right—I feel unmoored in this whole discussion—but I do like the quote.

Speaking more generally, I suppose that Taleb’s precautionary principle could fruitfully be expressed in terms of tradeoffs. Here’s the principle:

If an action or policy has a suspected risk of causing severe harm to the public domain (affecting general health or the environment globally), the action should not be taken in the absence of scientific near-certainty about its safety. Under these conditions, the burden of proof about absence of harm falls on those proposing an action, not those opposing it.

As a statistician, I tend to be skeptical about arguments based on “the burden of proof” or “scientific near-certainty,” as they have a bit of the flavor of the one-sided bet—but what is relevant here is the idea of correlated risks.

As many observers have noted, the U.S. is in many ways a hyper-individualistic society, and social policies are often evaluated in an individualistic way. But there’s a big difference between risks that are uncorrelated or only weakly correlated in the population (for example, getting killed in a car crash) and highly correlated risks (with the paradigmatic examples being asteroid impacts and global wars).

As Taleb has written, his own attitudes on extreme events derive in part from his understanding of what happened to Lebanon in the 1970s, when a longstanding apparent equilibrium was revealed as being unstable, and which gave him a general wariness about picking up pennies in front of a steamroller.

This is not really an answer to what policy should be on genetically modified organisms, but I do think that it makes sense, for the reasons Taleb and his collaborators say, to consider these global risks associated with GMOs in a different way than we treat the individual-level risks associated with electric power lines and cancer, or whatever.

The post Taleb’s Precautionary Principle: Should we be scared of GMOs? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Death rates have been increasing for middle-aged white women, decreasing for men appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Hoo boy. Looky here, something interesting: From 1999 to 2013, the death rate for middle-aged white women steadily increased. The death rate for middle-aged white men increased through 2005, then decreased.

Since 2005, the death rate has been *rising* for middle-aged white women and *declining* for middle-aged white men. Not by a lot—we’re talking a change of 4% over a decade—but this is what we see.

It’s funny. We’re so used to the narrative that things are getting worse for men, it’s so hard to be a guy in the modern era, etc. But in this particular case it’s the middle-aged women who are doing worse (relatively speaking; of course the absolute death rates remain much higher for men than for women, that’s just how things always are).

**Background: Why age adjustment is needed**

As Anne Case and Angus Deaton noted in a much-talked-about recent paper, the mortality rate among middle-aged white Americans has been roughly constant in recent decades, even while it’s dropped dramatically among other groups and in other countries.

Here’s the graph of the raw data of mortality among 45-54-year-old non-Hispanic whites in the U.S.:

But that curve, which shows a steady increase since 1999, is wrong—or, should I say, misleading. As we discussed recently in this space (see here, here, and here), it can be tricky to interpret raw death rates binned across ages, especially in the U.S. What with the baby boom generation moving through, the average age in the 45-54 group crept up from 49.3 in 1999 to 49.7 in 2013.

An increase of 0.4 years might not sound like much, but mortality rate increases a lot by age—more than doubling between the ages of 45 and 54—so even a small shift in average age can cause a big shift in the observed trends.

Here’s what we get after adjusting for age:

The flat pattern after 2005 is the sum of the increasing trend for women and the down slope for men.

**What’s the point?**

The published curves were biased because they did not correct for the changing age distribution within the 45-54 bin. When we make the adjustment we find something different: no longer a steady increase. And when we look at men and women separately, we find something more.

This update has not yet percolated through the news media.

For example, here’s Paul Krugman in the New York Times:

There has been a lot of comment, and rightly so, over a new paper by the economists Angus Deaton (who just won a Nobel) and Anne Case, showing that mortality among middle-aged white Americans has been rising since 1999.

Ross Douthat in that same newspaper yesterday:

Starting around the turn of the millennium, the United States experienced the most alarming change in mortality rates since the AIDS epidemic. . . . concentrated among less-educated, late-middle-aged whites.

Julia Belluz writes in vox.com about “the shocking rise in mortality rates among middle-aged white Americans.”

And Angus Deaton quoted in the Times the other day:

If we want to be more precise about the age range involved, we could say that for all single years of age from 47 to 52, mortality rates are increasing.

All these reports should be corrected to make it clear that **the increase stopped in 2005. Since 2005, mortality rates have increased among women in this group but not men**.

The age-aggregation bias did come up in this online NYT article, but the focus there was on the comparison between 1999 and 2013, so it did not come up that the net increase stopped after 2005, and that men and women’s mortality rates have been going in opposite directions since then.

**Where does age adjustment make a difference?**

First, I followed Deaton’s advice and downloaded death data from the CDC Wonder site. Second, I looked not just at the range 45-54 but also at the age decades before and after. Third, I looked at non-Hispanic whites, also at Hispanic whites, also at African Americans.

Then I computed the raw and age-adjusted death rates for each decade of age for each group, to get a sense of where age adjustment matters.

I plotted death rates since 1999, and here’s what I found:

It turns out that the only place where a lack of age adjustment really changes the story is . . . non-Hispanic whites aged 45-54. Too bad about that! But good that we checked.

Of course I may well have some “gremlins” in my analyses too. Anyone who wants can and should feel free to go to the data and find out what I garbled or missed.

**Bring on the data**

Finally, I broke down the numbers by sex and single year of age. Here’s what happened from 1999-2015 among all three ethnic groups:

And here’s a summary:

That pattern among 45-54-year-olds? It was happening in the younger decade too.

**One more time**

Let me emphasize that this is all in no way a “debunking” of the Case and Deaton paper. Their main result is the comparison to other countries, and that holds up just fine. The place where everyone is confused is about the trends among middle-aged non-Hispanic white Americans.

The story being told is that there was something special going on, with an increase in mortality in the 45-54 age group. Actually what we see is an increasing mortality among *women* aged 52 and younger—nothing special about the 45-54 group, and nothing much consistently going on among men. Perhaps someone can inform Douthat and Krugman and they can modify their explanations accordingly. I’m sure they’ll be up to the task.

The post Death rates have been increasing for middle-aged white women, decreasing for men appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post “Using prediction markets to estimate the reproducibility of scientific research” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Concerns about a lack of reproducibility of statistically significant results have recently been raised in many fields, and it has been argued that this lack comes at substantial economic costs. We here report the results from prediction markets set up to quantify the reproducibility of 44 studies published in prominent psychology journals and replicated in the Reproducibility Project: Psychology. The prediction markets predict the outcomes of the replications well and outperform a survey of market participants’ individual forecasts. This shows that prediction markets are a promising tool for assessing the reproducibility of published scientific results. The prediction markets also allow us to estimate probabilities for the hypotheses being true at different testing stages, which provides valuable information regarding the temporal dynamics of scientific discovery. We find that the hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%) and that a “statistically significant” finding needs to be confirmed in a well-powered replication to have a high probability of being true.

I replied: I think the idea is interesting and I have a lot of respect for the research team. But I am not so happy with the framing of these hypotheses as being “true” or “false,” and I think that statements such as “the probability of being true” generally have no real meaning. Consider, for example, one of those notorious social priming studies such as the claim that giving elderly-related words causes people to walk more slowly. Or one of those silly so-called evolutionary psychology studies such as the claim that single women were more likely to support Obama for president during certain times of the month. Yes these claims are silly and were overhyped, but are they “false”? I think it’s pretty meaningless to even ask the question. Certainly the effects in question won’t be exactly zero; more to the point, the effects will vary by person and by scenario. It makes sense to talk about average effects and variation in effects and the probability of a successful replication (if the criteria for “success” are defined clearly and ahead of time), but “the probability the hypothesis is true”? I don’t think so.

In summary I am supportive of this project. I think it’s a good idea and I’m interested in seeing it go further. I think they could do better by moving away from a true/false or even a replicate/not-replicate attitude, and instead think more continuously about uncertainty and variation. I don’t think it would be hard for them to move away from formulations such as “the probability that the research hypothesis is true” into a more sensible framing.

**P.S.** Robin Hanson offers thoughtful comments. I’m impressed by what Hanson has to say, partly because they are interesting remarks (no surprise given that he’s been thinking hard about this topic for many years), but more because it would be so easy for him to just view this latest project as a vindication of his ideas. But instead of just celebrating his success (as I think I’d do in this situation), he looks at all this with a critical eye. I might disagree with Robin about John Poindexter, but he (Robin) does good here.

The post “Using prediction markets to estimate the reproducibility of scientific research” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Pathological liars I have known appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>There was this guy in college who just made stuff up. It was weird, then funny, then sad. He was clearly an intelligent guy but for some reason felt the need to fabricate. One thing I remember was something about being a student of Carl Sagan at Cornell—at the same time as he was taking 11 classes a semester at MIT. But there were lots of other lies, things that were easily checkable. I never knew his background—he seemed like a nice guy but perhaps he was never really a student, or maybe he was in the U.S. illegally, I have no idea, but maybe he was already living a lie and so he felt he might as well keep going.

The other guy was a student at an institution where I taught. It turned out he was lying about all sorts of stuff and he got kicked out of the program. The whole thing baffled me, especially when, after it was all over, one of the other grad students told us that they all knew this guy was a pathological liar. Why didn’t they tell the faculty? I have no idea.

Whassup with these pathological liars? I dunno, but maybe it’s some sort of principle of least effort. For me, lying is effortful and work is easy, so I’d rather work. For these guys, I’m guessing that it’s soooo easy to lie, but buckling down and working is tough. And, I guess, once they get in the habit of lying, they just do more and more of it. I’d think that, in order to avoid detection, they’d want to minimize the number of lies they tell. But I guess that’s not how they think.

I’d distinguish pathological liars of this sort from people such as Marc Hauser or Dr. Anil Potti or Ed Wegman or Diederik Stapel or Michael Lacour, whose misrepresentations seem pretty clearly instrumental. What’s characteristic about pathological liars is that they lie about things where they’re not really gaining from the lie, or where whatever gains they might obtain from the lie are trivial compared to the losses from being found out. I’d also distinguish them from people like Hillary Clinton, who has a habit of tweaking her stories to make them a bit more dramatic. Behavior that’s acceptable for David Sedaris but which I don’t like so much in a politician. Unfortunately, I can see the instrumental value in Clinton’s exaggerations, especially given the motivation a politician has to say what she thinks her audience wants to hear. Pathological lying seems different—it’s florid exaggeration just for the hell of it.

We’ll be discussing this in next week’s Perceptions 301 class.

**P.S.** Just to continue this, I find instrumental liars *disturbing* but I find pathological liars *scary*. A few months ago I had some indirect dealings with someone who was on the border of these two categories, a Nixonian type who was lying in a somewhat arbitrary and unnecessary way but using these lies in an aggressive way. Someone who would just make stuff up about me and then use this as a basis for an attack: what would this guy be capable of? I do not want to engage with someone like that. People like David Brooks or even Ed Wegman I can understand: they make mistakes (or, in Wegman’s case, ethically questionable decisions) and then don’t want to back down. And I can understand people like Marc Hauser or Ron Unz who think they have a true model of the world and so don’t want to be bothered with details. I don’t follow this approach but I kinda see where they’re coming from. Or people like all those Psychological Science researchers who, in all sincerity I assume, are using statistical methods that are the functional equivalent of the proverbial Tarot cards: sure, I’m bothered that they don’t do better but I understand that, by their lights, they’re working hard and following the rules. But the pathological liars, people like Ben Carson who will go to the trouble to make up an entire course at Yale just for the benefit of an already-implausible story, or this other guy I dealt with online, who scared me so much that I don’t even want to mention his name here: that scares me. A lot. It probably shouldn’t, and I’m probably displaying a disgracefully old-fashioned attitude toward mental illness. Given the casually negative attitudes many people have toward Tourette’s syndrome, I’m really the last person who should go around being creeped out by something as innocuous as pathological lying, maybe. So there you have it. Now I’m just tied up in knots.

**P.P.S.** Commenters have rightly pointed out that I may be overreacting to whatever the news media happen to want to focus on. Mark Palko reminded us of the history of the news media pouncing on the Clintons, and BrianB pointed to a report that Ben Carson’s story was based on an actual experience. In particular, if Carson did not “make up an entire course at Yale just for the benefit of an already-implausible story,” but instead he took a life experience and twisted it a bit, making it more dramatic, it’s not so different from what Hillary is so notorious for doing. In the language of my above post, Carson was acting instrumentally, not pathologically. He was writing a book so he wanted good stories so he exaggerated or made some things up to make the story better, which from a storytelling position makes sense. Just as it makes sense for Hillary Clinton to have expressed the risks she felt in traveling to war zones by saying that her plane was under fire. Or for that matter Joe Biden stealing somebody else’s biographical story because it worked well in a speech.

One reason that I may have been characterizing Carson’s stories as pathological rather than instrumental is that I was forgetting that, when he was writing a book, his goal was to sell books, he wasn’t running for president. And embellishing or even making up stories for an autobiography, that’s pretty standard practice: the goal is to give insight into the person, not to produce a documentary record. Carson’s later lies when running for president (see, for example, here) also fall in the instrumental category in that he’s denying something that looks bad.

The post Pathological liars I have known appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post You won’t believe these stunning transformations: How to parameterize hyperpriors in hierarchical models? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Isaac Armstrong writes:

I was working through your textbook “Data Analysis Using Regression and Multilevel/Hierarchical Models” but wanted to learn more and started working through your “Bayesian Data Analysis” text. I’ve got a few questions about your rat tumor example that I’d like to ask.

I’ve been trying to understand one of the hierarchical models revolving around rat tumors (Chapter 5). This is where there is a binomial model with p assigned a beta distribution. The Beta distribution has parameters $\alpha$ and $\beta$ which need a distribution for the full hierarchical model.

In order to create a noninformative distribution the book parametrizes the model in terms of $\frac{\alpha}{\alpha+\beta}$ and an approximation of the standard deviation $(\alpha+\beta)^{-1/2}$. (described here too http://andrewgelman.com/2009/10/21/some_practical/) I know you mentioned not favoring this approach anymore, but I’d still like to understand the modeling thinking/process that supports this if possible.

I have a few questions about this:

– Why use an approximation here for the parametrization rather than the actual standard deviation of the Beta distribution? When should approximations be used for reparametrization? For computational reasons?

– How did you arrive at this particular approximation?

– What connection, if any, does this have to a Pareto distribution? I tried parametrizing this model with a Pareto(1.5,1) distribution for $\alpha+\beta$ and a uniform distribution on $\alpha/(\alpha+\beta)$ and ended up with $p(\alpha,\beta)\propto (\alpha+\beta)^{-3/2}$ but the book’s approach seems to yield $p(\alpha,\beta)\propto (\alpha+\beta)^{-5/2}$ which disagrees with the gentleman writing into the blog in the link above.

My reply: As I’ve said, I’ve changed my views since writing that book in the early 1990s, but not all my newer perspective has been worked into the later editions of the book. In particular, I’m not so happy with noninformative priors, for two reasons:

1. We often have prior information, so let’s use it. Traditionally we pragmatic Bayesians have been hung up on the difficulty of precisely specifying our prior information—but it seems clear to me now that specifying weak prior information is better than specifying nothing at all.

2. With flat priors and a small number of groups, we can get a broad posterior distribution for the group-level variation, which in turn can lead to under-smoothing of estimates. In some contexts this is ok (for example, when the unpooled, separate estimates are taken as a starting point or default), but in other settings it’s asking for trouble, and the use of flat priors is basically a way to gratuitously add noise to the inference.

Anyway, back to the example. It seemed to make sense to put a prior on the center of the beta distribution and the amount of information in the beta distribution. These can be specified using mean and variance, but in this case the “effective sample size” seemed reasonable too. To put it another way: you ask, Why not parameterize in terms of the mean and variance? But in general that won’t work either, for example what would you do if you had a Cauchy prior, which has no mean and no variance?

A rule such as “parameterize using the mean and variance” is nothing but a guideline. So, when introducing this example into the book, I didn’t want to try to overly formalize this point. In retrospect, I actually think this was pretty mature of me! But maybe I should’ve explained a bit more. There’s a tradeoff here too: Not enough explanation and things are mysterious; too much explanation and the practical material gets lost in the verbiage (a point of which readers of this blog are well aware, I’m afraid).
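As an aside, the $(\alpha+\beta)^{-5/2}$ density that the reader computes can be checked mechanically: a flat prior on $(\alpha/(\alpha+\beta),\ (\alpha+\beta)^{-1/2})$ transforms to the $(\alpha,\beta)$ scale via the Jacobian of the map. Here is a sketch using sympy (my own illustration, not part of the original exchange):

```python
import sympy as sp

alpha, beta = sp.symbols("alpha beta", positive=True)
# BDA's reparameterization: the beta mean and an "effective sample size" scale
u = alpha / (alpha + beta)                 # center of the beta distribution
v = (alpha + beta) ** sp.Rational(-1, 2)   # approximate standard-deviation scale

# a flat prior on (u, v) induces p(alpha, beta) = |det J|, J = d(u,v)/d(alpha,beta)
J = sp.Matrix([u, v]).jacobian(sp.Matrix([alpha, beta]))
density = sp.simplify(sp.Abs(J.det()))
print(density)  # proportional to (alpha + beta)**(-5/2)
```

The determinant works out to $\tfrac{1}{2}(\alpha+\beta)^{-5/2}$, matching the book’s prior and the exponent mentioned in the question.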

The post You won’t believe these stunning transformations: How to parameterize hyperpriors in hierarchical models? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post 3 new priors you can’t do without, for coefficients and variance parameters in multilevel regression appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I am interested in finding out a good prior for the regression coefficients and variance components in a multi-level setting. For concreteness, let’s say we have a model like the following:

Level 1: Y_ijk | theta_ij ~(ind) N( theta_ij, sigma^2)

Level 2: theta_ij| mu_i ~(ind) N( mu_i, tau^2)

Level 3: mu_i | beta ~(ind) N( x_i’beta, delta^2)

One possibility is to assume independent proper uniform priors on wide intervals for beta and the standard deviations sigma, tau, and delta. Since all the distributions are proper, the posterior will be proper. I need to check whether anyone has discussed propriety of the posterior if we change these from proper uniform to improper uniform (e.g., delta uniform on (0,\infty)) in order to avoid sensitivity of the results to the interval widths (along the lines of your section 2.2).

Another possibility is to use improper uniform for beta (usually it does not cause problem with impropriety of posterior – but need to check) and independent half Cauchy on the standard deviation (need an appropriate scale for half Cauchy).

My reply:

If you have enough groups you can get away with just about anything, but in the real world you won’t have enough groups so I think it’s best for you to use an informative prior. I recommend weakly or strongly informative priors on sigma, tau, and delta. Maybe I’d call these sigma_y, sigma_theta, and sigma_mu, actually. For sigma_y you’ll have a lot of data so maybe you don’t even need to bother, but you can do it just for completeness. For sigma_theta and, especially, sigma_mu, you’ll want something real.

Here’s a start: make sure your data y and predictors x are all dimensionless or on a unit scale so that differences of 1 are large. Then put half-normal(0,1) priors on the 3 scale parameters (sigma_y, sigma_theta, sigma_mu) and independent normal(0,1) priors on the betas.

Aki would probably recommend t_7 and he’s probably right, but recently I’ve been just using normals. Of course these priors might not make sense for your problem. They’re the default: use them as an anchor or starting-off point.
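The advice above—put everything on unit scale, then use normal(0, 1) on coefficients and half-normal(0, 1) on the scale parameters—can be sketched in a few lines. This is a minimal numpy illustration with made-up data; the variable names and numbers are mine, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical raw data on arbitrary scales
x = rng.normal(50.0, 10.0, size=(200, 3))   # predictors
y = rng.normal(120.0, 15.0, size=200)       # outcome

# put y and x on unit scale, so that differences of 1 are large
x_unit = (x - x.mean(axis=0)) / x.std(axis=0)
y_unit = (y - y.mean()) / y.std()

# default priors on the standardized scale:
#   beta ~ normal(0, 1); sigma_y, sigma_theta, sigma_mu ~ half-normal(0, 1)
beta = rng.normal(0.0, 1.0, size=(4000, 3))
sigma_y, sigma_theta, sigma_mu = np.abs(rng.normal(0.0, 1.0, size=(3, 4000)))
```

In an actual fit these priors would go into the Stan or JAGS model block; the point of the sketch is just that the normal(0, 1) defaults only make sense after the standardization step.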

The post 3 new priors you can’t do without, for coefficients and variance parameters in multilevel regression appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post This is a workshop you can’t miss: DataMeetsViz appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>This looks like it was a great conference with an all-star lineup of speakers. You can click through and see the talks.

The post This is a workshop you can’t miss: DataMeetsViz appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post What happened to mortality among 45-54-year-old white non-Hispanics? It declined from 1989 to 1999, increased from 1999 to 2005, and held steady after that. appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The raw death rates for the group (which appeared in the Case-Deaton paper) are in red, and the age-adjusted death rates (weighting each year of age equally) are in black.

So . . . the age-adjusted mortality in this group increased by 5% from 1999 to 2005 and has held steady thereafter. But if you look at the raw data you’d be misled into thinking there was a steady increase. That’s the aggregation bias I’ve been talking about here and here.
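The mechanical difference between the raw and age-adjusted rates (the latter weighting each year of age equally) is easy to see in code. The numbers here are invented for illustration; only the two formulas are the point:

```python
import numpy as np

ages = np.arange(45, 55)
# hypothetical single-year-of-age data for one calendar year:
pop = np.linspace(2.4e6, 1.8e6, 10)       # population count at each age
rate = 260.0 * 1.08 ** (ages - 45)        # death rate per 100k, ~8% per year of age

deaths = rate * pop / 1e5
raw = deaths.sum() / pop.sum() * 1e5      # weights ages by actual population
adjusted = rate.mean()                    # weights each year of age equally

# the raw rate moves with the age composition even when age-specific rates don't
print(raw, adjusted)
```

Here the bin skews young, so the raw rate sits below the adjusted rate; as the baby boom ages through the bin, the raw rate drifts upward with no change in any age-specific rate, which is exactly the aggregation bias at issue.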

For some reason it’s not so easy to get the numbers before 1999. But, following Deaton’s tip, I grabbed the 1999-2013 data and made some plots. All are renormalized to be relative to 1999.

Based on my earlier analysis, I’m guessing that age-adjusted mortality in this group dropped pretty dramatically from 1989 to 1999. Hence the title of this post.

The natural next step is to break this one up by men and women, and by ethnic group. And someone should do this. But not me. I got a job, and this ain’t it.

**P.S.** In the original version of this post I referred to “non-Hispanic white men.” I don’t know why I wrote that. All these graphs are for non-Hispanic whites, both sexes. As noted above, it would be easy enough to do separate calculations for men and women, but I didn’t do that.

The post What happened to mortality among 45-54-year-old white non-Hispanics? It declined from 1989 to 1999, increased from 1999 to 2005, and held steady after that. appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Age adjustment mortality update appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Earlier today I discussed a paper by Anne Case and Angus Deaton in which they noted an increase in mortality rates among non-Hispanic white Americans from 1989 to 2013, a pattern that stood in sharp contrast to a decrease in several other rich countries and among U.S. Hispanics as well:

Interpretation of this graph is tricky though, because the “45-54” age group was, on average, younger at the beginning of this time series than at the end, what with the big fat baby boomer generation passing through (see image at top of page). Average age increased from 49.1 in 1989 to 49.7 in 2013. Not a huge increase, but not trivial either given the steady increase in mortality rate as a function of age (approximately 8% per year) among the middle-aged.

I did a quick calculation to estimate what we might expect to happen to the mortality rate in the 45-54 age group, just from the changing age distribution, and here’s what I found:

Based on this analysis, the entire increase in mortality among non-Hispanic white Americans aged 45-54 in the Case-Deaton graph can be explained by changing age composition. Sociologist Philip Cohen sliced the data in a somewhat different way and estimated that the change in age composition could explain about half of the increase.

As I wrote in my earlier post, the Case-Deaton result is still interesting because of the comparison to other countries (and to Hispanics within the U.S.): these other groups show declines in mortality rates of around 30%, which is much more than could be explained by any age-aggregation artifacts.

**Deaton replies**

I asked a colleague to pass this post along to Deaton, and he (Deaton) replied with the following data from the CDC showing deaths per 100,000 among white non-Hispanics in 1999 (not 1989, which was the beginning of the series shown above, but 1999; apparently the pre-1999 data are harder to grab) and 2013:

| Age | 1999 | 2013 | Change |
|-----|-------|-------|--------|
| 45 | 262.3 | 260.7 | -1.6 |
| 46 | 292.9 | 289.8 | -3.1 |
| 47 | 305.9 | 323.5 | 17.6 |
| 48 | 337.2 | 342.9 | 5.7 |
| 49 | 359.0 | 384.5 | 25.5 |
| 50 | 376.7 | 422.2 | 45.5 |
| 51 | 429.0 | 466.1 | 37.1 |
| 52 | 444.8 | 481.2 | 36.4 |
| 53 | 545.1 | 526.7 | -18.4 |
| 54 | 555.3 | 572.7 | 17.4 |

Deaton pointed out that the mortality rate increased among most age groups. And, indeed, the average increase is about 4%.

Deaton also sent this analysis to the New York Times, where David Leonhardt reports:

Breaking down the 45-to-54 age group into single years of age, which should avoid Mr. Gelman’s concern, still shows the same pattern.

“If we want to be more precise about the age range involved, we could say that for all single years of age from 47 to 52, mortality rates are increasing,” wrote Mr. Deaton, the most recent winner of the Nobel Prize in economics. “So the overall increase in mortality is not due to failure to age adjust.” . . .

“We stick by our results,” he said.

According to the table above, mortality rates among non-Hispanic whites aged 45-54 increased by an average of about 4% after controlling for age. But if you go to Case and Deaton’s graph above, you’ll find an increase of about ~~12%~~ 9% in the raw mortality rate for that group from 1999 (again, not 1989 for this comparison) and 2013.

So according to these calculations, if you correct for the age-composition bias, about ~~2/3~~ half of the observed change from 1999 to 2013 goes away. If you look at the top graph above, 1999 appears to be an unusual year so it might not be the best to use as a baseline.

Here, then, is a quick summary of our estimates of the bias from age composition in estimating the recent changes in death rate for non-Hispanic white Americans aged 45-54:

After controlling for age, there was a decline in the death rate from 1989 to 1999, then an increase from 1999 to 2005, then it’s been steady since then. See graphs here.

In my post, I estimated no change because I was considering the entire range, 1989-2013, as presented in the original Case and Deaton paper. In his reply Deaton estimated an increase because he was just looking from 1999-2013. Actually, though, all that increase occurred between 1999 and 2005.

**Summary**

So there appears to have been no aggregate increase in age-adjusted mortality in this group in the 1989-2013 period.

Is it then appropriate to say “We stick by our results”?

In this case I say yes, that Case and Deaton’s main results seem to stand up just fine.

As noted above (and in my earlier post), their key claim was that death rates among middle-aged non-Hispanic whites in the U.S. slightly increased, even while corresponding death rates in other countries declined by about 30%. Even after you apply a bias correction and find that death rates among middle-aged non-Hispanic whites in the U.S. were actually flat (or maybe even decreased slightly), the key comparison to other countries is barely affected. A bias of 5% is small compared to an observed difference of 30%.

And this is why I emphasized throughout that this statistical bias did not invalidate the Case and Deaton study. As a statistician, I am of course interested in such biases, and it wasn’t clear to me ahead of time how large the correction would be. It turned out that the bias explained the observed increase among 45-54-year-old non-Hispanic whites, and that’s interesting, but the cross-national comparison is still there, and that seems to be the most important thing.

**P.S.** Deaton also asked why I estimated the bias using the age distribution rather than single-year mortality rates. The answer to this question is that I just used the data I found. I have no great familiarity with demographic data and I did not know that the data by ethnicity and year of age were easily available. I agree that the natural thing to do would be to analyze death rates by year of age. If someone can point me to such a dataset, I’d be glad to fit a model to it, indeed this would be an excellent project.

**P.P.S.** The mortality rates by year of age from 1999 to 2013 are at CDC Wonder, so that’s a start. If anyone knows where the 1989-1998 data are, please let me know.

I agree with Case and Deaton on the main point, for sure: if indeed there was a decrease from 1989 to 1999, and an increase from 1999 to 2005, and no change after that, this is largely consistent with their story of there being a reversal, or at least a stalling of improvement, after decades of progress. And, in any case, the change compared to other countries and groups is huge. Which is a point that I emphasized in all my posts. The existence of a bias does not imply that there is no underlying effect. Indeed, that’s why I wanted to quantify the bias, to get a sense of how it changes one’s conclusions.

**P.P.P.S.** More graphs here, including this:

The post Age adjustment mortality update appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Correcting statistical biases in “Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century”: We need to adjust for the increase in average age of people in the 45-54 category appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround.

Here’s the key figure:

I have no idea why they label the lines with three-letter abbreviations when there’s room for the whole country names, but maybe that’s some econ street code thing I don’t know about.

Anyway, the graph is pretty stunning. And for obvious reasons I’m very interested in the mortality of white Americans in the 45-54 age range.

But could this pattern be an artifact of the coarseness of the age category? A commenter here raised this possibility a couple days ago, pointing out that, during the period shown in the above graph (1989 to the present), the 45-54 bin has been getting older as the baby boom has been moving through. So you’d expect an increasing death rate in this window, just from the increase in average age.

How large is this effect? We can make a quick calculation. A blog commenter pointed out this page from the Census Bureau, which contains a file with “Estimates of the Resident Population by Single Year of Age, Sex, Race, and Hispanic Origin for the United States: April 1, 2000 to July 1, 2010.” We can take the columns corresponding to white non-Hispanic men and women. For simplicity I just took the data from Apr 2000 and assumed (falsely, but I think an ok approximation for this quick analysis) that this age distribution translates by year. So, for example, if we want people in the 45-54 age range in 1990, we take the people who are 55-64 in 2000.

If you take these numbers, you can compute the average age of people in the 45-54 age group during the period covered by Case and Deaton, and this average age does creep up, starting at 49.1 in 1989 and ending up at 49.7 in 2013. So the increase has been about .6 years of age.
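The translate-by-year approximation described above is simple to code. This sketch uses made-up population counts standing in for the April 2000 Census file; the function names are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
# made-up single-year-of-age counts standing in for the April 2000 Census file
pop_2000 = dict(zip(range(30, 80),
                    rng.integers(1_500_000, 2_500_000, size=50).tolist()))

def counts_45_54(year):
    """Approximate 45-54 counts in `year` by translating the 2000 distribution:
    a person aged a in `year` was aged a + (2000 - year) in April 2000."""
    offset = 2000 - year
    return {a: pop_2000[a + offset] for a in range(45, 55)}

def mean_age(counts):
    return sum(a * n for a, n in counts.items()) / sum(counts.values())

# e.g., the 45-54 group of 1990 maps to the 55-64-year-olds of 2000
print(mean_age(counts_45_54(1990)))
```

With the real Census counts, `mean_age` traces the 49.1-to-49.7 drift described above.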

How does this translate into life expectancy? We can look up the life table at this Social Security website. At age 45, Pr(death) is .003244 for men and .002069 for women. At age 54, it’s .007222 for men and .004301 for women. So, in one year of age, Pr(death) is multiplied by approximately a factor of (.007222/.003244)^.1 = 1.08 for men and (.004301/.002069)^.1 = 1.08 for women—that is, an increase in Pr(death) of 8% per year of age.

The above calculations are only approximate because they’re using life tables for 2011, and for the correct analysis you’d want to use the life table for each year in the study. But I’m guessing it’s close enough.

To continue . . . in the period graphed by Case and Deaton, average age increases by about half a year, so we’d expect Pr(death) to increase by about .6*8%, or about 5%, in the 45-54 age group, just from the increase of average age within the cohort as the baby boom has passed through.
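The back-of-the-envelope numbers above can be reproduced directly from the quoted life-table values (the same figures as in the post, computed the same way):

```python
# 2011 SSA life-table death probabilities quoted above
p45 = {"men": 0.003244, "women": 0.002069}
p54 = {"men": 0.007222, "women": 0.004301}

for sex in p45:
    growth = (p54[sex] / p45[sex]) ** 0.1   # per-year-of-age factor, as in the post
    print(sex, round(growth, 2))            # about 1.08 for both sexes

# ~0.6 extra years of average age times ~8% per year of age:
print(round(0.6 * 0.08, 3))                 # 0.048, i.e. about 5%
```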

Doing the calculation a bit more carefully using year-by-year mortality rates, we get this estimate of how much we’d expect death rates in the 45-54 age range to increase, just based on the increase in average age as the baby boom passes through:

This is actually not so different from the “US Whites” line in the Case-Deaton graph shown above: a slight decrease followed by a steady increase, with a net increase in death rate of about 5% for this group. Not identical—the low point in the actual data occurs around 1998, whereas the low point is 1993 in my explain-it-all-by-changes-in-age-composition graph—but similar, both in the general pattern and in the size of the increase over time.

But Case and Deaton also see a dramatic drop in death rates for other countries (and for U.S. Hispanics), declines of about 30%. When compared to these 30% drops, a bias of 5% due to increasing average age in the cohort is pretty minor.

**Summary**

According to my quick calculations, the Case and Deaton estimates are biased because they don’t account for the increase in average age of the 45-54 bin during the period they study. After we correct for this bias, we no longer find an increase in mortality among whites in this category. Instead, the curve is flat.

So I don’t really buy the following statement by Case and Deaton:

If the white mortality rate for ages 45−54 had held at their 1998 value, 96,000 deaths would have been avoided from 1999–2013, 7,000 in 2013 alone. If it had continued to decline at its previous (1979‒1998) rate, half a million deaths would have been avoided in the period 1999‒2013.

According to my above calculation, the observed increase in death rate in the 45-54 cohort is roughly consistent with a *constant* white mortality rate for each year of age. So I think it’s misleading to imply that there were all these extra deaths.

However, Case and Deaton find dramatic *decreases* in mortality rates in other rich countries, decreases on the order of 30%. So, even after we revise their original claim that death rates for 45-54’s are going up, it’s still noteworthy that they haven’t sharply declined in the U.S., given what’s happened elsewhere.

So, one could rewrite the Case and Deaton abstract to something like this:

This paper documents a ~~marked increase~~ flattening in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change ~~reversed~~ ended decades of progress in mortality and was unique to the United States; no other rich country saw a similar ~~turnaround~~ stasis.

Still newsworthy.

**P.S.** Along similar lines, I’m not quite sure how to interpret Case and Deaton’s comparisons across education categories (no college; some college; college degree), partly because I’m not clear on why they used this particular binning but also because the composition of the categories has changed during the period under study. The group of 45-54-year-olds in 1999 with no college degree is different from the corresponding group in 2013, so it’s not exactly clear to me what is learned by comparing these groups. I’m not saying the comparison is meaningless, just that the interpretation is not so clear.

**P.P.S.** See here for a response to some comments by Deaton.

**P.P.P.S.** And still more here.

The post Correcting statistical biases in “Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century”: We need to adjust for the increase in average age of people in the 45-54 category appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post 4 for 4.0 — The Latest JAGS appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I just saw over on Martyn Plummer’s JAGS News blog that JAGS 4.0 is out. Martyn provided a series of blog posts highlighting the new features:

1. Reproducibility: Examples will now be fully reproducible draw-for-draw and chain-for-chain with the same seed. (Of course, compiler, optimization level, platform, CPU, and OS can also affect numeric computations.) They also added unit testing. (How does anyone develop anything this complex without tests? I’d be lost.)

2. Better Error Messages: Examples with undefined array elements or directed cycles get flagged as such.

3. More R-like Features: This includes some nice variable-length-argument sum and product functions, but what really caught my eye is allowing integer arrays as indexes, R-style (and as loop “bounds,” as in R). It makes writing hierarchical models very neat. I don’t like that JAGS now allows the equality sign (=) for assignment—multiple ways to do things can be confusing for people reading the code, though this case is mostly harmless. I’ve always worried about efficiency in using arrays for loop bounds, but my worry’s probably misplaced.

4. Easter Eggs: This is straight from Martyn’s blog post:

One motivation for writing these blog posts was to draw users’ attention to new features that I wanted people to be aware of, even though they are not documented. There are other features – new distributions and samplers – that are currently undocumented and hence hidden. These will miraculously appear as “new” features as they are documented during the JAGS 4.x.y release series.

There’s always the source code!

The post 4 for 4.0 — The Latest JAGS appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Why Retraction Watch remains necessary appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>As I wrote at the time, I have nothing against the authors of the paper in question. I expect they’re doing their best. It’s not their fault that (a) statistical methods are what they are, (b) statistical training is what it is, and (c) the editors of Psychological Science don’t know any better. It’s all too bad, but it’s not their fault. I laugh at these studies because I’m too exhausted to cry, that’s all. And, before you feel too sorry for these guys or for the editors of Psychological Science or think I’m picking on them, remember: if they didn’t want the attention, they didn’t need to publish this work in the highest-profile journal of their field. If you put your ideas out there, you have to expect (ideally, hope) that people will point out what you did wrong.

I’m honestly surprised that Psychological Science is still publishing this sort of thing. They’re really living up to their rep, and not in a good way. PPNAS I can expect will publish just about anything, as it’s not peer-reviewed in the usual way. But Psych Science is supposed to be a real journal, and I’d expect, or at least hope, better from them.

**The good news and the bad news**

*The good news* comes from a commenter, who reports that Psych Science just retracted the paper:

The authors still express what I view as naive and unrealistic hopes:

We will conduct a revised Experiment 2 that more directly tests the motivational interpretation and improves the assessment of BY accuracy. If this revised experiment yields the same findings as our original Experiment 2, we will seek publication of our original Experiment 1 with the new Experiment 2. We remain confident in the proposition that sadness impairs color perception, but would like to acquire clearer evidence before making this conclusion in a journal the caliber of Psychological Science.

I think they don’t fully understand how difficult it is to learn from noisy data. But I’m glad they retracted. And I can hardly blame them for still holding out hope in their hypothesis.

*The bad news* is that Psych Science has not yet promoted the retraction at the same level as it promoted the original claim.

What I’d like to see from them is a feature story, titled something like, “Blue and Seeing Blue: Desire for Publication May Impair Research Effectiveness.” Instead, though, this is what I see on their webpage:

They did tweet the retraction, so that’s something:

And they retracted the old press release. But I really think they should give the retraction the same publicity they gave to the original report.

Again, no shame on the researchers involved. They made a mistake, something that happens all the time as is no surprise given the null hypothesis significance testing approach in which researchers are trained. I make statistical mistakes all the time, so I’m not surprised that others do too. Post-publication peer review is a great way to catch such errors, and increased awareness of the problems with noisy studies may be a way to reduce such errors in the future.

**P.S.** More on the story at Retraction Watch.

The post Why Retraction Watch remains necessary appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Econometrics: Instrument locally, extrapolate globally appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Econometrics: Instrument locally, extrapolate globally appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Hey—looky here! This business wants to hire a Stan expert for decision making. appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Kevin Van Horn writes:

I currently work in a business analytics group at Symantec, and we have several positions to fill. I’d like at least one of those positions to be filled by someone who understands Bayesian modeling and is comfortable using R (or Python) and Stan (or other MCMC tools). The team’s purpose is to maximize revenue growth by using data and analytics to advise decision makers in key areas such as sales, marketing, and finance. The position involves identifying and modeling business processes and their corresponding data flows, and recommending improvements.

If you think you might be interested, reply to me directly (Kevin_VanHorn … at … symantec … dot … com).

The post Hey—looky here! This business wants to hire a Stan expert for decision making. appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post The tabloids strike again appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>What do you make of this?

At first sight I’m unimpressed; it looks like just another glamour journal fluff piece. For example, it seems to me that Figure 1F commits the error described here; and the authors seem to ignore the large increase (regression to the mean) in the second column (of 4) between Figures 1D and 1E. But maybe I’m being too instantly skeptical, in what I suppose may come to be known as “LaCour month”.

I replied: Wow—the tabloids strike again! What made you look at this article in the first place?

And Nick responded:

It was the #3 item in the “World News” section of the BBC app a couple of days ago. Not the Science section, or even the Health section under which they filed it, but apparently the third most important piece of news in the world. “FFS”, as the kids say (or maybe that’s the UK only, and “WTF” is the international English version).

All the tabloid-y discussion was about ethics, brainwashing, brave new world, etc. To me it looks like yet another study which is just “obviously wrong” (insufficient power, etc), even before I read it.

Nick then blogged it, under the heading, “Dream on: Playing pinball in your sleep does not make you a better person.”

But, hey, it was a net win for the journal Science: the BBC listed their article as the third-most-important piece of news in the world. And, unlike with LaCour and Green, the data were real. What more could you ask for??

**P.S.** I’m thinking that a better title for this post would be “Unlearning common sense during research.”

The post The tabloids strike again appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post “Another reminder that David Brooks is very good at being David Brooks” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post “Another reminder that David Brooks is very good at being David Brooks” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Neuroscience research in Baltimore appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Our goal is to work together with neuroexperimentalists to discover fundamental principles governing the relationship between mind and brain, via building and deploying open source petascale tools that run at scale on open access data. This includes analytics, databases, cloud computing, and Web-services applied to both big neuroimages and big neurographs.

We believe in the upward spiral of science, that is, that the scientific community is in a continual process of learning from past, building models on previous experiments, which lead to new experiments, etc. Therefore, we practice reproducible research and open science. Our interests span the spatiotemporal scales of analysis, ranging from nanoscale (serial electron microscopy) to microscale (e.g., array tomography, CLARITY, and calcium imaging) to macroscale (e.g., SPECT and multimodal magnetic resonance imaging). For more information, please visit our website, http://neurodata.io.

For all potential applicants

NeuroData is always hiring exceptional individuals at all levels. There are only a few requirements: (1) you are excited to work closely with a team of diverse thinkers, including computer scientists, biomedical engineers, and neuroscientists, (2) you are willing to commit at least 2-3 years to the project, and (3) you understand that we do open science – this means that all code that we write is open sourced and therefore run by other people (one implication of this is that our code is always tested and documented). Assuming you fit those conditions, please read the specialized instructions below, and send us an email with your CV, your github handle, and other relevant information (such as recommendations and the first-author publications that you are most proud of).

Research Associate

If you have an undergraduate degree in a related field, e.g., biomedical engineering, neuroscience, or computer science, and want to work with our group, please send your CV, transcript, and recommendations if you have them. Also, please explain why you want to be a research associate, and in particular, work with us :)

Undergrad Research

There is always more work for undergrad research in our group! To make a useful contribution, you’ll need to commit a minimum of 10 hrs per week, and must be open to working together for years. The research will be for 3 credits per semester, so you will need to make sure you can take 3 additional credits. It is very likely that your research will include writing code, in either R or Python, processing data, parallel/distributed computing, possibly proving some theorems, all in the service of answering some neuroscience questions using big data. If this sounds awesome, please send me an email, including your CV/resume, as well as your complete official JHU transcript. If you have any letters of recommendation, please send them as well.

Potential Graduate Students

My group will always take exceptional PhD students, and potentially master’s students as well. If you would like to do graduate research with me, the best option is to apply to the BME Department at JHU, and indicate on the application that you want to work with me. I can also officially advise students in Neuroscience, Computer Science, Biostatistics, and Applied Mathematics & Statistics. There is no need to email me, you can simply apply, I will find your application, I promise :)

Current JHU Graduate Students

My group will always take exceptional PhD students, and potentially master’s students as well. If you would like to do graduate research with me, and you are already in a graduate program at JHU, please send me your application materials (i.e., transcript, essays, recommendations, etc.). If it seems that we are a good fit, we’ll schedule a time to meet up on campus.

Post-Doc Positions

We are currently looking for 3 post-docs. In all cases, there will be a significant programming requirement, in either R or Python. Details for each position are below.

For the graph statistics postdoc, we will be (i) writing reference implementations in R for a number of graph statistical methods for which we do not currently have said implementations, (ii) developing new graph statistical theory and methods, and (iii) applying said methods to neurographs. All reference implementations will be incorporated into FlashR. We will write papers on graph statistics aimed at statistics and machine learning audiences, as well as some aimed at neuroscientists.

For the computational anatomy postdoc, we will be (i) writing code using the NeuroData infrastructure to extract neuroanatomical objects of interest (e.g., cells, synapses, regions), and (ii) developing and deploying methods for scalable statistical analyses for these objects, such as 3D point processes for 100 million points. Some of this work is explained in more detail here.

The human MRI postdoc will also be writing code using the NeuroData infrastructure, processing lots of open source brains, and making discoveries and writing papers using graph statistics, spatial statistics, and more. One example of this work is the MRI-to-Graphs pipeline, lovingly referred to as m2g.

I copied the ad as written, except that at one point he wrote “Note that,” and I hate “Note that” so I took it out. I don’t want to go around editing other people’s job postings but this time I just couldn’t stop myself.

The post Neuroscience research in Baltimore appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post My job here is done appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>It was cool, back in the day, to be mocked in the House of Commons:

And of course I was happy a few months ago to be cited by the Supreme Court:

But the high point of my journalistic career is being mentioned in Private Eye (see above). I can retire now.

The post My job here is done appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post Stop screaming already: Exaggeration of effects of fan distraction in NCAA basketball appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>John Ezekowitz writes:

I have been reading your work on published effect sizes, and I thought you might be interested in this example, which is of small consequence but grates me as a basketball and data fan. Kevin Quealy and Justin Wolfers published an analysis in The NYT on fans’ effectiveness in causing road teams to shoot worse from the free throw line in college basketball.

In the piece, Wolfers notes that players shoot better at home than on the road, but then compares “fan effectiveness” by looking at how much worse opponents shoot at a given arena vs. their home arena. I think it is pretty clear that the correct comparison is opponents’ road FT shooting, not their home shooting.

When I asked him about this, he admitted that the road vs. road effect was smaller. It looks like he just picked home vs. road because he could show a bigger “effect size.” This feels symptomatic of the larger problems you have continued to highlight on your blog.

By the way, not sure if you watch any Columbia basketball, but Maodo Lo can really, really play.

Good point (not about Maodo Lo, that I have no idea about, but regarding the NYT article). The goal of newsworthiness can get in the way of clear communication.

Specifically, Quealy and Wolfers wrote:

On average, college basketball players are about one percentage point less likely to make a free throw when in front of a hostile crowd than when at home. . . . On average, the sixth man’s ability to distract opposing free throwers is worth about 0.2 points per game.

Hmmm, 0.2 points a game is pretty irrelevant anyway. But they get estimates of over 1 point per game for a few teams, most strikingly Arizona State and Northwestern.
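As a back-of-envelope check on where a figure like 0.2 comes from (the NYT piece doesn't spell this out, so the attempt count below is a guess):

```python
# One percentage point worse on free throws, times an assumed ~20 road
# free-throw attempts per game, times 1 point per made free throw.
ft_attempts_per_game = 20   # made-up but plausible; not from the article
crowd_effect = 0.01         # one percentage point drop in FT%
points_lost = ft_attempts_per_game * crowd_effect
print(points_lost)  # 0.2
```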

Or is it just (or mostly) noise? Quealy and Wolfers write:

Some of the tremendous variation among teams may reflect statistical noise, given that we’re evaluating only five seasons’ worth of data. But that’s still enough to suggest that the overall patterns are real.

They provide no quantitative evidence for this claim. All they give is this graph:

This graph looks consistent with a small difference attributable to home-court advantage (recall Ezekowitz’s point), but I see no evidence, from this graph alone, that the differences between stadiums are real. I just don’t know.

Quealy and Wolfers write:

There are also a handful of arenas where visiting teams have actually hit a greater share of free throws than they typically do in front of their home fans. Boston College and Notre Dame are two prominent examples. It’s unfair to suggest that these fans actually hurt their team; rather, it’s more likely that they were of little or no help, and random luck means that visitors hit a few extra free throws.

Whoa baby. Hold up right there. First, according to the graph, it’s not “a handful” of teams, it’s about 110 of them. Second, that’s fine to credit these patterns to random luck. But then shouldn’t you also be considering random luck as an explanation for the success of certain teams?

And what’s with this sort of data dredging:

Duke’s Cameron Crazies are among the most famous fan groups in any sport in the country. And to some extent, they live up to their hype. Our data ranks them as one of the more distracting teams in the nation, although they’re outside our top 10. It could be that they’re actually better than that, and that their numbers will improve with more seasons of data. Or perhaps their creativity does not match their intensity.

Here’s the bad news for Duke fans: Their main rivals, the fans in Chapel Hill, have them slightly beaten here. North Carolina’s fans help the Tar Heels to the tune of about two-thirds of a point per game, relative to a typical home crowd.

This is getting ridiculous. These guys could give a story to coin flips.

What’s really needed here is a hierarchical model. Or, simpler than that, let’s just try computing these summaries for each arena in each season, and see if the arenas with these free-throw patterns in season 1, also show the patterns in season 2. At its simplest, if the differences between arenas are all noise, the year-to-year correlation between these results will be essentially zero. Next step is to fit a hierarchical model with arena effects and arena*year interactions.
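That simplest check can be sketched in a few lines: simulate arenas that all share the same true road free-throw percentage (pure noise, no arena effects) and look at the year-to-year correlation, which should hover near zero. All the numbers below are invented.

```python
# Under the null of no arena effects, each arena-season FT% is just
# binomial noise, and year-1 vs. year-2 results should be uncorrelated.
import random
random.seed(1)

n_arenas = 350
attempts_per_season = 800   # rough opponent FT attempts per arena-season
true_rate = 0.70            # same true road FT% everywhere (the null)

def season_ftpct():
    makes = sum(random.random() < true_rate for _ in range(attempts_per_season))
    return makes / attempts_per_season

year1 = [season_ftpct() for _ in range(n_arenas)]
year2 = [season_ftpct() for _ in range(n_arenas)]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

print(f"year-to-year correlation under the null: {corr(year1, year2):.3f}")
```

A clearly nonzero correlation in the real data would be evidence of persistent arena effects; the hierarchical model is then the natural next step.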

The post Stop screaming already: Exaggeration of effects of fan distraction in NCAA basketball appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post On deck this week appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>**Tues:** My job here is done

**Wed:** The tabloids strike again

**Thurs:** Econometrics: Instrument locally, extrapolate globally

**Fri:** I wish Napoleon Bonaparte had never been born

**Sat:** This is a workshop you can’t miss: DataMeetsViz

**Sun:** You won’t believe these stunning transformations: How to parameterize hyperpriors in hierarchical models?

The post On deck this week appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post 2 new thoughts on Cauchy priors for logistic regression coefficients appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>In logistic regression, separation occurs when a linear combination of the predictors can perfectly classify part or all of the observations in the sample, and as a result, finite maximum likelihood estimates of the regression coefficients do not exist. Gelman et al. (2008) recommended independent Cauchy distributions as default priors for the regression coefficients in logistic regression, even in the case of separation, and reported posterior modes in their analyses. As the mean does not exist for the Cauchy prior, a natural question is whether the posterior means of the regression coefficients exist under separation. We prove two theorems that provide necessary and sufficient conditions for the existence of posterior means under independent Cauchy priors for the logit link and a general family of link functions, including the probit link. For full Bayesian inference, we develop a Gibbs sampler based on Polya-Gamma data augmentation . . .
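To see the problem the paper is addressing: under complete separation, the logistic log-likelihood keeps increasing as the coefficient grows, so no finite maximum likelihood estimate exists. A minimal illustration with made-up data:

```python
# Perfectly separated data: every positive x has y = 1, every negative
# x has y = 0, so cranking up the slope only improves the fit.
import math

x = [-2.0, -1.0, 1.0, 2.0]   # predictor values (invented)
y = [0, 0, 1, 1]             # outcomes perfectly separated at x = 0

def log_lik(beta):
    """Logistic log-likelihood for a single slope, no intercept."""
    ll = 0.0
    for xi, yi in zip(x, y):
        z = beta * xi
        # log sigma(z) = -log(1 + e^{-z}); log(1 - sigma(z)) = -log(1 + e^{z})
        ll += -math.log1p(math.exp(-z)) if yi == 1 else -math.log1p(math.exp(z))
    return ll

for beta in (1.0, 5.0, 20.0):
    print(f"beta = {beta:5.1f}  log-likelihood = {log_lik(beta):.9f}")
# The log-likelihood climbs toward 0 as beta grows without bound,
# so the MLE is infinite; a prior is what pins the estimate down.
```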

It’s good to see research on this. Statistics is the science of defaults, and an important part of statistical theory at its best is the study of how defaults work on a range of problems. It’s a good idea to study the frequency properties of statistical methods—any methods, including Bayesian methods.

I have not read through the paper, but based on the above abstract I have two quick comments:

1. We no longer recommend Cauchy as our first-choice default. Cauchy can be fine as a weakly informative prior, but in the recent applications I’ve seen, I’m not really expecting to get huge coefficients, and so a stronger prior such as normal(0,1) can often make sense. See, for example, section 3 of this recent paper. I guess I’m saying that, even for default priors, I recommend a bit of thought into the expected scale of the parameters.
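The practical payoff of a proper prior is easy to sketch: with the same kind of perfectly separated data (invented here), adding a normal(0,1) log-prior yields a log-posterior with a finite mode, found below by a crude grid search:

```python
# With separated data the likelihood alone has no finite maximizer, but
# the normal(0, 1) prior penalty pulls the posterior mode back to a
# finite value. Data and grid are made up for illustration.
import math

x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]             # perfectly separated at x = 0

def log_post(beta):
    lp = -0.5 * beta ** 2    # normal(0, 1) log-prior, up to a constant
    for xi, yi in zip(x, y):
        z = beta * xi
        lp += -math.log1p(math.exp(-z)) if yi == 1 else -math.log1p(math.exp(z))
    return lp

grid = [i / 100 for i in range(1001)]   # beta in [0, 10]
beta_map = max(grid, key=log_post)
print(f"posterior mode: {beta_map:.2f}")  # finite, unlike the MLE
```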

2. I assume that any of the computations can be done in Stan, no need for all these Gibbs samplers. I’m actually surprised that anyone is writing Gibbs samplers anymore in 2015!

The post 2 new thoughts on Cauchy priors for logistic regression coefficients appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post The Rachel Tanur Memorial Prize for Visual Sociology appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Judy Tanur writes:

The Rachel Tanur Memorial Prize for Visual Sociology recognizes students in the social sciences who incorporate visual analysis in their work. The contest is open worldwide to undergraduate and graduate students (majoring in any social science). It is named for Rachel Dorothy Tanur (1958–2002), an urban planner and lawyer who cared deeply about people and their lives and was an acute observer of living conditions and human relationships.

The 2016 competition for the Rachel Tanur Memorial Prize for Visual Sociology is now accepting applications, with a deadline of January 25, 2016. Entries will be judged by members of the Visual Sociology Group (WG03) of the International Sociological Association (ISA). Up to three prizes will be awarded at the Third ISA Forum of Sociology, The Futures We Want: Global Sociology and the Struggles for a Better World, to be held in Vienna, Austria on July 10-14, 2016. Attendance at the forum is not a requirement but is encouraged. First prize is $2,500; second prize is $1,500; and third prize is $500.

For more information and to apply, go here.

The post The Rachel Tanur Memorial Prize for Visual Sociology appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>