As for today’s noontime matchup . . . What can I say? New Jersey’s an amazing place. Hoboken’s own Frank Sinatra is only the #3 seed of our entries from that state, and he’s pitted against Virginia Apgar, an unseeded Jerseyite. Who do you want to invite for our seminar: the Chairman of the Board, or a pioneering doctor who’s a familiar name to all parents of newborns?

Here’s an intriguing twist: I looked up Apgar on Wikipedia and learned that she came from a musical family! Meanwhile, Frank Sinatra had friends who put a lot of people in the hospital. So lots of overlap here.

You can evaluate the two candidates on their own merits, or based on who has a better chance of besting Julia Child in round 2.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

The butterfly effect is the semi-serious claim that a butterfly flapping its wings can cause a tornado halfway around the world. It’s a poetic way of saying that some systems show sensitive dependence on initial conditions, that the slightest change now can make an enormous difference later . . . Once you think about these things for a while, you start to see nonlinearity and potential butterfly effects everywhere. There are tipping points everywhere waiting to be tipped!
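The standard toy model for this kind of sensitivity is the logistic map (my illustration here, not Cook’s): nudge the initial condition in the ninth decimal place and the two trajectories eventually bear no resemblance to each other.

```python
# Sensitive dependence on initial conditions in the logistic map,
# x -> r*x*(1-x), with r = 4 (the standard chaotic regime).

def logistic_map(x0, r=4.0, steps=50):
    """Return the trajectory x0, x1, ..., x_steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)  # the "butterfly": a billionth of a nudge

# Early on the two trajectories are indistinguishable;
# after enough iterations the difference is no longer small.
print(abs(a[5] - b[5]))
print(max(abs(x - y) for x, y in zip(a, b)))
```

Note that Cook’s caveat below still applies: sensitivity in this toy system is generic, while real systems are sensitive only in particular ways under particular circumstances.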

But it’s not so simple. Cook continues:

A butterfly flapping its wings usually has no effect, even in sensitive or chaotic systems. You might even say especially in sensitive or chaotic systems.

Sensitive systems are not always and everywhere sensitive to everything. They are sensitive in particular ways under particular circumstances, and can otherwise be quite resistant to influence.

And:

The lesson that many people draw from their first exposure to complex systems is that there are high leverage points, if only you can find them and manipulate them. They want to insert a butterfly at just the right time and place to bring about a desired outcome. Instead, we should humbly evaluate to what extent it is possible to steer complex systems at all. We should evaluate what aspects can be steered and how well they can be steered. The most effective intervention may not come from tweaking the inputs but from changing the structure of the system.

Yes! That’s an excellent, Deming-esque point.

Bradley Groff pointed me to the above-linked post and noted the connection to my recent note on the piranha principle, where I wrote:

A fundamental tenet of social psychology and behavioral economics, at least as it is presented in the news media and taught and practiced in many business schools, is that small “nudges,” often the sorts of things that we might not think would affect us at all, can have big effects on behavior. . . .

The model of the world underlying these claims is not just the “butterfly effect” that small changes can have big effects; rather, it’s that small changes can have big and predictable effects. It’s what I sometimes call the “button-pushing” model of social science, the idea that if you do X, you can expect to see Y. . . .

In response to this attitude, I sometimes present the “piranha argument,” which goes as follows: There can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data.

I’m thinking of social science and I’m being mathematically vague (I do think there’s a theorem there somewhere, something related to random matrix theory, perhaps), whereas Cook is thinking more of physical systems with a clearer mathematical connection to nonlinear dynamics. But I think our overall points are the same, and with similar implications for thinking about interventions, causal effects, and variation in outcomes.
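A toy version of the variance bookkeeping behind the piranha argument (illustrative numbers of my own, nothing so grand as a theorem): if the outcome has unit variance and several mutually independent standardized predictors each explain a fraction of it, those fractions can sum to at most 1, so only a handful of “large” effects can coexist.

```python
# Piranha-style bookkeeping: independent effects on a unit-variance
# outcome must share the variance, so large effects crowd each other out.

def max_coexisting_effects(r2_each, total_variance=1.0):
    """Largest number of independent effects, each explaining r2_each
    of the outcome's variance, that can exist simultaneously."""
    return int(total_variance // r2_each)

# A correlation of r = 0.3 -- often billed as a "large" effect in
# social science -- explains r^2 = 0.09 of the variance.
print(max_coexisting_effects(0.09))  # at most 11 such effects
print(max_coexisting_effects(0.25))  # r = 0.5: at most 4
```

The real argument would have to handle correlated predictors and interactions, which is where something like random matrix theory might come in; this sketch only shows the crowding-out intuition in its simplest form.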

**P.S.** This is related to my skepticism of structural equation modeling, path analysis, and similar approaches that have been used in some quarters of sociology and psychology for many years and promoted in slightly different form by Judea Pearl and other computer scientists. These methods often seem to me to promise a sort of causal discovery that cannot realistically be delivered and that in many cases I don’t think even makes sense (see this article, especially the last full paragraph on page 960 and the example on page 962). I see this as connected with the naive view of the butterfly effect described above: the attitude that if you just push certain buttons in a complex social system, you can get predictable results.

In brief: I doubt that the claims deriving from such data analyses will replicate in new experiments, but I have no doubt that anything that doesn’t replicate will be explained as the result of additional butterflies in the system. What I’d really like is for researchers to just jump to the post-hoc explanation stage before even gathering those new validation data. The threat of replication should be enough to motivate people to back off of some of their extreme claims.

To speak generically:

1. Research team A publishes a paper claiming that X causes Y.

2. Research team B tries to replicate the finding, but it fails to replicate.

3. Research team A explains that the original finding is not so general; it only holds under conditions Z, which contain specifics on the experimental intervention, the people in the study, and the context of the study. The finding only holds if the treatment is done for 1 minute, not 3 minutes; it holds only in warm weather, not cold weather; it holds only in Israel, not in the United States; it works for some sorts of stimulus but not others.

4. Ideally, in the original published paper, team A could list all the conditions under which they are claiming their result will appear. That is, they could anticipate step 2 and jump right to step 3, saving us all a lot of time and effort.

**P.P.S.** This post was originally called “Of butterflies and piranhas”; after seeing some comments, I changed the title to focus the message.

After suffering so many bad gymnastics (mathematical, logical, statistical, you name it) at seminars, to have some performed by a true champion would be a welcome change.

But Parker takes it away, based on this formidable contribution of Dzhaughn:

Things I Have Learned From the Contest So Far:

(Cf. “Resume” by Dorothy Parker)

Thorpe’s 1/8th hashtag

Babe’s just a champ

Oscar is all Gray

Hotdogs cause cramp

Serena’s a whiner

Erdos sylvan

Jeff’s gone ballistic

I might as well win.

Today’s contest features the second seed in the Creative Eaters category against an unseeded magician. (Regular listeners to This American Life will recall that Glass did magic shows when he was about 12 years old, I think it was.) Both have lots of experience performing in front of an audience. So what’ll it be? Public TV or public radio? In either case, the winner will be facing someone from New Jersey in the second round.

Again, the full bracket is here, and here are the rules:


Far more community college students transfer prior to completing the Associate’s degree than actually complete first. According to a new report from the National Student Clearinghouse Research Center, about 350,000 transfer before completion, compared to about 60,000 who complete first.

That matters in several ways.

Most basically, it suggests that measuring community colleges by their graduation rates misses the point. A student who does a year at Brookdale before transferring to Rutgers, and subsequently graduating, got what she wanted, but she shows up in our numbers as a dropout. In states with “performance funding,” the community college could be punished for her decision, even if it was what she intended to do all along. . .

People who only look at “headline” numbers, and don’t bother with the asterisks, look at graduation rates and assume that something is going horribly wrong. But a ratio of 35 to 6 is such a honker of an asterisk that failing to account for it amounts to misrepresentation. . . .

My preferred measures of community college performance would be based on actual student behavior. For example, does the percentage of bachelor’s grads in a given area with community college credits roughly match the percentage of undergrads who are enrolled at community colleges? (Nationally, it does.) If so, then the idea of community colleges as dropout factories is hard to sustain. For programs not built around transfer, how are the employment outcomes? I wouldn’t look at loan repayment rates, just because the percentage of students with loans is so low; it’s a skewed sample. I would look at achievement gaps by race, sex, age, and income. I would look at ROI for public investment, as well as at local reputation. . . .

And a bunch more. I don’t know much about the world of education policy: Maybe some of these things are already being measured? Seems important, in any case.

For today we have the #2 ranked wit, the star of the Algonquin Round Table—no alcohol jokes in the comments, please—vs. the undisputed GOAT of gymnastics. Two completely different talents, and unfortunately only one can advance to the next round. Who should it be?

Again, the full bracket is here, and here are the rules:


Nasa is a major player in space science, so when a team from the agency this week presents evidence that “impossible” microwave thrusters seem to work, something strange is definitely going on. Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion. . . .

He [Shawyer] has built a number of demonstration systems, but critics reject his relativity-based theory and insist that, according to the law of conservation of momentum, it cannot work.

According to good scientific practice, an independent third party needed to replicate Shawyer’s results. As Wired.co.uk reported, this happened last year when a Chinese team built its own EmDrive and confirmed that it produced 720 mN (about 72 grams) of thrust, enough for a practical satellite thruster. . . . a US scientist, Guido Fetta, has built his own propellant-less microwave thruster, and managed to persuade Nasa to test it out. The test results were presented on July 30 at the 50th Joint Propulsion Conference in Cleveland, Ohio. Astonishingly enough, they are positive. . . .

OK, that was 3.5 years ago. Any followups? A quick google search revealed this article by Giulio Prisco from 2017, “Theoretical Physicists Are Getting Closer to Explaining How NASA’s ‘Impossible’ EmDrive Works: The EmDrive propulsion system might be able to take us to the stars, but first it must be reconciled with the laws of physics.”

If I wanted to be snarky, I’d say they could do a 2-for-1 deal and power the EmDrive with cold fusion. But my physics knowledge is weak, so I’ll just say . . . who knows, maybe this is the interstellar drive we’ve all been waiting for! I’ll believe it once it appears in PNAS.

- Google. Responsible AI Practices.

Enjoy.

I’ll go with Letterman because he looks like he could use the work.

Conversely, the strongest argument *against* Martin came from Adam, who wrote:

Steve Martin once said:

I know what you’re saying, you’re saying, “Steve, where do you find time to juggle?” Well, I juggle in my mind. … Whoops.

so that’s the problem: he might just do magic in his head. and that’s no fun to watch.

Then again, along the same lines as zbicyclist, he might be able to shed some light on the stuff you post on here. In the same routine, he said:

And then on the other hand science, you know, is just pure empiricism and by virtue of its method it excludes metaphysics. And uh, I guess I wouldn’t believe in anything if not for my lucky astrology mood watch.

Take the strongest case *for* Dave, and the strongest case *against* Steve, and Steve still comes out on top. So, no contest.

And now for today’s contest, featuring two people from the Creative Eaters category. (It’s the nature of the random assignment of unseeded competitors that sometimes two people from the same category will face off in the first round.)

Seeded #3 in the group is legendary globetrotting tell-it-like-it-is chef Anthony Bourdain. You can’t go wrong with Bourdain. But his unseeded opponent is formidable too: A. J. Liebling, one of the greatest and most versatile reporters who’s ever lived, author of The Honest Rainmaker and many other classics and the inspiration for O.G. blogger Mickey Kaus’s invention of the concept of Liebling optimality.

Bourdain was skinny and Liebling was fat; make of that what you will.

So give it your best: this round could turn out to be important!

Again, the full bracket is here, and here are the rules:

Consider the following range of responses to an outsider pointing out an error in your published work:

1. Look into the issue and, if you find there really was an error, fix it publicly and thank the person who told you about it.

2. Look into the issue and, if you find there really was an error, quietly fix it without acknowledging you’ve ever made a mistake.

3. Look into the issue and, if you find there really was an error, don’t ever acknowledge or fix it, but be careful to avoid this error in your future work.

4. Avoid looking into the question, ignore the possible error, act as if it had never happened, and keep making the same mistake over and over.

5. If forced to acknowledge the potential error, actively minimize its importance, perhaps throwing in an “everybody does it” defense.

6. Attempt to patch the error by misrepresenting what you’ve written, introducing additional errors in an attempt to protect your original claim.

7. Attack the messenger: attempt to smear the people who pointed out the error in your work, lie about them, and enlist your friends in the attack.

We could probably add a few more rungs to the ladder, but the basic idea is that response 1 is optimal, responses 2 and 3 are unfortunate but understandable, response 4 represents at the very least a lost opportunity for improvement, and responses 5, 6, and 7 increasingly pollute the public discourse.

David Brooks is a pretty solid 4 on that scale, which isn’t great but in retrospect is like a breath of fresh air, given the 6’s and 7’s we’ve been encountering lately.

Most of the responses I’ve seen, in academic research and also the news media, have been 1’s. Or, at worst, 2’s and 3’s. From that perspective, Brooks’s stubbornness (his 4 on the above scale) has been frustrating. But it can be, and has been, much worse. So I appreciate that, however Brooks handles criticism of his own writing, he does not go on the attack. Similarly, I was annoyed when Gregg Easterbrook did response 2, but, in retrospect, that 2 doesn’t seem so bad at all.

As I said, I put the above into a comment thread, but I thought it’s something we might want to refer to more generally, so it’s convenient to give it its own post.

Fisher’s first book was “Serve It Forth,” which seems like good advice in tennis, as well. So, you’d get a two-fer there.

That was fine, but not as good as Jonathan’s endorsement of Williams:

Serena would be great at an academic seminar. Just like academics, she has a contempt for referees, even while purporting to regard them as valuable. Just don’t let the Chair interrupt her!

Which was echoed by Diana:

I was going to root for Fisher (whom I have never read) because her victory would make Auden happy. But then I thought about it some more and realized how incapable anyone is of *making* Auden happy—or unhappy, for that matter. In “The More Loving One,” he writes:

Were all stars to disappear or die,

I should learn to look at an empty sky

And feel its total dark sublime,

Though this might take me a little time.

So with that motive gone or suspended, I vote for Williams. She’s likely to win a few matches before the end, and that’ll be fun. At the seminar itself, she might even treat us to a serve or two (not to mention a referee chew-out, as Jonathan noted). What could go wrong?

Most of that bit was irrelevant, but I’m a sucker for Auden so I liked it anyway.

Today the competition is a bit more serious. Steve Martin is seeded #4 in the Magicians category even though magic is not one of his main talents; and David Letterman, though unseeded in the TV personalities category, knows how to handle an audience. You can take it from there.

Again, the full bracket is here, and here are the rules:

So then it seems like one way to get a hot hand is to not be guarded, or, more generally, to not be guarded tightly. And I guess this would be even more the case for close-in shots, which just about any NBA player could make at a near-100% rate if there were no defense.

This all suggests that a key part—maybe the most important part—of the hot hand is what defense is on you, and more generally how you handle the defense.

I’m not quite sure what more to say about this right now, but this all seems different from the usual way we talk about the hot hand, with a focus on shooting. It also suggests that it could be a mistake to consider free-throw shooting as somehow a more pure test of the hot hand. If the above speculations are correct, then the hot hand in free-throw shooting is really a completely different thing than the hot-hand for regular shots.

The ACIC 2019 Data Challenge is Live!

Datasets are available for download (no registration required) at https://sites.google.com/view/ACIC2019DataChallenge/data-challenge (bottom of the page).

Check out the FAQ at https://sites.google.com/view/ACIC2019DataChallenge/faq

The deadline for submitting results is April 15, 2019.

The fourth Causal Inference Data Challenge is taking place as part of the 2019 Atlantic Causal Inference Conference (ACIC) to be held in Montreal, Canada (https://www.mcgill.ca/epi-biostat-occh/news-events/atlantic-causal-inference-conference-2019). The data challenge focuses on computational methods of inferring causal effects from quasi-real world data. This year there are two tracks: low dimensional and high dimensional data. Participants will analyze 3200 datasets in either Track 1 or Track 2 to estimate marginal additive treatment effects and associated 95% confidence intervals. Entries will be evaluated with respect to bias, variance, mean squared error, and confidence interval coverage across a variety of data generating processes.
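For a sense of what an entry has to produce, here is a bare-bones baseline on a toy randomized data-generating process of my own (the challenge data are quasi-real-world and much harder): a difference in means with a normal-approximation 95% interval, checked for the coverage that entries will be scored on.

```python
import math
import random

random.seed(2)

def diff_in_means_ci(y1, y0):
    """Point estimate and normal-approximation 95% CI for E[Y1] - E[Y0]."""
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    v1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)
    v0 = sum((y - m0) ** 2 for y in y0) / (n0 - 1)
    se = math.sqrt(v1 / n1 + v0 / n0)
    est = m1 - m0
    return est, (est - 1.96 * se, est + 1.96 * se)

# Check interval coverage over 1000 simulated datasets with true ATE = 1.
true_ate, covered = 1.0, 0
for _ in range(1000):
    y1 = [true_ate + random.gauss(0, 1) for _ in range(100)]
    y0 = [random.gauss(0, 1) for _ in range(100)]
    est, (lo, hi) = diff_in_means_ci(y1, y0)
    covered += lo <= true_ate <= hi
print(covered / 1000)  # should be close to the nominal 0.95
```

In the challenge itself the treatment is not randomized, so this naive estimator would be biased and the interesting work is in the adjustment.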

I’m not a big fan of 95% intervals, and I am aware of the general problems arising from this sort of competition: the problems in the contest are not necessarily similar to the problems to which a particular method might be applied. That said, Jennifer has assured me that she and others learned a lot from the results of previous competitions in this series, so on that basis I encourage all of you to take a look and check out this one.

Leave Joe Pesci at home alone. Wilde’s jokes may be very old, but he can use slides from The PowerPoint of Dorian Gray.

As Martha put it, not great, but the best so far in this thread.

On the other side, Jonathan wrote, “I’d definitely rather hear Wilde, but I hate it when speakers aren’t live, and the video connections with Reading Gaol are lousy.”—which wasn’t bad—but then he followed it up with, “Please, though. No Frankie Valli stories.” If even the best Pesci endorsement is so lukewarm, we’ll have to go with Oscar to face off against hot dog guy in the next round.

Today our contest features the #1 food writer of all time vs. an unseeded GOAT. I’ve never actually read anything by M. F. K. Fisher but the literary types rave about her, hence her top seeding in that category. As for Serena Williams, I did go to the U.S. Open once but only to see some of those free qualifying rounds. So this particular matchup is a bit of a mystery to me. Whaddya got?

Again, the full bracket is here, and here are the rules:

It is sometimes said that LOO “relies on the data-exchangeability assumption,” but I don’t think that’s quite the right way to put it; what LOO does assume is the relevance of a data partition. We discuss this briefly in section 3.5 of this article. For regular Bayes, p(theta|y) proportional to p(y|theta) * p(theta), there is no partition of the data: “y” is just a single object. But for LOO, y can be partitioned. At first this bothered me about LOO, but then I decided that this is a fundamental idea, related to the idea of “internal replication” discussed by Ripley in his spatial statistics book. The idea is that with just “y” and no partitions, there is no internal replication and no statistically general way of making reliable statements about new cases.

This is similar to (but different from) the distinction in chapter 6 of BDA between the likelihood and the sampling distribution. To do inference for a given model, all we need from the data is the likelihood function. But to do model checking, we need the sampling distribution, p(y|theta), which implies a likelihood function but requires more assumptions (as can be seen, for example, in the distinction between binomial and negative binomial sampling). Similarly, to do inference for a given model, all we need is p(y|theta) with no partitioning of y, but to do predictive evaluation we need a partitioning.
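To make the role of the partition concrete, here is a brute-force leave-one-out sketch for a toy normal model (my setup, with a plug-in normal predictive rather than the full posterior predictive that PSIS-LOO approximates). The partition of y into exchangeable pieces is exactly what makes the loop possible:

```python
import math

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def elpd_loo_pointwise(y):
    """Score each y[i] under a model fit to the other points.
    With y as a single unpartitioned object this loop could not even
    be written: no partition, no internal replication."""
    scores = []
    for i in range(len(y)):
        rest = y[:i] + y[i + 1:]
        mu = sum(rest) / len(rest)
        var = sum((x - mu) ** 2 for x in rest) / (len(rest) - 1)
        scores.append(normal_logpdf(y[i], mu, math.sqrt(var)))
    return scores

y = [1.1, 0.8, 1.3, 0.9, 1.0, 1.2]
print(sum(elpd_loo_pointwise(y)))  # estimated expected log predictive density
```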

Garbage In, Garbage Out: Mass Consumption and Its Aftermath

Takeru Kobayashi

Note: Attendance at both sessions is mandatory.

Best. Seminar. Ever.

So hot dog guy is set to go to the next round, against today’s victor.

It’s the wittiest man who ever lived, vs. an unseeded entry in the People from New Jersey category. So whaddya want: some 125-year-old jokes, or a guy who probably sounds like a Joe Pesci imitator? You think I’m funny? I’m funny how, I mean funny like I’m a clown, I amuse you?

Again, the full bracket is here, and here are the rules:

Sharad Goel, Daniel Ho and I looked into the question, in response to a recent lawsuit. We wrote something for the Boston Review:

What Statistics Can’t Tell Us in the Fight over Affirmative Action at Harvard

Asian Americans and Academics

“Distinguishing Excellences”

Adjusting and Over-Adjusting for Differences

The Evolving Meaning of Merit

Character and Bias

A Path Forward

The Future of Affirmative Action

The deciding argument, from Martha:

I’d like to see Von Neumann given four parameters and making an elephant wiggle his trunk. And if he could do it, there would be the chance that Jim Thorpe could do it if they met in a later round.

No way do I think that Neumann could fit that elephant. As I wrote earlier, that elephant quote just seems like bragging! For one thing, I can have a model with a lot more than five parameters and still struggle to fit my data.

I almost want to invite Neumann to speak, just so we can put him on the spot, ask him to fit the damn elephant, and watch him fail. But that’s not cool, to invite a speaker just for the purpose of seeing him crash and burn. That way lies madness.

Today’s contest features two unique talents. Carol Burnett was the last of the old-time variety-show hosts: she can sing, she can dance, and according to Wikipedia, she was “the first celebrity to appear on the children’s series Sesame Street.” But she’s facing stiff competition from the Japanese dude who won the hot dog eating contest. That’s an accomplishment, to have done something so impressive that this one feat defines you. So I think that whoever advances to the next round will be a strong competitor. Neither Carol Burnett nor the Japanese dude who won the hot dog eating contest is a top seed, but both are interesting dark horse candidates.

Again, the full bracket is here, and here are the rules:

Why are stories so powerful? To answer this, we have to go back at least 100,000 years. This is when humans started to speak. For the following roughly 94,000 years, we could only use spoken words to communicate. Stories helped us survive, so our brains evolved to love them.

Paul Zak of the Claremont Graduate University in California researches what stories do to our brain. He found that once hooked by a story, our brain releases oxytocin. The hormone affects our mood and social behaviour. You could say stories are a shortcut to our emotions.

There’s more to it; stories also help us remember facts. Gordon Bower and Michal Clark from Stanford University in California let two groups of subjects remember random nouns. One group was instructed to create a narrative with the words, the other to rehearse them one by one. People in the story group recalled the nouns correctly about six to seven times more often than the other group.

But my collaborator Thomas Basboll is skeptical:

It seems to me that a paper that has been written to mimic the most compelling features of Hollywood blockbusters (which Anna explicitly invokes) is also, perhaps unintentionally, written to avoid critical engagement. Indeed, when Anna talks about “characters” she does not mention the reader as a character in the story, even though the essential “drama” of any scientific paper stems from the conversation that reader and writer are implicitly engaged in. The writer is not simply trying to implant an idea in the mind of the reader. In a research paper, we are often challenging ideas already held and, crucially, opening our own thinking to those ideas and the criticism they might engender.

Basboll elaborates:

Anna promises that storytelling can produce papers that are “concise, compelling, and easy to understand”. But I’m not sure that a scientific paper should actually be compelling. . . . A scientific paper should be vulnerable to criticism; it should give its secrets away freely, unabashedly. And the best way to do that is, not to organise it with the aim of releasing oxytocin in the mind of the reader, but by clearly identifying your premises and your conclusions and the logic that connects them. You are not trying to bring your reader to a narrative climax. You are trying to be upfront about where your argument will collapse under the weight of whatever evidence the reader may bring to the conversation. Science, after all, is not so much about what Coleridge called “the suspension of disbelief” as what Merton called “organised skepticism”.

In our article from a few years ago, Basboll and I wrote about how we as scientists learn from stories. In discourse about science communication, stories are typically presented as a way for scientists to frame, explain, and promote their already-formed ideas; in our article, Basboll and I looked from a different direction, considering how it is that scientists can get useful information from stories. We concluded that stories are a form of model checking, that a good story expresses true information that contradicts some existing model of the world.

Basboll’s above exchange with Clemens is interesting in a different way: Clemens is saying that stories are an effective way to communicate because they are compelling and memorable. Basboll replies that science shouldn’t always be compelling: so much of scientific work is mistakes, false starts, blind alleys, etc., so you want the vulnerabilities of any scientific argument to be clear.

The resolution, I suppose, is to use stories—but not in a way that hides the potential weaknesses of a scientific argument. Instead, harness the power of storytelling to make it easier for readers to spot the flaws.

The point is that there are two dimensions to scientific communication:

1. The medium of expression. Storytelling can be more effective than a dry sequence of hypothesis, data, results, conclusion.

2. The goal of communication. Instead of presenting a wrapped package of perfection, our explanation should have lots of accessible points: readers should be able to pull the strings so the arguments can unravel, if that is possible.

**P.S.** More on this from Basboll here.

This course offers a rigorous mathematical survey of causal inference at the Master’s level. Inferences about causation are of great importance in science, medicine, policy, and business. This course provides an introduction to the statistical literature on causal inference that has emerged in the last 35-40 years and that has revolutionized the way in which statisticians and applied researchers in many disciplines use data to make inferences about causal relationships. We will study methods for collecting data to estimate causal relationships. Students will learn how to distinguish between relationships that are causal and non-causal; this is not always obvious. We shall then study and evaluate the various methods students can use — such as matching, sub-classification on the propensity score, inverse probability of treatment weighting, and machine learning — to estimate a variety of effects — such as the average treatment effect and the effect of treatment on the treated. At the end, we discuss methods for evaluating some of the assumptions we have made, and we offer a look forward to the extensions we take up in the sequel to this course.
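One of the estimators listed above, inverse probability of treatment weighting, fits in a few lines. The data-generating process below is a toy example of my own (with the true propensity score known, a luxury the course’s observational settings would not grant):

```python
import math
import random

random.seed(0)

def propensity(x):
    """True probability of treatment given the confounder x."""
    return 1 / (1 + math.exp(-x))

# Simulate a confounded treatment: x pushes up both treatment and outcome.
n = 20000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    z = 1 if random.random() < propensity(x) else 0
    y = 2.0 * z + x + random.gauss(0, 1)   # true ATE = 2
    data.append((x, z, y))

# The naive difference in means is biased upward by confounding;
# weighting each unit by 1/Pr(treatment received) removes the bias.
treated = sum(z * y / propensity(x) for x, z, y in data) / n
control = sum((1 - z) * y / (1 - propensity(x)) for x, z, y in data) / n
ate_hat = treated - control
print(ate_hat)  # should be close to the true ATE of 2
```

In practice the propensity score must itself be estimated, which is where the machine-learning methods mentioned in the course description come in.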

Last year Bob Carpenter and I started to put together a Coursera course on Bayesian statistics and Stan, but we ended up deciding we weren’t quite ready to do so. In any case, causal inference is a (justly) popular topic, and I expect that this online version of Michael’s course at Columbia will be good.

For Erdos, from Diana Senechal:

From an environmental perspective, Erdos is the better choice; his surname is an adjectival form of the Hungarian erdő, “forest,” whereas “Carson” clearly means “son of a car.” Granted, the son of a car, being rebellious and all, might prove especially attentive to the quality of the air, but we have no evidence of this.

On the other side Stephen Oliver had an excellent practical point:

Johnny Carson, because if Erdos gave a talk it would be overrun by mathematicians trying to get a paper with him.

But I had to call it for Erdos after this innovative argument from Ethan Bolker, who said, “I have a good argument for Erdos but will save it for a later round. If he loses this one you’ll never know . . .” I think you can only use that ploy once ever—but he used it!

Our next bout features two people who changed their own names. In one corner, one of the most brilliant mathematicians of all time, but a bit of a snob who enjoyed hobnobbing with government officials and apparently added “von” to his name to make himself sound more upper-class. In the other corner, a very funny man who goes by “Brooks” because he didn’t feel like going through life with the name Albert Einstein.

From what I’ve read about von Neumann, I find him irritating and a bit of a braggart. But, if we want to go negative, we can get on Brooks’s case for not fulfilling his early comedic promise. So maybe we should be looking for positive things to say about these two guys.

Again, the full bracket is here, and here are the rules:

Damn. This story makes me so frustrated I can’t even laugh. I can only cry.

Here’s the background. A few months ago, Aleksi Reito (who sent me the adorable picture above) pointed me to a short article by Yanik Bababekov, Sahael Stapleton, Jessica Mueller, Zhi Fong, and David Chang in Annals of Surgery, “A Proposal to Mitigate the Consequences of Type 2 Error in Surgical Science,” which contained some reasonable ideas but also made a common and important statistical mistake.

I was bothered to see this mistake in an influential publication. Instead of blogging it, this time I decided to write a letter to the journal, which they pretty much published as is.

My letter went like this:

An article recently published in the Annals of Surgery states: “as 80% power is difficult to achieve in surgical studies, we argue that the CONSORT and STROBE guidelines should be modified to include the disclosure of power—even if <80%—with the given sample size and effect size observed in that study”. This would be a bad idea. The problem is that the (estimated) effect size observed in a study is noisy, especially so in the sorts of studies discussed by the authors. Using estimated effect size can give a terrible estimate of power, and in many cases can lead to drastic overestimates of power . . . The problem is well known in the statistical and medical literatures . . . That said, I agree with much of the content of [Bababekov et al.] . . . I appreciate the concerns of [Bababekov et al.] and I agree with their goals and general recommendations, including their conclusion that “we need to begin to convey the uncertainty associated with our studies so that patients and providers can be empowered to make appropriate decisions.” There is just a problem with their recommendation to calculate power using observed effect sizes.

I was surgically precise, focusing on the specific technical error in their paper and separating this from their other recommendations.

And the letter was published, with no hassle! Not at all like my frustrating experience with the American Sociological Review.

So I thought the story was over.

But then my blissful slumber was interrupted when I received another email from Reito, pointing to a response in that same journal by Bababekov and Chang to my letter and others. Bababekov and Chang write:

We are greatly appreciative of the commentaries regarding our recent editorial . . .

So far, so good! But then:

We respectfully disagree that it is wrong to report post hoc power in the surgical literature. We fully understand that P value and post hoc power based on observed effect size are mathematically redundant; however, we would point out that being redundant is not the same as being incorrect. . . . We also respectfully disagree that knowing the power after the fact is not useful in surgical science.

No! My problem is not that their recommended post-hoc power calculations are “mathematically redundant”; my problem is that their recommended calculations *will give wrong answers* because they are based on extremely noisy estimates of effect size. To put it in statistical terms, their recommended method has *bad frequency properties*.

I completely agree with the authors that “knowing the power after the fact” can be useful, both in designing future studies and in interpreting existing results. John Carlin and I discuss this in our paper. But the authors’ recommended procedure of taking a noisy estimate and plugging it into a formula does *not* give us “the power”; it gives us a *very noisy estimate of the power*. Not the same thing at all.

Here’s an example. Suppose you have 200 patients: 100 treated and 100 control, and 94 of the treated patients survive, compared to 90 of the controls. Then the raw estimated treatment effect is 0.04 with standard error sqrt(0.94*0.06/100 + 0.90*0.10/100) = 0.04. The estimate is just one s.e. away from zero, hence not statistically significant. And the crudely estimated post-hoc power, using the normal distribution, is approximately 16% (the probability of observing an estimate at least 2 standard errors away from zero, conditional on the true parameter value being 1 standard error away from zero). But that’s a noisy, noisy estimate! Consider that effect sizes consistent with these data could be anywhere from -0.04 to +0.12 (roughly), hence absolute effect sizes could be roughly between 0 and 3 standard errors away from zero, corresponding to power being somewhere between 5% (if the true population effect size happened to be zero) and about 84% (if the true effect size were three standard errors from zero). That’s what I call noisy.
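The arithmetic in that paragraph is easy to check directly. Here is a minimal sketch of the normal-approximation power calculation (this just reproduces the back-of-the-envelope numbers above; it is emphatically not anyone’s recommended procedure):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

# Raw estimate and standard error for 94/100 vs. 90/100 survival.
est = 0.94 - 0.90
se = sqrt(0.94 * 0.06 / 100 + 0.90 * 0.10 / 100)  # roughly 0.04

def power(true_effect, se, z_crit=2.0):
    """P(estimate falls at least z_crit s.e. from zero), if the
    estimate is normal with mean true_effect and s.d. se."""
    z = true_effect / se
    return Z.cdf(-z_crit - z) + (1 - Z.cdf(z_crit - z))

post_hoc = power(est, se)   # plug-in "post hoc power": about 17%
lo = power(0.0, se)         # about 5% if the true effect is zero
hi = power(3 * se, se)      # about 84% if the true effect is 3 s.e.
```

The plug-in calculation returns a single tidy-looking number, but sweeping the true effect across the range consistent with the data shows the power could be almost anywhere, which is the point.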

Here’s an analogy that might help. Suppose someone offers me a shit sandwich. I’m not gonna want to eat it. My problem is not that it’s a sandwich, it’s that it’s filled with shit. Give me a sandwich with something edible inside; then we can talk.

I’m not saying that the approach that Carlin and I recommend—performing design analysis using substantively-based effect size estimates—is trivial to implement. As Bababekov and Chang write in their letter, “it would be difficult to adapt previously reported effect sizes to comparative research involving a surgical innovation that has never been tested.”

Fair enough. It’s not easy, and it requires assumptions. But that’s the way it works: if you want to make a statement about power of a study, you need to make some assumption about effect size. Make your assumption clearly, and go from there. Bababekov and Chang write: “As such, if we want to encourage the reporting of power, then we are obliged to use observed effect size in a post hoc fashion.” No, no, and no. You are not obliged to use a super-noisy estimate. You were allowed to use scientific judgment when performing that power analysis you wrote for your grant proposal, before doing the study, and you’re allowed to use scientific judgment when doing your design analysis, after doing the study.

The whole thing is so frustrating.

Look. I can’t get mad at the authors of this article. They’re doing their best, and they have some good points to make. They’re completely right that authors and researchers should not “misinterpret P > 0.05 to mean comparison groups are equivalent or ‘not different.'” This is an important point that’s not well understood; indeed my colleagues and I recently wrote a whole paper on the topic, actually in the context of a surgical example. Statistics is hard. The authors of this paper are surgeons and health policy researchers, not statisticians. I’m a statistician and I don’t know anything about surgery; no reason to expect these two surgeons to know anything about statistics. But, it’s still frustrating.

**P.S.** After writing the above post a few months ago, I submitted it (without some features such as the “shit sandwich” line) as a letter to the editor of the journal. To its credit, the journal is publishing the letter. So that’s good.

Next up is Johnny Carson, #2 in the TV personalities category and arguably the best talk-show host ever, against Paul Erdos, one of the weirdest and most prolific mathematicians of all time. I’m guessing that the commenters here will side with Erdos, but I dunno. From everything I’ve read about Erdos, he’s always seemed irritating to me. In some ways, I can relate to the guy: like me, he liked to solve research problems with lots of different collaborators, but there’s something about all those indulgent descriptions of the guy that rubs me the wrong way. In contrast, Johnny Carson is just brilliant. But, in any case, it’s up to you, not me, to give the most compelling arguments on both sides.

Remember, the full bracket is here, and here are the rules:

Dear Contributor,

ADVANCES IN POLITICAL METHODOLOGY

[978 1 78347 485 1]

Regular price: $455.00

Special Contributor price: $113.75 (plus shipping)

We are pleased to announce the publication of the above title. Due to the limited print run of this collection and the high number of contributing authors, we are unable to offer a complimentary copy. In recognition of your contribution, however, we are delighted to offer you one copy of this title at a discount of 75% off the list price (excluding postage and packing). Please note that these purchases should be for personal use and not for resale.

If you would like to take advantage of this offer, please visit our website at the link below. To receive your 75% discount on one copy of this title enter ( FRANZESE75 ) in the discount code field during checkout.

http://www.e-elgar.com/shop/advances-in-political-methodology

50% DISCOUNT ON ALL TITLES

You can also purchase further copies of this title and other titles from the Elgar list at a 50% author discount.

As a thank you to our authors and contributors, Edward Elgar Publishing offers a 50% discount on all titles. Orders must be prepaid and are for personal use only. To take advantage of this offer at any time, please enter the discount code ‘EEAUTHOR’ on the payment page of our website: www.e-elgar.com Please note only one discount code is allowed per order. Any further questions please feel free to contact us.

With best wishes,

Research Collections Department

Edward Elgar Publishing

Independent Publisher of the Year 2017 – Independent Publishers Guild

Academic & Professional Publisher of the Year 2017 & 2014 – Independent Publishers Guild

Digital Publisher of the Year 2015 – Independent Publishers Guild

Independent, Academic, Educational and Professional Publisher of the Year 2014 & 2013 – The Bookseller

Wow, a mere $113.75 (plus shipping), huh? I guess that’s what it takes to be named Digital Publisher of the Year.

Also, I just love it that this extremely-low price of $113.75 excludes “postage and packing.” No free lunches here, no siree!

~~Still a couple glitches, though. Right now it doesn’t seem to be possible to comment. We hope to get that fixed soon (unfortunately it’s Friday evening and I don’t know if anyone’s gonna look at it over the weekend), will let you know when comments work again. Regularly scheduled posts will continue to appear.~~

Comments work too now!

- Bayesian Data Analysis Meetup: Under the hood: Stan’s library, language, and algorithms

After summarizing what Stan does, this talk will focus on how Stan is engineered. The talk follows the organization of the Stan software.

*Stan math library*: differentiable math and stats functions, template metaprograms to manage constants and vectorization, matrix derivatives, and differential equation derivatives.

*Stan language*: block structure and execution, unconstraining variable transforms and automatic Jacobians, transformed data, parameters, and generated quantities execution.

*Stan algorithms*: Hamiltonian Monte Carlo and the no-U-turn sampler (NUTS), automatic differentiation variational inference (ADVI).

*Stan infrastructure and process*: Time permitting, I can also discuss Stan’s developer process, how the code repositories are organized, and the code review and continuous integration process for getting new code into the repository.

**Slides**

- Bob Carpenter. Stan: Under the Bonnet. Slides for NYC meetup presented 17 January 2019.

I realized I’m missing a good illustration of NUTS and how it achieves detailed balance and preferentially selects positions on the Hamiltonian trajectory toward the end of the simulated dynamics (to minimize autocorrelation in the draws). It was only an hour, so I skipped the autodiff and scalable-algorithms sections and jumped to the end. I’ll volunteer to do another meetup with the second half of the talk.
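For readers wondering what “simulating the dynamics” means concretely, here is a toy leapfrog integrator on a standard-normal target. This is the textbook algorithm at the core of HMC, not Stan’s actual implementation:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator.
    q: position (parameters), p: momentum, U: negative log density."""
    p = p - 0.5 * eps * grad_U(q)      # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                # full step for position
        p = p - eps * grad_U(q)        # full step for momentum
    q = q + eps * p                    # last full position step
    p = p - 0.5 * eps * grad_U(q)      # final half step for momentum
    return q, -p                       # flip momentum for reversibility

# Standard-normal target: U(q) = q^2 / 2, so grad_U(q) = q.
q, p = leapfrog(np.array([1.0]), np.array([0.5]),
                lambda q: q, eps=0.1, n_steps=20)
```

The total energy H = U(q) + p²/2 is approximately conserved along the trajectory, which is what lets HMC propose distant points that still get accepted; NUTS adds the machinery for deciding how long to run the trajectory and which point on it to select.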

During your talk last week, you spoke about the role of stories in scientific theory. On page 104 of What Is Real: The Unfinished Quest for the Meaning of Quantum Physics, Adam Becker talks about stories and scientific theory in relation to alternative conceptions of quantum theory, particularly between Bohm’s pilot-wave interpretation and Bohr’s Copenhagen interpretation:

The picture of the world that comes along with a physical theory is an important component of that theory. Two theories that are identical in their predictions can have wildly different pictures of the world… and those pictures, in turn, determine a lot about the daily practice of science… The story that comes along with a scientific theory influences the experiments that scientists choose to perform, the way new evidence is evaluated, and ultimately, guides the search for new theories as well.

Anyways, I just wanted to share the passage as I think Becker has done a nice job of connecting the two.

A lot of things came up in my talk, but at the beginning I did discuss how in science we learn from stories. Stories are not just a way for researchers to vividly convey their findings to others; they also frame our understanding of the world. I discussed the idea of stories being anomalous and immutable (see second link above for more on this); the above Becker quote is interesting in that it captures the importance of story-like structures in our understanding as well as in our communication.

We had some good arguments on both sides yesterday.

Jonathan writes:

In my experience, comedians are great when they’re on-stage and morose and unappealing off-stage. Sullivan, on the other hand, was morose and unappealing on-stage, and witty and charming off-stage, or so I’ve heard. This comes down, then, to deciding whether the speaker treats the seminar as a stage or not. I don’t think Sullivan would, because it’s not a “rilly big shew.”

That’s some fancy counterintuitive reasoning: Go with Sullivan because he won’t take it seriously so his pleasant off-stage personality will show up.

On the other hand, Zbicyclist goes with the quip:

Your Show of Shows -> Your Seminar of Seminars.

Render unto Caesar.

I like it. Sid advances.

For our next contest, things get more interesting. In one corner, the greatest female athlete of all time, an all-sport trailblazer. In the other, the chairman of the United States House Permanent Select Committee on Intelligence, who’s been in the news lately for his investigation of Russian involvement in the U.S. election. He knows all sorts of secrets.

If the seminar’s in the statistics department, Babe, no question. For the political science department, it would have to be Adam. But this is a university-wide seminar (inspired by this Latour-fest, remember?), so I think they both have a shot.

*David Shor: “MRP itself is like, a 2009-era methodology.”*

Nope. The first paper on MRP was from 1997. And, even then, the component pieces were not new: we were just basically combining two existing ideas from survey sampling: regression estimation and small-area estimation. It would be more accurate to call MRP a methodology from the 1990s, or even the 1970s.

*Will Cubbison: “that MRP isn’t a magic fix for poor sampling seems rather obvious to me?”*

Yep. We need to work on both fronts: better data collection and better post-sampling adjustment. In practice, neither alone will be enough.

*David Shor: 2012 seems like a perfect example of how focusing on correcting non-response bias and collecting as much data as you can is going to do better than messing around with MRP.*

There’s a misconception here. “Correcting non-response bias” is not an alternative to MRP; rather, MRP is a method for correcting non-response bias. The whole point of the “multilevel” (more generally, “regularization”) in MRP is that it allows us to adjust for more factors that could drive nonresponse bias. And of course we used MRP in our paper where we showed the importance of adjusting for non-response bias in 2012.

And “collecting as much data as you can” is something you’ll want to do no matter what. Yair used MRP with tons of data to understand the 2018 election. MRP (or, more generally, RRP) is a great way to correct for non-response bias using as much data as you can.

Also, I’m not quite clear what was meant by “messing around” with MRP. MRP is a statistical method. We use it, we don’t “mess around” with it, any more than we “mess around” with any other statistical method. Any method for correcting non-response bias is going to require some “messing around.”

In short, MRP is a method for adjusting for nonresponse bias and data sparsity to get better survey estimates. There are other ways of getting to basically the same answer. It’s important to adjust for as many factors as possible and, if you’re going for small-area estimation with sparse data, that you use good group-level predictors.
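To make the poststratification step concrete, here is a toy sketch. The cells, population counts, and estimates are all invented for illustration; in real MRP the cell-level estimates come from a multilevel regression fit to the survey data, not from thin air:

```python
import numpy as np

# Hypothetical poststratification table: for each demographic cell,
# a census population count N_j and a model-based estimate theta_j
# (e.g., predicted support for some candidate in that cell).
cells = {
    # cell name:     (N_j,    theta_j)
    "young_urban":   (30_000, 0.62),
    "young_rural":   (10_000, 0.48),
    "old_urban":     (25_000, 0.55),
    "old_rural":     (35_000, 0.41),
}

N = np.array([n for n, _ in cells.values()])
theta = np.array([t for _, t in cells.values()])

# Poststratification: weight each cell's estimate by its population
# share, so cells underrepresented in the sample are counted correctly.
mrp_estimate = float(np.sum(N * theta) / np.sum(N))
```

This is how nonresponse bias gets corrected: if old rural respondents are scarce in the sample, the regression still produces an estimate for that cell (stabilized by partial pooling), and the census counts restore that cell’s proper weight in the population total.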

MRP is a 1970s-era method that still works. That’s fine. Least squares regression is a 1790s-era method, and it still works too! In both cases, we continue to do research to improve and better understand what we’re doing.

Our next contest features two legendary TV variety show hosts who, at the very least, can tell first-hand stories about Elvis Presley, the Beatles, Mel Brooks, Woody Allen, and many others. Should be fun.

The full bracket is here, and here are the rules:

And for our next contest, it’s the Bard of Newark vs. a man who’s only in this contest because it was hard for me to think of 8 people whose name ended in f, whose entire fame comes from the decades-old phrase, “Fresh Prince and DJ Jazzy Jeff.” So whaddya want: riffs on Anne Frank and suburban rabbis, or some classic 80s beats? I dunno. I think Roth would be much more entertaining when question time comes along, but he can’t scratch.

Does anyone know these people? Do they exist or are they spooks?

The full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

There are some things I don’t like about their book, and I’ll get to that, but I want to start with a central point of theirs with which I agree strongly.

**Division of labor**

A point that Pearl and Mackenzie make several times, even if not quite in this language, is that there’s a division of labor between qualitative and quantitative modeling.

The models in their book are qualitative, all about the directions of causal arrows. Setting aside any problems I have with such models (I don’t actually think the “do operator” makes sense as a general construct, for reasons we’ve discussed in various places on this blog from time to time), the point is that these are qualitative, on/off statements. They’re “if-then” statements, not “how much” statements.

Statistical inference and machine learning focuses on the quantitative: we model the relationship between measurements and the underlying constructs being measured; we model the relationships between different quantitative variables; we have time-series and spatial models; we model the causal effects of treatments and we model treatment interactions; and we model variation in all these things.

Both the qualitative and the quantitative are necessary, and I agree with Pearl and Mackenzie that typical presentations of statistics, econometrics, etc., can focus way too strongly on the quantitative without thinking at all seriously about the qualitative aspects of the problem. It’s usually all about how to get the answer given the assumptions, and not enough about where the assumptions come from. And even when statisticians write about assumptions, they tend to focus on the most technical and least important ones, for example in regression focusing on the relatively unimportant distribution of the error term rather than the much more important concerns of validity and additivity.

If all you do is set up probability models, without thinking seriously about their connections to reality, then you’ll be missing a lot, and indeed you can make major errors in causal reasoning, as James Heckman, Donald Rubin, Judea Pearl, and many others have pointed out. Indeed, Heckman, Rubin, and Pearl have (each in their own way) advocated for substantive models, going beyond data description to latch on to underlying structures of interest.

Pearl and Mackenzie’s book is pretty much all about qualitative models; statistics textbooks such as my own have a bit on qualitative models but focus on the quantitative nuts and bolts. We need both.

Judea Pearl, like Jennifer Hill and Frank Sinatra, is right that “you can’t have one without the other”: If you think you’re working with a purely qualitative model, it turns out that, no, you’re actually making lots of data-based quantitative decisions about which effects and interactions you decide are real and which ones you decide are not there. And if you think you’re working with a purely quantitative model, no, you’re really making lots of assumptions (causal or otherwise) about how your data connect to reality.

**The Book of Why**

Pearl and Mackenzie’s book is really three books woven together:

**1.** An exposition of Pearl’s approach to causal inference based on graphs and the do-operator.

**2.** An intellectual history of this and other statistical approaches to causal inference.

**3.** A series of examples including some interesting discussions of smoking and cancer, going far beyond what you’ll generally see in a popular book or a textbook on statistics or causal inference.

**About the exposition of causal inference**, I have little to say. As regular readers of this blog know, I have difficulty understanding the point of Pearl’s writing on causal inference (see, for example, here). Just as Pearl finds it baffling that statisticians keep taking causal problems and talking about them in the language of statistical models, I find it baffling that Pearl and his colleagues keep taking statistical problems and, to my mind, complicating them by wrapping them in a causal structure (see, for example, here).

I’m *not* saying that I’m right and Pearl is wrong here—lots of thoughtful people find Pearl’s ideas valuable, and I accept that, for many people, Pearl’s approach is a good way—perhaps the best way—to frame causal inference. I’m just saying that I don’t really have anything more to say on the topic.

**About the intellectual history of causal inference**: this is interesting. I disagree with a lot of what Pearl says, but I guess that’s kinda the point, as Pearl is fighting against the statistics establishment, which I’m part of. For example, there’s this from the promotional material that came with the book:

Using a calculus of cause and effect developed by Pearl and others, scientists now have the ability to answer such questions as whether a drug cured an illness, when discrimination is to blame for disparate outcomes, and how much worse global warming can make a heat wave.

Ummm, I’m pretty sure that scientists could do all these without the help of Pearl! Actually, for that last one, I think the physicists don’t really need statisticians at all.

On page 66 of the book, Pearl and Mackenzie write that statistics “became a model-blind data reduction enterprise.” Hey! What the hell are you talking about?? I’m a statistician, I’ve been doing statistics for 30 years, working in areas ranging from politics to toxicology. “Model-blind data reduction”? That’s just bullshit. We use models all the time. There *are* some statisticians who avoid models, or at least there used to be—I used to work in the same department with some of them—but that’s really a minority view within the field. Statisticians use models all the time, statistics textbooks are full of models, and so forth.

The book has a lot more examples along these lines, and I’ll append them to the end of this post.

I think the key line in the Pearl and Mackenzie book comes on page 90, where they write, “Linguistic barriers are not surmounted so easily.” My many long and frustrating exchanges with Pearl have made me realize how difficult it is to have a conversation when you can’t even agree on the topic to be discussed!

Although I was bothered by a lot of Pearl and Mackenzie’s offhand remarks, I could well imagine that their book could be valuable to outsiders who want to get a general picture of causal reasoning and its importance. In some sense I’m too close to the details. The big picture, if you set aside disputes about who did what when, is important, no matter what notation or language is used to frame it.

And that brings me to **the examples** in the book. These are great. I find some of the reasoning hard to follow, and Pearl and Mackenzie’s style is different from mine, but that’s fine. The examples are interesting and they engage the reader—at least, they engage me—and I think they are a big part of what makes the book work.

**Mischaracterizations of statistics and statisticians**

As noted above, Pearl and Mackenzie have a habit of putting down statisticians in a way that seems to reflect ignorance of our field.

On page 356, they write, “Instead of seeing the difference between populations as a threat to the ‘external validity’ of a study, we now have a methodology for establishing validity in situations that would have appeared hopeless before.” No, they would not have appeared hopeless before, at least not to statisticians who knew about regression models with interactions, poststratification, and multilevel models.

Again, on page 357: “the culture of ‘external validity’ is totally preoccupied with listing and categorizing the threats to validity rather than fighting them.” No. You could start with Jennifer’s 2011 paper, for example.

On page 371, they thank Chris Winship, Steven Morgan, and Felix Elwert “for ushering social science into the age of causation.” Look. I think Winship, Morgan, and Elwert are great. I positively reviewed Morgan and Winship’s book. But even they wouldn’t say “they ushered social science into the age of causation.” Social scientists have been doing good work on causation long before Winship, Morgan, and Elwert came along. By writing the above sentence, Pearl and Mackenzie are just gratuitously insulting all the social scientists who came before these people. It’s kind of like when Kingsley Amis sang the praises of Ian Fleming and Dick Francis: that was his poke in the eye to all the other authors out there.

I don’t know which lines of the book were written by Pearl, which by Mackenzie, and which by both. In any case, I find it unfortunate that they feel the need to keep putting down statisticians and social scientists. If they were accurate in their putdowns, I’d have no problem. But that’s not what’s happening here. Kevin Gray makes a similar point here, from the perspective of a non-academic statistician.

Look. I know about the pluralist’s dilemma. On one hand, Pearl believes that his methods are better than everything that came before. Fine. For him, and for many others, they *are* the best tools out there for studying causal inference. At the same time, as a pluralist, or a student of scientific history, we realize that there are many ways to bake a cake. It’s challenging to show respect to approaches that don’t really work for you, and at some point the only way to do it is to step back and realize that real people use these methods to solve real problems. For example, I think making decisions using p-values is a terrible and logically incoherent idea that’s led to lots of scientific disasters; at the same time, many scientists do manage to use p-values as tools for learning. I recognize that. Similarly, I’d recommend that Pearl recognize that the apparatus of statistics, hierarchical regression modeling, interactions, poststratification, machine learning, etc etc., solves real problems in causal inference. Our methods, like Pearl’s, can also mess up—GIGO!—and maybe Pearl’s right that we’d all be better off to switch to his approach. But I don’t think it’s helping when he gives out inaccurate statements about what we do.

**P.S.** I also noticed a couple of what seem to be technical errors. No big deal, we all make mistakes, and there’s plenty of time to correct them for the second edition.

– Figure 2.3 of the book reproduces Galton’s classic regression of adult children’s heights on the average of parents’ heights. But even though the graph is clearly labeled “mid-parents,” in the caption and the text, Pearl and Mackenzie refer to it as “father’s height.”

Here’s one lesson you can learn from statisticians: Look at the data. (I’m also suspicious of Figure 2.2, as the correlation it shows between fathers’ and sons’ heights looks too large to me. But I could be wrong here; I guess it would be good to know where the data for that graph came from.)

This is not a big deal, but it shows a lack of care. Someone had to write that passage, and there’s no way to get it wrong if you read Galton’s graph carefully. What’s the point of reproducing the graph in your book if you don’t even look at it?

– From the caption of Figure 4.3: “R. A. Fisher with one of his many innovations: a Latin square . . . Such designs are still used in practice, but Fisher would later argue convincingly that a randomized design is even more effective.”

Huh? A Latin square design *is* randomized. It’s a restricted randomization. Just read any textbook on experimental design. Or maybe I’m missing something here?

**P.P.S.** More from Pearl here.

**P.P.P.S.** And more from Pearl here. He also brings up “cause of effect” questions, a topic that Guido Imbens and I discuss in this paper. I’m not saying that Pearl’s framework is wrong and ours is right—I expect that different approaches will be useful in different problems—I’m just pointing out that these questions can be addressed in the potential outcome framework.

Pearl also writes, “Can one really make progress on a lot of applied problems in causal inference without dealing with identification? Evidently, potential outcome folks think so, at least those in Gelman’s circles.” No, I never said that, not at all! Indeed I stated very clearly in many places, including in this post and its comment thread, that causal identification is necessary for causal inference. The thing I wrote, that Pearl was responding to, was my statement, “The methods that I’ve learned have allowed my colleagues and I to make progress on a lot of applied problems in causal inference . . .” When talking about “the methods that I’ve learned,” of course that includes what I’ve learned about causal identification.

And Pearl writes, “Gelman wants to move identification to separate books . . .” No, I do not want to do this nor did I ever say such a thing. Indeed, my book with Jennifer has three chapters on causal inference, so it can be clearly seen from my actions that I do not want to move causal identification to separate books.

On the plus side, Pearl says he’ll no longer characterize me as being frightened or lacking courage. So that’s a plus.

But did she really?

Paul Campos points us to this post, where he writes:

Here’s a statistical series, laying out various points along the 100 longest known durations of a particular event, of which there are billions of known examples. The series begins with the 100th longest known case:

100th: 114 years 93 days

90th: 114 years 125 days

80th: 114 years 182 days

70th: 114 years 208 days

60th: 114 years 246 days

50th: 114 years 290 days

40th: 115 years 19 days

30th: 115 years 158 days

20th: 115 years 319 days

10th: 116 years 347 days

9th: 117 years 27 days

8th: 117 years 81 days

7th: 117 years 137 days

6th: 117 years 181 days

5th: 117 years 230 days

4th: 117 years 248 days

3rd: 117 years 260 days

Based on this series, what would you expect the second-longest and the longest known durations of the event to be?

These are the maximum verified — or as we’ll see “verified” — life spans achieved by human beings, at least since it began to be possible to measure this with some loosely acceptable level of scientific accuracy . . .

Given the mortality rates observed between ages 114 and 117 in the series above, it would be somewhat surprising if anybody had actually reached the age of 118. Thus it’s very surprising to learn that #2 on the list, an American woman named Sarah Knauss, lived to be 119 years and 97 days. That seems like an extreme statistical outlier, and it makes me wonder if Knauss’s age at death was recorded correctly (I know nothing about how her age was verified).

But the facts regarding the #1 person on the list — a French woman named Jeanne Calment who was definitely born in February of 1875, and was determined to have died in August of 1997 by what was supposedly all sorts of unimpeachable documentary evidence, after reaching the astounding age of 122 years, 164 days — are more than surprising. . . .

A Russian mathematician named Nikolay Zak has just looked into the matter, and concluded that, despite the purportedly overwhelming evidence that made it certain beyond a reasonable doubt that Calment reached such a remarkable age, it’s actually quite likely, per his argument, that Jeanne Calment died in the 1930s, and the woman who for more than 20 years researchers all around the world considered to be the oldest person whose age had been “conclusively” documented was actually her daughter, Yvonne. . . .

I followed the link and read Zak’s article, and . . . I have no idea.

The big picture is that, after age 110, the probability of dying is about 50% per year. For reasons we’ve discussed earlier, I don’t think we should take this constant hazard rate too seriously. But if we go with that, and we start with 100 people reaching a recorded age of 114, we’d expect about 50 to reach 115, 25 to reach 116, 12 to reach 117, 6 to reach 118, 3 to reach 119, etc. . . . so 122 is not at all out of the question. So I don’t really buy Campos’s statistical argument, which seems to turn on there being a lot of people who reached 117 but not 118, a pattern that can easily arise by chance.
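The constant-hazard arithmetic is easy to check by simulation. Here’s a minimal sketch, assuming only the 50%-per-year death rate described above; the function name and settings are mine, purely illustrative:

```python
import random

random.seed(1)

def max_age_after_114(n_people=100, hazard=0.5, n_sims=10000):
    """Simulate the oldest age reached in a cohort of n_people who have
    all reached 114, with a constant probability `hazard` of dying each year."""
    maxima = []
    for _ in range(n_sims):
        oldest = 114
        for _ in range(n_people):
            age = 114
            while random.random() > hazard:  # survives another year
                age += 1
            oldest = max(oldest, age)
        maxima.append(oldest)
    return maxima

maxima = max_age_after_114()
# Chance that at least one of the 100 reaches 122; analytically this is
# 1 - (1 - 0.5**8)**100, roughly one in three:
print(sum(m >= 122 for m in maxima) / len(maxima))
```

Under the constant-hazard assumption, then, a cohort of 100 recorded 114-year-olds produces someone who reaches 122 about a third of the time, which is the sense in which 122 is “not at all out of the question.”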

Although I have nothing to add to the specific question of Jeanne or Yvonne Calment, I do have some general thoughts on this story:

– It’s stunning to me how these paradigm shifts come up, where something that everybody believes to be true is questioned. I’ve been vaguely following discussions about the maximum human lifespan (as in the link just above), and the example of Calment comes up all the time, and I’d never heard anyone suggest her story might be fake. According to Zak, there *had* been some questioning, but it didn’t go far enough for me to have heard about it.

Every once in a while we hear about these exciting re-thinkings of the world. Sometimes these new ideas seem to turn out to be right (for example, the story about the asteroid collision that indirectly killed the dinosaurs; or, since we’re on the topic, the story that modern birds are the dinosaurs’ descendants). Other times they seem to have been dead ends (for example, the claim that certain discrepancies in sex ratios could be explained by hepatitis). As Joseph Delaney discusses in the context of the latter example, sometimes an explanation can be too convincing, in some way. The challenge is to value paradigm-busting ideas without falling in love with them.

– The Calment example is a great illustration of Bayesian inference. Bayesian reasoning should lead us to be skeptical of Calment’s claimed age. Indeed, as Zak notes, Bayesian reasoning should lead us to be skeptical of *any* claim on the tail of any distribution. Those 116-year-olds and 117-year-olds on Campos’s list above: we should be skeptical of each of them too. It’s just simple probabilistic reasoning: there’s some baseline probability that anyone’s claimed age will be fake, and if the distribution of fake ages has wider tails than the distribution of real ages, then an extreme claimed age is some evidence of an error. The flip side is that there must be some extreme ages out there that we *haven’t* heard about.

– The above discussion also leads to a sort of moral hazard of Bayesian inference: If we question the extreme reported ages without correspondingly researching other ages, we’ll be shrinking our distribution. As Phil and I discuss in our paper, All maps of parameters are misleading, there’s no easy solution to this problem, but we at least should recognize it.
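The Bayesian reasoning sketched above can be made concrete with a toy calculation. Every number here is invented for illustration (the 1% base rate of fake records, the two tail shapes); only the qualitative pattern matters:

```python
def p_fake_given_claim(age, p_fake=0.01, real_decay=0.5, fake_decay=0.8):
    """Posterior probability that a claimed age past 114 is fake, assuming
    real survival halves each year while fake claims decay more slowly."""
    years = age - 114
    like_real = real_decay ** years   # P(claim >= age | record is real)
    like_fake = fake_decay ** years   # P(claim >= age | record is fake)
    num = p_fake * like_fake
    return num / (num + (1 - p_fake) * like_real)

for age in (115, 117, 119, 122):
    print(age, round(p_fake_given_claim(age), 3))
```

Even with only a 1% base rate of errors, the posterior probability of a fake climbs steeply with the claimed age, because the fake-age distribution has the wider tail. That is the sense in which an extreme claimed age is, by itself, evidence of an error.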

**P.S.** Campos adds:

I hadn’t considered that the clustering at 117 is probably just random, but of course that makes sense. Calment does seem like a massive outlier, and as you say from a Bayesian perspective the fact that she’s such an outlier makes the potential holes in the validation of her age more probable than otherwise. What I don’t understand about the inheritance fraud theory is that Jeanne’s husband lived until 1942, eight years after Jeanne’s hypothesized death. It would be unusual, I think, for French inheritance law not to give a complete exemption to a surviving spouse for any inheritance tax liability (that’s the case in the legal systems I know something about), but I don’t know anything about French inheritance law.

Here’s the bracket:

And now we begin! The first matchup is Jim Thorpe, seeded #1 in the GOATs category, vs. John Oliver, unseeded in the TV personalities category.

This is a tough one. Jim Thorpe is the GOAT of GOATs, arguably the greatest athlete who ever lived, and with an interesting personal story as well. On the other hand, John Oliver is an undeniably entertaining speaker.

Remember the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

It’s your duty as commenters to give the strongest and most amusing arguments on both sides. So go to it! As you know, the comments are the fuel on which this blog runs.

- “The Book of Why” by Pearl and Mackenzie

- Reproducibility and Stan

- MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about
- Becker on Bohm on the important role of stories in science

- This is one offer I *can* refuse

- How post-hoc power calculation is like a shit sandwich
- Storytelling: What’s it good for?

- Freud expert also a Korea expert

- Data partitioning as an essential element in evaluation of predictive properties of a statistical method
- A thought on the hot hand in basketball and the relevance of defense

- A ladder of responses to criticism, from the most responsible to the most destructive

- “Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion.”
- Moneyball for evaluating community colleges

- Of butterflies and piranhas

- Science as an intellectual “safe space”?
- Just when you thought it was safe to go back into the water . . . SHARK ATTACKS in the Journal of Politics

- One more reason to remove letters of recommendation when evaluating candidates for jobs or scholarships.

- When doing regression (or matching, or weighting, or whatever), don’t say “control for,” say “adjust for”
- If this article portrays things accurately, the nutrition literature is in even worse shape than I thought

- What should JPSP have done with Bem’s ESP paper, back in 2010? Click to find the surprisingly simple answer!

- The bullshit asymmetry principle
- “Objective: Generate evidence for the comparative effectiveness for each pairwise comparison of depression treatments for a set of outcomes of interest.”

- Autodiff! (for the C++ jockeys in the audience)

- Principal Stratification on a Latent Variable (fitting a multilevel model using Stan)
- Of multiple comparisons and multilevel models

- If you want to measure differences between groups, measure differences between groups.

- New estimates of the effects of public preschool
- Facial feedback is back

- The Stan Core Roadmap

- Fitting big multilevel regressions in Stan?
- Fitting multilevel models when the number of groups is small

- Our hypotheses are not just falsifiable; they’re actually false.

- “Using 26,000 diary entries to show ovulatory changes in sexual desire and behavior”
- Votes vs. $

- Should he go to grad school in statistics or computer science?

- Michael Crichton on science and storytelling
- Simulation-based statistical testing in journalism

- More on that horrible statistical significance grid

- P-hacking in study of “p-hacking”?
- “Do you have any recommendations for useful priors when datasets are small?”

- Deterministic thinking meets the fallacy of the one-sided bet

- Are GWAS studies of IQ/educational attainment problematic?
- I believe this study because it is consistent with my existing beliefs.

- Healthier kids: Using Stan to get more information out of pediatric respiratory data

- “News Release from the JAMA Network”
- Kevin Lewis has a surefire idea for a project for the high school Science Talent Search

- HMC step size: How does it scale with dimension?

- Does diet soda stop cancer? Two Yale Cancer Center docs have diametrically opposite views!
- Evidence distortion in clinical trials

- Separated at birth?

- “Light Privilege? Skin Tone Stratification in Health among African Americans”
- George Orwell meets statistical significance: “Politics and the English Language” applied to science

- Good news! Researchers respond to a correction by acknowledging it and not trying to dodge its implications

- “Yes, not only am I suspicious of the claims in that op-ed, I’m also suspicious of all the individual claims from the links in these two sentences”
- Journalist seeking scoops is as bad as scientist doing unreplicable research

- Yes on design analysis, No on “power,” No on sample size calculations

- (back to basics:) How is statistics relevant to scientific discovery?
- A corpus in a single survey!

- The neurostatistical precursors of noise-magnifying statistical procedures in infancy

- Not Dentists named Dennis, but Physicists named Li studying Li
- Remember that paper we wrote, The mythical swing voter? About shifts in the polls being explainable by differential nonresponse? Mark Palko beat us to this idea, by 4 years.

- Political polarization and gender gap

- Junk science + Legal system = Disaster
- Yes, you can include prior information on quantities of interest, not just on parameters in your model

- From the Stan forums: “I’m just very thirsty to learn and this thread has become a fountain of knowledge”

- “No, cardiac arrests are not more common on Monday mornings, study finds”
- One more reason I hate letters of recommendation

- Statistical-significance thinking is not just a bad way to publish, it’s also a bad way to think

- Estimating treatment effects on rates of rare events using precursor data: Going further with hierarchical models.
- Are male doctors better for male heart attack patients and female doctors better for female heart attack patients?

- When and how do politically extreme candidates get punished at the polls?

- He asks me a question, and I reply with a bunch of links
- New golf putting data! And a new golf putting model!

- Balancing rigor with exploration

- Yes, I really really really like fake-data simulation, and I can’t stop talking about it.
- Should we talk less about bad social science research and more about bad medical research?

- Mister P for surveys in epidemiology — using Stan!

- Here’s a puzzle: Why did the U.S. doctor tell me to drink more wine and the French doctor tell me to drink less?
- Surgeon promotes fraudulent research that kills people; his employer, a leading hospital, defends him and attacks whistleblowers. Business as usual.

- A world of Wansinks in medical research: “So I guess what I’m trying to get at is I wonder how common it is for clinicians to rely on med students to do their data analysis for them, and how often this work then gets published”

- An interview with Tina Fernandes Botts
- How to approach a social science research problem when you have data and a couple different ways you could proceed?

- “Boston Globe Columnist Suspended During Investigation Of Marathon Bombing Stories That Don’t Add Up”

- Here’s an idea for not getting tripped up with default priors . . .
- Impact of published research on behavior and avoidable fatalities

- How did our advice about research ethics work out, three years later?

- What’s a good default prior for regression coefficients? A default Edlin factor of 1/2?
- “The Long-Run Effects of America’s First Paid Maternity Leave Policy”: I need that trail of breadcrumbs.

- Question on multilevel modeling reminds me that we need a good modeling workflow (building up your model by including varying intercepts, slopes, etc.) and a good computing workflow

- “Heckman curve” update: The data *don’t* seem to support the claim that human capital investments are most effective when targeted at younger ages.
- Why “bigger sample size” is not usually where it’s at.

- Emile Bravo and agency

- “How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions” . . . and still stays around even after it’s been retracted
- Prestigious journal publishes sexy selfie study

- What sort of identification do you get from panel data if effects are long-term? Air pollution and cognition example.

- Treatment interactions can be hard to estimate from data.
- Works of art that are about themselves

- All statistical conclusions require assumptions.

- State-space models in Stan
- The network of models and Bayesian workflow, related to generative grammar for statistical models

- No, it’s not correct to say that you can be 95% sure that the true value will be in the confidence interval

- Differential effects of research trauma on fatigue and functioning of journal editors in chronic sloppy research syndrome
- Claims about excess road deaths on “4/20” don’t add up

- Here’s a supercool controversy for ya

- Wanted: Statistical success stories
- R-squared for multilevel models

- “Appendix: Why we are publishing this here instead of as a letter to the editor in the journal”

- Ballot order update
- “Incentives to Learn”: How to interpret this estimate of a varying treatment effect?

- Conditioning on post-treatment variables when you expect self-selection

- “How many years do we lose to the air we breathe?” Or not.
- How to think scientifically about scientists’ proposals for fixing science

- Continuing discussion of status threat and presidential elections, with discussion of challenge of causal inference from survey data

- “Boosting intelligence analysts’ judgment accuracy: What works, what fails?”
- “One should always beat a dead horse because the horse is never really dead”

- Olivia Goldhill and Jesse Singal report on the Implicit Association Test

- Automation and judgment, from the rational animal to the irrational machine
- Do regression structures affect research capital? The case of pronoun drop

- Difference-in-difference estimators are a special case of lagged regression

- The Arkansas paradox
- Gremlin time: “distant future, faraway lands, and remote probabilities”

- That illusion where you think the other side is united and your side is diverse

- Maintenance cost is quadratic in the number of features
- Poetry corner

- Bayesian analysis of data collected sequentially: it’s easy, just include as predictors in the model any variables that go into the stopping rule.

- BizStat: Modeling performance indicators for deals
- Scandal! Mister P appears in British tabloid.

- “We see MRP as a way to combine all the data—pre-election voter file data, early voting, precinct results, county results, polling—into a single framework”

- “In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous,” but “Over 20 journals turned down her paper . . . and nobody wanted to fund privacy research that might reach uncomfortable conclusions.”
- On the term “self-appointed” . . .

- Hey, people are doing the multiverse!

- Vigorous data-handling tied to publication in top journals among public heath researchers
- Software for multilevel conjoint analysis in marketing

- Neural nets vs. statistical models

- I’m no expert
- John Le Carre is good at integrating thought and action

- Donald J. Trump and Robert E. Lee

- Pushing the guy in front of the trolley
- Crystallography Corner: The result is difficult to reproduce, but the result is still valid.

- They’re working for the clampdown

- What pieces do chess grandmasters move, and when?
- Let’s publish everything.

- Why edit a journal?

- Should we mind if authorship is falsified?
- Solutions to the 15 questions on our applied regression exam

So, yeah, the usual range of topics.

P.S. I listed the posts in groups of 3 just for easier readability. There’s no connection between the three posts in each batch.

Here’s the full list:

Wits:

Oscar Wilde (seeded 1 in group)

Dorothy Parker (2)

David Sedaris (3)

Voltaire (4)

Veronica Geng

Albert Brooks

Mel Brooks

Monty Python

Creative eaters:

M. F. K. Fisher (1)

Julia Child (2)

Anthony Bourdain (3)

Alice Waters (4)

A. J. Liebling

Nora Ephron

The Japanese dude who won the hot dog eating contest

John Belushi

Magicians:

Harry Houdini (1)

George H. W. Bush (2)

Penn and Teller (3)

Steve Martin (4)

David Blaine

Eric Antoine

Martin Gardner

Ira Glass

Mathematicians:

Carl Friedrich Gauss (1)

Pierre-Simon Laplace (2)

John von Neumann (3)

Alan Turing (4)

Leonhard Euler

Paul Erdos

Stanislaw Ulam

Benoit Mandelbrot

TV personalities:

Oprah Winfrey (1)

Johnny Carson (2)

Ed Sullivan (3)

Carol Burnett (4)

Sid Caesar

David Letterman

Ellen DeGeneres

John Oliver

People from New Jersey:

Bruce Springsteen (1)

Chris Christie (2)

Frank Sinatra (3)

Philip Roth (4)

William Carlos Williams

Virginia Apgar

Meryl Streep

Joe Pesci

GOATs:

Jim Thorpe (1)

Babe Didrikson Zaharias (2)

LeBron James (3)

Bobby Fischer (4)

Serena Williams

Pele

Simone Biles

Lance Armstrong

People whose names end in f:

Riad Sattouf (1)

Ian McKellen (2)

Boris Karloff (3)

Darrell Huff (4)

Yakov Smirnoff

DJ Jazzy Jeff

Adam Schiff

Anastasia Romanoff

**The rules!**

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the eight categories listed above.

I’ll post one matchup each day at noon, starting tomorrow.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll be in their usual 9–10am slot, alternating with these matchup postings, which will appear at noon each day.

The last time we did this competition was a few years ago. See here and here for the first two contests, here for an intermediate round, and here for the conclusion of that one.

I’m stoked for this new tournament. The above bracket features some interesting pairings. Johnny Carson vs. Paul Erdos! Frank Sinatra vs. Virginia Apgar! Carl Friedrich Gauss vs. Nora Ephron! Harry Houdini vs. Yakov Smirnoff! Julia Child vs. Ira Glass! Lots of upsets are possible.

A quick search seems to imply that you haven’t discussed the Fermi equation for a while.

This looks to me to be in the realm of Miller and Sanjurjo: a simple probabilistic explanation sitting right under everyone’s nose. Comment?

“This” is an article, Dissolving the Fermi Paradox, by Anders Sandberg, Eric Drexler, and Toby Ord, which begins:

The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. . . . When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe . . . This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.

I solicited thoughts from astronomer David Hogg, who wrote:

I have only skimmed it, but it seems reasonable. Life certainly could be rare, and technological life could be exceedingly rare. Some of the terms do have many-order-of-magnitude uncertainties.

That said, we now know that a large fraction of stars host planets and many host planets similar to the Earth, so the uncertainties on planet-occurrence terms in any Drake-like equation are now much lower than order-of-magnitude.

And Hogg forwarded the question to another astronomer, Jason Wright, who wrote:

The original questioner’s question (Thomas Basbøll’s submission from December) is addressed explicitly here.

In short, only the duration of transmission matters in steady-state, which is the final L term in Drake’s famous equation. Start time does not matter.

Regarding Andrew’s predicate “given that we haven’t heard any such signals so far” in the OP: despite the high profile of SETI, almost no actual searching has occurred because the field is essentially unfunded (until Yuri Milner’s recent support). Jill Tarter analogizes the idea that we need to update our priors based on the searching to date as being equivalent to saying that there must not be very many fish in the ocean based on inspecting the contents of a single drinking glass dipped in it (that’s a rough OOM, but it’s pretty close). And that’s just searches for narrowband radio signals; other kinds of searches are far, far less complete.

And Andrew is not wrong that the amount of popular discussion of SETI has gone way down since the ’90s. A good account of the rise and fall of government funding for SETI is Garber (1999).

I have what I think is a complete list of NASA and NSF funding since the (final) cancellation of NASA’s SETI work in 1993, and it sums to just over $2.5M (not per year—total). True, Barney Oliver and Paul Allen contributed many millions more, but most of this went to develop hardware and pay engineers to build the (still incomplete and barely operating) Allen Telescope Array; it did not train students or fund much in the way of actual searches.

So you haven’t heard much about SETI because there’s not much to say. Instead, most of the literature is people in their spare time endlessly rearranging, recalculating, reinventing, modifying, and critiquing the Drake Equation, or offering yet another “solution” to the Fermi Paradox in the absence of data.

The central problem is that for all of the astrobiological terms in the Drake Equation we have a sample size of 1 (Earth), and since that one is us we run into “anthropic principle” issues whenever we try to use it to estimate those terms.

The recent paper by Sandberg calculates reasonable posterior distributions on N in the Drake Equation, and indeed shows that they are so wide that N=0 is not excluded, but the latter point has been well appreciated since the equation was written down, so this “dissolution” of the Fermi Paradox (“maybe spacefaring life is just really rare”) is hardly novel. It was the thesis of the influential book Rare Earth and the argument used by Congress as a justification for blocking essentially all funding to the field for the past 25 years.

Actually, I would say that an equally valid takeaway from the Sandberg paper is that very large values of N are possible, so we should definitely be looking for them!

So make of that what you will.
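The Sandberg et al. argument is easy to sketch with a quick Monte Carlo. The log-uniform ranges below are my own placeholders, not the paper’s fitted distributions; the point is only that when several factors are each uncertain over many orders of magnitude, the implied distribution of N puts a lot of mass below 1 while still allowing huge values:

```python
import math
import random

random.seed(1)

def loguniform(lo, hi):
    """Sample uniformly on a log10 scale between lo and hi."""
    return 10 ** random.uniform(math.log10(lo), math.log10(hi))

def sample_drake():
    """One draw of N from a toy Drake equation, log-uniform on each factor."""
    R  = loguniform(1, 100)       # star formation rate (stars/year)
    fp = loguniform(0.1, 1)       # fraction of stars with planets
    ne = loguniform(0.1, 10)      # habitable planets per planet-bearing star
    fl = loguniform(1e-30, 1)     # fraction developing life (hugely uncertain)
    fi = loguniform(1e-10, 1)     # fraction developing intelligence
    fc = loguniform(0.01, 1)      # fraction that ever communicate
    L  = loguniform(100, 1e10)    # years spent transmitting
    return R * fp * ne * fl * fi * fc * L

draws = [sample_drake() for _ in range(100_000)]
print("P(N < 1):  ", sum(d < 1 for d in draws) / len(draws))
print("P(N > 1e6):", sum(d > 1e6 for d in draws) / len(draws))
```

With these made-up ranges, most draws give an effectively empty galaxy while a nontrivial fraction give millions of civilizations, which is exactly the situation Wright describes: “maybe life is rare” and “maybe N is huge, so keep looking” are both consistent with the same uncertainty.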

**P.S.** I posted this in July 2018. The search for extraterrestrial intelligence is one topic where I don’t think much is lost in our 6-month blog delay.

Here was our bracket, back in 2015:

And here were the 64 contestants:

– Philosophers:

Plato (seeded 1 in group)

Alan Turing (seeded 2)

Aristotle (3)

Friedrich Nietzsche (4)

Thomas Hobbes

Jean-Jacques Rousseau

Bertrand Russell

Karl Popper

– Religious Leaders:

Mohandas Gandhi (1)

Martin Luther King (2)

Henry David Thoreau (3)

Mother Teresa (4)

Al Sharpton

Phyllis Schlafly

Yoko Ono

Bono

– Authors:

William Shakespeare (1)

Miguel de Cervantes (2)

James Joyce (3)

Mark Twain (4)

Jane Austen

John Updike

Raymond Carver

Leo Tolstoy

– Artists:

Leonardo da Vinci (1)

Rembrandt van Rijn (2)

Vincent van Gogh (3)

Marcel Duchamp (4)

Thomas Kinkade

Grandma Moses

Barbara Kruger

The guy who did Piss Christ

– Founders of Religions:

Jesus (1)

Mohammad (2)

Buddha (3)

Abraham (4)

L. Ron Hubbard

Mary Baker Eddy

Sigmund Freud

Karl Marx

– Cult Figures:

John Waters (1)

Philip K. Dick (2)

Ed Wood (3)

Judy Garland (4)

Sun Myung Moon

Charles Manson

Joan Crawford

Stanley Kubrick

– Comedians:

Richard Pryor (1)

George Carlin (2)

Chris Rock (3)

Larry David (4)

Alan Bennett

Stewart Lee

Ed McMahon

Henny Youngman

– Modern French Intellectuals:

Albert Camus (1)

Simone de Beauvoir (2)

Bernard-Henry Levy (3)

Claude Levi-Strauss (4)

Raymond Aron

Jacques Derrida

Jean Baudrillard

Bruno Latour

We did single elimination, one match per day, alternating with the regular blog posts. See here and here for the first two contests, here for an intermediate round, and here for the conclusion.

**2019 edition**

Who would be the ultimate seminar speaker? I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the following eight categories:

– Wits

– Creative eaters

– Magicians

– Mathematicians

– TV hosts

– People from New Jersey

– GOATs

– People whose names end in f

All these categories seem to be possible choices to reach the sort of general-interest intellectual community that was implied by the [notoriously hyped] announcement of ~~Slavoj Zizek~~ Bruno Latour’s visit to Columbia a few years ago.

**The rules**

I’ll post one matchup each day at noon, starting sometime next week or so, once we have the brackets prepared.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll alternate with these matchup postings.

– Tips for Active Learning in the College Setting

– Tips for Active Learning in Teacher Prep or in the K-12 Setting

This is great stuff (see my earlier review here).

Every mathematician and math teacher in the universe should read this. So, if any of you happen to be well connected to the math world, please pass this along.

- R-squared for Bayesian regression models. *American Statistician*. (Andrew Gelman, Ben Goodrich, Jonah Gabry, and Aki Vehtari)

- Voter registration databases and MRP: Toward the use of large scale databases in public opinion research. *Political Analysis*. (Yair Ghitza and Andrew Gelman)

- Limitations of “Limitations of Bayesian leave-one-out cross-validation for model selection.” *Computational Brain and Behavior*. (Aki Vehtari, Daniel P. Simpson, Yuling Yao, and Andrew Gelman)

- Post-hoc power using observed estimate of effect size is too noisy to be useful. *Annals of Surgery*. (Andrew Gelman)

- Abandon statistical significance. *American Statistician*. (Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett)

- The statistical significance filter leads to overconfident expectations of replicability. *Journal of Memory and Language* **103**, 151–175. (Shravan Vasishth, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman)

- Large scale replication projects in contemporary psychological research. *American Statistician*. (Blakely B. McShane, Jennifer L. Tackett, Ulf Bockenholt, and Andrew Gelman)

- Do researchers anchor their beliefs on the outcome of an initial study? Testing the time-reversal heuristic. *Experimental Psychology* **65**, 158–169. (Anja Ernst, Rink Hoekstra, Eric-Jan Wagenmakers, Andrew Gelman, and Don van Ravenzwaaij)

- Ethics in statistical practice and communication: Five recommendations. *Significance*. (Andrew Gelman)

- Bayesian inference under cluster sampling with probability proportional to size. *Statistics in Medicine*. (Susanna Makela, Yajuan Si, and Andrew Gelman)

- Yes, but did it work?: Evaluating variational inference. *Proceedings of the 35th International Conference on Machine Learning*. (Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman)

- Why high-order polynomials should not be used in regression discontinuity designs. *Journal of Business and Economic Statistics*. (Andrew Gelman and Guido Imbens)

- Gaydar and the fallacy of decontextualized measurement. *Sociological Science*. (Andrew Gelman, Greggor Matson, and Daniel Simpson)

- Global shifts in the phenological synchrony of species interactions over recent decades. *Proceedings of the National Academy of Sciences*. (Heather M. Kharouba, Johan Ehrlén, Andrew Gelman, Kjell Bolmgren, Jenica M. Allen, Steve E. Travers, and Elizabeth M. Wolkovich)

- The Millennium Villages Project: A retrospective, observational, endline evaluation. *Lancet Global Health* **6**. (Shira Mitchell, Andrew Gelman, Rebecca Ross, Joyce Chen, Sehrish Bari, Uyen Kim Huynh, Matthew W. Harris, Sonia Ehrlich Sachs, Elizabeth A. Stuart, Avi Feller, Susanna Makela, Alan M. Zaslavsky, Lucy McClellan, Seth Ohemeng-Dapaah, Patricia Namakula, Cheryl A. Palm, and Jeffrey D. Sachs) Supplementary appendix.

- Don’t calculate post-hoc power using observed estimate of effect size. *Annals of Surgery*. (Andrew Gelman)

- Visualization in Bayesian workflow (with discussion). *Journal of the Royal Statistical Society A*. (Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman)

- Disentangling bias and variance in election polls. *Journal of the American Statistical Association*. (Houshmand Shirani-Mehr, David Rothschild, Sharad Goel, and Andrew Gelman)

- Don’t characterize replications as successes or failures. Discussion of “Making replication mainstream,” by Rolf A. Zwaan et al. *Behavioral and Brain Sciences*. (Andrew Gelman)

- Using stacking to average Bayesian predictive distributions (with discussion). *Bayesian Analysis* **13**, 917–1003. (Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman)

- Review of *New Explorations into International Relations: Democracy, Foreign Investment, Terrorism, and Conflict*, by Seung-Whan Choi. *Perspectives on Politics*. (Andrew Gelman)

- Benefits and limitations of randomized controlled trials. Discussion of “Understanding and misunderstanding randomized controlled trials,” by Angus Deaton and Nancy Cartwright. *Social Science & Medicine*. (Andrew Gelman)

- The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. *Personality and Social Psychology Bulletin* **44**, 16–23. (Andrew Gelman)

- Bayesian aggregation of average data: An application in drug development. *Annals of Applied Statistics* **12**, 1583–1604. (Sebastian Weber, Andrew Gelman, Daniel Lee, Michael Betancourt, Aki Vehtari, and Amy Racine-Poon)

- How to think scientifically about scientists’ proposals for fixing science. *Socius*. (Andrew Gelman)

- Learning from and responding to statistical criticism. *Observational Studies*. (Andrew Gelman)

- Donald Rubin. In *Encyclopedia of Social Research Methods*, ed. Paul Atkinson, Sara Delamont, Melissa Hardy, and Malcolm Williams. Thousand Oaks, Calif.: Sage Publications. (Andrew Gelman)

Enjoy. They’re listed in approximate reverse chronological order of publication date, so I guess some of the articles at the top of the list will be officially published in 2019.

]]>I would like to ask you for advice regarding obtaining data for reanalysis purposes from an author who has multiple papers with statistical errors and doesn’t want to share the data.

Recently, I reviewed a paper in which some of the reported statistics were mathematically impossible. As the first author of that paper had written another paper in the past with one of my collaborators, I checked that paper as well and also found multiple errors (GRIM, DF, inappropriate statistical tests, etc.). I asked my collaborator about it, and she followed up with the first author, who had done the analysis; she said that he agreed to write an erratum.

Independently, I checked three further papers from that author, and all of them had errors, in numbers comparable to what was found in Wansink’s case. At that stage I contacted the first author of these papers, asking him for the data for reanalysis purposes. As the email went unanswered, after two weeks I followed up, this time mentioning that I had found a number of errors in these papers and including his lab’s contact email address. This time I received a response swiftly. I was told that these papers were peer-reviewed, so if there were any errors they would have been caught (sic!); that for privacy reasons the data could not be shared with me; and I was asked to send a list of the errors I had found. In my response I sent the list of errors, emphasized the importance of independent reanalysis, and pointed out that the data come from lab experiments, so any personally identifiable information can be removed, as it is not needed for reanalysis. After three weeks of waiting, and another email sent in the meantime, the author wrote that he was busy but had found time to check the analysis of one of the papers. In his response, he said that some of the mathematically impossible DFs were wrongly copied numbers, while the inconsistent statistics were due to the wrong cells being selected in the Excel file, which supposedly doesn’t change much. Moreover, he blamed the reviewers for not catching these mistypes (sic!) and said that he found the errors only after I contacted him. The problem is that this is the same paper whose results my collaborator said they had already checked, so he must have been aware of these problems even before my initial email (I didn’t mention that I know that collaborator).

So here is my dilemma about how to proceed. Considering that there are multiple errors, of multiple types, across multiple papers, it is really hard to trust anything else reported in them. The author clearly does not intend to share the data with me, so I cannot verify whether the data exist at all. If they don’t, then, as I have sent him the list of errors, he could reverse-engineer what tools I used and come up with numbers that would pass the tests that can be done based solely on the reported statistics.

As you may have more experience dealing with such situations, I thought I might ask you for advice on how to proceed. Would you suggest contacting the publishers involved, going public, or something else?

My reply:

I hate to say it, but your best option here might be to give up. The kind of people who lie and cheat about their published work may also play dirty in other ways. So is it really worth it to tangle with these people? I have no idea about your particular case and am just speaking on general principles here.

You could try contacting the journal editor. Some journal editors really don’t like to find out that they’ve published erroneous work; others would prefer to sweep any such problems under the rug, either because they have personal connections to the offenders or just because they don’t want to deal with cheaters, as this is unpleasant.

Remember: journal editing is a volunteer job, and people sign up for it because they want to publish exciting new work, or maybe because they enjoy the power trip, or maybe out of a sense of duty—but, in any case, they typically aren’t in it for the controversy. So, if you do get a journal editor who can help on this, great, but don’t be surprised if the editors slink away from the problem, for example by putting the burden in your lap by saying that your only option is to submit your critique in the form of an article for the journal, which can then be sent to the author of the original paper for review, and then rejected on the grounds that it’s not important enough to publish.

Maybe you could get Retraction Watch to write something on this dude?

Also is the paper listed on PubPeer? If so, you could comment there.

]]>Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (like any visualization principles) are contextual, and should be used (or not used) with the goals of the visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.

Interesting how this complements our methods for visualization in Bayesian workflow.

]]>And here’s Part 2. Jordan Anaya reports:

Uli Schimmack posted this on facebook and twitter.

I [Anaya] was annoyed to see that it mentions “a handful” of unreliable findings, and points the finger at fraud as the cause. But then I was shocked to see the 85% number for the Many Labs project.

I’m not that familiar with the project, and I know there is debate on how to calculate a successful replication, but they got that number from none other than the “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” people, as Sanjay Srivastava discusses here.

Schimmack identifies the above screenshot as being from Myers and Twenge (2018); I assume it’s this book, which has the following blurb:

Connecting Social Psychology to the world around us. Social Psychology introduces students to the science of us: our thoughts, feelings, and behaviors in a changing world. Students learn to think critically about everyday behaviors and gain an appreciation for the world around us, regardless of background or major.

But according to Schimmack, there’s “no mention of a replication failure in the entire textbook.” That’s fine—it’s not necessarily the job of an intro textbook to talk about ideas that didn’t work out—but then why mention replications in the first place? And why try to minimize the problem by talking about “a handful of unreliable findings”? A handful, huh? Who talks like that? This is a “Politics and the English Language” situation, where sloppy language serves sloppy thinking and bad practice.

Also, to connect replication failures to “fraud” is just horrible, as it’s consistent with two wrong messages: (a) that to point out a failed replication is to accuse someone of fraud, and (b) that, conversely, honest researchers can’t have replication failures. As I’ve written a few zillion times, honesty and transparency are not enuf. As I wrote here, it’s a mistake to focus on “p-hacking” and bad behavior rather than the larger problem of researchers expecting routine discovery.

So, the blurb for the textbook says that students learn to think critically about everyday behaviors—but they won’t learn to think critically about published research in the field of psychology.

Just to be clear: I’m *not* saying the authors of this textbook are bad people. My guess is they just want to believe the best about their field of research, and enough confused people have squirted enough ink into the water to confuse them into thinking that the number of unreliable findings really might be just “a handful,” that 85% of experiments in that study replicated, that the replication rate in psychology is statistically indistinguishable from 100%, that elections are determined by shark attacks and college football games, that single women were 20 percentage points more likely to support Barack Obama during certain times of the month, that elderly-priming words make you walk slower, that Cornell students have ESP, etc etc etc. There are lots of confused people out there, not sure where to turn, so it makes sense that some textbook writers will go for the most comforting possible story. I get it. They’re not trying to mislead the next generation of students; they’re just doing their best.

There are no bad guys here.

Let’s just hope 2019 goes a little better.

A good start would be for the authors of this book to send a public note to Uli Schimmack thanking him for pointing out their error, and then to replace that paragraph with something more accurate in their next printing. They could also write a short article for Perspectives on Psychological Science on how they got confused on this point, as this could be instructive for other teachers of psychology. They don’t have to do this. They can do whatever they want. But this is my suggestion for how they could get 2019 off to a good start, in one small way.

]]>The topic is the combination of apparently contradictory evidence.

Let’s start with a simple example: you have some ratings on a 1-10 scale. These could be, for example, research proposals being rated by a funding committee, or, umm, I dunno, gymnasts being rated by Olympic judges. Suppose there are 3 judges doing the ratings, and consider two gymnasts: one receives ratings of 8, 8, 8; the other is rated 6, 8, 10. Or, forget about ratings, just consider students taking multiple exams in a class. Consider two students: Amy, whose three test scores are 80, 80, 80; and Beth, who had scores 80, 100, 60. (I’ve purposely scrambled the order of those last three so that we don’t have to think about trends. Forget about time trends; that’s not my point here.)

How to compare those two students? A naive reader of test scores will say that Amy is consistent while Beth is flaky; or you might even say that you think Beth is better as she has a higher potential. But if you have some experience with psychometrics, you’ll be wary of overinterpreting results from three exam scores. Inference about an average from N=3 is tough; inference about *variance* from N=3 is close to impossible. Long story short: from a psychometrics perspective, there’s very little you can say about the relative consistency of Amy and Beth’s test-taking based on just three scores.
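A quick simulation makes the point (this is my own illustrative code with made-up numbers, not anything from the psychometrics literature): give many simulated students the same true consistency and look at how wildly the three-score sample standard deviation varies.

```python
# Illustrative simulation: every simulated student has the same true
# variability (sd = 10 around a true score of 80), yet the sample sd
# computed from just 3 scores is all over the map.
import random, statistics

random.seed(1)
true_sd = 10
sds = sorted(statistics.stdev([random.gauss(80, true_sd) for _ in range(3)])
             for _ in range(10000))
lo, hi = sds[500], sds[9500]  # central 90% of the N=3 sd estimates
print(round(lo, 1), round(hi, 1))  # roughly 2 to 17: barely informative
```

With only three scores, a student whose true sd is 10 can easily produce a sample sd of 2 or of 17, so Amy’s “consistency” and Beth’s “flakiness” are well within chance.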

Academic researchers will recognize this problem when considering reviews of their own papers that they’ve submitted to journals. When you send in a paper, you’ll typically get a few reviews, and these reviews can differ dramatically in their messages.

Here’s a hilarious example supplied to me by Wolfgang Gaissmaier and Julian Marewski, from reviews of their 2011 article, “Forecasting elections with mere recognition from small, lousy samples: A comparison of collective recognition, wisdom of crowds, and representative polls.”

Here are some positive reviewer comments:

– This is a very interesting piece of work that raises a number of important questions related to public opinion. The major finding — that for elections with large numbers of parties, small non-probability samples looking only at party name recognition do as well as medium-sized probability samples looking at voter intent — is stunning.

– There is a lot to like about this short paper… I’m surprised by the strength of the results… If these results are correct (and I have no real reason to suspect otherwise), then the authors are more than justified in their praise of recognition-based forecasts. This could be an extremely useful forecasting technique not just for the multi-party European elections discussed by the authors, but also in relatively low-salience American local elections.

– This is a concise, high-quality paper that demonstrates that the predictive power of (collective) recognition extends to the important domain of political elections.

And now the fun stuff. The negative comments:

– This is probably the strangest manuscript that I have ever been asked to review… Even if the argument is correct, I’m not sure that it tells us anything useful. The fact that recognition can be used to predict the winners of tennis tournaments and soccer matches is unsurprising – people are more likely to recognize the better players/teams, and the better players/teams usually win. It’s like saying that a football team wins 90% (or whatever) of the games in which it leads going into the fourth quarter. So what?

– To be frank, this is an exercise in nonsense. Twofold nonsense. For one thing, to forecast election outcomes based on whether or not voters recognize the parties/candidates makes no sense… Two, why should we pay any attention to unrepresentative samples, which is what the authors use in this analysis? They call them, even in the title, “lousy.” Self-deprecating humor? Or are the authors laughing at a gullible audience?

So, their paper is either “a very interesting piece of work” whose main finding is “stunning”—or it is “an exercise in nonsense” aimed at “a gullible audience.”

]]>Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated model represents reality. Such methods usually stop at checking parameter signs, model elasticities, and ratios of model coefficients. In this paper, I [Brathwaite] greatly expand the discrete choice modelers’ assessment toolkit by introducing model checking procedures based on graphical displays of predictive simulations. . . . a general and ‘semi-automatic’ algorithm for checking discrete choice models via predictive simulations. . . .

He frames model checking in terms of “underfitting,” a connection I’ve never seen before but which makes sense. To the extent that there are features in your data that are not captured in your model—more precisely, features that don’t show up, even in many different posterior predictive simulations from your fitted model—then, yes, the model is underfitting the data. Good point.
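As a concrete, stripped-down illustration of the kind of predictive simulation Brathwaite advocates, here is a sketch on made-up binary-choice data; the model, data, and test statistic are my own toy choices, not his, and for simplicity it plugs in a point estimate rather than drawing from a full posterior:

```python
# Toy predictive check: does a plain Bernoulli model reproduce the
# longest run of identical choices seen in the data?
import random

random.seed(3)
observed = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # hypothetical choices
n, p_hat = len(observed), sum(observed) / len(observed)

def longest_run(xs):
    """Length of the longest run of identical consecutive values."""
    best = cur = 1
    for a, b in zip(xs, xs[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

# simulate replicated datasets from the fitted model
sims = [longest_run([int(random.random() < p_hat) for _ in range(n)])
        for _ in range(4000)]
t_obs = longest_run(observed)
p_value = sum(t >= t_obs for t in sims) / len(sims)
print(t_obs, round(p_value, 2))
```

If the observed statistic sits in the tail of the simulated distribution, the data have a feature the model doesn’t capture, i.e., the model is underfitting in exactly the sense described above.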

]]>I have mused on drafting a simple paper inspired by your paper “Why we (usually) don’t have to worry about multiple comparisons”.

The initial idea is simply to revisit the frequentist “weak FWER” or “omnibus test” (which assumes the null everywhere), connecting it to a Bayesian perspective. To do this, I focus on the distribution of the posterior maximum or extrema (not the maximum a posteriori point estimate) of the joint posterior, given a dataset simulated under the omnibus null hypothesis. This joint posterior may be, for example, defined on a set of a priori exchangeable random coefficients in a multilevel model: its maximum just encodes my posterior belief in the magnitude of the largest of those coefficients (which “should” be zero for these data) and can be estimated, for example, by MCMC. The idea is that hierarchical Bayesian extreme values helpfully contract to zero with the number of coefficients in this setting, while non-hierarchical frequentist extreme values increase. The latter is more typically quantified by other “error” parameters such as the FWER (“multiple comparisons problem”) or MSE (“overfitting”). Thus, this offers a clear way to show that hierarchical inference can automatically control the (weak) FWER, without Bonferroni-style adjustments to the test threshold. Mathematically, I imagine some asymptotic argument (in the number of coefficients) for this behavior of the maxima, which I would need time or collaboration to formalize (I am not a mathematician by any means). In any case, the intuition is that because the posterior coefficients are all increasingly shrunk, so is their maximum. I have chosen to study the maxima because they are applicable across the very different hierarchical and frequentist models used in practice in the fields I work in (imaging, genomics): spatial, cross-sectional, temporal, neither, or both. For example, the posterior maximum is defined for a discretely indexed, exchangeable random process, or a continuously indexed, non-stationary process.
As a point of interest, the frequentist distribution of spatial maxima is used for standard multiple-comparisons-adjusted p-values in mainstream neuroimaging, e.g., SPM.
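A toy simulation may make the correspondent’s intuition concrete. This is an illustrative sketch using crude method-of-moments empirical-Bayes shrinkage, not the actual multilevel-model setup described above:

```python
# Under a global null, the largest raw estimate grows with the number of
# coefficients J, while shrinkage pulls the largest shrunken estimate
# toward zero. Crude empirical-Bayes version, for illustration only.
import random

random.seed(2)
se = 1.0  # standard error of each raw coefficient estimate

def max_abs(J):
    raw = [random.gauss(0, se) for _ in range(J)]  # true effects all zero
    # method-of-moments estimate of the between-coefficient variance
    tau2 = max(sum(x * x for x in raw) / J - se**2, 0.0)
    shrink = tau2 / (tau2 + se**2)  # shrinkage factor in [0, 1)
    return max(abs(x) for x in raw), shrink * max(abs(x) for x in raw)

for J in (10, 100, 1000):
    raw_max, shrunk_max = max_abs(J)
    print(J, round(raw_max, 2), round(shrunk_max, 2))
```

Because the estimated between-coefficient variance is near zero under the null, the shrinkage factor is small, so the maximum of the shrunken estimates stays near zero even as the maximum of the raw estimates grows with J.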

I am very keen to learn more about the possible pros or cons of the idea above.

– Its “novelty”

– How it fares relative to alternative Bayesian omnibus “tests”, e.g. based on comparison of posterior model probabilities for an omnibus null model – a degenerate spike prior – versus some credible alternative model.

– How generally it might be formalized.

– How to integrate type II error and bias into the framework.

… and any more!

My reply:

This idea is not really my sort of thing—I’d prefer a more direct decision analysis on the full posterior distribution. But given that many researchers are interested in hypothesis testing but still want to do something better than classical null hypothesis significance testing, I thought there might be interest in these ideas. So I’m sharing them with the blog readership. Comment away!

]]>Funny you should blog about Jaynes. Just a couple of days ago I was looking for something in his book’s References/Bibliography (it, along with “Godel, Escher, Bach” and “Darwin’s Dangerous Idea,” has a bibliography I find not just useful but entertaining), and I ran across something I wanted to send you, but I was going to wait until I could track down a copy of the actual referenced paper. But since Jaynes is the current topic here, here are the cited work and his comment, which I thought might amuse you, relating to our previous exchange. From “References”:

Boring, E.G. (1955), ‘The present status of parapsychology’, Am. Sci., 43, 108-16

Concludes that the curious phenomenon to be studied is the behavior of parapsychologists. Points out that, having observed any fact, attempts to prove that no natural explanation of it exists are logically impossible; one cannot prove a universal negative (quantum theorists who deny the existence of causal explanations please take note).

And just for the record, I’m more comfortable with quantum uncertainty, to the extent I understand it, than Jaynes was. And I don’t fully agree about not being able to prove a negative. The ancient Greeks proved long ago that there’s no largest prime number. I guess you just have to be careful about how you define the negative.

Amusing, and of course it relates to some of our recent discussions about unreplicable work in the social and behavioral sciences, including various large literatures which seem to be based on little more than the shuffling of noise, the ability of certain theories to explain any possible patterns in data, and the willingness of journals to publish any sort of junk as long as it combines an attractive storyline with “p less than 0.05.”

It’s only been 63 years, I guess no reason to expect much progress!

]]>Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probability (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms hold (thus, probability).

Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.
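For what it’s worth, the axioms themselves are easy to state concretely. Here is a minimal check on a made-up three-outcome model (the outcomes and numbers are of course arbitrary):

```python
# A finite probability space satisfies Kolmogorov's axioms:
# nonnegativity, unit total mass, and additivity over disjoint events.
p = {"rain": 0.3, "snow": 0.1, "clear": 0.6}  # hypothetical outcomes

def prob(event):
    """P(event) for an event given as a set of outcomes."""
    return sum(p[w] for w in event)

assert all(v >= 0 for v in p.values())                 # axiom 1: nonnegativity
assert abs(prob(set(p)) - 1.0) < 1e-12                 # axiom 2: P(sample space) = 1
a, b = {"rain"}, {"snow"}                              # disjoint events
assert abs(prob(a | b) - (prob(a) + prob(b))) < 1e-12  # axiom 3: additivity
print("Kolmogorov axioms hold for this model")
```

Any of the real-world models above (betting odds, long-run frequencies) counts as a probability model exactly insofar as its numbers behave like `p` does here.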

We discuss some of these different models in chapter 1 of BDA.

**P.S.** There’s been some discussion and I’d like to clarify my key point, why I wrote this post. My concern is that I’ve read lots of articles and books that claim to give the single correct foundation of probability, which might be uncertainty, betting, or relative frequency, or coherent decision making, or whatever. My point is that none of these frameworks is *the* foundation of probability; rather, probability is a mathematical concept which applies to various problems, including long-run frequencies, betting, uncertainty, decision making, statistical inference, etc. In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.

Loss aversion is a core principle of the heuristics and biases paradigm of psychology and behavioral economics.

But it’s been controversial for a long time.

For example, back in 2005 I wrote about the well-known incoherence that people express when offered small-scale bets. (“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)).”)
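The arithmetic behind that parenthetical claim is easy to check (my own back-of-the-envelope code, following the logic of the quote): indifference between x+$10 and the 55/45 gamble for every x means U(x+10) = 0.55 U(x+20) + 0.45 U(x), which forces each successive $10 increment of utility to shrink by a factor of 0.45/0.55 = 9/11.

```python
# Check the implied curvature: with each $10 utility increment shrinking
# by r = 9/11, U($2000) - U($1000) really is smaller than U($1000) - U($950).
r = 0.45 / 0.55  # ratio of successive $10 utility increments

def u_diff(a, b):
    """U(b) - U(a) for dollar amounts a <= b, in units of the first increment."""
    return sum(r ** k for k in range(a // 10, b // 10))

print(u_diff(1000, 2000) < u_diff(950, 1000))  # True: nonsensical curvature
```

By $1000 the increments have decayed so far that a whole extra $1000 is worth less than the $50 just below it, which is the absurdity the quote points to.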

When Matthew Rabin and I had (separately) published papers about this in 1998 and 2000, we’d attributed the incoherent risk-averse attitude at small scales to “loss aversion” and “uncertainty aversion.” But, as pointed out by psychologist Deb Frisch, it can’t be loss aversion, as the way the problem is set up above, no losses are involved. I followed up that “uncertainty aversion” could be logically possible but I didn’t find that labeling so convincing either; instead:

I’m inclined to attribute small-stakes risk aversion to some sort of rule-following. For example, it makes sense to be risk averse for large stakes, and a natural generalization is to continue that risk aversion for payoffs in the $10, $20, $30 range. Basically, a “heuristic” or a simple rule giving us the ability to answer this sort of preference question.

By the way, I’ve used the term “attitude” above, rather than “preference.” I think “preference” is too much of a loaded word. For example, suppose I ask someone, “Do you prefer $20 or [55% chance of $30, 45% chance of $10]?” If he or she says, “I prefer the $20,” I don’t actually consider this any sort of underlying preference. It’s a response to a question. Even if it’s set up as a real choice, where they really get to pick, it’s just a preference in a particular setting. But for most of these studies, we’re really talking about attitudes.

The topic came up again the next year, in the context of the (also) well-known phenomenon that, when it comes to political attitudes about the government, people seem to respond to the trend rather than the absolute level of the economy. Again, I felt that terms such as “risk aversion” and “loss aversion” were being employed as all-purpose explanations for phenomena that didn’t really fit these stories.

And then, in the midst of all that, David Gal published an article, “A psychological law of inertia and the illusion of loss aversion,” in the inaugural issue of the Journal of Judgment and Decision Making, saying:

The principle of loss aversion is thought to explain a wide range of anomalous phenomena involving tradeoffs between losses and gains. In this article, I [Gal] show that the anomalies loss aversion was introduced to explain — the risky bet premium, the endowment effect, and the status-quo bias — are characterized not only by a loss/gain tradeoff, but by a tradeoff between the status-quo and change; and, that a propensity towards the status-quo in the latter tradeoff is sufficient to explain these phenomena. Moreover, I show that two basic psychological principles — (1) that motives drive behavior; and (2) that preferences tend to be fuzzy and ill-defined — imply the existence of a robust and fundamental propensity of this sort. Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.

I’d completely forgotten about this article until learning recently of a new review article by Gal and Derek Rucker, “The Loss of Loss Aversion: Will It Loom Larger Than Its Gain?”, making this point more thoroughly:

Loss aversion, the principle that losses loom larger than gains, is among the most widely accepted ideas in the social sciences. . . . The upshot of this review is that current evidence does not support that losses, on balance, tend to be any more impactful than gains.

But if loss aversion is unnecessary, why do psychologists and economists keep talking about it? Gal and Rucker write:

The third part of this article aims to address the question of why acceptance of loss aversion as a general principle remains pervasive and persistent among social scientists, including consumer psychologists, despite evidence to the contrary. This analysis aims to connect the persistence of a belief in loss aversion to more general ideas about belief acceptance and persistence in science.

In Table 1 of their paper, Gal and Rucker consider several phenomena, all of which are taken to provide evidence of loss aversion but can easily be explained in other ways. Here are the phenomena they talk about:

– Status quo bias

– Endowment effect

– Risky bet premium

– Hedonic impact ratings

– Sunk cost effect

– Price elasticity

– Equity risk premium

– Disposition effect

– Loss/gain framing.

The article also comes with discussions by Tory Higgins and Nira Liberman and by Itamar Simonson and Ran Kivetz, and a rejoinder by Gal and Rucker.

]]>