The post Oh, it’s so frustrating when you’re trying to help someone out, and then you realize you’re dealing with a snake. appeared first on Statistical Modeling, Causal Inference, and Social Science.

And then you find out you’re dealing with a scammer. Maybe it’s a couple of ambitious researchers who don’t really want your help at all, they’re just yanking your chain. Or maybe, if you’re journalist Charles Seife, it’s the president of a rich foundation, a guy who, for reasons unknown to us, seems set on investing his foundation’s assets in a company founded by (as Seife puts it) someone with “a very standard profile for a pseudoscientific huckster and a very rare profile for a true scientific innovator.” I came across this one via a link from Retraction Watch.

Seife is pretty annoyed that the president of that foundation was pumping him for advice, not because he cared about what Seife would say, but as part of a “continuing effort to obtain a positive outside review to use with our board and potential investors.” As the foundation president wrote to the potential recipient of his money, “I’ll keep trying. Negative responses I will just share with you.”

That’s selection bias for you!

But my guess is that what really steamed Seife—at least, it’s what would annoy me the most—is that this foundation president was acting like he wanted Seife’s advice when all he wanted was an endorsement. If the guy had just been open about his desire for a positive outside review, then Seife could’ve responded appropriately, yes or no depending on his take on the whole thing. But nobody likes to be set up to be a sap.

**P.S.** The whole biased-advice thing is another story. I’ve done legal consulting where my opinion is being asked for, and it’s pretty clear to me that if I give a positive opinion they will hire me and if I give a negative opinion they won’t (although presumably they’ll pay me for the time it took for me to form my negative judgment). Beyond the moral hazard here, there’s also the selection issue that they can hire 10 experts and then go with the opinion they like the most. But somehow that doesn’t bother me so much. Maybe because I get paid, maybe because the people involved seem more open about what they are doing.
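Here’s a quick simulation of that selection issue (all numbers invented for illustration): each of ten experts gives an unbiased but noisy opinion, and the client reports only the most favorable one.

```python
import random

random.seed(2023)

# Suppose the "true" merit of the proposal is 0, and each expert's
# assessment is an unbiased but noisy read: merit + N(0, 1) error.
def expert_opinion(true_merit=0.0):
    return true_merit + random.gauss(0.0, 1.0)

n_sims = 10_000
single, best_of_ten = [], []
for _ in range(n_sims):
    opinions = [expert_opinion() for _ in range(10)]
    single.append(opinions[0])          # ask one expert, report what you get
    best_of_ten.append(max(opinions))   # ask ten, report only the most favorable

avg_single = sum(single) / n_sims
avg_best = sum(best_of_ten) / n_sims
print(f"average reported opinion, one expert:  {avg_single:+.2f}")
print(f"average reported opinion, best of ten: {avg_best:+.2f}")
```

Even with every individual expert unbiased, the “best of ten” opinion runs about a standard deviation and a half too favorable.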


The post Documenting a class-participation activity appeared first on Statistical Modeling, Causal Inference, and Social Science.

Also lots of details on the results. The point here is not exactly what happened (but, yes, the demo did work) but rather the idea that you can use photos and graphs to document what worked in class. We should be able to do this sort of scrapbooking all the time as teachers.

Next time: take some photos of the kids in class doing the activities, too (assuming that’s ok with them).


The post Economics/sociology phrase book appeared first on Statistical Modeling, Causal Inference, and Social Science.

Along these lines, I’ve always been bothered by economists’ phrase “willingness to pay” which, in practice, often means “ability to pay.”

And, of course, “earnings” which means “how much money you make.”

But, to be fair, statisticians have some of these too. For example, in psychometrics we use the term “ability” to refer to the very specific ability to get certain questions correct on a test.


The post Cognitive vs. behavioral in psychology, economics, and political science appeared first on Statistical Modeling, Causal Inference, and Social Science.

The starting point is “behavioral economics,” also known as the “heuristics and biases” subfield of cognitive psychology. It’s associated with various studies of cognitive illusions, settings where people systematically mispredict uncertain events or make inconsistent decisions. Within psychology, this work is generally accepted, though with some controversy that could be summed up in the phrase “Kahneman versus Gigerenzer.” But it’s my impression that in recent years there’s been a bit of a convergence: for Kahneman the glass is half-empty and for Gigerenzer the glass is half-full, but whether you’re talking about “heuristics and biases” or “fast and frugal decision making,” there’s been a focus on understanding how our brains use contextual cues to decide how to solve a problem.

In economics, this work is more disputed because it seems to be in head-on conflict with models of utility-maximizing rationality from the 1930s-50s associated with the theories of von Neumann and others on economic decision making. While some economists have embraced so-called “behavioral” ideas to explain imperfect markets, other economists are (a) skeptical about the relevance to real-world high-stakes behavior of laboratory findings on cognitive illusions and (b) wary of the political implications of social engineers who want to use cognitive biases to “nudge” people toward behavior they otherwise wouldn’t have chosen.

Within economics, I’d say that the behavioral/classical debate roughly follows left/right lines: on the left are the behaviorists who say that individuals and firms are irrational and thus we should not trust the judgment of the markets, instead we should regulate and protect people from their irrationality. On the right are the classicists who hold that people are rational when it comes to real economic decisions and thus any interference in the market, whether from governments or labor unions, will tend to make things worse.

The conservative position has some difficulties when dealing with customs and culture and roles and various non-governmental constraints on economic behavior: from one sort of conservative perspective these are unnecessary restrictions on the economy, silly traditions that rule-breaking entrepreneurs will shatter; from another conservative perspective, these traditions represent collective wisdom and we should be wary of reformers who try to start anew without recognizing that traditions are traditions for a good reason. But for now I will set all this aside and focus on the question of behavioral economics.

Step aside from economics for a moment, though, and things look a little different. Instead of thinking of “heuristics and biases” or “behavioral economics” in opposition to simplistic models of rationality (I’ve said it before and I’ll say it again: I see no reason why a long-discredited psychology model from the 1930s and 1940s should be taken as any sort of starting point for understanding human decision making; utility theory is, at best, one framework for such modeling), we can put this work in a more general context of disparagement of human decision making.

To put it another way, think about “behavioral economics” not so much as “economics” but as “behavioral.” From a psychology point of view, behaviorism is a nearly century-old theory that was in many ways superseded by cognitive psychology. And, in many ways, “behavioral economics” is a sort of counter-revolution: it’s full of tropes in which people do things for irrational reasons, in which actions speak louder than words, etc.

The full story here is complicated, but one reason I think these ideas are popular in neoclassical economics is that they are, in some sense, anti-democratic. If people’s votes are determined by the time of the menstrual cycle or by the outcomes of college football games, then elections are pretty silly, no? Which is an implicit argument in favor of lower taxes and more power for business, as compared to government (or, for that matter, unions).

This becomes particularly clear when we look at work along these lines in political science. If, for example, subliminal smiley faces have big effects on political attitudes, then this should cause us to think twice about how seriously to take such attitudes, no? Or if men’s views on economic redistribution are in large part determined by physical strength, or if women’s vote preferences are in large part determined by what time of the month it is, or if both sexes’ choice to associate with co-partisans is in large part determined by how they smell, then this calls into question a traditional civics-class view of the will of the people.

Luckily (or, perhaps, depending on your view, unluckily), the evidence for the empirical claims in the above paragraphs ranges from weak to nonexistent.

But my point is that there is a wave of research, coming from different directions, but all basically saying that our political attitudes are shallow and easily manipulated and thus, implicitly, not to be trusted. I don’t find this evidence convincing and, beyond this, I’m troubled by the eagerness some people seem to show to grab on to such claims, with their ultimately anti-democratic implications.

Let’s be clear here, though: I do have a dog in this fight, as the saying goes. In 1993, Gary King and I published an influential paper claiming that wide swings in the polls, swings that had often been taken as evidence of the capriciousness of voters or of their easily-manipulated nature, could be reinterpreted as evidence in favor of voters moving to their “enlightened preferences.” And then, more recently, David Rothschild, Sharad Goel, Doug Rivers and I updated this argument by providing evidence that some poll swings can be mostly explained by differential nonresponse without any large attitude changes. I’ve published work (with Aaron Edlin and Noah Kaplan) arguing why voting can be rational. And I’ve worked with Jeff Lax and Justin Phillips on their series of papers on the responsiveness of state legislators to state-level opinion. In my research I’ve been strongly committed, in many different ways, to the model in which voter preferences and attitudes should be taken seriously. So it would be fair enough to read my resistance to voters-are-influenced-by-irrelevant-stimuli arguments in that context. I’m providing you with my perspective, but I recognize that other perspectives are out there.

What I’m getting at is that I see a common thread in a lot of the counterintuitive, tabloid, Psychological-Science-type work out there, and that thread is a dismissal of human rationality and even human agency in the political (and, to some extent, the economic) arena. Here I’m speaking of “rationality” not in the limited sense of utility maximization but in the more general sense of thoughtful, purposeful decision making.

In the “Psychological Science” world, voters’ attitudes are determined by upper-body strength and the time of the month, their attitudes on important issues are influenced by meaningless subliminal stimuli, their elections turn on the outcomes of late-October football games, and they flub any decisions involving uncertainty. Throw the words “Florida” and “bingo” at them and they walk slower, without even realizing why; they’re influenced by stereotype threat without realizing it; even their choice of clothing is not under their conscious control. Put it all together and you get a pre-cognitive conception of the citizen: not a man or woman who weighs the evidence, forms political views, and makes economic and political decisions, but a creature who is continually pushed to and fro by influences of which he or she is not even aware, an unstable product of hormones and the manipulations of political and social marketers, a sort of particle in the water being jostled by invisible Brownian forces.

Let me repeat that the evidence for many of these claims is weak; indeed, I have the feeling that a lot of people want to believe in these things, so they grab on to whatever “p less than .05” comparisons they find and take them as representative of the general population, as scientific truth. On the other hand, perhaps I am coming from the opposite direction.

What I’m getting at is that there’s a political theme here, and also a scientific theme: I see a lot (although not all!) of this “behavioral” work as being *behaviorist* in the sense of being faithful to a pre-cognitive and pre-modern conception of psychology.

The cognitive-psychology perspective, as I see it, is that we *are* thinking beings, and to the extent that we are influenced in irrational ways (whether by hormones, or subliminal marketing, or whatever), we mediate these influences through our thought processes. One reason I found the work of Cengiz Erisen so interesting (even while I disagreed with Larry Bartels’s more dramatic claims for the importance of that work) is that Erisen was not just treating his subliminal stimulus as a black box but rather was investigating how our conscious reasoning process might mediate the effects of a non-rational stimulus. In that particular case, the stimulus had no consistent effect on attitudes but I like the general approach of the study.


The post Six quick tips to improve your regression modeling appeared first on Statistical Modeling, Causal Inference, and Social Science.

**A.1. Fit many models**

Think of a series of models, starting with the too-simple and continuing through to the hopelessly messy. Generally it’s a good idea to start simple. Or start complex if you’d like, but prepare to quickly drop things out and move to the simpler model to help understand what’s going on. Working with simple models is not a research goal—in the problems we work on, we usually find complicated models more believable—but rather a technique to help understand the fitting process.

A corollary of this principle is the need to be able to fit models relatively quickly. Realistically, you don’t know what model you want to be fitting, so it’s rarely a good idea to run the computer overnight fitting a single model. At least, wait until you’ve developed some understanding by fitting many models.

**A.2. Do a little work to make your computations faster and more reliable**

This sounds like computational advice but is really about statistics: if you can fit models faster, you can fit more models and better understand both data and model. But getting the model to run faster often has some startup cost, either in data preparation or in model complexity.

*Data subsetting* . . .

*Fake-data and predictive simulation* . . .
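To give a sense of what fake-data simulation looks like in practice, here’s a minimal sketch (all parameter values invented): simulate data from a known regression, fit it, and check that the fitting procedure approximately recovers the assumed truth.

```python
import random

random.seed(1)

# Hypothetical example: simulate data from a known regression,
# y = a + b*x + error, then fit and check that we recover (a, b).
a_true, b_true, sigma = 1.0, 2.0, 0.5
n = 500
x = [random.uniform(0, 10) for _ in range(n)]
y = [a_true + b_true * xi + random.gauss(0, sigma) for xi in x]

# Closed-form least squares for a single predictor
xbar = sum(x) / n
ybar = sum(y) / n
b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
a_hat = ybar - b_hat * xbar

print(f"true (a, b) = ({a_true}, {b_true}); estimated = ({a_hat:.2f}, {b_hat:.2f})")
```

If the estimates land far from the assumed parameters, something is wrong with the fitting procedure (or the simulation), and you’ve learned that before touching real data.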

**A.3. Graphing the relevant and not the irrelevant**

*Graphing the fitted model*

Graphing the data is fine (see Appendix B) but it is also useful to graph the estimated model itself (see lots of examples of regression lines and curves throughout this book). A table of regression coefficients does not give you the same sense as graphs of the model. This point should seem obvious but can be obscured in statistical textbooks that focus so strongly on plots for raw data and for regression diagnostics, forgetting the simple plots that help us understand a model.
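As a toy illustration of graphing the model rather than the coefficient table (the fitted coefficients here are made up): a quadratic fit is just three numbers in a table, but its predictions over a grid show the shape at a glance.

```python
# Hypothetical fitted model: y-hat = 1.0 + 2.0*x - 0.15*x^2
# (coefficients assumed for illustration, not from any real fit).
coefs = (1.0, 2.0, -0.15)

def predict(x, b=coefs):
    return b[0] + b[1] * x + b[2] * x ** 2

grid = [i / 2 for i in range(21)]           # x from 0 to 10
curve = [(x, predict(x)) for x in grid]

# These (x, y-hat) pairs are what you'd hand to any plotting tool;
# even a crude text plot makes the turning point obvious in a way
# that the coefficient table does not.
for x, yhat in curve:
    print(f"{x:4.1f} | {'*' * int(round(yhat * 3))}")
```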

*Don’t graph the irrelevant*

Are you sure you really want to make those quantile-quantile plots, influence diagrams, and all the other things that spew out of a statistical regression package? What are you going to do with all that? Just forget about it and focus on something more important. A quick rule: any graph you show, be prepared to explain.

**A.4. Transformations**

Consider transforming every variable in sight:

• Logarithms of all-positive variables (primarily because this leads to multiplicative models on the original scale, which often makes sense)

• Standardizing based on the scale or potential range of the data (so that coefficients can be more directly interpreted and scaled); an alternative is to present coefficients in scaled and unscaled forms

• Transforming before multilevel modeling (thus attempting to make coefficients more comparable, thus allowing more effective second-level regressions, which in turn improve partial pooling).

Plots of raw data and residuals can also be informative when considering transformations (as with the log transformation for arsenic levels in Section 5.6).

In addition to univariate transformations, consider interactions and predictors created by combining inputs (for example, adding several related survey responses to create a “total score”). The goal is to create models that could make sense (and can then be fit and compared to data) and that include all relevant information.
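Here’s a small sketch of the first two transformations above (hypothetical data): a multiplicative relationship becomes linear on the log scale, and standardizing a predictor simply rescales its coefficient into a per-standard-deviation effect.

```python
import math
import random

random.seed(7)

# Hypothetical data with a multiplicative relationship: y = 3 * x^0.5 * noise.
n = 1000
x = [random.uniform(1, 100) for _ in range(n)]
y = [3.0 * xi ** 0.5 * math.exp(random.gauss(0, 0.2)) for xi in x]

# On the log scale the model is linear: log y = log 3 + 0.5 * log x + error.
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]

def ols(u, v):
    """Closed-form least squares for a single predictor."""
    m = len(u)
    ub, vb = sum(u) / m, sum(v) / m
    slope = sum((a - ub) * (b - vb) for a, b in zip(u, v)) / \
            sum((a - ub) ** 2 for a in u)
    return vb - slope * ub, slope

intercept, slope = ols(lx, ly)
print(f"log-scale fit: intercept {intercept:.2f} (log 3 = {math.log(3):.2f}), "
      f"slope {slope:.2f} (true exponent 0.5)")

# Standardizing the predictor rescales its coefficient to "effect per sd":
mean_lx = sum(lx) / n
sd_lx = (sum((a - mean_lx) ** 2 for a in lx) / n) ** 0.5
z = [(a - mean_lx) / sd_lx for a in lx]
_, slope_z = ols(z, ly)
print(f"coefficient per sd of log x: {slope_z:.2f} (= slope * sd = {slope * sd_lx:.2f})")
```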

**A.5. Consider all coefficients as potentially varying**

Don’t get hung up on whether a coefficient “should” vary by group. Just allow it to vary in the model, and then, if the estimated scale of variation is small (as with the varying slopes for the radon model in Section 13.1), maybe you can ignore it if that would be more convenient.

Practical concerns sometimes limit the feasible complexity of a model—for example, we might fit a varying-intercept model first, then allow slopes to vary, then add group-level predictors, and so forth. Generally, however, it is only the difficulties of fitting and, especially, understanding the models that keeps us from adding even more complexity, more varying coefficients, and more interactions.
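A stripped-down sketch of the varying-intercepts idea (group sizes and variance parameters invented here, and sigma and tau treated as known, whereas a real multilevel fit would estimate them): the partially pooled estimate for each group is a precision-weighted compromise between that group’s raw mean and the overall mean, so small groups get shrunk a lot and large groups barely at all.

```python
import random

random.seed(42)

# Hypothetical setup: 8 groups with true means drawn from N(mu, tau^2),
# observations y ~ N(theta_j, sigma^2), and very unequal group sizes.
mu, tau, sigma = 0.0, 1.0, 2.0
sizes = [2, 3, 5, 10, 20, 50, 100, 200]
truths = [random.gauss(mu, tau) for _ in sizes]
data = [[random.gauss(t, sigma) for _ in range(n)] for t, n in zip(truths, sizes)]

raw, pooled, weights = [], [], []
for n, ys in zip(sizes, data):
    ybar = sum(ys) / n
    w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)  # weight on the group's own data
    raw.append(ybar)
    pooled.append(w * ybar + (1 - w) * mu)
    weights.append(w)

for n, r, p, t in zip(sizes, raw, pooled, truths):
    print(f"n={n:3d}  raw mean {r:+.2f}  partially pooled {p:+.2f}  truth {t:+.2f}")
```

If the estimated between-group variation tau turns out to be tiny, the weights all collapse toward complete pooling, which is the formal version of “maybe you can ignore it.”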

**A.6. Estimate causal inferences in a targeted way, not as a byproduct of a large regression**

Don’t assume that a regression coefficient can be interpreted causally. If you are interested in causal inference, consider your treatment variable carefully and use the tools of Chapters 9, 10, and 23 to address the difficulties of comparing comparable units to estimate a treatment effect and its variation across the population. It can be tempting to set up a single large regression to answer several causal questions at once; however, in observational settings (including experiments in which certain conditions of interest are observational), this is not appropriate, as we discuss at the end of Chapter 9.


The post First day of class update appeared first on Statistical Modeling, Causal Inference, and Social Science.

My first-day-of-class activity was ok but I think I needed another activity for the students, something more statistical, to better set the tone of the course.

I think I should’ve given them a 10-minute work-in-pairs activity where I’d first give them some real-world problem and then ask them, in pairs, to design a study to address it. The problem could be anything: it could be to assess Ebola risks or answer questions about political ideology and personality, or even to design a plan for assessing the effectiveness of this course that they’re taking. Just something that would get them communicating, but also thinking about statistics in some detail. Not just talking generally about the cool problems they’re working on or are interested in, but some attempt to get into details.

We could do this next class, of course, but I already have things planned. So maybe this will have to wait until the next time I teach the course.


The post Just in case appeared first on Statistical Modeling, Causal Inference, and Social Science.

Also:

My Amtrak train was rescheduled and is now due to arrive in Boston at 4:35. This should give me plenty of time to get to class on time, but Amtrak is sometimes delayed. So if class begins and I am not there yet, please start without me!

If I’m not there, please do the following:

- Get to the room 10 minutes early. Before class begins, chat with the students as they are coming in. You can talk about any topic, as long as it’s statistical: tell them about your qualifying exam, or discuss how to express uncertainty in weather forecasts, or talk about the Celtics (ha ha). No need to be lecturing here, just get them on track, thinking and talking about statistics. Also during that time, please get the projector set up so that, when I do arrive, I can plug in my laptop and be all ready to go.

- Once class begins (I don’t remember the convention at Harvard; will it start exactly at the scheduled time, or 5 minutes later?), start right away with a statistics story. I have stories of my own prepared, but if I’m not there, you can do one yourself. Prepare something; feel free to use the blackboard. It doesn’t have to be a long story; 5 or 10 minutes will be fine.

- Then write the following on the blackboard: “(a) Say something about yourself or your work in relation to statistics, (b) Why are you in this class?”

- Have the students divide into pairs. In pairs, they meet each other:

(3 min) A talks to B

(2 min) B asks a question to A, and A responds

(3 min) B talks to A

(2 min) A asks a question to B, and B responds

They are supposed to be talking to each other about their work in relation to statistics.

- If not all the students fit in the room, that’s not really a problem; you can have the overflow people in the lounge area, doing the same thing.

Once the students have done the intros in pairs, take a few volunteers (or, if there are no volunteers, pick some students and ask them to pick other students) to stand up and answer questions (a) and (b) above. Use these to lead the class into discussions that loop around to consider the relevance and different varieties of statistical communication.

Really, this can take all the class period. But I assume that at some point I’ll arrive—how delayed could Amtrak be, after all?? I just wanted to give you some contingency plan so that nobody has to worry if it’s 6:25 and I’m still not there.

Thanks!

See you

Andrew

**P.S.** Here’s what happened.


The post About a zillion people pointed me to yesterday’s xkcd cartoon appeared first on Statistical Modeling, Causal Inference, and Social Science.

and this:

(which I copied from Wikipedia, except that, unlike you-know-who, I didn’t change the n’s to d’s and remove the superscripting).

Either way, I don’t buy the numbers, and I certainly don’t buy the words that go with them.

I do admit, though, to *using* the phrase “statistically significant.” It doesn’t mean so much, but, within statistics, everyone knows what it means, so it’s convenient jargon.

**P.S.** Kruschke had a similar reaction.


The post Crowdsourcing data analysis: Do soccer referees give more red cards to dark skin toned players? appeared first on Statistical Modeling, Causal Inference, and Social Science.

Twenty-nine teams involving 61 analysts used the same data set to address the same research questions: whether soccer referees are more likely to give red cards to dark skin toned players than light skin toned players and whether this relation is moderated by measures of explicit and implicit bias in the referees’ country of origin. Analytic approaches varied widely across teams. For the main research question, estimated effect sizes ranged from 0.89 to 2.93 in odds ratio units, with a median of 1.31. Twenty teams (69%) found a significant positive effect and nine teams (31%) observed a nonsignificant relationship. The causal relationship, however, remains unclear. No team found a significant moderation between measures of bias of referees’ country of origin and red card sanctionings of dark skin toned players. Crowdsourcing data analysis highlights the contingency of results on choices of analytic strategy, and increases identification of bias and error in data and analysis. Crowdsourcing analytics represents a new way of doing science; a data set is made publicly available and scientists at first analyze separately and then work together to reach a conclusion while making subjectivity and ambiguity transparent.


The post “It is perhaps merely an accident of history that skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Younger readers of this journal may not be fully aware of the passionate battles over Bayesian inference among statisticians in the last half of the twentieth century. During this period, the missionary zeal of many Bayesians was matched, in the other direction, by a view among some theoreticians that Bayesian methods are absurd—not merely misguided but obviously wrong in principle. Such anti-Bayesianism could hardly be maintained in the present era, given the many recent practical successes of Bayesian methods. But by examining the historical background of these beliefs, we may gain some insight into the statistical debates of today. . . .

The whole article is just great. I love reading my old stuff!

Also we were lucky to get several thoughtful discussions:

“Bayesian Inference: The Rodney Dangerfield of Statistics?” — Steve Stigler

“Bayesian Ideas Reemerged in the 1950s” — Steve Fienberg

“Bayesian Statistics in the Twenty First Century” — Wes Johnson

“Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux” — Deborah Mayo

And our rejoinder, “The Anti-Bayesian Moment and Its Passing.”

Good stuff.


The post “The Statistical Crisis in Science”: My talk this Thurs at the Harvard psychology department appeared first on Statistical Modeling, Causal Inference, and Social Science.

The Statistical Crisis in Science

Andrew Gelman, Dept of Statistics and Dept of Political Science, Columbia University

Top journals in psychology routinely publish ridiculous, scientifically implausible claims, justified based on “p < 0.05.” And this in turn calls into question all sorts of more plausible, but not necessarily true, claims that are supported by this same sort of evidence. To put it another way: we can all laugh at studies of ESP, or ovulation and voting, but what about MRI studies of political attitudes, or embodied cognition, or stereotype threat, or, for that matter, the latest potential cancer cure? If we can’t trust p-values, does experimental science involving human variation just have to start over? And what do we do in fields such as political science and economics, where preregistered replication can be difficult or impossible? Can Bayesian inference supply a solution? Maybe. These are not easy problems, but they’re important problems.

Here are the slides from the last time I gave this talk, and here are some relevant articles:

[2014] Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. *Perspectives on Psychological Science* **9**, 641–651. (Andrew Gelman and John Carlin)

[2013] It’s too hard to publish criticisms and obtain data for replication. *Chance* **26** (3), 49–52. (Andrew Gelman)

[2012] P-values and statistical practice. *Epidemiology*. (Andrew Gelman)


The post The (hypothetical) phase diagram of a statistical or computational method appeared first on Statistical Modeling, Causal Inference, and Social Science.

So here’s the deal. You have a new idea, call it method C, and you try it out on problems X, Y, and Z and it works well—it destroys the existing methods A and B. And then you publish a paper with the pithy title, Method C Wins. And, hey, since we’re fantasizing here anyway, let’s say you want to publish the paper in PPNAS.

But reviewers will—and should—have some suspicions. How great can your new method really be? Can it really be that methods A and B, which are so popular, have nothing to offer anymore?

Instead give a sense of the bounds of your method. Under what conditions does it win, and under what conditions does it not work so well?

In the graph above, “Dimension 1” and “Dimension 2” can be anything: they could be sample size and number of parameters, or computing time and storage cost, or bias and sampling error, whatever. The point is that a method can be applied under varying conditions. And, if a method is great, what that really means is that it works well under a wide range of conditions.

So, make that phase diagram. Even if you don’t actually draw the graph or even explicitly construct a definition of “best,” you can keep in mind the idea of exploring the limitations of your method, coming up with places where it doesn’t perform so well.
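For a concrete toy version of such a phase diagram (everything here is invented for illustration): compare two estimators of a distribution’s center, the sample mean and the sample median, over a grid of sample sizes and contamination fractions, and record which wins each cell. Neither method dominates the whole diagram.

```python
import random
import statistics

random.seed(0)

# Toy "phase diagram": mean vs median as estimators of a center of 0,
# across two dimensions of the problem -- sample size and the fraction
# of heavy-tailed contamination in the data.
def simulate_mse(estimator, n, eps, reps=1000):
    total = 0.0
    for _ in range(reps):
        ys = [random.gauss(0, 10) if random.random() < eps else random.gauss(0, 1)
              for _ in range(n)]
        total += estimator(ys) ** 2
    return total / reps

print("rows: sample size; cols: contamination fraction; M = mean wins, D = median wins")
for n in (10, 50, 200):
    row = []
    for eps in (0.0, 0.1, 0.3):
        mean_mse = simulate_mse(statistics.fmean, n, eps)
        med_mse = simulate_mse(statistics.median, n, eps)
        row.append("M" if mean_mse < med_mse else "D")
    print(f"n={n:4d}  " + "  ".join(row))
```

With clean data the mean wins on efficiency; with enough contamination the median wins on robustness. The grid of letters is the phase diagram, and the interesting part is the boundary.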


The post On deck this week appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Tues:** “It is perhaps merely an accident of history that skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood”

**Wed:** Six quick tips to improve your regression modeling

**Thurs:** “Another bad chart for you to criticize”

**Fri:** Cognitive vs. behavioral in psychology, economics, and political science

**Sat:** Economics/sociology phrase book

**Sun:** Oh, it’s so frustrating when you’re trying to help someone out, and then you realize you’re dealing with a snake


The post Tell me what you don’t know appeared first on Statistical Modeling, Causal Inference, and Social Science.

Why? Consider your understanding of a particular topic to be divided into three parts:

1. What you know.

2. What you don’t know.

3. What you don’t know you don’t know.

If you ask someone about 1, you get some sense of the boundary between 1 and 2.

But if you ask someone about 2, you implicitly get a lot of 1, you get a sense of the boundary between 1 and 2, and you get a sense of the boundary between 2 and 3.

As my very rational friend Ginger says: More information is good.


The post Postdoc opportunity here, with us (Jennifer Hill, Marc Scott, and me)! On quantitative education research!! appeared first on Statistical Modeling, Causal Inference, and Social Science.

As many of you are aware, education research is both important and challenging. And, on the technical level, we’re working on problems in Bayesian inference, multilevel modeling, survey research, and causal inference.

There are various ways that you can contribute as a postdoc: You can have a PhD in psychometrics or education research, and this is your chance to go in depth with statistical inference and computation, or maybe you can do all sorts of Bayesian computation and you’d like to move into education research. We’re looking for top people to join our team.

If you’re interested, send me an email with a letter describing your qualifications and reason for applying, a C.V., and at least one article you’ve written, and have three letters of recommendation sent to me. All three of us (Jennifer, Marc, and I) will evaluate the applications.

We have openings for two 2-year postdocs. As per federal government regulations, candidates must be United States citizens or permanent residents.

**“What then should we teach about hypothesis testing?”**

Last week, I was looking forward to a blog post titled “Why continue to teach and use hypothesis testing?” I presume that scheduled post was merely preempted by more timely posts. But I am still interested in reading the exchange that will follow.

My feeling is that we might have strong reservations about the utility of NHST [null hypothesis significance testing], but realize that it isn’t going away anytime soon. So it is important for students to understand what information other folks are trying to convey when they report their p-values, even if we would like to encourage them to use other frameworks (e.g., a fully Bayesian decision-theoretic approach) in their own decision making.

So I guess the next question is, what then should we teach about hypothesis testing? What proportion of the time in a one semester upper level course in Mathematical Statistics should be spent on the theory and how much should be spent on the nuance and warnings about misapplication of the theory? These are questions I’d be interested to hear opinions about from you and your thoughtful readership.

A related question I have is on the “garden of forking paths” or “researcher degrees of freedom”. In applied research, do you think that “tainted” p-values are the norm, and that editors, referees, and readers basically assume some level of impurity of reported p-values?

I wonder because, if applied statistics textbooks are any guide, the first recommendation in a data analysis is often: plot your data. And I suspect that many folks do this *before* settling on the model they are going to fit. E.g., if they see nonlinearity, they will then consider a transformation that they wouldn’t have considered before. So whether they make the transformation or not, they might have, thus affecting the interpretability of p-values and whatnot. Perhaps I am being an extremist. Pre-registration, replication studies, or simply splitting a data set into training and testing sets may solve this problem, of course.
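To make the forking-paths concern concrete, here is a quick simulation (entirely made up, not from any real study): under a true null, an analyst who peeks at the data and then reports the best-looking of three analyses (raw scale, log scale, or "outliers" dropped) will exceed the nominal 5% false-positive rate.

```python
import math
import numpy as np

def pearson_p(x, y):
    """Two-sided p-value for zero correlation, via the Fisher z approximation."""
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))

rng = np.random.default_rng(0)
n_sims, n, hits = 2000, 50, 0
for _ in range(n_sims):
    x = rng.normal(size=n)
    y = np.exp(rng.normal(size=n))      # true null: y is unrelated to x
    p_raw = pearson_p(x, y)             # path 1: raw scale
    p_log = pearson_p(x, np.log(y))     # path 2: log scale
    keep = y < np.quantile(y, 0.9)      # path 3: drop "outliers" first
    p_out = pearson_p(x[keep], y[keep])
    hits += min(p_raw, p_log, p_out) < 0.05
rate = hits / n_sims
print(f"false-positive rate when reporting the best of three paths: {rate:.3f}")
```

Three paths is mild compared to what a real analysis allows, so the inflation in practice can be considerably worse.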

So to tie these two questions together, shouldn’t our textbooks do a better job in this regard, perhaps by making clear a distinction between two types of statistical analysis: a data analysis, which is intended to elicit the questions and perhaps build a model, and a confirmatory analysis, which is the “pure” estimation and prediction from a pre-registered model, from which a p-value might retain some of its true meaning?

My reply: I’ve been thinking about this a lot recently because Eric Loken, Ben Goodrich, and I have been designing an introductory statistics course, and we have to address these issues. One way I’ve been thinking about it is that statistical significance is more of a negative than a positive property:

Traditionally we say: If we find statistical significance, we’ve learned something, but if a comparison is not statistically significant, we can’t say much. (We can “reject” but not “accept” a hypothesis.)

But I’d like to flip it around and say: If we see something statistically significant (in a non-preregistered study), we can’t say much, because garden of forking paths. But if a comparison is not statistically significant, we’ve learned that the noise is too large to distinguish any signal, and that can be important.

**What’s the point of the margin of error?**

The 5 minutes of conversation we did have was pretty good, though. I was impressed. The webinar was billed as a “debate” which didn’t make me happy—I wasn’t looking forward to hearing a bunch of pious nonsense about probability sampling and statistical theory—but the actual discussion was very reasonable.

The first thing that came up was, Are everyday practitioners in market research concerned about the margins of error for non-probability samples? The consensus among the market researchers on the panel was: No, users pretty much just take samples and margins of error as they are, without worrying about where the sample came from or how it was collected.

I pointed out that if you’re concerned about non-probability samples and if you don’t trust the margin of error for non-probability samples, then you shouldn’t trust the margin of error for *any* real sample from a human population, given the well-known problems of nonavailability and nonresponse. *When the nonresponse rate is 91%, any sample is a convenience sample.*
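Here’s a toy simulation of that point (the response rates are invented, chosen to give roughly 91% nonresponse): when responding is correlated with the outcome, the estimate can land far outside its reported margin of error.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
support = rng.random(N) < 0.50                     # true population support: 50%
# Invented response model: supporters respond at 12%, others at 6%,
# giving a ~9% response rate overall, i.e. ~91% nonresponse
p_respond = np.where(support, 0.12, 0.06)
responded = rng.random(N) < p_respond
sample = support[responded]

est = sample.mean()
moe = 2 * np.sqrt(est * (1 - est) / sample.size)   # the reported "margin of error"
print(f"n = {sample.size}, estimate = {est:.3f} ± {moe:.3f}, truth = 0.500")
```

The sample is huge, so the margin of error is tiny—and the estimate is still off by about 17 percentage points. No margin of error sees that bias.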

**Sampling and adjustment**

The larger point is that just about any survey requires two steps:

1. Sampling.

2. Adjustment.

There are extreme settings where either 1 or 2 alone is enough.

If you have a true probability sample from a perfect sampling frame, with 100% availability and 100% response, and if your sampling probabilities don’t vary much, and if your data are dense relative to the questions you’re asking, then you can get everything you need—your estimate and your margin of error—from the sample, with no adjustment needed.
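In that idealized case the computation really is this simple (a sketch with simulated data, not any real survey):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
votes = rng.random(n) < 0.52        # idealized simple random sample, 100% response

p_hat = votes.mean()
moe = 2 * np.sqrt(p_hat * (1 - p_hat) / n)   # margin of error = 2 standard errors
print(f"estimate {p_hat:.3f} ± {moe:.3f}")
```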

From the other direction, if you have a model for the underlying data that you really believe, and if you have a sample with no selection problems, or if you have a selection model that you really believe (which I assume can happen in some physical settings, maybe something like sampling fish from a lake), then you can take your data and adjust, with no concerns about random sampling. Indeed, this is standard in non-sampling areas of statistics, where people just take data and run regressions and that’s it.

In general, though, it makes sense to be serious about *both* sampling and adjustment, to sample as close to randomly as you can, and to adjust as well as you can.
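A minimal sketch of the adjustment step, with made-up cells: weight the cell means by known population shares (poststratification) instead of by who happened to respond.

```python
# Hypothetical population shares and an age-skewed sample
pop_share   = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
sample_n    = {"18-34": 100,  "35-64": 400,  "65+": 500}
sample_mean = {"18-34": 0.60, "35-64": 0.50, "65+": 0.40}  # cell means of y

# Unadjusted: straight average over respondents
n_tot = sum(sample_n.values())
raw = sum(sample_n[g] * sample_mean[g] / n_tot for g in pop_share)

# Poststratified: reweight cell means to known population shares
adj = sum(pop_share[g] * sample_mean[g] for g in pop_share)
print(f"raw = {raw:.3f}, poststratified = {adj:.3f}")
```

The unadjusted mean is dragged down by the overrepresented 65+ cell; poststratification corrects for that, to the extent the cells capture what drives nonresponse.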

Remember: just about no sample of humans is really a probability sample or even close to a probability sample, and just about no regression model applied to humans is correct or even close to correct. So we have to worry about sampling, and we have to worry about adjustment. Sorry, Michael Link, but that’s just the way things are. No “grounding in theory” is going to save you.

**What’s the point of the margin of error?**

Where, then, does the margin of error come in? (Note to outsiders: to the best of my knowledge, “margin of error” is not a precisely defined term, but I think it is usually taken to be 2 standard errors.)

What I said, during our abbreviated 5-minute panel discussion, is that, in practice, we often don’t need the margin of error at all. Anything worth doing is worth doing multiple times, and once you have multiple estimates from different samples, you can look at the variation among them to get an external measure of uncertainty that is, in any case, more relevant than an internal margin of error.

The margin of error is an approximate *lower bound* on the expected error of an estimate from a sample. Such a lower bound can be useful, but in most cases I’d get more out of the between-survey variation (which includes sampling error as well as variation over time, variation between sampling methods, and variation in nonsampling error).
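A small simulation of that comparison (the house effects are invented for illustration): each survey’s internal margin of error only sees sampling noise, while the spread across repeated surveys also picks up nonsampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
true_p, n, n_surveys = 0.50, 1000, 200
# Invented "house effects": each survey's nonsampling error, sd 0.02
house = rng.normal(0, 0.02, n_surveys)
estimates = np.array([rng.binomial(n, np.clip(true_p + h, 0, 1)) / n for h in house])

internal_se = np.sqrt(true_p * (1 - true_p) / n)   # all the margin of error sees
external_sd = estimates.std(ddof=1)                # between-survey variation
print(f"internal SE = {internal_se:.4f}, between-survey SD = {external_sd:.4f}")
```

The between-survey spread is noticeably larger than the internal standard error, which is the lower-bound point in numerical form.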

Where the margin of error often *is* useful is in design, in deciding how large a sample size you want to estimate a quantity of interest to some desired precision.
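For that design use, the usual worst-case calculation looks like this (a sketch; it inherits the lower-bound caveat, since it ignores nonsampling error entirely):

```python
import math

def needed_n(moe, p=0.5):
    """Smallest n with 2*sqrt(p*(1-p)/n) <= moe (worst case at p = 0.5)."""
    return math.ceil(4 * p * (1 - p) / moe ** 2)

print(needed_n(0.03))   # for a +/-3-point margin of error
print(needed_n(0.05))   # for a +/-5-point margin of error
```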

In an email discussion afterward, John Bremer pointed out that in tracking studies you are interested particularly in measuring change, and in that case it might not be so easy to get an external measure of variance. Indeed, if you only measure something at time 1 and time 2, then the margin of error is relevant to assessing the evidence. To get an external measure of uncertainty and variation you need a longer time series. I just wanted to emphasize the point that the margin of error is a lower bound and, as such, can be useful if it is interpreted in that way. Even if sampling is perfect probability sampling and there is 100% response, the margin of error is *still* an underestimate because the sample is only giving a snapshot, and attitudes change over time.

**Patience and research**

This is an odd post for me to write because I’m usually not a patient person. In some ways, though, and surprising as it may sound, blogging is a good way for me to exercise my patience. I’m writing this on 2 Sept (at 3:55 in the afternoon, I’m still coming down from the high of teaching two classes so no way I can do real work, and it’s not 4pm so I can’t yet read my email) but I’ve scheduled it to appear on 18 Oct, at the current spot at the end of the queue. I love the feedback from commenters (in the old days, I also loved when I’d get reactions from other blogs, but we don’t see so much of that anymore), but I’ll patiently wait a month and a half for that.

Anyway, I don’t have anything deep to say here, just the commonplace notion that we typically have to try lots of things until we get some success. And apparent success is often illusory. (Obligatory link here to that 50 shades of gray paper.) The fractal nature of discovery.

**Debate on using margin of error with non-probability panels**

Most marketing researchers know that using Margin of Error with convenience samples, non-probability samples, and online research panels is inappropriate. However, some researchers continue to report MOE, as there does not seem to be a simple alternative, or indeed any alternative at all.

Join Ipsos and a panel of experts for a webinar discussion about:

Why is it appropriate or inappropriate to use MOE with online research panels?

Is it appropriate to use MOE with other types of research, e.g., telephone surveys / RDD?

Are there any appropriate alternatives that give similar guidance?

If there are no appropriate alternatives, what should researchers do to guide people interpreting their data?

How can researchers/pollsters who do not use MOE compete with pollsters who do use MOE, particularly when research users demand it?

How are research users supposed to know good from bad without seeing MOE or alternatives?

I can’t tell you how much I hate that first sentence in the above blurb. “Most marketing researchers know” should be replaced by “Many marketing researchers believe.”

I don’t really know what I’ll have to say, beyond yapping out “91% nonresponse! 91% nonresponse!” like a demented version of Long John Silver’s parrot. Any of you who want all my content without hearing the discussion can read this post or this article.

Anyway, the panel will be 30-45 minutes long, and it seems that you can sign up here. Too bad they didn’t get Michael Link, president of AAPOR, to participate; then I could’ve asked him why he didn’t respond to my request for clarification.

**P.S.** Due to technical difficulties this event never happened. It got rescheduled to a time next week that I can’t make, but you can go hear the others, I suppose. I’ll post something tomorrow on what we did say during our brief panel discussion. And here it is.

**High risk, low return**

Director of Paris journalism school suspended for plagiarism:

Executive director of journalism school at Sciences-Po university suspended while the university investigates accusations she was plagiarising other people’s articles for columns in the Huffington Post . . . The website Arret Sur Images said it fed around 20 of her columns into an online plagiarism checker and found that in half of them at least one sentence, but more often two or three, had been lifted from other articles and presented unchanged and without attribution.

Hey, I taught at Sciences-Po! (But I didn’t know this person.)

What’s funny about this story, though, is that she plagiarized, and risked losing her career, to publish at . . . the Huffington Post!? That’s what I call high risk, low return.

This is as ridiculous as if a prominent statistician had destroyed his reputation by plagiarizing review articles in some obscure journal on, umm, I dunno, “Interdisciplinary Reviews”? Nah, that could never happen.

**Plans for reboot of Statistical Communication class**

**Github cheat sheet**

**Another benefit of bloglag**

It’s not so much like I [Jones] can see the future; it’s more that I’ve got one foot stuck in the past. I can’t shake it loose. I’m retarded; I’m reliving one year of my life forever.

But this post is more upbeat: it’s a return to the discussion of my practice of posting blog entries a month ahead of time. One thing that can be frustrating about lagged posting is that I have some great idea (for example, The paradoxical nature of anecdotal evidence, which I just posted on and so will appear a month and a half from now, i.e., “yesterday” to you) but I don’t get the discussion for a month and a half.

But the plus side—and I think it outweighs the minus—is that I’m so overwhelmed that if I posted every idea right when it came to me and got the feedback right away, I might easily forget the whole incident. By spreading things out over two months, I get another chance to think about the subject, to fit the piece into the larger puzzle.

**Workshop on science communication for graduate students**

Applications are now open for the Communicating Science 2015 workshop (http://comscicon.com/apply-comscicon15), to be held in Cambridge, MA on June 18-20th, 2015. Graduate students at US institutions in all fields of science and engineering are encouraged to apply. The application will close on March 1st.

Acceptance to the workshop is competitive; attendance is free, and travel support will be provided to accepted applicants.

Participants will build the communication skills that scientists and other technical professionals need to express complex ideas to their peers, experts in other fields, and the general public. There will be panel discussions on the following topics:

- Communicating with Non-Scientific Audiences

- Science Communication in Popular Culture

- Communicating as a Science Advocate

- Multimedia Communication for Scientists

- Addressing Diversity through Communication

In addition to these discussions, ample time is allotted for interacting with the experts and with attendees from throughout the country to discuss science communication and develop science outreach collaborations. Workshop participants will produce an original piece of science writing and receive feedback from workshop attendees and professional science communicators, including journalists, authors, public policy advocates, educators, and more.

ComSciCon was founded in 2013 by graduate students at Harvard University and MIT, and 150 graduate students nationwide have already participated in its programs. Attendees have founded new science communication organizations in collaboration with other students at the event, published more than 25 articles written at the conference in popular publications with national impact, and formed lasting networks with our student alumni and invited experts. Visit http://comscicon.com/programs to learn more about our past workshop programs and participants.

Perhaps some of the organizers will be taking my class on statistical communication (see here).

**“Surely our first response to the disproof of a shocking-but-surprising claim should be to be un-shocked and un-surprised, not to try to explain away the refutation”**

The quote came up in the context of a statistical claim made by a political activist which was widely promoted and discussed but which turned out to be false. As I wrote at the time, I was disappointed that the activist’s response to the disproof of his claim was not to recalibrate his understanding but rather to try to explain away the refutation and to attack the people who went to the trouble of figuring out where he’d gone wrong. Later on in the comments I continued along the same lines:

If you think being extremely numerate is protection against making a statistical mistake, you are naive about the process of scientific discovery. Extremely numerate people make mistakes all the time. Everybody makes mistakes all the time. Being open to learning from your mistakes, that’s how to move forward. Denying your mistakes and fighting, that’s not a way to move forward in your understanding.

Also this:

As they say in AA (or someplace like that), it’s only after you admit you’re a sinner that you can be redeemed. I know that I’m a sinner. I make statistical mistakes all the time. It’s unavoidable.

As you can see, it’s my general position that if something’s worth saying, it’s worth saying over and over and over.

The issue of accepting error in a shocking-but-surprising claim has connections to two statistical issues I’ve been thinking about recently, as I’ll discuss.

**The paradoxical nature of anecdotal evidence (and of evidence more generally)**

Thomas Basbøll and I recently published a couple of articles on the role of stories in social science (see here and here). Our key point is that stories should be anomalous and immutable: *anomalous* because the role of a story is to change our view of the world, to represent a solid piece of information that contradicts, in some way, our current understanding; and *immutable* because the value of this contradiction comes from the story having sharp edges that do not fit into conventional structures.

To the extent that a story becomes pliable, so that its details can be altered to fit a point of view, it loses its ability to inform us, as social scientists (or as humans, acting in the role of amateur scientists in our goal of learning about the social world).

That’s (one reason) why it’s important, when your surprising story is shot down, to accept that you might be wrong. Your story is *surprising*—that is, it contains information—but this surprise is conditional on the information being true. When it turns out the information is false, it’s a horrible mistake to hold on to the surprise and discard the truth. Then you’re in the position of this guy:

Your belief has no foundation, and you’re supporting yourself on nothing but a cloud of ignorance.

Time to turn around before it’s too late.

**“Psychological Science”-style papers**

The other thing the above quote reminds me of, is all the controversy about noise-mining research articles that have appeared in journals such as Psychological Science. My fullest discussion of such issues appears in this recent paper, but, for here, let me reiterate Jeremy Freese’s point that research about the unknown is, well, it’s full of unknowns, and there should be no shame in accepting that a once-promising idea didn’t work out.

Surprising, newsworthy, statistically significant, and wrong: it happens all the time.

**On deck this week**

**Tues:** Another benefit of bloglag

**Wed:** High risk, low return

**Thurs:** Patience and research

**Fri:** This is why I’m a political scientist and not a psychologist

**Sat:** “What then should we teach about hypothesis testing?”

**Sun:** Tell me what you don’t know

**Lee Sechrest**

Yesterday we posted on Lewis Richardson, a scientist who did pioneering work in weather prediction and, separately, in fractals, in the early twentieth century. I was pointed to Richardson by Lee Sechrest, whom I then googled.

His first major book [was] “Psychotherapy and the Psychology of Behavior Change” . . . Sechrest may be best known, however, for another book he co-authored in 1966: “Unobtrusive Measures: A Survey of Nonreactive Research in Social Science” . . .

“‘Unobtrusive Measures’ invoked the notion that we do not have the correct, right, accurate, valid measure of anything,” says Sechrest. “We have measures that are more or less useful under different circumstances. And the best response that we can make to our measurement problem is to use measures that get at the construct of interest in very different ways.”

The book has served as an inspiration to many psychologists, encouraging them to go beyond surveys and questionnaires in their attempts to understand behavior . . .

Cool. Measurement is important, and the title and theme of “unobtrusive measurement” seems closely related to ideas we’ve been talking about for a while regarding the ways in which quantum-mechanical concepts such as Heisenberg’s uncertainty principle might be usefully applied to measurement in the human sciences. I’m still a bit stuck on how this should all be done, but I think the framework makes sense, and it’s interesting to know that a book was written on the topic back in 1966. I guess this is just another demonstration of a longstanding principle of statistics (see also here).

**Lewis Richardson, father of numerical weather prediction and of fractals**

If you get a chance, Wiki this guy:

I [Sechrest] did and was gratifyingly reminded that I read some bits of his work in graduate school 60 years ago. Specifically, about his math models for predicting wars and his work on fractals to arrive at better estimates of the lengths of common boundaries between nations. Pretty remarkable.

Cool indeed.

Lots and lots of great stuff in this mini-bio, for example:

One of Richardson’s most celebrated achievements is his retroactive attempt to forecast the weather during a single day—20 May 1910—by direct computation. At the time, meteorologists performed forecasts principally by looking for similar weather patterns from past records, and then extrapolating forward. Richardson attempted to use a mathematical model of the principal features of the atmosphere, and use data taken at a specific time (7 AM) to calculate the weather six hours later ab initio. As Lynch makes clear, Richardson’s forecast failed dramatically, predicting a huge 145 hectopascals (4.3 inHg) rise in pressure over six hours when the pressure actually was more or less static. However, detailed analysis by Lynch has shown that the cause was a failure to apply smoothing techniques to the data, which rule out unphysical surges in pressure. When these are applied, Richardson’s forecast is revealed to be essentially accurate—a remarkable achievement considering the calculations were done by hand, and while Richardson was serving with the Quaker ambulance unit in northern France.

It also mentions his statistical modeling of international disputes. I wonder what today’s international relations scholars think of this work. I’m sure they’ve gone much farther along in sophistication, but I wonder whether they see Richardson’s work as an interesting precursor or as a dead end.

He also appears to have come up with the idea of fractal dimension in the length of coastlines, inspiring the famous writings of Mandelbrot on the topic:

At the time, Richardson’s research was ignored by the scientific community. Today, it is considered an element of the beginning of the modern study of fractals. Richardson’s research was quoted by mathematician Benoît Mandelbrot in his 1967 paper How Long Is the Coast of Britain? Richardson identified a value (between 1 and 2) that would describe the changes (with increasing measurement detail) in observed complexity for a particular coastline; this value served as a model for the concept of fractal dimension.
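Richardson’s procedure can be sketched in a few lines (synthetic measurements with an assumed dimension of 1.25, not Richardson’s actual coastline data): plot log measured length against log ruler size; the slope is 1 - D.

```python
import numpy as np

# Synthetic Richardson-style measurements: measured coastline length grows as
# the ruler shrinks, L(s) = C * s**(1 - D), with an assumed dimension D = 1.25
D_true, C = 1.25, 2000.0
rulers = np.array([200.0, 100.0, 50.0, 25.0, 12.5])   # ruler lengths
lengths = C * rulers ** (1 - D_true)                  # measured lengths

# Richardson plot: the slope of log L against log s is (1 - D)
slope = np.polyfit(np.log(rulers), np.log(lengths), 1)[0]
D_hat = 1 - slope
print(f"estimated dimension: {D_hat:.2f}")
```

With real coastline data the points scatter around the line, but the log-log slope is exactly the quantity Richardson tabulated and Mandelbrot later interpreted as a fractal dimension.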

I’d never heard of this guy but apparently he’s pretty well known. For one thing, he has this long wikipedia page; for another, it says that the European Geosciences Union has an award named after him. But perhaps his closest connection to fame is that he’s the uncle of actor Ralph Richardson. Which is a little bit like me being famous because of my distant relation to Marge Simpson (apparently, she’s married to a cousin of mine in L.A. whom I’ve never met).

**P.S.** I gave the post this title (which I adapted from the link to the above Wikipedia image) because it reminds me of the song, “Cezanne, father of cubism,” which I only heard once, on the radio many years ago, but which Google and Youtube assure me actually exists.

**When a study fails to replicate: let’s be fair and open-minded**

One person in the discussion wrote, “As Simone Schnall suggests, this may not call into question the existence of the phenomenon; but it does raise concerns about boundary conditions, robustness, etc. It also opens up doors for examining exceptions, new factors (e.g., cultural factors outside US / North America), etc.” All this indeed is possible, but let’s also keep in mind the very real possibility that what we are seeing is simple sampling variation.

That is, suppose study 1 is performed under conditions A and is published with p less than .05, and then replication study 2 is performed under conditions B (which are intended to reproduce conditions A but in practice no replication is perfect), and replication study 2 is not statistically significant.

(i) One story (perhaps the preferred story of the researcher who published study 1) is that study 1 discovered a real effect and that study 2 is flawed, either because of poor data collection or analysis or because the replication wasn’t done right.

(ii) Another story (perhaps the back-up) is that study 2 did not reach statistical significance because it was a poorly done study with low power.

(iii) Yet another story (the back-up back-up) is that study 2 differed from study 1 because the effect is variable and occurs in setting A but not in setting B.

(iiii) But I’d like to advance another story (not mentioned at all as a possibility by Schnall in her post that got this recent discussion started) which is that any real effect is so small as to be essentially undetectable (as in the power=.06 example here, and, yes, power=.06 is no joke, it’s a real possibility), and so the statistically significant pattern in study 1 is actually just happening within that particular sample and *doesn’t* reflect any general story even under setting A.
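Story (iiii) is easy to check by simulation (numbers chosen to give power of about .06, roughly matching the linked example): conditional on statistical significance, the estimate must be a large overestimate, and it often even has the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(4)
true_effect, se = 0.3, 1.0                   # effect 0.3 standard errors: power ~ .06
est = rng.normal(true_effect, se, 100_000)   # sampling distribution of the estimate
signif = np.abs(est) > 1.96 * se             # which replications reach p < .05

power = signif.mean()
exaggeration = np.abs(est[signif]).mean() / true_effect
wrong_sign = (est[signif] < 0).mean()
print(f"power ~ {power:.3f}; significant estimates average "
      f"{exaggeration:.1f}x the true effect; {wrong_sign:.0%} have the wrong sign")
```

So a statistically significant result in study 1 followed by a null in study 2 is exactly what this regime produces; no moderators or boundary conditions are needed to explain it.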

Again, let me emphasize that I’m not speaking of Schnall’s research in particular, which I’ve barely looked at; rather, I’m speaking more generally about how to think about the results of replications.

I think we should be fair and open-minded—and part of being fair and open-minded is to consider option (iiii) above as a real possibility.

**Cross-validation, LOO and WAIC for time series**

Jonah asked on the Stan users mailing list:

Suppose we have J groups and T time periods, so y[t,j] is the observed value of y at time t for group j. (We also have predictors x[t,j].) I’m wondering if WAIC is appropriate in this scenario assuming that our interest in predictive accuracy is for existing groups only (i.e. we might get data for new time periods but only for the same J groups). My hunch is that this scenario requires a more complicated form of cross-validation that WAIC does not approximate, but the more I think about it the more confused I seem to become. Am I right that WAIC is not appropriate here?

I’ll try to be more specific than in my previous comments on this topic.

As WAIC is an approximation to leave-one-out (LOO) cross-validation, I’ll start by considering when LOO is appropriate for time series.

LOO is appropriate if we are interested in how well our model describes structure in the observed time series. For example, in the birthday example (BDA3 p. 505 and here), we can say that we have learned about the structure if we can predict any single date with missing data, and thus LOO is appropriate. Here we are not concerned so much about the birthdays in the future. The fact that the covariate x is deterministic (fixed) doesn’t change how we estimate the expected predictive performance (for a single date with missing data), but since x is fixed there is no uncertainty about the future values of x.

If we are interested in making predictions for the next not-yet-observed date and we want a better estimate of the expected predictive performance than LOO gives, we can use sequential prediction. I don’t recommend using all the terms

p(y_1)p(y_2|y_1)p(y_3|y_1,y_2)…p(y_T|y_{1..T-1})

because the beginning of this series is sensitive to prior. I would use terms

p(y_k|y_{1..k-1})p(y_{k+1}|y_{1..k})…p(y_{T}|y_{1..T-1})

How many terms (k-1) to remove depends on the properties of the time series.
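As a toy illustration of the sequential score (a random-walk model with a known noise scale, so the one-step-ahead predictive density is available in closed form; no Stan involved):

```python
import numpy as np

rng = np.random.default_rng(5)
T, sigma, k = 200, 1.0, 20
y = np.cumsum(rng.normal(0, sigma, T))   # a simulated random-walk time series

# One-step-ahead predictive density under the (known) random-walk model:
# p(y_t | y_{1..t-1}) = Normal(y_t | y_{t-1}, sigma).  Drop the first k-1
# terms, which are the ones most sensitive to the prior.
def log_norm(x, mu, s):
    return -0.5 * np.log(2 * np.pi * s**2) - 0.5 * ((x - mu) / s) ** 2

lpd = sum(log_norm(y[t], y[t - 1], sigma) for t in range(k, T))
print(f"sequential log predictive density, terms {k}..{T - 1}: {lpd:.1f}")
```

With an actual fitted model, each term would instead average the conditional density over draws from the posterior given the data up to that point.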

When the number of time points is much larger than the number of hyperparameters \theta, to make the series even more stable and to better correspond to the prediction task, I would define

p(y_k|y_{1..k-1}) = \int p(y_k|y_{1..k-1},\theta) p(\theta|y_{1..T}) d\theta
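In practice this integral is approximated by averaging the conditional predictive density over posterior draws of \theta (the draws below are faked as lognormal samples, standing in for actual Stan output, with illustrative y values):

```python
import numpy as np

rng = np.random.default_rng(6)
# Stand-in posterior draws for a noise scale theta (in reality, Stan output)
theta_draws = rng.lognormal(mean=0.0, sigma=0.1, size=4000)

def norm_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Monte Carlo version of the integral:
#   p(y_k | y_{1..k-1}) ~ (1/S) * sum_s p(y_k | y_{1..k-1}, theta_s)
y_prev, y_k = 1.3, 1.1   # illustrative values under a random-walk likelihood
pred = norm_pdf(y_k, y_prev, theta_draws).mean()
print(f"posterior-averaged predictive density ~ {pred:.3f}")
```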

If we are interested in making predictions for several not-yet-observed dates, I recommend using, for example for d-days-ahead prediction,

p(y_{k..k+d}|y_{1..k-1})p(y_{k+1..k+d+1}|y_{1..k})…p(y_{T-d..T}|y_{1..T-d-1})

If we are interested in making predictions for future dates, we could still use LOO to select a model which describes well the structure in the time series. It is likely that such a model would also make good predictions for future data, but LOO will give an optimistic estimate of the expected predictive performance (for the next not-yet-observed date). This bias may be such that it does not affect which model is selected. The optimistic bias is harmful, for example, if we use the predictions for resource allocation: by underestimating how difficult it is to predict the future, we might end up not allocating enough resources (doctors for handling births, electricity generation to match the load, etc.).

If we are interested in making predictions for future dates, I think it is OK to use LOO in a preliminary phase, but sequential methods should be used for final reporting and decision making. A reason for using LOO could be that we can get the LOO estimate at a small additional computational cost after the full posterior inference. LOO approximations, which are obtained as a by-product of the full posterior inference or with a small additional cost, are discussed in the papers Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models and WAIC and cross-validation in Stan.

Note that when using Kalman-filter-type inference for time series models, these sequential estimates can be obtained as a by-product or with only a small additional cost.

So now I’ve covered when LOO or the sequential approach is appropriate for time series, and I’ll return to the actual question, which states

(i.e. we might get data for new time periods but only for the same J groups)

That is, the group ids are fixed and time periods are deterministic

As I said before, LOO (or WAIC) is fine for estimating whether the model has found some structure in the data, and it does not matter that x is a combination of fixed and deterministic parts. If it is important to know the actual predictive performance for future data, you need to use a version of the sequential approach.

WAIC is just an approximation of LOO. I’m now convinced that there is no need to use WAIC. The paper Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models shows that there are better methods than WAIC for Gaussian latent variable models. We are also working on a better method to approximate LOO in Stan (maybe we’ll call it the Very Good Information Criterion?). I just need to make some additional experiments and write the paper…

The post Cross-validation, LOO and WAIC for time series appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>The post The bracket! appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Knocked together in Excel. I’m European, so I may not have respected the North American system for brackets; i.e., I split each category into seeded pools and randomly drew from them. The French Intellectuals get a bit of a rough draw in this regard, with a lot of early matchups.

I take back all the bad things I ever said about Excel, as this image looks pretty good. Sure, the font is pretty unreadable, but other than that it looks cool.

I just feel bad for Plato, having to go up against Henny Youngman in the very first round, followed by a probable Mark Twain if he can get past Henny. The philosopher-king has a tough road to the Final Four.

The pairings will start on 3 Feb, so get your witticisms ready now!


]]>The post I need your help in setting up the ultimate bracket: Picking the ideal seminar speaker appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>This came in the departmental email awhile ago:

CALL FOR APPLICATIONS: LATOUR SEMINAR — DUE DATE AUGUST 11 (extended)

The Brown Institute for Media Innovation, Alliance (Columbia University, École Polytechnique, Sciences Po, and Panthéon-Sorbonne University), The Center for Science and Society, and The Faculty of Arts and Sciences are proud to present

BRUNO LATOUR AT COLUMBIA UNIVERSITY, SEPTEMBER 22-25

You are invited to apply for a seminar led by Professor Bruno Latour on Tuesday, September 23, 12-3pm. Twenty-five graduate students from throughout the university will be selected to participate in this single seminar given by Prof. Latour. Students will organize themselves into a reading group to meet once or twice in early September for discussion of Prof. Latour’s work. They will then meet to continue this discussion with a small group of faculty on September 15, 12-2pm. Students and a few faculty will meet with Prof. Latour on September 23. A reading list will be distributed in advance.

If you are interested in this 3-4 session seminar (attendance at all 3-4 sessions is mandatory), please send

Name:

Uni:

Your School:

Your Department:

Year you began your terminal degree at Columbia:

Thesis or Dissertation title or topic:

Name of main advisor:

In one short, concise paragraph tell us what major themes/keywords from Latour’s work are most relevant to your own work, and why you would benefit from this seminar. Please submit this information via the site http://brown.submittable.com/submit

The due date for applications is August 11 and successful applicants will be notified in mid-August.

This is the first time I’ve heard of a speaker who’s so important that you have to apply to attend his seminar! And, don’t forget, “attendance at all 3-4 sessions is mandatory.”

At this point you’re probably wondering what exactly is it that Bruno Latour does. Don’t worry—I googled him for you. Here’s the description of his most recent book, “An Inquiry Into Modes of Existence”:

The result of a twenty five years inquiry, it offers a positive version to the question raised, only negatively, with the publication, in 1991, of ”We have never been modern”: if ”we” have never been modern, then what have ”we” been? From what sort of values should ”we” inherit? In order to answer this question, a research protocol has been developed that is very different from the actor-network theory. The question is no longer only to define ”associations” and to follow networks in order to redefine the notion of ”society” and ”social” (as in ”Reassembling the Social”) but to follow the different types of connectors that provide those networks with their specific tonalities. Those modes of extension, or modes of existence, account for the many differences between law, science, politics, and so on. This systematic effort for building a new philosophical anthropology offers a completely different view of what the ”Moderns” have been and thus a very different basis for opening a comparative anthropology with the other collectives – at the time when they all have to cope with ecological crisis. Thanks to a European research council grant (2011-2014) the printed book will be associated with a very original purpose built digital platform allowing for the inquiry summed up in the book to be pursued and modified by interested readers who will act as co-inquirers and co-authors of the final results. With this major book, readers will finally understand what has led to so many apparently disconnected topics and see how the symmetric anthropology begun forty years ago can come to fruition.

Huh? I wonder if this is what they mean by “one short, concise paragraph” . . .

Update: We just got an announcement in the mail. The due date has been extended a second time, this time to Aug 18. This seems like a good sign: apparently fewer Columbia grad students than expected wanted to jump through the hoops to participate in this seminar.

**The ultimate bracket**

But I’m getting a bit off topic. What really got me interested in this was the idea of a speaker who is so important, so much in demand, that you have to fill out an application just to be in the same small room with him. Not to mention the labor involved by whoever is screening the applications (assuming, that is, that more than 25 people actually apply).

So here’s the question: who would be the ultimate seminar speaker—the one person who you could only get to speak in a limited-access venue? I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I thought the best way for us to work this out would be via a single-elimination bracket, March Madness style. Which is why I’ve exercised the ultimate in patience and scheduled this post for January, 2015—nearly half a year after I wrote it!

So who’s the ultimate seminar speaker? Of course there’s an endless list of possibilities, ranging from celebrity academics (Paul Krugman, etc.) to cult figures of the past (Philip K. Dick, Ayn Rand, etc.) to actual rock stars (from Elvis on down). But to narrow things down I’ve chosen a list of 64 for us to work through.

My list includes eight current or historical figures from each of the following eight categories:

– Philosophers

– Religious Leaders

– Authors

– Artists

– Founders of Religions

– Cult Figures

– Comedians

– Modern French Intellectuals.

All these categories seem to be possible choices to reach the sort of general-interest intellectual community that is implied by the Latour announcement.

I’ve purposely *not* included any statisticians or indeed any academics (with the exception of Bruno Latour himself) because I don’t want to turn this competition into a mudfest.

I’ll give the list in a moment, along with the seedings, but first let me explain where I need help. I’m sure one of you has access to a computer program that makes one of those pretty brackets—you know what I’m talking about, four little trees of 16 teams each, all meeting in the middle. I want my potential seminar speakers set up in such a bracket which I can then post on this website and which we can go through, one pairing at a time.

With 64 speakers, we’ll need 63 matches to come to a winner. We can do one a day starting on February 3, so that the final bout will come on April 6, the final day of the NCAA men’s basketball tournament.

So here’s what I need from one of you: a full bracket with all 64 seminar speakers, displayed in that pretty “bracket” form, and with the speakers from the different categories all mixed up. It would be pretty boring to have all the artists against all the artists, all the religious leaders against all the religious leaders, etc. Instead, each group of 8 in the bracket should include one from each of the 8 occupational categories, and it should also include one #1 seed, one #2 seed, one #3 seed, one #4 seed, and 4 unseeded people, with the seedings set up as is standard: each seeded speaker is matched against an unseeded person, then the pairings are set up so that, if the seeds advance, #1 faces #4, and #2 faces #3.

Send me the bracket, I’ll post it on the blog, and we’ll go from there, once a day starting on 3 Feb. It will be fun, and the results won’t be obvious. These sorts of matchups can be highly nontransitive because we are implicitly comparing people on many different dimensions.
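For anyone who wants to generate the mixed-up draw programmatically, here is a rough Python sketch of the scheme just described (placeholder speaker names, not the real list; cyclic shifts of the category order guarantee that every group of 8 gets one speaker per category along with the right seed mix):

```python
import random

def make_bracket(categories, seed=0):
    """categories maps each of 8 category names to a list of 8 speakers,
    ordered so the first four are seeds 1-4 and the last four are unseeded.
    Returns 8 groups of 4 first-round matches; within each group the
    winners meet so that, if all seeds advance, #1 faces #4 and #2 faces #3."""
    rng = random.Random(seed)
    names = list(categories)
    rng.shuffle(names)                                # randomize category order
    unseeded = {c: list(categories[c][4:]) for c in names}
    for pool in unseeded.values():
        rng.shuffle(pool)                             # randomize unseeded draw
    groups = []
    for g in range(8):
        # group g takes its seed-s speaker from category names[(g+s-1) % 8] ...
        top = {s: categories[names[(g + s - 1) % 8]][s - 1] for s in (1, 2, 3, 4)}
        # ... and one unseeded speaker from each of the other four categories
        others = [unseeded[names[(g + s) % 8]].pop() for s in (4, 5, 6, 7)]
        groups.append([(top[1], others[0]), (top[4], others[1]),
                       (top[3], others[2]), (top[2], others[3])])
    return groups

# hypothetical placeholder data: 8 categories x 8 speakers each
cats = {f"cat{i}": [f"cat{i}-p{j}" for j in range(1, 9)] for i in range(8)}
bracket = make_bracket(cats)
```

The cyclic-shift trick is just one way to satisfy the constraints; any Latin-rectangle assignment of categories to seed slots would do.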

**The 64**

- Philosophers:

Plato (seeded 1 in group)

Alan Turing (seeded 2)

Aristotle (3)

Friedrich Nietzsche (4)

Thomas Hobbes

Jean-Jacques Rousseau

Bertrand Russell

Karl Popper

- Religious Leaders:

Mohandas Gandhi (1)

Martin Luther King (2)

Henry David Thoreau (3)

Mother Teresa (4)

Al Sharpton

Phyllis Schlafly

Yoko Ono

Bono

- Authors:

William Shakespeare (1)

Miguel de Cervantes (2)

James Joyce (3)

Mark Twain (4)

Jane Austen

John Updike

Raymond Carver

Leo Tolstoy

- Artists:

Leonardo da Vinci (1)

Rembrandt van Rijn (2)

Vincent van Gogh (3)

Marcel Duchamp (4)

Thomas Kinkade

Grandma Moses

Barbara Kruger

The guy who did Piss Christ

- Founders of Religions:

Jesus (1)

Mohammad (2)

Buddha (3)

Abraham (4)

L. Ron Hubbard

Mary Baker Eddy

Sigmund Freud

Karl Marx

- Cult Figures:

John Waters (1)

Philip K. Dick (2)

Ed Wood (3)

Judy Garland (4)

Sun Myung Moon

Charles Manson

Joan Crawford

Stanley Kubrick

- Comedians:

Richard Pryor (1)

George Carlin (2)

Chris Rock (3)

Larry David (4)

Alan Bennett

Stewart Lee

Ed McMahon

Henny Youngman

- Modern French Intellectuals:

Albert Camus (1)

Simone de Beauvoir (2)

Bernard-Henri Lévy (3)

Claude Levi-Strauss (4)

Raymond Aron

Jacques Derrida

Jean Baudrillard

Bruno Latour

I don’t know how far Bruno Latour will go in this competition, but at least he’s in the running. May the best man (or woman) win!

And here it is (courtesy of Paul Davidson):


]]>The post Stan comes through . . . again! appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>Erikson Kaszubowski writes in:

I missed your call for Stan research stories, but the recent post about stranded dolphins mentioned it again.

When I read about the Crowdstorming project in your blog, I thought it would be a good project to apply my recent studies in Bayesian modeling.

The project coordinators shared a big dataset (with 124,621 cases) and each research team had to independently analyze the data and answer two research questions:

1) Are soccer referees more likely to give red cards to dark skin toned players?

2) Are referees from countries with high skin-tone bias more likely to be biased toward dark-skin-toned players?

Given the data structure (each case is a player-referee dyad, with variables about how many games occurred between them and more) and inspired by my recent reading of ARM, I thought a multilevel binomial-normal regression could be a good model to analyze the data.

I initially created a model using a different Bayesian software, but it only worked in small samples of the dataset. When I tried to analyze the whole thing, this other program couldn’t get off the ground. So, I decided to give Stan a try… And it worked like a charm!

The project article is still being written, but all analyses are already published in the Open Science Framework. Here’s the link for my analysis.

A short report, source codes and Stan chains are all there, in case anyone is interested.

I know the model isn’t such a great novelty and there is plenty of criticism to be done about what I did. But we can say, at least, that when people first crowdstormed a dataset, Stan was there!

Thank you and all the Stan team for such a great tool!

I haven’t looked at this in detail so don’t take this post as an endorsement of this particular model, coding, or data analysis—but it does demonstrate the success of our goal of allowing people to fit models directly, with a minimum of fuss, so that users can focus on the statistical modeling, not on the computation.


]]>The post Some art so far appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I thanked Ed and asked if it would be possible to take the image and add to it so it’s clear that the “.05” is being drawn from a sea of other numbers, also with a little bucket next to the fisherman with some successful .04, .01, .03, etc. Or would this get too crowded?

Ed replied:

It’s doable. I think as a cartoon or illustration it would be fine. If it’s going on a t-shirt it might be too crowded. However, there are other ways you could simplify. For instance, you could make it two dimensional and have just a silhouette of the fisherman and the boat above the water and then have a bunch of other numbers below the surface. The closest I can think of to this is the DreamWorks logo, if you’ve seen that before (http://upload.wikimedia.org/wikipedia/en/thumb/2/24/DreamWorks_Studios_logo.svg/1280px-DreamWorks_Studios_logo.svg.png). The silhouette approach is more effective for t-shirts anyway, as the less detail there is, the easier it is to see what the image is from far away, and it works about the same on a number of different background colors (and is cheaper to screen print in some instances). I could replicate the image in whatever color you want (black, white) to fit the proper background color. I’m assuming these are all going on t-shirts, right?

A few hours later he sent me this:

Along with the following comments:

Here’s kind of an idea of what I [Ed] mean with the silhouette. This would show up rather nicely on a shirt (more so than the previous picture). You could change the color as well to white to fit on other backgrounds (like a black shirt). I’ve included one with a white background and one with no background. I also tried to change the sizes of the “fish” to be comparable to what he might be fishing for (i.e., those closer to .05 are bigger). I can always add more fish and change their angles/orientations to make them look more unique. Also, this design lends itself nicely to text (e.g., “teach the controversy”) if that’s what you want to do. Let me know what you think.

Cool.

And, the rest of you: don’t forget requests #2, 3, 4, and 5!


]]>The post Artist needed! appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>**1. “Gone Fishing” T-shirt**

A person is standing in a boat, fishing. The lake is full, not of fish but of little numbers: “.14”, “.31”, “.08”, etc etc. And also a few shiny glittery numbers like “.04”, “.01”, “.03”. The fisherman is reeling in a glittering “.05”. Then, next to him on the boat is a bucket with a few shiny “.04”s and “.03”s.

Possible titles:

– Teach the Controversy (as in here).

– *This* is Science?

– *Psychological Science*

– Or we could play it straight and give it the title, We Can Do Better Than This.

In any case, all this would be on the back, and then the front would just have a tasteful Stan logo over the pocket. Or something like that.

**2. “Garden of Forking Paths” T-shirt**

Something similar with a “forking paths” design. I’m not quite sure how to do it.

**3. “Tabloid Science” T-shirt**

Some sort of collage of all the “power = .06” studies we talk about here: himmicanes and hurricanes, beauty and sex ratio, ovulation and just about anything, MRI’s, etc etc. We must have dozens of examples, so we could just cram them into a single overstuffed image.

Or maybe do it as a mock-up of a tabloid newspaper, with the title “Psychological Science” (in that newspaper-title font like they have for the New York Times) and then various tabloid-style headlines laid out, newspaper style.

Then on the front: “Stan. We can do better.”

Or something like that. Use your creativity!

**4. Handy Statistical Lexicon**

I think this could work in the form of an illustrated book. This is for someone who really likes to draw!

**5. “Second Chance U” and “The New Dirty Dozen” movie posters**

I think these could be awesome. See here for the descriptions of the movies, to get a sense of what you’d be working with.

What I really want to see are the movies themselves, but that would take a lot of work. Maybe I can throw together a script or two sometime. Or at least a Gong Show act.

**I’m serious!**

I’m serious! Anyone who wants to draw any of the above, please let me know. There’s a free T-shirt in it for you. Also glory.

**P.S.** The submissions (from Ed Witt) are here.


]]>The post What’s misleading about the phrase, “Statistical significance is not the same as practical significance” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>And, indeed, sometimes this sort of thing comes up (and, irritatingly, such studies get publicity in part because of their huge sample size, which seems a bit unfair in that they *need* the huge sample size in order to detect anything at all), but not so often.

What is much more common are small studies where estimated effects are statistically significant but the estimates are unrealistically huge (remember, the statistical significance filter).

We’ve spent a lot of space on this blog recently on studies where the noise overwhelms the signal, where any comparisons in the data, statistically significant or not, are essentially meaningless.

But today (actually, in the future, whenever this post appears; I’m actually writing it on 22 Nov), I’d like to focus on a more interesting example, where a serious study was performed on an important topic, the estimate was statistically significant, but I think the estimate is biased upward, for the usual reason of the statistical significance filter.

It’s the story of an early childhood intervention on children that, based on a randomized experiment, was claimed by a bunch of economists to have increased their earnings (as young adults, 20 years later) by 25% or 42%. Here’s what I wrote:

From the press release: “This study adds to the body of evidence, including Head Start and the Perry Preschool programs carried out from 1962-1967 in the U.S., demonstrating long-term economic gains from investments in early childhood development.” But, as I wrote on an earlier post on the topic, there is some skepticism about those earlier claims.

And this:

From the published article: “A substantial literature shows that U.S. early childhood interventions have important long-term economic benefits.”

From the press release: “Results from the Jamaica study show substantially greater effects on earnings than similar programs in wealthier countries. Gertler said this suggests that early childhood interventions can create a substantial impact on a child’s future economic success in poor countries.”

I don’t get it. On one hand they say they already knew that early childhood interventions have big effects in the U.S. On the other hand they say their new result shows “substantially greater effects on earnings.” I can believe that their point estimate of 25% is substantially higher than point estimates from other studies, or maybe that other studies showed big economic benefits but not big gains on earnings? In any case I can only assume that there’s a lot of uncertainty in this estimated difference.

**Here’s the point**

The problem with the usual interpretation of this study is *not* that it’s statistically significant but not practically significant. We’re not talking about an estimate of .003 with a standard error of .001. No, things are much different. The effect is statistically significant and *huge*—indeed, the small sample and high variation ensure that, if the estimate is statistically significant, it will have to be huge. But I don’t believe that huge estimate (why should I? It’s biased, it’s the product of a selection effect, the statistical significance filter).

And all this “statistically significant but not practically significant” talk can lead us completely astray, by making us wary of very *small* estimates, when what we should really be suspicious of is very *large* estimates!
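The statistical significance filter is easy to see in a quick simulation (my own invented numbers, not from the Jamaica study): take a small true effect, a noisy design, and look only at the estimates that come out statistically significant in the expected direction:

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect, se = 0.1, 0.5                        # small true effect, noisy study
est = rng.normal(true_effect, se, size=100_000)   # many hypothetical replications
signif = est[est > 1.96 * se]                     # the ones that clear p < .05
print(f"true effect:                {true_effect}")
print(f"mean over all studies:      {est.mean():.3f}")     # roughly unbiased
print(f"mean of 'significant' ones: {signif.mean():.3f}")  # hugely inflated
print(f"exaggeration factor:        {signif.mean() / true_effect:.1f}x")
```

Conditioning on significance in this regime inflates the published estimate by roughly an order of magnitude, which is exactly why a huge, statistically significant estimate from a small, noisy study should make us suspicious.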


]]>**Tues:** Artist needed!

**Wed:** Stan comes through . . . again!

**Thurs:** I need your help in setting up the ultimate bracket: Picking the ideal seminar speaker

**Fri:** When a study fails to replicate: let’s be fair and open-minded

**Sat:** Lewis Richardson, father of numerical weather prediction and of fractals

**Sun:** Lee Sechrest

The post “Epidemiology and Biostatistics: competitive or complementary?” appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>I have a 20 minute lecture on “Epidemiology and Biostatistics: competitive or complementary?” at Tehran University of Medical Sciences in the next month. I should mention the difference between an epidemiologist and a biostatistician and their competitive or complementary roles in public health. I am wondering if you have any thoughts on this subject.

P.S.

“Knowledge of relevant aspects of statistics … is a prerequisite for proper study of epidemiological research …, just as, for example broad and deep command of mathematics is a prerequisite to a career in physics.”

– Olli Miettinen

“Epidemiologist: one who thinks that an odds ratio is an approximation to a relative risk as opposed to a statistician who knows the opposite.”

– Stephen Senn

“There are two types of statisticians: those who do causal inference and those who lie about it”

– Larry Wasserman

My reply: Perhaps the following two papers will be helpful:

Causality and statistical learning and

Experimental reasoning in social science.

They’re not specifically on epidemiology but they do address different perspectives in causal inference. By the way, I disagree with the above quote from Larry: as a statistician who works on surveys, I recognize that there is a long and important tradition in statistics of descriptive inference.

To elaborate on this, I do think that essentially all statistical problems are about *comparisons*. And forward causal inference is a form of comparison (what would happen under intervention 1, compared to what would happen under intervention 2). But I often find myself in the business of making comparisons that are not causal (for example, comparing changes in public opinion among two or more groups of people). Such comparisons can have causal implications and they can suggest reverse causal questions as discussed in this paper with Guido, but I wouldn’t quite call them “causal inference” in the usual sense of the term.

To return to epidemiology vs. biostatistics: it’s my impression that there’s a lot of forward causal inference and a lot of reverse causal inference in both fields. That is, researchers spend a lot of time trying to estimate particular causal effects (“forward causal inference”) and a lot of time trying to uncover the causes of phenomena (“reverse causal questioning”).

And, from my perspective (as elaborated in that paper with Guido), these two tasks are fundamentally different and are approached differently: forward causal inference is done via estimation within a model, whereas reverse causal questioning is an elaboration of model checking, exploring aspects of data that are not explained by existing theories.


]]>The post Wow—this is much more impressive than anything Frank Flynn ever did! appeared first on Statistical Modeling, Causal Inference, and Social Science.


]]>The post A completely reasonable-sounding statement with which I strongly disagree appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>In the context of a listserv discussion about replication in psychology experiments, someone wrote:

The current best estimate of the effect size is somewhere in between the original study and the replication’s reported value.

This conciliatory, split-the-difference statement sounds reasonable, and it might well represent good politics in the context of a war over replications—but from a *statistical* perspective I strongly disagree with it, for the following reason.

The original study’s estimate typically has a huge bias (due to the statistical significance filter). The estimate from the replicated study, assuming it’s a preregistered replication, is unbiased. I think in such a setting the safest course is to **use the replication’s reported value as our current best estimate**. That doesn’t mean that the original study is “wrong,” but it *is* wrong to report a biased estimate as if it’s unbiased.

And this doesn’t even bring in the possibility of an informative prior distribution, which in these sorts of examples could bring the estimate even closer to zero.
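Under a normal approximation, such a prior just gets combined with the replication estimate by precision weighting; here is a minimal sketch with invented numbers:

```python
# All numbers invented for illustration: an informative prior centered at
# zero, plus an unbiased estimate from a preregistered replication.
prior_mean, prior_se = 0.0, 0.1
rep_est, rep_se = 0.15, 0.2

w = (1 / prior_se**2) / (1 / prior_se**2 + 1 / rep_se**2)   # weight on the prior
post_mean = w * prior_mean + (1 - w) * rep_est
post_se = (1 / prior_se**2 + 1 / rep_se**2) ** -0.5

print(f"posterior mean {post_mean:.3f}, posterior se {post_se:.3f}")
# the estimate is pulled from 0.15 toward zero, with a tighter standard error
```

With these made-up numbers the prior gets 80% of the weight, so the posterior mean lands at 0.03—closer to zero than the replication estimate, as described above.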


]]>The post Planning my class for this semester: Thinking aloud about how to move toward active learning? appeared first on Statistical Modeling, Causal Inference, and Social Science.

]]>- Design and Analysis of Sample Surveys (in the political science department, but the course has lots of statistics content);

- Statistical Communication and Graphics (in the statistics department, but last time I taught it, many of the students were from other fields).

I’ve taught both classes before. I taught Statistical Communication last semester. It went well and I’m rearranging it a bit for the spring. It should go well.

I’ve taught Design and Analysis of Sample Surveys twice before, and each time the students have wanted a bit more statistics and a bit less social science. Most of the students in the class are studying political science but they can get that from the other profs in their program; when they take my course they’re looking for the hard statistics stuff they can’t get anywhere else. Their favorite part of the course was when I taught them about practical regression modeling.

These exam questions should give you an idea of what was in my surveys class before. It’s ok but this time I’m going to go lighter on the traditional sampling topics (ratio and regression estimation, stratified cluster sampling bla bla bla) and instead have them do Mister P for real in R and Stan, just like the grownups do. These are Columbia grad students, for chrissake—I don’t know what I was thinking before. If they don’t learn serious survey analysis now, when will they?

Don’t get me wrong here. I won’t teach *only* MRP. But it will flow naturally from (a) regression modeling, and (b) the goal of using a sample to make inferences for the population. From this perspective, it would be perverse to teach regression and sample surveys and *not* show them how to do MRP. And, once they’re fitting multilevel models, it makes sense to do it in Stan, since that’s what everybody’s gonna be using soon anyway.
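The poststratification half of MRP is simple enough to show in a few lines (invented cell estimates and census counts; in the course the cell predictions would come from a multilevel regression fit in Stan):

```python
# Toy poststratification: reweight model predictions for each
# demographic cell by that cell's population count.
cells = {                           # (age, education) -> predicted support
    ("18-29", "no college"): 0.44, ("18-29", "college"): 0.50,
    ("30+",   "no college"): 0.38, ("30+",   "college"): 0.47,
}
census = {                          # population counts for the same cells
    ("18-29", "no college"): 200, ("18-29", "college"): 100,
    ("30+",   "no college"): 450, ("30+",   "college"): 250,
}
total = sum(census.values())
estimate = sum(cells[c] * census[c] for c in cells) / total
print(f"poststratified estimate: {estimate:.4f}")
```

Here the raw average of the four cell estimates is 0.4475, while the population-weighted estimate is 0.4265—the weighting is what corrects for the sample looking different from the population.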

OK, so here’s the deal. In revamping my Design and Analysis of Sample Surveys, I need to fix two things:

1. The course material. Less of the boring classical stuff that I used to force myself to teach and force the students to remember (for example, the expression for the standard error of the ratio estimate) and more of the good stuff. To get more specific, I need to write some R and Stan code to do MRP in some simple examples, I need to get the relevant census data together, etc. And of course I need to put this in the context of 14 weeks of class.

2. The classroom experience. Me standing up and talking in front of a class of 25 students? What a joke. Anything important I can say, I can write instead, and the students can read (remember, they’re Columbia grad students: if they can read AJPS papers, they can read whatever tutorial material I write). Classroom time is mostly wasted unless it involves active student learning. I know this in the context of my other course, now it’s time to walk the walk and do it for all my other classes. Starting with this one.

**What to do during those 28 sessions, each 75 minutes long?**

But . . . what should I actually do in class? I’m not sure. The first week of class I can lecture and have discussion, that’s no problem, the students need to get a sense of what’s coming and why it’s important. I guess I should prepare a few work-in-pairs problems, though. Then, after that first week, their homework assignments will start to come in, and we can spend time on that.

I’ll require that students bring their laptops to every class so that, whenever we want, we can break them out and start working. More efficient to get their R and Stan issues resolved in 15 minutes during class than during tearful overnight sessions at home.

I still think I need a specific plan, though.

It goes like this: Each week we have topics, readings, homeworks, and the skills and concepts I want the students to learn. This all drives the class period. I’ll prepare some slides to spark discussion.

No fear of dead time. That’s important. The students have tons that they have to figure out, that they ultimately have to work out for themselves. Two 75-minute periods a week are not a lot of time, it’s precious time for me to help them out.

So, I still need to make a plan for how to spend each class, starting in week 2.

In the meantime, here’s my current schedule of topics for the 14 weeks of class. Any comments are appreciated.

Introduction (week 1):

1a: Overview of the course

1b: Examples of surveys in the news

Statistics review (weeks 2–4):

2a: Basic statistics

2b: Statistical inference in the context of large variation

3a: Linear regression

3b: Logistic regression

4a: Statistical graphics

4b: Causal inference

Classical design and analysis of surveys (weeks 5–7):

5a: Survey interviewing

5b: Survey measurement

6a: Simple and stratified random sampling

6b: Weighting and poststratification

7a: Cluster sampling

7b: Analysis of data from cluster sampling

Social and political science (weeks 8–10):

8a: Surveys in the United States

8b: Surveys in other countries

9a: Voting and political participation

9b: Public opinion

10a: Network sampling

10b: Survey experiments

Advanced analysis of survey data (weeks 11–14):

11a: Bayesian regression

11b: Multilevel modeling

12a: Item-response and ideal-point modeling

12b: Multilevel regression and poststratification

13a: Constructing survey weights

13b: Missing-data imputation

14a: Open problems in analysis of survey data

14b: Summary of the course

Maybe we should do some role-playing activities? Maybe the students should design and conduct a survey together? I don’t know.

The post Planning my class for this semester: Thinking aloud about how to move toward active learning? appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post “Life Paths and Accomplishments of Mathematically Precocious Males and Females Four Decades Later” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Anyway, I was interested in this paper (by David Lubinski, Camilla Benbow, and Harrison Kell) because . . . I’m one of the kids in the study. I was 11 years old at the time.

What’s happened since then? According to the abstract of the paper:

Across the two cohorts, 4.1% had earned tenure at a major research university, 2.3% were top executives at “name brand” or Fortune 500 companies, and 2.4% were attorneys at major firms or organizations; participants had published 85 books and 7,572 refereed articles, secured 681 patents, and amassed $358 million in grants. . . .

Wow, we’ve really cost the taxpayer a lot of money!

Seriously, though, this surprised me:

That’s me in cohort 2! I think the categories are intended to be mutually exclusive, so I don’t know if they classified me under “Science, Technology, Engineering, & Mathematics” or “Education: Post-secondary Faculty.”

The thing that really stunned me, though, was that 15% of us were chief executives. Whoa! Although maybe that includes self-employed people who have their own business. It’s really not so clear. “Chief executive” sounds like a big deal to me, but maybe I’m missing something.

One thing that seems particularly time-bound is the high rate of people working in law, medicine, and finance. I don’t see a strong connection between those fields and mathematical precocity (ok, maybe in some aspects of finance), but they all pay well, and law and medicine were, at least for kids of my generation, natural fields to go into for kids who were good at taking tests.

Recall that the “mathematically precocious youths” were identified by scoring high on the SAT. So it could well be labeled a study of “youths who were talented at standardized tests.” (But it’s not quite as bad as it sounds. Back in the 1970s, we didn’t see standardized tests very often, so we were taking the SAT cold. It’s not like we were sitting there in elementary school taking practice tests every year.)

Given all these high-powered occupations, I was surprised that the median salary for the married men in my group was only $150,000. And was the top centile only $400,000? Hard to believe, with all those doctors and chief executives in the sample. I can’t quite figure this one out. I mean, sure, $400,000 is a lot of money, but “top centile”—that’s the 99th percentile of income, right? There aren’t 3 guys in this group of 249 married men who are making more than $400K? Interesting if true. I guess these aren’t the highest-paid sort of doctors, lawyers, finance professionals, and CEO’s.

My biggest disappointment with the paper is near the end, when they talk about responses to attitude questions. For example, what percentage of (adult) men and women in our group agree with the statement, “Society should invest in my ideas because they are more important than those of other people”? I’d like to know. But all we are told (it’s in figure 6) is that men are about 0.3 standard deviations more likely than women to answer yes to that question. Sex differences are interesting but I’d like to see the averages too. And they’d be so easy to display!

Where they *do* display means, in figure 7, the results (and the displays themselves) are pretty boring. The authors have a lot of good stuff here; I’d just like to see more. And of course it’s pretty exhausting to see everything written in psychology-journal style with those p-values all over the place.

**P.S.** I’m disappointed to see that they reported results to fractional percentage points. “23.6%,” indeed. Couldn’t they have run this by me for comments? This sort of hyper-precision is just embarrassing.

**P.P.S.** Data-detective Weakliem also is suspicious of the top percentile number. It seems like the authors did make a mistake somewhere. Weakliem writes, “they must have measured income by categories and then applied some rule to translate them into the numbers shown.”
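Weakliem's conjecture is easy to illustrate: if incomes are recorded in brackets and each bracket is translated to a single representative number, the top of the distribution gets capped. Here's a toy sketch (the brackets, assigned values, and incomes below are entirely hypothetical, not the ones used in the paper):

```python
# Sketch of how category-to-number translation caps the top percentile:
# everyone in the open-ended top bracket gets the same assigned value,
# so the reported "99th percentile" can never exceed that value, no
# matter how much the richest respondents actually earn.
incomes = [120_000, 180_000, 250_000, 500_000, 900_000, 2_000_000]

def bracket_value(income):
    """Map a true income to a hypothetical survey bracket's assigned value."""
    if income < 150_000:
        return 100_000
    if income < 300_000:
        return 200_000
    return 400_000  # open-ended top category: "$400,000 or more"

translated = [bracket_value(x) for x in incomes]
print(max(translated))  # 400000, even though two respondents earn far more
```

Under a rule like this, a reported top centile of $400,000 says nothing about how much the highest earners actually make; it only says they landed in the top bracket.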


The post What to do in 2015: Your statistics diary appeared first on Statistical Modeling, Causal Inference, and Social Science.

For the last two weeks of our class on statistical communication, I gave my students the following assignment:

Every day, you will write an entry in your statistics diary. Just set up a text or Word file and add to it each day. The diary entries can be anything. They can be short slice-of-life observations (“Today Jakey asked me what he would be the best at in ten years: piano, soccer, judo, chess, or fencing. This made me wonder: how predictable are such things? How would we gather data to measure this?”), quick research notes (“Someone emailed me a graph of birthday data from Brazil! I sent it to Aki along with a link, maybe he or a student can fit our model. Apparently Brazil has different holidays than we do in the U.S.”), or things you’re working on, difficult statistics problems that you might be stuck on, or have an insight about. You can write as little or as much as you want each day. The only requirement is that you write something new in it, every day. You’re not allowed to go back a week later and fill in 7 entries at once. That would be cheating. One entry each day. Just type it in to the file.

Here are some of their diaries, which I’m posting here for three reasons: First, I had a lot of fun reading these, and I think you will too; second, for you teachers out there, maybe you’ll now be inspired to try this “statistics diary” thing out in your own classes; third, I’m teaching statistical communication this spring, and now that I’ve posted this, I can point students to this link so they can get a sense of what their diaries can look like.

I was thinking of posting just my favorite entries, but I decided it would be best to present each student’s diary in its entirety, in order to give a clearer sense of what they were doing.

Student #1:

**November 25, 2014**

Why are paradoxes of the statistical sort so “paradoxical”? How much of it is linguistic? As in, how much of our confusion is a fault in the ambiguities contained in how the paradox is stated? I’m thinking in particular of, for instance, things like the paradox of the heap, or of Monty Hall, where it’s unclear what the host’s actual intention is. I think I’ll write a post about Monty Hall later, although not so much from a linguistic point of view.

**November 27, 2014**

Today I stumbled across this “journal” (http://www.jasnh.com/) which purports to champion the null hypothesis. Even if I like the spirit behind it, I’m not sure it’s a great idea. I just think you’d have to back up each study as being sufficiently plausible that the null is false; otherwise, it’d be too easy to publish whatever you want. Also, it doesn’t seem to be the most well-organized. I always think better when there are categories involved, and the website is essentially an uncategorized listing of papers.

**November 28, 2014**

John List gave a talk at the Society for Judgment and Decision Making this past week about “making life like a lab experiment”. I liked a lot of what he said, although a lot of it seemed like it pertained to the economics field more than anything else. The “field experiments” movement resonates with me on several levels (spiritually, ecumenically, grammatically…). On the other hand, I do wonder whether List’s position (that we should stop trying to analyze “mounds and mounds of big data” and instead conduct field experiments) might be almost too broad. It seems like field experiments are great in certain contexts where interventions are clearly feasible. But there are so many problems worth studying that are very difficult to study via the field-experimental approach, be it due to ethical barriers, legal constraints, historical issues, rare events, etc. It seems like he may be overlooking the usefulness of sophisticated causal inference techniques in these scenarios.

**November 29, 2014**

Another cool, if strange, effect of the “power” of uncertainty I was reading about today. It turns out that under certain scenarios, people might prefer uncertainty to certainty (Shen et al., working paper), which is interesting considering that we are generally pretty averse to things we can’t predict. What these researchers find is that people are willing to work harder and exert more effort for rewards which are probabilistic (e.g., “50% chance of winning $1, 50% chance of winning $2”) compared to rewards which are certain (e.g., “100% chance of winning $2”). Note that here, the probabilistic reward is dominated by the certain reward!

**December 1, 2014**

It seems weird to me that people aren’t as persuaded by “hard statistics” as they should be. It’s not that surprising, I suppose, given how much pull things like status quo bias have on people, even when statistics might seem to refute habit. The thought I had related to this concerned the proposed change in NBA basketballs from a while ago. In the end, even though the numbers were in favor of adopting the new standard, the proposal was rejected due to a few vocal nay-sayers who had already gotten used to the old basketballs. Seems like numbers have a hard time winning over plain inertia.

**December 2, 2014**

Since I started being friends with R Shiny, I’ve been thinking some about when dynamic graphics are better. I feel like there aren’t many systematic principles out there to use as rules of thumb. It’s tempting to almost overdo it, especially in my current state of mind, where I keep wanting to turn every graph I make into a Shiny app. But are there times where dynamic graphs are overkill? When do they help, and when do they just distract? I think it would be useful for us as a class, maybe via the blog, to draft a guideline addressing this issue.

**December 3, 2014**

Not my story, but a story I heard second-hand from a friend. This friend’s professor of political science was recounting a time when a student he thought highly of received a negative review from the department. As a result, the professor said to my friend that he revised his opinions of the department downward, his opinions of the candidate downward, AND his own opinion downward. Something about this isn’t right!

Student #2:

November 23

First diary entry – I suppose I’ll make a few comments about my final project and goals for expanding it. We’re building a Shiny app to plot data from a linear regression – the app will allow you to plot different explanatory variables against your dependent variable and to view the impact on regression output of incorporating different variables in the model. We’re planning to present the interactive app in class, but ultimately we’d like to turn it into an R package so that people can run it on their own – I think this will give them a deeper understanding of how Shiny works.

November 24

My local post office is terrible – they routinely fail to deliver packages but pretend that they tried to, marking things “Business closed – notice left” when no one put any notification on our door. The USPS has definitely gone through tough times in recent years, but I wonder how my post office compares to others. You could use a performance measure such as the average length of time between arrival of a package to the facility and actual delivery.

November 25

A lot of people have been sharing a 538 article relevant to the Darren Wilson lack of indictment without seeming to have read it. The headline (“It’s Incredibly Rare For A Grand Jury To Do What Ferguson’s Just Did”) is grabby but misleading, given the article content. The article goes on to explain that while prosecutors convince grand juries to indict in nearly all cases, this is not true when the prospective defendant is a police officer. So it is incredibly rare to fail to get an indictment, but it’s pretty ordinary when the case is related to a police shooting.

November 26

Our Shiny app for the final project is going well – today I linked the user’s CSV input to the rest of the functionality, so it’s fully reactive to the dataset input. The next step is for Tara to add in her leverage function and added variable plots.

November 27

Happy Thanksgiving! How consistent are sales of traditional Thanksgiving food items from year to year, controlling for population alone (such as number of turkeys or pounds of cranberries sold per capita)? Are there any overarching trends over time? I’d guess we eat more of everything…

November 28

A Shiny app from Hadley Wickham that allows you to scale an eggnog recipe! This inspired me to ponder the many possible non-statistical uses of Shiny…you can imagine a whole cookbook published this way. The distribution platforms have room for improvement, though – it would be nice to be able to browse shinyapps.io rather than navigating directly to the app’s URL.

November 29

With all of the holiday travel going on, I’ve been wondering about some of the paradoxical human perceptions of statistical risk. Why are people more afraid of flying than driving, when fatal traffic accidents are so much more common than plane crashes? Some argue that it’s related to the lack of control, since you aren’t the one flying the plane – but you’re also often a passenger in a car, you never drive a bus or train, etc. It’s curious how humans tend to fixate on extreme scenarios in interpreting risk.

November 30

With all of the Black Friday sales going on, I’ve been wondering about how companies use all of the additional information available to them via online sales. Prices on Amazon change with significant frequency, and the retailer can see the goods we place in our “shopping carts” – do they change the price of the good based on the indicated interest? This seems like it could significantly alter the negotiating power in the existing relationship between retailer and consumer.

December 1

On R-Bloggers today, there was an article listing gift ideas for statisticians. I quite like the ‘distribution plushies’ – stuffed toys in the shapes of statistical distributions.

December 2

Inspired by this article from 538, I was thinking about how people lie about their height. There’s a plot in the article that indicates the distribution of men’s height on OKCupid is centered roughly 2 inches above the actual distribution of men’s height in the US. But is everyone reporting 2 inches higher? Or is there a tendency to report more added height when you’re shorter (if I’m 6’0″, I’ll say I’m 6’1″, but if I’m 5’7″, I’ll say I’m 5’10″)? And do people prefer to report round numbers, like 6’0″?

December 3

I wondered today about medical statistics with respect to disease prognoses. I’m sure there are conventions in reporting terminology, but for the uninformed layperson, the phrasing makes it hard to tell quite what things mean. For example, for a disease with a 5-year survival rate of 70% and a relapse rate of 35%, does the survival rate incorporate the probability of relapse?

December 4

Apparently 2014 is on track to be the hottest year on record, with global average air temperature 0.57 C above average. Which raises the question – what is the best way to make people care about something that isn’t individually observable? This past winter was very cold and snowy in the US, and my grandmother refuses to believe global warming exists.

Student #3:

Mon 24 Nov

I’m wondering about the accuracy of weather forecasts. When snowfall is predicted and an estimated number of inches is proposed, how many times do the actual measurements match up? Has anyone reported on weather forecast accuracy recently with actual data?

Tues 25 Nov

Apparently I just crossed paths with the Ferguson protests in midtown this evening. I wonder what the ratio of police to protesters is here in the city? When the news coverage recalls how many people were at such events, how do they arrive at their estimate? Is it the “density” of people in one block multiplied by the number of blocks occupied? I guess I wonder how these things are estimated for a dynamic group (people marching through the various avenues/streets in Manhattan) vs a static group (assembled in a park for example). The latter seems easier…

Wed 26 Nov

I bought an actual paper version of the New York Times for my train ride today and saw an infographic on the “most googled recipes in each state”. What is frog eye salad and why do people in Colorado seek this out so much? I didn’t save the print version, but found the online version here: http://www.nytimes.com/interactive/2014/11/25/upshot/thanksgiving-recipes-googled-in-every-state.html?_r=0&abt=0002&abg=0

It’s interesting because the online version seemed to have missed a good opportunity to make the map an interactive one (click on a state and that takes you to the more detailed state section, vs the pulldown menu option). I almost like the print version better for the clarity and ease of seeing all the states with their top 3 recipes in an array.

Thurs 27 Nov

I wish there was a better way to peruse PubMed. I don’t love the way the search and archive system is designed. It seems like more filters or ways to organize a list of results could improve the system. For example if one searches “microfluidic whole blood assays for point of care settings”, they should be able to re-organize the results by “most cited”, “newest to oldest” etc. It just seems like the sheer volume of articles contained in the system is regurgitated rather than refined with these exploratory lit searches. I wish the database could be reorganized or re-tagged for better searches!

Fri 28 Nov

It seems like there are more and more reports of how Black Friday “sales” are not actually much of a bargain: http://money.cnn.com/2014/11/25/news/economy/black-friday-deals/

I wonder how this year will be analyzed by the news outlets.

Sat 29 Nov

More Thanksgiving analysis. A friend sent me this link: http://jezebel.com/this-is-what-america-was-most-thankful-for-this-year-s-1664583267

It’s a visualization of some Facebook “data” of what people were most thankful for in each state. I think it’s relative to people in other states, so I doubt Californians’ first thought is Netflix, but perhaps that shows up more in their posts than people from Ohio. It’s probably also skewed by the kind of people who post a status talking about something like this.

The original Facebook data analysis post is here: https://m.facebook.com/notes/facebook-data-science/what-are-we-most-thankful-for/10152679841318859/

Sun 30 Nov

I emailed our research collaborator after seeing some discrepancies in our data analyses. I explained our methodology, which I assumed was a pretty standard one in the diagnostics space. Apparently, the collaborator disagrees and finds this “too stringent” of an analysis. I think this requires more followup.

Mon 1 Dec

We had a final project deadline at 5pm today. I think it would be really interesting (and entertaining for the instructors) to plot the distribution of submission times for this. Maybe even compare this to all the previous homework assignment submissions.

Tues 2 Dec

We had a call with the research collaborator this afternoon about data we had collected last summer and her methodology in analyzing it compared to ours. To summarize the gist of our “issue”, we were discussing diagnostic performance specs, such as sensitivity, specificity, and signal-to-noise cutoffs. I think the root cause of the confusion was that the “gold standard” reference test data she had provided (to which we had to compare our prototype test) was not actually the right one, and therefore we were getting mismatched counts of “false positives” or “false negatives” (leading to discrepancies in the overall diagnostic performance specs that were calculated).

Wed 3 Dec

http://www.nytimes.com/2014/12/04/technology/personaltech/smart-nurseries-track-a-babys-sleep-or-lack-thereof.html?hp&action=click&pgtype=Homepage&module=mini-moth&region=top-stories-below&WT.nav=top-stories-below

“it’s hard to figure out what parents can do with this data or how it could actually help babies sleep better.”

Seems like another “Big Data” type issue: finding ways to deconstruct data or provide actionable steps?

Student #4:

Nov. 25

Andrew is sending out free books to students again. Consider the number of free books a student got from the class: should that be underdispersed (because when one got a book, he will tend to give the chance to others) or overdispersed (because there are guys who are really active and are more likely to get the books)?

Nov. 26

Attended a talk about contextual bandit problems. Striking a balance between exploration and exploitation really is one of the main themes of human life.

Nov. 27

Had a great Thanksgiving meal at Wei’s home. The time series for consumption of turkey must be really spiky around Thanksgiving. How do turkey sellers manage to accommodate the bursty demand during Thanksgiving?

Nov. 28

Went out shopping in department stores in Midtown. Analyzing the Thanksgiving discounts and maybe providing an optimal strategy for boosting sales seems like a cool stat project.

Nov. 31

Spent too much time playing Candy Crush, which is really full of uncertainty. Not sure if the developers took statistics into account. How do they control the difficulty of the game?

Dec. 1

Spent most of my time coding a dynamical system model for neuron spike train data (a fancy way of doing PCA on count data). Just amazed by how people developed all these delicate and complicated models to summarize and understand the data.

Dec. 2

Still in the middle of coding and felt that it can be true that “50 percent of the results from statistics papers are just coding errors”. I mean, when things get complicated, coding and debugging is really, really hard.

Dec. 3

Heard a talk about a variant of PCA that involves both the covariates and the latent variables.

Student #5:

Monday, November 24

It’s my dad’s birthday, hooray! He actually got the birthday present I sent on time, but my mum, whose birthday was 2 weeks ago, did not. Instead, she got a letter from customs, asking her to come in person and present all the original receipts for the items in the package that I sent. I am pretty sure this is the result of some random sampling they do, because pretty clearly, if you look at the stuff in the package, you can see that the value is less than the legal limit. I still wonder what the procedural rules are: how many packages are selected at random? And then if they open it and see it’s of no commercial value, at what point do they just send it on? Also curious what kind of data is collected, but right now just a little angry my mum’s present is 2 weeks late.

Tuesday, November 25

I was going to run a pre-test of a survey on MTurk. The big concern with MTurk is always that it’s not representative of your target population – too many men, too young, too educated to represent an average citizen. But since the pre-test was supposed to be just about making sure the questions were understood, produced variance, and elicited somewhat sensible answers, this approach seemed alright. Then it turned out that MTurk just pretty much did not have any workers at all in our target population – so we needed to come up with Plan B really quickly and ended up sending the survey out to friends and family…. Which is an even much less representative sample than MTurk – looks like almost everyone is in their late 20s or early 30s and highly educated. At what point does it become just too unrepresentative?

Wednesday, November 26

Big travel day before Thanksgiving. The weather report anticipated a snow storm and, according to what I am hearing from friends, there is a lot of anxiety at the airport. But what has come down by the afternoon was mostly rain, no big deal. I wonder if the weather people weight downside and upside risk differently – e.g., if there is a 40% chance of a snowstorm, would you rather tell people there will be a snowstorm and be wrong, then at least everyone was prepared, than tell them the likelihood is greater that everything will be just fine, even though there is a substantial chance of snow? Is the reasoning the same for different kinds of weather events? Snow, wind, rain?

Thursday, November 27

Took a long run in Central Park in anticipation of massive amounts of food later. There were a few people walking their dogs, but overall, it was pretty empty but I noticed a lot of cops in the Park. Wondering how they get assigned – is there extra pay for holidays, so they all volunteer? Is it the same number as usual, just that they are more noticeable because there are fewer people and it seems more disproportionate? Are there extra cops because a relatively empty park might attract crime? I guess I am really interested in forecasting in general.

Friday, November 28

Another day, another Central Park run. Even more cops today! Maybe they’re planning their Christmas party?

Saturday, November 29

I just discovered this stats blog: http://iquantny.tumblr.com/ – totally awesome! The author takes public New York City data and turns it into really fun analyses!

Sunday, November 30

So, I am working on a paper that deals with the spatial links of racial attitudes and attitudes towards welfare, and I am struggling to find good data – it seems that most surveys collect spatial information on the state level, at best, when really I would be interested in county-level, zip-code, or even census-tract-level data. I wonder why – I also recently discovered by accident that when you do a survey using Qualtrics, it will automatically record latitude and longitude of the respondents (based on their IP address, I presume), so that even gives you point data – awesome! Why isn’t this data (or some aggregated level if there are privacy concerns) published for all online surveys? And in-person surveys? Good spatial data should also be really easy to collect, then; it’s hardest for phone interviews.

Monday, December 1

Final prep meeting before we’re presenting our project “Plotting for dummies” tomorrow. I really learned a lot about Shiny and I think also coding in general, great experience.

Monday, December 2

Still struggling to find appropriate data for my spatial project. I spent countless hours trying to find out what an obscure variable called “geolink” was, according to the codebook a variable to “link the data to census data”. Turns out it’s just made-up numbers because the PIs wanted to restrict access to the spatial link…. Gaaaaa!

Tuesday, December 3

Asked the ICPSR what it would take to get access to the data…. They are telling me that they don’t have the data either. I reached out to the Principal Investigators, let’s see.

Wednesday, November 4

Heard back from the Principal Investigator – they write “As I am sure you realize, this data was collected 20 years ago” – and continue to say that they don’t have the data either. Unfortunately, in terms of spatial data and including the kind of questions I am interested in, this survey is still the best there is… so I am not sure what to do. I might try and find papers that use the data and see if the authors kept the data.

Student #6:

Nov 24

This afternoon, a friend who works in pharmacy asked me about negative binomial regression. He told me why bio-researchers often ignore the statistical assumptions in those statistical methods. It confused me too. I thought we should be more concerned with those assumptions, or the results are not reliable.

Nov 25

My friend Rijia and I presented in Dr. Gelman’s class. I was wondering if I should use a statistical way to describe how nervous I was. I might count my heartbeats per minute as time passed, or during each slide I presented, or at the moment a classmate made a comment, or according to the kind of face that Professor Gelman showed.

Nov 26

It snowed today. Honestly, Americans like discussing the weather.

Because of the statistical diary, I asked myself how fast and how heavily it would have to snow for the road to have snow cover. The factors should include the humidity at the time and the snowfall rate per hour… etc.

Indeed, I never thought about those questions before. Thanks for the statistical diary assignment. Should I google the answer and write it in the statistical diary? lol

Nov 27

Today is Thanksgiving. I visited my friend Jeff’s apartment, having a Thanksgiving dinner with him and his friends. Joyce is a friend of Jeff’s whom I had never met before. She is a business data manager at MTV. She described her job. She told us that she has to design many models for predicting internet users’ video and movie tastes. In her department, she and her colleagues capture web-browsing information from each internet user. After collecting a bunch of data, Joyce and her colleagues can figure out what kind of movie or video a user may prefer, and the company sends you the corresponding advertisements.

I thought the statistical methods she used are applied in many fields such as shopping websites, election campaigns… and so on. It is really interesting.

After Joyce explained what she did, everyone called her ‘evil’, because they couldn’t believe there is no privacy in the internet world. If you want to keep your privacy, clear your web caches and maintain your computer routinely.

Nov 28

It is Black Friday. I spent the whole morning looking for a good backpack. I love the Amazon system. When I looked at the products I was considering buying, the Amazon system always showed me other related products simultaneously. This kind of algorithm saves me a lot of time. I observed the criteria the system uses, including popularity (number of consumer reviews), the product’s release date, the best discount, and the best sellers.

In statistical thinking, I thought sellers must be very careful when they categorize each product, because the keywords are very important. When people type in keywords, the precision of the shopping filter determines whether the product becomes popular and whether the information reaches potential customers.

Nov 29

There was a midterm election for local governors today in Taiwan. As a Taiwanese, I woke up at 5:00 am to follow the election results. It was really exciting because the candidate for Taipei City whom I support won the campaign. The KMT party, which is like the American Republican party, lost many counties that it had governed before.

Many political scientists were surprised that the KMT suffered such a serious loss.

They all thought the main reason was that the pre-election polls differed from the election results. Many statisticians and political scientists discussed possible factors, including undercoverage of young people who study away from their hometowns, and landline phone interviews that miss respondents who only use cell phones. Those discussions reminded me of the sample survey course I took from Dr. Michael Sobel in the fall 2012 semester.

As technology progresses, if scientists want to capture precise information, they should take account of how people use technology and think of new approaches to designing sample surveys.

Nov 30

I have an entrepreneurial idea. I want to be a data storyteller, sharing more Asian social issues, with data analysis, with the world. I wonder whether it would work, so I shared the idea with my friends, a sociology PhD student at NYU and a politician in Taiwan. I told them that there are many data journalism teams in the United States, such as FiveThirtyEight, the NYTimes, and the Washington Post. Data journalists analyze piles of data and write many meaningful articles. I wonder if this would work in Asia. We discussed a lot of difficulties we might run into later. For example, there are fewer datasets in Asia. In addition, if we wrote anything, should we write in Chinese, English, or both? (I know my English writing is bad…) Anyway, no matter how difficult it is, we concluded that we should start as soon as possible.

Dec 1

It was a boring day. I am struggling with an assignment for Statistical Machine Learning. I deeply believe that statistical science is really broad and deep. For an amateur in quantitative research, the learning process is like a boat sailing against the current: we must forge ahead or be swept downstream.

Dec 2

I wonder what kind of student is eager to write a statistical diary every day. For me, the main reasons are statistical thinking and English writing.

Since beginning the statistical diary, I have written it every morning. I recall the interesting events of the previous day, which stimulates my statistical ideas.

In addition, as a non-native English speaker, I love using this chance to improve my English writing… oh… I need to expand the vocabulary in my diary.

Dec 3

The more statistical methods I learn, the more interesting research questions I can ask. I am not sure whether this claim is always true.

Student #7:

11/24 Someone should make a dictionary that shows how the meanings of statistical terms vary between the various disciplines that use statistics. For example, I’m sick of trying to figure out what authors mean by fixed effects and random effects. These terms get tossed around casually as if everyone with statistical knowledge knows what they mean, except they seem to mean many different things to different people. Very frustrating.

11/25 How many undiscovered computer coding errors/typos are there that, if fixed, would change the results in published work? Probably a lot. There are a few notable examples (e.g. Harvard economists and their Excel documents), but my hunch is that there must be many many many undiscovered cases (mostly innocuous but some are probably consequential). Is there any way to estimate this?

11/26 I saw presentations by two different Columbia graduate students today and in both cases the student made the same really basic statistical mistake: to choose the independent variables to include in a regression model they first did a bunch of bivariate regressions and then used all of the “significant” predictors in the bivariate regressions as the predictors in the multiple regression. No no no no no!!! Are students being taught to do this? Are they taught not to do it but then forget or get lazy? Are they taught not to do this but then aren’t shown any alternatives?

11/27 I’ve had a number of conversations recently with people who are convinced that random means that all possibilities are equally likely, even when confronted with simple examples to the contrary. For example, I tried using the example of weighted dice and got a response like “oh, well if they’re not equally likely then it’s not really random.” I wonder how common this perception actually is and why it exists at all. Maybe the most common colloquial use of “random” is as a substitute for “uniformly at random”, but then what’s the colloquial term for all other kinds of randomness?
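The weighted-dice example is easy to demonstrate by simulation; a minimal sketch (the weights are made up) in which outcomes are genuinely random but far from equally likely:

```python
import random
from collections import Counter

random.seed(0)

# A weighted die: face 6 is three times as likely as each other face.
faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 1, 3]

rolls = random.choices(faces, weights=weights, k=10_000)
freq = Counter(rolls)

# The outcome is still random (unpredictable roll to roll), even though
# the probabilities are non-uniform: 6 comes up about 3/8 of the time,
# each other face about 1/8.
for face in faces:
    print(face, freq[face] / len(rolls))
```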

11/28 I sent an email to the author of a political science paper asking if he had data and code that I could look at. He responded promptly and sent me what he called his “replication files.” But then I discovered that he only had processed data (after recoding, cleaning, etc.) and the code he sent was just for running the analysis on this post-processed data. I asked if he had the raw, dirty data and the code he used to prepare it for the analyses and he said he didn’t think so but would get back to me. I haven’t heard anything. I’m wondering if this is typical or atypical? I don’t have enough experience to know how prevalent this is. If “replication” is just running code on clean data then we’ve got problems (see my entry for 11/25).

11/29 I wonder if the ‘gifts’ that public radio stations offer to listeners who make donations (e.g. tote bags, t-shirts, DVDs of Ken Burns documentaries, etc.) actually lead to an increase in the number of contributors, the amount donated per contributor, etc.?

11/30 One of my pet peeves is when people say that something was caused by chance. Usually when we say something was caused by chance we’re really just expressing our ignorance about why that thing happened. But over Thanksgiving I caught myself saying it, which was rather disappointing. It’s just such a natural thing to say, despite not making much sense. Am I making something out of nothing by being annoyed by this common saying? I found a blog post on this topic – written by a statistician – in which the author recommended replacing “caused by chance” with “I don’t know what the cause is, but our uncertainty in X is . . . ”, but that just seems like a ridiculous thing to say in informal contexts.

12/1 In response to the recent events in the news, a friend of mine asked me if there’s anything that could be said statistically about what an “acceptable” number of unarmed black teenagers getting shot by police officers should be? Ideally zero (obviously) but accounting for human error, the complexity of the circumstances, etc, it gets pretty messy. So, is there a way to roughly determine what an abnormally high number of these cases would be so we can then compare what we observe to some idea of what we should reasonably expect?

12/2 What are good choices of prior distributions for time-varying autoregressive parameters? For example in

y_t = ρ_t y_{t−1} + other stuff

what are some reasonable priors for the ρ_t’s? There’s a ton written about cases where there is just a single ρ, but what about when we want to let ρ vary with time?

12/3 Introductory probability textbooks often (always?) use the number of typos on a printed page as an example when discussing the Poisson distribution. Has anyone done a large-scale empirical study to actually see if this is indeed a good approximation (and if it’s better with certain types of texts)? In other words, are the assumptions we have to make so that the typos follow a Poisson distribution actually reasonable? They’re obviously wrong, but how wrong? For example, we need to assume that the typos are independent, but this is violated all the time (e.g. if instead of “weird” I type “wierd” then the “i” and “e” are both wrong but the two errors are not independent. Or does this count as just one typo?). I would assume someone has looked into this, but a Google search is pretty useless (you just get page after page of toy practice problems). Also, how does the distribution change when automatic spell-checkers are used (which catch some types of errors but not others)?
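For the 12/2 question about time-varying AR parameters, one option sometimes suggested (an assumption here, not a recommendation from the entry) is a random-walk prior on the ρ_t’s, with small innovations and a stability constraint. A minimal simulation sketch of that setup:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500

# Random-walk prior on the AR coefficient: rho_t = rho_{t-1} + eta_t,
# with small innovations, clipped to keep the process stable.
rho = np.empty(T)
rho[0] = 0.8
for t in range(1, T):
    rho[t] = np.clip(rho[t - 1] + rng.normal(0, 0.02), -0.99, 0.99)

# Simulate y_t = rho_t * y_{t-1} + epsilon_t under that path of rho.
y = np.empty(T)
y[0] = 0.0
for t in range(1, T):
    y[t] = rho[t] * y[t - 1] + rng.normal(0, 1.0)

print(rho.min(), rho.max())  # rho drifts over time instead of staying fixed
```

The innovation scale (0.02 here) plays the role of a hyperparameter controlling how fast ρ is allowed to drift.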

Student #8:

Mon 24 Nov

I was working on my project on electricity sector reform. Specifically, I want to test the hypothesis: if a country introduces an independent regulatory agency, the country is more likely to adopt unbundling rather than privatization, due to its relatively lower political cost (which is the opposite of what the ‘textbook model’ of reform expects). I have binary indicators for regulatory agency, unbundling, and privatization, respectively. What would be the best model to test this? Would a probit be sufficient, using the t−1 value of the regulatory agency indicator, which is my main IV?

Tues 25 Nov

As I got comments saying that endogeneity is highly suspected in my model, I’m considering 2SLS. But a bigger concern remains: can I use 2SLS when the instrumental variable as well as the dependent variable is binary? Isn’t there any way to deal with the situation where the independent variable, the instrument, and the DV are all binary and the endogeneity problem has to be addressed? I found that some scholars suggest a bivariate probit, but I’m not sure if that is the best one.

Wed 26 Nov

To celebrate Thanksgiving, I came to my aunt’s house in NJ. I don’t even remember how long I had been stuck in the Columbia neighborhood. Anyway, it was a great feeling to enjoy the atmosphere outside the city. But one question came up: compared to toll fees in South Korea (even accounting for the difference in living costs), the toll that drivers have to pay to cross the George Washington Bridge is extremely expensive. What makes it that expensive? How much money do they collect per hour? And how much do they spend on maintenance of the bridge?

Thur 27 Nov

I will spend the whole Thanksgiving holiday at my aunt’s house. My aunt asked me to help her son, who goes to high school, with his studies. I found that he has basically been struggling with the courses requiring ‘quantitative’ skills, such as mathematics and statistics. In particular, he told me that he had no idea about probability. At his vigorous request, I tried to help him grasp the basic concepts of probability. What I recognized is that he doesn’t understand counting rules such as ‘combination’ and ‘permutation.’ What would be the best way to convey the difference between combination and permutation, in both usage and logic? I often use the term ‘distinctness’ to differentiate them; for example, whether you want to choose 2 people to clean up the classroom, or to choose a president and a vice president, which are distinct positions. Isn’t there any other example that would let a student see the difference?

Friday 28 Nov
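For the 27 Nov question about combinations vs. permutations, Python’s standard library makes the cleaning-crew vs. president/vice-president contrast concrete (the class size of 5 is made up):

```python
from math import comb, perm

# Choosing 2 of 5 students to clean the classroom: order doesn't matter.
cleaners = comb(5, 2)   # C(5, 2)

# Choosing a president and a vice president from 5 students:
# the positions are distinct, so order matters.
officers = perm(5, 2)   # P(5, 2)

# Each unordered pair corresponds to 2! = 2 ordered assignments.
print(cleaners, officers, officers // cleaners)  # 10 20 2
```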

Finally, Black Friday! My relatives and I went to the Garden State Mall, which I heard is one of the biggest malls in NJ. As expected, we had to spend half an hour finding an available parking spot. The situation inside the mall was like hell: it was so crowded that we had to stand in line for another half hour just to get into a shop. I was just curious: what does the graph of daily sales in November look like? I strongly suspect the trend stays low before Black Friday and jumps up on Black Friday itself. Then, comparing with the trends in December and October, is the November trend significantly different, or are average November sales also quite different from those in October or December?

Saturday 29 Nov

My second private lecture on high-school-level statistics for my cousin. Today I taught the logic of ordinary least squares regression. I started by drawing a scatterplot of height and weight, then explained why we want to draw a best-fit line. It went very smoothly until I wrote an equation… To help him understand the basic mathematics behind it (obviously I didn’t even attempt to teach how to calculate the ‘beta’ by minimizing the squared errors and so on), I reminded him of the simple linear equation and explained what a and b imply, respectively. That was OK. But when I put y-hat instead of y, things changed. He was confused about why we have to use y-hat and so on. I patiently explained that the ‘hat’ here represents a ‘predicted value,’ and explained the error term and so on. Then he asked me, “If there are errors, why do we need to calculate this line and want to use it?” I was just lost… I didn’t know where to start.

Sunday 30 Nov

Now I am coming back home. I got very good news from my family in South Korea: my older brother is about to get married! I could not help calling my brother. He gave me a very detailed description of how they met and what kind of person his fiancée is. But then he suddenly asked me about my own plans for marriage. He added, “You know, there are only a few Korean girls who want to live in the U.S. while abandoning their jobs in South Korea and losing close, frequent contact with their families and friends.” I just wonder if his statement is statistically right, especially considering the growing number of Korean girls who dream of ‘studying’ in the U.S. these days.

Monday 1 Dec

Today, I planned to come up with an identification strategy to test the hypothesis: whether signing the TRIPS (Trade-Related Aspects of Intellectual Property Rights) agreement has a positive impact on innovation. Since I’m considering the number of patents as a proxy for innovation, following the previous literature on innovation, I studied models designed to deal with a count variable as the dependent variable. I found that there are various methods: Poisson, negative binomial, zero-inflated models, and so on. As a first pass, I tried to estimate a negative binomial model, but Stata iterated and iterated and finally gave me an error message. What caused this failure to converge? I think I have to delve into the data structure and re-examine the assumptions required for the negative binomial model.

Tuesday 2 Dec
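For the 1 Dec count-model question, one quick diagnostic before choosing between Poisson and negative binomial is a mean/variance check, since Poisson forces variance = mean. A sketch with simulated overdispersed counts (not the actual patent data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated overdispersed counts: negative binomial with mean 5 and
# variance 17.5 (n = 2, p = 2/7), standing in for a patent-count variable.
counts = rng.negative_binomial(n=2, p=2 / 7, size=10_000)

mean, var = counts.mean(), counts.var()
print(mean, var)  # variance well above the mean -> Poisson assumption fails
```

When the sample variance greatly exceeds the mean like this, a Poisson model is misspecified and a negative binomial (or similar) is the usual next step.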

We presented our project “Plotting Data for Dummies” (https://joonyang.shinyapps.io/Final/) for the statistical communication class. The presentation went well, I guess. Very critical questions were asked from the floor, but I believe many students understood why we pursued this project: to let dummies, who do not know any statistical package, plot graphs easily. I will personally keep working on this project to reflect the comments and improve it further. As a complete novice to R and Shiny, I had some difficulties pursuing a project that requires background in those statistical tools. Fortunately, with my teammates’ sincere help, I could learn a lot and eventually contribute to the final project outcome. At the same time, this experience made me realize how useful technical skills are, and further how efficient this kind of knowledge would make my research. Now I’m motivated to delve into R and other useful R packages this winter.

Wednesday 3 Dec

I returned the results of the second midterm exam to the students in ‘International Politics,’ the class I’m TAing. The professor asked me to ‘curve’ the grades, making the mean equal to 86, which is a low B+ in our grading scheme, so I adjusted the scores little by little to meet the request. My basic strategy was to bump every grade up by one point, but it turned out that this strategy would violate another rule: 30% A / 50% B / 20% C. What would be the best, or least discriminatory, way to do this?
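One possible reconciliation of the two rules, sketched with made-up scores (this is an illustration, not the course’s actual policy): shift every score by the same amount to hit the target mean, then assign letters by rank so the 30/50/20 quota holds.

```python
# Hypothetical raw exam scores for 10 students.
raw = [95, 91, 88, 84, 83, 80, 79, 76, 72, 62]

# Step 1: uniform shift so the mean equals the target of 86.
target_mean = 86
shift = target_mean - sum(raw) / len(raw)
curved = [s + shift for s in raw]

# Step 2: assign letters by rank: top 30% A, next 50% B, bottom 20% C.
n = len(curved)
order = sorted(range(n), key=lambda i: curved[i], reverse=True)
letters = [None] * n
for rank, i in enumerate(order):
    if rank < round(0.3 * n):
        letters[i] = "A"
    elif rank < round(0.8 * n):
        letters[i] = "B"
    else:
        letters[i] = "C"

print(sum(curved) / n, letters)
```

A uniform shift preserves the rank order (so it is arguably the least discriminatory adjustment), while the rank-based letter assignment satisfies the quota regardless of where the shifted scores land.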

Student #9:

Mon 24 Nov

How many laps do F1 tyres last?

What was the temperature variation on the ground at the Yas Marina Circuit during the Abu Dhabi Grand Prix, and how did it affect tyre performance? What is the variation in these numbers?

What percentage of people with disabilities think they’re entitled to act like jerks and not apologise to people whose feet they have stepped on?

Tue 25 Nov

On average, how many drops of water continue to be dispensed from a faucet after it is switched off?

Are the protesters at the Ferguson riots really protesting for the cause? Or are they just fuelled by an inner need to be destructive? What percentage of the people are protesting for a constructive cause, and what percentage for a destructive one?

Wed 26 Nov

How long do I spend waiting for an elevator at different times of a day? What is the distribution of waiting time?

How accurately can one predict climate change, and the associated changes in sea levels? Can climate-change predictions accurately predict changes in arctic animal populations (polar bears, seals, penguins)?

Thu 27 Nov

How many turkeys were killed in the name of Thanksgiving? What percentage of the total number of turkeys killed was in excess of what was needed?

Ideas are just ideas. What’s with the need to patent an idea to claim ownership over it? What percentage of the patents filed actually provide an exclusive right to commercialise, and what percentage are just filed without a useful purpose?

Fri 28 Nov

What percentage of high school friends does one keep in touch with in grad school?

What percentage of college friends does one keep in touch with in grad school?

Sat 29 Nov

What percentage of engineering/science/mathematics students are introverts?

What percentage of business school/MBA students are extroverts?

Is there a general correlation between academic disciplines and introversion/extroversion?

Sun 30 Nov

How many fire alarms go off between 2 and 4 am in a month (and in the process wake up everyone who has to bear the sound of the firetrucks blaring through the city)? Are they more frequent on weekday or weekend nights?

Mon 1 Dec

How often does one look at one’s cell phone in a day? Is there a correlation with different activities, e.g. does the frequency increase in the classroom, or when one is dining alone?

Tue 2 Dec

What’s the chance of a subway train not being on time?

How many delays occur in a day? How many minutes does an average employed person with working hours from 9am-5pm waste due to commute delays?

Life is fragile; every day people die unexpectedly. But suicide is not easy either; people have tried to kill themselves by insulin overdose, by hanging, or by wrist slashing, and attempts are not always successful. The more timid attempts seem to have a higher failure rate than the bolder ones. What’s the success rate of suicide attempts, as a function of the boldness of the method? How many of those who failed ended up disabled and unable to attempt again?

Wed 3 Dec

What’s the chance of a non-smoker developing lung cancer in New York City vs. in a less crowded city with cleaner air?

How many stars can one observe on a clear night in Manhattan in different months of the year?

Student #10:

11/25/2014

I was waiting for somebody’s call at a planned time today. I think the actual time he called is distributed like a normal distribution with mean equal to the planned time.

11/26/2014

When I checked the weather on my phone today, it said the current chance of snow is 73%. I guess this probability means nothing, since it is snowing right now.

11/27/2014

Happy Thanksgiving! I’m wondering what the distribution of Thanksgiving dates is.

11/28/2014

I see fewer people on the street today. I guess this is not the situation in commercial areas. The distribution of people changed because of Black Friday.

11/29/2014

I read a set of interesting graphs today comparing living expenses in China and the U.S. First they compare expenses for food, clothes, apartment rent, and travel, and in general you spend less in China. But salaries in the U.S. are much higher. So they then compare how much food and clothing you can buy with your salary, and how many months of rent it covers. By that measure, life in China is much more expensive.

So it’s true that statistics and statistical graphs can sometimes really be misleading.

11/30/2014

When picking a seat at the theater today, I suddenly realized the decision depends on two variables: personal preference and the choices of other people, so the decisions aren’t independent across people.

12/1/2014

I can’t think of any interesting statistical story today. What are the odds of that?

12/2/2014

A light bulb burned out today. It reminds me of the classic example of the exponential distribution: the life of a light bulb follows an exponential distribution.

12/3/2014
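For the 12/2 light-bulb entry: the memorylessness property that makes the exponential the classic lifetime example can be checked by simulation (the mean lifetime of 1000 hours is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Bulb lifetimes: exponential with mean 1000 hours.
life = rng.exponential(scale=1000, size=200_000)

# Memorylessness: among bulbs that survive 500 hours, the *remaining*
# life has the same distribution as a fresh bulb's full life.
survivors = life[life > 500] - 500
print(life.mean(), survivors.mean())  # both close to 1000
```

This "no aging" property is exactly why the exponential is a questionable model for real bulbs, which do wear out.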

When someone asked me about the different directions of statistics, I suddenly realized there are a lot of areas I don’t even know, like spatial statistics and other parts of statistical science. I need to learn more sometime.

Student #11:

Sunday, November 23, 2014

I was thinking about our project, “Opening Week Box Office Performance Prediction.” The problem is that there are too many categorical variables and too few numeric ones. How about adding each leading actor’s and actress’s average previous opening-week box office income? I brought it up, but no one responded. Sigh. For a movie I haven’t watched, it’s hard to know exactly who the leading actor or actress is. And to compute that average, we need to find out how many movies he or she was in and their incomes, and also whether he or she had a leading role in those previous movies. Real projects and real life are really complicated.

Monday, November 24, 2014

I went to work today. In the crowded subway, I saw people rushing from NYC to NJ and people from NJ rushing to NYC. What’s the average commute time for people who live in NYC? For me, it’s between 4 and 5 hours per day on Mondays, Wednesdays, and Fridays, when I go to work. I wonder what life is like for others.

Tuesday, November 25, 2014

Sometimes I want to call friends back in China, but it’s midnight or some other bad time for calling there, so I don’t call. There are always time lags between different parts of the earth. Would there be significantly more calls and communication worldwide if there were no time difference at all?

Wednesday, November 26, 2014

It rained and snowed today. The library has fewer people, and some people will work from home. What percentage of people’s plans to go somewhere are influenced by the weather?

Thursday, November 27, 2014

I had a wonderful Thanksgiving meal with 5 friends today. The six of us didn’t finish a nearly 20-pound turkey. How many bites does it take to eat a 20-pound turkey?

Friday, November 28, 2014

Yesterday I went to the movie The Hunger Games: Mockingjay. It was bad. I went to see it because my friend wanted to see it. My friend wanted to see it because she had seen the first movie; since it’s a series, she definitely wanted to follow it. So, thinking about our project: actually, there is often no very rational reason why people go to a movie, even though that is what drives box office performance. As for me, I go to a movie when I have the time and a friend to go with, or when I like the actor or actress very much.

Saturday, November 29, 2014

There is a theory that people become expert in an area after spending 10,000 hours on it. Over this holiday, while I slept through most of it, I wondered why there is no certificate making us experts in sleep, since we have definitely slept for more than 10,000 hours since we were born.

Sunday, November 30, 2014

I was collecting data for the “Opening Week Box Office Performance Prediction” project until 5 o’clock in the morning. For some variables we have to look things up manually on websites. For example, we are counting the number of nominations and wins for each actor, actress, writer, director, etc. I have never done that before. It feels as if I have been a chef who buys ingredients at the market, but today I had to go to the farm, slaughter an animal, and drag it to the kitchen myself. A salute to all the people who have been collecting data for the rest of us!

Monday, December 1, 2014

I met several strangers today who helped me. I had meals with friends I have known for more than a year. Is there any association between feelings of happiness and the number of people we meet every day?

Tuesday, December 2, 2014

I had a Latin quiz today. I was told long ago that people really remember a new word after seeing it in different contexts 72 times. I hope I can find out someday.

Wednesday, December 3, 2014

Wherever I go, whether waiting in line, in a medical waiting room, on the subway, or during a break at a Broadway show, people are looking at their own cellphones. I kind of want to know the longest gap between phone checks for different people. How many of them would go crazy if you took their phones away for more than a day? (I cannot help laughing as I write this.) And how much time do people spend on their phones per day, doing what?

Thursday, December 4, 2014

The tree-lighting ceremony is at 6 pm today. Walking after class, I saw so many beautiful lights on College Walk. I cannot help wondering how many bulbs there are in total. I once saw workers putting the lights up; they actually twine wires full of bulbs around the trees. I think the problem can be solved this way: investigate the average number of bulbs per meter of wire, estimate the average number of meters needed per tree, count the number of trees in the lighting, and finally multiply these three numbers.
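The three-factor multiplication described above is easy to sketch; every input below is a made-up placeholder, not a measurement:

```python
# Fermi-style estimate of the College Walk bulbs.
bulbs_per_meter = 30      # assumed: bulbs per meter of wire
meters_per_tree = 200     # assumed: meters of wire wrapped around one tree
num_trees = 12            # assumed: number of trees in the lighting

total_bulbs = bulbs_per_meter * meters_per_tree * num_trees
print(total_bulbs)  # 72000
```

The point of such an estimate is less the final number than seeing which input is most uncertain, since the error multiplies through.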

Student #12:

1 Monday, November 24

Tonight my comps study group met to go over some of the literature on ethnicity in preparation for comps in January. It is interesting, at the very least, to see how the use of quantitative (statistical) methods has evolved in political science. There is still so much unevenness in the quality of empirical analysis. I wonder how much the training process (i.e. grad school as a “critical period” for methods training) and the publication review process (i.e. review of papers by people whose understanding of methods is less than it could be) contribute to this.

2 Tuesday, November 25

I’m currently sitting in the airport awaiting my flight. Everything seems to be delayed and passengers are unhappy. Rebooking after delayed/canceled flights seems to be a tedious and very costly endeavor for airlines. I wonder how airlines make their flight schedules in order to minimize the number of missed connections, etc. If the online flight-tracking sites are correct, certain routes are much more likely to be canceled or delayed than others. How do airlines account for this? How much variation is there by season? Do flight schedules change seasonally to account for these differences?

3 Wednesday, November 26

Today we drove from my parents’ home to my grandmother’s home in preparation for Thanksgiving. We passed the spot where I received my one and only speeding ticket. I still look around to figure out where the cop was lying “in wait” when I was pulled over. How accurate are their speed radars? To what extent is the uncertainty surrounding the estimate grounds for challenging the ticket in court? A quick Google search turns up a bunch of websites giving advice on how to challenge tickets in court, which are not particularly interesting to me at this point. (In the interest of full disclosure, I was speeding when I got the ticket…)

4 Thursday, November 27

Today, my mom was talking about her genealogy research (a new hobby). She sometimes struggles to make sure the “G.E. Miller” that she is finding records for is the right “G.E. Miller.” This piqued my curiosity about how researchers who use historical microdata (like the 19th-century US Census microdata now hosted by the National Bureau of Economic Research) match large samples of individuals across censuses. Beyond names, which characteristics are easiest to use to ensure that you are tracking the same individual? Birthplace? Birth year? Has anyone investigated the best method for doing this?

5 Friday, November 28

Today Gram was looking at the Black Friday advertisements in the local newspaper. Retailers seem to prey upon people who don’t really understand percentages. It seems that retailers use multiple markdowns (i.e. “an additional 20% off prices already marked down by 20%”) to make their offers sound better than they actually are. Perhaps there is a better explanation for the complicated sales schemes offered in these coupons, but my cynical side leans toward the former explanation.

6 Saturday, November 29
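The stacked-markdown arithmetic from the 28 Nov entry bears this out: 20% off an already-20%-off price is a 36% discount, not 40%, because the second markdown applies to the reduced price.

```python
# "An additional 20% off prices already marked down by 20%."
original = 100.0
after_first = original * (1 - 0.20)      # first markdown: 80.0
after_second = after_first * (1 - 0.20)  # second markdown: 64.0

effective_discount = 1 - after_second / original
print(round(effective_discount, 2))  # 0.36, not 0.40
```

More generally, two successive markdowns of a and b give an effective discount of 1 − (1 − a)(1 − b), always less than a + b.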

I arrived back in New York (well, technically Newark) early this morning. In Penn Station, I helped some lost tourists from somewhere in Latin America (maybe Colombia, based on the accent?). How many people get lost in the New York subway system every year? How many people are lost at any given time?

7 Sunday, November 30

We finally have our Shiny app working, but today I tried to import data with missingness into Shiny, which “broke” the leverage and AV plot functions. This was a pain to fix. Visualizing the scatterplot with and without missing data is revealing and an important exercise in a field (political science) with so much missing data.

8 Monday, December 1

Today I read some pieces on culture in preparation for comps. One piece that was particularly problematic from a theoretical and empirical standpoint actually had a ton of graphs. I don’t think I’ve ever seen the APSR print so many graphs in one article before. The article was published in 1988; I wonder if they would still publish as many graphs today. Do people just not submit articles with lots of graphs, or are there limits on publication space? Either way, it’s a shame.

9 Tuesday, December 2

Today a group of students met with the external review committee that is doing the department’s review this year. I helped pick the group of 10 grad students with another grad student. While we aimed for the usual diversity of subfields, years in the program, and demographic profiles, the people who came up on our list are generally those who are comparatively active in the department. How accurately does such a group represent the views of the “median” grad student in the department? Does it matter?

10 Wednesday, December 3

Today was the Comparative Politics Seminar which I co-coordinate. Coordinating basically entails lots of logistical work and ordering food. It is never clear how many people will attend the seminars and consequently how much food to order. I wish I had a better sense of how to predict the number of attendees based on the speaker/topic, time of the semester, and attendees’ appetites, so I could order the correct amount of food. (Of course, if they gave us a bigger budget, this would be less of a concern.)

11 Thursday, December 4

The students in the Math Methods class that I TA are worried about their final on Monday. They seem preoccupied with whether the class will be “curved.” Maybe the answer is to study, and not worry so much about the “curve.” Furthermore, it’s grad school… In any case, the “curve” seems to be fundamentally misunderstood, or at least misconstrued, pretty regularly.

Student #13:

Nov.24

Leaves are falling every day. Given the number of trees on a street and the time you need to pass all of them, what is the probability that no leaf falls while you are walking down the street?

Nov.25
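For the Nov. 24 falling-leaves question: if leaf falls are roughly independent in time, the count during your walk is approximately Poisson, and the no-leaf probability is e^(−λ). A sketch with made-up rates:

```python
import math

# Hypothetical inputs: the street's trees drop a combined 2 leaves per
# minute, and the walk takes 3 minutes.
rate_per_min = 2.0
minutes = 3.0
lam = rate_per_min * minutes  # expected number of falls during the walk

# Under a Poisson model, P(no leaf falls while you pass) = exp(-lambda).
p_none = math.exp(-lam)
print(p_none)  # about 0.0025
```

With more trees or a slower walk, λ grows and the no-leaf probability shrinks exponentially.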

I am making my afternoon tea today and wondering whether I should put my tea leaves in first, or after I fill my cup with hot water. Which way will enhance the “contact” between the hot water and the tea leaves and give me a stronger tea?

Nov.26

There’s always traffic on Broadway; every night when I walk home, tons of cars pass by me. So, at a specific time, let’s say 10 pm, how many cars will pass me on my way home? And how many people will I encounter?

Nov.27

I am snacking again, cookies and chocolate!

What makes me want to eat at night? Does weather have anything to do with it? How can people predict the consumption of snacks in a given season, and what other things does the consumption of snacks affect?

Nov.28

My roommate and I got the same letter from Capital One encouraging us to open credit cards there. How many of these letters do they send out each week, and how can we estimate how many people on our block received the same letter this week?

Nov.29

This is Black Friday weekend; because of the sales, every shop in midtown Manhattan has more customers than usual. How big should the promotion be in order to maximize profit on this day? For example, shops that offer only 20% off have many fewer customers than those offering 40% off; however, a smaller discount means more profit per sale.

Nov. 30

I always wake up late on Sundays and wonder whether I should have breakfast or brunch; all the restaurants serving brunch are filled with people. Can I predict the number of people there on weekdays based on the number there on Sundays?

Dec. 1

My roommate is waiting for the packages she ordered on Black Friday. Will there be delivery delays because of the high volume of packages? And does the delay vary by area?

Dec. 2

New Yorkers don’t like using umbrellas when it is raining but not raining heavily. On my way home, only around 20% of the people I pass are using umbrellas. What affects people’s decision to use an umbrella?

Dec. 3

A lot of my friends are leaving after this semester; it is pretty sad. How many friends can I make each month, and how long will it take to have as many friends as I used to? ☹

Student #14:

Nov. 23 Sunday

——————–

I encountered an electric outage at my apartment. I wonder how often this happens. I know Con Edison has an ad up in the train station advertising… the data must be very interesting to look at.

Nov. 24 Monday

———————

Again on the train I saw the Fizergald’s Law Firm ad. It is a busy and rather disorganized table listing numbers in different formats, such as “1.6 billion — XXX court case”, “1,000,000,000 — yyy court case”, and “1000 thousand — zzz court case”. I am really curious about the conspiracy behind this formatting. I did realize that for some people a certain format stands out more. My friend first thought that 1.6 billion was the largest. As for me, I first jumped in to convert that 1,000,000,000 to a comparable scale. Perhaps, in statcomm class, we would log10 them and make a graphical display, but the graph would be unreadable in a train ad.

Nov. 25 Tuesday

———————-

While walking to class, I saw a hotdog stand. Actually two. One was popular with the tourists, the other with the construction workers. One might think that the taste and pricing of the hotdogs differ, with tourists choosing for convenience and workers for price and/or taste. One could try slapping an ANOVA on price and taste by customer group.

Nov. 26 Wednesday

————————–

We tend to think that when we are extremely busy, more things happen. Psychologically, we might just tend to take note of chaotic things. But is it real? Can we design an experiment testing the hypothesis?

Nov. 27 Thursday

———————–

Yesterday’s weather forecast on my phone was accurate to hourly precision, so much so that it almost won back my trust after many failures over the year. Still, I was wondering what the probability is that it was right merely by chance. With that thought in mind, I decided to bring my umbrella along with me just in case…

Nov. 28 Friday

——————–

As I was filling out an extremely demanding application for grad school, I started to wonder how many possible mistakes an applicant can make, and to what extent those mistakes would affect an application. Also, some tasks were really redundant, such as “List all your work experiences” — “Submit your CV/Resume” and “Send in official transcript” — “Fill in all the courses, grades, and course descriptions in your undergraduate and graduate education”. I wonder if they were meant to be there for error checking or for error generating.

Nov. 29 Saturday

———————-

Over the busy Thanksgiving, at least, I was able to squeeze out a night for a friends’ gathering. We had fun eating homemade hotpot. However, since hotpot ingredients are not readily available around Manhattan, we spent a whole day shopping for them in Chinatown (or one could be diligent enough to go to Flushing). We were half-joking about the potential market for a hotpot ingredient pack with delivery service. At least around Columbia, there would be a market for it. We could use statistics to forecast the demand for certain combinations of ingredients. Once we set up an online ordering system, we could start making some profit.

Nov. 30 Sunday

——————–

I saw a post on a social network platform about how hotdog stand owners on average earn three times more than a business or science undergraduate in Toronto. I am not sure it is really a fair comparison. The post’s conclusion was something like: why bother with a student loan for education? With statistics, everything seems more grounded, but is that so? The simplest thing they did not consider was whether those earnings are based on the same working hours.

Dec. 1 Monday

——————–

The phone rang and I was “bidding” on who in the family the call was for. I bet that if I kept records, I could win the bid.

Student #14:

November 23rd –

I was reading Foucault’s Discipline and Punish for a class, and I was curious about the peasant uprisings during the Middle Ages – especially since they preceded larger revolutions, notably the French Revolution. I wonder where I could get data on this. It could be interesting to look at peasant networks in small European cities, though I don’t know if there are records like that.

November 24th –

Simmel discusses how family power dynamics are affected by the number of children. I was wondering if individuation is also affected – i.e., in larger families, parents have less time to dedicate to children and cater less to individual needs, and therefore the children become less like individuals, or individuation happens as a child-to-child process rather than a parent-to-child process. I also wonder if we would see it in the naming conventions of larger families… maybe children of higher birth order are statistically more likely to have generic names.

November 25th –

I have been thinking a lot about the nature of social network ties – which are so often treated as durable entities – though this durability, to me, seems like a fiction. For example, trading with someone is an event, the tie lasting a very short time while the parties are engaged in a transaction – but upon completion of the transaction, the tie does not necessarily continue to exist. This has implications for how we model the duration of social relationships – especially in studies of diffusion.

November 26th –

I had this idea that we could measure the volatility of networks by looking at a network’s autocorrelation between its different cross-sections. If we move the network forward only a tiny bit, the autocorrelation should approach 1; and the farther away the cross-section at t+c is from the original cross-section, the smaller the autocorrelation. This could be useful comparatively – i.e., two networks might have very different autocorrelations over the same timings, giving us an idea of the speed at which they are changing.

November 27th –

Clustering algorithms for networks seem like a place where a lot of advances could be made. The ones I have come across really struggle with smaller networks – which isn’t surprising, since a lot of analyses have trouble with small ns – but it is still frustrating, because the groups are apparent, yet the algorithm still misses them. The same goes for directed graphs.

November 28th –

In light of Ferguson, I was wondering about the statistics on law enforcement violence. On the internet, people tend to think it is increasing rapidly (as a result of the Patriot Act), but this is rarely backed up by data. Perhaps the data are unreliable, but I should look.

November 29th –

A sociologist I follow on Twitter posted, “US violent crime rates have been falling for 20 years. This general increase in safety has made properly structural violence more visible.” I wonder how we could actually test that, and whether it is an actual structural effect or a change in how the media reports violence (i.e., ideological).

November 30th –

I am working on a project where individuals were in groups of four for a period of time. I have their network data and wanted to analyze the observed structures in relation to the set of potential structures; but I haven’t found any literature on what would essentially be a triad census for nodes of four.

December 1st –

I read this paper by Katie Faust about interpreting the results for higher-level effects in networks, where she argues most higher-order configurations can be fully explained by lower-order ones, namely dyads – and I wonder what that means for interpreting dependency in networks. She finds that most networks depart from the lower-order expectations, which is really fascinating.

December 2nd –

I am just leaving Abu Dhabi, and I heard about a stabbing of an American teacher here last night. There is a dearth of crime statistics in Abu Dhabi, and those that exist are notoriously difficult to access. For a while, I have been interested in suicide statistics here, especially since workers are in a difficult and unique structural position which might lead to higher suicide rates.

December 3rd –

I like this idea of “levels” in Harrison White and in multilevel modeling- I think there needs to be more network papers which exploit the multiple levels of existence in human society – that is, estimating the co-determinacy of two separate, yet connected networks, which could be hierarchically intertwined or on the same plane and yet somehow affecting each other… I think about this a lot in relation to scheduling, that is to say, your schedule is dependent on multiple spheres (or governed by your multiple roles/positions in society). Part of it is a duality problem, but it is also feasible to think of your work network and schedule as separate from your home one, and yet fully dependent due to time scarcity.
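As an aside, the November 26th idea above (measuring network volatility via the autocorrelation between cross-sections) is easy to prototype. Here is a minimal sketch, my own construction rather than anything from the diary: the entry does not specify a similarity measure, so Jaccard similarity on edge sets is an assumption.

```python
# Sketch of the network-autocorrelation idea: treat each cross-section
# of a network as a set of edges and measure similarity between snapshots
# at increasing lags. Jaccard similarity is my choice of measure here;
# the diary entry does not specify one.

def jaccard(edges_a, edges_b):
    """Jaccard similarity between two edge sets."""
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)

def network_autocorrelation(snapshots, lag):
    """Average similarity between cross-sections `lag` steps apart."""
    pairs = [(snapshots[t], snapshots[t + lag])
             for t in range(len(snapshots) - lag)]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy example: a slowly changing network on four nodes.
snapshots = [
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("a", "d")},
    {("a", "b"), ("a", "c"), ("a", "d")},
    {("b", "d"), ("a", "c"), ("a", "d")},
]

print(network_autocorrelation(snapshots, 1))  # → 0.5
print(network_autocorrelation(snapshots, 3))  # → 0.0
```

With real data the snapshots would come from timestamped edge lists, and plotting similarity against lag would trace exactly the decay the entry describes, letting you compare the “speed of change” of two networks.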

Student #15:

November 24th

Today we had a Thanksgiving get-together in our department. I decided to contribute some cured meats and cheese. I was in a hurry at the grocery store, so I forgot that quite a few people in my department are vegetarian. This fact dawned on me as I walked out of the store with a good amount of meat and thought, ‘What is the likelihood that even half of my meat and cheese platter gets eaten?’ I guessed, based on my prediction of who would actually come to the dinner — a baseline established from past social event attendance — that the likelihood was around .4. Not too likely. In the end, it turns out my guess was pretty accurate, because I’m bringing more than half of the initial platter home as leftovers.

November 25

Thinking about whether visualizing missing data for my college dropout project would give me a better sense of whether the missingness is systematic. Right now, I’m just using numerical output, but I like the idea of plotting the number of missing values of my most important variables against student ID to try to find patterns. How would I do this?

November 26

How would I figure out the chances that the 1 train would get stuck with a sick passenger two days in a row? Because it’s happening today. Besides figuring out the overall frequency of sick passengers on the 1 train, I would probably want to condition on other factors like time of year (are people sicker in the late fall/winter?) and the average age and income level of the passengers. But two days in a row?! Come on.

November 27

Today my husband and I worked at a soup kitchen. We were in charge of the dessert station and were told in no uncertain terms that guests could take only two pieces of dessert pies and cakes, and that was that. But there was so much there! I thought the likelihood that we would run out was nearly zero. I tried to convince the woman overseeing us that the guests could easily take 3 pieces, by counting the number of guests and the number of pies and cakes and simply dividing. But she made the very good point that I hadn’t taken into account that the cakes and pies were not equally distributed – we had much more sweet potato pie than red velvet cake – and neither were guests’ tastes. People seemed to be going after the red velvet cake more than anything else. So it wasn’t fair to let people take 3 slices of red velvet cake before others had had 1. Fair enough. This statistics problem is a bit more complicated.

November 28

Today I thought a lot about the real-world implications of prior distributions. I did so because Ben and I had a bus to catch to travel down to my mom’s house in New Jersey. The last time we traveled there, we missed our scheduled bus because Ben’s brother wanted to get a coffee. This time, Ben’s brother wasn’t with us, but I was totally paranoid that we were still going to miss the bus. So I tried to ‘overcorrect’ by pushing us to leave our apartment waaaay earlier than necessary. This overcorrection ensured that we arrived in good time, but I somehow still felt nervous – as if something could still go wrong, derailing our trip. I realized in turn just how strong an impact prior experiences can have on current behaviors and predicted outcomes. It was a nice moment to appreciate the power of Bayesian inference.

November 29

My grandparents were in the restaurant business while I was growing up, and they would always serve Thanksgiving dinner there on Thursday. As a result, our Thanksgiving has always been on Saturday, even now that the family has sold the restaurant. I wonder what the distribution of Thanksgiving meals looks like in the US across the days of the Thanksgiving weekend. Does the spike on Thursday represent 99% of the population, or is there more variety in the distribution? It would be fun to survey a representative sample to find out.

November 30

I conducted an interview today with a longtime resident of Chelsea as part of the Amsterdam Avenue project. As a Conservative, she claimed that no one else in the Chelsea neighborhood shared her points of view. This statement of course must be false; I would guess there is at least one other Conservative in Chelsea! But this got me thinking – what is the distribution of political identities in Manhattan by neighborhood? What would it look like on a map? I’m sure someone has done this before — I need to look it up, because it’s very interesting!

December 1

I reworked a series of graphs today that I had included in a conference submission earlier in the fall. Some of them were histograms, where the y-axis signified “density”. Thinking about it a bit, I realized my audience would likely have very little idea what “density” actually meant in that context, causing confusion about the meaning of the graph as a whole. We can’t have that! I ultimately changed the y-axis to signify “proportion of the total,” which is a lot clearer, if you ask me. “Frequency” would be another way to go.

December 2

We’re having some trouble with the visualization for our final project, mostly because of bugs. It always frustrates me that the most labor-intensive aspect of writing code seems to be figuring out what the heck is wrong with it. I wonder if there will ever be a tool that works universally across all sorts of applications and programs to catch and correct bugs. That would be awesome.

December 3

Today I was responsible for ordering food for a group of about 10 people meeting about the Amsterdam Avenue project. As I was trying to figure out what to order, the most difficult question for me was, “How many people are vegetarian or prefer vegetarian food?” This question was particularly salient following the Department Thanksgiving meat platter debacle. So of course I tried to think about any known vegetarians in the group (I did not know of any) in conjunction with how many people I needed to feed, in conjunction with the other options that would be available. Ultimately – and similar to my entry about Bayesian priors and transportation – I totally overcompensated. We had nearly two full containers of tofu with eggplant leftover, suggesting that my probability calculations were all wrong. That said, I actually benefitted in that I got to take some of the leftovers home. Did I subconsciously think about this?! I guess I’ll never be 100% sure on this vegetarian issue, but at least I’m collecting enough data to make some better inferences.

Student #16:

November 21, 2014

I find that the hardest part of talking with a person who doesn’t know much about statistics is trying to express the meaning of an average; this has been a recurring issue when I try to make a point. You argue in terms of averages, and they reply in terms of outliers.

For example, if you talk about social mobility and make a comment such as “for blacks it is harder to climb the ladder,” they respond with something like, “You know, LeBron James did it, and he was poor.” I really hate these kinds of comments.

November 24, 2014

I don’t know why opportunity cost and rationality are so ingrained in economic theory, or why we sometimes take such a short view of things.

I really think that the best way to teach something is to put yourself in the other person’s shoes and try to explain from there, especially in statistics. A very good friend of mine is a lawyer, and I see him desperately trying to grasp some concept; for him it is especially hard, since he doesn’t have any background and the examples are about machine production.

November 25, 2014

I think the Stan and Shiny projects are amazing. I have to learn Bayesian statistics.

I think there was a lot of discussion in the income inequality group. I wish I had more time to analyze their hypotheses.

November 26, 2014

How many people travel during Thanksgiving? Does the accident toll on those days increase more than it should?

November 27, 2014

About two months ago, protests started in Mexico over the disappearance of 43 students who were training to become teachers. Mexican security forces were searching for mass graves to find the corpses of the 43 students, because they believed they were dead. In the process they found around 30 more bodies. This makes me wonder how many people in those surrounding areas have disappeared without anyone knowing.

November 28, 2014

Fucking thesis!

November 29, 2014

I am trying to learn Bayesian statistics by myself. I don’t know if I got the right books. I bought Doing Bayesian Data Analysis and Bayesian Data Analysis. I think I will start with the first one, since its approach seems more basic.

December 1, 2014

Monday, and I didn’t have much sleep. I don’t know if this is normal; I just feel more tired on Mondays than on Fridays. Maybe it has to do with my expected value of going out, meaning that Monday to Friday is work and Friday to Sunday is more fun. I wonder what the distribution across adults is of when they feel most tired.

December 2, 2014

Rachel’s Monty Hall post is amazing. I think a Shiny application simulating this paradox would be pretty cool; another idea is to use the Monty Hall paradox as a teaching technique.

I think this will also be good for my decision models class with Professor Riccio.

December 3, 2014

Today we sent approximately 15 emails to arrange a meeting. I was wondering how many emails it takes on average, and how difficult it is, to get a meeting.

Is racial profiling really statistically significant?
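On the Monty Hall idea from the December 2 entry above: the paradox really does make a nice teaching device, and it can be demonstrated by simulation even without Shiny. A minimal sketch (my own construction, not Rachel’s post or any existing app):

```python
import random

def monty_hall_trial(switch):
    """One round of Monty Hall; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that holds a goat and is not the player's pick.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

random.seed(1)
n = 100_000
stay = sum(monty_hall_trial(switch=False) for _ in range(n)) / n
switch = sum(monty_hall_trial(switch=True) for _ in range(n)) / n
print(f"stay: {stay:.3f}, switch: {switch:.3f}")  # roughly 1/3 vs 2/3
```

Seeing the stay/switch win rates converge to 1/3 and 2/3 tends to persuade skeptics faster than any verbal argument, which is exactly why it works as a classroom exercise.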

Thoughts about a Statistical Diary

I think carrying a little notebook to keep the statistical diary would be a good idea, especially because most of the statistical things I think of happen during the day, and sometimes when I’m home I don’t remember exactly what I thought or why those things seemed interesting, or even statistical; the result, in the end, is some broken stories.

In summary, the statistical diary is great, but I think it should be accompanied by a small notebook; potential research ideas could even arise from it.

I love these diaries so so much. I don’t think I can convey to you how much I love them. They make each student come to life.

One difficulty of keeping a “real” diary is that many of the most important things in our lives are private and personal; it can be hard to write down these things even to ourselves, let alone share them with others. And once you start censoring yourself, it can become difficult to write fluidly; also, it’s like, what’s the point?

In contrast, statistical topics are usually less private. Indeed, even when a topic is highly personal, a statistical framing can provide some comfortable emotional distance.

You should all keep statistics diaries.

The post What to do in 2015: Your statistics diary appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Stone cold sober as a matter of fact appeared first on Statistical Modeling, Causal Inference, and Social Science.

We had some problems; I have no idea why. But it looks ok now. We’ll reschedule some posts that appeared briefly during the rebuilding process.

The post Relaxed plagiarism standards as a way to keep the tuition dollars flowing from foreign students appeared first on Statistical Modeling, Causal Inference, and Social Science.

Randall Westgren writes:

I believe the easiest part of the patchwriting [plagiarism] phenomenon to understand is why writing instructors are leading the charge. Professor Howard is caught between a herd of high-value (i.e. full-tuition and fees paid) students who have negligible communication skills in the language of instruction and the administration and governing board of her university who actively seek these students to balance the budget. And the rhetoric instructors, writing coaches, and ESL instructors are “letting down” the university AND these students when they cannot ameliorate their English language within the bounds of the US curricula. I suspect the pressure on persons like Professor Howard is immense, as they are caught between the professoriate’s understanding of plagiarism as a failure of academic principle and the wishful thinking of the administrators and international students in the admission decision.

This doesn’t explain Laurence Tribe, Ed Wegman, Karl Weick, Frank Fischer, Matthew Whitaker, etc., but I suppose it all goes together: if much-honored faculty are copying without attribution, it’s harder to motivate instructors at these universities to insist that their hard-pressed students write everything in their own words.

The post “Why continue to teach and use hypothesis testing?” appeared first on Statistical Modeling, Causal Inference, and Social Science.

Why continue to teach and use hypothesis testing (with all its difficult concepts, which are behind some of the worst statistical sins) for problems where there is an interval estimator (confidence, bootstrap, credibility, or whatever)? What is the best explanation (if any) to give to students? Only tradition?

I won’t attempt to answer this question, but I will comment on the replies. Notably to me, none of the replies said anything about controlling Type 1 error rates or anything like that. Rather, the main defenses of hypothesis testing were not defenses of hypothesis testing at all, but defenses of decision analysis.

This is interesting because in Bayesian inference, decision analysis comes automatically (I’d say “pretty much for free,” but that’s not quite right, because it can take effort to define a reasonable utility function. You could say that this is effort worth taking, and I’d pretty much agree with that, but it is effort), so it doesn’t need any special name. To do Bayesian decision analysis you don’t need any null and alternative hypotheses; you just lay out the costs and benefits and go from there.
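As a toy illustration of that point (my own example, with made-up utility numbers, not anything from the discussion): once you have posterior draws for an effect, a decision is just the action with the highest expected utility. No null hypothesis appears anywhere.

```python
import random

random.seed(0)

# Hypothetical posterior draws for the effect of some intervention.
# Here they are simulated directly from a normal distribution for
# illustration; in practice they would come from a fitted model.
posterior_draws = [random.gauss(0.3, 0.5) for _ in range(10_000)]

def utility(action, theta):
    """Made-up utilities: adopting has a fixed cost and a benefit
    proportional to the true effect theta; the status quo pays nothing."""
    if action == "adopt":
        return 100 * theta - 20
    return 0.0

def expected_utility(action):
    """Average utility of an action over the posterior draws."""
    return sum(utility(action, th) for th in posterior_draws) / len(posterior_draws)

best = max(["adopt", "status quo"], key=expected_utility)
print(best, round(expected_utility("adopt"), 2))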

But, for people with classical training, “hypothesis testing” is a thing. And I agree that, if all you have is interval estimation, you need to take some other step to get to decision analysis.

Some of the discussants to this post did discuss Bayesian inference so it’s not like I’m saying that my above thoughts represent some deep new idea. My point here is that I’ve typically taken hypothesis testing at face value (as a way of evaluating the evidence against a null hypothesis), but I suppose that for many people, hypothesis testing is the default statistical tool for decision analysis. Scary thoughts.

The post Stethoscope as weapon of mass distraction appeared first on Statistical Modeling, Causal Inference, and Social Science.

Macartan Humphreys sent me a Shiny app demonstrating that you can get statistical significance from just about any pattern of random numbers. I posted it, and, in response, commenter Rahul wrote:

It sure is a cute demo but it’s a bit like insinuating a doctor’s stethoscope is useless by demonstrating ten ways in which it can be mis-used.

And, indeed, if the doctor’s stethoscope were being used to routinely publish spurious findings in the leading research journal in psychology; if leading figures in psychology such as TED-talk star (and respected researcher) Daniel Gilbert were to vociferously defend ridiculous claims about fecundity and clothing which are based on nothing but a highly malleable theory and that doctor’s stethoscope; if the New York Times and various other news outlets report a claim about ESP which is based, again, on nothing but that stethoscope; if Steven Levitt, the leading voice in academic social science, lends his platform to endorse an innumerate claim about beauty and sex ratios, a claim that, you guessed it, is based on no evidence beyond what came from that stethoscope; if well-respected political scientist Larry Bartels posts on the leading political science research blog to promote a study on the effects of subliminal smiley-faces as “punching a big hole in democratic theory,” based on that stethoscope; if world-famous psychology researcher Daniel Kahneman uses the stethoscope to insist that “you have no choice but to accept that the major conclusions of these studies [on social priming] are true”; if brilliant economist James Heckman points to the stethoscope as evidence of large effects of early-childhood intervention programs (ironically, in doing so ignoring selection, the problem that made him famous); if all that is happening, then, yes, I’ll continue to explore what’s going wrong here.

You could of course take the quite reasonable position that Macartan Humphreys, Brian Nosek, etc., and I are wrong and that Daryl Bem, Daniel Gilbert, Steven Levitt, Satoshi Kanazawa, etc. are right. Fair enough—ultimately you have to make your own judgment. The point is, this is a live issue. It’s not just that the “stethoscope” *could* be misused; it’s that (in the judgment of myself and many others whom I respect) the stethoscope *is* being misused, all the time.
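To see how little the “stethoscope” demands, here is a minimal simulation in the spirit of Humphreys’s demo (my own sketch, not his Shiny code): run a couple hundred null comparisons on pure noise and a predictable fraction will come out statistically significant.

```python
import math
import random

def two_sample_p(xs, ys):
    """Two-sided p-value for a difference in means (normal approximation)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(2014)
n_tests, alpha = 200, 0.05
significant = 0
for _ in range(n_tests):
    # Two groups that differ in nothing: pure noise on both sides.
    xs = [random.gauss(0, 1) for _ in range(50)]
    ys = [random.gauss(0, 1) for _ in range(50)]
    if two_sample_p(xs, ys) < alpha:
        significant += 1

# Roughly alpha * n_tests of these null comparisons will be "findings."
print(f"{significant} of {n_tests} null comparisons significant at p < {alpha}")
```

Publish only the significant comparisons and bury the rest, and the literature fills up with exactly the sort of noise-driven claims listed above.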

Fundamentally, the problem’s not (just) with p-values or with any particular technique. I see the problem as being with the entire hypothesis testing framework, with the idea that we learn by rejecting straw-man (or, as Dave Krantz charmingly said once, “straw-person”) null hypotheses, and with the binary true/false attitude which leads people to believe that, once a result is judged statistically significant (by any standard) and published in a good journal, it deserves the presumption of belief.

The post What to think about in 2015: How can the principles of statistical quality control be applied to statistics education appeared first on Statistical Modeling, Causal Inference, and Social Science.

A few years ago, Eric Loken and I wrote, Statisticians: When we teach, we don’t practice what we preach:

As statisticians, we give firm guidance in our consulting and research on the virtues of random sampling, randomized treatment assignments, valid and reliable measurements, and clear specification of the statistical procedures that will be applied to data. With self-assured confidence that we occupy the moral high ground, we share horror stories about convenience samples, selection bias, multiple comparisons, and other problems that arise when those less enlightened about proper methodology don’t follow the rules.

But are we really consistent in all aspects of our professional lives? How do we approach teaching? The following generalizations apply to most of us:

We assign grades based on exams that would almost surely be revealed to be low in both reliability and validity if we were to ever actually examine their psychometric properties. Despite teaching the same courses year after year, we rarely use standardized tests.

We almost never use pre-tests at the beginning of the semester, either to adjust for differences between students in different sections of a course or even for the more direct goal of assessing what has actually been learned by students in our classes.

We evaluate teachers based on student evaluations which, in addition to all their problems as measuring instruments, are presumably subject to huge nonresponse biases. Would we tolerate client satisfaction surveys as the only measure of hospital quality?

We try out new ideas haphazardly. Not only do we not do randomized experiments, we generally do not perform any systematic comparisons of treatments at all. As one high-level administrator put it to us recently, “It would be good if we introduced our new teaching methods based on something more than a ‘hunch.'”

We continued:

The statistical field of quality control emphasizes the process of monitoring and improving a system, rather than focusing on individual cases. When we teach, however, we tend to focus on what seems to work or not work in an individual course, rather than on improving the process or the sequence. Consider how entrenched the freshman science sequence is at many large universities.

The contradiction is especially clear because we actually teach the stuff we believe in our classes and expect the students to parrot it back. However, we do not, in general, conduct our classes in a manner consistent with the principles we teach. . . .

And we concluded:

Being empirical about teaching is hard. Lack of incentives aside, we feel like we move from case study to case study as college instructors and that our teaching is a multifaceted craft difficult to decompose into discrete malleable elements. . . .

In making our practice more research-based and our teaching more practically focused, it would make sense to involve the entire educational team, including members of college and university administrations who set curricula, permanent faculty who organize courses, adjuncts and teaching assistants who perform much of the grading and face-to-face teaching, and writers of textbooks and educational materials.

This all sounds good. But what have I done about this since we wrote the above paragraphs? Lots of preaching, no practicing.

So, on this first day of the new year, I think we should all reflect on how to apply the *statistical principles of quality control* to *statistics teaching*. And to education more generally, of course, but let’s start with the problems that are right in front of us.

The post On deck this month appeared first on Statistical Modeling, Causal Inference, and Social Science.

What to think about in 2015: How can the principles of statistical quality control be applied to statistics education

Stethoscope as weapon of mass distraction

“Why continue to teach and use hypothesis testing?”

Relaxed plagiarism standards as a way to keep the tuition dollars flowing from foreign students

What to do in 2015: Your statistics diary

Artist needed!

Bayesian models, causal inference, and time-varying exposures

When a study fails to replicate: let’s be fair and open-minded

“In general I think these literatures have too much focus on data analysis and not enough on data collection.”

A completely reasonable-sounding statement with which I strongly disagree

Cracked.com > Huffington Post, Wall Street Journal, New York Times

What’s misleading about the phrase, “Statistical significance is not the same as practical significance”

Interactive demonstrations for linear and Gaussian process regressions

Stan comes through . . . again!

I need your help in setting up the ultimate bracket: Picking the ideal seminar speaker

Why I don’t use the terms “fixed” and “random” (again)

Lewis Richardson, father of numerical weather prediction and of fractals

Lee Sechrest

“Surely our first response to the disproof of a shocking-but-surprising claim should be to be un-shocked and un-surprised, not to try to explain away the refutation”

Another benefit of bloglag

High risk, low return

Patience and research

This is why I’m a political scientist and not a psychologist

“What then should we teach about hypothesis testing?”

Tell me what you don’t know

The (hypothetical) phase diagram of a statistical or computational method

“It is perhaps merely an accident of history that skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood”

Six quick tips to improve your regression modeling

“Another bad chart for you to criticize”

Cognitive vs. behavioral in psychology, economics, and political science

Economics/sociology phrase book

Oh, it’s so frustrating when you’re trying to help someone out, and then you realize you’re dealing with a snake.

Best wishes for a happy and productive 2015!


The post A New Year puzzle from Macartan Humphreys appeared first on Statistical Modeling, Causal Inference, and Social Science.

Macartan writes:

There is a lot of worry about publication and analysis bias in social science research. It seems results are much more likely to be published if they are statistically significant than if not, which can lead to very misleading inferences. There is some hope that this problem can be partly addressed through analytic transparency. Say researchers stated what analyses they were going to do before doing them? Then that removes researcher degrees of freedom to analyze data in ways that might lead to too many significant results. It makes it hard to select some analyses that seem interesting while forgetting others that seem less interesting.

That’s an argument for “research registration.” It is a good argument, but there is no evidence that registration really works. We might be able to find some evidence, though. Medical journals started requiring registration in 2005. Did that affect the sorts of results that got published in these journals? We plan to find out.

See here for our strategy, which (because there are many equally good ones you might try) we are registering before we implement.

Before we do though, we would love to learn from social science and medical researchers what they expect to see in the data and what they will conclude when we find whatever we find. The puzzle form is here. Send in your guesses about the effects of registration in order to win a prize and extraordinary prestige.

And here’s what it says at the link:

We want to figure out whether the introduction of registration requirements affected the types of statistics published in medical journals. We have the data but we have not looked at it yet. To avoid doing the kind of data fishing that many hope registration will prevent, we are registering our analysis plan before we start. See details here: http://www.columbia.edu/~mh2245/p/

One innovation of this project is that we want to figure out what we learn about registration relative to current beliefs. What do you expect we will find? Has there been an effect or not?

Here you can give us your best guess for:

(i) the difference in the share of p values just below 0.05 before and after 2005 and

(ii) the difference in the share of all p values below 0.001 before and after 2005.

And for the more intrepid, we ask that you also provide a full specification of your *prior* (current) beliefs as well as an indication of your belief about the appropriate classical test.

When Macartan sent me the above graphs, I wrote that it appears that the histogram for poli sci papers has spikes just below 1, 2, and 3. And, what are the spikes just below 1 and 3 all about, anyway? There’s something going on here that I don’t understand at all.

To which Macartan replied:

I have been pretty worried about the poli sci spikes at 1, 2 and 3 and they make me wonder about whether the evidence for too much just over 1.96 is largely, or at least in part, an integer spike.

We looked into a set of the 2 spikes and they look like they are coming from rounding, with bs and ses given to two decimal places and z scores imputed from these. E.g., b = 0.02, se = 0.01 gives z at exactly 2; b = 0.02, se = 0.02 gives exactly z = 1. Not great.

The medical data spikes at the p numbers, .06, .05, .04, .03, .02, .01, .002, .001, not the z’s since they report the p’s directly.
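Macartan’s rounding story is easy to check with a quick simulation. This is a minimal sketch with made-up coefficients and standard errors (not the actual journal data): draw b and se on a scale where two-decimal rounding is coarse, impute z = b/se from the rounded values, and count how often the imputed z lands exactly on an integer.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# hypothetical scale: se's small enough that rounding to two decimals is coarse
se = rng.uniform(0.006, 0.035, n)
b = se * rng.normal(1.5, 1.0, n)  # true z-scores centered near 1.5

z_exact = b / se
z_imputed = np.round(b, 2) / np.round(se, 2)  # z recomputed from rounded b, se

def share_on_integers(z):
    """Share of z-scores that land (numerically) exactly on an integer."""
    return np.mean(np.isclose(z, np.round(z)))

print(f"exact z on integers:   {share_on_integers(z_exact):.3f}")
print(f"imputed z on integers: {share_on_integers(z_imputed):.3f}")
```

Whenever the rounded se is exactly 0.01, every imputed z is an integer multiple of 1; when it is 0.02, half of them are. So a large fraction of the imputed z-scores pile up on integers even though essentially none of the exact z-scores do.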

Ahh, that makes sense. And the apparent peak below the integers could be an artifact of the plotting program: it takes 2.0 and puts it in the [1.95, 2.0] bin rather than the [2.0, 2.05] bin, etc.
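The bin-edge convention is easy to demonstrate with a toy example. Numpy’s histogram uses left-closed bins, while a right-closed convention (the default in R’s hist, for instance) puts an exact 2.0 just *below* the integer; the values here are illustrative, not the blog’s data.

```python
import numpy as np

z = np.array([1.97, 1.99, 2.0, 2.01])
edges = [1.95, 2.0, 2.05]

# numpy uses half-open [a, b) bins, so an exact 2.0 falls in [2.00, 2.05)
counts_left, _ = np.histogram(z, bins=edges)

# a right-closed convention puts 2.0 in (1.95, 2.00],
# moving the spike to just below the integer
counts_right = [int(np.sum((z > lo) & (z <= hi)))
                for lo, hi in zip(edges[:-1], edges[1:])]

print(counts_left.tolist(), counts_right)  # [2, 2] vs [3, 1]
```

The same four z-scores produce different-looking histograms depending solely on which side of each bin is closed.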

One thing that bothers me about the “spike just above 1.96” thing is that I feel it puts the focus in the wrong place. After all, suppose that there was no selection around 1.96 but otherwise people did the same thing with the garden of forking paths, etc. We’d still have all these problems!

The post A New Year puzzle from Macartan Humphreys appeared first on Statistical Modeling, Causal Inference, and Social Science.
