Just don’t do [IV] …

Here are the problems. First of all, no matter what you may have read or been taught, identification is always and everywhere an ASSUMPTION. You cannot prove your IV is valid. …I pretty much refuse to let my grad students go on the market with an IV in the job market paper. No way, no how. Even the 80 year old deadwoods in the back of the seminar room at your job talk know how to argue about the validity of your instruments. It’s one of the easiest ways to lose control of your seminar.

As we say in Minnesota: That’s different. Two things:

- What about those cases where you have an IV that is randomly assigned? For example, what about those cases where you offer people randomly varying cash incentives to take up some kind of treatment in order to estimate the LATE? Here, there is no assumption (or is that ASSUMPTION?) made, beyond the assumption that the significance of your IV in your first-stage regression (i.e., its relevance) is not the result of chance.
- Honestly, if we are going to go there–that is, question weak IV tests/tests of relevance because a rejection of the null might be due to chance, and indict IV for that–then can I point out that it is *also* an assumption that your failure to reject the null in a test of parallel trends in a diff-in-diffs context is not due to chance? (That is, when the parallel trends assumption is *actually* testable.)
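To make the randomly assigned IV case concrete, here is a minimal simulation (all numbers are made up purely for illustration): a randomly assigned incentive z shifts take-up of a treatment d, take-up is also driven by an unobservable u that confounds the naive comparison, and the simple Wald/IV ratio nevertheless recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

u = rng.normal(size=n)                  # unobserved confounder
z = rng.integers(0, 2, size=n)          # randomly assigned incentive (the IV)
# Take-up is more likely with the incentive, but is also driven by u.
d = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)
# True treatment effect is 2.0; u confounds the naive comparison.
y = 2.0 * d + u + rng.normal(size=n)

# Naive OLS slope of y on d: biased upward because of u.
cov = np.cov(d, y)
ols_slope = cov[0, 1] / cov[0, 0]

# Wald/IV estimator: reduced-form difference over first-stage difference.
wald = (y[z == 1].mean() - y[z == 0].mean()) / \
       (d[z == 1].mean() - d[z == 0].mean())
```

With a randomly assigned z, the only substantive worry left is the strength of the first stage, which is exactly the relevance point above.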

And then there’s the following (the emphasis is mine):

We’ve had really good luck placing students who used Diff in diff (in diff), propensity score matching, synthetic control, and even regression discontinuity. All of these approaches have their own problems, but they are like little grains of sand compared to the boulder-sized issues in IV.

Huh? I have seen propensity score matching applied wrongly way more often than I have seen poor IVs or IVs whose exclusion restrictions were not really explored and discussed. I have lost count of the number of papers I have read where the authors think that matching on observables also implies matching on unobservables, and that one can just use PSM to buy causal identification on the cheap.

Don’t get me wrong: If you are going to use an observational IV, you *do* need to think very carefully about how and why it meets the exclusion restriction. And if it does meet it, you need to pray that it will be a relevant IV. But there are clear cases where IV works, and that is especially the case in a setting where you randomly assign the IV, or in quasi-experimental settings where people are assigned to some treatment at random (e.g., Angrist’s famous Vietnam draft lottery setting).

Identifying causal effects is hard. Willingly limiting yourself to a subset of methods and declaring one method off-limits is like a football coach saying he doesn’t want his quarterback to ever try to pass the ball. So when Levi Johnson (of Farmer Hayek Blog) writes:

.@ez_angus says "Friends Don't Let Friends do IV" but I suspect @mfbellemare would disagree! http://t.co/91NjhxlT8S http://t.co/vls6gXlGCq

— Farmer Hayek Blog (@FarmerHayek) October 2, 2015

I say:

*Click here to view the embedded video.*


An old friend (not an applied econometrician) writes (via Facebook, in case you wonder about the telegraphic style of the query):

Econometrics question – have a M.Sc. student doing a study on conservation agriculture (CA) and is developing instruments for CA component use. Any suggestions on appropriate instruments?

My (less-than-helpful) answer:

What’s the outcome of interest Y? What’s the treatment variable/variable of interest D? What controls X are included? All of those work as a kind of ecosystem–without knowing what its component parts are, I can’t come up with a good idea for an instrument Z.

It’s the regression-as-ecosystem comment that I wanted to discuss today. Indeed, if you are interested in causal effects–and who isn’t, these days?–you have to see any regression of interest as an ecosystem where things live or die as a function of other things in the system.

This is especially the case if you don’t have an experiment or a quasi-experiment, and you have to rely on an instrumental variable (IV) that is nonrandom. In the “cookbook econometrics” class I teach every other year to our doctoral students, I tell students that an IV lives and dies by the controls it is surrounded with, a point that is obvious once you start thinking about it, but which is made all too rarely. Indeed, here is something that I bet is taking place almost daily throughout the world in economics seminars:

- The presenter is interested in the causal relationship flowing from some treatment D to some outcome Y.
- The presenter recognizes that Y and D are jointly determined, and is thus using an instrument Z to get at it.
- A clever member of the audience says: “Yes, but have you considered [channel through which Z violates the exclusion restriction]?”
- The presenter says: “You’re right–in principle. Because I have [specific variable] in my set of controls X, the exclusion restriction is still met.”
- Clever member of the audience: “Oh, okay.”

Here is a real-life example: In my food prices and food riots paper, in which I was interested in the causal effect of food prices on the extent of social unrest worldwide, I used natural disasters worldwide as an IV for food prices. A few times in seminars, I was asked: “Yes, but you don’t control for the income of food consumers, and that’s an omitted variable.” Notwithstanding the fact that natural disasters are also orthogonal to income (and that it is not clear that you want to include an obviously endogenous control such as income in the regression I was estimating), my response was: “Yes, but I am regressing on the real–not nominal–price of food, which controls for the overall price level and thus, presumably, for wages, which themselves determine most people’s income levels.”

At any rate, I’m not sure I have much more of a point than “All the pieces matter,” to quote fictional detective Lester Freamon, and that when thinking about causality, you have to consider Y = f(D(Z,X), X) + e as a whole, and not just D(Z) or even Y = D(Z).
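Here is a hedged sketch of the “an IV lives and dies by its controls” point, using simulated data and made-up coefficients: the instrument z satisfies the exclusion restriction only conditional on an observed control w (z is correlated with w, and w affects y directly), so 2SLS without w is badly biased while 2SLS that includes w recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

w = rng.normal(size=n)                  # observed control
z = w + rng.normal(size=n)              # instrument, correlated with w
u = rng.normal(size=n)                  # unobserved confounder
d = z + u + rng.normal(size=n)          # endogenous treatment
y = 1.5 * d + 2.0 * w + u + rng.normal(size=n)  # true effect of d is 1.5

def tsls(y, d, z, controls=None):
    """Two-stage least squares; `controls` is an (n,) or (n, k) array or None."""
    ones = np.ones((len(y), 1))
    x_extra = ones if controls is None else np.column_stack([ones, controls])
    # First stage: regress d on the instrument and the controls.
    z1 = np.column_stack([z, x_extra])
    d_hat = z1 @ np.linalg.lstsq(z1, d, rcond=None)[0]
    # Second stage: regress y on fitted d and the same controls.
    x2 = np.column_stack([d_hat, x_extra])
    return np.linalg.lstsq(x2, y, rcond=None)[0][0]

beta_no_control = tsls(y, d, z)                 # exclusion violated: biased
beta_with_control = tsls(y, d, z, controls=w)   # exclusion holds given w
```

The exact same instrument is invalid in one ecosystem and valid in the other; only the set of controls changed.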

If anything, that is where the use of directed acyclic graphs (DAGs) comes in handy, and why I advocate that our students (i) read (some of) Judea Pearl’s *Causality*, and (ii) use DAGs when they start thinking about an empirical problem.

I, along with a number of other researchers in the areas of agriculture, climate, and food, am one of the authors of the US Department of Agriculture’s assessment report titled “Climate Change, Global Food Security, and the US Food System.”

In the September 8 issue of the *Federal Register*, USDA chief economist Rob Johansson requested public comments on the report. If you would like to make public comments, you can find the notice here. Comments have to be submitted before 11:59 pm on October 8, 2015.

In that paper, we first review the relatively small literature studying the effects of price risk and uncertainty on consumers, producers, and households. Then, we discuss the potential of experimental economics in identifying price risk preferences and of behavioral economics in developing theoretical models that are closer to how people actually behave. In doing so, we wanted to lay out a future research agenda on price risk.

Here is the abstract:

After several decades of neglect, the food crises of 2007-2008 and 2010-2011 have brought food price volatility back on the policy agenda. The study of price volatility, however, is really the study of price risk and uncertainty as they relate to individuals, households, and firms. Because the study of behavior in the face of risk and uncertainty has mostly focused on behavior in the face of income risk and uncertainty, we first review the theoretical and empirical literatures on behavior in the face of price risk and uncertainty. Then, because policy recommendations are only as good as the empirical findings on which they are based, and because market-level phenomena such as price risk do not lend themselves to randomization, we discuss the many ways in which experimental economics can inform our understanding of price risk. Finally, because expected utility–the workhorse model used to study behavior in the face of risk and uncertainty–fails to account for a number of behaviors, we discuss how insights from behavioral economics could be incorporated into the study of price risk, with the ultimate goal of generating more policy-relevant findings.

* This paper is a kind of homage to some of the scholars who have shaped my thinking. The title of the paper is a nod to Jean-Louis Arcand, whose undergraduate development economics class made me want to become a development economist when I took it in 1996. (It turns out that Jean-Louis in turn got the “earnest search” expression from Gustav Papanek.) The organization of the behavioral part of the paper is a nod to Ted O’Donoghue, whose graduate psychology and economic theory class was similarly organized when I took it in 2003. Though I am not a behavioral economist by any stretch of the imagination, Ted’s class was a huge influence on how I approach theorizing.

First off, what is heteroskedasticity? It is an issue that arises when the variance of the error term e in

Y = a + bX + e

is nonspherical. Seeing as this isn’t exactly helpful, a more intuitive explanation is the following:

Technically speaking, you have a heteroskedasticity problem when the variance of your error term is nonconstant between observations.

An example I like to use to talk about heteroskedasticity is the following: If you were to regress the demand for sushi on income, you’d likely see some heteroskedasticity. People with low income tend to have a fairly low consumption of sushi, and there are relatively few departures from the average. People with high income tend to have a higher consumption of sushi, but there are many departures from the average, because even among high-income people, some might dislike sushi and some might really like it. In a scatterplot of the demand for sushi against income, the data would fan out as income rises.
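The sushi example is easy to reproduce with simulated data (the numbers below are made up for illustration): the spread of the residuals grows with income, which is heteroskedasticity in a nutshell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
income = rng.uniform(1, 10, size=n)
# The error's spread grows with income: heteroskedasticity by construction.
e = rng.normal(scale=0.5 * income)
sushi = 2.0 + 1.0 * income + e

# Fit OLS and compare the residual spread at low vs. high incomes.
slope, intercept = np.polyfit(income, sushi, 1)
resid = sushi - (intercept + slope * income)
spread_low = resid[income < 5].std()
spread_high = resid[income >= 5].std()
```

The residual standard deviation among high-income observations is roughly twice that among low-income observations, even though the same regression line fits both groups.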

One of the issues with heteroskedasticity lies in how it is often taught as just one econometric problem among many. That is, it is not uncommon for grad students to think of heteroskedasticity as no less of a problem than endogeneity, since both problems lead to the OLS estimator no longer being BLUE.

In practice, however, heteroskedasticity is much less of a problem, for two reasons. First, it is easier to deal with heteroskedasticity than it is to deal with endogeneity. In the context of a linear regression, it is very easy to estimate your equation using a technique that will make your standard errors robust to heteroskedasticity of an unknown form. Dealing with endogeneity is much more difficult: it requires an instrumental variable, a research design with a clearly exogenous source of variation, etc.

Second, heteroskedasticity *does not bias your estimate of b* above–it merely makes the OLS estimator not be the best (i.e., minimum variance) among linear unbiased estimators. Though one might not be wrong in arguing that this is no less of a problem than dealing with a biased estimator, it turns out that what we care about these days is the identification of causal relationships, which is related to getting an unbiased estimator (or, if one’s estimator is biased, it should be biased in a direction which strengthens the case for there being a causal relationship).
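A small Monte Carlo makes both points at once, with a made-up data-generating process: under heteroskedasticity the OLS slope remains unbiased, but the classical standard error understates the true sampling variability, while a heteroskedasticity-robust (HC0, “sandwich”) standard error tracks it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
x = rng.uniform(1, 10, size=n)          # regressor held fixed across replications
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, se_classical, se_robust = [], [], []
for _ in range(reps):
    e = rng.normal(scale=0.2 * x ** 2)  # strongly heteroskedastic errors
    y = 2.0 + 1.0 * x + e               # true slope is 1.0
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    # Classical SE: assumes one constant error variance.
    s2 = resid @ resid / (n - 2)
    se_classical.append(np.sqrt(s2 * XtX_inv[1, 1]))
    # HC0 "sandwich" SE: robust to heteroskedasticity of unknown form.
    meat = X.T @ (X * (resid ** 2)[:, None])
    V = XtX_inv @ meat @ XtX_inv
    se_robust.append(np.sqrt(V[1, 1]))
    slopes.append(b[1])

slope_bias = np.mean(slopes) - 1.0      # ~0: the OLS slope is still unbiased
true_sd = np.std(slopes)                # actual sampling variability of the slope
```

Across replications, the average robust standard error sits close to the actual spread of the slope estimates, while the average classical standard error falls noticeably short of it.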

So all this amounts to heteroskedasticity being less severe of a problem than what one might prima facie believe when sitting in econometrics classes.

So when is heteroskedasticity a bit more of a problem? From the Wiki entry on heteroskedasticity:

For any nonlinear model (for instance logit and probit models), however, heteroskedasticity has more severe consequences: the maximum likelihood estimates of the parameters will be biased (in an unknown direction), as well as inconsistent (unless the likelihood function is modified to correctly take into account the precise form of heteroskedasticity). As pointed out by Greene, “simply computing a robust covariance matrix for an otherwise inconsistent estimator does not give it redemption. Consequently, the virtue of a robust covariance matrix in this setting is unclear.”

Or, as Dave Giles pointed out a long time ago: You can’t just add “, robust” at the end of your “logit” and “probit” commands in Stata and call it a day, because as the excerpt above indicates, in such contexts you have to correctly specify the form of the heteroskedasticity you’re dealing with. Given the popularity of nonlinear estimators among graduate students (who often seem to prefer “fancier” MLE-based estimators, because fancier simply appears to translate into “better”), this needs to be emphasized, and for me, it militates in favor of estimating linear probability models (which can accommodate standard errors robust to general forms of heteroskedasticity) over probits and logits.
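As a sketch of the linear probability model route, here is simulated binary-outcome data (the probabilities are made up for illustration): OLS recovers the marginal effect, and since a binary outcome is heteroskedastic by construction, with variance p(1-p), the slope is paired with a sandwich standard error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(0, 1, size=n)
p = 0.2 + 0.5 * x                            # true linear probability model
d = (rng.uniform(size=n) < p).astype(float)  # binary outcome

# OLS on a binary outcome = linear probability model.
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, d, rcond=None)[0]

# Pair the LPM with an HC0 sandwich standard error, since the outcome's
# variance p(1-p) varies with x by construction.
resid = d - X @ b
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (resid ** 2)[:, None])
se_slope = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])
```

No distributional assumption about the form of the heteroskedasticity is needed here, which is precisely the advantage over the probit or logit route.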

Here is the abstract:

This paper provides evidence on the importance of reputation in the context of the Kenyan rose export sector. A model of reputation and relational contracting is developed and tested. A seller’s reputation is defined by buyer’s beliefs about seller’s reliability. We show that (i) due to lack of enforcement, the volume of trade is constrained by the value of the relationship; (ii) the value of the relationship increases with the age of the relationship; and (iii) during an exogenous negative supply shock deliveries are an inverted-U shaped function of relationship’s age. Models exclusively focusing on enforcement or insurance considerations cannot account for the evidence.

After some discussion, it was determined that I would write on the welfare impacts of participation in contract farming for smallholder farmers in developing countries. The special issue also includes nice articles on current trends in agricultural contracts, the economics of agricultural contract grower protection legislation, and risk and the use of contracts by vegetable growers.

My article, titled “Contract Farming: What’s In It for Smallholder Farmers in Developing Countries?,” was published last Friday in *Choices*. After discussing the theoretical pros and cons of participating in contract farming for smallholder farmers in developing countries and whether the institution does make them better off, I offer some skeptical thoughts about the policy relevance of the empirical findings (including my own) on the welfare impacts of contract farming:

The bulk of the evidence suggests that participating in contract farming improves the welfare of those who choose to participate (Wang, Wang, and Delgado, 2014). Yet because the identification problem—correlation is not causation—remains as thorny as ever, one has to keep in mind the distinct possibility that publication bias has molded what we know of the welfare impacts of contract farming. Null findings—in this case, findings that show no association between participation in contract farming and welfare—tend not to get published. Findings that tend to go against the dominant view—in this case, findings that would show a negative association between participation in contract farming and welfare—are perhaps even more difficult to publish than findings of no association. Hence, the publication process might lead to a surfeit of findings showing a positive association between participation in contract farming and welfare. …

Whether policy implications can be derived from the foregoing depends on one’s willingness to believe the findings in the literature. … If, given the issues discussed above—limited internal validity, external validity, or potential publication bias—one is more skeptical about the findings of the empirical studies discussed above, then there are few if any policy implications. This is especially the case considering that the literature has so far had little to say about the potential benefits of contract farming for those who did not choose to participate. As a consequence, it might be unwise to encourage the participation in contract farming of households who do not already do so. In that case, it is perhaps best to leave growers and processors alone, without trying to nudge one party or the other in any specific direction …

With that information, you can then maximize the consumer’s utility U(x) by choosing x such that px = w (the constraint will hold with equality if you assume that the consumer’s preferences are monotonic, i.e., consumers derive greater well-being from greater amounts of x). This yields x(p,w), the consumer’s Marshallian demand (some prefer to call it a Walrasian demand) for the good whose demand you are studying when price is equal to p and income is equal to w. From x(p,w), you can calculate how consumer demand changes as price increases or as income increases, which you would respectively denote dx/dp and dx/dw. (Yes, I am abusing notation by using d to denote partial derivatives; bear with me.)

Suppose you want to test that the Law of Demand holds. That is, you want to test that the demand curve is downward sloping, i.e., as the price of a good increases, the quantity demanded of that good decreases, everything else equal. So you hypothesize that dx/dp < 0. Or maybe you want to test the hypothesis that the good you are studying is a normal good. That is, you want to test the hypothesis that as consumers get wealthier, the quantity demanded of that good increases. So you hypothesize that dx/dw > 0.

To test the hypotheses above, you would want to randomly select a sample of consumers and collect information on (i) how much of the good each consumer purchases, (ii) at what price, and (iii) each consumer’s income. Assuming there is enough variation in the price at which the good is purchased, you could estimate

x = a + bp + cw + e,

wherein b and c would respectively be estimates of dx/dp and dx/dw. (Again, bear with me as I assume away all kinds of issues which normally arise in empirical work.)
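For concreteness, here is that regression estimated on simulated data (the “true” demand function below is made up purely for illustration), recovering b < 0 and c > 0:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
p = rng.uniform(1, 5, size=n)        # price faced by each consumer
w = rng.uniform(10, 50, size=n)      # each consumer's income
e = rng.normal(size=n)
# Made-up "true" demand: obeys the Law of Demand, and the good is normal.
x = 10.0 - 1.5 * p + 0.2 * w + e     # quantity demanded

# Estimate x = a + b*p + c*w + e by OLS.
X = np.column_stack([np.ones(n), p, w])
a_hat, b_hat, c_hat = np.linalg.lstsq(X, x, rcond=None)[0]
```

With enough independent variation in p and w, the estimated b and c land close to the slopes dx/dp and dx/dw of the demand function that generated the data.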

How would you then go about testing your theoretical hypotheses? Let’s focus only on dx/dp, since the reasoning in what follows is the same for dx/dw, but with the signs flipped. To test that the Law of Demand holds in your data in the usual way, you would specify the null hypothesis H0: b = 0 versus the alternative hypothesis HA: b ≠ 0.

But notice the difference between how your theoretical and statistical hypotheses are specified: In the theoretical case, the hypothesis you want to test assumes that a relationship is of a specific sign, i.e., dx/dp < 0. In the statistical case, the hypothesis you want to test assumes that there is no relationship, i.e., b = 0.

Because dx/dp = b, the discrepancy lies in the fact that in one case, you hypothesize a negative relationship; in the other case, you hypothesize that the same relationship is zero.

(Note: This discussion is about exact hypotheses. The case of inexact hypotheses—which encompass one-sided tests, but generally cover hypotheses about ranges of values—is obviously different, but one-sided tests are rarely conducted in empirical economics, most likely because they tend to over-reject the null compared to two-sided tests, because we are all the intellectual heirs of Ronald Fisher, or both.)
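To see the one-sided vs. two-sided point numerically, take a hypothetical estimate and standard error (both made up for illustration): the one-sided p-value is exactly half the two-sided one, so the one-sided test can reject at the 5 percent level when the two-sided test cannot.

```python
from math import erf, sqrt

def norm_cdf(t):
    """Standard normal CDF, Phi(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

# Hypothetical estimate and standard error, chosen only for illustration.
b_hat, se = -0.9, 0.5
t = b_hat / se                                  # t = -1.8

p_two_sided = 2.0 * (1.0 - norm_cdf(abs(t)))    # H0: b = 0 vs. HA: b != 0
p_one_sided = norm_cdf(t)                       # H0: b >= 0 vs. HA: b < 0
```

Here the one-sided test rejects at the 5 percent level while the two-sided test does not, which is the over-rejection point made in the parenthetical above.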

Is the discrepancy between theoretical and statistical hypothesis testing only prima facie? I’m not sure. Indeed, for a hypothesis to be scientific, it has to be falsifiable. That is, it has to be the case that it can be rejected on the basis of data. Both the theoretical and statistical hypotheses above are falsifiable: For the theoretical hypothesis above, a rejection would entail dx/dp ≥ 0; for the statistical hypothesis, a rejection would entail b < 0 or b > 0. To recapitulate by mixing the theoretical and the statistical together:

- Finding that b = 0 would be a failure to reject the null (statistical) hypothesis, but a rejection of the theoretical hypothesis.
- Finding that b < 0 would be a rejection of the null (statistical) hypothesis, but support in favor of the theoretical hypothesis, and
- Finding that b > 0 would be a rejection of both the null (statistical) hypothesis *and* of the theoretical hypothesis.

But Case 1 does not really constitute a rejection of the theoretical hypothesis, simply because when the null is true, a test run at the 90, 95, or 99 percent confidence level will fail to reject it in 90, 95, or 99 percent of cases, and so such “null results” are not very convincing.

Case 3 is probably the clearest case for rejection of a hypothesis broadly defined, since both the theoretical and statistical directions of the test agree and go against the theoretical hypothesis.

Case 2 is what most people are after. A rejection of the statistical hypothesis is unlikely to be due to chance (i.e., depending on the level of confidence of your test, it would only be due to chance in 10, 5, or 1 percent of cases), and this rejection is also in a direction which agrees with the theoretical hypothesis.

This is one of those rare cases where I am not sure if I have managed to clarify the issues on this or whether this has been more confusing than anything. For me, the scientific notion of “falsifiability” should really push us (as a profession, that is, and not necessarily as individuals) to want to publish Case 3-type studies wherein the identification is solid and a compelling alternative theoretical explanation is offered. Yet most empirical studies I know of are Case 2-type studies, with only a sprinkling of Case 1-type studies (i.e., null findings) in the literature. Unfortunately, as Spanos (1986), quoted by Kennedy (2008), wrote: “No economic theory was ever abandoned because it was rejected by some empirical econometric test, nor was a clear cut decision between competing theories made in light of the evidence of such a test.”

(Back from two weeks in Milan, where I attended the 2015 IAAE conference, visited Expo 2015, took some time off, and saw friends I had not seen in a long time. This week’s ‘Metrics Monday is a bit different in that it is not so much about econometrics, but about the consumers of results generated by econometrics, and the need for better statistical education at an early age.)

In a conversation on this blog’s Facebook page a few weeks ago about a new working paper, a friend asked “Does this support [X] or not?” Given that the paper under discussion looked at a number of outcomes and presented a mixed bag of results, and given that the results were not causally identified, I responded: “There is no simple answer to that question. There is a little bit of everything for everyone here. Read the limitations section, too.” My friend then said she had done that but was still confused, and that “We all know it’s much easier to lie with statistics than tell the truth.”

Is it? Or is it just much easier to *be misled by* statistics than it is to *lie with* them?

What I mean by this is that we live in an age where we are increasingly being confronted with statistical data in the stories that we are told. When I worked as a reporter in the mid 90s, no one had ever heard of data-driven journalism, yet that is increasingly what editors ask from reporters at respectable media outlets, and I sat through job talks for a position focusing on exactly that area when I taught at a policy school. And the phenomenon shows no sign of going away: with the sharply decreasing costs of data collection and the consequent rise of big data, more and more of the information we will be presented with daily will have some kind of statistical component.

This means that education systems the world over will have to adjust in order to teach statistical literacy earlier in their respective curricula. The first time I was exposed to the notion of probability, I was in the eighth grade. The first time I took an actual statistics course, I was in college. And the first time I actually learned causal inference, I was an assistant professor–up until that point, my econometrics training had largely been about properly modeling data-generating processes, and whatever discussion of causation there was simply stated “Now remember, those parameters don’t indicate causal relationships; correlation is not causation,” and stopped short of a discussion of what was required to identify causal relationships.

I am convinced that it is possible to introduce these notions at an intuitive level (i.e., without the use of math) at an early age, by providing a number of examples. It should be relatively simple, for example, to adapt the Linda problem in order to teach its lesson to elementary school-aged children. If you’re not familiar with the Linda problem, it goes as follows:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations … The respondents are asked to rank in order of likelihood various scenarios: Linda is (1) an elementary school teacher, (2) active in the feminist movement, (3) a bank teller, (4) an insurance salesperson, or (5) a bank teller also active in the feminist movement … The remarkable finding is that (now generations of) respondents deem scenario (5) more likely than scenario (3), even though (5) is a special case of (3). The finding thus violates the most basic laws of probability theory. Not only do many students get the Linda problem wrong, but some object, sometimes passionately, after the correct answer is explained.
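The correct answer to the Linda problem follows from nothing more than the fact that a conjunction can never be more probable than either of its parts. A tiny simulation with made-up population shares makes the point:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
# Made-up population shares, chosen only for illustration.
is_teller = rng.uniform(size=n) < 0.05
is_feminist = rng.uniform(size=n) < 0.30
is_both = is_teller & is_feminist

p_teller = is_teller.mean()
p_feminist = is_feminist.mean()
p_both = is_both.mean()  # the conjunction: teller AND feminist
```

Whatever shares you plug in, the share of people who are both a bank teller and a feminist can never exceed the share of bank tellers, which is all a student needs to grasp to answer the Linda problem correctly.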

Likewise, it should be relatively simple to convince children that correlation is not causation, that in order to be reasonably certain that an estimated relationship is causal certain conditions have to be met, and to teach them that a lot of the relationships they are presented with should be confronted with the gold standard of a randomized experiment.

Given that (i) the role of any education system should be to form responsible citizens, (ii) critical thinking skills are a prerequisite for responsible citizenship, and (iii) the ability not to be hoodwinked by the statistics one is presented with is a core component of critical thinking, education has to be reformed to teach those things earlier.

Some people might say that the state already does too much, and that we should not expect it to teach statistical literacy on top of everything else. Sadly, this is not one of those areas where we should just let people choose to educate themselves; how often do people actually willingly choose to learn some math-related skill on their own, in their spare time? I think there is a clear case to be made that statistical literacy generates a positive externality (more responsible government via politicians who can less easily get away with fallacies and misleading statistics, for one), and so it is fully within the purview of the state to pay for it.

In my statement of teaching philosophy, which I have used for various purposes over the years, I wrote:

The core of my teaching philosophy is my belief in the important role college plays in forming responsible citizens. … I often tell my students that one of the most important critical thinking skills–if not *the* most important such skill–is the ability to question the causal statements one is presented with. … We often hear it said that correlation is not causation, but my goal is to get students to understand that knowing whether X actually causes Y is difficult and requires a great deal of thinking.

The only thing I have changed my mind about in recent years is the age at which those things should be taught.

It is a curated list because a few books will hopefully be added to this list every year. Those new additions will be placed at the very bottom of the list, under “additional books.”
