Everyone who writes as part of their job should read the article in full, but here is a *précis* in case you are too busy to read the whole thing:

- *“Inspiration” is bunk*: “It’s a total illusion. And the danger in the illusion is that you will wait for those moments … I am persuaded that most writers, like most shoemakers, are about as good one day as the next.”
- *Writing is rewriting*: “[A]ll first drafts are deeply flawed by the need to combine composition with thought. Each later draft is less demanding in this regard. Hence the writing can be better.”
- *Brevity is the soul of wit*: “The gains from brevity are obvious; in most efforts to achieve brevity, it is the worst and dullest that goes. It is the worst and dullest that spoils the rest.”
- *Intoxicants don’t help*: “Nothing is so pleasant. Nothing is so important for giving the writer a sense of confidence in himself. And nothing so impairs the product. … [I]t is, quite literally, very sobering to reflect upon how many good American writers have been destroyed by this solace — by the sauce. Scott Fitzgerald, Sinclair Lewis, Thomas Wolfe, Ernest Hemingway, William Faulkner–the list goes on and on.”
- *Just the facts*: “Nothing is so hard to come by as a new and interesting fact. Nothing is so easy on the feet as a generalization … My advice to all young writers is to stick to research and reporting with only a minimum of interpretation.”
- *No jokes*: “Avoid all attempts at humor … Humor is an intensely personal, largely internal thing. What pleases some, including the source, does not please others. One laughs; another says ‘Well, I certainly see nothing funny about that.’”
- *Armchair quarterbacks*: “Any specialist who ventures to write on [his specialization] with a view to making himself intelligible works under a grave moral hazard. He will be accused of oversimplification. The charge will be made by his fellow professionals, however obtuse or incompetent … Complexity and obscurity have professional value—they are the academic equivalents of apprenticeship rules in the building trades. They exclude the outsiders, keep down the competition, preserve the image of a privileged or priestly class. The man who makes things clear is a scab. He is criticized less for his clarity than for his treachery.”

ht: Austin Sandler.

An article titled “The OPEC of Maple Syrup” in this week’s issue of *The Economist* embodies a number of the concepts I teach undergraduates in intermediate micro. First, the setup:

Tourists flock to Quebec each autumn to see its forests turn bright colors. The beautiful foliage, however, conceals a dark secret: the province’s tree-tappers are ripping off pancake-lovers. The Federation of Quebec Maple Syrup Producers (FPAQ) tries to control the price of its product much as OPEC does that of oil. Its members–and all syrup producers in the province must join or risk having their output seized by FPAQ’s enforcers–are subject to quotas. Any excess syrup is put into FPAQ’s stockpile, and producers only get paid for it when it is sold, often years later. The intention is to keep prices high and stable by limiting supply.

If you restrict supply, you increase the market price by moving along the demand curve–assuming demand itself remains unchanged–and the quota effectively acts as a price floor. It is not clear, however, how precisely one can target a given price by imposing a quantity quota.
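
To make the supply-restriction mechanics concrete, here is a toy example with a linear inverse demand curve (the numbers are made up for illustration, not FPAQ’s):

```python
# Toy example: with linear inverse demand P = a - b*Q, a binding quota
# Qbar below the competitive quantity raises the price consumers pay by
# moving along the demand curve (demand itself is unchanged).
a, b = 20.0, 0.5          # hypothetical demand intercept and slope
supply_price = 8.0        # hypothetical competitive price

q_competitive = (a - supply_price) / b   # quantity demanded at the competitive price
q_quota = 18.0                           # cartel restricts output below q_competitive
p_quota = a - b * q_quota                # higher price sustained by the quota

print(q_competitive, p_quota)
```

Under these invented numbers, cutting output from 24 to 18 units moves the price from 8 to 11, exactly the move along the demand curve described above.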

Unfortunately, messing with markets is like playing with fire; sooner or later, you get burned:

[I]n a bittersweet echo of the oil price run-up of recent years, high prices have encouraged the development of new supplies. America’s maple-syrup harvest grew from 21m pounds (7.2m liters) in 2012 to 35m in 2014. The state of New York alone has more maple trees than all of Quebec, although few of them are tapped. America out-produced Canada until the 1930s; it could do so again.

And as the article notes, it’s not as though there are no substitutes for maple syrup, so the price elasticity of demand is not as low as the producers would like it to be. That is, no matter how much better maple syrup may taste than cheaper, corn-based alternatives–what we in Quebec sarcastically refer to as “telephone pole syrup”–there comes a point where the lower price of cheap substitutes is enough to make consumers willing to sacrifice a bit of taste.

Worse, the current high prices, when combined with low costs of production, mean that profits are extra-normal, which gives new players an incentive to enter the market–entry that no amount of dumping or predatory pricing is likely to deter. The article concludes that

[p]roduction is outpacing what the Federation can sell. Its “strategic reserve” has swollen to 25m liters of syrup–almost a year’s worth of sales. As America’s output grows, the organisation will have to cut its members’ quotas, stockpile ever more of Quebec’s harvest, or allow prices to fall. The first two options would cede market share to America while keeping prices helpfully high for rivals. But mimicking Saudi Arabia’s current strategy, of allowing oil prices to fall to drive higher-cost producers out of business, might not work for FPAQ either. The operating costs of maple plantations are very low, so the American upstarts would be unlikely to close shop.

A sticky situation indeed, but yet another example of Mancur Olson’s logic of collective action, wherein a small, well-organized group can better rally to speak with one voice and seek rents than larger, less organized groups (think “US sugar lobby”). This is especially true when the costs are diffuse, spread out over many consumers whose food budgets are affected only infinitesimally by changes in the price of the commodity being kept artificially expensive, as with maple syrup. (I suspect few people spend more than 1% of their food budget on maple syrup.)

Still, I thought I should assemble all of my ‘Metrics Monday posts (including the proto-‘Metrics Monday posts, i.e., posts about econometrics from before ‘Metrics Monday was even a thing). I am also making a separate page for this list, which will be updated every time I add a post to the ‘Metrics Monday series.

(Incidentally, I am getting to a point where I have covered a lot of what I wanted to cover as part of this series, and I am running out of inspiration; requests from readers are much appreciated.)

Here is the list:

- Fads and Fashions in Econometrics
- Multicollinearity
- Friends *Do* Let Friends Do IV
- Regressions as Ecosystems
- When Is Heteroskedasticity (Not) a Problem?
- Hypothesis Testing in Theory and in Practice
- Statistical Literacy
- Data Cleaning
- Outliers
- Proxy Variables
- What to Do with Missing Data
- What to Do with Endogenous Control Variables
- Control Variables: More Isn’t Necessarily Better
- You Can’t Test for Exogeneity: Uninformative Hausman Tests
- “Do Both”
- You Keep Using that Instrumental Variable; I Do Not Think It Does What You Think It Does
- PSA: p-Values Are Thresholds, Not Approximations
- The Use and Misuse of R-Square
- Big Dumb Data?
- Rookie Mistakes in Empirical Analysis
- Goodness of Fit in Binary Choice Models
- A Nifty Fix for When Your Treatment Variable Is Measured with Error
- A Rant on Estimation with Binary Dependent Variables
- Love It or Logit, or: Man, People *Really* Care about Binary Dependent Variables
- In Defense of the Cookbook Approach to Econometrics
- More on the Cookbook Approach to Econometrics
- Econometrics Teaching Needs an Overhaul
- Hipstermetrics
- On the (Mis)Use of Regression Analysis: Country Music and Suicide
- Methodological Convergence in the Social Sciences

(1) Y = a + b_{1}X_{1} + b_{2}X_{2} + … + b_{K}X_{K} + e.

You have *N* observations which you use to estimate the regression. If *N* < *K*, you will not be able to estimate the vector of parameters b = (b_{1}, b_{2}, …, b_{K}). That’s because you have fewer equations than you have unknowns in your system–recall from your middle-school algebra classes that you need at least as many equations as you have unknowns in order to pin those unknowns down. So in econometrics, *N* < *K* means that you cannot “solve” for b (i.e., the system is under-determined, and infinitely many values of b fit the data equally well), *N* = *K* means that your system has a unique, exact solution for b (i.e., it is exactly determined), and *N* > *K* means that your system generally has no exact solution for b (i.e., it is over-determined), which is precisely why least squares picks the b that comes closest to solving it.
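
A quick numpy sketch (with made-up numbers) makes the *N*-versus-*K* point concrete: with *N* < *K*, the matrix X’X is rank-deficient and cannot be inverted, while with *N* > *K*, least squares pins down a unique b:

```python
import numpy as np

rng = np.random.default_rng(42)

# N = 3 observations but K = 5 parameters (constant + 4 slopes):
# X'X is rank-deficient, so the normal equations have no unique solution.
X_small = np.column_stack([np.ones(3), rng.normal(size=(3, 4))])
XtX = X_small.T @ X_small
assert np.linalg.matrix_rank(XtX) < 5      # cannot invert: under-determined

# With N = 100 > K = 5, X'X is invertible and least squares delivers
# the unique b that comes closest to solving the over-determined system.
X_big = np.column_stack([np.ones(100), rng.normal(size=(100, 4))])
y = X_big @ np.array([1.0, 2.0, -1.0, 0.5, 0.0]) + rng.normal(size=100)
b = np.linalg.solve(X_big.T @ X_big, X_big.T @ y)
print(b.round(2))
```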

Multicollinearity is the problem that arises when *N* is too small relative to *K*, or what Arthur Goldberger called “micronumerosity,” referring to too small a number of observations relative to the number of parameters. The most extreme version of multicollinearity is *N* < *K*, in which case you cannot estimate anything.

A less extreme version of multicollinearity is when there is an exact linear relationship between two variables. Suppose X_{1} and X_{2} in equation (1) above are respectively dummy variables for whether one is male and whether one is female. Barring the unlikely case where the data include one or more intersex individuals, trying to estimate equation (1) will lead to one of the two variables being dropped, simply because X_{1} + X_{2} = 1 for every observation, i.e., the two dummies and the constant are exactly linearly related. If you were to try to “force” that estimation, your statistical package would not be able to invert the (X’X) matrix necessary to estimate b by least squares, and the only way to include both variables would be to estimate equation (1) without a constant.
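
The dummy-variable case is easy to verify numerically. In this toy sketch, the matrix of regressors loses full column rank the moment both dummies and the constant are included, and regains it once either the constant or one dummy is dropped:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
male = rng.integers(0, 2, size=n)
female = 1 - male                    # exact linear relationship: male + female = 1

# Constant + both dummies: columns are linearly dependent, X'X is singular.
X_trap = np.column_stack([np.ones(n), male, female])
assert np.linalg.matrix_rank(X_trap) == 2      # not full column rank (3)

# The two ways out: drop one dummy, or drop the constant.
X_drop_dummy = np.column_stack([np.ones(n), male])
X_no_constant = np.column_stack([male, female])
assert np.linalg.matrix_rank(X_drop_dummy) == 2
assert np.linalg.matrix_rank(X_no_constant) == 2
```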

The more common version of the multicollinearity problem arises when the correlation between two or more variables is “too high,” meaning that there is an approximate linear relationship between those variables. A good example would be the amount of food one purchases and consumes, the amount of food one purchases and wastes, and the total amount of food one purchases. Food consumed and food wasted need not sum exactly to total food purchases–sometimes one gives food away–but the correlation is high.

When that happens, the OLS estimator is still unbiased; as Kennedy (2008)–my Bible when it comes to the fundamentals of econometrics–notes, the Gauss-Markov theorem still holds, and OLS remains BLUE. Rather, the problem is that the standard errors blow up: b is imprecisely estimated, so hypothesis tests will tend to fail to reject the null hypothesis that the components of b are equal to zero.

Kennedy provides a neat intuitive discussion of why that is. Think of the variation in X_{1} and X_{2} in the context of a Venn diagram. Each of two sets represents the variation in one variable, with the intersection between the two representing the variation that is common to both variables. The variation in each variable that is not common to the other is then represented by the part of that variable’s set which lies outside the intersection. This means that the more highly correlated two variables are, the less variation is available to identify their coefficients–that is, the more imprecisely estimated those coefficients will be. It is in that sense that multicollinearity is a consequence of there not being enough variation in the data, which is why the common recommendation for dealing with collinearity is to “get more data,” i.e., increase *N*, since multicollinearity is caused by *N* being too small *relative to* *K*.
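
Kennedy’s Venn-diagram intuition can be checked with a small simulation (toy numbers of my own choosing): holding the sample size and error variance fixed, the standard error on b_{1} balloons as the correlation between the two regressors rises:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 500, 1.0

def se_b1(rho):
    """SE of b1 in y = a + b1*x1 + b2*x2 + e when corr(x1, x2) is roughly rho."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    V = sigma**2 * np.linalg.inv(X.T @ X)   # OLS variance matrix under homoskedasticity
    return float(np.sqrt(V[1, 1]))

# The SE on b1 grows sharply as the overlap between x1 and x2 grows.
for rho in (0.0, 0.9, 0.99):
    print(rho, round(se_b1(rho), 3))
```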

Unless you have perfect collinearity–in which case Stata will simply drop one of the offending regressors–detecting multicollinearity is tricky, given that imprecise estimates are not uncommon with observational data. One thing I often see in the manuscripts I review or handle as an editor is a correlation matrix showing the correlations between the variables in a regression. But this is only useful insofar as the multicollinearity involves just two variables; if it stems from an approximate linear relationship between three or more variables, the correlation matrix will be useless.
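
Here is a toy illustration (invented data) of why the correlation matrix can mislead: three variables whose pairwise correlations all look tame, but which are nearly linearly dependent taken together, as the large condition number of the regressor matrix reveals:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.01 * rng.normal(size=n)   # near-exact relationship among THREE variables

# Pairwise correlations look unalarming (none exceeds roughly 0.7)...
R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(R.round(2))

# ...yet the three columns are nearly linearly dependent, which the
# condition number of the regressor matrix gives away:
X = np.column_stack([np.ones(n), x1, x2, x3])
print(round(np.linalg.cond(X)))
```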

What to do when you suspect you are dealing with a multicollinearity problem? Kennedy offers a few ideas; I am listing those that strike me as the most practical:

- *Do nothing.* This is especially useful if your coefficient estimates turn out to be statistically significant–if you do get significance even with imprecisely estimated coefficients, you’re in relatively good shape.
- *Get more data.* See the discussion above for why that might be a good idea. This can be a costly option, however, and by “costly,” I mean “impossible.”
- *Drop one of the collinear variables.* That would have been my default prior to writing this post, but this is only a workable solution if that variable adds nothing to the regression to begin with, i.e., if its estimated coefficient is zero. But then, how can you tell whether that is the case if that coefficient is imprecisely estimated? Moreover, doing this introduces bias, so you need to think carefully about whether you’re willing to trade bias for a reduction in imprecision.
- *Use principal components or factor analysis.* This boils down to creating an index with the multicollinear variables or estimating some linear combination of those same variables, which is then used as a single regressor. The latter is especially useful when you have several variables that aim to measure the same thing, and you want to include them all.
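
As a sketch of the principal-components option (with invented data), one can extract the first principal component of several collinear measures of the same underlying thing and use its scores as a single regressor:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Three noisy, highly collinear measures of the same underlying construct:
latent = rng.normal(size=n)
measures = latent[:, None] + 0.3 * rng.normal(size=(n, 3))

# Standardize, then take the first principal component via the SVD.
Z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]                       # first-component scores, one per observation

share = s[0]**2 / (s**2).sum()        # variance share captured by the first component
print(round(share, 2))
```

Because the three measures share one source of variation, the first component captures most of their common variance, and pc1 can stand in for all three in the regression.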

I must confess that I hardly ever worry about collinearity in my own work. That’s because if the problem gets extreme enough, Stata will drop one of the collinear variables, and if the problem is not extreme, it is hard to determine whether a statistically insignificant coefficient is imprecisely estimated because of multicollinearity or because there is no statistically significant relationship to begin with. In the latter case, my personal preference is to just call it a day and go with the assumption that there is no statistically significant relationship rather than mine the data by combining variables into an index, omitting a variable, or doing some factor analysis.

That said, in his chapter on collinearity, Kennedy also has a neat quote by Williams (1992), which states that “the worth of an econometrics textbook tends to be inversely related to the technical material devoted to multicollinearity.” Relatedly, it is perhaps no surprise then that the people who most worry about collinearity tend to be people who are just starting out in applied econometrics, i.e., people who have not gotten their hands dirty enough with data yet.

Hi Marc,

I have a general question about reading references. How do you usually do reference reading? Do you take notes in any software on your computer? How can you recall the main ideas of a specific paper after you have read a lot of relevant papers?

Do you know or have you written any (blog) article about how to read academic references for graduate students?

At first, I had no idea how to answer that question, but then I remembered that a long time ago, Kim Yi Dionne wrote a post addressing just that.

That said, let me state the obvious: You can’t read academic books and articles like you read novels or magazine articles, and developing the right method for reading academic articles is key to becoming a more effective researcher. The way I went about it was pretty simple: I just read, read, and read some more until I developed some good habits, which I can summarize as follows:

Read with two pens in hand, one a highlighter, and one a regular pen. Use the highlighter to highlight what you think is important, and use the pen to take notes in the margins.

I developed my “method” starting with the literature review I did for my Masters thesis. I remember having to read Stiglitz and Weiss (1981), and in order to follow the math, I would write down the variable names and equations and then try to reproduce the authors’ derivations in the margins.

And when I decided to read everything I could find on contract theory and applied contract theory while writing my dissertation, I went about it the same way. I made sure to keep a bibliography of all the articles I was reading, both so I could remember what I had read and so I could easily copy and paste entries into my dissertation or a research paper when necessary.

With books, of course, it is a bit more difficult to use this method, first because the books might not be yours, and then because you might not want to deface your books. Luckily, I also happen to be in a discipline where most of the literature consists of articles, which I can print and read (this is important: I don’t recommend reading articles on a computer or tablet, simply because it is easier to remember things you read on paper than things you read electronically, and because it is easier to highlight and annotate on paper). But generally: If a book is important for your research, it is perhaps best to buy it or photocopy the relevant parts, and academic books are not exactly like precious first editions of famous classics, so I personally feel free to deface them by highlighting and annotating them.

(Some people like to do this with EndNote, which you can use to format your bibliography in a given journal’s style with a few clicks. Unfortunately, after giving EndNote many tries, I never got the hang of it, and I decided to uninstall it.)

That’s it, really. I don’t think there is anything more to it than highlighting and taking notes in the margins, which, combined with the increasing returns to scale that come from reading enough academic articles, will make you a more effective consumer of scientific literature. Some people might recommend reading only the abstract, introduction, and conclusion at first, which is not a bad idea if you want to know where a paper is going, but I still recommend reading everything–at least for grad students. When you become a more experienced reader, you’ll have a pretty good sense of what you can skip, but skipping is risky in grad school, when you are less familiar with what is important and what is not.

I was wondering if you wouldn’t mind writing a post on 3SLS. I recently sent in a research proposal and part of the feedback I got was that 3SLS was “outdated” and that I needed to find a natural experiment or a good instrument. What do we do when these things aren’t readily available?

Good question. Let me delay the answer a bit to talk about whether three-stage least squares (3SLS) is outdated.

First, a refresher on 3SLS (which I needed myself, as it had been at least 12 years since I had last thought about that estimator): 3SLS is 2SLS applied to a system of equations (e.g., a supply equation and a demand equation). Why would you want to apply 2SLS to a system of equations? Two reasons:

- Each equation in your system has one or more endogenous regressors on the right-hand side (RHS), and
- You want to take into account the fact that the error terms are correlated across the equations in the system.

In other words, in order to minimize bias (the 2SLS part) and maximize precision (the system part). Thus if you have endogeneity issues across a system of equations (e.g., an equation for quantity supplied, and an equation for quantity demanded, both with endogenous prices on the RHS), it might seem like a good idea to kill two birds with one stone by estimating both equations simultaneously by 3SLS.
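
For intuition, here is a toy simulation (all parameters invented) of the single-equation, 2SLS half of the story: an exogenous supply shifter z instruments for the endogenous price in the demand equation. A full 3SLS estimate would, in addition, exploit the covariance between the demand and supply errors across the two equations:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Toy simultaneous system: demand  q = 10 - 1.0*p + u_d,
#                          supply  q =  2 + 0.5*p + 1.0*z + u_s,
# with z an exogenous supply shifter (say, a cost shock).
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)
z = rng.normal(size=n)
p = (8 + u_d - u_s - z) / 1.5    # equilibrium price from setting demand = supply
q = 10 - p + u_d                 # equilibrium quantity, read off the demand curve

# OLS on the demand equation is biased because p is correlated with u_d...
b_ols = np.cov(p, q)[0, 1] / np.var(p)
# ...while the IV (2SLS) estimate, using z as the instrument, recovers
# the true demand slope of -1.0:
b_2sls = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]

print(round(b_ols, 2), round(b_2sls, 2))
```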

Why is 3SLS “outdated”? (Again, one sign that it is outdated is that I had to refresh my own memory about what 3SLS does, since the last time I had come across the estimator was in grad school.)* First, as Dave Giles put it in a 2011 post:

There are the various “single equation” estimators, such as 2SLS or Limited Information Maximum Likelihood (LIML). These have the disadvantage of being asymptotically inefficient, in general, relative the “full system” estimators. However, they have the advantage of usually being more robust to model mis-specification. Mis-specifying one equation in the model may result in inconsistent estimation of that equation’s coefficients, but this generally won’t affect the estimation of the other equations.

Note that the relative inefficiency (i.e., imprecision) of 2SLS is why one might wish to estimate 3SLS instead. The downside of 3SLS, however, is that if there is any hint of misspecification in one equation, the resulting inconsistency spreads through the whole system, and the estimated parameters do not converge to their true values.

Second, the Credibility Revolution brought with itself an emphasis on causal identification, and thus on the estimation of unbiased coefficients, often at the expense of precision. If I recall correctly, Angrist and Pischke note in the conclusion to *Mostly Harmless Econometrics* that (I’m paraphrasing) with the method they just covered, “though you might not get the standard errors right, you’ll at least get the identification part right.”

In other words, with the methods covered in their book, you might not get efficiency, but you’ll get consistency. But if (i) one of 3SLS’s disadvantages is that misspecification will lead to inconsistent estimation and, conversely, one of 2SLS’s advantages is that it is robust to misspecification, and (ii) what we have come to care about mostly now is consistency more than efficiency, it is no surprise that 3SLS is seen as “outdated.”

All of this highlights the fact that the practice of econometrics is not immune to fads and fashions. In an alternate reality where the Credibility Revolution had not happened and people mostly cared about efficiency, it is possible that 3SLS would be encouraged on the grounds that “Sure, it might lead to some inconsistency, but at least it’s efficient!”

Back to Levi’s question of what to do when you don’t have a natural experiment or a solid IV. The unfortunate answer is to try to find a better (i.e., more plausibly exogenous) IV. Perhaps more importantly, between an efficient estimator that is sensitive to misspecification and an inefficient estimator that is robust to misspecification, it is better to pick the latter, even with less-than-ideal IVs.

* I also did a little proof by JSTOR, looking for any mention of “3SLS” or “three-stage least squares” in articles in the *AER*, *QJE*, *JPE*, *REStud*, *Econometrica*, *REStat*, or *AEJ: Applied* since 2010. Bearing in mind JSTOR’s partial coverage of those journals over that period, I take the fact that I could find only *one* such article as evidence in favor of 3SLS being outdated.

Just don’t do [IV] …

Here are the problems. First of all, no matter what you may have read or been taught, identification is always and everywhere an ASSUMPTION. You cannot prove your IV is valid. … I pretty much refuse to let my grad students go on the market with an IV in the job market paper. No way, no how. Even the 80 year old deadwoods in the back of the seminar room at your job talk know how to argue about the validity of your instruments. It’s one of the easiest ways to lose control of your seminar.

As we say in Minnesota: That’s different. Two things:

- What about those cases where you have an IV that is randomly assigned? For example, what about those cases where you offer people randomly varying cash incentives to take up some kind of treatment in order to estimate the LATE? Here, there is no assumption (or is that ASSUMPTION?) made, beyond the assumption that the significance of your IV in your first-stage regression (i.e., its relevance) is not the result of chance.
- Honestly, if we are going to go there–that is, question weak-IV tests and tests of relevance because a rejection of the null might be due to chance, and indict IV for that–then can I point out that it is *also* an assumption that your failure to reject the null in a test of parallel trends in a diff-in-diff context is not due to chance? (That is, when the parallel trends assumption is *actually* testable.)
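
To illustrate the first point, here is a toy simulation (all numbers invented) of a randomized encouragement design: a randomly assigned incentive z shifts take-up only for compliers, and the Wald estimator recovers the LATE:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Hypothetical randomized encouragement design: z is a randomly assigned
# incentive; compliers take up the treatment only when encouraged.
z = rng.integers(0, 2, size=n)
type_ = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
d = np.where(type_ == "always", 1, np.where(type_ == "never", 0, z))

# Outcome: a treatment effect of 2.0 (the same for everyone, for simplicity).
y = 1.0 + 2.0 * d + rng.normal(size=n)

# Wald estimator = reduced form / first stage -> the LATE among compliers.
first_stage = d[z == 1].mean() - d[z == 0].mean()    # ~ share of compliers
reduced_form = y[z == 1].mean() - y[z == 0].mean()
late = reduced_form / first_stage
print(round(first_stage, 2), round(late, 2))
```

Because z is randomly assigned, its validity requires no exclusion-restriction story; the only assumption left is that the first stage is not a fluke.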

And then there’s the following (the emphasis is mine):

We’ve had really good luck placing students who used Diff in diff (in diff), propensity score matching, synthetic control, and even regression discontinuity. All of these approaches have their own problems, but they are like little grains of sand compared to the boulder-sized issues in IV.

Huh? I have seen propensity score matching applied wrongly way more often than I have seen poor IVs or IVs whose exclusion restrictions were not really explored and discussed. I have lost count of the number of papers I have read where the authors think that matching on observables also implies matching on unobservables, and that one can just use PSM to buy causal identification on the cheap.

Don’t get me wrong: If you are going to use an observational IV, you *do* need to think very carefully about how and why it meets the exclusion restriction. And if it does meet it, you need to pray that it will be a relevant IV. But there are clear cases where IV works, especially in settings where you randomly assign the IV, or in quasi-experimental settings where people are assigned to some treatment at random (e.g., Angrist’s famous Vietnam draft lottery).

Identifying causal effects is hard. Willingly limiting yourself to a subset of methods and declaring one method off-limits is like a football coach saying he doesn’t want his quarterback to ever try to pass the ball. So when Levi Russell (of Farmer Hayek Blog) writes:

.@ez_angus says "Friends Don't Let Friends do IV" but I suspect @mfbellemare would disagree! http://t.co/91NjhxlT8S http://t.co/vls6gXlGCq

— Farmer Hayek Blog (@FarmerHayek) October 2, 2015

I say:

*Click here to view the embedded video.*


An old friend (not an applied econometrician) writes (via Facebook, in case you wonder about the telegraphic style of the query):

Econometrics question – have a M.Sc. student doing a study on conservation agriculture (CA) and is developing instruments for CA component use. Any suggestions on appropriate instruments?

My (less-than-helpful) answer:

What’s the outcome of interest Y? What’s the treatment variable/variable of interest D? What controls X are included? All of those work as a kind of ecosystem–without knowing what are the component parts of it, I can’t come up with a good idea for an instrument Z.

It’s the regression-as-ecosystem comment that I wanted to discuss today. Indeed, if you are interested in causal effects–and who isn’t, these days?–you have to see any regression of interest as an ecosystem where things live or die as a function of other things in the system.

This is especially the case if you don’t have an experiment or a quasi-experiment, and you have to rely on an instrumental variable (IV) that is nonrandom. In the “cookbook econometrics” class I teach every other year to our doctoral students, I tell students that an IV lives and dies by the controls it is surrounded with, a point that is obvious once you start thinking about it, but which is made all too rarely. Indeed, here is something that I bet is taking place almost daily throughout the world in economics seminars:

- The presenter is interested in the causal relationship flowing from some treatment D to some outcome Y.
- The presenter recognizes that Y and D are jointly determined, and is thus using an instrument Z to get at it.
- A clever member of the audience says: “Yes, but have you considered [channel through which Z violates the exclusion restriction]?”
- The presenter says: “You’re right–in principle. But because I have [specific variable] in my set of controls X, the exclusion restriction is still met.”
- Clever member of the audience: “Ok, okay.”

Here is a real-life example: In my food prices and food riots paper, in which I was interested in the causal effect of food prices on the extent of social unrest worldwide, I used natural disasters worldwide as an IV for food prices. A few times in seminars, I was asked: “Yes, but you don’t control for the income of food consumers, and that’s an omitted variable.” Notwithstanding the fact that natural disasters are also orthogonal to income (and that it is not clear that you want to include an obviously endogenous control such as income in the regression I was estimating), my response was: “Yes, but I am regressing on the real–not nominal–price of food, which controls for the overall price level and thus, presumably, for wages, which themselves determine most people’s income levels.”
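
The point that an IV lives and dies by the controls around it can be sketched in a toy simulation (all variables and numbers invented, not from the food-riots paper): here z is a valid instrument for d only once x is included among the controls, because z is correlated with x and x affects the outcome directly:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Hypothetical ecosystem: z is entangled with the control x, and x enters
# y directly, so z satisfies the exclusion restriction only conditional on x.
x = rng.normal(size=n)
z = x + rng.normal(size=n)           # instrument correlated with the control
u = rng.normal(size=n)               # unobservable driving the endogeneity of d
d = 0.5 * z + x + u
y = 1.0 * d + 2.0 * x + u + rng.normal(size=n)   # true effect of d is 1.0

def two_sls(y, d, z, controls):
    """2SLS coefficient on d by explicit two stages (coefficient only;
    the second-stage SEs would need the usual 2SLS correction)."""
    W = np.column_stack([np.ones(len(y)), z] + controls)   # instrument + controls
    d_hat = W @ np.linalg.lstsq(W, d, rcond=None)[0]       # first stage
    X2 = np.column_stack([np.ones(len(y)), d_hat] + controls)
    return np.linalg.lstsq(X2, y, rcond=None)[0][1]        # second stage

b_without = two_sls(y, d, z, [])     # exclusion restriction violated: biased
b_with = two_sls(y, d, z, [x])       # conditioning on x restores validity
print(round(b_without, 2), round(b_with, 2))
```

Leaving x out, the IV estimate is badly biased; putting x back in the ecosystem, it converges to the true effect of 1.0.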

At any rate, I’m not sure I have much more of a point than “All the pieces matter,” to quote fictional detective Lester Freamon, and that when thinking about causality, you have to consider Y = f(D(Z,X), X) + e as a whole, and not just D(Z) or even Y = D(Z).

If anything, that is where the use of directed acyclic graphs (DAGs) comes in handy, and why I advocate that our students (i) read (some of) Judea Pearl’s *Causality*, and (ii) use DAGs when they start thinking about an empirical problem.

Along with a number of other researchers in the areas of agriculture, climate, and food, I am one of the authors of the US Department of Agriculture’s assessment report titled “Climate Change, Global Food Security, and the US Food System.”

In the September 8 issue of the *Federal Register*, USDA chief economist Rob Johansson requested public comments on the report. If you would like to make public comments, you can find the notice here. Comments have to be submitted before 11:59 pm on October 8, 2015.