Here is the abstract:

Contract farming has often been associated with an increase in the income of participating households. It is unclear, however, whether contract farming increases other aspects of household welfare. We use data from six regions of Madagascar and a selection-on-observables design in which we control for a household’s marginal utility of participating in contract farming, which we elicited via a contingent valuation experiment, to show that participating in contract farming reduces the duration of a household’s hungry season by about eight days on average. Further, participation in contract farming makes participating households about 18 percent more likely to see their hungry season end at any time. Moreover, we find that these effects are more pronounced for households with more children, and for households with more girls. This is an important result as children—especially girls—often bear the burden of food insecurity.

A few things, in no particular order:

- In various keynote and invited presentations I have given these past few years on what I see as the future of research on contract farming and agricultural value chains, I have often mentioned that the research needs to start looking at outcomes beyond income (or some closely related measures). There are literally dozens of papers looking at whether participation in contract farming leads to higher incomes, including my own contribution. This forthcoming article was a chance to lead by example by looking at a less proximate outcome variable, viz. the duration of the hungry season experienced by each household.
- We find that participating in contract farming as a grower is associated with a decrease in the length of the hungry season experienced by the average household. This is interesting in and of itself, but more interestingly, we find that this is true even when controlling for income. That is, the seeming improvement in food security seems to (also) stem from non-income sources.
- Given the selection-on-observables (SOO) design, this paper was a good occasion to look at regression and matching approaches side-by-side. Indeed, the regression approach can tell us the effect of participating in contract farming on the length of the hungry season for the average household, but it tells us nothing about (i) the effect of participation for those households that do participate (i.e., the average treatment effect on the treated), or (ii) the effect participation would have for those households that do not participate (i.e., the average treatment effect on the untreated). Though I am generally not a fan of matching methods, the same assumption which made the SOO design possible also made the conditional independence assumption likely to hold here. The good news is that whether we use regression or matching, the average treatment effect is pretty much the same.
- This is one of the very few instances during my career where the peer-review process made the paper substantially better than what we originally submitted. The editor in charge and the reviewers made us think more carefully about our findings and explore different mechanisms, and for that we are very grateful.
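To fix ideas on the regression-vs-matching comparison, here is a minimal Python sketch under selection on observables. This is purely illustrative simulated data, not the paper's actual data or estimation code; all names and numbers are mine. With a constant treatment effect and a single observable driving selection, a regression controlling for the observable and a nearest-neighbor matching estimator should recover roughly the same effect:

```python
# Illustrative sketch (hypothetical simulated data): compare a regression
# estimate of the treatment effect with a nearest-neighbor matching estimate
# under selection on observables.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)                       # observable that drives selection
p = 1 / (1 + np.exp(-x))                     # propensity to participate
d = (rng.uniform(size=n) < p).astype(float)  # participation indicator
y = 2.0 - 1.5 * d + x + rng.normal(size=n)   # outcome; true effect is -1.5

# (i) Regression: OLS of y on a constant, d, and x.
X = np.column_stack([np.ones(n), d, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ate_reg = beta[1]

# (ii) Matching: each treated unit matched to its nearest control on x (ATT).
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
matches = controls[np.abs(x[treated][:, None] - x[controls][None, :]).argmin(axis=1)]
att_match = np.mean(y[treated] - y[matches])

print(round(ate_reg, 2), round(att_match, 2))  # both should be close to -1.5
```

Because the simulated effect is homogeneous, the ATE, ATT, and ATU coincide here by construction; the point of the comparison in real data is precisely that they need not.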

- *Food Policy*: 2.044
- *Agricultural Economics*: 1.739
- *Journal of Agricultural Economics*: 1.545
- *European Review of Agricultural Economics*: 1.544
- *American Journal of Agricultural Economics*: 1.436

The number to the right of each journal name is the journal’s impact factor, which has been calculated on the basis of calendar year 2015 citation numbers.

This has been a good year for agricultural economics journals–both *Food Policy*, which I edit, and the *American Journal of Agricultural Economics* (AJAE), at which I used to serve as associate editor, have seen their impact factors go up. One notable absence from the top 5 this year is the journal *Food Security*, which was ranked #2 last year.

Obviously, I am pleased that (i) our impact factor went up and (ii) we are ranked first in my discipline. All the credit goes to my co-editor Mario Mazzocchi, our associate editors, who also handle a lot of manuscripts, and the Elsevier staff we work with.

Again, that is only *one* top 5. The rank ordering might differ significantly depending on what other indicators of quality you look at, or whether you consider reputation. In agricultural and applied economics departments, for example, many still consider the *AJAE* the no-contest top journal in the field, no matter what impact factors say.

Update: Mario also informs me that *Food Policy* ranks 49th in economics, a category where there is a lot of serious competition.

It’s always a good day when a leading media outlet picks up some of my research. It does not get much better than when *The Economist* does so.

From an article in this week’s issue:

Although concerns for the poorest Peruvians were misplaced in 2013, there may be cause to worry now. The high prices of 2013-14 prompted many more people to start growing quinoa, from entrepreneurial Bolivian taxi drivers to large agribusinesses. European farmers got in on the act, too. Quinoa is now grown in around 50 countries, according to James Livingstone-Wallace, founder of Quinola, a quinoa supplier.

That means a lot more supply: the combined volume of quinoa exports from Peru and Bolivia to the European Union rose by 227% between 2012 and 2015. Prices, naturally, have plummeted—by 40% between September 2014 and August 2015 alone. Following that drop, wages in the two regions that had traditionally produced the most quinoa fell by 5%, and total food consumption by 10%, according to a new report from the International Trade Centre, a development agency.

The same study suggests that many Andean farmers are hoarding quinoa, in the hope that prices will rise again. But European farmers are doing the same, according to Freek Jan Koekoek, a consultant. In other words, there is a real chance that prices could fall further, as farmers despair and sell their stocks.

If that happens, the marginal producers likely to be pushed out of business by the glut are the original ones: poor Andean farmers. They grow quinoa because little else thrives on their steep, barren plots. Their new competitors, tilling better soil with modern farming equipment, manage yields that are up to eight times higher.

The emphasis is mine. Here, *The Economist* is talking about the report (link opens a .pdf) I coauthored with Seth Gitter, Alex Kasterine, Efrain Obregon, and Ann-Kathrin Zotz.

In that report, we present a number of descriptive statistics on the 150 households Efrain and his team interviewed quarterly for us in 2014-2015 in the Puno and Cuzco regions of Peru. The main takeaway that emerges from our report, at least for me, is that rather than worry about high quinoa prices, as some did back in 2013, we might need to worry about low quinoa prices instead. (And if I may make an armchair behavioral argument: Reference points and loss aversion might be what pushes quinoa producers to hold on to their grain rather than sell it.)

I started reading *The Economist* in 1996, during my first semester of college. Had you told me back then that my work would be featured in that newspaper 20 years later, I would not have believed it. Achievement unlocked, as kids say these days…

The 2016 MidDev conference, which my colleague Paul Glewwe organizes here at the University of Minnesota every other year, is almost upon us.

One of the papers I am most looking forward to seeing presented at the conference is a new paper on the inverse farm size-productivity relationship by Leah Bevis and Chris Barrett, which Leah will be presenting on Saturday morning in an organized session on agriculture in developing countries.

Briefly, for those of you who may not be familiar with it, the inverse farm size-productivity relationship is the empirical regularity whereby smaller farms are on average more productive than larger farms in developing countries. (Here, note that “productivity” refers to the amount produced per unit of land–kilograms of rice per hectare, for example–and not the total amount harvested.)

The existence of that inverse relationship has preoccupied economists for almost 100 years now, and many have tried to explain how and why it arises. If there truly is an inverse relationship, then this is at odds with neoclassical economic theory, according to which we would expect low-productivity producers to sell or lease their plots to high-productivity producers.

Moreover, if there truly is an inverse relationship, then the obvious policy recommendation for anyone interested in improving food security is to break up larger farms into smaller ones.

Taking this argument to its extreme implies that one could feed the world from a handful of flower pots, which is probably why economists have a hard time accepting that the inverse relationship is actually a thing, and not a statistical artifact that is simply the result of unobserved heterogeneity. In our 2010 article on the inverse relationship, for example, my coauthors and I looked at whether the inclusion of precise soil quality measurements (i.e., indicators such as soil pH, carbon, nitrogen, and potassium, as well as the breakdown of each plot in terms of clay, silt, and sand) explained the inverse relationship in a sample of rice plots in Madagascar. It turns out that the omission of those precise measurements does not explain the apparent inverse relationship.

Enter Bevis and Barrett. In their new paper, they manage to “make the inverse relationship go away,” i.e., they find that elusive x-factor which looks like it might explain why we see an inverse relationship in the data. In this case, it looks as though the inverse relationship is caused by “edge effects,” whereby farmers work harder along the edges than they do in the middle of their plots (though biophysical effects, such as different levels of nutrients around the edges, cannot be ruled out).

Leah Bevis dedicated a post to the paper over at Economics that Really Matters:

[W]e propose and test a new mechanism: the edge effect. A vast agronomy literature documents the fact that sunlight, biodiversity, water, and other inputs may differ around the edges of a plot, making this section more productive than the interior of the plot. Additionally, the edge of a plot may be more visible or more accessible to a farmer, changing his or her awareness of and management of this space. Behavioral economics research illustrates that individuals change food consumption behavior based on information about portion size or based on visual cues about portion size. We hypothesize that farmers similarly change crop or soil management based on their awareness of plot size.

If plots are more productive around the edges, then smaller plots will be more productive as they will have a higher edge-to-interior ratio, as pictured to the right. Interested readers can see our full paper for the math; we control for this effect by controlling for the perimeter-area ratio. Once we control for this ratio, the inverse size productivity relationship disappears completely; in these Uganda data the inverse relationship is driven entirely by the edge effect, namely that plots are more productive around their perimeter.

If it holds up to reviewer scrutiny (and having read and commented on the paper, I see no reason why it would not), this is a very important finding, seeing as it might help resolve an old puzzle in development economics.
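The edge-effect mechanism is easy to illustrate by simulation. The Python sketch below is purely my own illustration (the variable names, functional form, and numbers are assumptions, not Bevis and Barrett's): it generates plots of varying size and shape whose yield per hectare depends only on the perimeter-area ratio, and shows that a naive regression of log yield on log area finds a spurious inverse relationship that vanishes once that ratio is controlled for:

```python
# Illustrative simulation of the edge effect behind the inverse
# farm size-productivity relationship (assumed functional forms).
import numpy as np

rng = np.random.default_rng(7)
n = 3000
area = np.exp(rng.normal(0.0, 1.0, n))    # plot area, log-normal
aspect = np.exp(rng.normal(0.0, 1.0, n))  # length/width ratio, varies by plot
length = np.sqrt(area * aspect)
width = area / length
pa_ratio = 2 * (length + width) / area    # perimeter-area ratio

# Yield per hectare depends ONLY on the edge share, not on area itself.
log_yield = 1.0 + 0.3 * np.log(pa_ratio) + rng.normal(0.0, 0.1, n)

def ols(y, *regressors):
    """OLS with a constant; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_naive = ols(log_yield, np.log(area))[1]                   # spurious inverse relationship
b_ctrl = ols(log_yield, np.log(area), np.log(pa_ratio))[1]  # edge effect controlled
print(round(b_naive, 3), round(b_ctrl, 3))  # b_naive negative; b_ctrl near zero
```

The intuition is purely geometric: for plots of similar shape, the perimeter-area ratio falls mechanically with area, so any edge premium shows up as higher per-hectare yields on smaller plots until the ratio itself enters the regression.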

There are many ways one can approach the estimation of nonlinear relationships. I will focus only on a handful of them in this post, from least to most nonlinear, and from semiparametric to nonparametric.

A good first step beyond the estimation of a U-shaped relationship would be to estimate the equation

(1) $y = \alpha + f(D)\gamma + X\beta + \epsilon$,

where $y$ is the outcome of interest, $D$ is your treatment variable, $X$ is a vector of control variables, and $\epsilon$ is an error term with mean zero. I assume for the time being that $D$ is as good as randomly assigned, so that identification is guaranteed.

The difference between equation (1) and the usual linear regression is the term $f(D)\gamma$, where the outcome variable is related to the treatment variable in a nonlinear fashion by way of the functional form $f(\cdot)$.

In my own work, one estimator I like to use to model such nonlinear relationships is a restricted cubic spline. Before anything, I should perhaps render unto Caesar the things that are Caesar’s, and note that I learned how to use restricted cubic splines from this set of slides by Maarten Buis, which includes Stata code that you can readily adapt for your own work.

Briefly, when using a restricted cubic spline, you get “a continuous smooth function that is linear before the first knot, a piecewise cubic polynomial between adjacent knots, and linear again after the last knot” (p.1311, *Stata Base Reference Manual*, Release 13). This is in contrast to a linear spline, which imposes piecewise linear components between the knots; restricted cubic splines should be used in cases where the relationship of interest is “more nonlinear” than what a linear spline allows.
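For intuition about what the estimator is actually doing, here is a small Python sketch of how a restricted cubic spline basis is constructed, following Harrell's standard parameterization. This is my own illustration, not Buis's Stata code; the knot placement and the toy data are assumptions:

```python
# Restricted cubic spline basis (Harrell's parameterization), fit by OLS.
import numpy as np

def rcs_basis(x, knots):
    """Return one linear column plus k-2 nonlinear columns that are
    cubic between the knots and, by construction, linear beyond the
    first and last knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    pos = lambda u: np.maximum(u, 0.0) ** 3  # truncated cubic (u)_+^3
    cols = [x]
    for j in range(k - 2):
        c = (pos(x - t[j])
             - pos(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
             + pos(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(c)
    return np.column_stack(cols)

# Toy data: a smooth nonlinear relationship plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 500))
y = np.sin(x) + rng.normal(0, 0.2, 500)
knots = np.percentile(x, [5, 27.5, 50, 72.5, 95])  # five knots at Harrell's default quantiles
B = np.column_stack([np.ones(len(x)), rcs_basis(x, knots)])
coef = np.linalg.lstsq(B, y, rcond=None)[0]
yhat = B @ coef  # the fitted spline, ready to plot against x
```

The tail-restriction terms in the loop are exactly what makes the fitted function linear outside the outermost knots, which is the "restricted" part of the name.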

What does a restricted cubic spline look like? Something like this:

The above figure is from the newest version of my paper on farmers markets and food-borne illness, which I will blog about soon. Because the estimated coefficients from a restricted cubic spline are difficult to interpret by merely looking at them, a picture is literally worth a thousand words when estimating such splines. The above figure, which overlays the spline on a scatter plot of $y$ and $D$, shows that even when taking into account the nonlinear relationship between those two variables, that relationship looks pretty monotonic (especially considering that there are five knots here, and thus four cubic components between two linear components).

An even cooler thing you can do with the code provided by Buis in his slides is to estimate and plot $\partial y/\partial D$, along with its confidence interval, which is the restricted cubic spline analog of the estimated coefficient for $D$ and its associated confidence interval in the context of a linear regression. For the restricted cubic spline above, the marginal effect looks like this:

The interpretation of the above figure is as follows: The marginal effect of farmers markets per capita on the number of outbreaks of food-borne illness per capita is everywhere positive, but it is only significant at less than the 5 percent level just a little bit below the mean of the standardized distribution of the treatment variable.

In cases where you want to go full nonlinear, you can use lowess smoothing, which estimates a locally weighted regression of $y$ on $D$. If you are interested in those, the Stata reference manual has a good discussion here. Without any additional options, estimating the relationship in figure 1 by lowess instead of by a restricted cubic spline gives the following:
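For readers who want to see the nuts and bolts, a bare-bones lowess-type smoother (local linear regression with tricube weights) fits in a few lines of Python. This is a simplified illustration of the idea, not Stata's -lowess- implementation, and the bandwidth and toy data are assumptions of mine:

```python
# Bare-bones locally weighted regression (lowess-style) with tricube weights.
import numpy as np

def simple_lowess(x, y, frac=0.3):
    """For each point, fit a weighted linear regression on the nearest
    frac*n neighbors, with tricube weights falling to zero at the edge
    of the window; return the fitted value at that point."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    k = max(2, int(frac * n))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        idx = np.argsort(dist)[:k]             # k nearest neighbors
        h = dist[idx].max()
        w = (1 - (dist[idx] / h) ** 3) ** 3    # tricube kernel weights
        # Weighted least squares of y on (1, x) within the window.
        W = np.diag(w)
        X = np.column_stack([np.ones(k), x[idx]])
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 300))
y = x ** 2 + rng.normal(0, 0.5, 300)
smooth = simple_lowess(x, y)  # traces out the quadratic without assuming it
```

The `frac` argument plays the role of the bandwidth: a larger fraction gives a smoother but more biased curve, a smaller fraction a wigglier but more faithful one.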

With that said, I want to reiterate that linear splines, restricted cubic splines, and lowess smoothing are only a handful of a number of potential estimators you can use to estimate nonlinear relationships. If you are interested in reading more on the topic, here is a very partial reading list, in no particular order:

- Henderson and Parmeter (2015), *Applied Nonparametric Econometrics*.
- Härdle (1992), *Applied Nonparametric Regression*.
- Pagan and Ullah (1999), *Nonparametric Econometrics*.
- Yatchew (2003), *Semiparametric Regression for the Applied Econometrician*.

In closing, I would also like to offer a word of caution. As with any “fancy” procedure (e.g., tobit, Poisson, multinomial logit, etc.) aimed at properly modeling the DGP, there is an inherent danger that once one has learned to use the nonlinear procedures described above, one starts to see everything as a nail. Don’t fall into this trap.

As I have described before, there is an unspoken ontological order in which things are to be tackled in applied econometrics, and in most social-scientific applications, it will be much more important to have a reasonable shot at causal identification than it is to accurately model nonlinearities in your data. This means that the procedures described above should be reserved for those cases where you have experimental data, a selection-on-observables design, or something else which yields plausible identification.

I’ve been using Stata since 1999, and I managed to learn quite a bit from looking at those cheatsheets.

ht: Jesse Tack.

The end result is an extremely well-researched episode titled “It Is OK to Eat Quinoa” which also features my coauthor Seth Gitter, UC Berkeley graduate student Andrew Stevens, and a few others who know a lot about quinoa. The episode’s main page is here, but you can also listen directly by clicking here.


As its name suggests, -utest- allows testing for the presence of a U-shaped relationship between your dependent variable and one of your explanatory variables.

For example, let $y$ denote an individual’s asset holdings and $x$ denote her age. With a sample of working-age adults, and without bringing in any additional variables, you might want to test the hypothesis that there is an inverse U-shaped relationship between an individual’s asset holdings and her age.

Indeed, few of us had any assets at 18. But as we go through our working life, our stock of assets grows–we purchase cars, buy homes, save some money for retirement, and so on. So at least in early adulthood, the relationship between asset holdings and age should be positive. Conversely, when we retire, we start selling off assets to maintain a certain standard of living, so after retirement, the relationship between asset holdings and age should be negative.

If you are interested in the effect of some other variable $D$ on individuals’ asset holdings in the context of a regression that also includes age, such that

$y = \alpha + \beta D + \gamma_1 x + \gamma_2 x^2 + \epsilon$,

then whether you include just $x$ or both $x$ and $x^2$ does not really matter–what matters is to get $\beta$ right, and provided you control for $x$ at all, you get that; I seem to recall that Angrist and Pischke discuss this briefly in *Mostly Harmless Econometrics*. But there are cases where you might be genuinely curious about whether there is a nonlinear relationship between $y$ and $x$, so you would estimate

$y = \alpha + \gamma_1 x + \gamma_2 x^2 + \epsilon$,

and you’d then look at whether $\gamma_2$ is significantly different from zero. If it is and it is positive (negative), then there is a(n inverse) U-shaped relationship between $x$ and $y$.

The usefulness of the -utest- command comes from the fact that it can determine whether there is actually a U-shaped relationship (whether inverse or not) and report a p-value for the hypothesis that there is no such relationship, and that it also reports the extremum (i.e., the maximum in the case of an inverse U-shaped relationship and the minimum in the case of a U-shaped relationship) and whether that extremum is within the range of your variable.

If you are interested in reading more on testing for U-shaped relationships and for the theory behind -utest-, see this cleverly titled (“With or Without U”) article by Lind and Mehlum (OBES, 2009).
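The logic behind the test is easy to sketch: a genuine U shape requires the slope to be negative at the low end of the observed data and positive at the high end (or the reverse for an inverse U), not merely a significant quadratic term. Here is a toy Python version of that check on simulated asset-age data; this is my own illustration of the idea only, as the actual -utest- command also does proper inference on the two slopes:

```python
# Toy version of the Lind-Mehlum-style U-shape check (illustration only).
import numpy as np

def u_shape_check(x, y):
    """Fit y = a + b*x + c*x^2 and check that (i) the slopes at the two
    ends of the data have opposite signs and (ii) the extremum falls
    inside the observed range of x."""
    X = np.column_stack([np.ones(len(x)), x, x ** 2])
    a, b, c = np.linalg.lstsq(X, y, rcond=None)[0]
    slope_lo = b + 2 * c * x.min()  # slope at the low end of the data
    slope_hi = b + 2 * c * x.max()  # slope at the high end of the data
    extremum = -b / (2 * c)
    inside = x.min() < extremum < x.max()
    shape = "U" if c > 0 else "inverse U"
    return shape, extremum, bool((slope_lo * slope_hi < 0) and inside)

rng = np.random.default_rng(3)

# Inverse-U example: asset holdings peak in late middle age (peak at 60).
age = rng.uniform(18, 80, 1000)
assets = -((age - 60) ** 2) + rng.normal(0, 100, 1000)
shape, peak, is_u = u_shape_check(age, assets)

# Monotonic-but-concave example: same curve, but sampled only up to age 50,
# so a significant quadratic term alone would wrongly suggest an inverse U.
age2 = rng.uniform(18, 50, 1000)
assets2 = -((age2 - 60) ** 2) + rng.normal(0, 100, 1000)
_, _, is_u2 = u_shape_check(age2, assets2)
```

The second example is exactly the trap discussed in the update below: the quadratic term is significant and negative, yet the relationship is monotonic over the observed range, so the check correctly refuses to call it an inverse U.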

In future installments, I’m hoping to cover the use of splines and other procedures to look at whether the relationship between two variables is nonlinear of a higher order, i.e., “more nonlinear” than what a simple second-order polynomial or quadratic function can uncover.

Update: A point which I should have made clearer in the original post is that -utest- allows determining whether you have a(n inverse) U-shaped relationship, and not just a monotonic relationship that is also convex (concave). In other words, the main takeaway here is that you can’t just look at $\gamma_2$, check whether it is significant, and conclude that there is indeed a U-shaped relationship. This, too, was new to me. I thank Simon Savard for making me realize that this should be clarified.

* Little-known fact: Though “Nothing Compares 2 U” was made popular by Sinead O’Connor, it was written and composed by Prince, who was about as local an artist as can be around the Twin Cities (with the potential exception of Garrison Keillor).

Prior to this week’s ‘Metrics Monday, I had last posted on March 23 about my new working paper with Johanna Fajardo-Gonzalez and Seth Gitter on the welfare impacts of rising quinoa prices.

Since then, I have criss-crossed the North American continent, presenting my work in New York, Minnesota, Alberta, and New Mexico, which means that I have had no time to blog until Monday morning.

Since I discussed our quinoa paper, however, there has been quite a bit of media interest in our findings. Here is a brief roundup of the most interesting media stories (the other stories I saw were written by people whom I did not actually speak to, and often their stories were just reprising details from the two stories I discuss below).

First, there was this story by Brad Plumer on Vox:

In January 2013, agricultural economist Marc Bellemare was browsing the internet when he noticed a raging debate about (of all things) quinoa. Specifically, whether people in rich countries should feel guilty about eating it.

This was the height of the “superfood” craze, recall, and Americans and Europeans were scarfing down quinoa — a high-protein alternative to rice with a rich, nutty flavor — by the bucketful. Global quinoa prices had nearly tripled between 2006 and 2013.

Some commenters feared the West’s quinoa binge would spell disaster for the developing world … [O]ther pundits … countered that the price increase would be a boon to farmers who grew the crop.

The quinoa debate was lively, but it all struck Bellemare as a bit superficial.

“None of these arguments had good data behind them,” he told me. So he did what economists do: he started hunting for data.

Then came this longer story by Jeremy Cherfas on NPR:

The price of quinoa tripled from 2006 to 2013 as America and Europe discovered this new superfood. That led to scary media reports that the people who grew it in the high Andes mountains of Bolivia and Peru could no longer afford to eat it. And while, as we reported, groups working on the ground tried to spread the word that your love of quinoa was actually helping Andean farmers, that was still anecdote rather than evidence.

The lack of evidence didn’t seem right to Marc Bellemare, an agricultural economist at the University of Minnesota, or Seth Gitter, another economist at Towson University in Maryland. They hadn’t met, but Gitter knew of a data source that could provide solid evidence, and the two teamed up. They recently published their results in a working paper.

(In case you don’t know about him, Jeremy Cherfas is also the host of the Eat This podcast, which anyone with an interest in food-qua-cultural-phenomenon and food policy should listen to. The podcast is so good, in fact, that it has been nominated for a James Beard award for the second year in a row.)

With that said, it is interesting to see the difference between what is of interest to journalists vs. what is of interest to social scientists. Last Friday, I was presenting the results of some experimental work I have been doing on producer attitudes to price uncertainty at the University of New Mexico. During her introduction, the colleague who had invited me finished by mentioning NPR’s story on our quinoa paper.

This made me muse about how even though there had been a great deal of media interest in the quinoa paper, the experimental work I was about to present would almost surely generate zero media interest due to its nuts-and-bolts nature.

Yet that experimental work is much more fundamental to economics than the quinoa paper, which focuses on a commodity that is relatively small in terms of trade volume. So it goes, I guess, as Kurt Vonnegut would have it.

Last month, Ben Chapman and Don Schaffner, who host the Food Safety Talk podcast, discussed my January Gray Matter column in the *New York Times*, in which I discussed my work on farmers markets and food-borne illness.

Their discussion was even-handed, and Don (I think it was him; I listened to the segment only once, over a month ago) demonstrated a surprising understanding of the working paper culture in economics, wherein we circulate working papers well ahead of submitting for publication so as to make our work better in view of publishing it in better journals. But the one part which made my ears perk up was when Ben asked Don (or the other way around; again, it’s been a while since I listened) why my coauthors and I had looked at the relationship between farmers markets and all those seemingly irrelevant illnesses, and Don said (and I’m paraphrasing), “I don’t know, it looks like data mining.”

This made me conscious once again of the gap that exists between economics and other disciplines when it comes to empirical work. Bench scientists think papers in economics are much too long: “Why do you need to describe the data in so much detail?,” or “Why do you need all those tables that show the same thing over and over?”

Briefly, on why we need to describe the data in so much detail, much of the cleavage between economics and bench science comes from the fact that bench scientists deal exclusively with experimental data. In economics, however, for all this talk of field experiments and lab-in-the-field experiments, experimental data is still the exception and observational data the norm. And when you are dealing with observational data, chances are you are dealing with survey data, in which case you are almost surely dealing with responses provided by human beings whose answers are not always the most reliable. In such cases, it is helpful to explain where, when, and how the data were collected so your readers can tell whether there might be anything hinky going on with your results.

On why we need all those tables, that too has to do with the fact that much of economics deals with observational data. Unlike experimental data, which often allow for a simple comparison of means between treatment and control groups, observational data require one to slice the data in many different ways to make sure that a given finding is not spurious, and that the researchers have not cherry-picked their findings and reported the one specification in which what they wanted to find turned out to be there.

As such, all those tables of robustness checks are there to do the exact opposite of data mining. And as for why I look at seven different types of food-borne illness in my work on farmers markets and food-borne illness, that’s because when you find a positive relationship between, say, D and Y = Y_1 + Y_2 + … + Y_7, it makes sense to want to know whether that positive relationship comes from Y_1, Y_2, …, or Y_7, i.e., the different constituent parts of Y.

On the one hand, finding that none of the constituent parts of Y is associated with D would cast the initial finding that there is a relationship between D and Y in question. If I had found that outbreaks or cases of *none* of the top seven food-borne illnesses reported by the CDC were associated with the number of farmers markets, my main finding would have been pretty weak. On the other hand, finding that one or more of the constituent parts of Y is associated with D is interesting in its own right. In my work, this means that knowing that the positive relationship between farmers markets and food-borne illness in general is due to a similar positive relationship between both (i) farmers markets and norovirus and (ii) farmers markets and campylobacter is interesting, since this can guide future research and policy making efforts.

That said, I completely get where Ben and Don were coming from with the comment that this looked like data mining. When you deal primarily (if not exclusively) with experimental data, looking at the relationship between your treatment variable and an increasing number of outcomes will eventually yield significance. Even if there is no causal relationship, you would expect significance at the 1, 5, or 10 percent significance level by looking at the relationship between your treatment and 100, 20, or 10 a priori unrelated outcomes. In that case, looking at more outcomes would indeed be data mining. That is why economists have recently started advocating for submitting a pre-analysis plan when doing experimental work, i.e., a document you submit before you begin collecting data in which you explain exactly which outcomes you will be looking at and how. (See here for a counterpoint that argues that pre-analysis plans are not always that useful, especially in case where one can do replication research.)
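That multiple-comparisons arithmetic is easy to verify by simulation. In the Python sketch below (my own illustrative numbers, nothing to do with the farmers market data), a treatment with no true effect on anything is tested against 100 independent outcomes, and roughly 5 of them come up "significant" at the 5 percent level by chance alone:

```python
# Illustrative simulation: false positives from testing one (truly
# irrelevant) treatment against many independent outcomes.
import numpy as np

rng = np.random.default_rng(2024)
n, n_outcomes = 500, 100
d = rng.normal(size=n)                         # a treatment with no true effect
false_positives = 0
for _ in range(n_outcomes):
    y = rng.normal(size=n)                     # outcome unrelated to d by design
    r = np.corrcoef(d, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r ** 2))    # t-stat for the slope of y on d
    if abs(t) > 1.96:                          # ~5 percent two-sided threshold
        false_positives += 1
print(false_positives)  # in expectation, about 5 out of 100
```

This is why a pre-analysis plan, by committing to a fixed set of outcomes before the data come in, caps the scope for this kind of accidental (or deliberate) fishing.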
