On the heels of a recent post (here) describing a Stata regression command tailored to dependent variables that take the form of a ratio, percentage, or fraction comes a recent paper exploring potential problems with finance papers that model ratios. Compounding the problem, ratios are common dependent variables in the finance literature (*see, e.g.*, papers featuring Tobin's Q). In *The Ratio Problem*, Robert Bartlett (Berkeley) and Frank Partnoy (Berkeley) frame the ratio "problem" in terms of two challenges (omitted variable bias and measurement error bias) "that arise anytime a researcher uses linear regression to estimate a production function that has a ratio as an output." As the paper notes, "In theory, the [ratio] denominator is not necessarily problematic; in practice, however, there are statistical concerns whenever the output of a production function is a ratio." The paper's abstract follows.

“We use the term ‘ratio problem’ to describe the omitted variable and measurement error bias that can arise when a ratio is the dependent variable in an economic model. First, we show how bias can arise from the omission of two classes of variables based on a ratio’s denominator. As an example, we demonstrate that the widely-cited ‘inverse U’ relationship between managerial ownership and Tobin’s Q is reversed when these variables are included. Second, we show how measurement error in the ratio’s denominator can produce bias. We provide empirical tests and solutions, and urge caution about ratios as dependent variables.”
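One way to see why noise in a denominator need not "average out" is a small arithmetic sketch. It is only an illustration of the general point that a ratio is nonlinear in its denominator, not the paper's own test or data; the numerator, denominator, and error values below are hypothetical.

```python
# Hypothetical numbers chosen only for illustration; not drawn from the paper.
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

numerator = 100.0
true_denominator = 10.0
errors = [-5.0, 5.0]  # measurement errors in the denominator; they average to zero

true_ratio = numerator / true_denominator

# The ratio computed from each mismeasured denominator.
measured_ratios = [numerator / (true_denominator + e) for e in errors]
mean_measured_ratio = mean(measured_ratios)

print("mean error in denominator:", mean(errors))
print("true ratio:", true_ratio)
print("mean of mismeasured ratios:", mean_measured_ratio)
```

Even though the errors are symmetric around zero, the average of the mismeasured ratios exceeds the true ratio, because dividing by a too-small denominator inflates the ratio by more than dividing by a too-large one deflates it.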

While I probably should have mentioned this when Stata's (relatively) new and quite helpful "fracreg" command was released, a recent surge of student questions about how to best model a rate (or fractional) dependent variable motivates this post.

Periodically ELS scholars will need (or want) to model a dependent variable that is bounded between zero and one (*e.g.*, rates, fractions, or percentages). Until recently few satisfactory options (at least "off-the-shelf" ones) existed, as the various standard "reg" commands are inapt given the bounded structure of a rate, fraction, or percentage dependent variable. The "fracreg" command in Stata, however, is designed for dependent variables that range from zero to one (inclusive). Anyone intrigued should consult the user manual (here) and watch a brief introductory tutorial video (here).
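To make the "bounded structure" point concrete, here is a minimal Python sketch (not Stata code, and not fracreg's implementation). It contrasts a linear conditional mean, which can stray outside the unit interval, with the logistic conditional mean used in a fractional logit model, and it writes out the Bernoulli quasi-log-likelihood that such a model maximizes. The coefficient values are hypothetical.

```python
import math

def linear_mean(b0, b1, x):
    """Conditional mean implied by a linear regression: unbounded."""
    return b0 + b1 * x

def logistic_mean(b0, b1, x):
    """Conditional mean of a fractional logit: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def bernoulli_quasi_loglik(y, m):
    """Quasi-log-likelihood contribution for a fraction y in [0, 1] with fitted mean m."""
    return y * math.log(m) + (1.0 - y) * math.log(1.0 - m)

# Hypothetical coefficients, chosen only to show the contrast.
b0, b1 = 0.5, 0.2

for x in (-5.0, 0.0, 5.0):
    print(x, linear_mean(b0, b1, x), logistic_mean(b0, b1, x))
```

With these illustrative coefficients the linear prediction leaves the zero-to-one range at both ends of the loop, while the logistic prediction cannot. In Stata itself, the corresponding logit variant is requested with `fracreg logit`.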

Although they have been an increasingly important mainstay of mass tort litigation since their creation in 1986, "Lone Pine" orders remain in relative scholarly obscurity. In *Lone Pine Orders: A Critical Examination and Empirical Analysis*, Nora F. Engstrom (Stanford) and Amos Espeland (Stanford-grad student) set out to "pull back the curtain." Their brief essay does just that and, in so doing, demonstrates once again how the creation of new original data sets, combined with basic descriptive analyses, can add helpful information. The paper's abstract follows.

"Invented in 1986 and now a prominent feature of the mass tort landscape, Lone Pine orders require plaintiffs to provide to the court prima facie evidence of injury, exposure, and specific causation — sometimes early, and usually on pain of dismissal. Though they’ve taken root in a hazy space outside of the Federal Rules of Civil Procedure, these case management orders are frequently issued, and they play an important role in the contemporary litigation and resolution of mass torts. But although Lone Pine orders are common, potent, and increasingly controversial, they have mostly fallen under the academic radar. Even their key features are described inconsistently by commentators and courts. This Essay pulls back the curtain. Drawing on a unique hand-coded data set, this Essay describes the origin and evolution of Lone Pine orders, sketches poles of the debate surrounding their use, and offers empirical evidence regarding their entry, content, timing, and effect."

Those who use the logistic as well as the margins commands (and both are quite common in ELS) sometimes confront the question of whether to report odds ratios (from the logistic command) or predicted probabilities (from the margins command). As one commentator notes in this Stata list discussion (here), "An odds [ratio] is related but not the same as a probability. They are related in the sense that both are ways of quantifying how likely an event (success) is. They differ in that the odds [ratio] is the average number of success per failure. The probability is the average number of successes per trial." Which of the two is more intuitive will, of course, vary from reader to reader.
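The arithmetic behind the quoted distinction can be sketched in a few lines of Python. This is an illustration only, not Stata output, and the function names are made up for the example. Odds are successes per failure, a probability is successes per trial, and an odds ratio, strictly speaking, compares the odds for two groups.

```python
def odds_from_probability(p):
    """Successes per failure, given successes per trial."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Successes per trial, given successes per failure."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

def odds_ratio(p_a, p_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds_from_probability(p_a) / odds_from_probability(p_b)

# Three successes for every failure is the same as three successes in four trials.
print(odds_from_probability(0.75))
print(probability_from_odds(3.0))
# Hypothetical groups with success probabilities of 0.75 and 0.50.
print(odds_ratio(0.75, 0.50))
```

The same underlying likelihood therefore looks quite different on the two scales: a probability of 0.75 corresponds to odds of 3, and moving from a probability of 0.50 to 0.75 triples the odds while raising the probability by 25 percentage points.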