The post G.A. Barnard: The “catch-all” factor: probability vs likelihood appeared first on All About Statistics.
With continued acknowledgement of Barnard’s birthday on Friday, Sept. 23, I reblog an exchange on catchall probabilities from “The Savage Forum” (pp. 79-84, Savage 1962), with some new remarks.[i]
BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.
SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague–an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.
Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …
On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.
Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.
BARNARD: Professor Savage says in effect, ‘add at the bottom of the list H_{1}, H_{2}, … “something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’? We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.
LINDLEY: I am surprised at what seems to me an obvious red herring that Professor Barnard has drawn across the discussion of hypotheses. I would have thought that when one says this posterior distribution is such and such, all it means is that among the hypotheses that have been suggested the relevant probabilities are such and such; conditionally on the fact that there is nothing new, here is the posterior distribution. If somebody comes along tomorrow with a brilliant new hypothesis, well of course we bring it in.
BARTLETT: But you would be inconsistent because your prior probability would be zero one day and non-zero another.
LINDLEY: No, it is not zero. My prior probability for other hypotheses may be ε. All I am saying is that conditionally on the other 1 – ε, the distribution is as it is.
BARNARD: Yes, but your normalization factor is now determined by ε. Of course ε may be anything up to 1. Choice of letter has an emotional significance.
LINDLEY: I do not care what it is as long as it is not one.
BARNARD: In that event two things happen. One is that the normalization has gone west, and hence also this alleged advantage over likelihood. Secondly, you are not in a position to say that the posterior probability which you attach to an hypothesis from an experiment with these unspecified alternatives is in any way comparable with another probability attached to another hypothesis from another experiment with another set of possibly unspecified alternatives. This is the difficulty over likelihood. Likelihood in one class of experiments may not be comparable to likelihood from another class of experiments, because of differences of metric and all sorts of other differences. But I think that you are in exactly the same difficulty with conditional probabilities just because they are conditional on your having thought of a certain set of alternatives. It is not rational in other words. Suppose I come out with a probability of a third that the penny is unbiased, having considered a certain set of alternatives. Now I do another experiment on another penny and I come out of that case with the probability one third that it is unbiased, having considered yet another set of alternatives. There is no reason why I should agree or disagree in my final action or inference in the two cases. I can do one thing in one case and other in another, because they represent conditional probabilities leaving aside possibly different events.
LINDLEY: All probabilities are conditional.
BARNARD: I agree.
LINDLEY: If there are only conditional ones, what is the point at issue?
PROFESSOR E.S. PEARSON: I suggest that you start by knowing perfectly well that they are conditional and when you come to the answer you forget about it.
BARNARD: The difficulty is that you are suggesting the use of probability for inference, and this makes us able to compare different sets of evidence. Now you can only compare probabilities on different sets of evidence if those probabilities are conditional on the same set of assumptions. If they are not conditional on the same set of assumptions they are not necessarily in any way comparable.
LINDLEY: Yes, if this probability is a third conditional on that, and if a second probability is a third, conditional on something else, a third still means the same thing. I would be prepared to take my bets at 2 to 1.
BARNARD: Only if you knew that the condition was true, but you do not.
GOOD: Make a conditional bet.
BARNARD: You can make a conditional bet, but that is not what we are aiming at.
WINSTEN: You are making a cross comparison where you do not really want to, if you have got different sets of initial experiments. One does not want to be driven into a situation where one has to say that everything with a probability of a third has an equal degree of credence. I think this is what Professor Barnard has really said.
BARNARD: It seems to me that likelihood would tell you that you lay 2 to 1 in favour of H_{1} against H_{2}, and the conditional probabilities would be exactly the same. Likelihood will not tell you what odds you should lay in favour of H_{1} as against the rest of the universe. Probability claims to do that, and it is the only thing that probability can do that likelihood cannot.
[i] Anyone who thinks we really want a Bayesian probability assignment to a hypothesis must come to grips with the fact that it depends on having a catchall factor (of all possible hypotheses that could explain the data) and on the probability of the data given “something else”. This is unrealistic, Barnard is telling Savage, and when something new enters, our original probability assessments are wrong. In their attempts to make the “catchall factor” disappear, most probabilists appeal to comparative assessments: likelihood ratios or Bayes factors. Several key problems remain: (i) the appraisal is always relative to the choice of alternative, and this allows “favoring” one or the other hypothesis without being able to say there is evidence for either; (ii) although the hypotheses are not exhaustive, many give priors to the null and alternative that sum to 1; (iii) the ratios do not have the same evidential meaning in different cases (what’s high? 10, 50, 800?); and (iv) there is a lack of control of the probability of misleading interpretations, except with predesignated point-against-point hypotheses or special cases (this is why Barnard later rejected the Likelihood Principle). You can read the rest of pages 78-103 of the Savage Forum here. This exchange was first blogged here. Share your comments.
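Barnard’s point about the lost normalizing factor can be made concrete with a toy computation (a sketch with made-up numbers, not anything from the Forum itself): posteriors computed by Bayes’s theorem are conditional on the enumerated list of hypotheses, and admitting a catch-all “something else” with prior ε shifts every posterior.

```python
# Toy illustration (made-up numbers): posterior probabilities are
# conditional on the list of hypotheses we happened to enumerate.
def posteriors(priors, likelihoods):
    """Bayes's theorem over an assumed-exhaustive list of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)  # the normalizing factor Barnard is worried about
    return [j / z for j in joint]

# Two hypotheses we actually thought of, with likelihoods for the data.
p_two = posteriors([0.5, 0.5], [0.8, 0.2])
print(p_two)  # [0.8, 0.2]

# Admit a catch-all "something else" with prior eps = 0.1. Its likelihood
# for the data is exactly the quantity Savage concedes is "particularly
# vague"; we simply assume 0.5 here for illustration.
eps = 0.1
p_three = posteriors([0.45, 0.45, eps], [0.8, 0.2, 0.5])
print(p_three)  # roughly [0.72, 0.18, 0.10]: every posterior has shifted
```

This is Barnard’s point (i) in miniature: each set of posteriors is fine on its own terms, but the two are conditional on different hypothesis lists and are not directly comparable.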
Please comment on the article here: Statistics – Error Statistics Philosophy
The post Further evidence that creativity and innovation are stimulated by college sports: Evidence from a big regression appeared first on Statistical Modeling, Causal Inference, and Social Science.
The post Further evidence that creativity and innovation are stimulated by college sports: Evidence from a big regression appeared first on All About Statistics.
Kevin Lewis sent along this paper from the Creativity Research Journal:
Further Evidence that Creativity and Innovation are Inhibited by Conservative Thinking: Analyses of the 2016 Presidential Election
The investigation replicated and extended previous research showing a negative relationship between conservatism and creative accomplishment. Conservatism was estimated, as in previous research, from voting patterns. The voting data used here were from the 2016 US Presidential election. The number of patents granted per county in the United States was used as estimate of creative and innovative accomplishment. Using a 2-level multilevel approach, in which state-level influences are taken into consideration, various control variables were tested, including socioeconomic status (SES), education, income, and diversity. The results confirmed a negative relationship between conservatism and the number of patents granted. Therefore, in counties and states with high conservatism, fewer patents were granted, even after controlling for SES and population. Patents were positively related to racial diversity and education. Practical implications include the benefits of liberal thinking outside of the political arena. Liberal thinking is very likely associated with flexibility, tolerance, and openness, and according to the present results, creative accomplishment. Limitations of the research and future directions are discussed.
I’d really like to think this is a parody, but it just might well be serious. I wonder what Susan T. Fiske would think of it. On one hand, it’s ridiculous. On the other hand, it’s a peer-reviewed publication with p less than 0.05, so it’s got to be true. It’s a tough call.
Meanwhile, I have an idea that, outside of certain big cities, the number of patents in a county is associated with the presence of college sports teams. I conjecture that the presence of college sports stimulates this sort of creative thinking. Somebody get a p-value on that, ok?
P.S. Some would say it’s uncool to mock a paper in an obscure journal. And, indeed, I’ve tried to show some courtesy by not mentioning the authors’ names, as I’m sure they’re doing their best (or perhaps doing an awesome parody; it’s hard for me to tell, as the article is behind a paywall). But I don’t buy the argument that we should be gentle on bad research just cos it’s in an obscure publication. I’ve published in obscure journals myself, and I still hope these articles I’m writing can make some positive contribution. If you really don’t want your work criticized in public, you shouldn’t publish it at all. Or, to put it another way, if you’re ok with citations and positive press, you should be able to handle criticism.
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science
The post Egalitarian LASSO for Forecast Combination appeared first on All About Statistics.
Please comment on the article here: No Hesitations
The post Chess records page appeared first on Statistical Modeling, Causal Inference, and Social Science.
The post Chess records page appeared first on All About Statistics.
Chess records page (no, not on the first page, or the second page, or the third page, of a google search of *chess records*).
There’s lots of good stuff here, enough to fill much of a book if you so desire. As we’ve discussed, chess games are in the public domain so if you take material on chess games from an existing book or website without crediting the person who compiled this material, you’re not actually plagiarizing.
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science
The post Upcoming data preparation and modeling article series appeared first on All About Statistics.
I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN.

vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user, we strongly suggest you incorporate vtreat into your projects. vtreat handles a number of important data cleaning and preparation steps in a statistically sound fashion.
In our (biased) opinion vtreat has the best methodology and documentation for these important data cleaning and preparation steps. vtreat’s current public open-source implementation is for in-memory R analysis (we are considering ports, and certifying ports, of the package some time in the future, possibly for: data.table, Spark, Python/Pandas, and SQL).

vtreat brings a lot of power, sophistication, and convenience to your analyses, without a lot of trouble.
A new feature of vtreat version 0.6.0 is called “custom coders.” Win-Vector LLC‘s Dr. Nina Zumel is going to start a short article series to show how this new interface can be used to extend vtreat methodology to include the very powerful method of partial pooled inference (a term she will spend some time clearly defining and explaining). Time permitting, we may continue with articles on other applications of custom coding, including: ordinal/faithful coders, monotone coders, unimodal coders, and set-valued coders.
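As a rough preview of the idea (my own gloss with assumed variance components, not Dr. Zumel’s forthcoming treatment), partial pooling shrinks each level’s estimate toward the grand mean, with more shrinkage for levels that have little data:

```python
# A minimal sketch of partial pooling (shrinkage toward the grand mean);
# the variance components would normally be estimated, here they are assumed.
def partial_pool(level_mean, grand_mean, n, var_within, var_between):
    """Shrink a per-level mean toward the grand mean.

    The weight on the level's own mean grows with n and with the
    between-level variance, so noisy, rarely seen levels shrink the most.
    """
    w = var_between / (var_between + var_within / n)
    return w * level_mean + (1.0 - w) * grand_mean

# A level seen only twice is pulled strongly toward the grand mean...
print(partial_pool(10.0, 5.0, n=2, var_within=4.0, var_between=1.0))
# ...while a well-observed level keeps most of its own estimate.
print(partial_pool(10.0, 5.0, n=200, var_within=4.0, var_between=1.0))
```

Rare levels get pulled strongly toward the grand mean, which is what makes partial pooling attractive for high-cardinality categorical variables.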
Please help us share and promote this article series, which should start in a couple of days. This should be a fun chance to share very powerful methods with your colleagues.
Please comment on the article here: Statistics – Win-Vector Blog
The post Getting the right uncertainties when fitting multilevel models appeared first on Statistical Modeling, Causal Inference, and Social Science.
The post Getting the right uncertainties when fitting multilevel models appeared first on All About Statistics.
Cesare Aloisi writes:
I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from you.
On page 261, a 95% confidence interval for the intercept in a certain group (County 26) is calculated using only the standard error of the “random effect” (the county-level error). The string is as follows:
coef(M1)$county[26,1] + c(-2,2)*se.ranef(M1)$county[26]
My understanding is that, since the group-level prediction (call it y.hat_j = coef(M1)$county[26,1]) is a linear combination of a global average and a group-level deviation from the average (y.hat_j = beta_0 + eta_j), then the variance of y.hat_j should involve the variances of beta_0 and eta_j and their covariance, not just the variance of eta_j, as the code on page 261 seems to imply. In other words:
Var(y.hat_j) = Var(beta_0) + Var(eta_j) + 2Cov(beta_0, eta_j)
Admittedly, lme4 does not provide an estimate for the last term, the covariance between “fixed” and “random” effects. Was the code used in the book to simplify the calculations, or was there some deeper reason to it that I failed to grasp?
My reply: The short answer is that it’s difficult to get this correct in lmer but very easy when using stan_lmer() in the rstanarm package. That’s what I recommend, and that’s what we’ll be doing in the 2nd edition of our book.
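The missing covariance term is easy to demonstrate numerically (a generic sketch with simulated draws, not lmer or rstanarm output): when beta_0 and eta_j are correlated, summing their variances gives the wrong uncertainty for beta_0 + eta_j.

```python
# Sketch: why Var(beta_0 + eta_j) needs the covariance term.
# We fake correlated "posterior draws" of the global intercept beta_0
# and a group deviation eta_j (made negatively correlated for illustration).
import random

random.seed(1)
n = 100_000
beta0 = [random.gauss(1.0, 0.5) for _ in range(n)]
eta = [-0.6 * b + random.gauss(0.0, 0.3) for b in beta0]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

yhat = [b + e for b, e in zip(beta0, eta)]
naive = var(beta0) + var(eta)   # ignores Cov(beta_0, eta_j)
print(var(yhat), naive)         # about 0.13 vs. about 0.43: they disagree
```

With posterior draws (as from stan_lmer), you simply compute the interval from draws of the sum beta_0 + eta_j, and the covariance is accounted for automatically.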
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science
The post Stan Weekly Roundup, 22 September 2017 appeared first on Statistical Modeling, Causal Inference, and Social Science.
The post Stan Weekly Roundup, 22 September 2017 appeared first on All About Statistics.
This week (and a bit from last week) in Stan: Paul-Christian Bürkner‘s paper on brms (a higher-level interface to RStan, which preceded rstanarm and is still widely used and recommended by our own devs) was just published as a JStatSoft article. If you follow the link, the abstract explains what brms does.
(video links added by Aki)
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science
The post Misclassification in Binary Choice Models appeared first on All About Statistics.
"…the literature has established that misclassification is pervasive and affects estimates, but not how it affects them or what can still be done with contaminated data. This paper characterizes the consequences of misclassification of the dependent variable in binary choice models and assesses whether substantive conclusions can still be drawn from the observed data and if so, which methods to do so work well. We first present a closed form solution for the bias in the linear probability model that allows for simple corrections. For non-linear binary choice models such as the Probit model, we decompose the asymptotic bias into four components. We derive closed form expressions for three bias components and an equation that determines the fourth component. The formulas imply that if misclassification is conditionally random, only the probabilities of misclassification are required to obtain the exact bias in the linear probability model and an approximation in the Probit model. If misclassification is related to the covariates, additional information on this relation is required to assess the (asymptotic) bias, but the results still imply a tendency for the bias to be in the opposite direction of the sign of the coefficient."
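For the linear probability model, the conditionally random case the abstract mentions has a well-known closed form: if the observed outcome flips 0-to-1 with probability a0 and 1-to-0 with probability a1, then E[Y_obs | X] = a0 + (1 - a0 - a1) P(Y = 1 | X), so every slope is attenuated by the factor (1 - a0 - a1) and can be corrected by dividing by it. A quick simulation of this (my sketch, not code from the paper):

```python
# Sketch of LPM attenuation under conditionally random misclassification:
# slopes shrink by the factor (1 - a0 - a1).
import random

random.seed(7)
a0, a1 = 0.1, 0.2           # P(obs=1 | true=0), P(obs=0 | true=1)
beta0, beta1 = 0.2, 0.5     # true linear probability model

n = 200_000
x = [random.random() for _ in range(n)]
y_true = [1 if random.random() < beta0 + beta1 * xi else 0 for xi in x]
y_obs = [(1 - yi) if random.random() < (a0 if yi == 0 else a1) else yi
         for yi in y_true]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

slope_obs = ols_slope(x, y_obs)
print(slope_obs)                    # near (1 - a0 - a1) * beta1 = 0.35
print(slope_obs / (1 - a0 - a1))    # the simple correction recovers ~0.5
```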
Please comment on the article here: Econometrics Beat: Dave Giles' Blog
The post Air rage update appeared first on Statistical Modeling, Causal Inference, and Social Science.
The post Air rage update appeared first on All About Statistics.
So. Marcus Crede, Carol Nickerson, and I published a letter in PPNAS criticizing the notorious “air rage” article. (Due to space limitations, our letter contained only a small subset of the many possible criticisms of that paper.) Our letter was called “Questionable association between front boarding and air rage.”
The authors of the original paper, Katherine DeCelles and Michael Norton, published a response in which they concede nothing. They state that their hypotheses “are predicated on decades of theoretical and empirical support across the social sciences” and they characterize their results as “consistent with theory.” I have no reason to dispute either of these claims, but at the same time these theories are so flexible that they could predict just about anything, including, I suspect, the very opposite of the claims made in the paper. As usual, there’s a confusion between a general scientific theory and some very specific claims regarding regression coefficients in some particular fitted model.
Considering the DeCelles and Norton reply in a context-free sense, it reads as reasonable: yes, it is possible for the signs and magnitudes of estimates to change when adding controls to a regression. The trouble is that their actual data seem to be of low quality, and due to the observational nature of their study, there are lots of interactions not included in the model that are possibly larger than their main effects (for example, interactions of plane configuration with type of flight, interactions with alcohol consumption, nonlinearities in the continuous predictors such as number of seats and flight difference).
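On the omitted-interactions point: when an interaction is larger than a main effect, a regression without it reports a coefficient that averages two quite different sub-group effects. A contrived simulation (purely illustrative, with no connection to the actual air-rage data):

```python
# Contrived illustration: when a large interaction is omitted, the fitted
# "main effect" is an average of opposing sub-group effects and can mislead.
import random

random.seed(3)
n = 100_000
z = [random.randint(0, 1) for _ in range(n)]   # a binary moderator
x = [random.randint(0, 1) for _ in range(n)]   # the predictor of interest
# True model: effect of x is +1 when z=0 but -2 when z=1 (the interaction
# dominates the main effect), plus noise.
y = [1.0 * xi - 3.0 * xi * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Regressing y on x alone reports roughly the average of +1 and -2:
print(ols_slope(x, y))   # about -0.5, masking both sub-group effects
```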
The whole thing is interesting in that it reveals the challenge of interpreting this sort of exchange from the outside: how it is possible for researchers to string together paragraphs that have the form of logical argument, in support of whatever claim they’d like to make. Of course someone could say the same about us. . . .
One good thing about slogans such as “correlation does not imply causation” is that they get right to the point.
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science
The post My advice on dplyr::mutate() appeared first on All About Statistics.
There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible research is an attempt to raise this standard). Production systems have to be durable: they have to remain correct as models, data, packages, users, and environments change over time.
Demonstration systems need merely glow in bright light among friends; production systems must be correct, even alone in the dark.
“Character is what you are in the dark.”
John Whorfin quoting Dwight L. Moody.
I have found: to deliver production worthy data science and predictive analytic systems, one has to develop per-team and per-project field tested recommendations and best practices. This is necessary even when, or especially when, these procedures differ from official doctrine.
What I want to do is share a single small piece of Win-Vector LLC‘s current guidance on using the R package dplyr.
dplyr development. However:
“One need not have been Caesar in order to understand Caesar.”
Alternately: Georg Simmel or Max Weber.
Win-Vector LLC, as a consultancy, has experience helping large companies deploy enterprise big data solutions involving R, dplyr, sparklyr, and Apache Spark. Win-Vector LLC, as a training organization, has experience in how new users perceive, reason about, and internalize how to use R and dplyr. Our group knows how to help deploy production grade systems, and how to help new users master these systems.
From experience we have distilled a lot of best practices. And below we will share one.
From: “R for Data Science; Wickham, Grolemund; O’Reilly, 2017” we have:
Note that you can refer to columns that you’ve just created:
    mutate(flights_sml,
      gain = arr_delay - dep_delay,
      hours = air_time / 60,
      gain_per_hour = gain / hours
    )
Let’s try that with database backed data:
    suppressPackageStartupMessages(library("dplyr"))
    packageVersion("dplyr")
    # [1] ‘0.7.3’

    db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    flights <- copy_to(db, nycflights13::flights, 'flights')

    mutate(flights,
      gain = arr_delay - dep_delay,
      hours = air_time / 60,
      gain_per_hour = gain / hours
    )
    # # Source: lazy query [?? x 22]
    # # Database: sqlite 3.19.3 [:memory:]
    #    year month   day dep_time sched_dep_time ...
    #   <int> <int> <int>    <int>          <int> ...
    # 1  2013     1     1      517            515 ...
    # ...
That worked. One of the selling points of dplyr is that a lot of dplyr is source-generic or source-agnostic: meaning it can be run against different data providers (in-memory, databases, Spark).
However, if a new user tries to extend such an example (say, adding gain_per_minute) they run into this:
    mutate(flights,
      gain = arr_delay - dep_delay,
      hours = air_time / 60,
      gain_per_hour = gain / hours,
      gain_per_minute = 60 * gain_per_hour
    )
    # Error in rsqlite_send_query(conn@ptr, statement) :
    #   no such column: gain_per_hour
(Some details on the failing query are here.)
It is hard for experts to understand how frustrating the above is to a new R user or to a part-time R user. It feels like any variation on the original code causes it to fail. None of the rules they have been taught anticipate this, or tell them how to get out of this situation.
This quickly leads to strong feelings of learned helplessness and anxiety.
Our rule for dplyr::mutate() has been for some time:
Each column name used in a single mutate must appear only on the left-hand-side of a single assignment, or otherwise on the right-hand-side of any number of assignments (but never both sides, even if it is different assignments).
Under this rule neither of the above mutates is allowed. The second should be written as (switching to pipe-notation):
    flights %>%
      mutate(gain = arr_delay - dep_delay,
             hours = air_time / 60) %>%
      mutate(gain_per_hour = gain / hours) %>%
      mutate(gain_per_minute = 60 * gain_per_hour)
And the above works.
If we teach this rule we can train users to be properly cautious, and hopefully avoid them becoming frustrated, scared, anxious, or angry.
The dplyr documentation (such as “help(mutate)”) does not strongly commit to what order mutate expressions are executed in, or to the visibility and durability of intermediate results (i.e., a full description of the intended semantics). Our rule intentionally limits the user to a set of circumstances where none of those questions matter.
Now the error we saw above is a mere bug that one expects will be fixed some day (in fact it is dplyr issue 3095; we looked a bit at the generated queries here). It can be a bit unfair to criticize a package for having a bug.
However, confusion around re-use of column names has been driving dplyr issues for quite some time:

- dplyr issue 3095
- dplyr issue 2884
- dplyr issue 2883
- dplyr pull 2869
- dplyr issue 2842
- dplyr pull 2483
- dplyr issue 2481
- dplyr issue 2360

It makes sense to work in a reliable and teachable sub-dialect of dplyr that will serve users well (or, barring that, you can use an adapter, such as seplyr). In production you must code to what systems are historically reliably capable of, not just the specification. “Works for the instructor” is not an acceptable level of dependability.
Please comment on the article here: Statistics – Win-Vector Blog