
The post Just google “Despite limited statistical power” appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Just google “Despite limited statistical power” appeared first on All About Statistics.

Here it is. It’s not always clear what people mean by this expression, but sometimes it seems that they’re making the “What does not kill my statistical significance makes it stronger” fallacy, thinking that the attainment of statistical significance is a particular feat in the context of a noisy study, so that they’re (mistakenly) thinking of the “limited statistical power” of that study as a further point in favor of their argument.

More from Eric Loken and me here.


The post Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R appeared first on All About Statistics.

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected. Here is an example involving the built-in data set SASHELP.CLASS.

Here is the code:

```sas
data c1;
    set sashelp.class;
    * define a new character variable to classify someone as tall or short;
    if height > 60 then height_class = 'Tall';
    else height_class = 'Short';
run;

* print the results for the first 5 rows;
proc print data = c1 (obs = 5);
run;
```

Here is the result:

| Obs | Name | Sex | Age | Height | Weight | height_class |
|---|---|---|---|---|---|---|
| 1 | Alfred | M | 14 | 69.0 | 112.5 | Tall |
| 2 | Alice | F | 13 | 56.5 | 84.0 | Shor |
| 3 | Barbara | F | 13 | 65.3 | 98.0 | Tall |
| 4 | Carol | F | 14 | 62.8 | 102.5 | Tall |
| 5 | Henry | M | 14 | 63.5 | 102.5 | Tall |

What happened? Why does the word “Short” render as “Shor”?

This occurred because SAS sets the length of a new character variable to the length of the first value assigned in its definition. My code defined “height_class” by setting the value “Tall” first, which has a length of 4, so “height_class” became a character variable with a length of 4. Any longer value assigned afterward, such as “Short”, is silently truncated to that length.
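The rule is easy to state outside SAS as well. As a rough emulation in plain Python (this is my sketch of the behavior, not SAS itself; the helper `sas_assign` is a hypothetical name), assigning into a fixed-width character variable amounts to:

```python
def sas_assign(value, width):
    """Mimic a SAS character variable: the width is fixed, extra characters are dropped."""
    return value[:width]

width = len('Tall')                  # SAS infers the length from the first value: 4
print(sas_assign('Short', width))    # prints 'Shor', the truncation seen above
print(sas_assign('Short', 5))        # prints 'Short' once the width is pre-set to 5
```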

How can we circumvent this? You can pre-set the length of any new character variable with the LENGTH statement, placed before the variable is first assigned a value. In the revised code below, I correct the problem by setting the length of “height_class” to 5 before defining its possible values.

```sas
data c2;
    set sashelp.class;
    * define a new character variable to classify someone as tall or short;
    length height_class $ 5;
    if height > 60 then height_class = 'Tall';
    else height_class = 'Short';
run;

* print the results for the first 5 rows;
proc print data = c2 (obs = 5);
run;
```

Here is the result:

| Obs | Name | Sex | Age | Height | Weight | height_class |
|---|---|---|---|---|---|---|
| 1 | Alfred | M | 14 | 69.0 | 112.5 | Tall |
| 2 | Alice | F | 13 | 56.5 | 84.0 | Short |
| 3 | Barbara | F | 13 | 65.3 | 98.0 | Tall |
| 4 | Carol | F | 14 | 62.8 | 102.5 | Tall |
| 5 | Henry | M | 14 | 63.5 | 102.5 | Tall |

Notice that “height_class” for Alice is “Short”, as it should be.

An alternative solution is to rewrite the code so that the first value assigned to “height_class” is the longest possible one. This does not require the LENGTH statement.

```sas
data c3;
    set sashelp.class;
    * define a new character variable to classify someone as tall or short;
    if height < 60 then height_class = 'Short';
    else height_class = 'Tall';
run;
```

By the way, I don’t notice this problem in R. Here is some code to illustrate this observation.

```r
> set.seed(235)
>
> # randomly generate 3 values
> x = rnorm(3, 60, 5)
>
> # add a value to the beginning of "x" so that the first value is above 60
> # add a value to the end of "x" so that the last value is below 60
> x = c(63, x, 57)
> x
[1] 63.00000 70.68902 61.36082 56.62601 57.00000
>
> # pre-allocate a vector for classifying "x" as "tall" or "short"
> y = 0 * x
>
> for (i in 1:length(x))
+ {
+    if (x[i] > 60)
+    {
+       y[i] = 'Tall'
+    }
+    else
+    {
+       y[i] = 'Short'
+    }
+ }
>
> # display "y"
> y
[1] "Tall"  "Tall"  "Tall"  "Short" "Short"
```

Notice that the value “Short” renders fully with a length of 5. I did not need to pre-set the length of “y” first.


**Please comment on the article here:** **Statistics – The Chemical Statistician**


The post Thank You For The Very Nice Comment appeared first on All About Statistics.

Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course.

> Thanks for a wonderful course on DataCamp on `XGBoost` and `Random forest`. I was struggling with `Xgboost` earlier and `Vtreat` has made my life easy now :).

Supervised Learning in R: Regression covers a *lot*, as it treats predicting probabilities as a type of regression. Nina and I are very proud of this course and think it is very much worth your time (for the beginning through advanced `R` user).

`vtreat` is a statistically sound data cleaning and preparation tool introduced towards the end of the course. `R` users who try `vtreat` find it makes training and applying models *much* easier.

`vtreat` is distributed as a free open-source package available on `CRAN`. If you are doing predictive modeling in `R`, I honestly think you will find `vtreat` invaluable.

And to the person who took the time to write the nice note: a sincere thank you from both Nina Zumel and myself. That kind of interaction really makes developing courses and packages feel worthwhile.

**Please comment on the article here:** **Statistics – Win-Vector Blog**



The post Also holding back progress are those who make mistakes and then label correct arguments as “nonsensical.” appeared first on Statistical Modeling, Causal Inference, and Social Science.


Here’s James Heckman in 2013:

Also holding back progress are those who claim that Perry and ABC are experiments with samples too small to accurately predict widespread impact and return on investment. This is a nonsensical argument. Their relatively small sample sizes actually speak for — not against — the strength of their findings. Dramatic differences between treatment and control-group outcomes are usually not found in small sample experiments, yet the differences in Perry and ABC are big and consistent in rigorous analyses of these data.

Wow. The “What does not kill my statistical significance makes it stronger” fallacy, right there in black and white. This one’s even better than the quote I used in my blog post. Heckman’s pretty much saying that if his results are statistically significant (and “consistent in rigorous analyses,” whatever that means), they should be believed—and even more so if sample sizes are small (and of course the same argument holds in favor of stronger belief if measurement error is large).

With the extra special bonus that he’s labeling contrary arguments as “nonsensical.”

I agree with Stuart Buck that Heckman is wrong here. Actually, the smaller sample sizes (and also the high variation in these studies) speak against—not for—the strength of the published claims.

Hey, we all make mistakes. Selection bias is a tricky thing, and it can confuse even some eminent econometricians. What’s important is that we can learn from them, as I hope Heckman and his collaborators have learned from the savvy critiques of Stuart Buck and others.

**P.S.** These ideas are not trivial, but they’re not super-technical either. You, blog reader, can follow the links and think things through and realize that James Heckman, Nobel prize winner, was wrong here.

What’s the difference between you and James Heckman, Nobel prize winner? It’s very simple. It’s not that you’re better at math than James Heckman, Nobel prize winner, or that you know more about early childhood education, or that you have a deeper understanding of statistics than he does. Maybe you do, maybe you don’t.

The difference—and it’s almost the only difference that matters here—is that you’re willing to consider the possibility that you might be wrong. And Heckman isn’t willing to consider that possibility. Or he hasn’t been. It’s never too late for him to change, though.



The post mc-stan.org down & single points of failure appeared first on Statistical Modeling, Causal Inference, and Social Science.


**[update: back up. whew. back to our regularly scheduled programming.]**

*[update: just talked to our registrar on the phone and they say it’ll probably take an hour or two for the DNS to catch up again, but then everything should be OK. I would highly recommend PairNIC—their support was awesome.]*

mc-stan.org is down because I (Bob) forgot to renew the domain. I just renewed it, so at least we won’t lose it. Hopefully everything will start routing again—the content is still there to be served on GitHub.

**5-year anniversary**

It was registered 5 years ago, roughly concurrently with our first release in August 2012. Time sure flies.

It just lapsed today, but I was able to renew. We’ll get this under better management going forward with the rest of our assets.

**Backup web site?**

I don’t know if we can run mc-stan.github.io in the meantime—we should be able to, but I don’t know how to configure that or what the switching back and forth times are, so it may not be worth it.

**Single points of failure**

This is really terrible for us since almost everything we do is now run out of the web site (other than GitHub, so we can keep coding, but our users can’t get anything).


The post Gelman digested read appeared first on All About Statistics.

It's hard to keep up with Andrew Gelman, so let me point to some interesting recent posts from his blog.

Readings on philosophy of statistics (link): Andrew has a bunch of links of (mostly his own) writings about deep statistical issues. Science is about understanding how the world works, which involves questions of cause and effect, and randomness and unexplained variability. Data that can be observed are almost never sufficient to establish cause decisively but statistical theories can be drawn upon to make careful, principled conjectures. These statistical methods are not infallible, and are subject to abuses, both malign and unintentional. Recent work has uncovered that lots of results from all kinds of fields (psychology, social psychology, evolutionary psychology, medicine, cancer studies, etc.) cannot be replicated, raising concerns about abuses. Andrew - as well as commentators - compile a list of readings for those interested in this ongoing controversy.

An elementary error showing up in JAMA (link): misinterpreting p-values - every elementary textbook warns against such erroneous claims

Something for bird lovers (link) and for cat lovers (link)

Another one on Gelman's favorite subject - the garden of forking paths leading to over-confident statistical conclusions. I once summarized his arguments in a series of posts: 1, 2, 3

Some commentary on Mechanical Turk and the general issue of measurement and data quality (link). This is an important topic in Big Data. I will be writing about a study that uses weather as the explanatory variable. Weather is derived by looking up someone's IP address and then pulling the weather report for that location. One should ask how accurate this measurement of weather was for the study.

**Please comment on the article here:** **Big Data, Plainly Spoken (aka Numbers Rule Your World)**



The post Use a bar chart to visualize pairwise correlations appeared first on The DO Loop.


Visualizing correlations often provides insight into the relationships between variables. I've previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables.

Recently a SAS programmer asked how to construct a bar chart that displays the pairwise correlations between variables. This visualization enables you to quickly identify pairs of variables that have large negative correlations, large positive correlations, and insignificant correlations.

In SAS, PROC CORR computes the correlations between variables, which are stored in matrix form in the output data set. The following call to PROC CORR analyzes the correlations between all pairs of numeric variables in the Sashelp.Heart data set, which contains data for 5,209 patients in a medical study of heart disease. Because of missing values, some pairwise correlations use more observations than others.

```sas
ods exclude all;
proc corr data=sashelp.Heart;        /* pairwise correlation */
   var _NUMERIC_;
   ods output PearsonCorr = Corr;    /* write correlations, p-values, and sample sizes to data set */
run;
ods exclude none;
```

The CORR data set contains the correlation matrix, p-values, and sample sizes. The statistics are stored in "wide form," with few rows and many columns. As I previously discussed, you can use the HEATMAPCONT subroutine in SAS/IML to quickly visualize the correlation matrix:

```sas
proc iml;
use Corr;
read all var "Variable" into ColNames;   /* get names of variables */
read all var (ColNames) into mCorr;      /* matrix of correlations */
ProbNames = "P"+ColNames;                /* variables for p-values are named PX, PY, PZ, etc */
read all var (ProbNames) into mProb;     /* matrix of p-values */
close Corr;

call HeatmapCont(mCorr) xvalues=ColNames yvalues=ColNames
     colorramp="ThreeColor" range={-1 1}
     title="Pairwise Correlation Matrix";
```

The heat map gives an overall impression of the correlations between variables, but it has some shortcomings. First, you can't determine the magnitudes of the correlations with much precision. Second, it is difficult to compare the relative sizes of correlations. For example, which is stronger: the correlation between systolic and diastolic blood pressure or the correlation between weight and MRW? (MRW is a body-weight index.)

These shortcomings are resolved if you present the pairwise correlations as a bar chart. To create a bar chart, it is necessary to convert the output into "long form." Each row in the new data set will represent a pairwise correlation. To identify the row, you should also create a new variable that identifies the two variables whose correlation is represented. Because the correlation matrix is symmetric and has 1 on the diagonal, the long-form data set only needs the statistics for the lower-triangular portion of the correlation matrix.

Let's extract the data in SAS/IML. The following statements construct a new ID variable that identifies each new row and extract the correlations and p-values for the lower-triangular elements. The statistics are written to a SAS data set called CorrPairs. (In Base SAS, you can transform the lower-triangular statistics by using the DATA step and arrays, similar to the approach in this SAS note; feel free to post your Base SAS code in the comments.)

```sas
numCols = ncol(mCorr);                /* number of variables */
numPairs = numCols*(numCols-1) / 2;
length = 2*nleng(ColNames) + 5;       /* max length of new ID variable */
PairNames = j(NumPairs, 1, BlankStr(length));
i = 1;
do row = 2 to numCols;                /* construct the pairwise names */
   do col = 1 to row-1;
      PairNames[i] = strip(ColNames[col]) + " vs. " + strip(ColNames[row]);
      i = i + 1;
   end;
end;

lowerIdx = loc(row(mCorr) > col(mCorr));          /* indices of lower-triangular elements */
Corr = mCorr[ lowerIdx ];
Prob = mProb[ lowerIdx ];
Significant = choose(Prob > 0.05, "No ", "Yes");  /* use alpha=0.05 signif level */
create CorrPairs var {"PairNames" "Corr" "Prob" "Significant"};
append;
close;
QUIT;
```

You can use the HBAR statement in PROC SGPLOT to construct the bar chart. This bar chart contains 45 rows, so you need to make the graph tall and use a small font to fit all the labels without overlapping. The call to PROC SORT and the DISCRETEORDER=DATA option on the YAXIS statement ensure that the categories are displayed in order of increasing correlation.

```sas
proc sort data=CorrPairs; by Corr; run;

ods graphics / width=600px height=800px;
title "Pairwise Correlations";
proc sgplot data=CorrPairs;
hbar PairNames / response=Corr group=Significant;
refline 0 / axis=x;
yaxis discreteorder=data display=(nolabel)
      labelattrs=(size=6pt) fitpolicy=none
      offsetmin=0.012 offsetmax=0.012    /* half of 1/k, where k=number of categories */
      colorbands=even colorbandsattrs=(color=gray transparency=0.9);
xaxis grid display=(nolabel);
keylegend / position=topright location=inside across=1;
run;
```

The bar chart (click to enlarge) enables you to see which pairs of variables are highly correlated (positively and negatively) and which have correlations that are not significantly different from 0. You can use additional colors or reference lines if you want to visually emphasize other features, such as the correlations that are larger than 0.25 in absolute value.

The bar chart is not perfect. This example, which analyzes 10 variables, is very tall with 45 rows. Among *k* variables there are *k*(*k*-1)/2 correlations, so the number of pairwise correlations (rows) increases quadratically with the number of variables. In practice, this chart would be unreasonably tall when there are 14 or 15 variables (about 100 rows).
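The quadratic growth is easy to check with the article's own formula. A quick arithmetic sketch (Python here just for the counting; the function name `num_pairwise` is mine, not from the post):

```python
def num_pairwise(k):
    """Number of distinct pairwise correlations (bars) among k variables: k*(k-1)/2."""
    return k * (k - 1) // 2

print(num_pairwise(10))   # 45 rows, as in the example
print(num_pairwise(15))   # 105 rows, already unreasonably tall
```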

Nevertheless, for 10 or fewer variables, a bar chart of the pairwise correlations provides an alternative visualization that has some advantages over a heat map of the correlation matrix. What do you think? Would this graph be useful in your work? Leave a comment.


The post Performance or Probativeness? E.S. Pearson’s Statistical Philosophy appeared first on All About Statistics.

This is a belated birthday post for E.S. Pearson (11 August 1895 – 12 June 1980). It’s basically a post from 2012 which concerns an issue of interpretation (long-run performance vs. probativeness) that’s badly confused these days. I’ll blog some E. Pearson items this week, including my latest reflection on a historical anecdote regarding Egon and the woman he wanted to marry, and surely would have, were it not for his father Karl!

**HAPPY BELATED BIRTHDAY EGON!**

Are methods based on error probabilities of use mainly to supply procedures which will not err too frequently in some long run? (*performance*). Or is it the other way round: that the control of long-run error properties is of crucial importance for probing the causes of the data at hand? (*probativeness*). I say no to the former and yes to the latter. This, I think, was also the view of Egon Sharpe (E.S.) Pearson.

*Cases of Type A and Type B*

“How far then, can one go in giving precision to a philosophy of statistical inference?” (Pearson 1947, 172)

Pearson considers the rationale that might be given to N-P tests in two types of cases, A and B:

“(A) At one extreme we have the case where repeated decisions must be made on results obtained from some routine procedure…

(B) At the other is the situation where statistical tools are applied to an isolated investigation of considerable importance…?” (ibid., 170)

In cases of type A, long-run results are clearly of interest, while in cases of type B, repetition is impossible and may be irrelevant:

“In other and, no doubt, more numerous cases there is no repetition of the same type of trial or experiment, but all the same we can and many of us do use the same test rules to guide our decision, following the analysis of an isolated set of numerical data. Why do we do this? What are the springs of decision? Is it because the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment?

Or is it because we are content that the application of a rule, now in this investigation, now in that, should result in a long-run frequency of errors in judgment which we control at a low figure?” (Ibid., 173)

Although Pearson leaves this tantalizing question unanswered, claiming, “On this I should not care to dogmatize”, in studying how Pearson treats cases of type B, it is evident that in his view, “the formulation of the case in terms of hypothetical repetition helps to that clarity of view needed for sound judgment” in learning about the particular case at hand.

“Whereas when tackling problem A it is easy to convince the practical man of the value of a probability construct related to frequency of occurrence, in problem B the argument that ‘if we were to repeatedly do so and so, such and such result would follow in the long run’ is at once met by the commonsense answer that we never should carry out a precisely similar trial again.

Nevertheless, it is clear that the scientist with a knowledge of statistical method behind him can make his contribution to a round-table discussion…” (Ibid., 171).

Pearson gives the following example of a case of type B (from his wartime work), where he claims no repetition is intended:

“Example of type B. Two types of heavy armour-piercing naval shell of the same caliber are under consideration; they may be of different design or made by different firms…. Twelve shells of one kind and eight of the other have been fired; two of the former and five of the latter failed to perforate the plate….” (Pearson 1947, 171)

“Starting from the basis that individual shells will never be identical in armour-piercing qualities, however good the control of production, he has to consider how much of the difference between (i) two failures out of twelve and (ii) five failures out of eight is likely to be due to this inevitable variability….” (Ibid.)

*We’re interested in considering what other outcomes could have occurred, and how readily, in order to learn what variability alone is capable of producing.* As a noteworthy aside, Pearson shows that treating the observed difference (between the two proportions) in one way yields an observed significance level of 0.052; treating it differently (along Barnard’s lines), he gets 0.025 as the (upper) significance level. But in scientific cases, Pearson insists, the difference in error probabilities makes no real difference to substantive judgments in interpreting the results. Only in an unthinking, automatic, routine use of tests would it matter:

“Were the action taken to be decided automatically by the side of the 5% level on which the observation point fell, it is clear that the method of analysis used would here be of vital importance. But no responsible statistician, faced with an investigation of this character, would follow an automatic probability rule.” (ibid., 192)

The two analyses correspond to the tests effectively asking different questions, and if we recognize this, says Pearson, different meanings may be appropriately attached.
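As an aside (my reconstruction, not in Pearson's paper), the first of Pearson's two figures matches what a conditional exact (hypergeometric, Fisher-style) analysis of the 2-of-12 versus 5-of-8 table yields. A short standard-library sketch, with `one_sided_exact_p` a name of my own:

```python
from math import comb

def one_sided_exact_p(a, b, c, d):
    """P(cell (0,0) count <= a), conditioning on the margins of the 2x2 table
    [[a, b], [c, d]] -- the hypergeometric (Fisher-exact) null distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 - (n - col1))   # smallest feasible cell-(0,0) count
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(lo, a + 1)) / comb(n, row1)

# 2 failures out of 12 shells of one type vs 5 failures out of 8 of the other:
p = one_sided_exact_p(2, 10, 5, 3)
print(round(p, 3))   # 0.052, matching the first of Pearson's two significance levels
```

The 0.025 figure comes from treating the comparison along Barnard's (unconditional) lines, which this sketch does not attempt.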

*Three Steps in the Original Construction of Tests*

After setting up the test (or null) hypothesis, and the alternative hypotheses against which “we wish the test to have maximum discriminating power” (Pearson 1947, 173), Pearson defines three steps in specifying tests:

“Step 1. We must specify the experimental probability set, the set of results which could follow on repeated application of the random process used in the collection of the data…

Step 2. We then divide this set [of possible results] by a system of ordered boundaries…such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined on the Information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts”.

“Step 3. We then, if possible[i], associate with each contour level the chance that, if [the null] is true, a result will occur in random sampling lying beyond that level” (ibid.).

Pearson warns that:

“Although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order.” (Ibid. 173).

Strict behavioristic formulations jump from step 1 to step 3, after which one may calculate how the test has in effect accomplished step 2. However, the resulting test, while having adequate error probabilities, may have an inadequate distance measure and may even be irrelevant to the hypothesis of interest. This is one reason critics can construct howlers that appear to be licensed by N-P methods, and which make their way from time to time into this blog.

So step 3 remains crucial, even for cases of type [B]. There are two reasons: pre-data planning—that’s familiar enough—but secondly, for post-data scrutiny. Post-data, step 3 enables determining the capability of the test to have detected various discrepancies, departures, and errors, on which a critical scrutiny of the inferences is based. More specifically, the error probabilities are used to determine how well or poorly corroborated, or how severely tested, various claims are, post-data.

If we can readily bring about statistically significantly higher rates of success with the first type of armour-piercing naval shell than with the second (in the above example), we have evidence the first is superior. Or, as Pearson modestly puts it: the results “raise considerable doubts as to whether the performance of the [second] type of shell was as good as that of the [first]….” (Ibid., 192)[ii]

Still, while error rates of procedures may be used to determine how severely claims have or have not passed, they do not automatically do so—hence, again, opening the door to potential howlers that neither Egon nor Jerzy, for that matter, would have countenanced.

*Neyman Was the More Behavioristic of the Two*

Pearson was (rightly) considered to have rejected the more behaviorist leanings of Neyman.

Here’s a snippet from an unpublished letter he wrote to Birnbaum (1974) about the idea that the N-P theory admits of two interpretations: behavioral and evidential:

“I think you will pick up here and there in my own papers signs of evidentiality, and you can say now that we or I should have stated clearly the difference between the behavioral and evidential interpretations. Certainly we have suffered since in the way the people have concentrated (to an absurd extent often) on behavioral interpretations”.

In Pearson’s (1955) response to Fisher (blogged here):

“To dispel the picture of the Russian technological bogey, I might recall how certain early ideas came into my head as I sat on a gate overlooking an experimental blackcurrant plot….!” (Pearson 1955, 204)

“To the best of my ability I was searching for a way of expressing in mathematical terms what appeared to me to be the requirements of the scientist in applying statistical tests to his data. After contact was made with Neyman in 1926, the development of a joint mathematical theory proceeded much more surely; it was not till after the main lines of this theory had taken shape with its necessary formalization in terms of critical regions, the class of admissible hypotheses, the two sources of error, the power function, etc., that the fact that there was a remarkable parallelism of ideas in the field of acceptance sampling became apparent. Abraham Wald’s contributions to decision theory of ten to fifteen years later were perhaps strongly influenced by acceptance sampling problems, but that is another story.“ (ibid., 204-5).

“It may be readily agreed that in the first Neyman and Pearson paper of 1928, more space might have been given to discussing how the scientific worker’s attitude of mind could be related to the formal structure of the mathematical probability theory….Nevertheless it should be clear from the first paragraph of this paper that we were not speaking of the final acceptance or rejection of a scientific hypothesis on the basis of statistical analysis…. Indeed, from the start we shared Professor Fisher’s view that in scientific enquiry, a statistical test is ‘a means of learning”… (Ibid., 206)

“Professor Fisher’s final criticism concerns the use of the term ‘inductive behavior’; this is Professor Neyman’s field rather than mine.” (Ibid., 207)

__________________________


**References:**

Pearson, E. S. (1947), “The choice of Statistical Tests illustrated on the Interpretation of Data Classed in a 2×2 Table,” *Biometrika* 34(1/2): 139-167.

Pearson, E. S. (1955), “Statistical Concepts and Their Relationship to Reality” *Journal of the Royal Statistical Society, Series B, (Methodological)*, 17(2): 204-207.

Neyman, J. and Pearson, E. S. (1928), “On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I.” *Biometrika* 20(A): 175-240.

[i] In some cases only an upper limit to this error probability may be found.

[ii] Pearson inadvertently switches from number of failures to number of successes in the conclusion of this paper.


**Please comment on the article here:** **Statistics – Error Statistics Philosophy**

The post Performance or Probativeness? E.S. Pearson’s Statistical Philosophy appeared first on All About Statistics.


The post The Pandora Principle in statistics — and its malign converse, the ostrich appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post The Pandora Principle in statistics — and its malign converse, the ostrich appeared first on All About Statistics.

The Pandora Principle is that once you’ve considered a possible interaction or bias or confounder, you can’t un-think it. The malign converse is when people realize this and then design their studies to avoid putting themselves in a position where they have to consider some potentially important factor.

For example, suppose you’re considering some policy intervention that can be done in several different ways, or conducted in several different contexts. The recommended approach is, if possible, to try out different realistic versions of the treatments in various realistic scenarios; you can then estimate an average treatment effect and also do your best to estimate variation in the effect (recognizing the difficulties inherent in that famous 1/16 efficiency ratio). An alternative, which one might call the reverse-Pandora approach, is to do a large study with just a single precise version of the treatment. This can give a cleaner estimate of the effect in that particular scenario, but extending it to the real world will require some modeling or assumption about how the effect might vary.

Going full ostrich here, one could simply carry over the estimated treatment effect from the simple experiment and not consider any variation at all. The idea would be that if you’d considered two or more flavors of treatment, you’d really have to consider the possibility of variation in effect, and propagate that into your decision making. But if you only consider one possibility, you could ostrich it and keep Pandora at bay. The ostrich approach might get you a publication and even some policy inference, but it’s bad science and, I think, bad policy.
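As a toy illustration of that tradeoff (this simulation and all its numbers are made up for the sake of example, not taken from any actual study), here is a quick sketch contrasting a design that tries several realistic versions of a treatment, which can say something about both the average effect and its variation, with a single-version design that gives a cleaner estimate but stays silent about variation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical truth: three realistic "flavors" of the intervention
# with genuinely different effects (unknown to the analyst).
true_effects = np.array([0.2, 0.5, 0.8])
n_per_arm = 200
sigma = 1.0

# Multi-version design: randomize units across all three flavors.
estimates = []
for effect in true_effects:
    treated = rng.normal(effect, sigma, n_per_arm)
    control = rng.normal(0.0, sigma, n_per_arm)
    estimates.append(treated.mean() - control.mean())
estimates = np.array(estimates)

avg_effect = estimates.mean()                 # average treatment effect
sd_across_versions = estimates.std(ddof=1)    # crude signal of effect variation

# Single-version ("ostrich") design: same total sample, one flavor only.
treated = rng.normal(true_effects[0], sigma, 3 * n_per_arm)
control = rng.normal(0.0, sigma, 3 * n_per_arm)
single_estimate = treated.mean() - control.mean()
# Cleaner standard error for that one flavor, but nothing at all about variation.
```

In the multi-version design, the spread of the three estimates is a (noisy) signal of real effect variation; the single-version design has no analogue of it, no matter how large the sample.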

That said, there’s no easy answer, as there will always be additional possible confounding factors that you will not have been able to explore. That is, among all the scary contents of Pandora’s box, one thing that flies out is another box, and really you should open that one too . . . that’s the Cantor principle, which we encounter in so many places in statistics.

**tl;dr:** You can’t put Pandora back in the box. But really she shouldn’t’ve been trapped in there in the first place.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post Update on inference with Wasserstein distances appeared first on All About Statistics.

Hi again,

As described in an earlier post, Espen Bernton, Mathieu Gerber, Christian P. Robert, and I are exploring Wasserstein distances for parameter inference in generative models. Generally, ABC and indirect inference are fun to play with, as they make the user think about useful distances between data sets (i.i.d. or not), something that is only implicit in classical likelihood-based approaches. Thinking about distances between data sets can be a helpful and healthy exercise, even if not always necessary for inference. Viewing data sets as empirical distributions leads to considering the Wasserstein distance, and we try to demonstrate in the paper that it leads to an appealing inferential toolbox.
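For concreteness, here is a minimal rejection-ABC sketch in that spirit, using the 1-d Wasserstein distance between observed and simulated samples as the discrepancy; in one dimension the distance reduces to comparing sorted samples. The Gaussian toy model, uniform prior, and acceptance threshold are all hypothetical choices for illustration, not the settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def wasserstein_1d(x, y):
    """W1 distance between two equal-size 1-d samples: mean absolute
    difference of the sorted values (the quantile coupling)."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

# Toy generative model: y ~ Normal(mu, 1), with prior mu ~ Uniform(-5, 5).
n_obs = 100
y_obs = rng.normal(2.0, 1.0, n_obs)   # pretend these are the observations

n_draws, threshold = 5000, 0.3
accepted = []
for _ in range(n_draws):
    mu = rng.uniform(-5.0, 5.0)           # draw a parameter from the prior
    y_sim = rng.normal(mu, 1.0, n_obs)    # simulate a synthetic data set
    if wasserstein_1d(y_obs, y_sim) < threshold:
        accepted.append(mu)               # keep parameters whose data look close

accepted = np.array(accepted)
# The accepted draws approximate an ABC posterior concentrating near mu = 2.
```

Sorting gives the exact Wasserstein distance only in one dimension; for multivariate data this is where the Hilbert space-filling curve, swapping, or exact transport computations discussed below come in.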

In passing, the first author, Espen Bernton, will be visiting Marco Cuturi, Christian Robert, Nicolas Chopin and others in Paris from September to January; get in touch with him if you’re over there!

We have just updated the arXiv version of the paper, and the main modifications are as follows.

- We propose a new distance between time series termed “curve-matching”, which turns out to be quite similar to dynamic time warping or Skorokhod distances. This distance might be particularly relevant for models generating non-stationary time series, such as Susceptible-Infected-Recovered models.
- Our theoretical results are generally improved. In particular, for the minimum Wasserstein/Kantorovich estimator and variants of it, the proofs are now based on the notion of epi-convergence, commonly used in optimization, and on various results from Rockafellar and Wets (2009), *Variational analysis*.
- On top of the Hilbert distance, based on the Hilbert space-filling curve, we consider the use of a swapping distance. So we have the Hilbert distance, computable in O(n log n), where n is the number of data points, the swapping distance in O(n²), and of course the exact Wasserstein distance in O(n³ log n). Various other distances are discussed in Section 6 of the paper.

- On the asymptotic behavior of ABC posteriors, our results now cover the use of Hilbert and swapping distances. This is thanks to the convenient property that the Hilbert distance is indeed a distance, and is always larger than the Wasserstein distance; we also rely on some of Mathieu’s recent results. And the swapping distance (if initialized with Hilbert sorting) is always sandwiched between Wasserstein and Hilbert.
- The numerical experiments have been revised: there is now a bivariate g-and-k example with comparisons to the actual posterior; the toggle switch example from systems biology is unchanged; and there is a new queueing model example, with comparisons to the actual posterior obtained with particle MCMC (we could have also used the method of Shestopaloff and Neal). Finally, we have a more detailed study of the Lévy-driven stochastic volatility model, with 10,000 observations. There we show how transport distances can be combined with summaries to estimate all model parameters (we previously got only four out of five parameters, using transport distances alone).
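The post describes the curve-matching distance as quite similar to dynamic time warping. As a rough point of reference (this is plain DTW, not the paper’s curve-matching distance), here is a minimal sketch of a warping-based discrepancy between two time series:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-d series,
    with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 50)
x = np.sin(t)
y = np.sin(t + 0.5)   # same shape, shifted in phase

d_self = dtw_distance(x, x)               # exactly 0 for identical series
d_shift = dtw_distance(x, y)
pointwise = float(np.sum(np.abs(x - y)))  # cost of the no-warp (diagonal) path
```

Unlike a pointwise comparison, the warping path can absorb the phase shift (so `d_shift` comes out below `pointwise`), which is the kind of flexibility one wants for non-stationary series.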

The supplementary materials for the new version are here (while the supplementary materials for the previous arXiv version are still online here).

**Please comment on the article here:** **Statistics – Statisfaction**

