The post Mathematical Statistics Lesson of the Day – Complete Statistics appeared first on All About Statistics.

The set-up for today’s post mirrors my earlier Statistics Lesson of the Day on sufficient statistics.

Suppose that you collected data

X₁, X₂, …, Xₙ

in order to **estimate** a **parameter** θ. Let f(x; θ) be the **probability density function (PDF)*** for X₁, X₂, …, Xₙ.

Let

T = u(X₁, X₂, …, Xₙ)

be a **statistic** based on X₁, X₂, …, Xₙ.

If

E[g(T)] = 0 for all θ

implies that

P[g(T) = 0] = 1 for all θ,

then T is said to be **complete**. To deconstruct this esoteric mathematical statement,

- if you want to use a measurable function g(T) of T to form an unbiased estimator of the zero function,
- and if the only such function is almost surely equal to the zero function,
- then T is a complete statistic.

I will discuss the intuition behind this bizarre definition in a later Statistics Lesson of the Day.

*The above definition holds for both discrete and continuous random variables; for discrete random variables, the PDF is replaced by the probability mass function (PMF).
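As an illustration (not from the original lesson): for a Bernoulli(θ) sample, the statistic T = X₁ + … + Xₙ is complete. The following Python sketch checks this for n = 3 by writing E[g(T)] as a polynomial in θ and showing that E[g(T)] = 0 for all θ forces g(0) = g(1) = g(2) = g(3) = 0.

```python
import numpy as np
from math import comb

n = 3  # sample size; T = sum of the X's follows a Binomial(n, theta) distribution

# E[g(T)] = sum_t g(t) * C(n,t) * theta^t * (1-theta)^(n-t).
# Expanding (1-theta)^(n-t), E[g(T)] is a polynomial in theta whose
# coefficient vector equals A @ g for the matrix A built below.
A = np.zeros((n + 1, n + 1))
for t in range(n + 1):
    for j in range(n - t + 1):
        A[t + j, t] += comb(n, t) * comb(n - t, j) * (-1) ** j

# E[g(T)] = 0 for all theta  <=>  A @ g = 0.  A has full rank,
# so the only solution is g = 0 almost surely: T is complete.
rank = np.linalg.matrix_rank(A)
print(rank == n + 1)  # True
```

The matrix A is triangular with the nonzero binomial coefficients C(n, t) on its diagonal, which is why the full-rank argument works for any n.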

Filed under: Mathematical Statistics, Mathematics, Probability, Statistics, Statistics Lesson of the Day Tagged: almost surely, complete statistic, completeness, estimation, mathematical statistics, point estimation, probability, probability density function, probability mass function, statistics, unbiased estimation, unbiased estimator

**Please comment on the article here:** **The Chemical Statistician » Statistics**


The post Le Monde puzzle [#887quater] appeared first on All About Statistics.

And yet another resolution of this combinatorics Le Monde mathematical puzzle: that puzzle puzzled many more people than usual! This solution is by Marco F, using a travelling salesman representation and existing TSP software.

N is a golden number if the sequence {1,2,…,N} can be reordered so that the sum of any consecutive pair is a perfect square. What are the golden numbers between 1 and 25?
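For small N, the puzzle can also be brute-forced directly, with no TSP machinery: finding a valid reordering is a Hamiltonian-path search on the "friends" graph. A Python sketch (not from the post, independent of the TSP solution below):

```python
from math import isqrt

def golden_order(N):
    """Return a reordering of 1..N in which every consecutive pair sums
    to a perfect square, or None if no such reordering exists."""
    def is_square(s):
        r = isqrt(s)
        return r * r == s

    # "friends" of i: all j != i such that i + j is a perfect square
    friends = {i: [j for j in range(1, N + 1) if j != i and is_square(i + j)]
               for i in range(1, N + 1)}

    def extend(path, remaining):
        if not remaining:
            return path
        for j in friends[path[-1]]:
            if j in remaining:
                result = extend(path + [j], remaining - {j})
                if result:
                    return result
        return None

    for start in range(1, N + 1):
        result = extend([start], set(range(1, N + 1)) - {start})
        if result:
            return result
    return None

golden = [N for N in range(2, 26) if golden_order(N)]
print(golden)  # the golden numbers between 1 and 25
```

Running this reports the golden numbers up to 25 in well under a second; for N = 199 the backtracking search becomes slow, which is where the TSP formulation below pays off.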

For instance, take n = 199. You should first calculate the “friends” of each integer (the j’s such that i + j is a perfect square), and save them in a symmetric square matrix:

# friends of i: all j != i with i + j a perfect square (computed here; not shown in the original post)
friends <- lapply(1:199, function(i) setdiff(which(sqrt(i + 1:199) %% 1 == 0), i))
m1 <- matrix(Inf, nrow=199, ncol=199)
diag(m1) <- 0
for (i in 1:199) m1[i, friends[[i]]] <- 1

Export the distance matrix to a file (in TSPlib format):

library(TSP)
tsp <- TSP(m1)
tsp
image(tsp)
write_TSPLIB(tsp, "f199.TSPLIB")

And use a solver to obtain the results. The best solver for TSP is Concorde. There are online versions where you can submit jobs:

0 2 1000000 2 96 1000000 96 191 1000000 191 168 1000000 ...

The numbers of the solution are in the second column (2, 96, 191, 168…). And they are 0-indexed, so you have to add 1 to them:

3 97 192 169 155 101 188 136 120 49 176 148 108 181 143 113 112 84 37 63 18 31 33 88 168 193 96 160 129 127 162 199 90 79 177 147 78 22 122 167 194 130 39 157 99 190 134 91 198 58 23 41 128 196 60 21 100 189 172 152 73 183 106 38 131 125 164 197 59 110 146 178 111 145 80 20 61 135 121 75 6 94 195 166 123 133 156 69 52 144 81 40 9 72 184 12 24 57 87 82 62 19 45 76 180 109 116 173 151 74 26 95 161 163 126 43 153 171 54 27 117 139 30 70 11 89 107 118 138 186 103 66 159 165 124 132 93 28 8 17 32 4 5 44 77 179 182 142 83 86 14 50 175 114 55 141 115 29 92 104 185 71 10 15 34 2 7 42 154 170 191 98 158 67 102 187 137 119 25 56 65 35 46 150 174 51 13 68 53 47 149 140 85 36 64 105 16 48

Filed under: Books, Kids, R, Statistics, University life Tagged: Le Monde, mathematical puzzle, travelling salesman, Concorde

**Please comment on the article here:** **Xi'an's Og » R**



The post Quantitative literacy is tough! Or, I had no idea that, in 1958, 96% of Americans disapproved of interracial marriage! appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Quantitative literacy is tough! Or, I had no idea that, in 1958, 96% of Americans disapproved of interracial marriage! appeared first on All About Statistics.

Mark Palko linked to this data-rich cartoon by Randall Munroe:

And I was stunned, first by the data on interracial marriage and then, retrospectively, by my earlier ignorance of these data.

Was approval of interracial marriage only 4% in 1958? I had no idea. I looked it up at the Gallup site and it seems to be so. But it’s just hard for me to get my head around this. I mean, sure, I know that attitudes on racial issues have changed a lot, but still . . . 4 percent?? If you’d’ve asked me, I might have guessed 30 percent, or 20 percent, certainly I never would’ve given any number close to 4 percent.

I also learned from the Gallup report that “black-white marriages . . . still represent less than 1% of all married couples.” This sounded low to me, but then I thought about it: if 13% of Americans are black, and 7% of blacks marry white people, then this comes to 1% of all married couples. 7% is low, but it’s not as low as 1%.
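The two Gallup numbers are consistent with each other, as the back-of-the-envelope arithmetic shows (a quick sketch using the figures quoted above):

```python
black_share = 0.13         # share of Americans who are black
intermarriage_rate = 0.07  # share of black Americans married to a white spouse
share_of_all_couples = black_share * intermarriage_rate
print(share_of_all_couples)  # about 0.009, i.e. roughly 1% of all married couples
```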

In some ways that second number (the total percentage of marriages that are between blacks and whites) is somewhat of a trick question. Still, I’m surprised how far off my intuition was on both these numbers (the rate of approval of interracial marriage in 1958, and the current percentage of marriages that are between whites and blacks). Indeed, I still can’t fit the 1958 approval number into my understanding of public opinion.

I’m reminded of our discussion with Charles Murray a couple years ago regarding the claim that “it is morally wrong for a woman to bring a baby into the world knowing that it will not have a father.” Murray and several commenters seemed convinced that it is “obsessively nonjudgmental” to not think it is morally wrong for a woman etc. It was a fun and somewhat disturbing discussion because I really couldn’t understand these commenters and they really couldn’t understand me.

I similarly have difficulty understanding how 96% of Americans in 1958 could’ve disapproved of interracial marriages. I mean, sure, the data are there, and I guess I could fashion a logical argument, something along the lines of 50% were just flat-out prejudiced and 46% were not personally prejudiced but felt that, in practice, an interracial marriage probably wouldn’t work out in a prejudiced world. Still, I never would’ve guessed the numbers would be so high. My continued astonishment here is a sign to me that I need to further rejigger my mental model of public opinion to handle this data point.

In addition to this point having general statistical relevance—the idea that a single data point, like a single story (as discussed in my paper with Basbøll) has the potential to falsify a model and transform one’s worldview—it also relates to what I think is a fundamental issue in political science: as I wrote in that earlier discussion:

We often have the tendency to think that our political opponents agree with us, deep down, even if they don’t want to admit it. Hence you see Thomas Frank trying to explain the phenomenon of ordinary conservative voters, or various conservative politicians insisting that ordinary blacks and hispanics are fundamentally conservative and are voting for Democrats by mistake, or Charles Murray imagining that my friends and I agree with him that it’s wrong for a woman to have a baby without a male partner. . . . [and this contributes to] the difficulty of understanding across demographic, geographic, and partisan divides.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post New Australian data on the HMD appeared first on All About Statistics.

The Human Mortality Database is a wonderful resource for anyone interested in demographic data. It is a carefully curated collection of high quality deaths and population data from 37 countries, all in a consistent format with consistent definitions. I have used it many times and never cease to be amazed at the care taken to maintain such a great resource.

The data are continually being revised and updated. Today the Australian data were updated through 2011. There is a time lag because death registrations arrive late, which results in undercounts; so only data that are likely to be complete are included.

Tim Riffe from the HMD has provided the following information about the update:

- All death counts since 1964 are now included by year of occurrence, up to 2011. We have 2012 data but do not publish them because they are likely a 5% undercount due to lagged registration.
- Death count inputs for 1921 to 1963 are now in single ages. Previously they were in 5-year age groups. Rather than having an open age group of 85+ in this period counts usually go up to the maximum observed (stated) age. This change (i) introduces minor heaping in early years and (ii) implies different apparent old-age mortality than before, since previously anything above 85 was modeled according to the Methods Protocol.
- Population denominators have been swapped out for years 1992 to the present, owing to new ABS methodology and intercensal estimates for the recent period.

Some of the data can be read into R using the `hmd.mx` and `hmd.e0` functions from the demography package. Tim has his own package on github that provides a more extensive interface.

**Please comment on the article here:** **Hyndsight**


The post Confidence vs. Credibility Intervals appeared first on All About Statistics.

Tomorrow, for the final lecture of the *Mathematical Statistics* course, I will try to illustrate - using Monte Carlo simulations - the difference between classical statistics and the Bayesian approach.

The (simple) way I see it is the following,

- for frequentists, a probability is a measure of the frequency of repeated events, so the interpretation is that **parameters are fixed (but unknown), and data are random**
- for Bayesians, a probability is a measure of the degree of certainty about values, so the interpretation is that **parameters are random and data are fixed**

Or to quote Frequentism and Bayesianism: A Python-driven Primer, a Bayesian statistician would say "given our observed data, there is a 95% probability that the true value of θ falls within the credible region" while a Frequentist statistician would say "there is a 95% probability that when I compute a confidence interval from data of this sort, the true value of θ will fall within it".

To get more intuition about those quotes, consider a simple problem, with Bernoulli trials, with insurance claims. We want to derive some confidence interval for the probability of claiming a loss. There were n = 1047 policies, and 159 claims.

Consider the standard (frequentist) confidence interval. What does it mean that

x̄ ± 1.96 √(x̄(1 − x̄)/n)

is the (asymptotic) 95% confidence interval? The way I see it is very simple. Let us generate some samples, of size n, with the same probability as the empirical one, i.e. x̄ = 159/1047 (which is the meaning of "from data of this sort"). For each sample, compute the confidence interval with the relationship above. It is a 95% confidence interval because in 95% of the scenarios, the empirical value lies in the confidence interval. From a computational point of view, it is the following idea,

> xbar <- 159
> n <- 1047
> ns <- 100
> M=matrix(rbinom(n*ns,size=1,prob=xbar/n),nrow=n)

I generate 100 samples of size n. For each sample, I compute the mean, and the confidence interval, from the previous relationship

> fIC=function(x) mean(x)+c(-1,1)*1.96*sqrt(mean(x)*(1-mean(x)))/sqrt(n)
> IC=t(apply(M,2,fIC))
> MN=apply(M,2,mean)

Then we plot all those confidence intervals. In red when they do not contain the empirical mean

> k=(xbar/n<IC[,1])|(xbar/n>IC[,2])
> plot(MN,1:ns,xlim=range(IC),axes=FALSE,
+   xlab="",ylab="",pch=19,cex=.7,
+   col=c("blue","red")[1+k])
> axis(1)
> segments(IC[,1],1:ns,IC[,2],1:ns,
+   col=c("blue","red")[1+k])
> abline(v=xbar/n)

Now, what about the Bayesian credible interval? Assume that the prior distribution for the probability of claiming a loss is a Beta(1, 1) (i.e. uniform) distribution. We've seen in the course that, since the Beta distribution is the conjugate prior of the Bernoulli one, the posterior distribution will also be Beta. More precisely, the posterior here is

Beta(1 + Σxᵢ, 1 + n − Σxᵢ) = Beta(1 + 159, 1 + 1047 − 159)

Based on that property, the credible interval is based on quantiles of that (posterior) distribution

> u=seq(.1,.2,length=501)
> v=dbeta(u,1+xbar,1+n-xbar)
> plot(u,v,axes=FALSE,type="l")
> I=u<qbeta(.025,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],rep(0,sum(I))),
+   col="red",density=30,border=NA)
> I=u>qbeta(.975,1+xbar,1+n-xbar)
> polygon(c(u[I],rev(u[I])),c(v[I],rep(0,sum(I))),
+   col="red",density=30,border=NA)
> axis(1)

What does it mean, here, that we have a 95% credible interval? Well, this time, we do not draw using the empirical mean, but using some possible probability drawn from that posterior distribution (given the observations)

> pk <- rbeta(ns,1+xbar,1+n-xbar)

In green, below, we can visualize the histogram of those values

> hist(pk,prob=TRUE,col="light green",
+   border="white",axes=FALSE,
+   main="",xlab="",ylab="",lwd=3,xlim=c(.12,.18))

And here again, let us generate samples, and compute the empirical probabilities,

> M=matrix(rbinom(n*ns,size=1,prob=rep(pk,each=n)),nrow=n)
> MN=apply(M,2,mean)

Here, there is a 95% chance that those empirical means lie in the credible interval, defined using quantiles of the posterior distribution. We can actually visualize all those means: in black, the mean used to generate the sample, and then, in blue or red, the averages obtained on those simulated samples,

> abline(v=qbeta(c(.025,.975),1+xbar,1+n-xbar),
+   col="red",lty=2)
> points(pk,seq(1,40,length=ns),pch=19,cex=.7)
> k=(MN<qbeta(.025,1+xbar,1+n-xbar))|
+   (MN>qbeta(.975,1+xbar,1+n-xbar))
> points(MN,seq(1,40,length=ns),
+   pch=19,cex=.7,col=c("blue","red")[1+k])
> segments(MN,seq(1,40,length=ns),
+   pk,seq(1,40,length=ns),col="grey")
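As a numerical aside (a Python sketch, not part of the original R code): with the flat Beta(1, 1) prior and a sample this large, the equal-tailed 95% credible interval and the asymptotic 95% confidence interval nearly coincide, which is one reason the two plots look so similar. Here the Beta posterior is approximated by a normal with matched mean and variance, which is accurate at this sample size.

```python
xbar, n = 159, 1047
phat = xbar / n

# asymptotic 95% confidence interval (frequentist)
half = 1.96 * (phat * (1 - phat) / n) ** 0.5
ci = (phat - half, phat + half)

# Beta(1 + xbar, 1 + n - xbar) posterior; normal approximation to the
# equal-tailed 95% credible interval (mean and sd of the Beta posterior)
a, b = 1 + xbar, 1 + n - xbar
post_mean = a / (a + b)
post_sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
cred = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)

print(ci, cred)  # the two intervals agree to within a few thousandths
```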

More details and examples on Bayesian statistics, seen through the eyes of a (probably) non-Bayesian statistician, are in my slides from my talk in London, last Summer,

**Please comment on the article here:** **Freakonometrics » Statistics**



The post Leif and Uri need to hang out with a better class of statisticians appeared first on Statistical Modeling, Causal Inference, and Social Science.

The post Leif and Uri need to hang out with a better class of statisticians appeared first on All About Statistics.

Noted psychology researchers and methods skeptics Leif Nelson and Uri Simonsohn write:

A recent Psych Science (.pdf) paper found that sports teams can perform worse when they have too much talent.

For example, in Study 3 they found that NBA teams with a higher percentage of talented players win more games, but that teams with the highest levels of talented players win fewer games.

The hypothesis is easy enough to articulate, but pause for a moment and ask yourself, “How would you test it?”

So far, so good. But then they come up with this stunner:

If you are like everyone we talked to over the last several weeks, you would run a quadratic regression (y = β₀ + β₁x + β₂x²), check whether β₂ is significant, and whether plotting the resulting equation yields the predicted u-shape.

This is horrible! Not a negative comment on Leif and Uri, who don’t like that approach and suggest a different analysis (which I don’t love, but which I agree that for many purposes would be better than simply fitting a quadratic), but a negative comment on their social circle.

If “everyone you talk to over several weeks” gives a bad idea, maybe you should consider talking about statistics problems with more thoughtful and knowledgeable statisticians.

I’m not joking here.

But, before going on, let me emphasize that, although I have some disagreements with Leif and Uri on their methods, I generally think their post is clear and informative and like their general approach of forging strong links between the data, the statistical model, and the research question. Ultimately what’s most important in these sorts of problems is not picking “the right model” or “the right analysis” but, rather, understanding what the model is doing.

**Who should we be talking to?**

Now let me return to my contention that Leif and Uri are talking with the wrong people.

Perhaps it would help, when considering a statistical problem, to think about five classes of people who might be interested in the results and whom you might ask about methods:

1. *Completely non-quantitative people* who might be interested in the substantive claim (in this case, that sports teams can perform worse when they have too much talent) but have no interest in how it could be estimated from data.

2. *People with only a basic statistical education*: these might be “civilians” or they could be researchers—perhaps excellent researchers—who focus on the science and who rely on others to advise them on methods. These people might well be able to fit the quadratic regression being considered, and they could evaluate the advice coming from Leif and Uri, but they would not consider themselves statistical experts.

3. *Statisticians or methodologists* (I guess in psychology they’re called “psychometricians”) who trust their own judgment and might teach statistics or research methods and might have published some research articles on the topic. These people might make mistakes in controversial areas (recommending a 5th-degree polynomial control in a regression discontinuity analysis or, as in the example above, naively thinking that a quadratic regression fit demonstrates non-monotonicity).

4. *General experts in this area of statistics*: people such as Leif Nelson and Uri Simonsohn, or E. J. Wagenmakers, or various other people (including me!), who (a) possess general statistical knowledge and (b) have thought about, and may have even worked on, this sort of problem before, and can give out-of-the-box suggestions if appropriate.

5. *Experts in this particular subfield*, which might in this case include people who have analyzed a lot of sports data or statisticians who specialize in nonlinear models.

My guess is that the people Leif and Uri “talked to over the last several weeks” were in categories 2 and 3. This is fine—it’s useful to know what rank-and-file practitioners and methodologists would do—but it’s also a good idea to talk with some real experts! In some way, Leif and Uri don’t need this, as they themselves are experts, but I find that conversations with top people can give me insights.


**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**


The post The Wishart distribution: Covariance matrices for multivariate normal data appeared first on All About Statistics.

I've written about how to generate a sample from a multivariate normal (MVN) distribution in SAS by using the RANDNORMAL function in SAS/IML software. Last week a SAS/IML programmer showed me a program that simulated MVN data and computed the resulting covariance matrix for each simulated sample. The purpose of the program was to study properties of the covariance matrices.

The programmer was pleased when I told him that SAS/IML software provides a simpler and more efficient way to simulate covariance and correlation matrices for MVN data. You can generate the covariance matrices *directly* by using the RANDWISHART function, which generates matrices from the Wishart distribution.

Before thinking about covariance matrices for multivariate normal data, let's recall a theoretical result for univariate data: For a sample of size *n* drawn from a normal distribution, the sample variance (appropriately scaled) follows a chi-square distribution with *n*–1 degrees of freedom. This means that if you want to study properties of the sample variance, you don't need to generate normal data. Instead you can draw a random chi-square variate and rescale it to produce a sample variance. No normal samples required!

This result generalizes to multivariate normal data. If you draw a sample from a MVN distribution with covariance matrix Σ, the sample covariance matrix (appropriately scaled) has a sampling distribution that is called the Wishart distribution. You can think of the Wishart distribution as a multivariate generalization of the chi-square distribution. It is a distribution of symmetric positive-definite matrices. A random draw from the Wishart distribution is some matrix that, upon rescaling, is a covariance matrix for MVN data.

Suppose that you want to approximate the sampling distribution of the correlation coefficient between two correlated normal variables in a sample of size 50. The straightforward approach is to simulate 50 observations from the bivariate normal distribution, compute the correlation coefficient for the sample, and then repeat the process many times in order to approximate the distribution of the correlation coefficients. An implementation in PROC IML follows:

proc iml;
call randseed(12345);
N = 50;                        /* MVN sample size */
Sigma = {9 1,                  /* population covariance; correlation = 1/3 */
         1 1};
NumSamples = 1000;             /* number of samples in simulation */
/* First attempt: Generate MVN data; compute correlation from data */
corr = j(NumSamples, 1, .);    /* allocate space for results */
do i = 1 to NumSamples;
   X = randnormal(N, {0 0}, Sigma);   /* MVN sample of size 50 */
   corr[i] = corr(X)[2];              /* corr = off-diagonal element */
end;
title "Distribution of Correlation Coefficient";
title2 "N=50; rho = 1/3";
call histogram(corr) xvalues=do(-2,0.7,0.1) other="refline 0.333 / axis=x";

The histogram shows the approximate sampling distribution for the correlation coefficient when the population parameter is ρ = 1/3. You can see that almost all the sample correlations are positive, a few are negative, and that most correlations are close to the population parameter of 1/3.

In the previous section, notice that the MVN data is not used except to compute the sample correlation matrix. If we don't need it, why bother to simulate it? The following program shows how you can directly generate the covariance matrices from the Wishart distribution: draw a matrix from the Wishart distribution with *n*–1 degrees of freedom, then rescale by dividing the matrix by *n*–1.

/* More efficient: Don't generate MVN data, generate covariance matrix DIRECTLY!
   Each row of A is a scatter matrix; each row of B is a covariance matrix */
A = RandWishart(NumSamples, N-1, Sigma);   /* N-1 degrees of freedom */
B = A / (N-1);                             /* rescale to form covariance matrix */
do i = 1 to NumSamples;
   cov = shape(B[i,], 2, 2);      /* convert each row to square matrix */
   corr[i] = cov2corr(cov)[2];    /* convert covariance to correlation */
end;
call histogram(corr) xvalues=do(-2,0.7,0.1);

The histogram of the correlation coefficients is similar to the previous histogram and is not shown. Notice that the second method does not simulate any data! This can be quite a time-saver if you are studying the properties of covariance matrices for large samples with dozens of variables.
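The same direct approach is available outside SAS as well. For example, SciPy exposes the Wishart distribution through `scipy.stats.wishart`; here is a Python sketch of the second method (an aside, assuming SciPy is installed; not part of the original SAS/IML program):

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(12345)
N = 50                           # MVN sample size
Sigma = np.array([[9.0, 1.0],
                  [1.0, 1.0]])   # population covariance; correlation = 1/3
num_samples = 1000

# draw scatter matrices directly from the Wishart distribution,
# then rescale by the degrees of freedom to get covariance matrices
A = wishart(df=N - 1, scale=Sigma).rvs(size=num_samples, random_state=rng)
B = A / (N - 1)                  # each B[i] is a simulated covariance matrix

# convert each 2x2 covariance matrix to a correlation coefficient
corr = B[:, 0, 1] / np.sqrt(B[:, 0, 0] * B[:, 1, 1])
print(corr.mean())  # close to the population correlation of 1/3
```

No MVN data is simulated here either; the thousand covariance matrices come straight from the sampling distribution.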

The RANDWISHART distribution actually returns a sample *scatter matrix*, which is equivalent to the crossproduct matrix X`X, where X is an N x p matrix of centered MVN data. You can divide by N–1 to obtain a covariance matrix.

The return value from the RANDWISHART function is a big matrix, each row of which contains a single draw from the Wishart distribution. The elements of the matrix are "flattened" so that they fit in a row in row-major order. For *p*-dimensional MVN data, the number of columns will be *p*², which is the number of elements in the *p* x *p* covariance matrix.
The following table shows the first five rows of the matrix B:

The first row contains elements for a symmetric 2 x 2 covariance matrix. The (1,1) element is 11.38, the (1,2) and (2,1) elements are 0.9, and the (2,2) element is 0.73. These sample variances and covariances are close to the population values of 9, 1, and 1. You can use the SHAPE function to change the row into a 2 x 2 matrix. If necessary, you can use the COV2CORR function to convert the covariance matrix into a correlation matrix.

Next time you are conducting a simulation study that involves MVN data, think about whether you really need the data or whether you are just using the data to form a covariance or correlation matrix. If you don't need the data, use the RANDWISHART function to generate matrices from the Wishart distribution. You can speed up your simulation and avoid generating MVN data that are not needed.

tags: Simulation, Statistical Thinking

**Please comment on the article here:** **The DO Loop**


The post Warning message in runjags turned off in new version of DBDA2E utilities appeared first on All About Statistics.

A reader pointed out that programs that accompany DBDA2E and use runjags produce a disconcerting warning message about "starting parallel chains without setting different PRNG for each chain. Different .RNG.name values have been added to each set of initial values." This is merely runjags saying that it is doing the right thing, and not to worry. But it seems to cause worry. Therefore, in a new version of the DBDA2E utilities program I have set the options in runjags not to produce the warning message. To get the new version, go to the software web page and scroll down to the bottom where you will find the programs for the book. Click the download arrow on the right-hand side. Be sure to unzip/extract the file.

**Please comment on the article here:** **Doing Bayesian Data Analysis**


The post How likelihoodists exaggerate evidence from statistical tests appeared first on All About Statistics.

Have you ever noticed that some leading advocates of a statistical account, say a testing account A, upon discovering account A is unable to handle a certain kind of important testing problem that a rival testing account, account B, has no trouble at all with, will mount an argument that being able to handle that kind of problem is actually a bad thing? In fact, they might argue that testing account B is not a “real” testing account because it *can* handle such a problem? You have? Sure you have, if you read this blog. But that’s only a subliminal point of this post.

I’ve had three posts recently on the Law of Likelihood (LL): Breaking the [LL](a)(b), [c], and [LL] is bankrupt. Please read at least one of them for background. All deal with Royall’s comparative likelihoodist account, which some will say only a few people even use, but I promise you that these same points come up again and again in foundational criticisms from entirely other quarters.[i]

An example from Royall is typical: He makes it clear that an account based on the (LL) is unable to handle composite tests, even simple one-sided tests for which account B supplies uniformly most powerful (UMP) tests. He concludes, not that his test comes up short, but that any genuine test or ‘rule of rejection’ must have a point alternative! Here’s the case (Royall, 1997, pp. 19-20):

[M]edical researchers are interested in the success probability, θ, associated with a new treatment. They are particularly interested in how θ relates to the old treatment’s success probability, believed to be about 0.2. They have reason to hope θ is considerably greater, perhaps 0.8 or even greater. To obtain evidence about θ, they carry out a study in which the new treatment is given to 17 subjects, and find that it is successful in nine.

Let me interject at this point that of all of Stephen Senn’s posts on this blog, my favorite is the one where he zeroes in on the proper way to think about the discrepancy we hope to find (the .8 in this example). (See note [ii])

A standard statistical analysis of their observations would use a Bernoulli(θ) statistical model and test the composite hypotheses H₁: θ ≤ 0.2 versus H₂: θ > 0.2. That analysis would show that H₁ can be rejected in favor of H₂ at any significance level greater than 0.003, a result that is conventionally taken to mean that the observations are very strong evidence supporting H₂ over H₁. (Royall, ibid.)

Following Royall’s numbers, the observed success rate is:

m₀ = 9/17 ≈ .53,

which exceeds the null bound θ = 0.2 of H₁ by ~3 standard errors, as σ/√17 = √(0.2 × 0.8)/√17 ≈ 0.1, yielding significance level ~.003.

So the observed success rate m₀ = .53 “is conventionally taken to mean that the observations are very strong evidence supporting H₂ over H₁.”

And indeed it is altogether warranted to regard the data as very strong evidence that θ > 0.2—which is precisely what H₂ asserts (not fussing with his rather small sample size). In fact, m₀ warrants inferring even larger discrepancies from 0.2, as we will see.

Royall claims he is unable to allow that m₀ = .53 is evidence against the null in the one-sided test we are considering: H₁: θ ≤ 0.2 versus H₂: θ > 0.2.

He tells us why in the next paragraph (ibid., p. 20):

But because H₁ contains some simple hypotheses that are better supported than some hypotheses in H₂ (e.g., θ = 0.2 is better supported than θ = 0.9 by a likelihood ratio of LR = (0.2/0.9)⁹(0.8/0.1)⁸ = 22.2), **the law of likelihood does not allow the characterization of these observations as strong evidence for H₂ over H₁**. (my emphasis; note I didn’t check his numbers since they hardly matter.)

It appears that Royall views rejecting H₁: θ ≤ 0.2 and inferring H₂: θ > 0.2 as tantamount to claiming that every point in H₂ is better supported than every point in H₁. But that is not what the significance tester means:

to reject H₁: θ ≤ 0.2 is to infer some positive discrepancy from .2.

We, who go further, either via severity assessments or confidence intervals, would give discrepancies that were reasonably warranted, as well as those that were tantamount to making great whales out of little guppies (fallacy of rejection)! Conversely, for any discrepancy of interest, we can tell you how well or poorly warranted it is by the data. (The confidence interval theorist would need to supplement the one-sided lower limit which is, strictly speaking, all she gets from the one-sided test. I put this to one side here.)

*But Royall is blocked!* He’s got to invoke point alternatives, and then give a comparative likelihood ratio (to a point null). Note, too, the point against point requirement is *always* required (with a couple of exceptions, maybe) for Royall’s comparative likelihoodist; it’s not just in this example where he imagines a far away alternative point of .8. The ordinary significance test is clearly at a great advantage over the point against point hypotheses, given the stated goal here is to probe discrepancies from the null. (See Senn’s point in note [ii] below.)

Not only is the law of likelihood unable to tackle simple one-sided tests; what it allows us to say is rather misleading:

What does it allow us to say? One statement that we can make is that the observations are only weak evidence in favor of θ = 0.8 versus θ = 0.2 (LR = 4). We can also say that they are rather strong evidence supporting θ = 0.5 over any of the values under *H₁*: θ ≤ 0.2 (LR > 89), and at least moderately strong evidence for θ = 0.5 over any value θ > 0.8 (LR > 22). … Thus we can say that the observation of nine successes in 17 trials is rather strong evidence supporting success rates of about 0.5 over the rate 0.2 that is associated with the old treatment, and at least moderately strong evidence for the intermediate rates versus the rates of 0.8 or greater that we were hoping to achieve. (Royall 1997, p. 20, emphasis is mine)
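These range claims can likewise be spot-checked. A sketch (same assumed binomial setup of 9 successes in 17 trials; the grid of θ values is my own choice): since the likelihood peaks at the observed proportion 9/17 ≈ .53, the minimum LR over each range is attained at the boundary nearest .53.

```python
from math import comb

def lik(theta, x=9, n=17):
    """Binomial likelihood of theta given x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Smallest LR of theta = 0.5 against any value in H1: theta <= 0.2
lr_vs_h1 = min(lik(0.5) / lik(i / 1000) for i in range(1, 201))

# Smallest LR of theta = 0.5 against any value theta >= 0.8
lr_vs_08 = min(lik(0.5) / lik(i / 1000) for i in range(800, 1000))

print(round(lr_vs_h1, 1))  # -> 88.8, attained at theta = 0.2
print(round(lr_vs_08, 1))  # -> 22.2, attained at theta = 0.8
```

(The first minimum is in fact just under 89, so "LR > 89" is a slight round-up on Royall's part; nothing turns on it.)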

But this is scarcely "rather strong evidence supporting success rates of about 0.5" *over the old treatment*. What confidence level would you be using if you inferred *m₀* is evidence that θ > 0.5? Approximately .5. (It's the typical comparative likelihood move of favoring the claim that the population value equals the observed value. See comments.)

Royall's "weak evidence in favor of θ = 0.8 versus θ = 0.2 (LR = 4)" fails to convey that there is rather *horrible* warrant for inferring θ = 0.8, associated with something like a 99% error probability! (It's outside the 2-standard-deviation confidence interval, is it not?)

We significance testers do find strong evidence for discrepancies in excess of .3 (~.97 severity or lower confidence level) and decent evidence of excesses of .4 (~.84 severity or lower confidence level). And notice that all of these assertions are claims of evidence of positive discrepancies from the null *H₁*: θ ≤ 0.2. In short, at best (if we are generous in our reading, and insist on confidence levels of at least .5), Royall is rediscovering what the significance tester automatically says in rejecting the null with the data!
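A rough numerical check of these severity figures (a sketch, not Mayo's own computation: I use the normal approximation to the binomial with the standard error evaluated at the hypothesized θ₀, which lands in the same ballpark as the ~.97 and ~.84 quoted above; the exact values shift slightly with the choice of approximation):

```python
from math import erf, sqrt

def severity(theta0, x=9, n=17):
    """Approximate severity for the claim theta > theta0, given x successes
    in n trials: P(observed proportion <= x/n) when theta = theta0, via the
    normal approximation with the standard error evaluated at theta0."""
    m0 = x / n
    se = sqrt(theta0 * (1 - theta0) / n)
    z = (m0 - theta0) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

print(round(severity(0.3), 2))  # -> 0.98 (the post quotes ~.97)
print(round(severity(0.4), 2))  # -> 0.86 (the post quotes ~.84)
```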

His entire analysis is limited to giving a series of reports as to which parameter values the data are comparatively closer to. As I already argued, I regard such an account as bankrupt as an account of inference. It fails to control probabilities of misleading interpretations of data in general, and precludes comparing the warrant for a single *H* by two data sets *x* and *y*.

Elliott Sober, reporting on the Royall road of likelihoodism, remarks:

The fact that significance tests don’t contrast the null hypothesis with alternatives suffices to show that they do not provide a good rule for rejection. (Sober 2008, 56)

But there *is* an alternative, it’s just not limited to a point, the highly artificial case we rarely are involved in testing. Perhaps they are more common in biology. I will assume here that Elliott Sober is mainly setting out some of Royall’s criticisms for the reader, rather than agreeing with them.

According to the law of likelihood, as Sober observes, whether the data are evidence against the null hypothesis depends on which point alternative hypothesis you consider. Does he really want to say that, so long as you can identify an alternative that is less likely given the data than is the null, the data are "evidence *in favor* of the null hypothesis, not evidence *against* it" (Sober, 56)? Is this a good thing? What about all the points in between? The significance test above exhausts the parameter space, as do all N-P tests.[v]

**NOTES**

[i] I know because, remember, I’m writing a book that’s close to being done.

[ii] “It would be ludicrous to maintain that [the treatment] cannot have an effect which, while greater than nothing, is less than the clinically relevant difference.” (Senn 2008, p. 201)

[iii] Note: a rejection at the 2-standard-deviation cut-off would be M* ≈ .2 + 2(.1) = .4.

[iv] That is, they allow the low P-value to count as evidence for alternatives we would regard as unwarranted. But I’ll come back to that another time.

[v] In this connection, do we really want to say, about a null with teeny tiny likelihood, that there’s evidence for it, so long as there is a rival, miles away, in the other direction? (Do I feel the J-G-L Paradox coming on? Yes! It’s the next topic in Sober p.56)

**REFERENCES**

Royall, R. (2004), “The Likelihood Paradigm for Statistical Evidence” 119-138; Rejoinder 145-151, in M. Taper, and S. Lele (eds.) *The Nature of Scientific Evidence: Statistical, Philosophical and Empirical Considerations. *Chicago: University of Chicago Press.

Royall, R. (1997), *Statistical Evidence: A Likelihood Paradigm*. Chapman and Hall.

Senn, S. (2007), *Statistical Issues in Drug Development, *Wiley.

Sober, E. (2008). *Evidence and Evolution*. CUP.


**Please comment on the article here:** **Error Statistics Philosophy » Statistics**

The post How likelihoodists exaggerate evidence from statistical tests appeared first on All About Statistics.

The post HarvardX Biomedical Data Science Open Online Training Curriculum launches on January 19 appeared first on All About Statistics.

We recently received funding from the NIH BD2K initiative to develop MOOCs for biomedical data science. Our first offering will be version 2 of my Data Analysis for Genomics course, which will launch on January 19. In this version, the course will be turned into an eight-course series, and you can get a certificate in each one of them. The motivation for doing this is to go more in-depth into the different topics and to provide different entry points for students with different levels of expertise. We provide four courses on concepts and skills and four case-study-based courses. We basically broke the original class into the following eight parts:

- Statistics and R for the Life Sciences
- Introduction to Linear Models and Matrix Algebra
- Advanced Statistics for the Life Sciences
- Introduction to Bioconductor
- Case study: RNA-seq data analysis
- Case study: Variant Discovery and Genotyping
- Case study: ChIP-seq data analysis
- Case study: DNA methylation data analysis

You can follow the links to enroll. While not required, some familiarity with R and RStudio will serve you well, so consider taking Roger's R course and Jeff's Toolbox course before delving into this class.

In years 2 and 3 we plan to introduce several other courses covering topics such as Python for data analysis, probability, software engineering, and data visualization, which will be taught by a collaboration between the departments of Biostatistics, Statistics, and Computer Science at Harvard.

Announcements will be made here and on Twitter: @rafalab

**Please comment on the article here:** **Simply Statistics**

