Lies and Stats

Top 50 health informatics blogs (according to Healthtechtopia)

Luk Arbuckle — Mon, 04 Jan 2010 05:01:30 +0000

I received an email informing me that my blog was included in a list of top stats blogs useful to people interested in health informatics. The following is the relevant excerpt from the full list of health informatics blogs.

Stats is very important when it comes to health informatics. Here are some good statistics blogs to help you with your health informatics learning.

Fishing in the Bay: These are “statistical musings” with a medical twist.

Data Sciences Analytics: Learn about the science of data, and how it is used in a variety of ways.

Analytic Bridge: Using information and statistics for analysis.

Statistical Modeling, Causal Inference, and Social Science: A helpful blog about using statistics to create models and make assessments.

Social Science Statistics Blog: Learn about stats from Harvard.

Lies and Stats: It’s been a while since this blog was updated, but it still has a great deal of useful information about how statistics are used.

Overcoming Bias: Learn how to use data and how to overcome the biases that may be in that data.

The Endeavor: A great blog that includes statistics posts and information.

I have no idea who updates the health informatics blog, nor am I in any way affiliated with them. It’s just a nice reminder and motivator for me to start writing blog posts again.

While on the topic, I decided to look at my blog stats, and was surprised to learn that my blog more than doubled in views from the month of my last post (in February 2009) to the next. It even maintained a steady number of hits per month, even though I haven’t been adding anything to it! Interestingly, note that hits seem to increase before fall and winter exams.

Visits to this blog by month (since May 2008).

Since this is a blog post originally about a list, and we’ve moved on to discuss my blog stats, how about my top five posts for 2009 (by title and number of views in 2009), which also surprised me a bit:

Accept the null hypothesis, or fail to reject it (4,179)
You can’t prove the null by not rejecting it (719)
The sexy job in the next ten years (431)
You can’t increase power to prove the null (423)
Least squares explained simply, and in video (287)

Clearly there are some things that are searched for on a regular basis, and maybe I can come up with a few more popular items while I’m tutoring stats this term. But I don’t think I could ever have predicted this top list, so I’ll probably just write topics of interest as they come up. Happy 2010!

Misleading Americans about public health care

Luk Arbuckle — Mon, 23 Feb 2009 00:45:13 +0000

Canadians often wait months or even years for necessary care. For some, the status quo has become so dire that they have turned to the courts for recourse. Several cases currently before provincial courts provide studies in what Americans could expect from government-run health insurance.

At least that’s the story told by the Fraser Institute in an op-ed in the Wall Street Journal. “As we inch towards nationalized health care,” reads the subtitle, “important lessons from north of the border.” With a couple of dire tales, and a couple of national averages, Americans are led to believe that introducing government-run public health insurance will drastically increase wait times in U.S. health care.

Where problems lie
Making an appropriate comparison between wait times in the U.S. and Canada is not trivial. How do you deal with those people that can’t get treatment in the U.S. because of inadequate or nonexistent medical insurance (infinite wait times)? Even comparing specific treatments is tricky because disease coding between the U.S. and Canada differs (ICD-9-CM is currently used in the U.S., and ICD-10-CA in Canada). And then you have to consider subgroups to see how population trends change for socioeconomic classes, say, and to ensure they aren’t reversed entirely (Simpson’s paradox).
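To make the Simpson’s paradox concern concrete, here is a minimal sketch using invented counts (they are not real U.S. or Canadian figures). The two "systems" and two "groups" are hypothetical labels; the point is only that a system can look better within every subgroup and still look worse in the pooled numbers when the subgroup mix differs.

```python
# Hypothetical counts: (patients treated within a target time, total patients).
# These numbers are invented purely for illustration.
counts = {
    "System A": {"group 1": (90, 100), "group 2": (300, 500)},
    "System B": {"group 1": (425, 500), "group 2": (50, 100)},
}

def subgroup_rates(system):
    """Share treated on time within each subgroup of one system."""
    return {group: treated / total
            for group, (treated, total) in counts[system].items()}

def overall_rate(system):
    """Share treated on time after pooling the subgroups."""
    treated = sum(c[0] for c in counts[system].values())
    total = sum(c[1] for c in counts[system].values())
    return treated / total

for system in counts:
    print(system, subgroup_rates(system), round(overall_rate(system), 3))
```

With these made-up counts, System A has the higher rate in both groups, yet System B has the higher pooled rate, because most of System A’s patients fall in the group with lower rates.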

Take, for example, a study that found that “socioeconomic status and breast cancer survival were directly associated in the U.S. cohort, but not in the Canadian cohort.” Also note that “this study replicated the finding of advantaged Canadian cancer survival in smaller metropolitan areas that had been consistently observed in larger metropolitan areas.” Although it’s possible there are other (confounding) factors influencing these results, it shows that socioeconomic status needs to be considered when comparing medical treatment and outcomes in the U.S. and Canada. And, therefore, it is likely to affect wait times as well.

Instead of dealing with technical details, however, the article in the WSJ uses stories in which Canadians wait months for treatment. There’s nothing inherently wrong with this—it is, after all, an op-ed piece and not a journal article—but you have to ask yourself about the choice of stories. Are they representative of public health care in Canada, or extreme cases? Also, we don’t know whether the individual that “paid for surgery that may have saved his life”, rather than wait for treatment in Canada, was in immediate need of treatment. These are, nonetheless, compelling stories that should not be disregarded—but they don’t prove a trend.

The basic argument put forward is that Canadians wait a long time for treatment under the public health care system. But what’s considered a “long” wait time, and how does it depend on the condition and severity? Notice that there’s no mention of wait times in the U.S., even for those that have appropriate health coverage. Instead we’re given some specific average wait times, but why cataract surgery or hip and knee replacements, and not others? How much do these wait times vary based on treatment, location, socioeconomic class, and how do they compare with U.S. figures? We’re left with more questions than answers.

The real confounder
Ultimately, to consider how wait times would increase in the U.S. with the introduction of publicly run, universal health coverage—that is, health coverage for all, as in Canada—there is one factor that would need to be disassociated from wait times in Canada. This factor, not unique to Canada but certainly rare, is not stressed enough in the article.

The Supreme Court of Canada found that Canadians suffer physically and psychologically while waiting for treatment in the public health-care system, and that the government monopoly on essential health services imposes a risk of death and irreparable harm.

Disregarding the inflamed rhetoric, the important point here is that there’s a “government monopoly on essential health services” in Canada. In other words, there’s no competing private system for health services deemed medically necessary, and the government funds and regulates the public health care system (although the government doesn’t operate it). You could probably argue that this monopoly is equivalent to price fixing for those services the government decides it’ll pay for. This is likely the main reason “care is rationed by waiting”—there is, after all, no alternative (besides paying for treatment in the U.S.).

It’s probably only a matter of time before Canada allows for a parallel private system for most, if not all, health services. Private spending currently represents about 30% of the average province’s total health care spending (mostly for medications and services not covered by the public system, such as dentists, optometrists, and physiotherapists). But until a parallel private system exists for all services in Canada, or the monopoly in essential services is taken into account, it’s disingenuous to suggest that wait times are simply because “individuals bear no direct responsibility for paying for their care.”

Bottom line
Many factors impact health care and wait times. You can’t look at just one aspect or descriptive statistic and know whether the system works as intended. It would be like judging a person’s health based on blood pressure alone. I agree with the author regarding comments he made in the past about improving Canada’s health care system. But making inferences into a public health care system in the U.S. based on the results from a couple of average wait times in Canada, where other factors confuse these results and make them unreliable to begin with, is inappropriate and misleading at best.

Statistical concepts in presenting data

Luk Arbuckle — Thu, 19 Feb 2009 01:29:45 +0000

Finally someone has written a text something like Tufte’s Visual Display of Quantitative Information but specifically for statistics. Rafe M. J. Donahue, of Biomimetic Therapeutics and Vanderbilt University Medical Center, gave a seminar course on presenting statistical data at a meeting of the American Statistical Association (ASA) in June 2008, and will be giving a similar course in April 2009 (as part of a continuing education program of the ASA). I learned of his course in a recent blog post at Statistical Modeling, Causal Inference, and Social Science.

The current version of Donahue’s text is 100 pages [PDF], but well worth a casual read (it’s not as bad as it sounds, as a lot of those pages are dedicated to visual displays of the ideas he is describing). If you enjoy reading Tufte’s opinions on the topic of displaying data, and you have to create charts and diagrams of statistical data, then you should enjoy Donahue’s writing as well. Reading Tufte a couple of years ago had a tremendous impact on my view of visual displays. But the focus here is on statistical data.

The two fundamental acts of science, description and comparison, are facilitated via models. By models, we refer to ideas and explanations that do two things: describe past observations and predict future outcomes. […] Statistical models, then, allow us to describe past observation and predict future within the confines of our understanding of probability and randomness. Statistical models become tools for understanding sources of variation.

Show the atoms; show the data.

A summary of some principles presented by Donahue:

The exposition of the distribution is paramount.
Show the atoms; show the data.
Each datum gets one glob of ink.
Erase non-data ink; eliminate redundant ink.
Take time to document and explain.
The data display is the model.
Avoid arbitrary summarization, particularly across sources of variation.
Reward the viewer’s investment in the data display.
In viewing CDFs, steepness equals dataness.
Plot cause versus effect.
Typically, color ought be used for response variables, not design variables—but not always.
We understand the individual responses by comparing them to a distribution of like individuals.
Data presentation layouts and designs should be driven by intended use.
Time series make fine accounting but poor scientific models.

Each datum gets one glob of ink.

Naturally Donahue was also influenced by Tufte. As he says, “the idea of analysis is to understand the whole by decomposing into component parts.” And he therefore reminds the reader of Tufte’s principles of analytical design:

Show comparisons, contrasts, differences.
Show causality, mechanism, structure, explanation.
Show multivariate data; that is, show more than 1 or 2 variables.
Completely integrate words, numbers, images, diagrams.
Thoroughly describe the evidence.
Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.

Take time to document and explain.

The sexy job in the next ten years

Luk Arbuckle — Sun, 01 Feb 2009 23:10:39 +0000

Hal Varian, Google’s chief economist and author of arguably the two most popular textbooks in microeconomics (one at the undergraduate level and the other intro graduate), shared the following during an interview for The McKinsey Quarterly:

I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it.

I think statisticians are part of it, but it’s just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills—of being able to access, understand, and communicate the insights you get from data analysis—are going to be extremely important. Managers need to be able to access and understand the data themselves.

You always have this problem of being surrounded by “yes men” and people who want to predigest everything for you. In the old organization, you had to have this whole army of people digesting information to be able to feed it to the decision maker at the top. But that’s not the way it works anymore: the information can be available across the ranks, to everyone in the organization. And what you need to ensure is that people have access to the data they need to make their day-to-day decisions. And this can be done much more easily than it could be done in the past. And it really empowers the knowledge workers to work more effectively.

It’s nice to hear that your skills may become a hot commodity. I came across the article from Gelman’s post on What should an introduction to statistics be like? What I enjoyed most was the discussion that followed. Like how “the time of playing with integrals, density functions and demonstrations is over”, and that we should “focus on coding and implementation”.

Absence of evidence is evidence of absence?

Luk Arbuckle — Sun, 25 Jan 2009 22:15:29 +0000

In the context of logical reasoning, and using Bayesian probability, you can argue that absence of evidence is, in fact, evidence of absence. Namely, not being able to find evidence for something changes your thinking and can result in you reversing your original hypothesis entirely. For example, failing to find evidence that some medical treatment works, you may begin to think that it doesn’t work. Maybe it’s a placebo. You could, therefore, decide to change your hypothesis and look to create an experiment disproving its effectiveness. Of course, there are no “priors”, in the Bayesian sense, in the frequentist interpretation of hypothesis testing. But, just the same, what does this say about the maxim used in statistical hypothesis testing, that absence of evidence is not evidence of absence? Nick Barrowman has an interesting post on the topic, and I wanted to participate in the discussion:

I interpret “absence of evidence is not evidence of absence” (in the context of hypothesis testing) to mean “failing to reject the null is not equivalent to accepting the null.” I’m thinking of the null hypothesis of “no treatment effects”. You don’t have significant evidence to reject the null, and therefore an absence of evidence of treatment effects, but this is not the same thing as saying you have evidence of no treatment effects (because of the formulation of hypothesis testing, flawed as it may be).

One point, which I believe you are alluding to, is that an equivalence test would be more appropriate. But I’ve heard some statisticians and researchers try and argue that they could use retrospective power to “prove the null” when they are faced with non-significant results. See Abuse of Power [PDF] (this paper was the nail in the coffin, if you will, in a previous discussion I was having with a group of statisticians).

I believe the maxim is simply trying to emphasize that the p-value is calculated having assumed the null, and therefore can’t be used as evidence for the null (as it would be a circular argument). Trying to make more out of the maxim than this may be the sticking point. It’s too simple, and therefore flawed when taken out of this limited context.

I agree with your previous post. If I’m not mistaken, one point was that failing to reject the null means the confidence interval contains a value of “no effect”. But there could still be differences of practical importance, and so failing to reject the null is not the same as showing there’s no effect. The “statistical note” from the BMJ, Absence of evidence is not evidence of absence, seems to be saying the same thing: absence of evidence of a difference is not evidence that there is no difference. Or, absence of evidence of an effect is not evidence of no effect. Because you can’t prove the null using a hypothesis test (you instead need an equivalence test).

I entirely agree with Nick that confidence intervals are clearer. We can’t forget that hypothesis testing, although constructed like a proof by contradiction, has uncertainty (in the form of Type I errors, rejecting the null when it is true, and Type II errors, failing to reject the null when it is false). Its interpretation is, therefore, muddied by uncertainty and inductive reasoning (I had actually forgotten what Nick had written with regards to Popper and Fisher when I was commenting). To be honest, my head is still spinning trying to make sense of all this, but it certainly is an interesting topic.
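The Bayesian argument from the start of this post can be sketched in a few lines. This is only an illustration: the prior and the two detection probabilities below are assumed values, not estimates from any study. If a search is more likely to turn up evidence when the hypothesis is true than when it is false, then finding nothing has to lower the probability of the hypothesis.

```python
def posterior_given_no_evidence(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | no evidence found), by Bayes' rule."""
    p_none_if_true = 1 - p_evidence_if_true
    p_none_if_false = 1 - p_evidence_if_false
    numerator = p_none_if_true * prior
    return numerator / (numerator + p_none_if_false * (1 - prior))

# Assumed values: a 50/50 prior that the treatment works, an 80% chance the
# study finds evidence if it does, and a 5% chance of a false alarm otherwise.
prior = 0.5
posterior = posterior_given_no_evidence(prior,
                                        p_evidence_if_true=0.8,
                                        p_evidence_if_false=0.05)
print(round(posterior, 3))
```

Note that this relies on a prior, which is exactly what the frequentist hypothesis test does not have, so it doesn’t rescue the idea of “proving the null” with a non-significant p-value.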

On time series and stochastics

Luk Arbuckle — Wed, 03 Dec 2008 22:33:34 +0000

In reading the paper On Time Series Analysis of Public Health and Biomedical Data (subscription required for the PDF), described in the last post, I was introduced to an interpretation of a time series that I was not familiar with, that is, in terms of a stochastic process. I remember being told in my course on time series analysis that a stochastic process and time series were synonymous for our purposes—although more may have been said at the time, which I’ve since forgotten—but there’s obviously more to it than that.

A time series is a single observation of a possibly infinite collection of time series. In other words the time series itself can be viewed as a random variable, and within that time series is a realization of a collection of random variables ordered in time. The possibly infinite sequence of random variables ordered in time is a stochastic process. When we consider a stochastic process we are concerned with the probability model for the individual random variables, and also combinations of them.

A will to be independent
Making inferences from a time series is making inferences from a single realization of a stochastic process, that is, a single observation at each time. The idea of stationarity—that statistical properties of the time series do not depend on time—is used to develop probability based theory specific to time series analysis. Basically, the relative time difference between variables in a stationary time series will change the probability distribution, whereas time shifting will not. The assumption of stationarity implies that the dependence between variables decreases with increasing time separation (which leads to a discussion of “auto”-correlation, which I won’t describe here). Therefore, more (nearly independent) information will be accumulated the longer a series is followed.
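As a rough illustration of dependence fading with time separation, here is a minimal sketch that simulates one realization of a stationary first-order autoregressive process and estimates its autocorrelation at a few lags. The AR(1) model, the coefficient 0.7, and the function names are my own assumptions for the example; the paper doesn’t prescribe them, and other stationary processes can behave differently.

```python
import random

def simulate_ar1(n, phi, seed=0, burn_in=200):
    """One realization of x[t] = phi * x[t-1] + e[t], with standard normal e[t]."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # drop the start-up values
            series.append(x)
    return series

def sample_autocorrelation(series, lag):
    """Sample correlation between the series and itself shifted by `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom

series = simulate_ar1(n=5000, phi=0.7)
acf = {lag: sample_autocorrelation(series, lag) for lag in (1, 5, 20)}
for lag, value in acf.items():
    print(f"lag {lag:2d}: sample {value:+.3f}, theory {0.7 ** lag:+.3f}")
```

For this model the theoretical autocorrelation at lag k is 0.7 to the power k, so observations far apart in time are nearly uncorrelated, which is why following a single series for longer keeps adding (nearly independent) information.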

Longitudinal data are repeated measures for short periods, resulting in many realizations of short time series. We assume that the time series are independent, and that repeated observations lead to zero correlation with increasing time separation (which is stronger than stationarity, especially for short time series). In this case we look to increase the number of time series instead of the number of observations for a single time series. This is a topic unto itself (one I’m not currently familiar with), and therefore only mentioned briefly in the paper. What I found particularly interesting was the idea of bootstrapping a time series (based on splitting a time series into several shorter pieces), something I’ll need to look into further.

Time series analysis of public health data

Luk Arbuckle — Fri, 28 Nov 2008 20:45:23 +0000

Since I’m finishing a course in time series analysis I decided to look for applications in biostatistics (an area I’m interested in). In my search I found a paper On Time Series Analysis of Public Health and Biomedical Data (subscription required for the PDF). When I downloaded the paper I thought it was a literature review, but it’s really a gentle introduction to time series analysis for health professionals (although at times the authors use terminology that I think will confuse more than enlighten).

On independence and applications
An important point made in the article is that time series analysis should be used instead of standard regression analysis when the observations (or outcome measures) are not independent. Otherwise inferences will not be valid (since independence is a key assumption in standard regression). Time series models, on the other hand, take correlation between observations into account (resulting in valid and more efficient inferences). An example is given wherein standard regression would imply a downward trend in birth rate (for their particular data, recording births in an area for about three years), whereas time series methods do not allow for such a conclusion.
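To see why ignoring correlation matters, here is a small simulation sketch. It is not the paper’s birth-rate example: the data are simulated with no trend at all, the AR(1) error model with coefficient 0.8 is an assumption, and 36 points stands in for three years of monthly records. It counts how often a standard regression t-test "finds" a trend at the 5% level.

```python
import random

T_CRIT = 2.03  # approximate two-sided 5% critical value, 34 degrees of freedom

def ar1_noise(n, phi, rng, burn_in=50):
    """Trend-free AR(1) noise; phi = 0 gives independent observations."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out

def naive_slope_t(y):
    """t statistic for the OLS slope of y on time, assuming independent errors."""
    n = len(y)
    times = list(range(n))
    t_mean, y_mean = sum(times) / n, sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y)) / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in zip(times, y))
    std_error = (rss / (n - 2) / sxx) ** 0.5
    return slope / std_error

def false_trend_rate(phi, n=36, sims=2000, seed=1):
    """Share of trend-free series in which the naive test declares a trend."""
    rng = random.Random(seed)
    hits = sum(abs(naive_slope_t(ar1_noise(n, phi, rng))) > T_CRIT
               for _ in range(sims))
    return hits / sims

rate_independent = false_trend_rate(phi=0.0)
rate_correlated = false_trend_rate(phi=0.8)
print(rate_independent, rate_correlated)
```

With independent errors the false-alarm rate should sit near the nominal 5%; with strongly correlated errors it is far higher, which is the sense in which the standard regression inference is not valid.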

The authors point out the increasing use of time series analysis in health research, as evidenced by a search on PubMed. Some application areas mentioned in the paper:

gene expressions to describe molecular and cellular processes
physiologic studies, in general (including image analysis for PET or fMRI, as well as some areas of critical care medicine)
basic epidemiologic studies of infectious and chronic diseases
environmental epidemiology
health services research (to evaluate interventions)
demographic analyses of population health

Although these examples of application areas for time series analysis are interesting, the paper doesn’t go nearly far enough in the details. I would like to know more about how, specifically, time series analysis is being used to advance health research. This is one reason I was originally looking for a literature review. Another reason is that I would like to figure out the areas in which improvements in the theory and methods are still needed (i.e., brainstorming for a research topic). Maybe that’s asking too much of a single paper, but I’ve read literature reviews in the past and they usually cover such ground. I’ll have to try and find something in a biostats journal.

Least squares explained simply, and in video

Luk Arbuckle — Thu, 13 Nov 2008 22:16:29 +0000

A short five-minute video has been created explaining least squares with JMP. The author, Lee Creighton, uses a very simple example of fitting a line to data, and considers different measures of a “best” fit (not to ruin the punch line, but least squares has a global minimum). On the right hand side of the applet is an error bar (for the particular measure being considered) that changes as the line is changed for the set of data points.

Least squares considered with, well, squares!

What I like about the visualization is that the squared errors are displayed as actual squares. It seems silly, but I had never thought of squared errors in a literal, geometric way. I always thought of them as scalars, and nothing more. But showing actual squares makes for a much better visual representation.
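The same idea can be sketched without JMP. The five data points below are made up, and the function names are mine; each squared error is the area of a square whose side is the vertical distance from a point to the line, and the least squares line is the one that makes the total area as small as possible.

```python
points = [(1, 2.0), (2, 2.9), (3, 4.2), (4, 4.8), (5, 6.1)]  # made-up data

def sum_of_squares(intercept, slope):
    """Total area of the squares whose sides are the vertical errors."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

def least_squares_line():
    """Closed-form intercept and slope minimizing the sum of squares."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

intercept, slope = least_squares_line()
best = sum_of_squares(intercept, slope)

# Nudge the line in a few directions: the total area only ever grows.
nudges = [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.1), (0.0, -0.1)]
for d_int, d_slope in nudges:
    extra = sum_of_squares(intercept + d_int, slope + d_slope) - best
    print(f"nudge ({d_int:+.1f}, {d_slope:+.1f}): extra area {extra:.4f}")
```

Moving the line by hand, as in the video, is just a visual way of walking around this surface until the total area stops shrinking.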

It’s likely we’ll see more video tutorials from Lee Creighton on vimeo, as he’s added a few new items already (although there’s been no mention of these at the JMP blog, at least not yet). Of course, this assumes people find them useful and he receives some positive feedback (and I hope he does, as video tutorials in statistics are rare).

Note that I chose not to embed the video into my blog post because I have no control over the formatting, and to ensure Creighton gets clicks to his blog post (give credit where credit is due). Click on the image to get to the video.

But you can show equivalence

Luk Arbuckle — Fri, 07 Nov 2008 15:49:34 +0000

Hopefully it’s clear from previous posts that you can’t prove the null, and you can’t use power to build support for the null. And this confusion is one reason I don’t like the term “accepting” the null hypothesis. The question remains, however, of what you can do with a hypothesis that fits what you would normally consider a “null”, but that you would actually like to prove.

To flip the role you would normally attribute to a null hypothesis with that of an alternative hypothesis, you probably need to consider an equivalence test. First you have to nail down an effect size, that is, the maximum amount the parameter can deviate by (positive or negative) in the experiment in order to conclude that it is of no practical or scientific importance. Even if you’re not doing an equivalence test, this question is important in determining sample size because you want to be sure your results are both statistically and scientifically significant (but calculating sample size [PDF] is the subject for a future blog post).

What’s the difference?
In an equivalence test you take your null hypothesis to be non-equivalence. That is, that the absolute value of the parameter under consideration is greater than or equal to the effect size (the parameter is less than or equal to the negative of the effect size, or greater than or equal to the effect size). The alternative is, therefore, that the absolute value of the parameter is less than the effect size. Note that we don’t care if the parameter has a positive or negative effect—the goal is to reject the null hypothesis so that you can conclude that the effect is not of practical or scientific importance (although there are one-way equivalence tests as well).

For example, consider a treatment that is believed to be no better or worse than a placebo. The effect size should define the range of values within which the actual treatment effect can be considered to be of no scientific importance (equivalent to the placebo). The null (that there is a scientifically important difference between treatment and placebo) will be rejected if the treatment effect is found to be smaller in magnitude than the effect size. Remember that we don’t care if the treatment has a positive or negative effect compared to the placebo in this example, since our goal is to reject the null of a scientifically important effect in either direction.

Two for one
An equivalence test is essentially two one-tailed tests: one test to determine that there is no scientifically important positive effect (it’s no better), and a second test to determine that there is no scientifically important negative effect (it’s no worse). And, as it turns out, the equivalence test is disjoint with a test of significance so that you can test both at the same significance level. Just to be clear, the test of significance would have null equal to zero (no treatment effect), and alternative not equal to zero (some positive or negative treatment effect).
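Here is a minimal sketch of the two one-tailed tests. To keep it short I assume a normal approximation, with the treatment effect estimate and its standard error already in hand; a real analysis would more likely use a t distribution. The function names, the margin (the effect size from above), and the example numbers are all made up for illustration.

```python
from statistics import NormalDist

def equivalence_p_value(estimate, std_error, margin):
    """Two one-sided z-tests of H0: |effect| >= margin vs H1: |effect| < margin."""
    z = NormalDist()
    p_lower = 1 - z.cdf((estimate + margin) / std_error)  # H0: effect <= -margin
    p_upper = z.cdf((estimate - margin) / std_error)      # H0: effect >= +margin
    return max(p_lower, p_upper)  # both one-sided nulls must be rejected

def significance_p_value(estimate, std_error):
    """Usual two-sided z-test of H0: effect == 0."""
    return 2 * (1 - NormalDist().cdf(abs(estimate) / std_error))

# Two hypothetical studies with the same small estimate but different precision.
for label, std_error in [("precise study", 0.2), ("noisy study", 1.0)]:
    p_sig = significance_p_value(0.1, std_error)
    p_eq = equivalence_p_value(0.1, std_error, margin=1.0)
    print(f"{label}: significance p = {p_sig:.3f}, equivalence p = {p_eq:.3f}")
```

Neither hypothetical study rejects the null of no effect, but only the precise one rejects the null of non-equivalence. The noisy study simply doesn’t tell us much either way, which is the whole point of not “accepting” the null.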

My focus in this and the last two posts was on hypothesis testing, even though confidence intervals are often preferred for making inferences. This is a reflection of the debate I was dragged into, not of personal preference. If you’re interested, Nick Barrowman shared a link (in the comments to a previous post) to a website that discusses equivalence testing and confidence intervals (although I don’t agree with their comments that equivalence from the perspective of statistical significance is convoluted). Regardless, the debate is over (at least for us).

You can’t increase power to prove the null

Luk Arbuckle — Fri, 31 Oct 2008 21:01:46 +0000

In my last post I discussed the theory of hypothesis testing, and specifically how it does not support the idea of “proving the null hypothesis”. But I was told that it was only theory and that in practice you could argue that failing to reject the null was, in fact, support for the null if you had high power. The idea of increasing power (by increasing the sample size) in order to increase support for the null was also thrown around. Of course, you can argue whatever you like, but that doesn’t make it so. And in this case we have statistical theory on our side.

We know that a test of statistical significance should have a high probability of rejecting the null hypothesis when it is false (with a fixed probability of rejecting the null, the significance level, when it is true). This probability is called power, and it guards against false negatives (whereas the significance level guards against false positives). The question is whether we can use high values of power to prove the null, within the context of hypothesis testing. A great article on the subject (only six pages long, with references) is Abuse of Power [PDF], which I’ll use as my main reference.

Observe this
Proponents of using power to build evidence in support of the null calculate power using the observed value of the test statistic, calling it the observed power (in the same way a p-value is called the observed significance). High values of observed power are interpreted as strong support for the null; low values of observed power are interpreted as weak support for the null. We’ll come back to this shortly to demonstrate the false logic behind this interpretation.

Low p-value, high power; high p-value, low power. But what does this actually tell you?

For every value of observed power there is a unique p-value, and vice versa. In other words the observed power is a one-to-one function of the p-value; inferences drawn from one of these observed values must, therefore, coincide with the other. Also, observed power is a decreasing function of the p-value (the relationship is inverse, though not a simple proportion). That is, low p-values coincide with high values of observed power; high p-values coincide with low values of observed power.
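The one-to-one relationship is easy to check numerically. This sketch assumes a two-sided z-test at the 5% level, which is my simplification and not necessarily the setting used in Abuse of Power; observed power is then the power of the test computed as if the true effect equalled the observed one.

```python
from statistics import NormalDist

Z = NormalDist()

def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z-test evaluated at the observed test statistic."""
    z_obs = Z.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = Z.inv_cdf(1 - alpha / 2)    # rejection threshold
    return (1 - Z.cdf(z_crit - z_obs)) + Z.cdf(-z_crit - z_obs)

p_values = (0.01, 0.05, 0.20, 0.50, 0.90)
powers = [observed_power(p) for p in p_values]
for p, power in zip(p_values, powers):
    print(f"p = {p:.2f} -> observed power = {power:.3f}")
```

Under these assumptions a p-value right at 0.05 corresponds to an observed power of about one half, and every non-significant result has observed power below that. Knowing one number tells you the other, so the observed power adds nothing the p-value didn’t already say.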

Now let’s compare the interpretation of the observed power from those hoping to support the null against the interpretation of the p-value (provided by frequentist statistics). A high value of observed power is interpreted as strong support for the null, which coincides with a low p-value interpreted as strong support against the null (strong yet contradictory statements); a low value of observed power is interpreted as weak support for the null, which coincides with a high p-value interpreted as weak support against the null (weak yet also contradictory statements).

Say that again
Consider two experiments in which you failed to reject the null of no treatment effects, but in which the first experiment achieved a higher value of observed power than the second. Using the interpretation of observed power above, you would conclude that the first experiment with higher observed power provided stronger evidence in support of the null than the second experiment. But higher observed power means a lower p-value, and therefore you would conclude the first experiment provided stronger evidence against the null. These are contradictory conclusions, and only the interpretation of p-values can be called a hypothesis test (supported by frequentist statistics).

There are variants on this idea of observed power, such as detectable or significant effect size, but they’re logically flawed in the same way described above. And we could compare power analysis to confidence intervals, but the point is that nothing is gained from considering power calculations once you have a confidence interval. Power calculations should be reserved for planning the sample size of future studies, and not for making inferences about studies that have already taken place.
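For completeness, here is what a planning-stage power calculation might look like. It is a sketch using the standard normal-approximation formula for comparing two group means with a two-sided test; the effect size, standard deviation, and function name are assumed for illustration, and an exact t-based calculation would give a slightly larger answer.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # guards against false positives
    z_beta = z.inv_cdf(power)           # guards against false negatives
    return ceil(2 * ((z_alpha + z_beta) * sigma / effect) ** 2)

# Assumed planning values: detect a difference of half a standard deviation.
print(sample_size_per_group(effect=0.5, sigma=1.0))  # → 63
```

The inputs are all chosen before any data are collected, which is what separates this legitimate use of power from the retrospective kind.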