<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:blogger='http://schemas.google.com/blogger/2008' xmlns:georss='http://www.georss.org/georss' xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-305457646880828464</id><updated>2024-08-28T04:59:58.812-07:00</updated><category term="information"/><category term="elections"/><category term="psychometrics"/><category term="research design"/><category term="toronto"/><category term="data mining"/><category term="probability"/><category term="statistical test"/><category term="database"/><category term="evaluation"/><category term="mean"/><category term="median"/><category term="polls"/><category term="society"/><category term="Twitter"/><category term="averages"/><category term="cognitive decline"/><category term="error"/><category term="happiness"/><category term="indicators"/><category term="multiple linear regression"/><category term="newsletter"/><category term="ratings"/><category term="sampling"/><category term="skew"/><category term="social"/><category term="sports"/><title type='text'>Actual Analysis blog</title><subtitle type='html'>Blog for &lt;a href=&quot;http://www.actualanalysis.com&quot;&gt;www.ActualAnalysis.com&lt;/a&gt;</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' 
href='http://www.blogger.com/feeds/305457646880828464/posts/default?start-index=26&amp;max-results=25'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>39</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-4344309942107675500</id><published>2015-02-18T08:21:00.002-08:00</published><updated>2015-02-18T08:21:30.779-08:00</updated><title type='text'>Plethoratology</title><content type='html'>Computers have got to the point where they can be used to analyze enormous files of data, and many people do analyze enormous files. We do tend to believe that both bigger and more are better. In fact, bigger files are often very desirable. Nevertheless, big files often create big problems, and in this article we will look at some ways to overcome those problems and get the most out of big files.
&lt;P&gt;Files can be big both vertically and horizontally. A file is big vertically if it has a large number of cases (or records, in database terminology). A file is big horizontally when it has a large number of variables (fields).
&lt;P&gt;&lt;b&gt;Problems of vertical size&lt;/b&gt;. One of the most important problems with files which are big vertically is the non-sampling error (also known as a mistake). The more cases or records there are in a file, the more likely non-sampling errors become, especially if the increase in the size of the file reduces the time available for the collection of each individual case. For example, if people are under pressure to provide a long list of information, they may record inaccurate estimates or even fabricate information. Big files need to be audited for errors of this type.
&lt;P&gt;Another problem with files with enormous numbers of cases is that statistical tests become so powerful that their results are meaningless. Almost any difference becomes significant at astoundingly low levels. For example, let&#39;s suppose you have a sample of 100 people, and you want to know if women are over or under-represented in it. To find out, you are going to perform a chi-square test with a significance criterion of .05. If we assume that women make up 51% of the general population, the percentage of women in your sample would, to satisfy the chi-square test, have to be 10 percentage points higher or lower than 51% for you to conclude that they were over or under-represented. That is, if 61 of your sample were women, you could conclude that women were over-represented, and if only 41 were women, you could conclude that women were under-represented. Those seem like reasonable standards, but if you use large samples, the standards become far less demanding.
&lt;P&gt;For example, if your sample had 10,000 members, the chi-square test would tell you that women were over-represented if they made up as little as 52% of the sample. If your sample had 100,000 members, women would be over-represented if they made up as little as 51.3% of the sample -- less than one-half of one per cent more than the figure for the population. That may be a real difference, and it’s statistically significant, but it may well not be practically significant. The solution to these problems is to make use of sampling theory. First, you can use sampling theory to determine the most statistically appropriate sample size. For example, if you&#39;re conducting a survey, and want a 95% confidence interval of ±5%, you need a random sample of only 385 people. Of course, other considerations may make collecting more data advisable. If you&#39;re collecting huge amounts of data, though, you can still use sampling theory to select a subsample of data to analyze.
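&lt;P&gt;For readers who like to check the arithmetic, here is a short Python sketch (my illustration, using the usual normal approximation to the one-sample chi-square test) that reproduces the figures above:

```python
import math

def critical_share(p0, n, z=1.96):
    # Smallest sample proportion that a two-sided test at the .05 level
    # would declare significantly above the population share p0
    # (normal approximation to the one-sample chi-square test).
    return p0 + z * math.sqrt(p0 * (1 - p0) / n)

# Population share of women: 51%
for n in (100, 10_000, 100_000):
    print(n, round(critical_share(0.51, n) * 100, 1))  # 60.8, 52.0, 51.3

# Random sample size for a 95% confidence interval of plus-or-minus 5%
print(math.ceil(1.96 ** 2 * 0.25 / 0.05 ** 2))  # 385
```

The 385 comes from the standard formula z²p(1−p)/e², with the worst-case p of 0.5.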
&lt;P&gt;In psychometric research, for example, it is often necessary to administer huge numbers of tests. Scaling and reliability analysis, however, are often performed on smaller random samples drawn from the main one, so that statistical tests give more meaningful results. The smaller samples can also be analyzed much more quickly. If you want to check the validity of the results obtained with the small sample, you can draw a second small sample and do the same analyses. You&#39;ll still be finished in less time than it would have taken to analyze the entire sample.
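&lt;P&gt;A sketch of the subsampling idea, with hypothetical record IDs and Python's standard library only:

```python
import random

random.seed(1)
records = list(range(250_000))        # stand-in IDs for 250,000 test records

first = random.sample(records, 385)   # subsample for the main analysis
first_set = set(first)
rest = [r for r in records if r not in first_set]
second = random.sample(rest, 385)     # fresh subsample to validate the results

print(len(first), len(second))        # 385 385, with no overlap
```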
&lt;P&gt;If drawing a smaller subsample is not possible, you can adjust your significance criterion. To do that, you have to determine how big a difference or how strong a relationship you&#39;re looking for, and the power you want the statistical test to have to detect those differences or relationships. 
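&lt;P&gt;To make that concrete, here is one common normal-approximation formula for turning a target difference and a target power into a sample size; the 0.84 is the z value for 80% power. This is a sketch, not a prescription:

```python
import math

def n_for_power(p0, d, z_alpha=1.96, z_beta=0.84):
    # Approximate sample size for a two-sided .05 test to detect a
    # difference of d from the population proportion p0 with 80% power
    # (z_beta = 0.84). Normal-approximation sketch only.
    return math.ceil((z_alpha + z_beta) ** 2 * p0 * (1 - p0) / d ** 2)

# Detecting a 2-point departure from 51% with 80% power:
print(n_for_power(0.51, 0.02))  # 4899
```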
&lt;P&gt;&lt;b&gt;Problems of horizontal size&lt;/b&gt;. Having large numbers of variables or fields is not a problem if you know that they all measure different things. Problems arise when different variables measure the same thing but the data analyst assumes they are independent. These problems are quite common nowadays because statistical packages have given everyone the ability to perform statistical analyses. With every good intention, people enter large numbers of variables into multiple linear regressions without inspecting either correlations or residuals.
The problem with doing that is that it produces unstable solutions. If several variables are correlated with each other, and about equally correlated with the dependent variable, the order in which they are entered into the equation is determined by small and random differences in the size of correlations with the dependent variable. If you perform the analysis on a second set of data (which you often have to do if data are collected yearly, for example), the variables will often be entered in a different order.
The solution I prefer to this problem is to scale the variables. Variables which measure the same thing can be aggregated to produce a single measure. There are other solutions as well.
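&lt;P&gt;A minimal sketch of that aggregation in Python (with three hypothetical measures a, b, and c assumed to tap the same construct): standardize each variable, then average each case's standard scores into one scale score:

```python
import statistics

def zscores(xs):
    # Standardize: subtract the mean, divide by the standard deviation
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

# Three hypothetical measures of the same construct, for five cases
a = [10, 12, 14, 16, 18]
b = [52, 55, 61, 64, 68]
c = [2.0, 2.4, 2.9, 3.3, 3.9]

# One aggregate score per case: the mean of that case's standard scores
scale = [statistics.mean(zs) for zs in zip(zscores(a), zscores(b), zscores(c))]
print([round(x, 2) for x in scale])
```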
&lt;P&gt;Reducing the number of variables also helps deal with another problem, dealing with interaction effects, which are often ignored in multiple linear regression analysis. An interaction effect is one which cannot be predicted from the individual (or main) effects of two or more variables. For example, hair loss increases with age, and it is far more common among men – those are what we call main effects of single variables. However, the relationship between age and hair loss is much stronger among men. That is an interaction effect of two variables (you can have higher-order interactions as well). If you don&#39;t assess interaction effects you will usually miss important information about the topic you&#39;re investigating. To assess interaction effects, you examine residuals and introduce multiplicative terms into your regression equation.
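&lt;P&gt;An illustrative sketch (Python with NumPy, and made-up age and sex data) of introducing a multiplicative term: the regression recovers the interaction coefficient because the product column age × male is included alongside the main effects:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)                 # hypothetical ages
male = rng.integers(0, 2, n).astype(float)   # 1 = male, 0 = not

# Build a hair-loss score with a genuine interaction: the age slope is
# steeper among men (the 0.10 * age * male term).
loss = 1.0 + 0.05 * age + 2.0 * male + 0.10 * age * male

# Fit a regression that includes the multiplicative (interaction) term
X = np.column_stack([np.ones(n), age, male, age * male])
coef, *_ = np.linalg.lstsq(X, loss, rcond=None)
print(np.round(coef, 3))   # recovers 1.0, 0.05, 2.0 and 0.10
```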
Big files can have big benefits. To obtain those benefits, though, you have to be circumspect.
&lt;P&gt;&lt;CENTER&gt;Plethoratology &amp;copy; 1995,1999, 2003, 2006, 2015, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/4344309942107675500/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2015/02/plethoratology.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4344309942107675500'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4344309942107675500'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2015/02/plethoratology.html' title='Plethoratology'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-2284577308415061406</id><published>2013-12-14T11:04:00.000-08:00</published><updated>2013-12-14T11:04:26.531-08:00</updated><title type='text'>The office: then and now</title><content type='html'>The revolution in office life has been so rapid and so vast that
we often forget what life was like in those pre-revolutionary
workplaces. Here are some comparisons which help show how much
changes in office technology have revolutionized office life:
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Employees wasted time gossiping around the water
cooler
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Employees waste time gossiping around the Xerox machine
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Copying business correspondence required messy and
finicky carbon paper
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Copying business correspondence requires messy and
expensive toner cartridges
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Employees wasted time gossiping around the water cooler
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Employees waste time gossiping by e-mail
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Little information available for reports
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: A report on the price of coffee in the staff room
incorporates 37 pie charts and a review of the literature
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Employees wasted time gossiping around the water cooler
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Employees waste time experimenting with their screen
savers
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: The office resounded to the clacking of typewriters
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: The office resounds to the oaths of employees who have
just jammed the copier or deleted computer files by mistake
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Employees wasted time gossiping around the water cooler
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Employees waste time playing solitaire in Windows
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Reports could only be produced in limited quantities
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Everybody makes 100 copies of everything
&lt;P&gt;&lt;B&gt;THEN&lt;/B&gt;: Employees wasted time gossiping around the water cooler
&lt;BR&gt;&lt;B&gt;NOW&lt;/B&gt;: Employees waste time tweeting what they had for lunch.
&lt;P&gt;As you can see, office life has changed drastically over the last quarter of a century!
Change is the one constant of modern society, and we can expect the office of 2043 to be as
different from the office of 2013 as the office of 2013 is from the office of 1983!
&lt;P&gt;Gosh, isn&#39;t modern society exciting!
&lt;CENTER&gt;
&lt;P&gt;The Office: Then and Now &amp;copy;
John FitzGerald, 1997, 2000&lt;/CENTER&gt;

</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/2284577308415061406/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2013/12/the-office-then-and-now.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/2284577308415061406'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/2284577308415061406'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2013/12/the-office-then-and-now.html' title='The office: then and now'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-1509289276927025653</id><published>2012-10-04T05:16:00.000-07:00</published><updated>2012-10-08T05:24:34.112-07:00</updated><title type='text'>More ways to sabotage selection</title><content type='html'>&lt;A HREF=&quot;http://actanalysis.blogspot.ca/2012/10/the-hidden-danger-in-selection.html&quot; target=&quot;y&quot;&gt;Yesterday&lt;/A&gt; we saw how weighting the different measures you combine to rate applicants for jobs or promotions or school placements or grants can end up undermining your ratings. The measures to which you assign the highest weight end up having almost all the influence on selection, while the other measures end up with none.
&lt;P&gt;There are times, though, when people don&#39;t intend to weight their measures but end up weighting them inadvertently anyway. For example, if you measure one characteristic on a scale of 10 and another on a scale of 5, the measure with a maximum score of 10 will end up having more influence (barring extraordinary and very rare circumstances). 
&lt;P&gt;That problem&#39;s easy to deal with: just make sure that all your measures have scales with the same maximum score. The second problem is a little more difficult: differences in variability can accidentally weight the measures.
&lt;P&gt;Some of your measures will almost always vary over a wider range than others. The statistic most widely used to assess variability is the &lt;A HREF=&quot;http://www.actualanalysis.com/vary.htm&quot; target=&quot;s&quot;&gt;standard deviation&lt;/A&gt;. The bigger the standard deviation, the more variable the scores. An example will demonstrate the problem differences in variability create.
&lt;P&gt;Let&#39;s suppose that a professor gives two tests in a course, each of which is to count for 50% of the final mark. The first test has a mean of 65 and a standard deviation of 8, while the second has a mean of 65 and a standard deviation of 16. The problem with these statistics is that two students can do equally well but end up with different final marks. We&#39;ll look at two students&#39; possible results.
&lt;P&gt;The first student finishes one standard deviation above the mean on the first test and right at the mean on the second. That is, her marks are 73 and 65, and her final mark is (73 + 65)/2, or 69. The second student finishes at the mean on the first test and one standard deviation above the mean on the second. That is, her marks are 65 and 81, and her final mark is (65 + 81)/2, or 73. So, even though each student finished at the mean on one test and one standard deviation above the mean on the other, one ended up with a higher mark than the other.
&lt;P&gt;To eliminate this bias you can calculate standard scores. You simply subtract the mean from each applicant&#39;s score and divide by the standard deviation. That gives you a standard score with a mean of zero and a standard deviation of one; applicants with scores above the mean will have positive standard scores and applicants with scores below the mean will have negative ones. If that sounds complicated, it&#39;s not. Spreadsheets will do it for you; in Excel you use the AVERAGE function to get the mean and the STDEV function to get the standard deviation (there is a STANDARDIZE function, but since it requires you to enter the mean and standard deviation it&#39;s no faster than writing a formula yourself).
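&lt;P&gt;Applied to the two students above, the calculation looks like this (a Python sketch; any spreadsheet formula would do the same):

```python
# Test 1: mean 65, standard deviation 8.  Test 2: mean 65, standard deviation 16.
def standard_score(x, mean, sd):
    return (x - mean) / sd

student_1 = (standard_score(73, 65, 8) + standard_score(65, 65, 16)) / 2
student_2 = (standard_score(65, 65, 8) + standard_score(81, 65, 16)) / 2
print(student_1, student_2)   # 0.5 0.5 -- the two students now tie
```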
&lt;P&gt;Even if that still seems like a lot of work to you, the choice is clear: either you do the work or you sabotage your ratings. If you sabotage your ratings you sabotage your selection, and if you sabotage your selection you sabotage your organization (and maybe others, if you&#39;re doing something like selecting outside applicants for grants).
&lt;P&gt;For more information about standardization click &lt;A HREF=&quot;http://www.actualanalysis.com/zscore1.htm&quot; target=&quot;A&quot;&gt;here&lt;/A&gt; for the first of a series of brief articles. Alternatively, the next time you&#39;re compiling ratings you can involve staff with statistical training or a consultant.
&lt;P&gt;&lt;CENTER&gt;More Ways to Sabotage Selection &amp;copy; 2012, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/1509289276927025653/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/10/more-ways-to-sabotage-selection.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1509289276927025653'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1509289276927025653'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/10/more-ways-to-sabotage-selection.html' title='More ways to sabotage selection'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-7281153324150439844</id><published>2012-10-03T06:30:00.001-07:00</published><updated>2012-12-22T17:14:22.019-08:00</updated><title type='text'>The hidden danger in selection procedures</title><content type='html'>When you’re selecting people for jobs, students for university, projects to fund, or making any one of the many significant choices we often find ourselves faced with, you’re often advised to decide what characteristics you want the successful candidate to have, rate the characteristics numerically, weight them according to the importance you think each should have, then add up the weighted ratings.
&lt;P&gt;For example, if you’re rating three characteristics, and you think one is twice as important as each of the other two, you would take 50% of the rating of the most important characteristic and 25% of the ratings of each of the other two, then add them together.
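&lt;P&gt;In other words, the combined score is a weighted sum. A Python illustration with made-up ratings:

```python
weights = [0.50, 0.25, 0.25]   # the first characteristic counts double
ratings = [8, 6, 4]            # one applicant's ratings, each out of 10

weighted_rating = sum(w * r for w, r in zip(weights, ratings))
print(weighted_rating)   # 6.5
```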
&lt;P&gt;The problem with that procedure, though, is that, in the final analysis, the weight of the most important characteristic will be far higher than you had intended. We can see why this happens by looking at the logic of ratings.
&lt;P&gt;Let’s say you’re selecting students for a program. Your rating scale, then, is intended as a measure of ability to succeed in studying the domain the program covers. You are assessing five characteristics, and assigning weights of 50%, 30%, 10%, 5%, and 5%.
&lt;P&gt;If the several measures of ability to succeed are all measuring the same concept, then they will be highly correlated – people who score high on each measure will also score high on the others. When this is true &lt;B&gt;there is no reason to weight the measures&lt;/B&gt; – that is, if they are measures of the same thing there is no justification for making one more important than the others. The statistics of test design provides clear criteria for determining if all of a group of measures are measuring the same thing.
&lt;P&gt;If the measures are &lt;B&gt;not&lt;/B&gt; correlated, then they are measuring different aspects of ability to succeed. If they are combined without weighting they will tend to cancel each other out – a high score on one measure will be cancelled out by a low score on another uncorrelated measure – and scores will tend to accumulate in the middle of the score range.
&lt;P&gt;If weights are assigned to the measures to reflect priority, the applicants who score high on the one or two measures with highest priority will tend to have ratings in the high range. The rest of the scores will continue to cancel each other out and the rest of the candidates will accumulate in the middle range. 
&lt;P&gt;Accumulation of scores in the middle range creates a problem for selection, because the cut-off point is usually found in the middle range, and choices must be made between applicants whose scores are very similar. For example, if one student received a mark of 55 on a test of mathematics, and another student a 57, you would not conclude that the second student was a better mathematician than the first. The difference is probably due to random variation, perhaps something as simple as the first student having a headache.
&lt;P&gt;This also means that the characteristics with lower priority will usually end up having no influence on selection at all because ratings of these characteristics will cancel each other out. If you are rating uncorrelated characteristics and want each to have a specific weight in selection you will need to use a procedure that ensures they will have this weight. A simple procedure in our example would be to draw 50% of the selected applicants from those with high scores on the most important characteristic, 30% from those with high scores on the second most important one, and so on. Alternatively, the selection can be made in stages to ensure that each characteristic is evaluated according to its priority rank and separately from uncorrelated characteristics.
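&lt;P&gt;A rough Python sketch of the quota procedure, with hypothetical applicants and an illustrative 50%/30%/20% split over three characteristics:

```python
import random

random.seed(42)
# Hypothetical applicants scored on three uncorrelated characteristics
applicants = [{"id": i, "a": random.random(), "b": random.random(),
               "c": random.random()} for i in range(100)]

def take_top(pool, key, k):
    # The k highest scorers on one characteristic
    return sorted(pool, key=lambda p: p[key], reverse=True)[:k]

# Fill 10 places: 50% on characteristic a, 30% on b, 20% on c
quotas = [("a", 5), ("b", 3), ("c", 2)]
selected, remaining = [], list(applicants)
for key, k in quotas:
    picks = take_top(remaining, key, k)
    selected.extend(picks)
    remaining = [p for p in remaining if p not in picks]

print(len(selected))   # 10
```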
&lt;P&gt;Of course, sometimes some characteristics will be correlated and some not. The correlated characteristics can then be combined into a single score that will be more accurate than the single characteristics by themselves. The other lesson to be drawn from this is that someone familiar with test design should review selection procedures to ensure that they have the intended results. Ignoring the relationships between the characteristics you are assessing means that you will be defeating your own purposes – the ones implied by the weight you assigned to each characteristic.
&lt;P&gt;&lt;A HREF=&quot;http://actanalysis.blogspot.ca/2012/10/more-ways-to-sabotage-selection.html&quot; target=&quot;t&quot;&gt;Tomorrow&lt;/A&gt; we&#39;ll look at some insidious forms of weighting that can sabotage selection even when you don&#39;t deliberately weight scores.
&lt;P&gt;&lt;CENTER&gt;The Hidden Danger in Selection Procedures &amp;copy; 2012, John FitzGerald&lt;/CENTER&gt;
</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/7281153324150439844/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/10/the-hidden-danger-in-selection.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/7281153324150439844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/7281153324150439844'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/10/the-hidden-danger-in-selection.html' title='The hidden danger in selection procedures'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-2994565787668370743</id><published>2012-03-05T19:00:00.008-08:00</published><updated>2012-03-06T06:41:00.456-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="psychometrics"/><category scheme="http://www.blogger.com/atom/ns#" term="research design"/><title type='text'>What&#39;s missing from the Fraser Institute school ranking report</title><content type='html'>The Fraser Institute ranking of Ontario elementary schools was released on Sunday, and as usual it was covered extensively by the press. Unfortunately, the press did not, as far as I could see, ask some serious questions that need to be asked.&lt;br /&gt;&lt;br /&gt;I am not going to fault the Fraser Institute for not including all relevant technical information in the report; it is, after all, intended as a popular guide for parents. 
However, I could not find on the Institute’s website any link to a technical manual that would provide important information missing from their report.&lt;br /&gt;&lt;br /&gt;Perhaps the most serious omission is any mention of test characteristics. The overall score calculated for each school is based on the annual assessment conducted by the Ontario Educational Quality and Accountability Office (EQAO). But are the tests used for these assessments valid measures of scholastic competence? Standard measures of reliability and validity are not reported (nor could I find them on the EQAO website, or in the technical manuals EQAO provides for the tests). &lt;br /&gt;&lt;br /&gt;Of course, even a measure that is unreliable in assessing an individual student can be made reliable by aggregating the scores of a whole school. However, &lt;I&gt;an invalid measure cannot be made valid by aggregation&lt;/I&gt;, and if a test is not a valid measure of scholastic competence its reliability does not matter. If someone gets your email address wrong, their messages are not going to get to you regardless of how many times they send them to exactly the same wrong address.&lt;br /&gt;&lt;br /&gt;Another issue is that much of the report deals with improvements in schools&#39; scores, but little information is provided about the trend analysis on which reports of improvement were based. In particular, we need to know what statistical technique was used and why the unusually lenient significance criterion (&lt;I&gt;p&lt;/I&gt; &lt; .10) was chosen. &lt;br /&gt;&lt;br /&gt;Other issues could be raised, but, even if I had included all of them, none of this post should be taken as necessarily implying that the Fraser Institute did not do an adequate job. I&#39;ve asked more serious questions about studies I&#39;ve reviewed and received reassuring answers. 
However, without the additional information described here, we cannot conclude that the ranks assigned by the Institute serve as a guide to school performance. &lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;What&#39;s missing from the Fraser Institute school ranking report &amp;copy; 2012, John FitzGerald&lt;/CENTER&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;More articles at the main site&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/2994565787668370743/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/03/whats-missing-from-fraser-institute.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/2994565787668370743'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/2994565787668370743'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/03/whats-missing-from-fraser-institute.html' title='What&#39;s missing from the Fraser Institute school ranking report'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-7767096881100003250</id><published>2012-02-07T08:27:00.000-08:00</published><updated>2012-02-07T08:32:52.990-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="evaluation"/><category scheme="http://www.blogger.com/atom/ns#" term="psychometrics"/><title type='text'>One reason selection tests may not work</title><content type='html'>Let&#39;s suppose you wanted to find out how well 
students&#39; marks on graduation from high school predicted their marks in the first year of university. You select a sample of students and correlate their high school marks with their university marks. You will often fail to find a statistically significant correlation.&lt;br /&gt;&lt;br /&gt;This result is counterintuitive, but the reason for it is simple. Only the best students get into university, and even if they do as well in university as they did in high school their marks will fall in a very restricted range. That is, there is simply less difference in ability between the students than there would be if the full range of ability had been sampled, so it is difficult to observe a correlation between their scores.&lt;br /&gt;&lt;br /&gt;The distribution of marks will also probably be skewed (in the statistical sense - the mean will be much different from the median), which also militates against finding a correlation.&lt;br /&gt;&lt;br /&gt;Problems like these are why I distrust the idea that people can conduct data mining even if they have no training in inferential statistics.&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;One reason selection tests may not work &amp;copy; 2001, John FitzGerald&lt;/CENTER&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;More articles at the main site&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/7767096881100003250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/02/one-reason-selection-tests-may-not-work.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/7767096881100003250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/7767096881100003250'/><link rel='alternate' type='text/html' 
href='http://actanalysis.blogspot.com/2012/02/one-reason-selection-tests-may-not-work.html' title='One reason selection tests may not work'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-6933517862872457450</id><published>2012-01-31T18:09:00.000-08:00</published><updated>2012-01-31T18:23:12.191-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="database"/><title type='text'>The raw database</title><content type='html'>When you construct an analytical database, you&#39;re better off to analyze your data after they&#39;re in the database rather than before. That is, the data in your database are most useful if they are what are known as raw data – that is, individual scores rather than summary data like statistics (such as percentages) or ranges (such as age ranges).&lt;br /&gt;&lt;br /&gt;For example, let&#39;s consider a database which consists of the names of twenty cities and their unemployment rates (which are, of course, percentages). If you want to work out the unemployment rate for all the cities or for a subset of them, you can&#39;t, because you don&#39;t know how many people are in the labour force in each city. If, however, the database consists of the names of the cities, the number of people in the labour force in each city, and the number of unemployed in each city, you can easily work out those figures as well as any you could have worked out with the other database.&lt;br /&gt;&lt;br /&gt;That example is a simple one for illustrative purposes, but problems like the one in the example are not rare. Databases constructed with range data rather than raw data are also common. 
Often, for example, people&#39;s ages are entered according to an  arbitrary range into which they fall – a 28-year-old might be entered as a 25-to-34-year-old, for example. You can discover useful relationships with data like that, but you can also miss relationships that you would find if you entered the actual ages. If you entered the actual ages you would still be able to investigate your age categories, as well as alternatives to them which might be more useful.&lt;br /&gt;&lt;br /&gt;A database of raw data is a much more powerful analytical tool than one of summary data. In compiling a database of summary data you are essentially drawing conclusions about the nature of the data before they have even been entered. Keeping your options open is much the better strategy.&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;The Raw Database &amp;copy; 2000, John FitzGerald&lt;/CENTER&gt;&lt;br /&gt;Originally published at &lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;a&quot;&gt;ActualAnalysis.com&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/6933517862872457450/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/01/raw-database.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/6933517862872457450'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/6933517862872457450'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/01/raw-database.html' title='The raw database'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-3696467635125334830</id><published>2012-01-17T07:34:00.000-08:00</published><updated>2012-01-17T08:49:01.330-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="data mining"/><category scheme="http://www.blogger.com/atom/ns#" term="database"/><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title type='text'>How the British Geological Survey overcame bad data management - after 165 years</title><content type='html'>I have gone on at length in this blog and on its parent website about how data aren&#39;t informative until they&#39;re organized in some useful way. The first step in organizing them is making them accessible - putting them in a database, giving them a unique identifier, and organizing them so that you can use them in the way you intended. &lt;br /&gt;&lt;br /&gt;In 1846, Joseph Hooker, a botanist, collected 314 slides of botanical samples for the British Geological Survey. Then he had to rush off on a trip to the Himalayas and didn&#39;t get around to entering the samples in the specimen register. In April, 2011, Howard Falcon-Lang, a paleontologist, was poking around in a cabinet in a dark corner of the BGS and found the drawers of Hooker&#39;s slides. He pulled one out, shone his flashlight on it, and read the label &quot;C. Darwin, Esq.&quot; (Click &lt;A HREF=&quot;http://news.yahoo.com/uk-scientists-lost-darwin-fossils-060025391.html&quot; target=&quot;y&quot;&gt;here&lt;/A&gt; for a news report.)&lt;br /&gt;&lt;br /&gt;It turned out that Hooker&#39;s slides were from Darwin&#39;s expedition on the &lt;I&gt;Beagle&lt;/I&gt;, and that Dr. Falcon-Lang was apparently the first person in 165 years to recognize what they were. Dr. Falcon-Lang expects that examination of the samples will contribute to contemporary science. 
Imagine what contemporary science would be like, though, if these samples had been examined in the 1840s and 50s. &lt;br /&gt;&lt;br /&gt;The data most of us use aren&#39;t likely to be as significant as Darwin&#39;s, but if we can&#39;t use them they&#39;re as useless as Darwin&#39;s were for 165 years. A big problem I have run into with databases is that some data just don&#39;t get entered. People entering data omit fields they consider unimportant or too difficult to collect. Often this ends up producing huge amounts of missing data, especially when data are being entered from the field by several people, and if huge quantities of data are missing the data are useless. If you want to use the data, ensure that a complete record must be entered. If you don&#39;t use the data, don&#39;t collect them. If you don&#39;t collect unnecessary data you&#39;ll probably make fewer errors in entering the necessary data.&lt;br /&gt;&lt;br /&gt;And don&#39;t enter summaries of data. For example, enter people&#39;s exact ages, not an age range. If you use pre-defined age ranges you may end up with all the ages clumped in one or two categories, which severely limits the analysis you can do. If you enter the exact age, you can define age ranges whose categories have roughly equal numbers of people in them, which makes it easier to find differences between the categories (click &lt;A HREF=&quot;http://www.actualanalysis.com/safety3.htm&quot; target=&quot;a2&quot;&gt;here&lt;/A&gt;, &lt;A HREF=&quot;http://www.actualanalysis.com/plethora.htm&quot; target=&quot;a3&quot;&gt;here&lt;/A&gt;, and &lt;A HREF=&quot;http://www.actualanalysis.com/big.htm&quot; target=&quot;a4&quot;&gt;here&lt;/A&gt; for more about these issues).&lt;br /&gt;&lt;br /&gt;Similarly, instead of a test score or a rating, enter the individual test and rating items. First of all, that makes it easier to clean the data - to find erroneously recorded items or scores. 
More importantly, it gives you the ability to assess the adequacy of the test or rating (a PDF you can download from my website describes some things you can do with ratings; click &lt;A HREF=&quot;http://www.actualanalysis.com/RatingsTuneUp.pdf&quot; target=&quot;a&quot;&gt;here&lt;/A&gt; for the PDF). &lt;br /&gt;&lt;br /&gt;And enter the data in a format appropriate for the type of analysis you want to do. Most statistical packages, for example, want records entered as rows.&lt;br /&gt;&lt;br /&gt;Data will only talk to you if you care for them. Be nice to your data. Only collect the ones you need, and treat the ones you need with respect.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/3696467635125334830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/01/british-geological-survey-overcomes-bad.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3696467635125334830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3696467635125334830'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/01/british-geological-survey-overcomes-bad.html' title='How the British Geological Survey overcame bad data management - after 165 years'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-1963983935456773792</id><published>2012-01-09T18:54:00.000-08:00</published><updated>2012-01-11T07:12:05.912-08:00</updated><category 
scheme="http://www.blogger.com/atom/ns#" term="research design"/><title type='text'>Effect and cause as a clue to the meaning of science</title><content type='html'>The December 16, 2011, issue of &lt;I&gt;WIRED&lt;/I&gt; has a piece by Jonah Lehrer called &quot;Trials and Errors: Why Science is Failing Us&quot; (click &lt;A HREF=&quot;http://www.wired.com/magazine/2011/12/ff_causation/all/1&quot; target=&quot;w&quot;&gt;here&lt;/A&gt; to read it). Mr. Lehrer&#39;s argument seems to be that some phenomena are too complex for scientific method to be able to discover what causes them. In his conclusion he writes:&lt;BLOCKQUOTE&gt;And yet, we must never forget that our causal beliefs are defined by their limitations. For too long, we’ve pretended that the old problem of causality can be cured by our shiny new knowledge. If only we devote more resources to research or dissect the system at a more fundamental level or search for ever more subtle correlations, we can discover how it all works. But a cause is not a fact, and it never will be; the things we can see will always be bracketed by what we cannot. And this is why, even when we know everything about everything, we’ll still be telling stories about why it happened. It’s mystery all the way down.&lt;/BLOCKQUOTE&gt;The comments following the piece do a good job of pointing out the flaws in the reasoning by which Mr. Lehrer reaches this conclusion. However, one issue is omitted. That issue is that &lt;I&gt;science is not about causes&lt;/I&gt;.&lt;br /&gt;&lt;br /&gt;Science is about effects. At its simplest, an effect is a non-random relationship between two variables. Scientific experimentation investigates effects by varying one of the variables (the independent variable) and seeing what happens to the other variable (the dependent variable). The goal is to explain the effect - that is, become more effective in predicting the dependent variable. This model can be expanded to handle large numbers of variables. 
For example, one of the things I do in evaluating satisfaction with a program is to investigate simultaneously the relative importance of several variables in accounting for satisfaction. What you typically find when you do this correctly is that only a few of the variables have any relationship to satisfaction. What you often find, too, is that the variables that account for satisfaction are different from the reasons participants report when asked why they like the program. &lt;br /&gt;&lt;br /&gt;The methods I use are correlational, so they cannot attribute causation. What they tell you is that as one thing varies, so does another. Furthermore, the analyses of satisfaction I do are non-experimental, so I can&#39;t even be sure that the estimates of the correlations are all that exact. What I can do, though, is make a recommendation that changes be made to see if dealing with the variables identified by the data analysis will improve satisfaction.&lt;br /&gt;&lt;br /&gt;The same considerations apply to a lot of health research, and that consideration alone goes a long way to accounting for the examples Mr. Lehrer adduces. What health researchers do is develop their own recommendations for further research that will test whether their conclusions are correct. In fact, the supposed failure Mr. Lehrer describes is a demonstration of the success of science - a hypothesis was developed from prior research to test whether a drug was effective, and the test failed to find evidence that it was effective. That failure by itself is informative - it tells us not to prescribe the drug.&lt;br /&gt;&lt;br /&gt;One of the commenters at the link above (urgelt) goes into the issue of the adequacy of research in more detail. My post of January 5 (click &lt;A HREF=&quot;http://actanalysis.blogspot.com/2012/01/does-cognitive-decline-start-at-45.html&quot; target=&quot;j&quot;&gt;here&lt;/A&gt;) provides another example of this type of difficulty. 
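The satisfaction analysis described above - estimating the relative importance of several variables at once - can be sketched with ordinary least squares. Everything in this sketch is fabricated for illustration: the predictor names, the coefficients, and the data.

```python
# Fabricated example: satisfaction depends on two of three candidate variables,
# and a multiple linear regression recovers that pattern.
import numpy as np

rng = np.random.default_rng(42)
n = 500
staff_rating = rng.normal(size=n)   # hypothetical predictor
wait_time = rng.normal(size=n)      # hypothetical predictor
decor_rating = rng.normal(size=n)   # hypothetical irrelevant predictor

# True (simulated) relationship: only staff_rating and wait_time matter.
satisfaction = 0.8 * staff_rating - 0.5 * wait_time + rng.normal(scale=0.3, size=n)

# Fit all three predictors simultaneously by least squares.
X = np.column_stack([np.ones(n), staff_rating, wait_time, decor_rating])
coefs, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
# coefs[1] comes out near 0.8, coefs[2] near -0.5, and coefs[3] near 0:
# the irrelevant variable gets essentially no weight.
```

As in the text, only a few of the candidate variables end up carrying any weight, and the fit says nothing about causation - only about covariation.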
What is clear is that error is inherent in the process of scientific experimentation, and that the foundation of scientific method includes a recognition that error is inherent. Reports of statistical analysis of research results typically include many estimates of the error involved in the relationships estimated by the statistical techniques.&lt;br /&gt;&lt;br /&gt;As for Mr. Lehrer&#39;s remarks about the mythical nature of causes, scientific method has long allowed explanatory variables that have no real existence (intelligence, for example, cannot be directly measured but only inferred from behaviour). Variables like this are called explanatory fictions. The reason they are allowed is that the point of science is to explain an effect, not to find out what its actual cause is. If a fictional variable can explain the effect where something tangible and real can&#39;t, so much the better. Furthermore, even a small improvement in accuracy of prediction will often produce large benefits. Obviously, something which improves accuracy only a small amount is unlikely to be a cause in any meaningful sense, but it can still play an important role in practice.&lt;br /&gt;&lt;br /&gt;Complex systems often frustrate scientific research simply because there are so many potential effects to examine, not because scientists are naive about the nature of causes, which anyway they aren&#39;t looking for. Mr. Lehrer freely acknowledges that science has been spectacularly successful with some complex systems (the health of large populations, for example), so concluding that failures to be successful with others mean that science has failed to solve the problem of causation is not only questionable and hasty but irrelevant as well.&lt;br /&gt;&lt;br /&gt;I am confident that the scientific research of 100 years from now will be superior to today&#39;s research. 
I am also confident that the reason for its superiority will not be that it has solved the problem of causation.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;a&quot;&gt;Website&lt;/A&gt;&lt;br /&gt;&lt;A HREF=&quot;https://twitter.com/ActualAnalysis&quot;&gt;Twitter&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;Research, cause, and effect &amp;copy; 2012, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/1963983935456773792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/01/effect-and-cause-as-clue-to-meaning-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1963983935456773792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1963983935456773792'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/01/effect-and-cause-as-clue-to-meaning-of.html' title='Effect and cause as a clue to the meaning of science'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-8060969961823698146</id><published>2012-01-09T06:37:00.000-08:00</published><updated>2012-01-09T07:07:22.516-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><category scheme="http://www.blogger.com/atom/ns#" term="research design"/><title type='text'>Why information overload is a myth</title><content type='html'>Everybody’s heard of information overload – a Google search I 
just did for &lt;I&gt;information overload&lt;/I&gt; (in quotation marks) produced over 4 million results. In fact, though, it is data we are overloaded with, not information. &lt;br /&gt;&lt;br /&gt;Information consists only of data that reduce uncertainty. A weather forecast is only informative if it predicts the weather accurately. If it doesn&#39;t predict the weather accurately, we could end up leaving our umbrellas at home on rainy days. Similarly, if we base corporate decisions on data that don’t predict the results we want to achieve, we could end up being embarrassed and out of pocket.&lt;br /&gt;&lt;br /&gt;As the Schumpeter blog in the &lt;I&gt;Economist&lt;/I&gt; said on December 31: “As communication grows ever easier, the important thing is detecting whispers of useful information in a howling hurricane of noise.” It’s that overload of noise we must fear.&lt;br /&gt;&lt;br /&gt;How do you reduce an overload of noise?&lt;UL&gt;&lt;LI&gt;By not collecting data that are irrelevant to the decisions you make.&lt;br /&gt;&lt;LI&gt;By not collecting data that are nearly identical to informative data you already collect.&lt;br /&gt;&lt;LI&gt;By not collecting more data than you need.&lt;br /&gt;&lt;LI&gt;By not combining pieces of information in ways which produce an uninformative total score (by weighting them, for example).&lt;/UL&gt;&lt;P&gt;But how do you avoid doing these things? Chiefly by analysing your data with sound statistical methods. For example, you can estimate the relevance of data to a decision with methods like the correlation coefficient. You can use principal components analysis to find variables that are telling you the same story. You can use sampling theory to decide how much data you need to collect. You can use psychometric analysis to combine pieces of information into a single score effectively. 
The battle against uninformative data has not been won, but you can win that part of it that takes place in your office.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;a&quot;&gt;Website&lt;/A&gt;&lt;br /&gt;&lt;A HREF=&quot;https://twitter.com/ActualAnalysis&quot;&gt;Twitter&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;Why Information Overload is a Myth &amp;copy; 2012, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/8060969961823698146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/01/why-information-overload-is-myth.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8060969961823698146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8060969961823698146'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/01/why-information-overload-is-myth.html' title='Why information overload is a myth'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-4092592474262220223</id><published>2012-01-05T17:53:00.000-08:00</published><updated>2012-01-05T18:39:42.986-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cognitive decline"/><category scheme="http://www.blogger.com/atom/ns#" term="research design"/><title type='text'>Cognitive decline research: Questions the CBC didn&#39;t ask</title><content type='html'>Today&#39;s CBC news report (&lt;A 
HREF=&quot;http://www.cbc.ca/news/health/story/2012/01/05/brain-decline-study.html&quot; target=&quot;c&quot;&gt;click here&lt;/A&gt;) of a study of cognitive decline is pretty standard science reporting. I&#39;m sure that other news sources provided much the same story. Anyway, it confines itself to reporting the results the researchers reported, results which were fairly stated.&lt;br /&gt;&lt;br /&gt;However, there is other information the CBC might have provided, but didn&#39;t. First, it doesn&#39;t provide a link to the study (or hadn&#39;t when I posted a comment asking for one). I, for one, was interested in learning what &quot;a 3.6% decline in mental reasoning&quot; was. Does a decline of that size have a serious effect on people&#39;s functioning?&lt;br /&gt;&lt;br /&gt;So I looked for the link and found it (&lt;A HREF=&quot;http://www.bmj.com/content/344/bmj.d7762&quot; target=&quot;b&quot;&gt;here&lt;/A&gt;). It&#39;s an open access article that can be downloaded free in a PDF. The article doesn&#39;t provide a quick answer to the question of how serious the declines observed are, but the authors do suggest that further attention might be paid to people whose declines are greater than the mean in the study. That suggests to me the mean declines are not that serious, although I readily admit I may be reading something into the authors&#39; suggestion that isn&#39;t there. &lt;br /&gt;&lt;br /&gt;What I also found, though, is that the researchers did not control for health. Since older people tend to be less healthy, were these cognitive declines due to changes in brain function or to the fatigue resulting from poor health? Information about medical risk factors was collected, but the article does not report that it was incorporated in the statistical analyses.&lt;br /&gt;&lt;br /&gt;None of this is intended to question the adequacy of the research. 
Being able to carry on a rigorous study of over 10,000 people for 24 years is proof enough of the researchers&#39; competence. What this is intended to question is the value of a news report that simply reports results without examining them. I&#39;m sure that if the researchers had been asked about the relationship of the health information they collected to cognitive decline they could have explained it fully. I&#39;m sure if they&#39;d been asking questions like that the journalists would have enjoyed their jobs more, too.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/4092592474262220223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2012/01/does-cognitive-decline-start-at-45.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4092592474262220223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4092592474262220223'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2012/01/does-cognitive-decline-start-at-45.html' title='Cognitive decline research: Questions the CBC didn&#39;t ask'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-3581136618531345662</id><published>2011-12-16T07:04:00.001-08:00</published><updated>2011-12-16T07:04:37.246-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title type='text'>Secrets of the truth cult!!</title><content type='html'>For much of their history human beings have 
taken part in rituals in which an authority informs other people of what is supposed to be the Truth. I call this the pulpit model of information. For centuries Europeans went to church and an authority got up in the pulpit and told them what to believe about the world (and other places). &lt;br /&gt;&lt;P&gt;This model was later adopted by the schools, no doubt because the schools were established by churches. Whatever the reason, schooling until recently consisted of listening to an authority tell you what to believe about the world (in universities, it still often consists of this). In school, though, you were even tested to make sure you’d learned the approved view of things.&lt;br /&gt;&lt;P&gt;In school you also acquire the idea that Truth is something that can be found on the printed page. Consequently we come to accept something that has been published as true, without verifying that it is. &lt;br /&gt;&lt;P&gt;It’s not surprising that we come to look on the truth as something that is dispensed by authorities. Consequently, we look around for people who look like authorities, and treat what they say as information. Furthermore, we treat the methods they use to come up with things to say as methods that can be used to define information. We are often wrong.&lt;br /&gt;&lt;P&gt;Given the track record of authorities (remember all those biological weapons that, according to authorities, Iraq was just itching to use against the West?), depending on them to tell us the truth is a questionable approach. Another problem with this approach is that there is considerable doubt as to whether we need to know the truth, anyway.&lt;br /&gt;&lt;P&gt;Here’s something that’s true: Churchill, Manitoba, is named for John Churchill, first governor of the Hudson’s Bay Company. That’s a fact. Despite being a fact, though, it doesn’t help me get served when I drop in to the local branch of his company.&lt;br /&gt;&lt;P&gt;Every day we are bombarded with truths. 
The newspaper tells us things like what the temperature was yesterday in Beijing and what celebrities have (or had) their birthdays today. I remember once reading in the paper that it was the late Alfred Hitchcock’s birthday and thinking “I can’t really send him a card, can I?”&lt;br /&gt;&lt;P&gt;Better than mere truth is information. Information is confused with many things that are not informative, though.&lt;br /&gt;&lt;P&gt;Facts, as we have just seen, are not necessarily informative. Unless I’ve made a bet about what the high temperature in Beijing was going to be, that fact cannot be said to inform me of anything. &lt;br /&gt;&lt;P&gt;Furthermore, many items of information are not factual. The idea of intelligence, for example, cannot be said to be a fact, since there is widespread disagreement about just what intelligence is. However, the concept of intelligence is informative because in speculating about it we discover useful things. We have even discovered some of the shortcomings of the idea of intelligence.&lt;br /&gt;&lt;P&gt;As we have also seen, authoritative statements are not necessarily informative. Another reason they&#39;re not necessarily informative is that they disagree with each other. In fact, many of them work according to decision models which encourage disagreement as a way of establishing crucial issues that need to be tested. Courts of English law, for example, require two or more highly trained professionals to argue for exactly opposite points.&lt;br /&gt;&lt;P&gt;People also often assume that a logically sound argument is informative. However, it need not be. We can reason as soundly as it’s possible to reason and still be wrong. &lt;br /&gt;&lt;P&gt;Deductive reasoning starts with a general premise or principle. It then applies that premise to a specific piece of evidence and draws a conclusion about that piece of evidence. For example, we might reason like this:&lt;UL&gt;&lt;LI&gt;All Canadians are British subjects. 
(general principle)&lt;br /&gt;&lt;LI&gt;John FitzGerald is a Canadian. (evidence)&lt;br /&gt;&lt;LI&gt;Therefore, John FitzGerald is a British subject. (conclusion)&lt;/UL&gt;Well, that conclusion is true. However, let’s suppose we reason like this:&lt;UL&gt;&lt;LI&gt;All Canadians have French first names.&lt;br /&gt;&lt;LI&gt;John FitzGerald’s first name is not French.&lt;br /&gt;&lt;LI&gt;Therefore, John FitzGerald is not a Canadian.&lt;/UL&gt;That conclusion is not true, &lt;I&gt;although the reasoning is entirely sound&lt;/I&gt;. Since my first name is not French, the conclusion that I am not Canadian follows logically from the general principle that all Canadians have French first names. The problem, of course, is that the general principle is wrong. Consequently, all statements that follow logically from it are most likely to be wrong. That example is a bit artificial, but people draw sound conclusions from erroneous premises all the time.&lt;br /&gt;&lt;P&gt;For example, many people reasoned out thoroughly logical arguments that on January 1, 2000, computer failures would throw the world into chaos. Their beliefs were serious ones: they acted on them. They stockpiled food, for example, they bought portable electric generators, and some even created fortified shelters to protect themselves from people who hadn’t stockpiled food or bought generators. &lt;br /&gt;&lt;P&gt;As we saw on January 1, 2000, though, the computers didn’t fail. Some of the premises in those thoroughly logical arguments had been unsound. Logic is a tool. Logic does not guarantee that your arguments will stand up any more than a hammer guarantees that the bookcase you build with it will stand up.&lt;br /&gt;&lt;P&gt;Information is often confused with consensus. The supposed existence of a consensus among scientists about global warming is supposed to imply that the consensus opinion is highly likely to be true. 
Well, a hundred years ago a consensus of scientists would have told you that other races were inferior to whites. &lt;br /&gt;&lt;P&gt;The issue of consensus about global warming seems to have been raised initially as a red herring. That is, people argued against taking action against global warming because there was no scientific consensus about what caused it. &lt;br /&gt;&lt;P&gt;However, consensus has nothing to do with it. At one time there was a scientific consensus that the sun revolved around the earth. That point seems to have escaped the people who are opposed to taking action against global warming, though. Now they complain that this consensus they considered so desirable is being forced on them.&lt;br /&gt;&lt;P&gt;What is informative about an idea is its ability to predict events. The chief value of consensus seems to be coming up with a plan that everyone, or at least everyone important, is willing to go along with. To me, that seems a lot like what lemmings do.&lt;br /&gt;&lt;P&gt;Information cannot be defined by its source. If an expert meteorologist says tomorrow will be sunny, clouds don’t decide to go somewhere else just because a respected source says they will. Information is defined by its effect. Information increases the probability that we will act in effective ways. If it never rains on days when the weather forecast calls for rain, you’re going to end up lugging around a useless umbrella. 
If it always rains on days your bunions hurt, though, your bunions are a mine of information.&lt;br /&gt;&lt;CENTER&gt;The Truth Cult &amp;copy; 2007, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/3581136618531345662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/12/secrets-of-truth-cult.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3581136618531345662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3581136618531345662'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/12/secrets-of-truth-cult.html' title='Secrets of the truth cult!!'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-6216120329589431181</id><published>2011-12-13T07:23:00.000-08:00</published><updated>2011-12-13T07:28:44.177-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="sports"/><category scheme="http://www.blogger.com/atom/ns#" term="statistical test"/><title type='text'>Another dubious sports statistic</title><content type='html'>I believe it is against Canadian law for a televised hockey game to be completed without the announcer mentioning, somewhere amid his (&lt;I&gt;sic&lt;/I&gt;) endless recitation of players&#39; hometowns, that getting the first goal is all-important, since the team that gets the first goal wins such a high percentage of games.&lt;br /&gt;&lt;br /&gt;This belief seems to 
have come from a study of all major league baseball games between 1966 and 1987, which found that 66% of the games were won by the team that scored first. That’s an interesting finding, because in baseball the visiting team is more likely to score first (since it bats first). However, the home team was still more likely to win, so the importance of the first run was still questionable. In 1998 Tom Ruane published an article in which he showed that teams scoring the first run were less likely to win than teams that were the first to score each of the second through ninth runs. The first run, it seemed, was actually the least important run to score. How can that be, you may ask? How can a run associated with 66% of victories be unimportant?&lt;br /&gt;&lt;br /&gt;The reason it’s unimportant is most likely that the winning team scores more runs than the losing team. Consequently, it’s more likely to score the first run. So even if scoring the first run has no effect on the chances of winning a game, the winning team is still more likely to score the first run.&lt;br /&gt;&lt;br /&gt;To examine this possibility I chose data from another sport in which teams don’t alternate offensive and defensive sessions. I collected scores from 110 National Hockey League games played from November 30, 2006 to December 14, 2006. I included games settled by shootout, but gave no credit to the winning team for the goal awarded for the shootout. The team scoring the first goal won 70% of these games (77 of the 110). However, the winning team also scored 68% of the goals (439 of 649). So, if scoring the first goal did not improve a team’s chances of winning a game, you’d still expect the winning team to score the first goal in 68% of the games, or 75 games. The improvement here is all of two percentage points.&lt;br /&gt;&lt;br /&gt;But is it an improvement?  
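One way to frame the question is as a binomial tail probability. The short Python sketch below is an illustration only, not necessarily the calculation used for the figures that follow; it assumes each of the 110 games is an independent trial in which the eventual winner has a 68% chance of scoring first, and asks how often the first-scoring team would then win more than the expected 75 games.

```python
import math

def prob_more_than(k, n, p):
    """Exact P(X > k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1, n + 1))

# 110 games; the winning team scored 68% of all goals, so under the
# null hypothesis it scores the first goal with probability 0.68 and
# is expected to score first in about 75 games. How surprising is 77?
chance = prob_more_than(75, 110, 0.68)
print(f"P(first-scoring team wins more than 75 of 110 games) = {chance:.2f}")
```

Under these assumptions the tail probability comes out close to the figure quoted below, so two extra wins are well within what chance alone would produce.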
You can’t reasonably expect that teams scoring 68% of the goals will necessarily win &lt;I&gt;exactly&lt;/I&gt; 68% of the games. Other factors have some effect on the outcome, so you’d expect them to win a number around 75. Fortunately, we can estimate the probability that:&lt;UL&gt;&lt;LI&gt;if scoring the first goal does not increase a team’s chances of winning and &lt;br /&gt;&lt;LI&gt;if winning teams score 68% of the goals then&lt;LI&gt;the team scoring the first goal will win 77 games.&lt;/UL&gt;&lt;br /&gt;That probability is 44%. Conventional standards of statistical significance would reject the idea that the first goal is of any importance when the percentage is that high. However, arguing that the probability of the difference being real is still greater than 50% is entirely reasonable. But if we look at the difference that way, we still have to conclude that there is only a 56% chance that scoring the first goal increased the likelihood of winning a game, and that if it did increase the probability of winning a game, it increased it by only 2 percentage points (one extra win in every 50 games). Either way, that first goal doesn’t seem all that important.&lt;br /&gt;&lt;br /&gt;I propose an alternative to the Law of the All-Important First Goal/Run. I modestly call it FitzGerald&#39;s Law: the first team to score the winning goal will win. My law has as much explanatory value as the Law of the Fatal First Goal/Run, but it is logically more elegant. It also reminds me of another statistical topic which baffles me: why, in a baseball game which finishes with a score of 11-10, does the player who drove in the first run for the winning team get credit for the game-winning RBI? 
Hm?&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;Another dubious sports statistic &amp;copy; 1995, 2006 John FitzGerald&lt;/CENTER&gt;&lt;br /&gt;More articles at &lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;ActualAnalysis.com&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/6216120329589431181/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/12/another-dubious-sports-statistic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/6216120329589431181'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/6216120329589431181'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/12/another-dubious-sports-statistic.html' title='Another dubious sports statistic'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-1958070814401233132</id><published>2011-12-09T05:59:00.000-08:00</published><updated>2011-12-09T06:03:19.384-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="multiple linear regression"/><category scheme="http://www.blogger.com/atom/ns#" term="psychometrics"/><title type='text'>Better living through multiple linear regression analysis</title><content type='html'>I probably say somewhere on &lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;the main site&lt;/A&gt; that multiple regression analysis is overused, and indeed it is. 
Nevertheless, it does have valuable uses which I don&#39;t want to frighten people away from, so here&#39;s an article about one of them.&lt;br /&gt;&lt;br /&gt;I regularly use regression analysis to clarify for a client the factors affecting satisfaction with training and rehabilitation programs the client offers. It started with a review of a program about which the client knew that the more enthusiastic consumers were about the program on entry, the more satisfied they were at the end. The question was whether final satisfaction or dissatisfaction with the programs was simply a self-fulfilling prophecy – did consumers say they were satisfied or dissatisfied with the programs simply to justify their initial attitudes?&lt;br /&gt;&lt;br /&gt;The client also collected information about consumers&#39; opinions of various characteristics of their programs. This information was not correlated with initial attitude, nor were different types of this information correlated with each other. It was therefore easy, using multiple linear regression analysis, to estimate what proportion of final satisfaction could be explained by initial attitude toward the programs, and then see if characteristics of the programs explained the remainder of the final satisfaction (the residual, as it&#39;s known in regression analysis). It turned out that characteristics of the programs were twice as important as initial attitude in determining satisfaction with the programs.&lt;br /&gt;&lt;br /&gt;So not only did multiple linear regression analysis determine that satisfaction with the programs was not a self-fulfilling prophecy, it also estimated the relative importance of initial attitude and of the actual characteristics of the programs. The analysis was made easier by the lack of correlation between the different types of information collected, but correlated information can be analyzed with more complicated designs. 
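The two-stage logic can be sketched in a few lines. The example below is entirely hypothetical: the variable names (attitude, quality, relevance) and effect sizes are invented for illustration, and real data would still need the checks discussed here. Final satisfaction is first regressed on initial attitude, and the residual is then regressed on the program characteristics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Invented data: initial attitude and two uncorrelated program-characteristic
# ratings jointly determine final satisfaction, plus noise.
attitude = rng.normal(size=n)
quality = rng.normal(size=n)     # hypothetical rating of program quality
relevance = rng.normal(size=n)   # hypothetical rating of program relevance
satisfaction = (0.4 * attitude + 0.8 * quality + 0.8 * relevance
                + rng.normal(scale=0.5, size=n))

def r_squared(y, X):
    """Proportion of variance in y explained by a least-squares fit on X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Step 1: how much does initial attitude explain on its own?
r2_attitude = r_squared(satisfaction, attitude.reshape(-1, 1))

# Step 2: do program characteristics explain the remainder (the residual)?
X1 = np.column_stack([np.ones(n), attitude])
beta, *_ = np.linalg.lstsq(X1, satisfaction, rcond=None)
residual = satisfaction - X1 @ beta
r2_programs = r_squared(residual, np.column_stack([quality, relevance]))

print(f"attitude alone: {r2_attitude:.2f}; "
      f"program characteristics on the residual: {r2_programs:.2f}")
```

Because the predictors are uncorrelated, as in the client's data, the two proportions partition cleanly; correlated predictors would require a more careful design.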
The possible existence of correlation, though, is the chief reason you shouldn&#39;t try this at home. Statistical and database software make it easy to do multiple linear regression analysis, but if you don&#39;t know how to deal with correlated variables or how to identify outliers (extreme observations which distort the results), you&#39;ll often get the wrong results when you use that software.&lt;br /&gt;&lt;br /&gt;We have since gone on to use this technique to determine whether the factors consumers say determine their satisfaction are in fact the most important. We have frequently found that a simple count of the most popular explanations is contradicted by the multiple linear regression analysis. This is not surprising, since counting explanations, even if they are valid, gives us only a very rough estimate of the importance of different factors. The multiple linear regression analysis clarifies the issue.&lt;br /&gt;&lt;br /&gt;Of course, it is also important that you use a proper hypothesis-testing design. 
Just turning multiple linear regression loose on a set of data is almost certain to produce a large proportion of unhelpful or misleading results.&lt;br /&gt;&lt;br /&gt;&lt;CENTER&gt;Better Living through Multiple Linear Regression Analysis &amp;copy; 1999, 2011 John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/1958070814401233132/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/12/better-living-through-multiple-linear.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1958070814401233132'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1958070814401233132'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/12/better-living-through-multiple-linear.html' title='Better living through multiple linear regression analysis'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-5719076273885928709</id><published>2011-12-05T08:06:00.000-08:00</published><updated>2011-12-05T08:09:03.599-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title type='text'>The myth of information technology</title><content type='html'>The term information technology implies to many people that the technology to which it refers creates information, transmits it, or stores it. The technologies we group together as information technology, however, rarely perform any of these functions. 
They are called information technology because they use information, not because they transmit it. A cellphone, for example, converts coded electrical signals into a facsimile of a person speaking. What the person is saying, though, may be balderdash. &lt;br /&gt;&lt;br /&gt;Information consists of data which reduce uncertainty. The technology which we refer to as information technology is blithely unaware of whether the data it deals with reduce uncertainty or not.&lt;br /&gt;&lt;br /&gt;The data provided by &quot;information technology&quot; may not be informative simply because they are irrelevant. For example, if I go looking for the box score of a particular baseball game in the newspaper, the other box scores, informative as they are, simply make it more difficult for me to find the one I&#39;m interested in. These days, though, people use their information technology to collect large amounts of information which are of no relevance to the decision they&#39;re going to make.&lt;br /&gt;&lt;br /&gt;Then again the data may not be informative because they are not intelligible. While Turkish newspapers are informative to Turks, they are not informative to me, because I don&#39;t speak Turkish. I deal with this problem by not subscribing to Turkish newspapers. However, people often use their information technology to collect large amounts of data which they can no more interpret than I can interpret Turkish newspapers. Turning data mining software loose on the data is not guaranteed to turn it into information, either, for reasons which are discussed in &lt;A HREF=&quot;http://www.actualanalysis.com/index2.htm#dmine&quot; target=&quot;a&quot;&gt;other articles on the main site&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;The fact is that we make information, not technology. Even those rare items of software which perform analytical functions were created by human minds. Most of what we call information technology is actually nothing more than data technology. 
It gives us the capacity to collect large masses of data, but it is up to us to find or define the information in it.&lt;br /&gt;&lt;br /&gt;Few people believe everything they read in the newspaper or see on the television. Few believe that every item that appears in the newspaper or on the television is relevant to their concerns. It&#39;s time for the same discernment to be shown in dealing with databases.&lt;br /&gt;&lt;br /&gt;We hear a lot these days about the problem of information overload. In fact, it is data we are overloaded with, not information. If we set out to collect data, we will drown in data. If we condescend, though, to use our analytical abilities, and set off in search of the data that we need, we will find that we can never be overloaded with information.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/5719076273885928709/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/12/myth-of-information-technology.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5719076273885928709'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5719076273885928709'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/12/myth-of-information-technology.html' title='The myth of information technology'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-1204490566540453415</id><published>2011-11-24T07:04:00.000-08:00</published><updated>2011-12-05T08:01:06.786-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="evaluation"/><title type='text'>Transparent evaluation</title><content type='html'>One of the ways I earn my living is by evaluating service programs. People are often wary of program evaluation, since &lt;I&gt;evaluation&lt;/I&gt; is a word with many meanings, many of these meanings negative. In research, though, &lt;I&gt;evaluation&lt;/I&gt; has a very simple and neutral meaning. It is simply determining whether an event of interest has happened.&lt;br /&gt;&lt;P&gt;Program evaluation, therefore, is simply a matter of determining whether a program is doing the things that it is supposed to be doing. I prefer to look on it, in fact, as &lt;B&gt;giving a program a chance to show what it can do&lt;/B&gt;. So, if you&#39;re going to show what a program can do, how do you go about it?&lt;br /&gt;&lt;P&gt;First of all you need a plan – a statement of what the program is supposed to be doing. If you&#39;re evaluating a program for the first time, the first step is likely to be the development of a program logic model. A program logic model is simply a description of the steps in the program and the decisions made once steps are completed.&lt;br /&gt;&lt;P&gt;Once you have the program logic model, you then determine if the program is following the model. Obviously, to do that you need records. 
A crucial part of any service program is a system of records which provides:&lt;UL&gt;&lt;LI&gt;descriptions of the services being provided to each consumer,&lt;LI&gt;descriptions of the goals which these services are to help the consumer achieve,&lt;LI&gt;measures of the extent to which the goals have been achieved, and&lt;LI&gt;descriptions of the decisions made as a result of the achievement or non-achievement of goals.&lt;/UL&gt;&lt;P&gt;Obviously judicious examination of records like those is going to help you determine if the program logic model is being followed. If the program is not implementing the plan fully, then you can take steps to improve its chances of doing so.&lt;br /&gt;&lt;P&gt;The records system will also permit a thoroughgoing outcome evaluation. Accurate estimates of the program&#39;s success in achieving its ultimate goals can easily be calculated. &lt;br /&gt;&lt;P&gt;Furthermore, a good system of records will enable program staff or anyone else to perform the outcome evaluation by themselves. When you require an external evaluation, for example, you won&#39;t have to pay your independent consultant to develop an evaluation from the ground up. The evaluative standards will be set, and the evidence will be collected. Your consultant can spend time doing something more sophisticated and effective, such as studying specific aspects of the program that you consider important.&lt;br /&gt;&lt;P&gt;In short, the goal of program evaluation is to &lt;B&gt;make the program transparent&lt;/B&gt;. If program evaluation is successful, there will be general agreement about what the goals of the program are, about the ways in which the program should be trying to achieve these goals, and about what the world should look like if the program is successful. There will also be clear standards by which anyone can reliably measure the degree of success achieved by the program. 
That also makes evaluation more bearable for staff, since they don&#39;t have to worry about their work being evaluated by standards of which they have not been informed.&lt;br /&gt;&lt;P&gt;Doing all this can be a lot of work. However, the benefits are enormous, and you need spend no more money, in either the short or long term, than you could end up spending on less productive approaches.&lt;br /&gt;&lt;CENTER&gt;Transparent Evaluation &amp;copy; 2002, John FitzGerald&lt;/CENTER&gt;&lt;br /&gt;From &lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;ActualAnalysis.com&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/1204490566540453415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/11/transparent-evaluation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1204490566540453415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1204490566540453415'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/11/transparent-evaluation.html' title='Transparent evaluation'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-1757330796401601078</id><published>2011-11-14T10:34:00.000-08:00</published><updated>2011-11-14T10:35:57.337-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Twitter"/><title type='text'>Now on Twitter</title><content type='html'>You can now follow me on Twitter: &lt;A 
HREF=&quot;http://twitter.com/#!/ActualAnalysis&quot; target=&quot;T&quot;&gt;just click here&lt;/A&gt;.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/1757330796401601078/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/11/now-on-twitter.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1757330796401601078'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/1757330796401601078'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/11/now-on-twitter.html' title='Now on Twitter'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-5232787075536103982</id><published>2011-11-14T10:30:00.001-08:00</published><updated>2011-11-14T10:34:55.244-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="psychometrics"/><category scheme="http://www.blogger.com/atom/ns#" term="ratings"/><title type='text'>DIY Guide</title><content type='html'>I have expanded the current newsletter into a do-it-yourself guide for tuning up your rating systems; just click &lt;A HREF=&quot;http://www.actualanalysis.com/RatingsTuneUp.pdf&quot; target=&quot;p&quot;&gt;here&lt;/A&gt;.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot;&gt;Actual Analysis website&lt;/A&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/5232787075536103982/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/11/i-have-expanded-current-newsletter-into.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5232787075536103982'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5232787075536103982'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/11/i-have-expanded-current-newsletter-into.html' title='DIY Guide'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-4646557285349889610</id><published>2011-10-31T09:07:00.001-07:00</published><updated>2012-03-05T19:03:55.296-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="averages"/><category scheme="http://www.blogger.com/atom/ns#" term="mean"/><category scheme="http://www.blogger.com/atom/ns#" term="median"/><category scheme="http://www.blogger.com/atom/ns#" term="skew"/><title type='text'>Average vs. average</title><content type='html'>I have run across people who, when calculating a mean, will discard their two or three highest and two or three lowest pieces of data and calculate the mean for the rest of their data. What they want to do is protect themselves against the effects of skew, specifically the distortion of a mean by a few extreme scores. &lt;br /&gt;&lt;br /&gt;That probably doesn&#39;t hurt, but there is a simpler and much more effective way of dealing with this problem &amp;#8211; use the median. The median is the score that is midway between the highest and the lowest. 
More precisely, it is the middle score when your data are sorted from lowest to highest: half the scores lie above it and half below, so a few extreme scores cannot pull it out of position the way they can pull the mean. So use the MEDIAN function in your spreadsheet rather than the MEAN function.&lt;br /&gt;&lt;br /&gt;There are some exceptions to this rule, though. If you&#39;re using your data to estimate a total &amp;#8211; the total value of donations to an organization, for example &amp;#8211; you&#39;d use the mean. If you want to compare two sets of data with a statistical test you would usually be better off to use the mean.&lt;br /&gt;&lt;br /&gt;And if the SKEWNESS function in your spreadsheet provides a skewness coefficient for your set of data that is between -1.00 and +1.00 you normally don&#39;t need to worry about this at all.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/4646557285349889610/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/10/average-vs-average.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4646557285349889610'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4646557285349889610'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/10/average-vs-average.html' title='Average vs. 
average'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-8937317993234353411</id><published>2011-10-28T08:10:00.000-07:00</published><updated>2011-10-31T09:23:05.902-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="newsletter"/><title type='text'>Current newsletter</title><content type='html'>For my current newsletter &lt;A HREF=&quot;http://www.actualanalysis.com/InfoGuide12.pdf&quot;&gt;click here&lt;/A&gt;.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/8937317993234353411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/10/my-newsletter-for-fall-2011-httpwww.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8937317993234353411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8937317993234353411'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/10/my-newsletter-for-fall-2011-httpwww.html' title='Current newsletter'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-5610064401770540363</id><published>2011-06-24T06:56:00.000-07:00</published><updated>2011-08-06T12:44:36.603-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title type='text'>Uninformation (4)</title><content type='html'>&lt;I&gt;Information is not identical with experience&lt;/I&gt;&lt;br /&gt;&lt;br /&gt;We often assume that because someone is experienced in a field they are therefore well informed about it. One is particularly likely to believe this if the person involved is oneself. One might as well argue that because I take the streetcar every day I am an expert on public transportation, or that because I watch television every day I&#39;m an expert on television. Obviously you acquire some knowledge from your experience, but it does not necessarily constitute an understanding of your experience. &lt;br /&gt;&lt;br /&gt;And we may simply fail to learn from our experience. Psychologists talk about the consulting room phenomenon &amp;mdash; faced with evidence that a diagnostic test such as the Rorschach test doesn&#39;t work the way it&#39;s supposed to, some psychologists and psychiatrists will reply that they&#39;ve seen it work in their consulting rooms. In fact, individual practitioners have little opportunity to establish in their practice that a test actually works. The chief criterion they can use is the success of treatment, and even a correct diagnosis may lead to unsuccessful treatment, while an incorrect one may lead to successful treatment. We can also sometimes be a little lenient in deciding how successful we’ve been.&lt;br /&gt;&lt;br /&gt;We have seen how authorities &amp;mdash; people with great experience in their fields &amp;mdash; usually disagree with each other. 
That is, their experience has led them to contradictory conclusions, and those conclusions cannot all be informative.&lt;br /&gt;&lt;br /&gt;We &lt;i&gt;derive&lt;/I&gt; information from our experience &amp;mdash; we don&#39;t just pick it up by accident. We derive it by analyzing our experience in certain ways, acting on the conclusions we’ve drawn from our analysis, and then testing the adequacy of our conclusions.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-1.html&quot; target=&quot;u1&quot;&gt;First article in the Uninformation series&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;aa&quot;&gt;Actual Analysis&lt;/A&gt;&lt;br /&gt;&lt;center&gt;Uninformation (4) &amp;copy; 2011, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/5610064401770540363/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-4.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5610064401770540363'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/5610064401770540363'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-4.html' title='Uninformation (4)'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-8020164815469309612</id><published>2011-06-21T05:58:00.000-07:00</published><updated>2011-08-06T12:50:17.023-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title type='text'>Uninformation (3)</title><content type='html'>&lt;I&gt;The opinions of authorities are not necessarily informative&lt;/I&gt;&lt;br /&gt;&lt;br /&gt;We often treat anything printed in an authoritative journal or asserted by an expert as informative. Although authorities and experts do tend to be far better informed about their subjects than the average person, we still cannot assume that whatever they say is informative or even true. To see why, all you have to do is read what authoritative foreign journals and experts have to say about your own country. The influential journal &lt;I&gt;Le monde diplomatique&lt;/I&gt; once published an article whose author claimed that Canada had no constitution, but rather “a collection of texts with the force of a constitution”, and that these constitutional texts could not be challenged in lower courts. Well, the latest of this collection of texts explicitly defines it as the national constitution, and it explicitly gives all courts the power to review all matters of law, which of course includes the constitution.&lt;br /&gt;&lt;br /&gt;Our lives are rife today with experts and expert opinions. The news media are constantly presenting experts and their opinions about every topic under the sun, the implication being that an expert&#39;s opinion is more informative than the opinion of someone who is not an expert.&lt;br /&gt;&lt;br /&gt;For an assertion to be informative to us, though, we have to have some idea of the likelihood that it’s true. 
If the expert is an expert on gardening or cooking, verifying the accuracy of what he or she says is fairly easy. If, however, the expert is an expert on politics or medicine or some other field which requires special or complicated knowledge which you do not have, you may well have no way of verifying his or her opinion. A few years ago we saw experts queuing up to predict that the stock market would rise, if not forever, at least for a long, long time to come. Certainly these experts made arguments for their positions, but usually they were adducing, as evidence for their opinions, facts which the ordinary person could not verify.&lt;br /&gt;&lt;br /&gt;Another problem with expert forecasts is that the experts are rarely experts in forecasting. J. Scott Armstrong and Kesten Green have observed that the scientific forecasts we are often encouraged to believe in are too often forecasts by scientists rather than forecasts arrived at scientifically.&lt;br /&gt;&lt;br /&gt;Another problem is that experts are not impersonal compendia of information but human beings who advocate positions which other experts in their fields dispute. The Western intellectual tradition is to have as many people as possible arguing about ideas. Many of these ideas have the same quality that ideas about what was going to happen on January 1, 2000 had – they are founded on data which are not fully understood.&lt;br /&gt;&lt;br /&gt;We can hardly expect experts to be perfect. If we cannot expect them to be perfect, then we have to assess the soundness of their opinions. If we are unable to assess the soundness of their opinions, then their opinions are not informative to us. They may well be valid, but if we cannot verify that they are valid then they are not informative. At the same time as all those experts were predicting that the stock market would rise forever, some experts were predicting that the bubble was going to burst. 
Those experts were right, but most of us had no way of verifying that they were. Therefore, even though they were right, they were not providing us with information.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-1.html&quot; target=&quot;u1&quot;&gt;First article in the uninformation series&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;Next: &lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-4.html&quot;&gt;Information is not identical with experience&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.ActualAnalysis.com&quot; target=&quot;aa&quot;&gt;Website&lt;/A&gt;&lt;br /&gt;&lt;CENTER&gt;Uninformation (3) &amp;copy; 2011, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/8020164815469309612/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-3.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8020164815469309612'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8020164815469309612'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-3.html' title='Uninformation (3)'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-8409504195124867391</id><published>2011-06-14T16:39:00.000-07:00</published><updated>2011-06-24T07:08:35.029-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" 
term="information"/><title type='text'>Uninformation (2)</title><content type='html'>2. &lt;span style=&quot;font-style:italic;&quot;&gt;The logical or reasonable is not necessarily informative&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;People often believe that if they can construct a chain of reasoning which supports their beliefs, they have thereby demonstrated that those beliefs are true and informative. For example, many people reasoned out arguments which they seriously believed demonstrated that on January 1, 2000 the world would be thrown into chaos. I say their beliefs were serious because they acted on them – they stockpiled food, for example, bought portable electric generators, and some even created fortified shelters to protect themselves from people who hadn&#39;t stockpiled food or bought generators. &lt;br /&gt;&lt;br /&gt;As we found out, they were wrong. However, I can’t say that their conclusion was any less sound than the conclusion I and most other people drew that any disruption that might occur on January 1, 2000 would be minor. The people who predicted chaos were sane and their reasoning from their data was sound. It was probably as sound as my own, or sounder. In the end, one reason I and most other people were right and they were wrong is that we were using better data – data which were more informative. Another reason is that we were just luckier. In fact, no one fully understood all the factors one would have to assess to produce an accurate forecast of what would happen to the power grid on January 1, 2000. Furthermore, we probably weren’t aware of all the factors that would have to be considered.&lt;br /&gt;&lt;br /&gt;If sound reasoning is based on invalid and inadequate data, it will reach invalid and inadequate conclusions. None of us is perfect – not even, as unlikely as it may seem, you or I – and we all at one time or another base logical conclusions on unsound data. And sometimes our reasoning just slips a gear, too. 
Even if our reasoning is perfect, none of us is omniscient, either. We can easily overlook important considerations.&lt;br /&gt;&lt;br /&gt;That’s why the betting industry exists. If you’ve ever heard some of the explanations – often vehement ones – which horseplayers come up with to explain why the sure thing they bet in the last race ran as if he were pulling a milk wagon, you’ll know that relying too much on reason can not only cost you money but also lead you into an unjustified skepticism about the honesty and competence of your fellow human beings. &lt;br /&gt;&lt;br /&gt;Obviously logic is involved in the development of information, just as facts are involved. It is not by itself informative, though. Two plus two equals four, but if the right answer is five you’re still wrong. That is why conclusions drawn from data need to be tested before they can be accepted as sound. If you think the 5-horse in the next race is going to romp, you won’t know that you’re right till the race has been run. 
And no matter what the weather report says, you won’t know whether it’s going to rain tomorrow or not until tomorrow arrives.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-1.html&quot; target=&quot;u1&quot;&gt;First article in the Uninformation series&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;Next: &lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-3.html&quot;&gt;Information is not the statement of an authority&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;aa&quot;&gt;Actual Analysis&lt;/A&gt;&lt;br /&gt;&lt;center&gt;Uninformation (2) &amp;copy; 2011, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/8409504195124867391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-2.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8409504195124867391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/8409504195124867391'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-2.html' title='Uninformation (2)'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-4659177955473338502</id><published>2011-06-12T18:31:00.000-07:00</published><updated>2011-06-24T07:09:10.874-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="information"/><title 
type='text'>Uninformation (1)</title><content type='html'>Information consists of data which establish whether or not an assertion is false. Not all data do this. One of the reasons we have difficulty becoming and staying informed is that we sometimes accept as informative things which really aren’t, or at least aren’t necessarily. This is the first in a series of posts in which we’ll look at a few things which are not information.&lt;br /&gt;&lt;br /&gt;1. &lt;span style=&quot;font-style:italic;&quot;&gt;Information is not synonymous with facts&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;People often confuse information with facts. Someone who knows a lot of facts is considered to be well informed. A fact is only informative, though, if it helps you settle a question you need to know the answer to. If someone is on trial for armed robbery, the Crown does not submit evidence that the defendant is a skilled bridge player, true as that evidence may be.&lt;br /&gt;&lt;br /&gt;Here&#39;s a fact: Churchill, Manitoba, is named for John Churchill, first governor of the Hudson&#39;s Bay Company. Despite being a fact, though, it doesn&#39;t help me answer the question “Where do I find the men’s shirts?” whenever I drop in to one of the Bay’s branches. So for me that datum is not informative, factual though it be.&lt;br /&gt;&lt;br /&gt;Furthermore, there are plenty of items of information that are not factual. The idea of intelligence, for example, cannot be said to be a fact, since there is widespread disagreement about just what intelligence is. However, the concept of intelligence is informative because in speculating about it we discover useful things. We have even discovered some of the shortcomings of the idea of intelligence.&lt;br /&gt;&lt;br /&gt;Information is always derived from facts, and it always helps to predict facts. However, it need not be factual itself, and something which is factual need not be informative. 
Having spent my life filling my memory with facts whose relevance to my life is highly questionable (see the note about John Churchill above), I realize that collecting trivia can be enjoyable. Until they tell you something useful, though, trivia are just trivial.&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://actanalysis.blogspot.com/2011/06/uninformation-2.html&quot;&gt;Next: The logical or reasonable is not necessarily informative&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;&lt;A HREF=&quot;http://www.actualanalysis.com&quot; target=&quot;aa&quot;&gt;Actual Analysis&lt;/A&gt;&lt;br /&gt;&lt;center&gt;Uninformation (1) &amp;copy; 2011, John FitzGerald&lt;/CENTER&gt;</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/4659177955473338502/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-1.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4659177955473338502'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/4659177955473338502'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/06/uninformation-1.html' title='Uninformation (1)'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-305457646880828464.post-3935114773292686752</id><published>2011-05-03T04:17:00.000-07:00</published><updated>2011-05-03T05:00:45.171-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="elections"/><category scheme="http://www.blogger.com/atom/ns#" 
term="polls"/><title type='text'>The value of political polls</title><content type='html'>I&#39;ve been questioning the value of polls here, so here&#39;s some evidence of how well they work. I was interested in Ekos Politics&#39; claim that methods of predicting the seats won by each party in Canadian federal elections &quot;work pretty well&quot; (the quotation is from a PDF I can no longer find on their website, but I have a copy if you want one). Here are Ekos&#39; final projections for the election of May 2, 2011 (you can verify them &lt;A HREF=&quot;http://bit.ly/kVCHxg&quot; target=&quot;e&quot;&gt;here&lt;/A&gt;):&lt;br /&gt;&lt;br /&gt;Conservatives: 130 to 146 seats&lt;br /&gt;New Democrats: 103 to 123&lt;br /&gt;Liberals: 36 to 46&lt;br /&gt;Bloc Québécois: 10 to 20&lt;br /&gt;Green: 1&lt;br /&gt;&lt;br /&gt;And the results:&lt;br /&gt;&lt;br /&gt;Conservatives: 167 seats&lt;br /&gt;New Democrats: 102&lt;br /&gt;Liberals: 34&lt;br /&gt;Bloc Québécois: 4&lt;br /&gt;Green: 1&lt;br /&gt;&lt;br /&gt;In other words, Ekos got the Green seats right and no other party&#39;s. Of course, there is no reason to expect them to get them right. The regional variation in voting is so great in Canada (the BQ only runs in Quebec, for example) that you&#39;d need extensive polling in each riding to even hope to approximate the results. Even then the non-representative samples you&#39;d be working with would seriously limit the accuracy of your estimates.&lt;br /&gt;&lt;br /&gt;At any rate, the Ekos projections missed the two important events of May 2: the Conservative majority and the collapse of the Bloc. 
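The hit-or-miss comparison above is easy to make explicit. Here is a minimal sketch in Python using the seat figures from this post; treating the single Green figure as a one-seat interval is my assumption, not Ekos&#39;.

```python
# Ekos final projection intervals vs. actual seat counts, May 2, 2011.
# Green was projected as a single figure; representing it as the
# degenerate interval (1, 1) is an assumption for this sketch.
projections = {
    "Conservatives": (130, 146),
    "New Democrats": (103, 123),
    "Liberals": (36, 46),
    "Bloc Québécois": (10, 20),
    "Green": (1, 1),
}
results = {
    "Conservatives": 167,
    "New Democrats": 102,
    "Liberals": 34,
    "Bloc Québécois": 4,
    "Green": 1,
}
# A projection "hits" when the actual seat count falls inside its interval.
hits = [party for party, (low, high) in projections.items()
        if results[party] in range(low, high + 1)]
print(hits)
```

Running the check confirms the tally above: only the Green projection, the one with essentially no uncertainty to express, contains its actual result; the intervals for the four contested parties all miss.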
Journalists will probably go on acting as if polls mean something, but that doesn&#39;t mean you have to.</content><link rel='replies' type='application/atom+xml' href='http://actanalysis.blogspot.com/feeds/3935114773292686752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://actanalysis.blogspot.com/2011/05/value-of-political-polls.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3935114773292686752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/305457646880828464/posts/default/3935114773292686752'/><link rel='alternate' type='text/html' href='http://actanalysis.blogspot.com/2011/05/value-of-political-polls.html' title='The value of political polls'/><author><name>Actual Analysis</name><uri>http://www.blogger.com/profile/18080726368498243327</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='https://img1.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>