Did you think Hillary would win the election? Her campaign certainly did.

Why wouldn’t they? Nearly all polls showed her leading right up to the election.

The final Real Clear Politics poll of polls had Hillary ahead in 12 of 13 polls it tracked.

FiveThirtyEight said her chance of winning was 71.4%. The New York Times gave her an 85% chance. British odds-makers put her chances at 80%.

Fox News, of all sources, flatly declared, “Trump is headed for the worst (Republican) defeat since 1984.”

Yet Trump won. Donald J. Trump will become America’s 45th President on January 20th.

Remember all those wrong polls next time you’re trying to make sense of the latest Nielsen ratings.

Nielsen ratings are just like the Presidential polls.

And Nielsen ratings can be just as wrong as all those polls that said Clinton would win the election.

Nielsen ratings are estimates based on polling listeners. And like every other poll, Nielsen ratings have a margin of error.

The problem is that the mathematical error inherent in polls, the so-called margin of error, doesn’t capture the full extent of the unreliability of polls, including Nielsen’s ratings.

Improper weighting of responses can have a bigger impact than sample size on the accuracy of a poll, and critics point to misapplied weighting as what led the pollsters astray.

Each month the political pollsters weighted their responses. They “adjusted” their numbers so that the make-up of each month's sample reflected their expectation of who would vote.

The problem is that small variations between the make-up of each monthly sample can be amplified by the weights pollsters apply to the numbers.

Misapplied weighting has the potential to exacerbate the month-to-month swings of a poll as well as the differences between one poll and others.

That’s why even on the eve of the election the gap between polls was six points, exceeding the theoretical potential for error.

Now think about Nielsen ratings.

Nielsen ratings have their own theoretical margins of error. But do those calculations really reflect the entire uncertainty of the numbers?

As with political polls, Nielsen’s listening data have to be weighted each book to better reflect the population demographics of each market.

For a diary market, each diary is assigned a person-per-diary value (PPDV) that is applied to each individual’s diary based on sex, age, and ethnicity.

In PPM markets the weighting that Nielsen applies is much more complicated.

Each day each active panelist is re-weighted based on the day’s usable meters to match Nielsen’s best guess about the make-up of the market.

One day a given panelist might (say) represent 2,000 women, the next day she might represent 1,000 women. The next day it might be 3,000.

Small changes in a meter’s weighting are amplified across meters, resulting in swings that are potentially well beyond the theoretical accuracy of the numbers.
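To make the amplification concrete, here’s a minimal Python sketch with all numbers invented: a panel of meters whose holders listen identically every single day, re-weighted daily the way PPM panelists are. The listening behavior never changes, yet the estimate wobbles.

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical sketch (all numbers invented): 20 PPM panelists whose
# listening never changes from day to day. Only their daily weights move,
# as the panel is re-weighted against each day's usable meters.
n_panelists = 20
base_weight = 1500  # each meter nominally "represents" 1,500 persons

def daily_estimate():
    # each day, re-weighting perturbs every meter's weight by up to +/-30%
    return sum(base_weight * random.uniform(0.7, 1.3) for _ in range(n_panelists))

estimates = [daily_estimate() for _ in range(5)]
for day, est in enumerate(estimates, 1):
    print(f"Day {day}: {est:,.0f} listeners credited")
```

Actual PPM weighting is far more constrained than a uniform ±30% jitter; the point is only that weight movement alone, with zero change in actual listening, moves the estimate.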

Challenge Nielsen about their rating estimates and they will confidently tell you that they are accurate and any shift in the numbers from one month to the next is real.

How is this different from the pundits who confidently used the polls to claim that Trump had virtually no chance to win?

Next time you look at the Nielsen ratings think about poor Hillary Clinton and her supporters. They all trusted the numbers.

Don’t make the same mistake.

Did you find the switch to PPM added additional stability, precision, or depth to radio ratings?

PPM may have added granularity to the numbers, but it’s hard to claim that PPM brought stability or precision to radio when ratings are too often based on one or two meters.

Now Nielsen intends to use PPM to measure local television, promising, wait for it... greater stability, precision, and depth.

If you think you’ve got problems with PPM, take some solace that television broadcasters in 44 markets will soon feel our pain and then some.

While Nielsen’s plans may not seem to have a direct impact on radio, the move has implications that do not bode well for the medium.

According to Nielsen’s press release:

The addition of PPM panelists to the local TV service will...provide local clients with ratings that reflect precise measurement of local viewing behaviors and insights, and boost ratings fidelity....

The inclusion of PPM measurement into the Nielsen Local TV service will bring additional stability and depth to local TV ratings. The integration will advance Nielsen's ability to provide clients more granular data while measuring consumers wherever they watch.

We hope TV people talk to radio people before they get too excited.

Radio people were given those same assurances as Arbitron rolled out PPM.

Radio may have been sold a bill of goods with PPM, but at least Arbitron **replaced** the diary with PPM in the top markets.

Nielsen isn’t even going to do that.

Nielsen apparently intends to combine data currently gathered via other means with PPM data, much like the company’s RADAR studies for network radio combine PPM numbers from large markets with diary numbers from small markets.

How’s that worked out for radio?

Radio’s RADAR was stripped of MRC accreditation in 2010 when Arbitron started combining PPM and diary data. The MRC has a problem with combining different measurement methods.

Local television diary measurement is Media Rating Council accredited. However, the MRC is warning that local television, too, may lose accreditation.

Every time the MRC refused accreditation to one of Arbitron’s new products, executives quickly reaffirmed their commitment to accreditation:

We will continue to focus on obtaining and maintaining Media Rating Council accreditation in all of our PPM and Diary markets. (Michael Skarzynski, CEO, Arbitron, 2009)

MRC accreditation for our PPM markets is a company-wide priority at Arbitron. (Gregg Lindner, EVP, Arbitron, 2013)

Nielsen, in contrast, barely acknowledges the accreditation process, perhaps because so many of the company's products are not accredited.

In fact, no new radio PPM market has been accredited since Nielsen acquired Arbitron, and today nearly half of PPM markets remain unaccredited a decade after PPM launched.

Even the number one market in the nation—New York—is not accredited!

Nielsen pays lip-service to accreditation and then rolls out another unaccredited product.

If Nielsen is willing to compromise television ratings to save money, how likely is it that Nielsen will invest to improve radio ratings?

Each month Nielsen releases client PPM 6+ rating estimates to the world, and each month we read numerous analyses of who’s up and who’s down.

While the monthly rating play-by-plays are widely read and taken seriously, they shouldn’t be.

This is the sort of analysis we see:

WHTZ was up to #2 for the first time in over a year, but with only a slight increase in share.

WLTW’s 6.0 in July was its lowest 6+ share since November 2014. Its +.1 (share) in August halted four successive declines with a cumulative loss of seven-tenths.

WSKQ tumbled for the third successive sweep, a combined -1.4 since June, drifting from third to sixth.

This is the wrong way to look at Nielsen estimates. We all would like to believe that monthly numbers reflect reality, that the number one station really has more listeners than the number two station, but that’s not the case.

The ratings are not month-to-month battles between stations to be analyzed like a horse-race.

Stations inch ahead in the monthly trends not like horses but rather like heads and tails in multiple coin tosses.

Month-to-month rating changes are **random**. They are no more predictable than whether a coin toss “winner” will be heads or tails. It’s something we’ve addressed here, here, here, and here.

That’s why wobbles are not rare events. And the wobbles can be great enough for stations to swap positions from month to month.
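The coin-toss analogy is easy to demonstrate. Here’s a hypothetical Python simulation, with invented shares and sample sizes: two stations whose true audiences are identical, each “measured” monthly from a small sample. Sampling noise alone hands the lead back and forth.

```python
import random

random.seed(1)  # fixed seed for reproducibility

# Hypothetical simulation: two stations whose TRUE shares are identical.
true_share = 0.06    # both stations truly have a 6.0 share (invented)
sample_size = 400    # respondents credited per month (invented)

def monthly_share():
    # count how many sampled listeners happen to pick the station that month
    hits = sum(random.random() < true_share for _ in range(sample_size))
    return 100 * hits / sample_size

for month in range(1, 7):
    a, b = monthly_share(), monthly_share()
    leader = "A" if a >= b else "B"
    print(f"Month {month}: Station A {a:.1f}, Station B {b:.1f} -> leader {leader}")
```

Run it and the “leader” flips between two stations that are, by construction, exactly tied. That is what a monthly ranker can look like when the gaps are inside the noise.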

Even New York with its 4,000 daily in-tab doesn’t have a large enough sample to prevent month-to-month swings in rank.

To illustrate the degree to which uncertainties can impact month to month changes, we took a close look at recent New York numbers.

The first graph reflects the way station ratings are typically reported, emphasizing month-to-month changes.

We’ve graphed the 6+ numbers for the top five New York stations trended from June to August.

As you can see, even top rated New York stations go up one month, drop the next, then repeat the process month after month.

For example, WLTW started the period in first place, fell to second one month, and rebounded the next month.

Meanwhile WCBS-FM started in second, wobbled its way to first in July, and ended the period right where it started.

A much more useful way to look at monthly PPM trends is to look across the same months in previous years. Are stations ahead or behind where they were one or two years ago?

To make comparisons even more useful, we can average three months’ numbers, a quarter’s worth of estimates. It won’t eliminate wobbles, but it will temper their impact.
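A quick sketch of the averaging step, using invented monthly shares that wobble the way real ones do:

```python
# Invented monthly 6+ shares for one station, wobbling around a flat trend
monthly = [5.2, 5.8, 5.0, 5.6, 4.9, 5.7, 5.1]

# trailing three-month (one quarter's worth) average
rolling = [round(sum(monthly[i - 2:i + 1]) / 3, 2) for i in range(2, len(monthly))]

print("Monthly:        ", monthly)
print("3-month average:", rolling)
print("Monthly swing:  ", round(max(monthly) - min(monthly), 2))
print("Averaged swing: ", round(max(rolling) - min(rolling), 2))
```

The raw monthlies swing nine-tenths of a share; the quarterly averages move within three-tenths. The wobble isn’t gone, just tempered.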

The second graph is the result.

It shows the same five stations, but here we’ve averaged three months and compared the three months to the same months in the previous three years.

Looking at the numbers this way gives us a completely different view of what’s happening at the top of the New York ranker.

The first thing that jumps out is that four of the five stations are below their 2013 numbers.

Worse than that, three of the five stations are at their lowest point going back four years, some significantly so.

One notable oddity is that four of the five stations spiked in 2015, only to fall again this year. Why would four of the top five stations all spike simultaneously?

It is unlikely that all four 2015 spikes are real.

Share is a zero-sum estimate. It always equals 100, so for one station to go up, some other station goes down.

Even with WSKQ’s growth, more than two shares evaporated from the top five New York stations in 2016. It's hard to believe that four out of five of the top New York stations would really tank all at the same time.

And what about Mega 97.9's steady growth over five years?

Recall that quote from one of the analyses, "WSKQ tumbled for the third successive sweep, a combined -1.4 since June, drifting from third to sixth."

The steady growth of the station over the past three years puts that “tumble” into perspective.

It’s an example of why month-to-month comparisons can be so misleading. None of this was apparent from the monthlies.

We kept this analysis to five stations to make it easier to see what’s going on. Now that you’ve got the idea we’ll dig a little into New York numbers as well as other markets in future posts.

Your ratings are just like Presidential polls: Both are estimates, an approximation of something that can’t be precisely measured.

Presidential pollsters like to present their poll numbers as accurate and precise, but there is a limit to their precision and accuracy.

You may have noticed that every Presidential poll mentions a margin of error or confidence interval.

Margins of error are statistical calculations that indicate how much variance we should expect from one poll to another.

The larger the margin of error, the more the numbers are going to jump around.
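That margin comes from a standard formula: z × √(p(1−p)/n) at a chosen confidence level. A short Python illustration, with invented poll numbers:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Standard margin of error for a proportion; z=1.96 is ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

# A 50/50 race polled with 1,000 respondents (invented example)
moe = margin_of_error(0.50, 1000)
print(f"n=1,000: +/- {100 * moe:.1f} points")

# Quadrupling the sample only halves the margin of error
moe4 = margin_of_error(0.50, 4000)
print(f"n=4,000: +/- {100 * moe4:.1f} points")
```

Note the diminishing returns: going from 1,000 to 4,000 respondents shrinks the margin from about ±3.1 points to only about ±1.5. Sample size helps, but slowly.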

Nielsen too likes to present their ratings as precise, an accurate measure of how well each station is doing.

However, like Presidential polls, Nielsen ratings are also estimates that have a certain margin of error. The "real" numbers might be higher than reported by Nielsen. They might be lower.

Unfortunately, in analyzing Nielsen numbers we rarely acknowledge the uncertainty associated with polling.

We talk about "wobbles" as aberrations, but the truth is that every Nielsen number is a wobble.

Radio’s problem is that while Presidential polls always clearly state their margins of error, the variance of radio ratings is rarely reported.

You have to dig pretty deeply into the book to find it, and even then it isn’t obvious how much the numbers can vary and still be "within the margin of error."

Here’s one example that illustrates what you would find.

The first graph is a cume ranker of seven stations in one medium-sized PPM market. The ratings and stations are real but we’ve labeled them generically to avoid running afoul of Nielsen’s "fair use" rules.

They are ranked by Nielsen’s published estimates, but rather than show a single number for each station we’ve shown the range of ratings that would be accepted as "normal," that is within Nielsen’s confidence interval.

For example, Station 1 has a cume audience somewhere between 280,000 and 370,000. Meanwhile, Station 2 has an audience that could be as high as 370,000 or as low as 280,000.

This means that while Nielsen ranked the first station over the second, we really can’t say whether Station 1 or Station 2 has a larger audience any more than a Presidential pollster can say which candidate is really leading if the lead is within the margin of error.
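The ranking logic can be sketched in a few lines of Python. The first two intervals below are the illustrative cume ranges from the text; the third station’s range is invented to show what a genuinely rankable gap looks like.

```python
def can_rank(lo_a, hi_a, lo_b, hi_b):
    """True only if the two confidence intervals do NOT overlap."""
    return hi_a < lo_b or hi_b < lo_a

station1 = (280_000, 370_000)   # cume range from the text
station2 = (280_000, 370_000)   # cume range from the text
station7 = (150_000, 210_000)   # invented lower-ranked station

print("Station 1 vs Station 2 rankable?", can_rank(*station1, *station2))
print("Station 1 vs Station 7 rankable?", can_rank(*station1, *station7))
```

When the intervals overlap, the published rank order between the two stations is a coin flip, not a finding.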

But it gets a lot more complicated than that.

Draw a horizontal line across the graph at any point and see how many stations it touches.

Draw the line at 350,000 persons and it will run through three stations. That means that the top three stations are "within the margin of error," essentially tied.

Draw a line at 300,000 persons and the line runs through six stations.

It means that any one of the top six stations could be first...or sixth! We just don’t know.

You’re looking at 6+ full-week numbers, the numbers with the lowest variance, and we can’t definitively say which of the top six stations is number one.

Imagine what the variance is when we get into smaller demos or specific day parts.

The second graph is kind of messy but even more important. Take some time to study it.

The graph shows one station’s four-month trend as you’ve never seen trends displayed. Again, the numbers are real but we can’t tell you the market or station. Here we’re looking at share.

The solid green line in the middle is the published share for each of four consecutive months. The blue line is the lower confidence limit according to Nielsen. The red line is the upper confidence limit.

The gray area marks everything that falls within Nielsen’s total-person estimate variance. In other words, the station’s share could be reported anywhere within the gray area and still fall within Nielsen’s confidence interval.

What this means is that it is highly likely that the station’s share is somewhere between the blue and red lines, but we can’t say exactly where within that range.

So officially month 2 was a good month for the station, gaining nearly a full share. The station then lost ground for two months in a row.

But is that what really happened?

The margin of error for this station is about three shares. What that means is that the station has to gain or lose three shares before we can confidently say something real happened to the station.

(I’m simplifying here since Nielsen margins of error are asymmetrical, but you get the idea.)

The dashes show three very different trend scenarios, all equally likely to have happened.

First, the yellow dashed line shows the station flat over the four months. No change, solid as a rock.

Next look at the turquoise dashed line. It shows a steady uptrend for the station, from a mid four to nearly a six share in three months. Good job!

Now look at the purple line. From a mid six share the station has fallen below a four share!

Imagine how you would feel losing a third of your audience in a few months.

A flat trend, strong growth, or a free fall. All equally possible.

Two of the three plausible scenarios are wrong, but which two? And what if you react to one of the two wrong scenarios by making changes to the station?

And keep in mind that our illustration is total week 6+ shares. You are probably studying demos, and typical programming demos have a variance two or three times the 6+ numbers.

This is why we always caution stations to resist the impulse to react to monthly changes in the numbers.

Wobbles are not aberrations. Wobbles are baked into ratings.

Next month when your new numbers roll out you might want to reread this post. And if it's a bad month you might want to make copies and pass them out.