<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Cold Hard Facts</title>
 <link href="http://mewo2.com/atom.xml" rel="self"/>
 <link href="http://mewo2.com"/>
 <updated>2015-05-17T19:01:48+00:00</updated>
 <id>http://mewo2.com</id>
 <author>
   <name>Martin O'Leary</name>
   <email>m.e.w.oleary@gmail.com</email>
 </author>

 
 <entry>
   <title>Eurovision Forecasts: New site</title>
   <link href="http://mewo2.com/nerdery/2015/05/17/eurovision-forecasts-new-site"/>
   <updated>2015-05-17T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2015/05/17/eurovision-forecasts-new-site</id>
   <content type="html">
&lt;p&gt;This year’s Eurovision forecasts are now live at a &lt;a href=&quot;http://www.eurovisionforecast.com/&quot;&gt;new dedicated website&lt;/a&gt;.&lt;/p&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2014: Bonus final predictions</title>
   <link href="http://mewo2.com/nerdery/2014/05/10/eurovision-2014-bonus-final-predictions"/>
   <updated>2014-05-10T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2014/05/10/eurovision-2014-bonus-final-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;and-theres-more&quot;&gt;And there’s more!&lt;/h3&gt;

&lt;p&gt;As an addendum to my &lt;a href=&quot;/nerdery/2014/05/09/eurovision-2014-final-predictions/&quot;&gt;predictions&lt;/a&gt; for tomorrow’s final, I thought it would be interesting to look at the distribution of finishing places for each country. One of the main motivations for this is a common misinterpretation of the list of winning probabilities I give each year. &lt;/p&gt;

&lt;p&gt;Last year, I had listed the UK as the 19th most likely country to win the final. As it turned out, the UK came in 19th place, and a number of people congratulated me on the accuracy of the prediction. Now, as lovely as it is to be congratulated, I never predicted that the UK would come in 19th. There’s a world of difference between “19th most likely to win”, and “will most likely come 19th”. I demonstrated this two years ago in my &lt;a href=&quot;/nerdery/2012/05/27/eurovision-statistics-after-the-final/&quot;&gt;wrap-up post for the 2012 contest&lt;/a&gt;, with the particular example of Malta that year.&lt;/p&gt;

&lt;p&gt;I think the easiest way to avoid this confusion again is probably to make some actual predictions for the finishing place of each country. That way, hopefully nobody will misinterpret winning probabilities in this way.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/placedist.png&quot; alt=&quot;Place ranges&quot; /&gt;&lt;/p&gt;

&lt;p&gt;What I’ve plotted here, as the green bars, is the &lt;em&gt;inter-quartile range&lt;/em&gt; of finishing position for each country. So, for example, Austria’s bar goes from 7 to 16. This means that in 50% of simulations, Austria finished between these two positions (inclusive). In 25% of simulations, Austria finished in 7th or better, and in 25% of simulations 16th or worse. The black bar shows the average placing, which is not necessarily a whole number. In this case, on average Austria finished 11.87th.&lt;/p&gt;
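&lt;p&gt;For the curious, this kind of summary is straightforward to read off simulation output. A minimal sketch in Python, using generated stand-in data rather than the model’s real simulations:&lt;/p&gt;

```python
import random
import statistics

random.seed(42)

# Hypothetical stand-in for the model's simulation output: one country's
# finishing place in each of 10,000 simulated finals (1 is the winner).
positions = [min(26, max(1, round(random.gauss(12, 6)))) for _ in range(10_000)]

# statistics.quantiles with n=4 returns the three quartile cut points;
# half the simulations fall between q1 and q3 (the inter-quartile range).
q1, median, q3 = statistics.quantiles(positions, n=4)
mean_place = statistics.mean(positions)  # need not be a whole number

print(f"IQR roughly {round(q1)} to {round(q3)}, mean place {mean_place:.2f}")
```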

&lt;p&gt;Because we’ve got better knowledge of song quality this year, there’s definitely a strong correspondence between the ordering by winning probability (as seen &lt;a href=&quot;/nerdery/2014/05/09/eurovision-2014-final-predictions/&quot;&gt;last time&lt;/a&gt;) and the ordering here. There are a few discrepancies though. For example, Belarus are one of the few countries which didn’t win a single simulation. However, they’re still forecast to come in ahead of France and Germany, because they can rely on votes from the other ex-Soviet bloc countries.&lt;/p&gt;

&lt;p&gt;I should mention here that the model is generally pretty bad at scoring at the bottom end of the table. Although the dreaded “nul points” hasn’t happened since &lt;a href=&quot;https://www.youtube.com/watch?v=XAJ62IG3gBo&quot;&gt;2003&lt;/a&gt;, the model still predicts it in over half of simulations. This may mean that the placings near the bottom are a bit off as well.&lt;/p&gt;

&lt;p&gt;I should also mention that it’s extremely unlikely that this exact ordering comes up. With twenty-six finalists, there are approximately four hundred trillion trillion&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; possible orderings. Picking the right one by chance alone is roughly as likely as three separate entrants being independently struck by lightning during the interval act.&lt;/p&gt;
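&lt;p&gt;The count itself is just 26 factorial, which is easy to check:&lt;/p&gt;

```python
import math

# Every possible finishing order of 26 finalists is a permutation,
# so the number of orderings is 26!
orderings = math.factorial(26)
print(orderings)  # 403291461126605635584000000, roughly 4e26
```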

&lt;p&gt;Because these are probabilistic forecasts, it’s likely that some of them will be wrong. In fact, it would be a problem if they weren’t&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. If a model claims something will happen 50% of the time, then that thing should fail to happen 50% of the time as well. In a well-calibrated model, a 48% prediction (such as our prediction that Sweden will win) should be wrong a little over half the time.&lt;/p&gt;

&lt;p&gt;If we take the inter-quartile ranges as predictions, then we should expect the true position to lie inside them 50% of the time&lt;sup id=&quot;fnref:3&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Or, looking at it another way, we should expect half of our predictions to be right, and half to be wrong. Much more or less than that and there’s a problem with the model.&lt;/p&gt;
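&lt;p&gt;The coverage check described here is simple to run once the results are in. A sketch with made-up ranges and places (purely illustrative, not the model’s output):&lt;/p&gt;

```python
# Made-up inter-quartile predictions and actual finishing places, purely
# to illustrate the coverage check; not the model's real numbers.
predictions = {
    "A": (7, 16), "B": (1, 3), "C": (14, 22),
    "D": (18, 25), "E": (2, 6), "F": (17, 24),
}
actual = {"A": 11, "B": 5, "C": 16, "D": 26, "E": 6, "F": 25}

# A prediction "hits" when the true place lands inside the range (inclusive).
hits = sum(actual[c] in range(lo, hi + 1) for c, (lo, hi) in predictions.items())
coverage = hits / len(predictions)
print(f"{hits} of {len(predictions)} inside the range ({coverage:.0%})")
```

For a well-calibrated model, the printed coverage should hover around 50% (a little above, since places only take discrete values).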

&lt;h3 id=&quot;but-whos-got-the-best-song&quot;&gt;But who’s got the best song?&lt;/h3&gt;

&lt;p&gt;Another way of looking at this data is to compare how well a song is likely to finish with how “good” the model thinks it is. Of course, by this stage the model “quality score” incorporates information about the running order and probably other information, but it’s still interesting to get an idea of how the mechanics of the contest favour some countries over others.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/placequality.png&quot; alt=&quot;Place vs quality&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There is definitely a strong relationship between song quality and finishing position, as one would hope, but the countries which deviate from this relationship are probably the most interesting. Netherlands and Austria, according to the model, have the second and fourth best songs, and yet they finish in fifth and eleventh place respectively. This reflects their lack of established voting patterns.&lt;/p&gt;

&lt;p&gt;On the flip side, Ukraine managed to squeeze out a third place finish from the seventh best song. Russia, even more egregiously, have a song which ranks third from the bottom in quality, but they still manage to crack the top ten in terms of mean finishing position.&lt;/p&gt;

&lt;p&gt;Before I get deluged with complaints from Russians, I should say that the model quality score isn’t really intended to match up with objective musical quality. Instead it models the opinion of a completely average Eurovision voter&lt;sup id=&quot;fnref:4&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. It’s entirely possible that they just don’t like &lt;a href=&quot;https://www.youtube.com/watch?v=MPI7AnD_QS8&quot;&gt;twins&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Four hundred and three septillion, two hundred and ninety-one sextillion, four hundred and sixty-one quintillion, one hundred and twenty-six quadrillion, six hundred and five trillion, six hundred and thirty-five billion, five hundred and eighty-four million. You go and count, I’ll wait here. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;Nate Silver wrote a &lt;a href=&quot;http://fivethirtyeight.com/datalab/the-ncaa-bracket-checking-our-work/&quot;&gt;nice piece&lt;/a&gt; on checking this calibration for his March Madness basketball model. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot;&gt;
      &lt;p&gt;In this case it will be slightly more than 50%, because the position variable only takes on discrete values. The quantiles don’t actually divide up the set of outcomes very nicely. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot;&gt;
      &lt;p&gt;The best available approximation to this is a Hungarian. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2014: Final predictions</title>
   <link href="http://mewo2.com/nerdery/2014/05/09/eurovision-2014-final-predictions"/>
   <updated>2014-05-09T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2014/05/09/eurovision-2014-final-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;a-learning-experience&quot;&gt;A learning experience&lt;/h3&gt;

&lt;p&gt;It was probably reasonable to think of Tuesday’s model performance as mixed: the overall score was bad, but only because of some quite surprising qualifiers. On the other hand, &lt;a href=&quot;/nerdery/2014/05/05/eurovision-2014-first-predictions/&quot;&gt;last night’s performance&lt;/a&gt; was terrible. The particular set of qualifiers didn’t show up even once in 10,000 simulations. That’s substantially worse than you’d expect if the model were picking qualifiers at random.&lt;/p&gt;

&lt;p&gt;The most egregious mistake was the model’s near-absolute certainty (93%) that Israel was going to qualify. This, as it turns out, was not the case. Looking into it, I think I now understand why this happened, and hopefully can avoid it next year.&lt;/p&gt;

&lt;p&gt;The problem arose partially because of how weak the field was. One of the consequences of this was that the Betfair odds for all of the entries were extremely long. The “favourite” was Norway, with a price of 26.5 (~4% win probability). Israel had the third shortest odds, at a price of 65.0 (~1.5% win probability).&lt;/p&gt;
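&lt;p&gt;For reference, the implied probabilities above come from inverting the decimal price, which can be sketched as:&lt;/p&gt;

```python
# A decimal price on a betting exchange implies a win probability of
# roughly 1 / price (ignoring any overround in the market).
for country, price in [("Norway", 26.5), ("Israel", 65.0)]:
    print(f"{country}: price {price} implies about {1 / price:.1%}")
```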

&lt;p&gt;Now, as I described &lt;a href=&quot;/nerdery/2014/05/07/eurovision-2014-second-semi-final-predictions/&quot;&gt;previously&lt;/a&gt;, the song quality is estimated from a linear fit to the logarithm of the odds. This means that the difference in quality between a song at odds of 2.0 and a song at odds of 4.0 is the same as between a song at 20.0 and a song at 40.0, or a song at 200.0 and a song at 400.0. This almost certainly overemphasises the differences between songs with long odds. In reality, there’s not a huge amount of difference between a song which trades at 100.0 and a song which trades at 1000.0, while a song at 2.0 and a song at 20.0 are worlds apart.&lt;/p&gt;
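&lt;p&gt;This equal-ratio property is easy to see directly. A small sketch, with an invented slope and intercept rather than the fitted values:&lt;/p&gt;

```python
import math

# A quality score linear in log(price), as described above; the slope and
# intercept here are invented for illustration, not the fitted values.
def quality(price, slope=-1.0, intercept=0.0):
    return slope * math.log(price) + intercept

# Equal price ratios produce equal quality gaps, whatever the price level:
gap_short = quality(2.0) - quality(4.0)
gap_long = quality(200.0) - quality(400.0)
print(math.isclose(gap_short, gap_long))  # True
```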

&lt;p&gt;A fix for this would be to introduce more variability in quality at longer odds. This is definitely something I’ll look into for next year’s model. For this year though, I’m going to stick with things as they are, and console myself that most of the countries with truly long odds have already been eliminated.&lt;/p&gt;

&lt;p&gt;While looking into this mistake, I also noticed a rather strange bug in the model, which resulted in some songs getting huge scores. One of the simulations showed San Marino getting a perfect 12 from every country. I’ve fixed this bug&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, which affected about 5% of simulations, but it does mean that the final predictions are now made using a slightly different model to the earlier predictions, so there may be some slight changes.&lt;/p&gt;
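&lt;p&gt;For readers unfamiliar with burn-in: the early iterations of a sampler start far from representative values and have to be dropped before summarising. A toy illustration, not the model’s actual sampler:&lt;/p&gt;

```python
import random

random.seed(1)

# Toy sampler: the chain starts far from its stationary region, so early
# iterations are unrepresentative and must be discarded (illustrative only).
def toy_chain(n_iter, start=100.0):
    x, samples = start, []
    for _ in range(n_iter):
        x = 0.5 * x + random.gauss(0, 1)  # pulls the state toward zero
        samples.append(x)
    return samples

samples = toy_chain(2_000)
burn_in = 500
kept = samples[burn_in:]  # the fix: summarise only post-burn-in draws
print(len(kept))  # 1500
```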

&lt;h3 id=&quot;for-the-win&quot;&gt;For the win&lt;/h3&gt;

&lt;p&gt;Moving on to the final then, the field is looking fairly balanced. After a Yugoslav-free final last year, and with the departure of Serbia and Croatia this year, it seemed that we were entering a new era for Eurovision. However, Slovenia and Montenegro have both qualified, for the first time in Montenegro’s case. I’ve seen it suggested that the former Yugoslav diaspora may be concentrating its votes on these two. If so, it bodes well for them.&lt;/p&gt;

&lt;p&gt;Also interesting is that we have a full complement of Scandinavians present. With most voting blocs this would serve to weaken the strongest member, but the Scandinavians are usually fairly good at avoiding this. Expect plenty of vote-swapping, but the 12s should go to Sweden.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/winprobs-final.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Not a lot has changed in the model prediction since the second semi-final, mostly because we already knew that nobody from the second semi was going to win. The main change is that, because I fixed the bug that gave them superhuman powers of singing, San Marino et al don’t really win very much any more.&lt;/p&gt;

&lt;p&gt;From the Betfair side, the odds of a Dutch win have spiked enormously, to the point where they’re now second favourite. One of the main reasons for this is that the draw has now been made for running order. In general, over the past ten years, songs in the second half of the draw have performed much better than songs in the first. This year, Ukraine have drawn the first position, and many of the other highly rated songs are in the first half. The Netherlands are performing in 24th place, two before the end, while the UK take last position.&lt;/p&gt;

&lt;p&gt;I had intended to incorporate running order into the model for this year, but it slipped my mind. This means that the model is missing out on some information. To try to combat this, I’ve re-run the model, but instead of using Betfair data from before the contest began, I’ve used the latest data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/winprobs-final2.png&quot; alt=&quot;Winning probabilities redux&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, this causes the model to greatly focus on Sweden. Sweden have a fairly middling slot in the running: they perform 13th, at the end of the first half. However, this is probably much better than their competitors Armenia and Ukraine (1st and 7th respectively). Netherlands and the UK do receive bumps up the order in this new set of simulations, but not enough to make them really competitive.&lt;/p&gt;

&lt;h3 id=&quot;bellwethers-likely-twelves-etc&quot;&gt;Bellwethers, likely twelves, etc&lt;/h3&gt;

&lt;p&gt;Usually I try to identify “bellwether” countries: countries whose 12 points are most likely to go to the winner. In this case, because the model is so certain about Sweden winning, this has largely boiled down to the other Scandinavian countries. So, while Denmark, Norway and Finland all have around 40% chance of picking the winner, this is largely because they’re probably going to pick Sweden, and Sweden’s probably going to win.&lt;/p&gt;

&lt;p&gt;A few of the lower-ranked bellwethers may be more reliable in practice. Poland, Spain and Israel are only slightly less likely to pick the winner, and are probably more varied in their tastes. Toss in Hungary and Macedonia, and you’ve got yourself some predictions. As always, be aware that the voting order is chosen for dramatic purposes, once the televotes are known. It can be a fun game to try to think yourself into the mindset of a Eurovision vote order planner, assuming you’re sober enough by that point in the evening.&lt;/p&gt;

&lt;p&gt;We don’t have Cyprus voting this year, so Eurovision’s favourite pairing (Gryprus? Cypreece?) won’t be in evidence. The model is also fairly cool on the Moldova/Romania relationship, although I’m a bit skeptical about that. There are, however, some vote patterns that we can rely on.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Belarus → Ukraine (48%)&lt;/li&gt;
  &lt;li&gt;Greece → Armenia (56%)&lt;/li&gt;
  &lt;li&gt;Russia → Armenia (61%)&lt;/li&gt;
  &lt;li&gt;Georgia → Armenia (63%)&lt;/li&gt;
  &lt;li&gt;Finland → Sweden (64%)&lt;/li&gt;
  &lt;li&gt;Netherlands → Armenia (67%)&lt;/li&gt;
  &lt;li&gt;Azerbaijan → Ukraine (71%)&lt;/li&gt;
  &lt;li&gt;Norway → Sweden (75%)&lt;/li&gt;
  &lt;li&gt;France → Armenia (76%)&lt;/li&gt;
  &lt;li&gt;Denmark → Sweden (82%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Summing up, Sweden are the likely winners—I hope they kept all the decorations from last year. Armenia has a posse, but probably not enough to give it victory. Don’t buy the UK/Netherlands hype, unless you enjoy that sort of thing, in which case go have fun, what are you listening to me for?&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;P.S. I’ve made some &lt;a href=&quot;/nerdery/2014/05/10/eurovision-2014-bonus-final-predictions/&quot;&gt;extra predictions&lt;/a&gt;, which you should read if you want to know what position you can expect each country to finish in.&lt;/p&gt;

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;If you’re interested, the problem was that the burn-in period for the Gibbs sampler wasn’t being discarded correctly, and so some parameters occasionally had rather stratospheric values. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2014: Second semi final predictions</title>
   <link href="http://mewo2.com/nerdery/2014/05/07/eurovision-2014-second-semi-final-predictions"/>
   <updated>2014-05-07T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2014/05/07/eurovision-2014-second-semi-final-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;san-marino-i-take-it-all-back&quot;&gt;San Marino, I take it all back&lt;/h3&gt;

&lt;p&gt;Well, that was a bit surprising. Both San Marino and Montenegro qualified for the final for the first time in their histories. Either one on its own would be interesting, but both is downright bizarre. This particular set of qualifiers came up in only 18 out of 10,000 model simulations. For comparison, the most likely set of ten, which I predicted &lt;a href=&quot;/nerdery/2014/05/05/eurovision-2014-first-predictions/&quot;&gt;last time&lt;/a&gt;, came up in 628 simulations.&lt;/p&gt;

&lt;p&gt;It’s quite likely that the model is poorly calibrated, and that it should have rated these two’s chances slightly higher. On the other hand, I think they were both always unlikely qualifiers, and I’d be very suspicious of a model which put them through with any confidence. The one upside is that now that these two have qualified for the final, the model should have a much richer set of voting data about them for next year.&lt;/p&gt;

&lt;p&gt;So we move on, and I apologise to any Sammarinese whose hopes I may have dashed prematurely&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;where-do-we-stand-now&quot;&gt;Where do we stand now?&lt;/h3&gt;

&lt;p&gt;If we incorporate the knowledge of which countries qualified for the final, we can update our winning predictions. Although the odds on Betfair have also changed, I haven’t updated the model to match. This is partly for convenience, and partly because I don’t want to introduce extra biases, given that the Betfair odds now presumably incorporate a lot of data from the first semi-final, some of which is already accounted for in the model. It’s also theoretically possible that someone has read my last blog post and made bets on the basis of the predictions therein. I really hope that nobody’s making any significant financial decisions on the basis of my half-cocked calculations.&lt;/p&gt;

&lt;p&gt;Interestingly, one of the big changes in the Betfair odds is that Sweden and Armenia have traded places at the top of the board, so the implied probabilities from the betting data look a lot more like the model outputs now. It’s unclear what’s caused that change, as we really don’t have much extra information about those two now than we did 24 hours ago. Netherlands and Hungary have also had their odds tighten, which is probably simply a reflection of their having qualified, which was at least somewhat uncertain.&lt;/p&gt;

&lt;p&gt;The strangest thing, as far as I can see, is that the odds for the UK have also tightened substantially. While we do have some new information about the countries that performed in the first semi-final, pretty much nothing has happened which could have an impact on the UK’s chances. I think this probably supports my theory that many of those backing the UK are not operating entirely on cold financial logic.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/winprobs-post1.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;What then is new in the model predictions? The biggest change is that our surprise qualifiers, San Marino and Montenegro (and to a lesser extent Iceland), are now showing real possibilities of winning. The probabilities are still small (about 1-2%), but that’s a lot higher than it was previously. To put this in perspective, they’re both now ranked ahead of perennial favourites Russia.&lt;/p&gt;

&lt;p&gt;The other noticeable effect is that, due to Iceland qualifying, the chances of all the Nordic countries have been depressed slightly, as they’re likely to siphon a few votes away. This is a pretty small effect though, and Sweden has taken most of the hit. In general, everything looks quite a lot like we left it before the semi-final.&lt;/p&gt;

&lt;h3 id=&quot;but-were-not-there-yet&quot;&gt;But we’re not there yet&lt;/h3&gt;

&lt;p&gt;All that is getting a bit ahead of ourselves though. First, we’ve got the second semi-final to look forward to. This is a much weaker field: the model gives less than an 8% chance that one of these countries goes on to win. This also makes the qualifications a bit less predictable, at least in theory. On the other hand, this is a small semi-final, with only 15 entries and 10 qualification spots, so we can probably expect some entries to go through which might not otherwise.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/qualprobs2.png&quot; alt=&quot;Qualification probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Greece, Norway and Romania are usually safe bets for qualification, and the model is giving them all over 95% likelihood. We can almost certainly add Israel, Finland and Austria to that list. After that, things start to drop off a bit in certainty.&lt;/p&gt;

&lt;p&gt;It’s also fairly easy to eliminate a few unlikely qualifiers. Macedonia and Slovenia were never among the powerhouses of the former Yugoslavia, and without their neighbours for support, they’re unlikely to go any further. Belarus is similarly dependent on absent friends, and has a tendency to produce the kind of &lt;a href=&quot;https://www.youtube.com/watch?v=9T2d_woxdLQ&quot;&gt;slightly odd pop song&lt;/a&gt; that only a totalitarian regime could love.&lt;/p&gt;

&lt;p&gt;That leaves six countries on the bubble: Georgia, Ireland, Lithuania, Malta, Poland and Switzerland. Both Ireland and Malta benefit from the UK voting in this semi-final, and the model expects them both to qualify. Poland and Lithuania should both get some support from Ireland, but Poland also benefit from having Germany voting, so we’re likely to be seeing their &lt;a href=&quot;https://www.youtube.com/watch?v=syMhJMmGEIc&quot;&gt;uniquely creepy blend of nationalism, agriculture and heaving bosoms&lt;/a&gt; in the final as well.&lt;/p&gt;

&lt;p&gt;Georgia, who are usually a safe bet for qualification, have drawn a particularly bad semifinal for them with almost no former Soviet republics for backup. According to Betfair, they’re the least likely country to qualify (price of 4.9, meaning a probability of about 20%). However, the model still reckons they’re the most likely to take the last slot, with a probability of 56%. Betfair thinks this slot will go to Switzerland, who have qualified only once in the past eight years, with &lt;a href=&quot;https://www.youtube.com/watch?v=wxLE5FmNJow&quot;&gt;Anna Rossinelli&lt;/a&gt; in 2011. She came last.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Looking at my web stats, I can’t actually find evidence that anyone from San Marino has ever read this blog. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2014: First predictions</title>
   <link href="http://mewo2.com/nerdery/2014/05/05/eurovision-2014-first-predictions"/>
   <updated>2014-05-05T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2014/05/05/eurovision-2014-first-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;For the last two years, I’ve been publishing the output of a statistical model for predicting the results of the Eurovision Song Contest. This year’s final takes place on Saturday in an &lt;a href=&quot;http://en.wikipedia.org/wiki/B%26W_Hallerne&quot;&gt;abandoned shipyard in Copenhagen&lt;/a&gt;, so it’s time for some more predictions. I’ve made some small changes to the model this year, which have had huge consequences for the results; I think they should be a lot more accurate now.&lt;/p&gt;

&lt;h3 id=&quot;whats-new-in-model-land&quot;&gt;What’s new in model-land?&lt;/h3&gt;

&lt;p&gt;The big change in this year’s model is that I’ve incorporated data from betting markets, specifically the &lt;a href=&quot;http://www.betfair.com/exchange/special-bets/market?id=1.112178677&quot;&gt;Betfair Eurovision winner market&lt;/a&gt;. In previous years, the model has had no real information about song quality to go on, apart from what we know from previous contests. This year, I took a look at the relationship between a song’s betting odds and the quality score estimated by the model.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/scorevsprice.png&quot; alt=&quot;Score vs betting price&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Using data from 2004-2010&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, I’ve plotted the betting odds available in the week before the contest against the song’s overall quality score, as estimated by the model after the fact. There’s a pretty clear relationship: better songs get shorter odds, as you might expect. It’s not perfect, but it’s definitely better than what we had before.&lt;/p&gt;

&lt;p&gt;Interestingly, there doesn’t seem to be as good a relationship between the betting odds and actual performance in the contest. It seems that gamblers are better at taking into account the quality of a song than the complicated voting patterns which exist. This is good for us, because it means that we can use the betting odds as a proxy for song quality without worrying about double-counting voting relationships.&lt;/p&gt;

&lt;p&gt;I’ve also removed the effect of performer gender. After a bit of experimentation, it seems that this wasn’t helping much, and may even have been making things worse. It was also a bit of a pain to classify objectively, so I’ve dropped it. There’s not much effect on the final result.&lt;/p&gt;

&lt;p&gt;On a technical level, I’ve reimplemented the model in &lt;a href=&quot;http://julialang.org&quot;&gt;Julia&lt;/a&gt; as a learning exercise. In general, I’m pretty impressed with Julia as a language. There are some mild annoyances with the type system, but I expect that’s more a result of my slightly dodgy beginner’s code than anything to do with the language itself. Performance is pretty fantastic, and all that’s really missing is the mature package ecosystem that more established languages have.&lt;/p&gt;

&lt;h3 id=&quot;enough-with-the-nerding-what-about-the-contest&quot;&gt;Enough with the nerding, what about the contest?&lt;/h3&gt;

&lt;p&gt;A few more countries have dropped out (Serbia, Croatia, Bulgaria, Cyprus), mostly citing economic worries. This leaves a bit of a hole in the Balkan region, which is historically one of the stronger voting blocs. It’s not immediately clear to me what effect that will have, but it’s probably good news for the other large blocs in Scandinavia and the former Soviet Union.&lt;/p&gt;

&lt;p&gt;Returning, we have Portugal and Poland. Portugal will probably give 12 points to Spain, but this isn’t likely to affect the outcome of the contest by much. Poland are more of a wildcard, but it’s doubtful they’ll have a huge effect either. Overall, it’s likely that there’s been a slight rebalancing of the contest from east to west.&lt;/p&gt;

&lt;p&gt;From a geopolitical perspective, it’s obvious to ask what the effect will be of recent events in Ukraine. The EBU have ruled that televotes from the Crimea will be treated as Ukrainian votes, as Ukrainian telecoms operators are still active in the region. As Ukraine usually gives a fairly high score to Russia anyway, it’s doubtful that this will skew things greatly.&lt;/p&gt;

&lt;p&gt;Generally speaking, it’s quite rare for international events to have a big effect on voting in the contest, but it’s conceivable that there could be a small sympathy boost for Ukraine. Given that Ukraine is in semi-serious contention anyway, a small increase in votes could be all they need. It’s unlikely that there will be much negative backlash against Russia, for the simple reason that it’s impossible to cast votes &lt;em&gt;against&lt;/em&gt; a country in Eurovision.&lt;/p&gt;

&lt;h3 id=&quot;here-are-the-results-of-the-bayesian-jury&quot;&gt;Here are the results of the Bayesian jury&lt;/h3&gt;

&lt;p&gt;Anyway, what are the predictions? The betting public seem to have chosen Armenia’s &lt;a href=&quot;https://www.youtube.com/watch?v=ChkJpnOgIwQ&quot;&gt;Aram MP3&lt;/a&gt; as their favourite, but the model likes Sweden’s &lt;a href=&quot;https://www.youtube.com/watch?v=XdXXnX5BvGY&quot;&gt;Sanna Nielsen&lt;/a&gt; a little bit more. As I said before, &lt;a href=&quot;https://www.youtube.com/watch?v=sdAf2EjhRiE&quot;&gt;Ukraine&lt;/a&gt; are in with an outside shot, and the probabilities drop off very quickly after that.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/winprobs.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Compared to previous years, the model is showing very high degrees of certainty, but this is largely due to having incorporated the Betfair data. In reality, this is a year with no stand-out entries, so it’s probably more open than usual to a strong performance on the night.&lt;/p&gt;

&lt;p&gt;If we compare the model probabilities with the implied probabilities from just the Betfair data, there are some interesting patterns. Betfair gives the UK a much higher win probability than the model does, which might be explained by Betfair’s primarily UK-based customer base. Similarly, the chances of my personal favourite, Austria’s &lt;a href=&quot;https://www.youtube.com/watch?v=ToqNa0rqUtY&quot;&gt;Conchita Wurst&lt;/a&gt;, might be overestimated by some in the west.&lt;/p&gt;

&lt;p&gt;Interestingly, the three countries the model projects as probable winners are all competing in the first semi-final on Tuesday night, along with Azerbaijan, Russia and Hungary, all of which are also highly rated. The only entrants in the second semi-final with more than 1% chance of winning are Norway and Greece, both of which clock in around 3%. The draw for the semi-finals is largely designed to prevent regional bloc voting, and doesn’t do much to prevent unbalanced draws like this one.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2014/qualprobs1.png&quot; alt=&quot;Qualification probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Such a strong field makes the qualifications from the first semi-final a little bit predictable. The six countries I’ve mentioned so far all have more than 95% chance of qualifying. Of the others, the Netherlands and Belgium should be back in the final, repeating last year’s success after a long absence. Moldova and Estonia are likely the last two qualifiers, but Iceland have an outside shot. San Marino, having sent &lt;a href=&quot;https://www.youtube.com/watch?v=emrJV45KT2c&quot;&gt;the&lt;/a&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=xi8mZ2ejLcE&quot;&gt;same&lt;/a&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=vt_3yms1PcM&quot;&gt;performer&lt;/a&gt; every year they’ve entered, are very likely to have the same result: immediate elimination. Sorry, San Marino.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;This is the only data I could get hold of. If anyone reading has more recent data, please get in touch. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2013: Final predictions</title>
   <link href="http://mewo2.com/nerdery/2013/05/17/eurovision-2013-final-predictions"/>
   <updated>2013-05-17T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2013/05/17/eurovision-2013-final-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;After Tuesday night’s disappointing result, I was somewhat worried about the changes to the model for this year, and considered reverting to last year’s model. However, this would be both intellectually dishonest and, more importantly, a lot of work, so I decided against it. In any event, the second semi-final threw up fewer surprises than the first, and the model performed fairly well, predicting 8 out of 10 qualifiers. This is considerably better than random, but not an improvement on last year’s model. 14 out of 20 overall is respectable, but nothing to write home about.&lt;/p&gt;

&lt;h3 id=&quot;lets-get-this-out-of-the-way&quot;&gt;Let’s get this out of the way&lt;/h3&gt;

&lt;p&gt;We now have all of the information we’re going to get before the final itself takes place on Saturday night. That means it’s time to make some forecasts.&lt;/p&gt;

&lt;p&gt;As Macedonia failed to qualify, this will be the first Eurovision final since 1985 not to feature any (former) Yugoslavian entries. It will definitely be interesting to see what this does to the voting, as all of these countries’ points are now up for grabs. The former USSR, on the other hand, will be there in strength, with only Latvia letting the side down. This is probably not great news for any of the ex-Soviet states, but Russia, Ukraine and Azerbaijan can probably weather the storm.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/winprob2013-2.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Now that they’ve qualified, Azerbaijan have retaken the top spot from Russia, and even extended their lead. This is probably because they’re better at drawing votes from outside the former Soviet Union, whereas Russia will now be competing somewhat with the other eight former Soviet republics. Scandinavia is also very well represented in the final, which may be a blow to the chances for everybody’s favourite, Denmark (now at an implied win probability of 55% on Betfair).&lt;/p&gt;

&lt;p&gt;Overall, the chance of an ex-Soviet winner is a very respectable 47%: with nine entries, they must have a decent song in there somewhere.&lt;/p&gt;

&lt;h3 id=&quot;are-you-as-good-as-you-think-you-are&quot;&gt;Are you as good as you think you are?&lt;/h3&gt;

&lt;p&gt;Of course, all of these probabilities are based on a very vague idea of what each song’s quality level is. The model hasn’t heard any of the songs, so there’s some very important information missing in these calculations.&lt;/p&gt;

&lt;p&gt;It’s possibly more interesting to ask, rather than “who will win?”, “how good does a song have to be to win?”. If we have an answer to that, we can apply our own judgment to the songs which we hear tomorrow night. I’ve plotted the quality level a song has to reach before its country has a 50-50 shot at victory. As the model’s quality units are a little abstract, I’ve also included five recent winners (and one not-winner) for comparison.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/thresholds2013.png&quot; alt=&quot;Threshold qualities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For reference:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=Pfo-8z86x80&quot;&gt;Loreen - Euphoria (Sweden 2012)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=-qnsZgQe1tU&quot;&gt;Lena - Satellite (Germany 2010)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=gAh9NRGNhUU&quot;&gt;Lordi - Hard Rock Hallelujah (Finland 2006)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=XQUbIrCAiPM&quot;&gt;Sakis Rouvas - Shake It (Greece 2004)&lt;/a&gt;&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=WXwgZL4zx9o&quot;&gt;Alexander Rybak - Fairytale (Norway 2009)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By this measure, Russia have the easiest run of things, but they’ll still need a better song than they’ve ever produced to reach this level. In fact, only five countries have produced songs which, if they entered them this year, would give them a better than 50% chance of winning. From the graph, we can obviously see that Norway, Greece and Finland have done so - if they can replicate these performances, they’ll have an excellent chance of victory. Azerbaijan have also managed this, but interestingly not with their winning song - the model claims that &lt;a href=&quot;http://www.youtube.com/watch?v=9OEIOS1oUeY&quot;&gt;their 2009 entry, “Always”&lt;/a&gt; was considerably better.&lt;/p&gt;

&lt;p&gt;The other country is maybe more interesting. The United Kingdom have produced two entries which were good enough to win, but failed to do so for various reasons. In 1998, Imaani came a very close second with &lt;a href=&quot;http://www.youtube.com/watch?v=kRWfAmRAL2A&quot;&gt;“Where Are You?”&lt;/a&gt;, losing out by only six points. In 2009, Jade Ewen sang the Andrew Lloyd Webber/Diane Warren number &lt;a href=&quot;http://www.youtube.com/watch?v=PBykhFyy-ZE&quot;&gt;“It’s My Time”&lt;/a&gt;, but lost out to the Alexander Rybak juggernaut. In a less strong year like this year, either of these songs would easily be in with a good chance of winning.&lt;/p&gt;

&lt;p&gt;The UK are one of the most variable countries, and this isn’t something the model takes into account. In a good year, they can severely overperform the model predictions. In a bad year, they can be among the worst countries out there. It’s up to you whether you think &lt;a href=&quot;http://www.youtube.com/watch?v=cqXk_FRw62Q&quot;&gt;Bonnie Tyler&lt;/a&gt; is at the high or low end of that spectrum.&lt;/p&gt;

&lt;h3 id=&quot;early-indications&quot;&gt;Early indications&lt;/h3&gt;

&lt;p&gt;Some countries vote very predictably, and other countries less so. Like last year, the voting order this year will be rigged for maximum excitement, so it’s likely that the more predictable countries will be got out of the way early in the voting, to increase the suspense. However, we can still look at which countries are likely to be good predictors of the final winner.&lt;/p&gt;

&lt;p&gt;In this case, we’re looking at the “bellwether probability”, the chance that the entry that each country gives 12 points to goes on to win the contest. The more predictable countries tend to be very low on this score. Cyprus gives its 12 to Greece almost all the time, so like a stopped clock it’s only “right” when Greece wins. On the other hand, Hungary has no particular alignments, so its votes are more likely to match with those of Europe as a whole.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/bellwether2013.png&quot; alt=&quot;Bellwether probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Last year, the best predictors were a diverse group of central European countries and outliers. This year we’ve added a new and intriguing group of bellwethers. As there are no former Yugoslavian entries in the final (nor their neighbour Albania), this normally completely predictable area has sprung wide open. If an entry can appeal to this area of the map, there are a lot of points available. If only everyone had known that beforehand.&lt;/p&gt;

&lt;h3 id=&quot;old-friends&quot;&gt;Old friends&lt;/h3&gt;

&lt;p&gt;At the other end of the scale, there are the perennial relationships that lead people to claim that the voting is “rigged”. I’m reliably informed that people last year used these as the basis of a drinking game. I couldn’t possibly condone such behaviour, but I feel I should list them for completeness.&lt;/p&gt;

&lt;p&gt;Actually, in the absence of the Balkans and Turkey, many of the longstanding relationships are left dangling. This could be one of the most unpredictable sets of voting in recent memory. However, some relationships remain strong:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lithuania → Georgia (42%)&lt;/li&gt;
  &lt;li&gt;Ukraine → Azerbaijan (43%)&lt;/li&gt;
  &lt;li&gt;Albania → Greece (45%)&lt;/li&gt;
  &lt;li&gt;Belarus → Russia (48%)&lt;/li&gt;
  &lt;li&gt;Italy → Romania (49%)&lt;/li&gt;
  &lt;li&gt;France → Armenia (51%)&lt;/li&gt;
  &lt;li&gt;Armenia → Russia (56%)&lt;/li&gt;
  &lt;li&gt;Moldova → Romania (59%)&lt;/li&gt;
  &lt;li&gt;Romania → Moldova (74%)&lt;/li&gt;
  &lt;li&gt;Cyprus → Greece (90%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I said, these are less certain than last year, so adjust beverage sizes accordingly.&lt;/p&gt;

&lt;h3 id=&quot;i-dont-have-time-to-read-all-that-nerd-stuff&quot;&gt;I don’t have time to read all that nerd stuff&lt;/h3&gt;

&lt;p&gt;To summarise, if this is a typical year, then Azerbaijan have the best shot at things. Russia have the easiest ride of things, but don’t have quite as consistent a record as the Azeris. The UK could probably win this thing if they bother to try this year, and avoid a &lt;a href=&quot;http://www.youtube.com/watch?v=rXw4Q5jbNqQ&quot;&gt;Humperdinck-style disaster&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Things are a little unpredictable this year, because the qualifiers are a little bit unbalanced. You can still rely on Cyprus loving Greece to prove you haven’t slipped into an alternate timeline. He who controls the Balkans controls the universe.&lt;/p&gt;

&lt;p&gt;For listeners in the UK, I’ll be doing an interview with &lt;a href=&quot;http://www.bbc.co.uk/radiowales&quot;&gt;BBC Radio Wales&lt;/a&gt; on Saturday night, around 7:50pm, as part of their Eurovision coverage, live from my (and Bonnie Tyler’s) local pub.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;This is, according to the model, the best song never to have won Eurovision. It actually came third in 2004, behind &lt;a href=&quot;http://www.youtube.com/watch?v=10XR67NQcAc&quot;&gt;Ukraine&lt;/a&gt; and &lt;a href=&quot;http://www.youtube.com/watch?v=z7OvpjplJ_8&quot;&gt;Serbia/Montenegro&lt;/a&gt;, both of which benefitted greatly from their regional voting blocs. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2013: Predictions for semi final 2</title>
   <link href="http://mewo2.com/nerdery/2013/05/15/eurovision-2013-predictions-for-semi-final-2"/>
   <updated>2013-05-15T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2013/05/15/eurovision-2013-predictions-for-semi-final-2</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;didnt-we-do-well&quot;&gt;Didn’t we do well?&lt;/h3&gt;

&lt;p&gt;I think it would be fair to say that the results of the first semi-final were surprising all round. Western Europe and the former Soviet Union both had very strong nights, while the former Yugoslavia managed zero qualifications from four attempts. From the standpoint of the model predictions, the most surprising thing was that Serbia failed to qualify. It’s a little unclear to me why the model was predicting such a high probability for Serbia (they failed to qualify in 2009 as well) but I would have expected that votes from the other former Yugoslavs would push them through. Perhaps these bonds are weakening somewhat with time, or perhaps the &lt;a href=&quot;/nerdery/2013/05/12/eurovision-2013-first-predictions/#comment-895764306&quot;&gt;slight changes in the voting system&lt;/a&gt; have made things more difficult for them. Or maybe nobody liked &lt;a href=&quot;http://www.youtube.com/watch?v=rHuultrtXls&quot;&gt;the song&lt;/a&gt; (I quite enjoyed it, but probably wouldn’t have voted for it).&lt;/p&gt;

&lt;p&gt;The model managed 6 out of 10 correct qualifications, which doesn’t sound bad, or 2 out of 6 knockouts, which does. In most of these cases, the probabilities were fairly balanced, so we got Belarus (50%), Netherlands (51%) and Ireland (55%) instead of Slovenia (58%), Cyprus (58%) and Croatia (74%). It would have been more surprising if there weren’t a few switches like this. Losing Serbia (91%) for Belgium (46%) seems a bit more serious, but it’s still something that’s going to happen from time to time. Overall, 6 out of 10 is less good than we might have expected, but still not terrible - it’s definitely not a statistically significant failure&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 id=&quot;another-day-another-prediction&quot;&gt;Another day, another prediction&lt;/h3&gt;

&lt;p&gt;So, given that we’ve lost one of our most favoured countries (Serbia) and have gained a few surprise qualifiers, how do the winning probabilities change? Obviously, everyone who’s qualified has gained a bit of a bonus. Moldova, particularly, have jumped from respectable also-ran to outside shot. The big winner is Russia, whose strong ex-Soviet showing has pushed them into first place. Bookies’ favourites Denmark have leap-frogged hosts Sweden to become the top Scandinavian country. I’ve colour-coded the graph below by qualification status: green have qualified from the first semi-final, purple have yet to qualify, and grey are the automatic qualifiers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/winprob2013-1.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The model is confident enough about the qualification prospects of Azerbaijan and Greece that it doesn’t really matter that they’re yet to qualify. Of the automatic qualifiers, it looks like a tussle between established powerhouse and hosts Sweden and Italy, recently returned from a decade in the Eurovision wilderness. Spain, suffering from the lack of their old friends Portugal and Andorra, will not be doing so well.&lt;/p&gt;

&lt;p&gt;A lot will now depend on the qualifiers from the second semi-final. If Azerbaijan do qualify, as expected, then they’ll take the top spot back from Russia. If not, the model predicts a straight fight between Russia and Ukraine. However, it’s always possible that some dark horse candidate, particularly from the West, could swoop in and change things completely.&lt;/p&gt;

&lt;h3 id=&quot;a-better-tomorrow&quot;&gt;A better tomorrow&lt;/h3&gt;

&lt;p&gt;So let’s look at tomorrow night’s second semi-final and see who’s going to qualify. This is a larger semi-final (17 rather than 16 contestants) but still only 10 qualification slots.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/qualprob-2-2013.png&quot; alt=&quot;Qualification probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;After the first semi-final I feel rather silly claiming anything is certain, but Greece, Armenia and Azerbaijan look like they’re safer than most. After that there’s a fairly smooth decline in probability: Albania and Romania look pretty good, Norway, Israel and Iceland maybe a little less likely, and Malta and Georgia round things out to ten. According to the model, Switzerland have only a 19% chance of qualification, but given the strong performance from Western Europe in the first semi-final, it would be silly to rule them out completely. The former Yugoslavia’s only hope rests with Macedonia, and having heard the song, I don’t think they’ll be celebrating.&lt;/p&gt;

&lt;p&gt;Note that Azerbaijan have a relatively tough semi-final compared to how the model predicts they’ll do in the final. This semi-final is fairly low on former Soviet republics, and one of the four that are here is their old enemy Armenia. Assuming they qualify, they’ll do a lot better when they can get votes from Russia, Ukraine, Moldova, etc.&lt;/p&gt;

&lt;p&gt;So as an overall prediction, we have:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Greece (90%)&lt;/li&gt;
  &lt;li&gt;Armenia (87%)&lt;/li&gt;
  &lt;li&gt;Azerbaijan (84%)&lt;/li&gt;
  &lt;li&gt;Albania (77%)&lt;/li&gt;
  &lt;li&gt;Romania (72%)&lt;/li&gt;
  &lt;li&gt;Norway (68%)&lt;/li&gt;
  &lt;li&gt;Israel (67%)&lt;/li&gt;
  &lt;li&gt;Iceland (66%)&lt;/li&gt;
  &lt;li&gt;Malta (65%)&lt;/li&gt;
  &lt;li&gt;Georgia (60%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s see how wrong I can be this time. I leave you with my personal favourite from Tuesday night: the &lt;a href=&quot;http://www.youtube.com/watch?v=u_iXcj3YE3k&quot;&gt;Montenegrin space program&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;a href=&quot;/nerdery/2013/05/17/eurovision-2013-final-predictions/&quot;&gt;Next post in the series&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;p ≈ 0.17 &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision 2013: First predictions</title>
   <link href="http://mewo2.com/nerdery/2013/05/12/eurovision-2013-first-predictions"/>
   <updated>2013-05-12T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2013/05/12/eurovision-2013-first-predictions</id>
   <content type="html">
&lt;p&gt;This post is part of a series describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;previously-on-the-eurovision-song-contest&quot;&gt;Previously on the Eurovision Song Contest&lt;/h3&gt;

&lt;p&gt;Last year around this time, I wrote a series of blog posts outlining a Bayesian predictive model for the Eurovision Song Contest, and making a set of predictions about last year’s contest. Apart from a few hiccups relating to Malta, the model was a fairly qualified success. This year, by “popular” demand, I’ve revisited the model and brought it up to date for 2013, ahead of Saturday’s showdown in Malmö.&lt;/p&gt;

&lt;p&gt;For those just coming in, I’m going to give a quick recap of the model that I used last year. If you’re already familiar, feel free to skip ahead to the next section, where I’ll talk about what’s new for this year. If you’re looking for a more detailed look at last year’s contest, take a peek at the series of posts from &lt;a href=&quot;/nerdery/2012/05/20/ive-got-eurosong-fever-ted/&quot;&gt;the start&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Essentially, we can look at people’s voting preferences in the Eurovision Song Contest as composed of two components: song quality, and a “friendship” score, which takes into account how much the voting country likes or dislikes&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; the country being voted on. If we want to know whether a voter V will rank country A or country B higher, we add up the song quality and the friendship score in each case, and subtract the two. Then we can take this difference, feed it through a &lt;a href=&quot;http://en.wikipedia.org/wiki/Logistic_function&quot;&gt;logistic curve&lt;/a&gt; and use the result as a probability.&lt;/p&gt;
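&lt;p&gt;A minimal sketch of this comparison step (the function and the example numbers below are my own illustration, not the model’s actual code or fitted values):&lt;/p&gt;

```python
import math

def logistic(x):
    """Standard logistic curve, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_a_over_b(quality_a, friend_a, quality_b, friend_b):
    """Probability that voter V ranks country A above country B:
    the difference of the (quality + friendship) totals, fed
    through the logistic curve."""
    return logistic((quality_a + friend_a) - (quality_b + friend_b))

# Hypothetical example: A has a slightly better song,
# but B gets a small friendship boost from this voter.
p = prob_a_over_b(quality_a=1.0, friend_a=0.0,
                  quality_b=0.5, friend_b=0.2)
```

&lt;p&gt;With a combined edge of 0.3 units, A ends up ranked above B a little under 60% of the time.&lt;/p&gt;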

&lt;p&gt;I’ve taken voting results from both the Eurovision finals (going back to the introduction of televoting in 1998) and the semi-finals (going back to their introduction in 2004). I’ve then used a Markov Chain Monte Carlo sampler&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; to calculate the song qualities and friendship scores, assuming that they’re both normally distributed.&lt;/p&gt;

&lt;p&gt;Once I’ve got the parameters, it’s relatively straightforward to run a simulation of this year’s contest, including the semifinals and all the voting procedures. Last year I ran 10,000 simulations like this, and looked at both who the likely qualifiers from the semifinals were, and the overall winner. The model managed fifteen out of twenty of the qualifiers, as well as the eventual winner. Although this level of success, particularly predicting the winner, was probably mostly luck, I’d be happy to manage the same again this year.&lt;/p&gt;
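&lt;p&gt;The simulation stage can be sketched roughly as follows. This is a deliberately stripped-down version of the idea: it drops the friendship terms and the semi-finals, and the quality numbers are placeholders, but it shows how win probabilities fall out of repeatedly simulating the contest:&lt;/p&gt;

```python
import random
from collections import Counter

# Eurovision point allocations from each voter's top-ten ranking.
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def simulate_final(qualities, noise=1.0):
    """One simulated final: each country's performance is its mean
    quality plus random variation, and every voter awards 12..1
    points to its top ten other countries. Friendship omitted."""
    perf = {c: q + random.gauss(0, noise) for c, q in qualities.items()}
    totals = Counter()
    for voter in qualities:
        ranking = sorted((c for c in qualities if c != voter),
                         key=lambda c: perf[c], reverse=True)
        for pts, c in zip(POINTS, ranking):
            totals[c] += pts
    return max(totals, key=totals.get)

def win_probabilities(qualities, n_sims=10_000):
    """Fraction of simulated contests each country wins."""
    wins = Counter(simulate_final(qualities) for _ in range(n_sims))
    return {c: wins[c] / n_sims for c in qualities}
```

&lt;p&gt;Each run perturbs the song qualities, tallies the points, and records the winner; doing this many thousands of times turns the fitted parameters into the probabilities plotted below.&lt;/p&gt;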

&lt;h3 id=&quot;whats-new&quot;&gt;What’s new?&lt;/h3&gt;

&lt;p&gt;The biggest change to the contest since last year is that four countries have pulled out (Bosnia-Herzegovina, Portugal, Slovakia, Turkey), and one has returned (welcome back, Armenia!). This obviously has effects on the voting dynamics, both directly (e.g. all the Balkan countries no longer get votes from Bosnia-Herzegovina) and indirectly (e.g. Germany will have an extra voting slot free, rather than spending it on Turkey). Overall, we should expect a slight rebalancing of the votes, although the changes are fairly evenly spread across Europe, so it’s unclear what the overall effect should be.&lt;/p&gt;

&lt;p&gt;In terms of the model, I’ve added two terms to try to increase the accuracy. The first of these is a term which sets the average song quality for a given country. If we look at countries’ past record in Eurovision, some stick out as more consistently successful (or unsuccessful) than others. For example, Azerbaijan have never finished outside the top ten, and only once outside the top five. On the other hand, Switzerland seem to have trouble even qualifying for the final, and on &lt;a href=&quot;http://www.youtube.com/watch?v=EUX3_0KM7fc&quot;&gt;one occasion&lt;/a&gt; even scored “nul points” in a semi-final, which is quite an achievement, given the competition. This leads us to an idea that some countries might, through greater enthusiasm for the contest, a larger talent pool, or some quirk of their selection process, just be plain better at producing Eurovision entries than others.&lt;/p&gt;

&lt;p&gt;The term I’ve added simply affects the mean song quality for the country: all countries still have the same variance, and they all still have the potential to produce songs of any quality. However, on average, some countries do end up better than others. The effect here is smaller than the general variation in song quality, but still relatively large overall: it ends up explaining about a third to a half of the variation in song quality. I’ve plotted the value of this term below for all the countries which have competed since 1998. This year’s entrants are in green, and the bars show one standard deviation of song quality: roughly speaking the song quality should be inside the bars about two-thirds of the time.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/gamma_song.png&quot; alt=&quot;Average song qualities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Countries near the top of the list (e.g. Azerbaijan, Russia) tend to be those which put a lot of resources into the contest, and see it as a way of promoting their culture throughout Europe. We also see high places for countries such as Sweden and Italy which have very well-developed national song competitions, which gives a strong talent pool to draw on. Near the bottom of the list we see a lot of smaller countries (Andorra, Monaco, San Marino) which simply don’t have the resources or the talent pool to compete successfully.&lt;/p&gt;

&lt;p&gt;It’s also fairly notable that (barring Turkey and Bosnia-Herzegovina), the countries which are not competing this year are largely those which typically produce low-quality songs. It’s hard to tell which way the relationship works here. It’s possible that these countries have low enthusiasm for Eurovision, and thus have small talent pools to pick from, and pulling out is less of a big deal. It’s also possible that a string of poor performances could lead to a country becoming disillusioned with the competition.&lt;/p&gt;

&lt;p&gt;The second change I’ve made to the model is to introduce a term accounting for the gender of the performers&lt;sup id=&quot;fnref:3&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. There’s a definite effect there: all-female entries are slightly better than all-male entries, which are a lot better than mixed entries. However, the overall magnitude is fairly small, around 0.3 quality units, roughly equal to the quality bonus a song gets for being from Malta. &lt;/p&gt;

&lt;h3 id=&quot;qualified-success&quot;&gt;Qualified success&lt;/h3&gt;

&lt;p&gt;This year, I’ve run 100,000 simulations of the full contest. Looking just at the first semi-final for now, there are 10 qualification places available, from a field of 16 countries. From a completely naïve standpoint, this means each country has a baseline qualification probability of 10/16 = 62.5%. In reality though, some countries are more likely than others. For the first semi-final, this is what the model says.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/qual-1-probs.png&quot; alt=&quot;Qualification probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In general, this year’s model is more certain about its predictions than last year’s; time will tell if it’s any more accurate. Anyway, as I think most people would predict, Russia and Serbia are dead certs for qualification. Montenegro, on the other hand, have a mountain to climb. There aren’t enough Montenegrins for the diaspora to have any significant effect on most countries’ voting patterns (although Serbia, Croatia and Slovenia will probably give them a few). &lt;/p&gt;

&lt;p&gt;The interesting stuff is in the middle of the table. Ireland benefit from having the UK and Denmark voting in their semifinal, while Slovenia will have a boost from having three other Balkan countries in the mix. I’m not sure whose idea it was to put the Netherlands and Belgium in the same semifinal, but it’ll be interesting to see if it gives either of them a big enough boost.&lt;/p&gt;

&lt;p&gt;Overall, I think Russia (94%), Serbia (91%) and Ukraine (86%) are safe. Denmark (76%), Croatia (74%), Estonia (69%) and Moldova (66%) are also pretty good bets. Beyond that, it looks like Lithuania (60%), Cyprus (58%) and Slovenia (58%) for the last few spots, but I don’t think it’s safe to rule out Ireland (55%).&lt;/p&gt;

&lt;h3 id=&quot;back-in-baku&quot;&gt;Back in Baku?&lt;/h3&gt;

&lt;p&gt;Looking on to the final then, once again the model is more confident than last year (although possibly no more accurate). The top predictions are similar, but not identical, to the list of countries with highest average song quality. Quirks of the voting have boosted Serbia’s relative chances, for example. Overall though, the lists are very similar.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/2013/winprobs.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Overall, we’d be foolish not to plump for either Azerbaijan or Russia as winner. The bookies, on the other hand, are heavily pushing the &lt;a href=&quot;http://www.youtube.com/watch?v=k59E7T0H-Us&quot;&gt;Danish entry&lt;/a&gt;. At time of writing, the available odds on Betfair are around 2.5, implying a win probability of 40%. I’d take this with a note of caution though. While the wisdom of the crowds is often right, Eurovision betting has something of an echo chamber effect, with lots of people piling on the perceived favourite, and driving the odds downwards. Two years ago, the favourite was &lt;a href=&quot;http://www.youtube.com/watch?v=OTiBNPT-x_Y&quot;&gt;France&lt;/a&gt; which finished in an ignominious 15th place, losing out to… Azerbaijan.&lt;/p&gt;
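&lt;p&gt;For anyone wanting to check the arithmetic: converting decimal betting odds to an implied probability is just the reciprocal (this ignores the bookmaker’s margin, which is why implied probabilities across a whole book sum to slightly more than one):&lt;/p&gt;

```python
def implied_probability(decimal_odds):
    """Implied win probability from decimal (European-style) odds."""
    return 1.0 / decimal_odds

implied_probability(2.5)  # 0.4, i.e. the 40% quoted above
```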

&lt;hr /&gt;

&lt;p&gt;&lt;a href=&quot;/nerdery/2013/05/15/eurovision-2013-predictions-for-semi-final-2/&quot;&gt;Next post in the series&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Let’s ignore for now what it means for a country to “like” another country. Countries that like each other vote for each other. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;Last year I used &lt;a href=&quot;http://mcmc-jags.sourceforge.net&quot;&gt;JAGS&lt;/a&gt;, this year I’ve used &lt;a href=&quot;http://mc-stan.org&quot;&gt;Stan&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot;&gt;
&lt;p&gt;Gender is performance, and Eurovision doubly so. In each case I’ve assigned people the gender that they appear to be performing as. For the avoidance of doubt, &lt;a href=&quot;http://www.youtube.com/watch?v=ps3kxGo_gro&quot;&gt;Dustin the Turkey&lt;/a&gt; is male, and &lt;a href=&quot;http://www.youtube.com/watch?v=hfjHJneVonE&quot;&gt;Verka Serduchka&lt;/a&gt; is female. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>EMI Music Hackathon: How I Did It</title>
   <link href="http://mewo2.com/kaggle/2012/07/29/emi-music-hackathon---how-i-did-it"/>
   <updated>2012-07-29T00:00:00+00:00</updated>
   <id>http://mewo2.com/kaggle/2012/07/29/emi-music-hackathon---how-i-did-it</id>
   <content type="html">
&lt;p&gt;Last weekend, &lt;a href=&quot;http://www.kaggle.com/&quot;&gt;Kaggle&lt;/a&gt; and &lt;a href=&quot;http://datasciencelondon.org/&quot;&gt;Data Science London&lt;/a&gt; ran a second hackathon, this time focused on the &lt;a href=&quot;http://musicdatascience.com/emi-million-interview-dataset/&quot;&gt;EMI One Million Interview Dataset&lt;/a&gt;, a large database of musical preferences. I took third place globally in this competition, and this is an attempt to explain how my model worked. The code is available &lt;a href=&quot;http://www.github.com/mewo2/musichackathon/&quot;&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For this competition I took a “blitz” approach. Rather than focusing on one model and trying to squeeze all the performance I could out of it, I threw together a bunch of simple models and blended their results. In the end, I combined ten separate predictions for my final submission.&lt;/p&gt;

&lt;p&gt;Because I knew that I was going to be blending the results, for each model I retained a set of cross-validation predictions for the training set. These were used as input to the blending process, as well as to give me an idea of how well each model was performing, without having to use up my submission quota. In general, I used ten-fold cross-validation, but for models which used random forests, I simply used the out-of-bag predictions for each data point. &lt;/p&gt;
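
&lt;p&gt;As a minimal sketch of this bookkeeping (in Python with scikit-learn rather than the R I actually used, and with made-up data), the out-of-fold predictions look like this:&lt;/p&gt;

```python
# Minimal sketch: retain out-of-fold predictions for later blending.
# Data and model are illustrative, not the competition code.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=100)

oof = np.zeros_like(y)
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    oof[test_idx] = model.predict(X[test_idx])
# 'oof' now holds a prediction for every training row, made by a model
# that never saw that row, so it can safely feed the blender.
```

&lt;p&gt;Each base model contributes one such column of out-of-fold predictions; the blender then trains on those columns.&lt;/p&gt;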

&lt;h3 id=&quot;preprocessing&quot;&gt;Preprocessing&lt;/h3&gt;

&lt;p&gt;As given, the data consists of a table of (user, artist, track, rating) quadruples, along with tables of data about users (demographic information, etc), and about user-artist pairs (descriptive words about each artist, whether the user owned any of their music, etc). These secondary tables were quite messy, with lots of missing data and a lack of standard values.&lt;/p&gt;

&lt;p&gt;I generally don’t enjoy data cleaning, so I did one quick pass through this data to tidy it up a little, then used it as-is for all the models. I merged the two “Good lyrics” columns, which differed only in capitalisation. For the “OWN_ARTIST_MUSIC” column, I collapsed the multiple encodings for “Don’t know”. Similarly, I collapsed several of the levels of the “HEARD_OF” column. The responses for “LIST_OWN” and “LIST_BACK” needed to be converted to numbers, rather than the mish-mash of numeric and text values which were there to begin with.&lt;/p&gt;

&lt;p&gt;To fill in missing values, I used the median value for numeric columns, and the most common value for categorical columns. I then joined these tables with the training data.&lt;/p&gt;
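
&lt;p&gt;In sketch form (Python/pandas rather than R, with invented column names), the fill rule is:&lt;/p&gt;

```python
# Sketch of the imputation step: median for numeric columns, most common
# value for categorical ones. Column names and values are invented.
import numpy as np
import pandas as pd

users = pd.DataFrame({
    "AGE": [23, np.nan, 31, 40, np.nan],
    "GENDER": ["Female", "Male", None, "Female", "Female"],
})
for col in users.columns:
    if users[col].dtype.kind in "if":      # numeric: fill with the median
        users[col] = users[col].fillna(users[col].median())
    else:                                  # categorical: most common value
        users[col] = users[col].fillna(users[col].mode()[0])
```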

&lt;p&gt;In most cases, the results were aided by first removing “global effects”. I subtracted the overall mean rating, and then estimated effects for users and tracks, with Bayesian priors which reduced these effects towards zero for poorly sampled users and tracks. These effects were then added back in after the model prediction had been made.&lt;/p&gt;
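
&lt;p&gt;A rough sketch of the idea, on toy data; the prior strength lam here is illustrative, not the value I actually used:&lt;/p&gt;

```python
# "Global effects" removal: subtract the overall mean, then shrunken user
# and track effects. Dividing by (count + lam) pulls poorly sampled users
# and tracks towards zero, playing the role of the Bayesian prior.
import pandas as pd

ratings = pd.DataFrame({
    "user":  ["a", "a", "b", "b", "c"],
    "track": ["t1", "t2", "t1", "t3", "t2"],
    "rating": [80.0, 60.0, 30.0, 50.0, 90.0],
})
lam = 25.0                                # illustrative prior strength
mu = ratings["rating"].mean()
resid = ratings["rating"] - mu

user_eff = resid.groupby(ratings["user"]).sum() / (
    ratings.groupby("user").size() + lam)
resid = resid - ratings["user"].map(user_eff)

track_eff = resid.groupby(ratings["track"]).sum() / (
    ratings.groupby("track").size() + lam)
resid = resid - ratings["track"].map(track_eff)
# A model is then fit to 'resid'; mu and the effects are added back
# after the model makes its prediction.
```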

&lt;h3 id=&quot;chuck-it-all-in-a-random-forestgbmlinear-regression&quot;&gt;Chuck it all in a Random Forest/GBM/Linear Regression&lt;/h3&gt;

&lt;p&gt;The first thing I tried was an attempt to mimic Ben Hamner’s success in the last hackathon, by throwing everything into a random forest and hoping for the best. It turned out that while this was a pretty good approach, it was also extremely slow. I was only able to run a limited number of trees, with a reduced sample size, so the results probably weren’t as good as they could have been. I also originally ran this without removing global effects, and didn’t have time to go back and do it again.&lt;/p&gt;

&lt;p&gt;As variations on the same theme, I tried a GBM and a simple linear regression. These were less successful, but much faster.&lt;/p&gt;

&lt;h3 id=&quot;partitioning-the-data&quot;&gt;Partitioning the data&lt;/h3&gt;

&lt;p&gt;While a simple linear regression seemed unlikely to be successful, I thought it might be reasonable to try a separate regression for each artist. The results were surprisingly good.&lt;/p&gt;
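
&lt;p&gt;The per-artist idea is simple enough to sketch (Python rather than R, with invented features and data):&lt;/p&gt;

```python
# Partitioning sketch: one linear regression per artist. The feature
# columns and data here are invented stand-ins for the real tables.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "artist": rng.choice(["x", "y", "z"], size=120),
    "age": rng.uniform(18, 60, size=120),
    "heard_of": rng.integers(0, 2, size=120).astype(float),
})
df["rating"] = 50 + 0.3 * df["age"] + 10 * df["heard_of"] + rng.normal(size=120)

features = ["age", "heard_of"]
models = {
    artist: LinearRegression().fit(group[features], group["rating"])
    for artist, group in df.groupby("artist")
}

# Predict each row with its own artist's model.
preds = pd.Series(index=df.index, dtype=float)
for artist, group in df.groupby("artist"):
    preds[group.index] = models[artist].predict(group[features])
```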

&lt;p&gt;Given that the random forest was so successful, and that the per-artist linear regression was quite good, it seemed like a good idea to try a per-artist random forest approach. This was also good, but not as good as I’d hoped.&lt;/p&gt;

&lt;p&gt;Weirdly, the per-artist approach was much more successful than a per-track approach. A linear model on a per-track basis was much worse than the per-artist model – so bad that I didn’t even try a random forest.&lt;/p&gt;

&lt;h3 id=&quot;svd&quot;&gt;SVD&lt;/h3&gt;

&lt;p&gt;The models I’ve used up to now have been the kind of tools that one would use on a generic machine learning problem. However, the kind of user-item-rating data we’re given here leads to what is called a “collaborative filtering” problem, for which there are a number of specialised techniques available. The most successful approaches in the past have come from matrix factorisation models, the simplest of which is SVD (singular value decomposition).&lt;/p&gt;

&lt;p&gt;The particular form of SVD I used is one which was developed for the Netflix Prize competition by Simon Funk, who wrote an excellent &lt;a href=&quot;http://sifter.org/~simon/journal/20061211.html&quot;&gt;blog post&lt;/a&gt; about how it works, and its implementation. Oddly, there doesn’t seem to be a standard implementation available in R, so I wrote my own in C and interfaced with that.&lt;/p&gt;
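
&lt;p&gt;The core update is small enough to sketch here (in Python rather than my C implementation; the data is synthetic and the learning rates are illustrative):&lt;/p&gt;

```python
# Bare-bones Funk-style SVD: stochastic gradient descent on the squared
# error of a low-rank factorisation of the ratings.
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, n_factors = 30, 20, 4
# Synthetic ratings with a hidden low-rank structure.
true_u = rng.normal(size=(n_users, n_factors))
true_i = rng.normal(size=(n_items, n_factors))
triples = [(u, i, float(true_u[u] @ true_i[i]))
           for u in range(n_users) for i in range(n_items)]

P = rng.normal(scale=0.1, size=(n_users, n_factors))
Q = rng.normal(scale=0.1, size=(n_items, n_factors))
lr, reg = 0.01, 0.02

def rmse():
    sq = [(r - P[u] @ Q[i]) ** 2 for u, i, r in triples]
    return float(np.sqrt(np.mean(sq)))

before = rmse()
for epoch in range(100):
    for u, i, r in triples:
        err = r - P[u] @ Q[i]
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))
after = rmse()   # should be far below 'before'
```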

&lt;p&gt;I ended up using results from two separate SVD runs: a quick early one, and a longer one with more features. The results were not fantastic, but they contributed greatly to the blend.&lt;/p&gt;

&lt;h3 id=&quot;nearest-neighbour&quot;&gt;Nearest neighbour&lt;/h3&gt;

&lt;p&gt;Another common approach to collaborative filtering problems is nearest-neighbour filtering. I calculated a distance measure between tracks, based on the correlation between ratings given by people who rated both tracks. To predict new ratings, I looked through the user’s rating history and calculated a weighted average of the “most similar” tracks this user had rated. The results were fairly disappointing — most users didn’t have enough ratings to make this approach viable.&lt;/p&gt;
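
&lt;p&gt;A sketch of the approach (Python, with a dense toy matrix; the real data was sparse, which is exactly why this struggled):&lt;/p&gt;

```python
# Item-based neighbour sketch: similarity from the correlation between
# tracks' rating columns, prediction as a weighted average over the k
# most similar tracks. Absolute correlations as weights are a
# simplification of a proper significance-weighted similarity.
import numpy as np

rng = np.random.default_rng(3)
R = rng.normal(loc=50, scale=15, size=(40, 8))   # users by tracks

sim = np.corrcoef(R.T)                           # track-track correlations

def predict(user, track, k=3):
    others = [t for t in range(R.shape[1]) if t != track]
    nearest = sorted(others, key=lambda t: -sim[track, t])[:k]
    weights = np.abs(sim[track, nearest]) + 1e-9
    return float(np.dot(weights, R[user, nearest]) / weights.sum())
```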

&lt;h3 id=&quot;demographics&quot;&gt;Demographics&lt;/h3&gt;

&lt;p&gt;As a final attempt at adding some variety to the blend, I tried an approach based purely on the demographic info given. I divided the users into five age quantiles, then for each track calculated the average score for each age quantile and gender. These turned out to be pretty terrible predictors of the actual ratings.&lt;/p&gt;

&lt;h3 id=&quot;blending&quot;&gt;Blending&lt;/h3&gt;

&lt;p&gt;For the final blend I used a neural network with five hidden units, and skip-layer connections. Given that there were only ten inputs and over a hundred thousand training examples, overfitting was never a huge concern. However, a small weight decay did help with convergence.&lt;/p&gt;
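
&lt;p&gt;For the curious, a network of this shape can be sketched in a few lines (a Python/numpy stand-in for what I did in R; sizes, rates and data are all illustrative):&lt;/p&gt;

```python
# A tiny blender: five tanh hidden units plus skip-layer (direct
# input-to-output) connections, trained by full-batch gradient descent
# with a small weight decay. Inputs are synthetic stand-ins for the
# ten model predictions.
import numpy as np

rng = np.random.default_rng(4)
n, d, h = 500, 10, 5
X = rng.normal(size=(n, d))
y = X.mean(axis=1) + 0.1 * rng.normal(size=n)

W = rng.normal(scale=0.1, size=(d, h))   # input to hidden
v = rng.normal(scale=0.1, size=h)        # hidden to output
s = rng.normal(scale=0.1, size=d)        # skip-layer weights
b = 0.0
lr, decay = 0.05, 1e-4
for step in range(2000):
    H = np.tanh(X @ W)
    pred = H @ v + X @ s + b
    g = (pred - y) / n                   # gradient of 0.5 * MSE
    gv = H.T @ g + decay * v
    gs = X.T @ g + decay * s
    gH = np.outer(g, v) * (1 - H ** 2)   # backprop through tanh
    gW = X.T @ gH + decay * W
    v, s, W, b = v - lr * gv, s - lr * gs, W - lr * gW, b - lr * g.sum()
```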

&lt;h3 id=&quot;code-etc&quot;&gt;Code, etc&lt;/h3&gt;

&lt;p&gt;The code is now available &lt;a href=&quot;http://www.github.com/mewo2/musichackathon/&quot;&gt;on GitHub&lt;/a&gt;. This is a slightly cleaned-up version of the code I wrote during the competition – most of the results I actually generated involved a lot of tweaking of hard-coded parameters and so on. However, the overall results should be pretty similar to what I submitted. As always, questions and comments are welcome, but suggestions for improvement are probably not going to be followed up.&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Eurovision visualisation</title>
   <link href="http://mewo2.com/nerdery/2012/06/03/eurovision-visualisation"/>
   <updated>2012-06-03T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2012/06/03/eurovision-visualisation</id>
   <content type="html">
&lt;p&gt;This is part five of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;a-brief-diversion&quot;&gt;A brief diversion&lt;/h3&gt;

&lt;p&gt;I still haven’t gotten around to doing a full assessment of the Eurovision model’s performance on the night, but I did spend an afternoon messing about in &lt;a href=&quot;http://d3js.org/&quot;&gt;D3.js&lt;/a&gt;, and I managed to come up with the network graph you see below. This is based on updated values of the friendship matrix, including the 2012 data, so it’s not identical to the graph I showed &lt;a href=&quot;/nerdery/2012/05/20/ive-got-eurosong-fever-ted/&quot;&gt;originally&lt;/a&gt;, but it’s quite similar. You can drag the slider underneath to change the threshold from 2 (approximately one standard deviation above average) to 6 (approximately three standard deviations above average).&lt;/p&gt;

&lt;div id=&quot;evfriends&quot;&gt;&lt;/div&gt;

&lt;script src=&quot;/assets/d3/d3.v2.min.js&quot;&gt; &lt;/script&gt;

&lt;script src=&quot;/assets/d3/evfriends.js&quot;&gt; &lt;/script&gt;

&lt;p&gt;Countries are represented by &lt;a href=&quot;http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2&quot;&gt;ISO 3166-1 alpha-2&lt;/a&gt; codes, but you knew that already. This probably needs a modern browser, but I don’t have any non-modern browsers to test it in, so who knows?&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Eurovision statistics: after the final</title>
   <link href="http://mewo2.com/nerdery/2012/05/27/eurovision-statistics-after-the-final"/>
   <updated>2012-05-27T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2012/05/27/eurovision-statistics-after-the-final</id>
   <content type="html">
&lt;p&gt;This is part four of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;they-think-its-all-over&quot;&gt;They think it’s all over&lt;/h3&gt;

&lt;p&gt;The final is over, the dust has settled, and Sweden’s Loreen has won the 2012 Eurovision Song Contest. Congratulations to her—it wasn’t my favourite song on the night, but it’s a worthy winner, and obviously very popular across Europe.&lt;/p&gt;

&lt;p&gt;Mostly I’m pleased because now I can test the model against new data. How did it perform? I’m going to put off doing a full evaluation because I don’t yet have the full results in a convenient form, but we can still look at some fairly simple measures of success.&lt;/p&gt;

&lt;p&gt;First, though, I want to talk about the “Malta thing”. In the &lt;a href=&quot;/nerdery/2012/05/24/eurovision-statistics-final-predictions/&quot;&gt;model predictions for the final&lt;/a&gt;, Malta was given a relatively high probability of victory—in fact they were ranked as the second most likely winners. As this was fairly counterintuitive, given their lack of strong friendship links, I picked out this prediction and tried to explain how it came about.&lt;/p&gt;

&lt;p&gt;This prediction caused quite a stir in Malta, with a &lt;a href=&quot;http://www.timesofmalta.com/articles/view/20120526/local/sweden-tops-eurovision-predictions-but-is-malta-second.421407&quot;&gt;story in the Times of Malta&lt;/a&gt; and over 16,000 pageviews from Malta&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; on Saturday alone. Many took this as good evidence that Malta were going to do well in the contest, and some people were rather annoyed with me when they did not. &lt;/p&gt;

&lt;p&gt;I’d like to apologise if I misled anyone. I didn’t expect anyone to take the model predictions particularly seriously, and if I had known, I would have included some more caveats and explanations of exactly what the model was predicting. Instead, I was fairly loose and jokey about the model results, and didn’t really talk about what they meant in real terms. Sorry, guys.&lt;/p&gt;

&lt;h3 id=&quot;risky-strategies&quot;&gt;Risky strategies&lt;/h3&gt;

&lt;p&gt;The case of Malta does bring up an interesting point about the Eurovision results which isn’t widely understood. How is it possible for the second most likely winner to come 21st? We can ask similar questions about Norway (3rd most likely winner, came last) or Denmark (4th most likely winner, came 23rd). Is this a failure of the model, or is it a feature of the system as a whole? It turns out that it’s the latter—in general, countries which are more likely to win are also more likely to do extremely badly.&lt;/p&gt;

&lt;p&gt;I’ve plotted the distribution of each country’s placing in 10,000 simulation runs of this year’s final:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/rankprobs.png&quot; alt=&quot;Simulated rankings&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The green bars show the place that each country actually achieved. The final bar for France is actually clipped, as the model predicted a 38% chance that they would come last.&lt;/p&gt;

&lt;p&gt;There are essentially three kinds of distribution here. The first covers the Big 4 (France, Germany, Spain, UK). These countries have few inbound friendship links, and qualify automatically for the final. This means that they have a rather large chance of coming dead last. Their placing distribution reflects this, with a large peak in the lower ranks. It’s still possible for these countries to do well, but automatic qualification means that their worst songs don’t get weeded out early, and they have to compete with the best of the rest of Europe.&lt;/p&gt;

&lt;p&gt;The second kind of distribution is associated with countries which have a lot of friendship links, both positive and negative. The best examples of this are Balkan countries, such as Macedonia and Albania. These countries have enough guaranteed votes to prevent them from ever coming last. However, they also have enough negative links to prevent them from gathering points outside of their friendship circle&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. As such, they sit squarely in the middle of the table, year after year.&lt;/p&gt;

&lt;p&gt;The third and final kind of distribution is the one which all of our likely winners display. Malta and Sweden are perfect examples of this. In statistical terms, this distribution is &lt;em&gt;bimodal&lt;/em&gt;. There are two peaks in the distribution: one in the high rankings, and one in the low rankings. These countries have very variable results. Either they do well, in which case they place very highly, or they do badly, in which case they place very low.&lt;/p&gt;

&lt;p&gt;This last group is where the winner usually comes from. Countries from the second group are perpetual also-rans. It’s impossible to win with a consistent middle-of-the-road performance. You’ll usually do well, often better than a lot of countries from the third group&lt;sup id=&quot;fnref:3&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, but there are enough countries with high variability that one of them is usually going to do well. It’s just very hard to predict which one.&lt;/p&gt;

&lt;p&gt;I realise this is probably little consolation to fans of these countries who have just seen them crash and burn, but take solace in the fact that you at least have a shot next year, unlike poor Macedonia, which looks to be relegated to a future of constant 13th place finishes.&lt;/p&gt;

&lt;h3 id=&quot;evaluation&quot;&gt;Evaluation&lt;/h3&gt;

&lt;p&gt;Looking again at the distributions above, the most surprising thing from the model perspective is that Norway came last. This was a fairly unlikely outcome, coming up in only 1% of simulations. I think that the reason the model underestimated this probability is the strange nature of the voting relationship between the Scandinavian countries. Although on average they give each other more points than might be expected, this relationship is really quite variable on a year-to-year basis. This isn’t captured by the model, which assumes a constant level of friendship, rather than the wild mood swings which characterise Nordic relations.&lt;/p&gt;

&lt;p&gt;We can also look at the ten most likely 12s that the model predicted:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Iceland → Denmark (Only 5! Denmark did do pretty badly this year)&lt;/li&gt;
  &lt;li&gt;France → Turkey (5 again, the lowest since 1999)&lt;/li&gt;
  &lt;li&gt;Azerbaijan → Turkey (Yes.)&lt;/li&gt;
  &lt;li&gt;Germany → Turkey (8, the lowest since 2001)&lt;/li&gt;
  &lt;li&gt;Belarus → Russia (Yes. Of Belarus’ 58 points, 46 stayed in the former USSR)&lt;/li&gt;
  &lt;li&gt;Moldova → Romania (Yes)&lt;/li&gt;
  &lt;li&gt;Macedonia → Albania (Yes)&lt;/li&gt;
  &lt;li&gt;Romania → Moldova (Yes)&lt;/li&gt;
  &lt;li&gt;Greece → Cyprus (Yes)&lt;/li&gt;
  &lt;li&gt;Cyprus → Greece (Of course)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Seven out of ten predictions were correct. Turkey seems to have underperformed with its diaspora in the west—its other habitual donors such as Belgium, Netherlands and Switzerland also ranked it relatively low. It’s possible that the song, which seemed less traditionally Turkish, didn’t provoke the same enthusiasm among emigrants as in previous years.&lt;/p&gt;

&lt;p&gt;In terms of the five “bellwether” countries, Hungary, Slovakia, Israel, Malta and Belgium, all but Malta gave twelve points to Sweden. In contrast, only eighteen out of the forty-two countries voting did so. This is a reasonable vindication of the model performance, although the probability of doing this well with five randomly selected countries is still around 11%.&lt;/p&gt;

&lt;h3 id=&quot;coming-up-next-time&quot;&gt;Coming up next time&lt;/h3&gt;

&lt;p&gt;There’s still a more comprehensive evaluation to do, but it will have to wait for a few days. I’ll also try to dig into what we can expect for next year’s contest.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;The population of Malta is approximately 450,000, so this is a significant proportion of the island. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;These negative links are probably more due to cultural incomprehension than active animosity. I don’t think anyone west of Vienna truly understands &lt;a href=&quot;http://en.wikipedia.org/wiki/Turbo-folk&quot;&gt;turbo-folk&lt;/a&gt; as a musical genre. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot;&gt;
      &lt;p&gt;Macedonia actually came ahead of Sweden in the majority of simulations. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision statistics: final predictions</title>
   <link href="http://mewo2.com/nerdery/2012/05/24/eurovision-statistics-final-predictions"/>
   <updated>2012-05-24T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2012/05/24/eurovision-statistics-final-predictions</id>
   <content type="html">
&lt;p&gt;This is part three of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;keeping-score&quot;&gt;Keeping score&lt;/h3&gt;

&lt;p&gt;Last time round, I compared my model’s original predictions to the results of the first semifinal. It managed to predict eight of the ten qualifiers correctly, which is significantly&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; better than the five or six that random guessing would give, and seemed to compare well with the numbers I was seeing on Twitter for human predictions (which have the benefit of knowing what the song sounds like).&lt;/p&gt;
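
&lt;p&gt;For the sceptical, that significance claim is easy to check. The first semifinal had 18 entries, of which ten qualified, so the chance of a random guess of ten countries matching eight or more is a hypergeometric tail probability:&lt;/p&gt;

```python
# With 18 semifinalists and 10 qualifiers, a random guess of ten countries
# gets 10 * 10 / 18 (about 5.6) right on average; the chance of eight or
# more is a hypergeometric tail.
from math import comb

def p_at_least(hits, entries=18, qualifiers=10, guesses=10):
    total = comb(entries, guesses)
    tail = sum(comb(qualifiers, k) * comb(entries - qualifiers, guesses - k)
               for k in range(hits, guesses + 1))
    return tail / total

print(round(p_at_least(8), 3))   # roughly 0.03, the footnoted p-value
```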

&lt;p&gt;This time around the model’s predictions were not quite as good. It correctly predicted that Bosnia-Herzegovina, Serbia, Turkey, Macedonia, Ukraine, Estonia and Sweden would qualify. However, it failed to predict Norway, Malta or Lithuania. Seven out of ten is still better than random though, and fifteen out of twenty is a fairly respectable overall score. There’s about a 1% chance of doing this well by random guessing.&lt;/p&gt;

&lt;h3 id=&quot;what-did-we-learn-today&quot;&gt;What did we learn today?&lt;/h3&gt;

&lt;p&gt;Just like last time, we can look at how our knowledge about the songs’ quality has changed, now that we know that they’ve qualified.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/qual2margins.png&quot; alt=&quot;Marginal distributions of song quality&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The green areas show the marginal distribution of song quality, given that the songs have qualified. We’ve learned next to nothing about the Balkan qualifiers: those guys were always going to get through on each other’s coattails. The less expected qualifiers, on the other hand, must have pretty well-liked songs if they got through. In particular, the two Baltic countries, Lithuania and Estonia, have expected scores over 1.5. This raises their victory chances by quite a lot.&lt;/p&gt;

&lt;h3 id=&quot;the-final-countdown&quot;&gt;The final countdown&lt;/h3&gt;

&lt;p&gt;I’ve used these marginal distributions to feed the model and produce a final set of predictions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/finalwinprobs.png&quot; alt=&quot;Final win probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Sweden have jumped back into the lead, having now qualified for the final. They’re joined in a very close second place by Malta, one of the surprise qualifiers from the second semifinal. Malta have very few strong inbound friendship links, either positive or negative. This means that their chances of victory are more sensitive than most to song quality. The fact that they qualified from what should have been a fairly tough semifinal shows that they’ve got a well-liked song, with a reasonable shot at winning the whole contest.&lt;/p&gt;

&lt;p&gt;Overall, it looks likely that next year’s contest will be in Scandinavia, with Sweden, Denmark and Norway all having decent chances. The Balkans are not looking quite so good, nor is the former Soviet bloc. In fact, this might be one of Western Europe’s best contests in a while.&lt;/p&gt;

&lt;h3 id=&quot;reading-the-signs&quot;&gt;Reading the signs&lt;/h3&gt;

&lt;p&gt;I had hoped to take a look at what the first few countries to report votes would tell us about the eventual winner. Unfortunately, there’s been a change to the format this year. Rather than the votes being shown in a predetermined random order, the order will be chosen to “maximise the excitement” once the jury votes are known, meaning that the early votes could be chosen to be deliberately misleading.&lt;/p&gt;

&lt;p&gt;We can still look to see which countries are likely to be the most reliable indicators of the final winner. Looking at 10,000 simulation runs, I’ve calculated the probability that each country gives 12 points to the eventual winner.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/bellwether.png&quot; alt=&quot;Bellwether probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There are a few countries which are very unlikely to pick the winner. The probability that Cyprus gives 12 points to the winner is pretty much synonymous with the probability that Greece is the winner, given typical Cypriot voting behaviour. Similarly, the former Yugoslavia is probably going to exchange 12s among itself, and given how unlikely it is that the winner will come from this region, the chances are low that one of these will hit the jackpot.&lt;/p&gt;

&lt;p&gt;There is, however, a clear group of five countries whose votes are better indicators of how the contest is going than any of the others. The strongest two are both “unaligned” Central European countries, Hungary and Slovakia. It’s likely that if they were competing, Poland and the Czech Republic would also be in this group. These countries are unlikely to vote for a regional favourite, and will probably choose a song which appeals to Europe as a whole.&lt;/p&gt;

&lt;p&gt;Israel, Malta and Belgium are similarly outside of voting cliques, and usually reflect broad opinion. However, they do have a few occasional voting habits which are worth bearing in mind. It probably doesn’t mean much if Israel votes for Russia&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, Malta votes for the UK or Italy, or Belgium votes for Greece or Turkey.&lt;/p&gt;

&lt;p&gt;If you’re looking for an easy way to judge how things are going, and don’t trust the actual scores because of the “excitement maximisation”, these five are probably worth keeping track of. In 90% of simulations, the winning song gets a 12 from at least one of these five, and in 63% of simulations, from two. If two or more of these countries agree, there’s a 65% chance that it’s on the winner. If three or more agree, the chance rises to 84%.&lt;/p&gt;

&lt;h3 id=&quot;stating-the-obvious&quot;&gt;Stating the obvious&lt;/h3&gt;

&lt;p&gt;There are also some 12s which are completely expected. I’ve produced a list of the ten most likely votes. When these happen, don’t get excited. It means absolutely nothing. Although I couldn’t possibly condone it, I imagine drinking a shot for each one might make for an entertaining drinking game, if you’re so inclined:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Iceland → Denmark (55%)&lt;/li&gt;
  &lt;li&gt;France → Turkey (60%)&lt;/li&gt;
  &lt;li&gt;Azerbaijan → Turkey (60%)&lt;/li&gt;
  &lt;li&gt;Germany → Turkey (60%)&lt;/li&gt;
  &lt;li&gt;Belarus → Russia (62%)&lt;/li&gt;
  &lt;li&gt;Moldova → Romania (68%)&lt;/li&gt;
  &lt;li&gt;Macedonia → Albania (76%)&lt;/li&gt;
  &lt;li&gt;Romania → Moldova (79%)&lt;/li&gt;
  &lt;li&gt;Greece → Cyprus (81%)&lt;/li&gt;
  &lt;li&gt;Cyprus → Greece (88%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of these &lt;em&gt;don’t&lt;/em&gt; happen, then that might be a sign that something interesting is going on. Or it might be random noise, who knows?&lt;/p&gt;

&lt;h3 id=&quot;in-conclusion&quot;&gt;In conclusion&lt;/h3&gt;

&lt;p&gt;Sweden’s going to win, unless it’s Malta, or maybe somebody else. If you average together the taste in pop music of all of Europe, you get a Hungarian. Don’t trust the scores on Saturday night, they’re just toying with your emotions. Greeks and Cypriots love each other very much.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;a href=&quot;/nerdery/2012/05/27/eurovision-statistics-after-the-final/&quot;&gt;Next post in the series&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;p ≈ 0.03 &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;This year’s Turkish entry, Can Bonomo, is a Sephardic Jew, and will probably also get some significant votes from Israel. This isn’t reflected in the model though; the friendship link from Israel to Turkey is actually slightly negative. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>Eurovision statistics: Post-semifinal update</title>
   <link href="http://mewo2.com/nerdery/2012/05/23/eurovision-statistics-post-semifinal-update"/>
   <updated>2012-05-23T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2012/05/23/eurovision-statistics-post-semifinal-update</id>
   <content type="html">
&lt;p&gt;This is part two of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;so-howd-we-do&quot;&gt;So how’d we do?&lt;/h3&gt;

&lt;p&gt;Last time I showed a model that gave probabilistic predictions for the Eurovision Song Contest, based on evidence from previous years’ voting. The first semifinal is over now, so we can see how well the model did predictively. Unfortunately, the full scores for the semifinals won’t be released until after the final, but we can still look at which countries qualified, and make a judgment from that.&lt;/p&gt;

&lt;p&gt;The ten countries which were most likely to qualify according to the model, in descending order of probability, were: Romania (74%), Greece (73%), Albania (67%), Moldova (65%), Russia (63%), Iceland (60%), Denmark (59%), Hungary (54%), San Marino (53%) and Israel (52%). Of these, the first eight did indeed qualify, and the last two were replaced by Ireland and Cyprus, which had predicted probabilities of 52% and 51% respectively. If anything, this is evidence that the model is being too conservative with its predictions; if the probabilities were correct, we should expect to only get six or seven right.&lt;/p&gt;
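
&lt;p&gt;To spell that out: if the quoted probabilities are right, the expected number of correct picks is just their sum.&lt;/p&gt;

```python
# The quoted qualification probabilities for the model's top ten picks.
probs = [0.74, 0.73, 0.67, 0.65, 0.63, 0.60, 0.59, 0.54, 0.53, 0.52]
# Expected number of the ten that qualify: the sum of the probabilities.
print(round(sum(probs), 2))   # 6.2, i.e. "six or seven" - yet eight did
```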

&lt;p&gt;In any case, the model seems to have passed its first test, even if it was an easy one. Now we can look forward to the second semifinal on Thursday, and the final on Saturday night.&lt;/p&gt;

&lt;h3 id=&quot;the-more-you-know&quot;&gt;The more you know&lt;/h3&gt;

&lt;p&gt;What extra information do we now have that can improve our predictions? While in a strict Bayesian sense we have learned more about voting patterns, and therefore should be able to improve our predictions for the second semifinal, in practice it’s just about impossible to extract any relevant information from what we learned yesterday. On the other hand, we’ve learned quite a lot that can inform our predictions for the final.&lt;/p&gt;

&lt;p&gt;At the most basic level, we know that eight countries have been knocked out. The probability that the winner is Montenegro, Switzerland, Belgium, Latvia, Finland, Austria, Israel or San Marino is now zero, and the probabilities for everyone else shuffle a bit accordingly. We’ve also learned a little bit about the ten songs which have qualified. For example, they can’t be completely terrible. If they had extremely low quality scores, they would have been knocked out.&lt;/p&gt;

&lt;p&gt;Using the same 10,000 simulation runs as last time, we can look at the quality distributions for the qualifying countries, and identify the marginal distributions: how likely it is that they have a song of a given quality, given that they qualified.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/qualmargins.png&quot; alt=&quot;Marginal distributions of song quality&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These are the distributions of song quality for each of the ten qualifiers. The overall shapes should look identical, but because we’re only using 10,000 samples, there’s a little bit of noise. The green areas show the simulation runs in which each song qualified. Our best estimate for the quality of each of the songs is now the green distribution, which is always to the right of the overall distribution.&lt;/p&gt;

&lt;p&gt;In particular, the countries which were “surprise” qualifications, Ireland and Cyprus, have marginal distributions with much higher average quality than the full distribution. On the other hand, we really haven’t learned much about Romania or Greece. We always expected them to qualify, so the information that they actually did doesn’t tell us much about their song quality.&lt;/p&gt;

&lt;p&gt;Sadly, marginalising over all ten qualifications at once doesn’t work as easily. If we look at just the model runs with this particular set of qualifiers, we’re down to four runs, and I’d be foolish to draw any conclusions from these. Instead I’ve generated a new set of 10,000 model runs, with the song qualities for the songs which have already qualified selected from the marginal distributions above.&lt;/p&gt;

&lt;p&gt;Strictly speaking, this is not the correct thing to do. In doing this, I’m implicitly assuming that the qualification prospects of each country are independent, when in reality they’re not. However, I’m just going to gloss over this detail, and hope it doesn’t come back to haunt me later. It’s certainly not the worst approximation I’ve made when building this model.&lt;/p&gt;
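&lt;p&gt;Concretely, the resampling looks something like this. The per-country samples are invented, and the independence assumption just described is baked in:&lt;/p&gt;

```python
import random

random.seed(1)

# Hypothetical per-country quality samples, conditional on qualification
# (in practice these come from the green marginal distributions above).
marginal_samples = {
    "Ireland": [0.9, 1.1, 1.4, 0.7],
    "Cyprus": [0.8, 1.0, 1.3],
}

def draw_qualities():
    # Independence approximation: each qualifier's quality is redrawn
    # separately, ignoring the correlations induced by joint qualification.
    return {c: random.choice(s) for c, s in marginal_samples.items()}

new_runs = [draw_qualities() for _ in range(10000)]
```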

&lt;h3 id=&quot;the-new-order&quot;&gt;The new order&lt;/h3&gt;

&lt;p&gt;Again, we can look at the probability of winning in the final for each country. This time round I’ve colour-coded the countries by their qualification route. Green shows the countries which qualified in the first semifinal, red shows the countries that have yet to qualify, and black shows the automatic qualifiers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/winnerpostsf1.png&quot; alt=&quot;Winning probabilities after first semifinal&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The clearest thing is that the countries which have already qualified are much more likely to win than those which haven’t. So far, so obvious. It’s maybe a little less obvious that the countries which qualified yesterday are in a better position than the automatic qualifiers. Those six songs are complete unknowns—as far as we know they could be horrendously awful. The songs which got through the qualifiers, on the other hand, have passed a basic sniff test. That raises their victory chances by quite a bit.&lt;/p&gt;

&lt;p&gt;There’s also been a bit of a shift in the order of things. Some of the less likely qualifiers from yesterday, such as Hungary, Cyprus and Ireland, have jumped up the table. These countries had a tough qualification battle, and the fact that they succeeded bodes well for them in the final. There’s also been some rearrangement of the other countries, but it’s hard to tell how much of this is just random differences between two sets of simulations, and how much is real. Still, it’s easy to imagine that Sweden have benefitted from Finland’s exit, and you can tell similar stories about other countries.&lt;/p&gt;

&lt;h3 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;/h3&gt;

&lt;p&gt;A few people have contacted me with suggestions of ways to improve the model. Some have suggested new sources of information, such as the performer’s age, gender, style of music, etc. Others have suggested systematic changes to the model, such as varying the distribution of song quality by country, or having time-dependent friendship factors. One point which has been raised repeatedly is that the voting system has changed between pure televoting and a mixed jury/phone-in system over the years. Modelling this is an obvious next step. If anyone is aware of the details of how the jury and phone votes are combined, I’d be interested to hear about it in the comments.&lt;/p&gt;

&lt;p&gt;I’m unlikely to make significant changes to the model before the final, but I am interested in possible improvements. Suggestions which come with the data required to put them into practice are particularly welcome.&lt;/p&gt;

&lt;p&gt;Finally, I should make some predictions for the second semifinal. On the basis of the model runs so far, it looks like Bosnia-Herzegovina (77%), Serbia (72%), Turkey (70%) and Macedonia (66%) are fairly safe bets. The next four, Georgia (60%), Belarus (58%), Ukraine (56%) and Slovakia (56%) are reasonably likely as well. The final two slots go to Estonia (55%) and Sweden (53%), but those are a bit shakier. I wouldn’t be surprised to see something else come through from further down the table, maybe Croatia (51%) or Norway (50%).&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;a href=&quot;/nerdery/2012/05/24/eurovision-statistics-final-predictions/&quot;&gt;Next post in the series&lt;/a&gt;&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>I've got Eurosong fever, Ted</title>
   <link href="http://mewo2.com/nerdery/2012/05/20/ive-got-eurosong-fever-ted"/>
   <updated>2012-05-20T00:00:00+00:00</updated>
   <id>http://mewo2.com/nerdery/2012/05/20/ive-got-eurosong-fever-ted</id>
   <content type="html">
&lt;p&gt;This is part one of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found &lt;a href=&quot;/tags.html#eurovision-ref&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;This Saturday, the final of the 57th Eurovision Song Contest will take place in Baku, Azerbaijan. To the uninitiated, it’s quite hard to explain the cultural significance of Eurovision to Europe. On one level, it’s a simple pop music contest, a throwback to the 1950s, when men wore tuxedos, fostering cultural unity seemed like a good way to stop World War III, and Luxembourg still had a shot at winning international competitions. On another level though, it’s a fascinating insight into the complex interconnected web that is European geopolitics&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The system is relatively simple: each participating country sends a performer or group of performers with a three-minute pop song. Phone-in and text message votes in each country determine how that country ranks the entrants (other than its own). These rankings are then translated into points: 12, 10, 8, 7, 6, 5, 4, 3, 2, 1 for the top ten.&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; Points are tallied, and a winner is announced. Since 2004, the number of entrants has been too large for one contest, so semifinals have been held in the week before the contest, using the same rules.&lt;/p&gt;

&lt;p&gt;While the system may be simple, the results are anything but. Certain patterns of voting are fairly predictable. Cyprus gives 12 points to Greece. Former Soviet republics give high scores to Mother Russia. What happens in Scandinavia stays in Scandinavia. These phenomena have been described in detail, both &lt;a href=&quot;http://jasss.soc.surrey.ac.uk/9/2/1.html&quot;&gt;academically&lt;/a&gt; and in &lt;a href=&quot;http://i44.tinypic.com/xxh7r.jpg&quot;&gt;cartoon form&lt;/a&gt;. Depending on the level of politeness involved, these patterns are described as “bloc voting”, “political voting” or “collusive voting”.&lt;/p&gt;

&lt;p&gt;These terms are probably unfair. Croatia didn’t sidle up to Bosnia one day and suggest a sneaky vote-swap. In most cases the results can be explained in terms of immigrant populations (Germany’s fondness for Turkey), or cultural similarity (the Irish love for Danish pop). Determining which is true in any particular case is probably impossible&lt;sup id=&quot;fnref:3&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. However, regardless of their source, the existence of patterns opens up the possibility of models which can predict the results of the contest in advance, at least probabilistically.&lt;/p&gt;

&lt;h3 id=&quot;a-gibbs-sampler-for-europe&quot;&gt;A Gibbs Sampler For Europe&lt;/h3&gt;

&lt;p&gt;I’m going to work with data from both the finals and semifinals, starting in 1998. This date is chosen partially to reduce the quantity of data, partially to ensure the relevance of the data to predicting modern contests, and mostly because this is the point at which the modern phone-in voting system was introduced. Prior to this, a country’s votes were determined by a pre-selected jury. This led to slightly less predictable voting patterns.&lt;/p&gt;

&lt;p&gt;While most analyses I’ve seen have worked in terms of typical numbers of points awarded between countries, I’m going to take a more fundamental approach. The basic unit of data here is the &lt;em&gt;comparison&lt;/em&gt;: if Country A gives Country B \(m\) points, Country A gives Country C \(n\) points, and \(m &amp;gt; n\), then we say that A &lt;em&gt;prefers&lt;/em&gt; B to C. We know nothing about the strength of this preference, simply that it exists. This also extends to the case where one of the countries involved receives zero points.&lt;/p&gt;
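&lt;p&gt;Extracting these comparisons from a single country’s scoreboard in a single year is straightforward; the points below are invented for illustration:&lt;/p&gt;

```python
from itertools import combinations

# One country's points in one year (invented numbers).  A comparison
# "prefers b to c" is recorded whenever b got strictly more points than c;
# ties, including two zero scores, are treated as missing data.
points = {"Sweden": 12, "Norway": 8, "Malta": 0, "Greece": 0}

preferences = []
for b, c in combinations(points, 2):
    if points[b] > points[c]:
        preferences.append((b, c))
    elif points[c] > points[b]:
        preferences.append((c, b))
```

Note that Malta and Greece both scored zero, so that pair contributes no comparison at all.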

&lt;p&gt;This choice of datum means that our knowledge about a preference is not affected by the presence or absence of any other countries in the competition. This is a problem with a points-based approach, where for example the mean number of points given by Norway to Sweden is partially determined by which years Denmark took part in the competition.&lt;/p&gt;

&lt;p&gt;In any given contest, we can observe a preference of A for B over C, for C over B, or neither. If we ignore the cases where no preference is observed, treating them as missing data, we can try to model the direction of the preference as a Bernoulli random variable. Our goal, then, is to model the probability that a given preference is expressed.&lt;/p&gt;

&lt;p&gt;The model I’ve chosen for this is a modified form of a logistic regression. I model the &lt;em&gt;affinity&lt;/em&gt; of a country for a given song with the sum of two terms. The first is a “friendship” term, dependent on the country awarding the points and the country receiving the points, but independent of the year of the contest. The second is a “quality” term, dependent on the country receiving the points and the year of the contest, but independent of the awarding country. The friendship term captures the “political” aspect of the voting, while the quality term should represent how universally liked a particular song is.&lt;/p&gt;

&lt;p&gt;Given a country A and two songs (a, b), I then model the probability of preferring one over the other as \(\mathrm{logit}^{-1}(Aff_A(a) - Aff_A(b))\). The distributions of both the friendship and quality terms are assumed to be normal, with zero mean and unknown variance.&lt;/p&gt;
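&lt;p&gt;In code, this preference probability is just the logistic function applied to the affinity difference. The affinity values below are illustrative, not fitted parameters:&lt;/p&gt;

```python
import math

def inv_logit(x):
    # The logistic function, mapping an affinity difference to a probability.
    return 1.0 / (1.0 + math.exp(-x))

# Affinity = friendship(voter, performer) + quality(performer, year).
# The values below are made up for illustration.
aff_song_a = 1.2
aff_song_b = 0.4

p_prefer_a = inv_logit(aff_song_a - aff_song_b)  # P(A prefers song a to b)
```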

&lt;p&gt;I’ve run this model using &lt;a href=&quot;http://mcmc-jags.sourceforge.net/&quot;&gt;JAGS&lt;/a&gt;, a Gibbs sampler, to determine the values of these parameters. There are approximately 130,000 preferences recorded, and about 3,000 parameters in the full model, so most of the parameters are fairly well constrained. In some cases, such as friendship ratings between two countries which have never competed in the same contest, the results are not so well constrained, but the prior distribution means that these values become small on average.&lt;/p&gt;

&lt;p&gt;Anyway, what this gives us is a matrix indicating how much each country “likes” each other country in voting terms, along with a “quality score” for each song. The first question we can ask is then which of the two terms is more important. Is the voting more influenced by regular voting patterns, or is the quality of the song a bigger factor?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/qualitydensity.png&quot; alt=&quot;Density plots of quality and friendship terms&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The answer is that they’re about equal in importance. The largest friendship factors are somewhat larger than the highest quality scores, but typical values are very similar. These really large friendship factors are mostly small countries giving points to big neighbours, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Andorra → Spain (9.5)&lt;/li&gt;
  &lt;li&gt;Cyprus → Greece (9.1)&lt;/li&gt;
  &lt;li&gt;Monaco → France (8.1)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;a-social-network&quot;&gt;A social network&lt;/h3&gt;

&lt;p&gt;I’ve plotted the friendship links with strength greater than an arbitrary cutoff of 4.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/eurovisionfriends.png&quot; alt=&quot;Eurovision friendship network&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The two most obvious groupings are the countries of the former Yugoslavia (near the top) and the former Soviet Union (near the bottom left). Both of these areas, because of their former political unity, have a lot of cultural similarity&lt;sup id=&quot;fnref:4&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; and many people have familial and ethnic ties to countries other than those they live in. As such, it’s in no way surprising that a lot of points are exchanged within these areas.&lt;/p&gt;

&lt;p&gt;More interestingly, this phenomenon doesn’t seem to happen as strongly in Western Europe, where national borders have been stable for longer. Instead, there are a number of strong pairings (Andorra/Spain, Monaco/France, Iceland/Denmark), linked by chains of weaker links. The Scandinavian situation is particularly odd, with a chain of votes going roughly eastward: Norway → Iceland → Denmark → Sweden → Finland → Estonia.&lt;/p&gt;

&lt;p&gt;We can also look at the strongest negative links: those countries which appear to never vote for each other. In many cases, these seem somewhat bizarre. I have no idea why Andorrans hate Serbia so much (-6.0), but the data never lies. Is it possible that Moldovans really dislike Maltese pop music (-5.2)?&lt;/p&gt;

&lt;p&gt;In one case, however, the reason is clear. On all eight occasions when it was possible for Azerbaijan to award points to Armenia, they have failed to do so. The reason for this is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Nagorno-Karabakh_War&quot;&gt;Nagorno-Karabakh war&lt;/a&gt;, a conflict between the two countries which took place immediately after the collapse of the Soviet Union, and which has been at a shaky ceasefire since 1994. While Armenia has on occasion dispensed a few points in the direction of Azerbaijan, the reverse has never occurred, and with good reason. In 2009, it was &lt;a href=&quot;http://www.rferl.org/content/feature/1800013.html&quot;&gt;reported&lt;/a&gt; that the 43 Azerbaijanis who texted in votes for Armenia in that year’s contest were summoned to the National Security Ministry to explain their actions. &lt;/p&gt;

&lt;h3 id=&quot;song-quality&quot;&gt;Song quality&lt;/h3&gt;

&lt;p&gt;The other thing we can look at is the quality of individual songs. The “quality” term in the regression gives us a measure of how well liked a song is, independently of any voting patterns that may have affected its score in the actual contest. Strictly speaking, these scores aren’t completely comparable between years, as we have no way of knowing what the average standard was in any given year. I’m just going to finesse this detail, in service of providing you with the Top Five Best Eurovision Songs Ever (1998-2011):&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=fBFFlL58UTM&quot;&gt;Alexander Rybak - Fairytale&lt;/a&gt; (Norway, 2009) (5.8) - It was always going to be this grinning loon with a violin. Holds all sorts of records for the most points, most 12s, etc.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=10XR67NQcAc&quot;&gt;Ruslana - Wild Dances&lt;/a&gt; (Ukraine, 2004) (5.3) - A Game Of Thrones filtered through post-Soviet Eurodance. She was a member of parliament in the Ukraine from 2006 to 2007.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=z7OvpjplJ_8&quot;&gt;Željko Joksimović - Лане моје&lt;/a&gt; (Serbia and Montenegro, 2004) (5.2) - I’m pretty sure every middle-aged woman in the Balkans has a little shrine to this guy. He’s entering again this year, but the betting markets are only giving odds of around 18/1.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=asZwDUTEXls&quot;&gt;Sakis Rouvas - Shake It&lt;/a&gt; (Greece, 2004) (4.5) - The Hellenic Ricky Martin. I’m reliably informed that he’s huge in Greece. Nobody else has ever heard of him.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=gAh9NRGNhUU&quot;&gt;Lordi - Hard Rock Hallelujah&lt;/a&gt; (Finland, 2006) (4.4) - After this won, we thought that all future Eurovision winners would be heavy metal orcs. How wrong we were.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If these scores are to be believed, then 2004 was a bumper year for Eurovision entries. 2011, on the other hand, was the worst year in the dataset, with the &lt;a href=&quot;http://www.youtube.com/watch?v=fSnz-iF9Xps&quot;&gt;winning Azerbaijani entry&lt;/a&gt; scoring a measly 2.6.&lt;/p&gt;

&lt;p&gt;The scientifically determined worst song in Eurovision history (-6.6) is then &lt;a href=&quot;http://www.youtube.com/watch?v=w3CgUmzl4sk&quot;&gt;these rapping Gypsy superheroes&lt;/a&gt; who represented the Czech Republic in 2009. They scored zero points in their semifinal, and thus failed to qualify for the final. The Czech Republic has not competed since.&lt;/p&gt;

&lt;h3 id=&quot;putting-my-money-where-my-mouth-is&quot;&gt;Putting my money where my mouth is&lt;/h3&gt;

&lt;p&gt;Of course, any model is only as good as its predictions. If we want to use this model predictively, we need a way to translate preference probabilities into consistent total orderings on the set of all entrants.&lt;/p&gt;

&lt;p&gt;I’ve chosen to do this again using Gibbs sampling, this time in R. I treat each preference as independent, given known values for the affinities. This means that if I fix an ordering for all but one of the entries, I can calculate the probability that the remaining entry fits into each “slot” in the ordering. By repeatedly selecting an entry and re-inserting it using this method, I eventually achieve a sample from the implied distribution of total orderings.&lt;/p&gt;
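&lt;p&gt;A sketch of that re-insertion step, with toy affinities in place of the fitted parameters, and pairwise preferences treated as independent:&lt;/p&gt;

```python
import math
import random

random.seed(2)

# Toy affinity scores standing in for the fitted friendship + quality sums.
strength = {"A": 1.5, "B": 0.5, "C": 0.0, "D": -1.0}

def insertion_weight(entry, others, pos):
    # Unnormalised probability that `entry` occupies slot `pos`, treating
    # each pairwise preference as independent given the affinities.
    trial = others[:pos] + [entry] + others[pos:]
    i = trial.index(entry)
    w = 1.0
    for j, other in enumerate(trial):
        if other != entry:
            p = 1.0 / (1.0 + math.exp(strength[other] - strength[entry]))
            w *= p if j > i else 1.0 - p
    return w

ordering = ["D", "C", "B", "A"]
for _ in range(200):  # Gibbs sweeps: remove one entry, re-insert at a sampled slot
    entry = random.choice(list(strength))
    ordering.remove(entry)
    weights = [insertion_weight(entry, ordering, pos)
               for pos in range(len(ordering) + 1)]
    pos = random.choices(range(len(ordering) + 1), weights=weights)[0]
    ordering.insert(pos, entry)
```

After enough sweeps, `ordering` is a draw from the implied distribution over total orderings, with stronger entries tending towards the front.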

&lt;p&gt;For the 2012 contest, I assume that all friendship values remain the same as in previous years, while the quality scores are sampled randomly from the posterior distribution calculated previously. Then, running through the contest mechanics, I arrive at a sample prediction for the result of the contest. I do this 10,000 times, producing a distribution over the contest results.&lt;/p&gt;

&lt;p&gt;The first thing I want to look at is the probability of qualifying for the final. The final generally draws much bigger viewership figures than the semifinals, but only in countries which qualify. However, national broadcasters have to commit to showing the final, without knowing in advance whether their country will be competing. This is a potential loss of a large number of primetime Saturday evening viewers. It’s therefore useful to know what the chances are that any given country will be participating in the final.&lt;/p&gt;

&lt;p&gt;The “Big Five” countries (France, Germany, Italy, Spain, UK) and the host nation automatically qualify for the final, while the others must come in the top ten in one of the two semifinals. Were the semifinals completely random, or equivalently, completely dependent on song quality, the chance of any given country qualifying would be approximately 56%. However, the reality is quite different.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/semiquals.png&quot; alt=&quot;Semifinal qualification probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Greece benefits from having Cyprus in its semi-final, and Romania and Moldova’s mutual love-in raises both of their qualification chances. The second semi-final is dominated by the Balkans, with Bosnia and Serbia benefitting both from each other, and from their neighbours Slovenia, Macedonia and Croatia.&lt;/p&gt;

&lt;p&gt;As an indication that dividing up voting clusters is effective, the qualification chances of the Scandinavian countries are relatively low. Only Denmark and Iceland have better than random chances of qualification. We should also spare a thought for poor old Portugal, which is missing its perennial boosters Andorra (not competing this year) and Spain (voting in the other semi-final). The chances of a Portuguese finalist are relatively slim as a result.&lt;/p&gt;

&lt;p&gt;Looking on to the final, things are quite different:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/eurovision/winner.png&quot; alt=&quot;Winning probabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The runaway favourite here is Sweden, which is likely to receive the lion’s share of Scandinavian points. Paradoxically, the high qualification chances from the Balkans actually act against them, as points are likely to be split between a number of countries. Scandinavia, on the other hand, will probably only get one or two qualifiers, and this will help in unifying the Nordic vote.&lt;/p&gt;

&lt;p&gt;Coincidentally, Sweden are also the favourites on the betting markets, trading at 2.94 on &lt;a href=&quot;http://sports.betfair.com/?mi=104582460&amp;amp;ex=1&amp;amp;q=eurovision&quot;&gt;Betfair&lt;/a&gt; at the time of writing, corresponding to an implied probability of 34%. The increase in certainty is probably due to additional information gleaned from actually listening to &lt;a href=&quot;http://www.youtube.com/watch?v=4nJcmLMb5to&quot;&gt;the song&lt;/a&gt;. We shouldn’t trust the betting markets too much though; last year their favourite was the &lt;a href=&quot;http://www.youtube.com/watch?v=OTiBNPT-x_Y&quot;&gt;French entry&lt;/a&gt;, which came a fairly shoddy 15th. It was very popular in Belgium though.&lt;/p&gt;

&lt;h4 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;/h4&gt;

&lt;p&gt;The data I used for this, along with the (rough, undocumented, partial) code, are available on &lt;a href=&quot;http://www.github.com/mewo2/eurovision/&quot;&gt;GitHub&lt;/a&gt;. The data is modified from that originally collected by Anthony Goldbloom of &lt;a href=&quot;http://www.kaggle.com/&quot;&gt;Kaggle&lt;/a&gt; in 2010, corrected in a few spots, and augmented with 2010 and 2011 data from Wikipedia. The code is my own, and not the most beautiful thing I’ve ever written.&lt;/p&gt;

&lt;p&gt;There are a few obvious ways that this could be improved. It wouldn’t be hard to include information about the song, such as gender of the performer, language of the lyrics, or position in the running order. People have suggested that all of these could have an effect on a song’s chances, and it would be reasonably easy to test this hypothesis.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;a href=&quot;/nerdery/2012/05/23/eurovision-statistics-post-semifinal-update/&quot;&gt;Next post in the series&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;Strictly speaking, Eurovision participation is open to all countries within the European Broadcasting Area, or members of the Council of Europe. This leads to a rather loose definition of Europe which includes large chunks of North Africa, the Middle East and Central Asia. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;Psephologically speaking, this is a variant on a &lt;a href=&quot;http://en.wikipedia.org/wiki/Borda_count&quot;&gt;Borda count&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot;&gt;
      &lt;p&gt;The problem is similar but not identical to that of distinguishing &lt;a href=&quot;http://cscs.umich.edu/~crshalizi/weblog/656.html&quot;&gt;contagion and homophily in social networks&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot;&gt;
      &lt;p&gt;At least for the purposes of Eurovision. For example, while there may be huge differences in culture between Croats and Serbs in terms of history and religion, these don’t generally come out in pop music. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</content>
 </entry>
 
 <entry>
   <title>How to Hack a Thon</title>
   <link href="http://mewo2.com/kaggle/2012/04/29/how-to-hack-a-thon"/>
   <updated>2012-04-29T00:00:00+00:00</updated>
   <id>http://mewo2.com/kaggle/2012/04/29/how-to-hack-a-thon</id>
   <content type="html">
&lt;p&gt;Yesterday was the &lt;a href=&quot;https://www.kaggle.com/c/dsg-hackathon&quot;&gt;EMC Data Science Global Hackathon&lt;/a&gt;, a 24-hour predictive modelling competition, hosted by &lt;a href=&quot;http://www.kaggle.com/&quot;&gt;Kaggle&lt;/a&gt;. The event was held at about a dozen locations globally, but a large number of competitors (including myself) entered remotely, from the comfort of their own coding caves.&lt;/p&gt;

&lt;p&gt;I finished in fourth place globally, knocked out of third at the last minute by a horde of Australian data scientists. The code I used is now available on &lt;a href=&quot;https://github.com/mewo2/airquality&quot;&gt;GitHub&lt;/a&gt;, and I’m going to use this post to talk through some of the decisions I made along the way.&lt;/p&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem&lt;/h3&gt;

&lt;p&gt;The overall goal is to predict (anonymised) measures of air quality over a three day period, given eight days of previous history of these measures, along with some meteorological data. The meteorology isn’t available for the prediction period though, so I decided to leave it out of my model. The dataset is relatively small, with about 700,000 total measurements in the training data, and about 40,000 values to predict. Even with terrible code driven by time pressure, I had trouble writing anything that took more than 15 minutes to run.&lt;/p&gt;

&lt;p&gt;The tricky part about the dataset was that there are a lot of missing values. Of the 4009 different time series, only 776 give a complete 8-day record with no gaps, and over a thousand are missing a day or more. This sort of problem is common in real-world data, and mechanisms for dealing with it can easily take as much effort and ingenuity as the actual modelling itself. It’s also an enormous source of bugs, as I discovered around two o’clock this morning.&lt;/p&gt;

&lt;h3 id=&quot;a-simple-start&quot;&gt;A simple start&lt;/h3&gt;

&lt;p&gt;The first thing I did was build a series of extremely simple baseline models based on summary statistics of the data. I focused on the medians of variables, rather than the means, because of the error metric (mean absolute error). The purpose of these models was partially to get something quick and simple up on the leaderboard, but mostly to provide a fallback model for when fancier models fail due to lack of data.&lt;/p&gt;

&lt;p&gt;I calculated medians for each variable, grouped by hour, 8-day chunk, month, hour and chunk combined, and hour and month combined. Then I took a weighted median of the five predictions, weighting by the reciprocal of the error as calculated on the training data. This is technically a bad thing to do, as we’re evaluating the model on the same data used to fit it, but I was in a hurry and didn’t really care. I also vaguely looked at using day of the week as a predictor, but didn’t bother following through.&lt;/p&gt;
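&lt;p&gt;The blending step can be sketched as follows. The predictions and error values are invented, and in practice there is one such combination per variable and hour:&lt;/p&gt;

```python
def weighted_median(values, weights):
    # Walk the sorted values until the cumulative weight passes half.
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v

# Five grouped-median predictions for one cell, weighted by the reciprocal
# of each grouping's training error (all numbers invented).
preds = [3.1, 2.8, 3.4, 3.0, 2.9]
errors = [0.5, 0.8, 1.2, 0.6, 0.7]
blended = weighted_median(preds, [1.0 / e for e in errors])
```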

&lt;p&gt;Surprisingly this baseline model, which is barely a model at all, put me in eighth place on the public leaderboard at the halfway point. I took a break at this stage to eat some food, watch some TV, and ruminate on what a real model would look like.&lt;/p&gt;

&lt;h3 id=&quot;arima&quot;&gt;ARIMA&lt;/h3&gt;

&lt;p&gt;By the time I got back, I had slipped to twelfth place, and things were hotting up at the top of the board. I had decided to fit &lt;a href=&quot;http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average&quot;&gt;ARIMA&lt;/a&gt; models to the data, as they’re a reasonably good generic time series tool, and I knew that R could fit them quickly and easily. &lt;/p&gt;

&lt;p&gt;It took an embarrassing amount of time (about 5 hours) to get this working without crashing. The handling of missing data in R is quite finicky, and I spent far too long debugging things and catching every possible problem. I think the lesson learned here is that I need to either improve my R debugging skills, or learn to write R code which is easier to debug.&lt;/p&gt;

&lt;p&gt;Before fitting the ARIMA models, I transformed the data onto a log scale. This is usually a good way to work with concentrations, which is what I assumed the target variables were. It certainly made their histograms look more reasonable, and with time short that was good enough for me. I replaced zeros and negative values with the smallest positive value in each dataset to avoid infinities in the transformed data. I then filled in missing values using spline interpolation in the log space. If there were too many missing values, I simply fell back to predicting the median of the data available.&lt;/p&gt;
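&lt;p&gt;A rough sketch of that preprocessing, using linear rather than spline interpolation to keep the example dependency-free:&lt;/p&gt;

```python
import math

def log_transform(series):
    # Replace zeros and negatives with the smallest positive value seen,
    # so the log is defined everywhere; None marks missing data.
    floor = min(x for x in series if x is not None and x > 0)
    return [None if x is None else math.log(max(x, floor)) for x in series]

def fill_gaps(series):
    # Linear interpolation across interior gaps (the original used splines;
    # linear keeps this sketch simple).  Assumes the first and last values
    # are present.
    filled = list(series)
    known = [i for i, x in enumerate(filled) if x is not None]
    for i, x in enumerate(filled):
        if x is None:
            lo = max(k for k in known if i > k)
            hi = min(k for k in known if k > i)
            frac = (i - lo) / (hi - lo)
            filled[i] = filled[lo] + frac * (filled[hi] - filled[lo])
    return filled

transformed = log_transform([0.0, 2.0, None, 8.0])
filled = fill_gaps(transformed)
```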

&lt;p&gt;To begin with, I fitted a (1,0,1) × (0,1,1) seasonal ARIMA model with a 24-hour period, using the &lt;code&gt;arima0&lt;/code&gt; function from R. This particular choice of order was made very unscientifically, after playing around with a few different choices on the training data, and choosing the one I liked the look of best. I fitted a separate model to each time series, and predicted 72 hours into the future. The results put me up to thirteenth on the leaderboard (I had previously slipped to fifteenth), which was much worse than I had expected.&lt;/p&gt;

&lt;h3 id=&quot;post-processing&quot;&gt;Post-processing&lt;/h3&gt;

&lt;p&gt;Looking at the predictions, it was clear why the score wasn’t as good as it could be. For some time series the ARIMA model was predicting explosive growth, in some cases giving predictions which were fifty times larger than anything in the training data. This seemed unlikely to me, so I clamped the predictions for each time series to the bounds of the observed data. This little change brought me up to 9th place.&lt;/p&gt;
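&lt;p&gt;The clamping itself is a one-liner per series (toy values shown):&lt;/p&gt;

```python
# Clamp each series' predictions to the range seen in its training data,
# suppressing the ARIMA models' occasional explosive forecasts.
observed = [3.0, 7.5, 5.2, 6.1]
predictions = [5.0, 9.9, 150.0, 2.4]

lo, hi = min(observed), max(observed)
clamped = [min(max(p, lo), hi) for p in predictions]
```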

&lt;p&gt;The next experiment I tried was a simple blend. I took the results of the clamped ARIMA fit and the weighted median baseline model and averaged them. I didn’t expect this to improve things much, but it moved me up to seventh place.&lt;/p&gt;

&lt;p&gt;I guessed that the reason for this was that the ARIMA model was making very bad predictions for the later part of the time series. Ideally, the predictions would regress towards the long term average as the prediction window moves further out. Rather than try to calculate properly how this process should work, I went with a quick and dirty approach I called “cross-fading”. &lt;/p&gt;

&lt;p&gt;I set the solution to the ARIMA fit at the first predicted hour, and the weighted median fit at the last. For the in-between times, I linearly interpolated between the two fits as a function of time. My initial submission with this technique gave horrible results, before I realised that I’d done the interpolation backwards. Once I fixed that, I jumped up to fifth place.&lt;/p&gt;
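&lt;p&gt;Cross-fading is plain linear interpolation in the weights. Constant toy forecasts are used here so the effect is easy to see:&lt;/p&gt;

```python
# Cross-fade: the weight shifts linearly from the ARIMA fit at the first
# predicted hour to the weighted-median fit at the last.
def cross_fade(arima, median):
    n = len(arima)
    faded = []
    for t in range(n):
        w = t / (n - 1)  # 0 at the first hour, 1 at the last
        faded.append((1 - w) * arima[t] + w * median[t])
    return faded

faded = cross_fade([4.0, 4.0, 4.0], [2.0, 2.0, 2.0])
```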

&lt;p&gt;The next thing I tried was a silly little trick which came to me in a moment of sleep-deprived inspiration. All of the target variables seemed to take a discrete set of values. Looking at the distribution of these values, it was clear that all the measurements for each variable were multiples of some discrete unit. I back-calculated what each unit was, and used that to round my predictions. This did give a very small boost to my score (0.0003!) but wasn’t enough to move me on the leaderboard.&lt;/p&gt;
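&lt;p&gt;One way to back-calculate the unit is a greatest-common-divisor computation on scaled values. This is a guess at the method, not a reconstruction of the actual code, and the fixed denominator is a simplification:&lt;/p&gt;

```python
from math import gcd

def infer_unit(values, denom=10000):
    # Back-calculate the discrete unit: scale to integers and take the gcd.
    nums = [round(v * denom) for v in values]
    g = 0
    for n in nums:
        g = gcd(g, n)
    return g / denom

observed = [0.6, 1.2, 3.0]
unit = infer_unit(observed)
snapped = [round(p / unit) * unit for p in [1.33, 2.51]]
```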

&lt;h3 id=&quot;final-submission&quot;&gt;Final submission&lt;/h3&gt;

&lt;p&gt;At this point I had two submissions remaining. I went back to playing around with ARIMA parameters, and discovered that I could get pretty good fits to the early part of time series using a (1,0,1) model with no periodic component. I tried cross-fading that with the weighted median fit, and rounding the result, but it performed less well than the previous fit.&lt;/p&gt;

&lt;p&gt;As a last-ditch attempt to squeeze some value out of this model, and because I was tired and wanted to go to bed without having to code up anything new, I blended the periodic ARIMA model with the aperiodic one in a two-to-one mix, then cross-faded with the weighted median model and rounded. This was my final submission, and it jumped me to third place on the leaderboard. It was six o’clock in the morning and I went to bed.&lt;/p&gt;

&lt;p&gt;When I woke up six hours later, I found that with one minute and sixteen seconds left in the contest, the ‘feeling_unlucky’ team had leapfrogged me for third place. Congratulations to them, and to Ben Hamner and James Petterson, who took the top two spots.&lt;/p&gt;

&lt;p&gt;The code I used, in all its hacky glory, is available on &lt;a href=&quot;https://github.com/mewo2/airquality&quot;&gt;GitHub&lt;/a&gt;. Feel free to gawp and stare, but please don’t send me any bug reports.&lt;/p&gt;

&lt;h4 id=&quot;postscript&quot;&gt;Postscript&lt;/h4&gt;

&lt;p&gt;It turns out that the two-to-one mix I chose for the final blend is damn near optimal. Experimenting after the deadline, I found that I could improve the score by 0.00007 by switching to a five-to-two blend, but two-to-one beats everything simpler. Score one for blind intuition.&lt;/p&gt;
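
&lt;p&gt;The weight search itself is trivial to script. A sketch of the kind of scan involved, using mean absolute error as a stand-in for the actual competition metric (all the names here are mine):&lt;/p&gt;

```python
import numpy as np

def best_blend_weight(pred_a, pred_b, truth, weights):
    """Scan candidate weights w for the blend w*pred_a + (1-w)*pred_b
    and return the one with the lowest mean absolute error."""
    def mae(w):
        return np.mean(np.abs(w * pred_a + (1 - w) * pred_b - truth))
    return min(weights, key=mae)

# Toy check: truth sits two-thirds of the way towards pred_a,
# so 0.7 wins on a grid of tenths.
a, b = np.array([3.0, 3.0]), np.array([0.0, 0.0])
truth = np.array([2.0, 2.0])
w = best_blend_weight(a, b, truth, [i / 10 for i in range(11)])  # 0.7
```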

</content>
 </entry>
 
 <entry>
   <title>The Million Song Dataset Challenge: Part I</title>
   <link href="http://mewo2.com/kaggle/2012/04/27/the-million-song-dataset-challenge-part-i"/>
   <updated>2012-04-27T00:00:00+00:00</updated>
   <id>http://mewo2.com/kaggle/2012/04/27/the-million-song-dataset-challenge-part-i</id>
   <content type="html">
&lt;p&gt;This is the first in a series of posts where I’m going to attempt to develop a solution to the &lt;a href=&quot;https://www.kaggle.com/c/msdchallenge&quot;&gt;Million Song Dataset Challenge&lt;/a&gt;, a &lt;a href=&quot;https://www.kaggle.com/&quot;&gt;Kaggle&lt;/a&gt; competition about predicting music listening patterns. If you want to follow along, the code I’m using is available on &lt;a href=&quot;https://www.github.com/mewo2/msdchallenge&quot;&gt;GitHub&lt;/a&gt;. You’ll also need to grab the data from the &lt;a href=&quot;https://www.kaggle.com/c/msdchallenge/data&quot;&gt;competition website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is not intended to be an introduction to predictive modelling, or collaborative filtering, or even the particular problem I’m dealing with here. If you want introductory material, there’s plenty available on the web. What I’m going to give here is a description of the process I’m going through as I approach this particular problem. I’m going to assume that you’re either familiar with the techniques I’m using, or capable of looking things up on the internet if you’re not. I am, however, happy to answer questions.&lt;/p&gt;

&lt;p&gt;I don’t know yet what kind of tools I’ll be using, but I’m going to start out with Python, including &lt;a href=&quot;http://www.scipy.org/&quot;&gt;numpy and scipy&lt;/a&gt;. Judging by previous experience, &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R&lt;/a&gt; will probably be involved, and it’s quite likely I’ll need to drop down to C for some performance-intensive stuff at some point. To begin with, though, I’ll use Python for some preprocessing.&lt;/p&gt;

&lt;h3 id=&quot;first-impressions&quot;&gt;First impressions&lt;/h3&gt;

&lt;p&gt;This is a really big dataset. The full Million Song Dataset collection of triples is around 3GB on disk; just parsing that much data will take a significant amount of time, so experimenting with this problem is going to be slow going. I can speed things up slightly by using random samples, and by making sure to cache the results of even basic calculations, but there’s no getting around the size of the data.&lt;/p&gt;

&lt;p&gt;The first thing I want to do is get rid of the long tags that are being used as identifiers. It’s generally a lot easier to deal with integers. We’re already given a mapping from songs to integers, and I can use the order in &lt;code&gt;kaggle_users.txt&lt;/code&gt; to produce one for users. Then I can translate at least the evaluation data into a more manageable form. The following script does this, yielding a CSV file with three integers per line.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# Script to convert evaluation triples to numeric form&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Usage: python numberify.py &amp;lt;userfile&amp;gt; &amp;lt;songfile&amp;gt; &amp;lt;infile&amp;gt; &amp;lt;outfile&amp;gt;&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sys&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;userfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;songfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;infile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;enumerate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;songfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;song&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;song&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;wb&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;user,song,count&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;infile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;song&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%d&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%s&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;song&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;getting-stuck-in&quot;&gt;Getting stuck in&lt;/h3&gt;

&lt;p&gt;I want to put together some kind of non-trivial prediction algorithm as a first step. Really what I’m doing here is some useful preprocessing, but I’ll throw together a simple prediction along the way as a sanity check.&lt;/p&gt;

&lt;p&gt;I want to calculate a “co-listening” matrix for the songs. This will be a (sparse) matrix &lt;code&gt;colisten&lt;/code&gt; where &lt;code&gt;colisten[i,j]&lt;/code&gt; is the number of users who listened to both tracks &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;j&lt;/code&gt;. This will allow for some very basic “people who listened to X also listened to Y” style collaborative filtering. To begin with, I’ll generate it from just the evaluation data, but eventually I’ll want to do this for the full MSD.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# Build colisten matrix from triplet CSV and save in mtx format&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Usage: python colisten.py &amp;lt;infile&amp;gt; &amp;lt;outfile&amp;gt;&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scipy.sparse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scipy.io&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sys&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;util&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;infile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scipy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sparse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lil_matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;util&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N_SONGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;util&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;N_SONGS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;util&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;songs_by_user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;infile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# Songs are 1-indexed, but scipy uses 0-indexing&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;scipy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mmwrite&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;wb&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This takes about 10 minutes to run on my machine. I could probably optimise it, but it’s not code I’m going to run very many times, and it’s still short enough that I can go make coffee while it’s doing its thing, so I’ll leave it as it is. You’ll notice that I’ve created a &lt;code&gt;util.py&lt;/code&gt; to include some basic constants and functions that make things a bit easier to read.&lt;/p&gt;
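
&lt;p&gt;For anyone following along without the repo, &lt;code&gt;util.py&lt;/code&gt; needs to provide roughly the following — this is my reconstruction from how it’s called in the scripts, not the actual file, and &lt;code&gt;N_SONGS&lt;/code&gt; is a placeholder value:&lt;/p&gt;

```python
# Hypothetical reconstruction of util.py, inferred from how it is
# called in the surrounding scripts; the real constants differ.
import csv
import itertools

N_SONGS = 1000  # placeholder; really the number of songs in the dataset

def songs_by_user(path):
    """Yield one [(song, count), ...] list per user from a
    'user,song,count' CSV with a header row, assuming the rows
    are already grouped by user."""
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        rows = ((int(u), int(s), int(c)) for u, s, c in reader)
        for _, group in itertools.groupby(rows, key=lambda r: r[0]):
            yield [(s, c) for _, s, c in group]
```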

&lt;h3 id=&quot;making-an-entry&quot;&gt;Making an entry&lt;/h3&gt;

&lt;p&gt;Now that I have the co-listening matrix, I can generate a prediction based on it. Let’s imagine that every song &lt;code&gt;i&lt;/code&gt; is associated with a co-listening vector &lt;code&gt;colisten[i,:]&lt;/code&gt;. If I sum this vector for every song that a user listened to, I should get a vector indicating how similar each song is to their listening history. I can then sort this in descending order to estimate which songs they’re likely to listen to in the future.&lt;/p&gt;

&lt;p&gt;In many cases, I’m not going to come up with 500 songs this way, so as a backup I’ll use the overall listening frequencies, as in the &lt;a href=&quot;https://kaggle2.blob.core.windows.net/competitions/kaggle/2799/media/gettingstarted.pdf&quot;&gt;sample entry&lt;/a&gt;. I can get this simply by reading off the main diagonal of the co-listening matrix. I also need to filter out the songs that the user has already listened to.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# Do prediction based on colistening matrix&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Usage: python predict_colisten.py &amp;lt;mtxfile&amp;gt; &amp;lt;evalfile&amp;gt; &amp;lt;outfile&amp;gt; &lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sys&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;itertools&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;scipy.io&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;util&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;mtxfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evalfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scipy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mmread&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;file&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mtxfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tocsr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;diagonal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;listenranked&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;w&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;history&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;util&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;songs_by_user&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evalfile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;history&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    
    &lt;span class=&quot;n&quot;&gt;sim&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newaxis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;colisten&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,:]&lt;/span&gt;
        
    &lt;span class=&quot;c&quot;&gt;# All this nonsense is an optimisation to avoid the fact that&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# sorting 300,000 numbers 110,000 times is bad for your health.&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# I only sort the songs where sim &amp;gt; 0&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;simidxs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sim&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nonzero&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;srt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lexsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;listens&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;simidxs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;simidxs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rankidxs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;simidxs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;srt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    
    &lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rankidxs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listenranked&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;songs&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rankidxs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# songs is 1-indexed; rankidxs is 0-indexed&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;
      
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39; &amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;guess&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This time around I have done a little bit of optimisation, as it would have taken all day to run otherwise. It still takes about an hour, which is slower than I’d like, but as I said at the start, this is an annoyingly large dataset.&lt;/p&gt;

&lt;h3 id=&quot;the-moment-of-truth&quot;&gt;The moment of truth&lt;/h3&gt;

&lt;p&gt;So I submit this and…&lt;/p&gt;

&lt;p&gt;…first on the leaderboard! It’s still very early days (there are only three non-benchmark submissions), but it’s a nice piece of encouragement. I think that’s a good point at which to stop for now.&lt;/p&gt;

&lt;h4 id=&quot;to-do&quot;&gt;To do&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Faster sparse matrix saving/loading: &lt;code&gt;scipy.io.mmread&lt;/code&gt; takes forever to parse a matrix&lt;/li&gt;
  &lt;li&gt;Look into some canned solutions like &lt;a href=&quot;http://www.ismll.uni-hildesheim.de/mymedialite/&quot;&gt;MyMediaLite&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Grouping by user, rather than by song&lt;/li&gt;
  &lt;li&gt;Work out how to visualise the data, or at least get meaningful song titles&lt;/li&gt;
&lt;/ul&gt;
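
&lt;p&gt;On the first of these: one route I’d consider is skipping the text-based Matrix Market format entirely and dumping the raw COO arrays with numpy — a sketch, not something I’ve benchmarked against &lt;code&gt;mmread&lt;/code&gt;:&lt;/p&gt;

```python
import os
import tempfile

import numpy as np
import scipy.sparse

def save_sparse(path, mat):
    """Dump a sparse matrix as raw COO arrays in a single .npz file,
    which is much quicker to read back than a text format."""
    coo = mat.tocoo()
    np.savez(path, row=coo.row, col=coo.col, data=coo.data, shape=coo.shape)

def load_sparse(path):
    """Rebuild the sparse matrix from the saved COO arrays."""
    npz = np.load(path)
    return scipy.sparse.coo_matrix(
        (npz['data'], (npz['row'], npz['col'])), shape=tuple(npz['shape']))

# Round-trip a small matrix through a temporary file.
m = scipy.sparse.lil_matrix((3, 3))
m[0, 1] = 2
m[2, 2] = 5
path = os.path.join(tempfile.gettempdir(), 'colisten.npz')
save_sparse(path, m)
m2 = load_sparse(path).tocsr()
```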

</content>
 </entry>
 
 
</feed>