The Numbers Don’t Lie
A while back, during the World Cup, Paul W directed me to an article in The Athletic. It was published just before the England-Senegal game, which was to be shown on ITV.
The basis of the article was this: major tournament matches involving England have been shown 31 times on BBC and 23 times on ITV. Of these matches, England have won 21 of the games shown on BBC and 9 of those shown on ITV, implying win ratios of 68% and 39% respectively. Conclusion: England are jinxed when matches are shown on ITV and therefore less likely to win the Senegal game.
The Athletic puts it like this:
The numbers don’t lie — England have statistically won far more World Cup or European Championship matches screened on the BBC in the past few decades. In fact it’s not even close.
Of course, numbers never lie, but the interpretations attached to them often do. And to be fair to The Athletic, the above quote is, I believe, intended to be tongue-in-cheek, with much of the article making the point that the difference in win rates is quite likely to be just coincidence.
So let’s look more closely at the evidence. There clearly is a difference in the win ratios for matches on BBC and ITV, but you’d never really expect the win ratio to be identical for games shown on each channel. It’s just like tossing a coin 100 times: it’s not very likely that you’ll get an exact 50-50 split between Heads and Tails.
Naturally, if the Heads-Tails split is a very long way from 50-50, you’ll start to suspect the coin is biased. And similarly, if the difference between England’s win ratio on ITV and BBC is big enough, you’ll have evidence that England are genuinely less likely to win when the match is screened on ITV. But is the difference between win ratios of 68% and 39% so big, or could it have easily happened by chance?
Bread-and-Butter Statistics
This is a bread-and-butter statistics question, for which any statistical package would provide an answer. Here’s one way of carrying out the analysis in R:
prop.test(c(21, 9), c(31, 23))
and here’s the output:
2-sample test for equality of proportions with continuity correction
data: c(21, 9) out of c(31, 23)
X-squared = 3.2955, df = 1, p-value = 0.06947
alternative hypothesis: two.sided
95 percent confidence interval:
-0.01032723 0.58255724
sample estimates:
prop 1 prop 2
0.6774194 0.3913043
Without getting too hung up on details, if you look at the bit that says…
p-value = 0.06947
… this means that if England were equally likely to win all matches, regardless of the channel screening them, then a difference in win ratios as extreme as the one actually observed (29 percentage points) has a probability of around 6.9%.
For reference, depending on the context, a value of 5% or less is generally considered indicative of a possible genuine effect. So the analysis implies the difference in win ratios is a little surprising, but not so surprising as to strongly suggest a genuine effect, and we’d conclude that there’s insufficient evidence that England are less likely to win when their match is screened on ITV.
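If you prefer to see where a number like that comes from, you can approximate the same probability by simulation. The sketch below assumes, purely for illustration, that every game has the same win probability (the overall rate of 30 wins in 54 games) whichever channel shows it, and then counts how often a simulated BBC-ITV gap is at least as large as the 29 percentage points actually observed:

# Approximate the p-value by simulation, assuming a common win probability
set.seed(1)
n_sims <- 100000
p_common <- 30 / 54                       # overall win rate across all 54 games
bbc_wins <- rbinom(n_sims, 31, p_common)  # simulated wins in 31 BBC games
itv_wins <- rbinom(n_sims, 23, p_common)  # simulated wins in 23 ITV games
gap <- abs(bbc_wins / 31 - itv_wins / 23)
mean(gap >= 21 / 31 - 9 / 23)             # share of simulations at least as extreme

Run this and you should get something in the same ballpark as the 6.9% above. The two won’t agree exactly, because prop.test uses a continuity-corrected chi-squared approximation rather than brute-force simulation.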
But there are additional reasons why the evidence for an “ITV effect” could be even weaker than this analysis suggests.
1. The calculation leading to a p-value of 6.9% assumes that all games are equivalent. But if ITV were – either by chance or choice – showing a greater proportion of England’s more difficult games, we’d expect the win ratio to be greater for BBC than for ITV anyway, so results as extreme as those observed would be more likely, and therefore less surprising.
2. The result also assumes that match outcomes are independent of each other. In practice, especially in group stages, any team’s probability of winning is likely to be affected by their results in earlier matches, as they adjust to their qualification requirements. So, for example, if England’s first group match is shown on ITV, and they fail to win, it might have the effect of increasing their probability of winning the second match, which might be on BBC.
3. There are very few data available. Admittedly, the calculation in R above allows for this fact, but nonetheless the result could be considered more reliable if it were based on more data. Indeed, shortly after The Athletic article was published, England won the Senegal game, and the updated analysis in R (sketched just after this list) leads to
p-value = 0.09698
In other words, still under the assumption that England are equally likely to win whoever is screening the game, the probability of getting a difference in win ratios between ITV and BBC as big as the one we actually obtained is nearly 10%, even without allowing for the effects described in 1 and 2. So, with just one extra piece of data, the results change from marginally surprising to not that surprising at all.
4. Finally, and most importantly, as discussed in many other posts in this blog, there are real dangers in retrospective analyses of this type. Millions of events happen every day and some of them will be surprising. Toss a coin 10 times and you probably won’t get 10 Heads. Repeat that experiment many times, though, and once in a while you will get 10 Heads. If you then take the data from one of the experiments where you got 10 Heads and test whether it’s feasible that the coin is fair, the result will suggest that the coin is almost certainly biased in favour of Heads. But that’s only because you ignored the hundreds of other experiments where the results were more balanced. Similarly, if you ignore thousands of unsurprising things over time and pick out one surprising thing – like a fairly large difference in the win ratios of England games on ITV and BBC – you shouldn’t be surprised if a statistical test suggests the results are surprising.
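For reference, the updated calculation mentioned in point 3 is just the original test with the Senegal win added to the ITV column, i.e. 10 wins from 24 ITV games with the BBC figures unchanged:

prop.test(c(21, 10), c(31, 24))

which reproduces the p-value of 0.09698 quoted above.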
Paul the Octopus
And on this final point, remember Paul the Octopus?
Paul became famous worldwide for correctly predicting the winner in all seven of Germany’s games at the 2010 World Cup, after having previously correctly predicted 4 out of 6 of their Euro 2008 matches. He also correctly predicted Spain to win the final of the 2010 World Cup. His overall success rate across the two tournaments was therefore 12 out of 14, or 85.7%.
Again, it’s standard stuff to assess Paul’s predictive powers. A statistical test in R of whether Paul could have been choosing at random across those 14 predictions and just got lucky leads to…
p-value = 0.01616
That’s to say, Paul had less than a 2% chance of being as accurate as he was, if he was just choosing teams at random.
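For the record, the quoted figure is what you’d get from the same kind of proportion test as before, this time applied to 12 correct predictions out of 14 against a chance rate of 50%:

prop.test(12, 14, p = 0.5)

which gives the p-value of 0.01616 shown above.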
Does this mean Paul the Octopus was a predictive genius? Well, the Wikipedia article on Paul also says:
The animals at the Chemnitz Zoo were wrong on all of Germany’s group-stage games, with Leon the porcupine picking Australia, Petty the pygmy hippopotamus spurning Germany’s apple-topped pile of hay (instead of Serbia), and Anton the tamarin eating a raisin representing Ghana.
So Paul’s results were surprisingly accurate, but if you get enough animals predicting results, it’s much less surprising if one or two get decent results. And these are the ones that get worldwide news coverage, not Anton the tamarin who ate the sampling equipment.
Conclusion
To conclude: picking out a set of numbers because they seem surprising, and then testing whether they are surprising, is a dangerous way to do statistics. It’s implausible that Paul the Octopus had genuine predictive powers, regardless of what any statistical analysis says. It’s much more likely that Paul was simply lucky. And if we were able to analyse the data from all the animals that were used to predict the 2010 World Cup – it’s not just Smartodds playing this game – it’s likely we’d find it unsurprising that one animal did as well as Paul did.
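As a rough illustration, here’s what that selection effect looks like in R. The figure of 100 animals is invented purely for illustration, and each prediction is treated as a simple 50-50 call (which ignores the possibility of draws), but the point survives those simplifications:

# Chance that at least one of many randomly guessing animals
# matches Paul's record of 12 or more correct calls out of 14
p_single <- sum(dbinom(12:14, size = 14, prob = 0.5))  # one animal: about 0.65%
n_animals <- 100                                       # invented for illustration
1 - (1 - p_single)^n_animals                           # at least one animal succeeds

With 100 animals that comes out at around 48%, and the more animals you include, the higher it gets.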
England’s record for matches shown on ITV stands out in the same way that Paul the Octopus’s results do, but to assess whether England’s win rate in such games is genuinely low, we need to remember that there are many Anton-the-tamarin-type events happening regularly that we simply don’t pay attention to. Include those in the mix and it’s not surprising that something as apparently surprising as the England-ITV record will have occurred.