The Opta Million
I’m writing this post shortly before the start of the World Cup, for which Opta have posted a competition. Entry is free and the prize is a million dollars. To win, you just have to predict the correct final team order in each of the World Cup groups, and then the winning team in each of the subsequent knockout games, right up to the final and third-place playoff match. Any free bet is automatically good value, all the more so if the prize is as much as a million dollars. So deciding to enter is a no-brainer unless you value too highly the two minutes or so it would take to complete the entry form. But there are some interesting statistical questions:
- What’s a good playing strategy?
- What are the chances of winning?
- Are Opta mad for offering a million-dollar prize in a competition that has free entry?
The Simplest Model
The simplest assumption is that all teams are equally likely to win any match, notwithstanding the possibility of draws in the group stages. Under this assumption – by reason of symmetry – all possible arrangements of the final group tables are equally likely, as is the winner of each game in the knockout stage. This means that any strategy for choosing your entry is as good as any other – you might as well choose at random, go alphabetical, order by average annual rainfall in participating countries, pick your favourite countries or anything else: all valid combinations have equal probability of winning. Since each group contains 4 teams, the number of possible arrangements for each group is
4 \times 3 \times 2 \times 1 = 24
That’s because 4 different teams could finish first, after which there are 3 remaining teams who could finish second, then 2 who could finish third, leaving just one choice for fourth. Then, since there are 8 groups, the number of possible arrangements across all groups is
24 ^ 8 \approx 1.1 \times 10^{11},
which is 11 followed by 10 zeros. Or in words: a hundred and ten thousand million. Once you’ve got this choice correct, you then have to correctly predict the knockout game winners. There are 16 knockout games, in each of which there are 2 possible winners. So the number of possible winning combinations here is 2 multiplied by itself 16 times:
2^{16} \approx 65,000
Combining the group and knockout stages, the total number of arrangements is therefore
24^8 \times 2^{16} \approx 7.2 \times 10^{15}
which is roughly seven thousand million million. By assumption, each of these arrangements is equally likely, so the probability of any entry to the competition being correct is
1/(24^8 \times 2^{16}) \approx 1.4 \times 10^{-16}
To put that number in some sort of perspective, the standard UK lottery consists of choosing 6 correct numbers from 1 to 59, and there are roughly 45 million ways of doing that. So you’d be around 160 million times more likely to win the lottery jackpot with a single ticket than to win the Opta Million with a single entry. Or another way of looking at it: there are around 8 billion people on this planet. If every single one of them made an entry to the Opta Million once every minute, it would take on average around 1.7 years for someone in the world to win.
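The counting above is easy to reproduce. The following is a minimal sketch in Python, not part of the original calculation; the variable names are my own, and the last comparison assumes a 365-day year with 8 billion entries per minute.

```python
from math import comb, factorial

# Counts under the equal-strength assumption described above.
group_orders = factorial(4)        # orderings of one 4-team group
all_groups = group_orders ** 8     # 8 groups, ordered independently
knockout = 2 ** 16                 # 16 knockout games, 2 possible winners each
total = all_groups * knockout      # number of distinct valid entries

lottery = comb(59, 6)              # ways of choosing 6 numbers from 59

# Assumption: 8 billion people, each making one entry per minute.
entries_per_minute = 8_000_000_000
minutes_per_year = 60 * 24 * 365
years_to_expect_winner = total / entries_per_minute / minutes_per_year

print(f"group arrangements:  {all_groups:.2e}")
print(f"total arrangements:  {total:.2e}")
print(f"chance per entry:    {1 / total:.2e}")
print(f"lottery vs Opta:     {total / lottery:.2e} times more likely")
print(f"years for a winner:  {years_to_expect_winner:.2f}")
```

Using exact integers for the counts avoids any rounding until the final ratios are formed.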
A More Realistic Model
In reality, teams are not all equally likely to win, meaning that some group combinations and knockout game winners are more likely than others. This means that different selections for the Opta Million have different probabilities of winning. If you choose to make a random selection anyway, your chances of winning remain at one in seven thousand million million, exactly as above. But assuming you want to maximise your chance of winning, you should aim for the group arrangements and knockout winners that have the highest probability. This requires an alternative probability model, which needs to satisfy a couple of requirements.
- It must conform to some basic rules of probability. For example, the sum of probabilities over all possible outcomes must be 1.
- It should conform to basic beliefs about the process: events that are more likely should have a higher probability.
A third requirement is that the model should be compatible with historical data. Indeed, this requirement is often used as a framework for model estimation. But to keep things simple, I’ll skip this issue here, and focus only on the first two. Actually, a strategy for choosing the Opta Million entry can be derived without giving a complete probability model. We just need to assume:
- Teams can be ranked by some measure of strength;
- Whenever two teams play, the higher ranked team is always the more likely to win.
Under these assumptions, the most likely group outcomes will be an ordering of teams by rankings; and the more likely winner of each knockout match will be the more highly rated team, providing a simple framework for choosing group orderings and knockout match winners. This just leaves the question of how to rank the teams. One simple option is to use the current FIFA rankings. This means that Brazil – the highest ranked team – is predicted to beat all other teams and therefore become the tournament winner. But the strategy also provides predicted group orderings and knockout match winners, and is therefore sufficient to define an optimal Opta million entry.
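As a rough illustration of the strategy, here is a small Python sketch. The function names, team names and ratings are all hypothetical (they are not real FIFA figures), and the bracket is a generic single-elimination draw rather than the actual World Cup bracket, so the third-place playoff is left out.

```python
def predict_group(ratings):
    """Return team names ordered from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


def predict_knockout(bracket, ratings):
    """Play a single-elimination bracket where the higher-rated team always advances.

    `bracket` lists the entrants in draw order; its length must be a power of two.
    Returns the list of winners for each round, ending with the champion.
    """
    rounds = []
    while len(bracket) > 1:
        pairs = zip(bracket[::2], bracket[1::2])
        bracket = [max(pair, key=ratings.get) for pair in pairs]
        rounds.append(bracket)
    return rounds


# Hypothetical ratings, for illustration only.
ratings = {"Team A": 1800, "Team B": 1650, "Team C": 1500, "Team D": 1400}

print(predict_group(ratings))
print(predict_knockout(["Team A", "Team D", "Team B", "Team C"], ratings))
```

With these made-up ratings, the group is ordered A, B, C, D and Team A wins the bracket, which is just the statement that the top-rated team is predicted to win the tournament.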
Chance of Winning
Calculating the probability of winning with this strategy requires a more detailed specification of the model. It’s no longer sufficient to say that if team 1 is rated more highly than team 2 they have a higher probability of winning; we also need to know what that probability is. For knockout matches a simple model can be used. In a match between teams 1 and 2, with respective ‘strengths’ S_1 and S_2, we might assume that the probability team 1 wins is
\frac{S_1}{S_1 + S_2}
It’s easy to check that if the teams have equal strength, this probability is 1/2, while the stronger team 1 is relative to team 2, the closer the probability is to 1. As such, the model behaves as we would want it to. Group stage calculations are more difficult, since multiple combinations of match outcomes can lead to the same group ordering. Moreover, draws are possible results in the group stage, so the model above is actually insufficient. But even a model that included draw probabilities would be insufficient, since tied positions are separated by goal difference. A full model would therefore require probabilities for goals scored and conceded. To avoid these complications, we can design a simple model specifically for the group orderings. If a group comprises teams with respective strengths S_1, S_2, S_3, S_4, we’ll assume that (for example) team 1 wins the group with probability
\frac{S_1}{S_1 + S_2 + S_3 +S_4}
If team 1 wins the group, the probability that team 2 finishes second is assumed to be
\frac{S_2}{ S_2 + S_3 +S_4}
and given that teams 1 and 2 are first and second, the probability that team 3 finishes third is taken to be
\frac{S_3}{ S_3 +S_4}
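The knockout and group formulas above translate directly into code. This is a minimal Python sketch; the function names and the example strengths are my own, chosen purely for illustration.

```python
from itertools import permutations


def win_prob(s1, s2):
    """Probability that team 1 beats team 2 in a knockout match."""
    return s1 / (s1 + s2)


def ordering_prob(strengths):
    """Probability that a group finishes in exactly the given order, first to last."""
    prob = 1.0
    remaining = sum(strengths)
    for s in strengths:
        prob *= s / remaining   # chance this team takes the next place among those left
        remaining -= s
    return prob


# Hypothetical strengths, for illustration only.
group = (4.0, 3.0, 2.0, 1.0)

print(ordering_prob(group))                                  # ordering by strength
print(sum(ordering_prob(p) for p in permutations(group)))    # all 24 orderings
print(max(permutations(group), key=ordering_prob))           # most likely ordering
```

The sum over all 24 orderings comes out as 1 (up to floating-point rounding), and the most likely ordering is the one sorted by strength. With four equal strengths, `ordering_prob` returns 1/24, matching the simplest model.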
These assumptions ensure our two requirements are satisfied: fundamental rules of probability are respected and the stronger a team, the greater their chance of finishing higher up the table. In a serious application the strength parameters of the teams might be estimated using historical data. In the present setting we’ll simply use the FIFA rankings. Well, almost. A little trial-and-error suggests the numbers are more realistic with a small change to the FIFA rankings, defining
S = R - 1350
for each team, where R is the team’s FIFA ranking points. This doesn’t change the relative ordering of the teams, but leads to more plausible win probabilities. Recall that the optimal strategy for choosing the Opta Million entry (let’s call it OSOM) is to order groups and pick knockout winners by FIFA rankings. With this strategy and the model described above, the probabilities of correct predictions of groups range from 0.068 for Group B to 0.177 for Group H. These differences are explained by the fact that in Group B, 3 of the teams are quite similarly ranked, whereas in Group H all of the teams have quite different rankings. Combining results across groups, the probability that OSOM correctly predicts all 8 groups turns out to be 3.9 \times 10^{-8}. Tiny, but more than 4000 times greater than the same calculation made under the assumption of all orderings being equally likely. Similarly, the probability that OSOM correctly predicts each of the knockout games is 9.3 \times 10^{-5}. So the overall probability of OSOM correctly predicting the whole tournament is
3.9 \times 10^{-8} \times 9.3 \times 10^{-5} = 3.6\times 10^{-12}
which is around 26,000 times greater than the probability of winning with a selection chosen at random.
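The arithmetic here, and the effect of the shift S = R - 1350, can be checked with a few lines of Python. This is only a sketch: the two probabilities are the figures quoted above, while the function name and the ratings 1800 and 1500 are made up to show why the shift matters.

```python
# Figures quoted in the text above.
p_groups = 3.9e-8      # OSOM gets all 8 group orderings right
p_knockout = 9.3e-5    # OSOM gets all 16 knockout winners right
p_osom = p_groups * p_knockout

# Equal-strength model, for comparison.
p_random_groups = 1 / 24 ** 8
p_random = 1 / (24 ** 8 * 2 ** 16)

print(f"OSOM overall:      {p_osom:.1e}")
print(f"groups vs random:  {p_groups / p_random_groups:,.0f} times")
print(f"overall vs random: {p_osom / p_random:,.0f} times")


def shifted_win_prob(r1, r2, shift=1350):
    """Knockout win probability for team 1 using strengths S = R - shift."""
    s1, s2 = r1 - shift, r2 - shift
    return s1 / (s1 + s2)


# Hypothetical ratings, for illustration only.
print(shifted_win_prob(1800, 1500, shift=0))   # raw ratings: close to a coin toss
print(shifted_win_prob(1800, 1500))            # shifted ratings: a clearer favourite
```

Without the shift, ratings that differ by a few hundred points give win probabilities barely above 1/2, which is presumably why the unshifted numbers looked unrealistic.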
Conclusion
So, are Opta mad for offering a million-dollar prize in a free-entry competition? Well, even with our optimal strategy, we’d be more than 6000 times as likely to win the UK lottery jackpot with a single ticket, and it’s a pretty safe bet that Opta won’t need to pay out.
There is, in principle, one further complicating issue when making a selection for the Opta Million. If more than one entry predicts the whole tournament correctly, the million-dollar prize is shared among all winning entries. This being so, it might be better to make selections that are sub-optimal in terms of win probability, if they are more likely to be unique entries and so lead to a prize of the entire million. As I say, it’s an ‘in principle’ argument only. When your chances of winning are around 3.6\times 10^{-12}, it’s probably not worth worrying too much about the downside of having to share your potential million-dollar winnings.
Post-Postscript
I’m sending this post on 1st December, at which point 4 of the 8 groups have been completed. I won’t be a millionaire. As explained above, my choice for the Opta Million was based on team superiority according to FIFA rankings. This didn’t require the best ranked team to win every game at the group stage, but it did require groups to finish up in an order determined by those rankings. This worked for Group A, but not for Groups B to D. So, it’s disappointing. But at least I’ve done better so far than an average random selection. As explained above, a random selection would get any group in the correct order with probability 1/24. On average, therefore, the number of groups correctly predicted (out of 4) with a random strategy would be 4/24 = 1/6. My score of 1 correct group is therefore not a bad score by comparison. Just nowhere near good enough to keep me involved in the Opta Million.
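For completeness, the random-selection benchmark can be worked out exactly. This is a small Python sketch with names of my own choosing; the chance of getting at least one group right is an extra figure not quoted above, and it assumes the four groups are predicted independently.

```python
from fractions import Fraction

p_group = Fraction(1, 24)    # a random pick gets one group's order right
n_groups = 4                 # groups completed so far

expected_correct = n_groups * p_group
p_at_least_one = 1 - (1 - p_group) ** n_groups

print(f"expected correct groups: {expected_correct}")
print(f"chance of at least one:  {float(p_at_least_one):.3f}")
```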