So, Smartodds (still) loves Statistics.
Since you’ve reached this page, you probably know that Statistics is central to what Smartodds does. From exploratory analysis to inference and decision-making, a large toolbox of statistical techniques is used to turn data into relevant information that supports Smartodds’ various activities. In recognition of this, prior to the pandemic, Smartodds hosted a statistics-based blog under the title ‘Smartodds loves Statistics’. The very first post in the blog was here, from which you can scroll down to read the entire history.
The aim of Smartodds loves Statistics was to promote the understanding and communication of Statistics among the various people connected to Smartodds. Though not explicitly sports-orientated – the majority of posts had no direct connection to sports – the ultimate aim was to provide knowledge to Smartodds employees and clients to help with their sports-related activities. It was intended to be light-hearted, topical, and wide-ranging in the topics it covered, though at the start of the Covid pandemic in 2020 it focused explicitly on pandemic-related issues. For various reasons it was paused later in 2020, and you can find the final post here.
But now, with the revamp of the Smartodds webpage, the blog itself is having its own mini-revival. No real changes: the aims and style of the blog will be exactly as before – though posts will now be made every month or so.
With all this in mind, the first couple of posts in this new edition will be related, touching on a number of themes that occurred regularly in the original version of the blog:
1. The ambiguities that arise when discussing chance and probability;
2. The difficulties we all have in calculating the chance of random events;
3. The relevance of these issues to sports markets.
This first part is unconnected to sport. The new year brought this touching story from the United States about twins who were born in California, either side of midnight on 31 December 2021. While Alfredo Antonio Trujillo was born just before midnight in 2021, his sister was born some 15 minutes later in 2022. So although this pair are twins, they were born in different years.
The article itself is pretty much free of hyperbole, but in a post on the Guardian Instagram page – see the header photo above – the event is described as a ‘one in 2 million chance’. Which raises 2 questions:
1. What is the event that’s been described as having a 1 in 2 million chance?
2. Is 1 in 2 million an accurate assessment of the likelihood of such an event?
These questions are obviously inter-connected: the calculation of the chance will clearly depend on what event we are talking about. Is it the chance that a particular couple who are hoping to have a child end up having twins who then happen to be born in different years? This would be an extremely unlikely event. Or are we talking about the chance of Alfredo Antonio and his sister Aylin Yolanda, whose mother was likely to go into labour towards the end of December, being born in different years? This is still unlikely, but very much less so: it was always entirely plausible that the twins would be born around New Year’s Eve, in which case the births-in-separate-years scenario becomes not at all surprising.
There are many similar ambiguities in a sporting context. When people know what I do as a job, they sometimes make a remark along the lines of how incredibly unlikely it was that Leicester City could have won the Premier League in 2015-16 and that ‘no statistician in the world’ could have predicted that. In a sense it’s true. Smartodds have models that will enable the calculation of the probability that any particular team wins a League in any season, and the chances of Leicester City winning in 2015-16 will have been very small indeed. But if the question is framed a bit differently – what’s the chance of a team like Leicester winning the Premier League in one season over, say, a 20-year period? – then it becomes a much less surprising occurrence. Naturally though, the question is usually posed about Leicester winning in the 2015-16 season because that’s the version of an underdog team winning the Premier League in the last 20 years that actually happened. Yet there are many other versions of this event that didn’t happen. And it’s dangerous to retrospectively identify the events that did happen, observe they have low probability, and conclude that something extremely unlikely occurred. That’s to say, there are many extremely unlikely events that could occur all the time, and it’s inevitable that some do now and again. So yes, the chances of Leicester winning the Premier League in 2015-16 were very small, but the chances of something like that happening to some team over the lifetime of the Premier League are nowhere near as small.
Anyway, returning to the Californian twins, I think the most reasonable interpretation of the event discussed in the article is this: ‘what’s the chance of a random pair of twins in the United States being born in different years?’. On that basis, we can set about answering the question. But before reading on, you might like to try to work out for yourself what that chance should be. Or at least, go with your intuition: is the one in 2 million assessment about right in this case, way too low or way too high?
|
|
|
|
|
|
|
|
|
Here’s one way to tackle the problem. We’ll need to make a few assumptions. The Trujillo twins were born 15 minutes apart. Some twins are born closer together than that, and others further apart. But to simplify things let’s assume all twins are born 15 minutes apart. There’s also a seasonality in the timing of births – some periods of the year have more births than others, and some times of the day have higher birth rates than others. Again, to keep things simple, let’s ignore this aspect, and assume that births are equally likely to occur at any time and on any day throughout the year. Then, consider the time at which the first child is born. Given that the second child is born 15 minutes later, for the 2 children to be born in different years the first birth must have occurred at some point in the final 15 minutes of the year.
But what proportion of the year does this represent? There are 24 hours in each of 365 days per year. So 15 minutes represents 1/4 of 1/24 of 1/365 of a year (ignoring the small effect due to leap years). That amounts to 1/35040 of a year. And if births occur at a constant rate throughout the year, this means the chance that the older child is born in this portion of the year is precisely 1/35040. It follows that, subject to the assumptions made, the chance that two twins are born in separate years is around one in thirty-five thousand. So, unlikely, but nothing like the ‘one in two million chance’ quoted on the Instagram page. In fact one in 2 million is more than 50 times less likely than the value we’ve calculated here. This means either the event being discussed is different from what I’ve assumed, or the calculation of the chance is badly wrong. Either way, it’s poor journalism, but typical of the way Statistics is used in the media – and elsewhere – to grab attention.
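For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It relies on the same simplifying assumptions as the text (a fixed 15-minute gap, a 365-day year, births spread evenly over the year), and the names used are just illustrative choices. It works out the chance as an exact fraction and compares it with the quoted one in 2 million figure.

```python
from fractions import Fraction

# Assumptions taken from the text: a 365-day year (leap years ignored)
# and every pair of twins born exactly 15 minutes apart.
MINUTES_PER_YEAR = 365 * 24 * 60
GAP_MINUTES = 15


def chance_of_split_years(gap_minutes=GAP_MINUTES):
    """Chance that the first twin arrives in the last `gap_minutes` of the year,
    assuming births are equally likely at any moment of the year."""
    return Fraction(gap_minutes, MINUTES_PER_YEAR)


p = chance_of_split_years()
quoted = Fraction(1, 2_000_000)  # the figure quoted in the Instagram post
ratio = p / quoted               # how many times larger our value is

print(f"Chance under these assumptions: 1 in {p.denominator}")
print(f"Ratio to the quoted figure: about {float(ratio):.0f}")
```

Changing `gap_minutes` shows how sensitive the answer is to the assumed gap between births: under this simple model, doubling the gap doubles the chance.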
As an aside, as the article points out, around 120,000 twins were born in the U.S. in 2019. Let’s assume the count for 2021 was similar, meaning 60,000 pairs of twins. Then each of those pairs had the same 1/35040 chance of being born in different years. So, on average, we’d expect around 60,000/35040, or 1.7, pairs to be born in different years. In other words, the ‘1 in 2 million chance event’ actually becomes something we’d expect to see in the U.S. around 1 or 2 times each year on average, and it would have been rather more surprising if there hadn’t been a pair of twins born somewhere there in different years at the end of 2021.
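The same kind of sketch can be used for this aside. It takes the 60,000 pairs and the 1/35040 chance from the text, and adds one further assumption that the text doesn't state: that each pair's birth time is independent of every other pair's. Under that assumption it also gives the chance that no pair at all straddles the new year, which puts a number on how surprising that would have been.

```python
# Figures from the text: about 60,000 pairs of twins per year in the U.S.,
# each with a 1/35040 chance of being born either side of midnight on 31 December.
PAIRS_PER_YEAR = 60_000
P_SPLIT = 1 / 35_040

# Expected number of pairs born in different years.
expected_pairs = PAIRS_PER_YEAR * P_SPLIT

# Extra assumption (not in the text): pairs are independent of one another,
# so the chance that none of them straddles the new year is (1 - p)^n.
p_no_pairs = (1 - P_SPLIT) ** PAIRS_PER_YEAR
p_at_least_one = 1 - p_no_pairs

print(f"Expected pairs born in different years: {expected_pairs:.1f}")
print(f"Chance of no such pair in a year: {p_no_pairs:.2f}")
print(f"Chance of at least one such pair: {p_at_least_one:.2f}")
```

On these assumptions, a year with no such pair is the less likely outcome, with a chance of a little under 1 in 5.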
We’ll discuss the relevance of the issues in this post to matters concerning Smartodds in the next post, but in the meantime (tongue in cheek):
If the chances of twins in the U.S. being born in separate years are one in 2 million, what are the chances of a pair of twins in the U.K. being born in separate millennia?