Congratulations to Lewis Hamilton on his 5th world championship. He now equals the number of championship wins by Juan Manuel Fangio, but remains behind Michael Schumacher, who won 7.
I hadn’t planned to do a post on this, but got hooked by this article in the Guardian, which I recommend you read. It’s a kind of celebration of Lewis Hamilton’s achievement, but it’s also a critique of the way statistics are used when assessing performance in sports, summarised by this excerpt:
So, statistics are fun but they do not tell the whole story
The specific point being made by the author is that while you can use statistics to compare win rates and other measures of performance of racers from one era with those of another, the statistics themselves don’t take any account of changes in circumstances. In the case of Formula 1, that includes huge changes in levels of safety standards, as well as extraordinary technological improvements in the cars themselves. So, is Lewis Hamilton a better driver than either Michael Schumacher or Juan Manuel Fangio? And who was the better of those two? You can make an argument based on most statistics for any of them, but that simple approach fails to take the development of the sport into account. As the Guardian article explains about the statistics:
They do not describe the conditions in which Fangio raced, in death-trap cars on circuits lined with trees, ditches and houses, wearing highly flammable cotton shirts and trousers and eggshell helmets made of layers of linen soaked in shellac.
Similar arguments apply to other sports as well: Maradona or Lionel Messi?; Rod Laver or Roger Federer?; Jack Nicklaus or Tiger Woods? It’s easy to compare the statistics and pick a winner, but as with Formula 1, the statistics don’t take account of changes in circumstance, which can be massive in some cases.
Anyway, the point applies equally well to the data that go into our models. They are just that: data. Once reduced to a number all context disappears (other than the context that’s contained in other data). And though, over many fixtures/races you might hope that the variations in context balance out, so that it’s reasonable to rely on models that are driven entirely by data, that won’t always be the case.