A Terrible Sense of Humour

by Stuart Coles | Feb 1, 2023 | Latest News

Using Statistics to Enhance Sports Betting: A Football (Soccer) Perspective

Sports betting has become increasingly popular in recent years, with football (soccer) being one of the most widely-bet-on sports globally. However, many sports bettors don’t realize the significance of using statistics in their betting strategy. In this post, we’ll take a closer look at how statistics can be used to enhance sports betting, with a particular focus on football (soccer).

Statistics play a crucial role in sports betting as they allow bettors to make informed decisions based on past performance and trends. For example, in football (soccer), statistics can be used to track a team’s win-loss record, goals scored, and goals conceded. This information can then be used to predict the outcome of future matches and make informed betting decisions.

One of the most useful statistics for sports betting is the concept of expected goals (xG). Expected goals is a statistical model that uses data from past matches to estimate the probability of a goal being scored from a particular action (e.g. a shot). By comparing a team’s actual goals scored to their expected goals, bettors can gain insight into a team’s true performance level and make more accurate predictions.

Another important statistic for sports betting is the concept of head-to-head records. Head-to-head records show the past performance of teams when they have faced each other. By analyzing head-to-head records, bettors can identify trends and patterns that can be used to predict the outcome of future matches.

Conclusion: Statistics play a crucial role in sports betting, especially in football (soccer), where they can be used to track past performance, identify trends, and make informed decisions. As a provider of statistical services to clients who wish to gamble on sports, especially football, we are dedicated to helping our clients make the most of these mathematical tools to improve their chances of winning.

Ok, confession, in case you thought the quality of this blog had suddenly improved: I didn’t write any of that. In a previous post I mentioned that Benoit and Gavin both attempted to use ChatGPT to solve a puzzle I’d posted. So I thought I might go one better and ask ChatGPT to write a blog post for me. Specifically, I wrote:

I work for a company providing statistical services to clients who wish to gamble on sports, especially football (soccer). I also manage a statistics blog – that aims to be interesting, topical and entertaining – for the company employees and clients. Please write a post that I can include in the blog.

And what’s written above is the ChatGPT reply. I was pretty impressed, so I took the conversation a little further:

Tell me a bit more about xG

Here’s the reply…

Expected Goals (xG) is a statistical model that uses data from past matches to estimate the probability of a goal being scored from a particular action, such as a shot or a pass. It is based on the idea that not all shots are created equal, and that some shots are more likely to result in a goal than others.

The xG value for a particular shot is determined by analyzing a number of factors, such as the location of the shot, the angle of the shot, and the type of shot (e.g. header, volley, etc.). These factors are then used to calculate the likelihood of the shot resulting in a goal.

The xG value is usually expressed as a decimal, with 1.0 representing a 100% chance of a goal, and 0.0 representing a 0% chance of a goal. For example, if a team takes 10 shots in a match and the xG value for those shots is 6.0, it means that based on the quality of the shots, the team would have been expected to score 6 goals.

xG is useful for sports betting because it can help to identify teams that are underperforming or overperforming based on the quality of their shots. If a team has a high xG but is not scoring many goals, it may be an indication that the team is unlucky and due for an upturn in form. On the other hand, if a team has a low xG but is scoring many goals, it may be an indication that the team is overperforming and due for a downturn in form.

xG is also useful for analyzing a team’s performance over a period of time. By comparing a team’s xG to their actual goals scored, one can get an idea of whether a team is performing better or worse than expected. Additionally, by comparing the xG of different teams, one can get an idea of which teams are creating better quality chances than others.

I’m honestly not sure I could have written a better short summary about xG than that myself, though there is a numerical error in the explanation, which you can probably spot.

Remarkably, ChatGPT can do more than just chat, and, er, write blog posts. It can also write code. First I asked it to write a football prediction model in R:

Write code in R for a soccer prediction model

I’d be happy to help you write code for a soccer prediction model in R. However, keep in mind that the specific details of the model will depend on the data you have and the problem you are trying to solve. Here is an example of how you might use R to build a simple goal difference model for soccer predictions:

It’s basic, but:

It works;
It does what was asked;
The variables are sensibly named to make the code clear;
The code came with various disclaimers and suggestions for extensions that I’ve omitted;
The whole thing took about 3 seconds.

I find all that pretty remarkable, so again I tried to take things a little further:

Write a dynamic version of the Dixon-Coles soccer model in R:

It gave me this solution…

… together with this explanation:

This code creates a dataframe with match results, including the match date. It then creates a new column for match outcomes (home win, draw, away win). The glmer function from the lme4 package is then used to fit the dynamic model, with match date included as a random effect. The summary of the model can be viewed by calling the summary function. Please note that this is a simplified version of the Dixon-Coles model and you may need to consider other factors like home advantage, goal scoring rate, etc. Additionally, this is a model with a small sample size and may not be robust enough for practical use.

Now, to be honest, you might say that the core of the Dixon-Coles model is a Poisson distribution for home and away goals, which ChatGPT’s solution doesn’t contain. One might also quibble that this model is based on just 3 teams playing matches on 5 dates. Nonetheless, I didn’t tell ChatGPT that the Coles-Dixon model had anything to do with football. Moreover, the use of random effects for match dates to handle the dynamic element isn’t completely stupid. So though its a basic solution that doesn’t contain important elements of the Coles-Dixon model, ChatGPT clearly knows what the Dixon Coles model is, understands the issues of dynamic modelling and how to tackle them, knows how to formulate all this in R and can do it all in the space of a few seconds. With better prompting, it may also have produced a version of the model that was closer to being properly functional.

So, ChatGPT can be used to explain statistical techniques in a sporting context and can also provide code for fitting basic models. That’s one motivation for writing this post. But there’s another: Statistics is actually fundamental to the internal workings of the ChatGPT algorithm. Let’s see how…

Explain in detail how statistical methodology is relevant to chatGPT.

There are several ways in which statistical methodology is relevant to ChatGPT.

Data Preprocessing: Before training a language model like ChatGPT, a large corpus of text data is collected and preprocessed. This includes cleaning the data, tokenizing it, and converting it into a numerical format that the model can understand. Statistical methods are used to analyze the data, such as finding the frequency of words and the distribution of word lengths.
Language Modeling: ChatGPT is a language model, which means that it uses statistical methods to predict the next word in a sentence given the previous words. This is typically done using a variant of the transformer architecture, which uses a technique called “attention” to weigh the importance of different words in the input when making predictions.
Fine-Tuning: ChatGPT is pre-trained on a large corpus of text data, but it can also be fine-tuned on a smaller dataset specific to a certain task. This is done by adjusting the model’s parameters using techniques such as gradient descent, which is a statistical optimization method.
Evaluation: Once the model is trained, statistical methods are used to evaluate its performance. This includes metrics such as perplexity and BLEU score which are commonly used to evaluate the quality of generated text.
Inference: During the inference time, the model uses the statistical probability distribution of the words in the vocabulary, learned during the training, to predict the next word. The word with the highest probability is chosen as the output.

In summary, statistical methodology is crucial for the development and training of ChatGPT and other language models, from data preprocessing, to modeling, fine-tuning, evaluation and inference, it is used throughout the whole process.

Amazing. ChatGPT will write a blog post for me, explain concepts like xG that are relevant for sports analytics, write code for football models in R, including a dynamic version of the Dixon Coles model, and also explain how Statistics is integral to its own functioning. I might never need to write another blog post myself again.

It has a terrible sense of humour though:

Give me a joke about statistics.

Why did the statistician go to the beach? To estimate the mean sunburn time.

Stuart Coles

Author

I joined Smartodds in 2004, having previously been a lecturer of Statistics in universities in the UK and Italy. A famous quote about statistics is that “Statistics is the art of lying by means of figures”. In writing this blog I’m hoping to provide evidence that this is wrong.

← Previous Article Next Article →