Guess Who?

by | May 1, 2024 | Latest News

Guess Who? Part 1

A young boy grows up loving his local football club. He goes to as many matches as he can, home and away. In later life he builds a successful career for himself, which leads to a degree of financial security that affords him the opportunity to buy the struggling club he supports. The decision to do so wasn’t intended as a vanity project or for personal interest or ambition, but as a way of putting something back into the community and area he grew up in and loved.

At the same time, he cares strongly about the success of the club, and is knowledgeable about the potential for data-driven methods and Statistics to underpin both the football and wider management decisions at the club. Once the purchase of the club goes through, he sets about restructuring all aspects of the club so that decisions of every type are supported by evidence and backed by research.

This approach isn’t immediately successful. It’s one thing having a clear idea that the use of Statistics should be pivotal to club strategy, but another altogether getting the people involved in the club – players, managers, support staff – to share that vision. So following a run of respectable but far-from-spectacular results, the team manager is sacked and replaced with someone more amenable to implementing evidence-based methods.

Despite this change, the team still have a difficult season. Various statistical indicators suggest the team is still on a good trajectory, but the random nature of actual football matches means results still haven’t gone the team’s way. They’ve also been unlucky with long-term player absences. Models put the probability of relegation at just 17%, but the potential cost of relegation and the pressure from fans to take short-term decisions to avoid that scenario are difficult factors to ignore. The situation is made worse by one extremely poor derby performance and result, occurring during a longer sequence of bad results.

Resisting the pressure to make changes, the club owner stands firm in his approach, placing total faith in research-based statistical methods to support the actions taken by the club and insisting that those around him are in tune with this philosophy. That’s not to say problems are ignored. But they are kept in perspective, and solutions for improvement are found that are consistent with what the underlying data are projecting.

In the end relegation is avoided, while the philosophy that the owner has instilled within the club leads to stronger foundations for a potentially successful follow-up season.

Guess who?

|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

So, congratulations to Jason Stockwood, the owner of Grimsby Town, who have just secured another season of football in League Two, having been promoted from the National League in the season 2021-2022.

There are differences between Grimsby’s story and those of other more illustrious teams using data-driven strategies at the heart of their decision-making processes, but there are many parallels as well. What makes Jason and Grimsby’s story especially interesting – and the reason for this part of the post – is Jason’s openness in describing the journey he has taken with Grimsby through a series of articles written in the Guardian.

Here are a few examples:

Article 1: Jason explains his motivation for buying the club.

This investment seems like the perfect counterweight to the ownership model of the ‘Super League’ teams, a paradigmatic example of distorted capitalism where profit motives and dividend payments are regarded as the only measure of success. Football teams are viewed in high definition around the world but have become untethered from the communities that made them what they are today.

Article 2: Jason describes the data-led philosophy he adopted as owner of Grimsby and explains his motives for changing manager. This article is full of quotes that could equally have been made by other owners of clubs that leverage Statistics to inform their strategies. Here’s just one:

What this doesn’t mean is abstracted, emotionless decision‑making seen through the lens of a spreadsheet, a deadpan, robotic calculus devoid of intuition or sentiment. It is the commitment to having as much objective input as possible, an attempt to balance long‑term objectives with near‑term realties. In any game of chance it is wise to increase your probability of success with analysis and information. Football is clearly both an art and a science.

Article 3: Jason explains the turmoil of this past season, and why it helped to rely on Statistics to keep emotional aspects of the business under some sort of control.

While individual games are frustrating, disappointing and annoying it is important to have a model that looks at the aggregate of our own performances and those of others to ensure we are making balanced decisions amid the cacophony of our own opinions and emotions.

They’re a great read, especially if you have an interest in the strengths and limits of Statistics as a philosophy to underpin football management.

Guess Who? Part 2

 

 

 

As you probably know, Guess Who? is a 2-player kids’ game. Both players have an identical set of 24 cards, each showing a named character with a number of characteristics. For example, Daniel in the picture has light-brown skin, wears a hat and has a moustache (among other things). An online version of the game is available here.

You each choose one of the characters – you’ve chosen Daniel in the picture – and you each have to guess your opponent’s chosen character, which is hidden from you. To do so you take turns asking a series of questions with yes/no answers, whittling possibilities down on the basis of the received answer. For example, you might ask: “is your person a female?”. Or: “have they got long hair?”. And so on. The winner is the first player to correctly guess the name of their opponent’s chosen character.

The physical structure of the game is designed so that you can flip down the eliminated characters. So, with your choice of Daniel, if you were asked if your character was female, you’d answer ‘no’ and your opponent could eliminate all of the remaining characters that were female by flipping the relevant cards down.

Question:

What strategy should you adopt to play ‘Guess Who?’ optimally?

I’ll come back here in a couple of weeks to add some discussion about this issue.

Optimal Play

So, how should you play ‘Guess Who?’ optimally?

There’s an easy answer to this problem and a much more difficult one.

The easy one assumes that you wish to minimise the average number of guesses it will take to identify the opponent’s hidden character. In this case, suppose there are n characters still available to choose from and you ask a question that separates the remaining sample into groups of size k and n – k. For example, there might be 10 characters left to choose from, of which 6 are male and 4 are female. In this case, n = 10 and the question ‘is the character female’ separates these 10 into groups of k = 6 and n – k = 4.

Now, after asking this question, you will either eliminate k = 6 characters – the males – if your opponent responds ‘yes’, or n – k = 4 characters – the females – if your opponent replies ‘no’. But assuming random selection of characters, so that your opponent has no preference towards having chosen a male or female character, the probability that your opponent’s character is male is k/n, since there are n equally likely alternatives, of which k are male. Similarly, the probability it’s a female is (n – k)/n.

It follows that the expected, or average, number of cards remaining after you ask this question is

k × k/n  + (n-k) × (n-k) / n  

It’s then a very simple piece of mathematics to show that this is minimised when k = n/2. In other words, the expected number of remaining characters is minimised by asking a question which separates the remaining characters into two groups of equal size. In the above example, the question ‘is the character male’ is not optimal, since it creates a 60/40 split. Had there been 5 white characters and 5 black, the question ‘is the character black?’ would have been optimal.
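For anyone who wants that step spelled out, one way to see it is to complete the square. This is only a sketch, and it treats k as a continuous quantity, which is a simplifying assumption; E(k) is just a label introduced here for the expected number of remaining characters.

```latex
\begin{align*}
E(k) &= k \cdot \frac{k}{n} + (n-k) \cdot \frac{n-k}{n}
      = \frac{2k^2 - 2nk + n^2}{n} \\
     &= \frac{2}{n}\left(k - \frac{n}{2}\right)^2 + \frac{n}{2}
\end{align*}
```

The squared term can never be negative, so E(k) is smallest when k = n/2, where it equals n/2. With n = 10, the 60/40 split gives E(6) = 5.2, while the even split gives E(5) = 5.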

Of course, this strategy can’t always be applied: if n is an odd number, for example, it’s impossible to divide the remaining characters into equal-sized groups. In this case a division as close to 50-50 as possible is the best you can do. Moreover, regardless of whether n is odd or even, there may be no single characteristic that leads to a 50-50 split. In this case, one should get as close to 50-50 as possible; or better still, use combinations of characteristics in the question. For example, with 10 characters remaining it might be that exactly 5 are either male or wear glasses. So you could ask ‘is your character male or wearing glasses?’, which would be exactly optimal.
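To make the idea of combining characteristics concrete, here is a minimal Python sketch. The ten characters, their names and their traits are all invented for illustration (they are not the real game's cards), and the helper names are hypothetical. It scores every single-trait question and every 'A or B' question by the expected number of remaining characters, and picks the lowest.

```python
from itertools import combinations

# Hypothetical remaining characters: names and traits are made up.
# 4 are male, 2 wear glasses, 1 wears a hat.
CHARACTERS = {
    "Al": {"male", "hat"},
    "Ben": {"male"},
    "Carl": {"male"},
    "Dev": {"male", "glasses"},
    "Eve": {"glasses"},
    "Fay": set(),
    "Gia": set(),
    "Hana": set(),
    "Ines": set(),
    "Jo": set(),
}


def expected_remaining(n, k):
    """Expected number of characters left after a k / (n - k) split."""
    return (k * k + (n - k) * (n - k)) / n


def candidate_questions(traits):
    """Yield single-trait questions, then 'A or B' questions."""
    for trait in traits:
        yield (trait,)
    for pair in combinations(traits, 2):
        yield pair


def best_question(characters):
    """Return the question (a tuple of traits joined by 'or') that
    minimises the expected number of remaining characters."""
    n = len(characters)
    traits = sorted(set().union(*characters.values()))

    def yes_count(question):
        # A character answers 'yes' if it has at least one listed trait.
        return sum(1 for owned in characters.values() if owned & set(question))

    return min(
        candidate_questions(traits),
        key=lambda q: expected_remaining(n, yes_count(q)),
    )


if __name__ == "__main__":
    print(best_question(CHARACTERS))
```

In this made-up set, no single trait splits the ten characters 5-5, but 'male or wearing glasses' does, so that is the question the sketch selects.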

The more difficult solution recognises that minimising the number of rounds required to guess correctly isn’t an optimal strategy for winning the game. Suppose your opponent has been lucky and is left with just a couple of characters to choose from, while you still have many. In that case your only chance of winning is to adopt a high-risk strategy, asking a question which will probably leave you with many characters but has a small chance of allowing you to catch your opponent. This problem is analysed in an academic paper, “Optimal strategy in ‘Guess Who?’: beyond binary search”.

The mathematics is not for the faint-hearted. The conclusions, though, are simple enough. The player that goes first, or subsequently whichever player is winning, should adopt the 50-50 strategy described above; the player going second, or whichever player is currently losing, should adopt a riskier strategy, asking a question with a 60-40 or 80-20 split, say, with the level of risk depending on how much they are losing by.

If you can struggle through the maths in the paper, the optimal strategy is defined more precisely. If both players play this way, the probability that the player going first wins is around 0.65. Actually, the article goes a bit further than this, and proves that however many characters the players start with – rather than the 24 in the actual game – the win probability of the player who goes first if both players play optimally will always be between 5/8 and 2/3.
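The idea that a trailing player should take risks can be illustrated with a toy model. The sketch below is my own simplification, not necessarily the model used in the paper, so its numbers should not be expected to match the paper's. The assumed rules are: a player holding exactly 1 candidate on their turn names it and wins; otherwise they ask a question matching any k of their n candidates (every split is assumed to be available), and play passes to the opponent. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def win_prob(n, m):
    """Probability that the player about to move wins under the toy rules,
    holding n candidates while the opponent holds m."""
    if n == 1:
        # Only one candidate left: name it and win.
        return Fraction(1)
    return max(split_value(n, m, k) for k in range(1, n))


def split_value(n, m, k):
    """Win probability for the mover if they ask a k / (n - k) question."""
    p_yes = Fraction(k, n)
    # After the answer, the opponent moves; the mover wins when they lose.
    return (p_yes * (1 - win_prob(m, k))
            + (1 - p_yes) * (1 - win_prob(m, n - k)))


def best_splits(n, m):
    """All values of k that achieve the optimal win probability."""
    best = win_prob(n, m)
    return [k for k in range(1, n) if split_value(n, m, k) == best]


if __name__ == "__main__":
    # Trailing player: 4 candidates left against an opponent with 2.
    print(win_prob(4, 2), best_splits(4, 2))
    # Both players starting with 24 candidates.
    print(float(win_prob(24, 24)))
```

In this toy model, a player with 4 candidates facing an opponent with 2 is certain to lose after an even 2-2 split, whereas the lopsided 1-3 question wins one time in four. That is the 'bold play when behind' effect in miniature.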

You might think, quite reasonably, that this is a pointlessly complicated piece of mathematics to solve the problem of how to play a game that is designed to be played by children as young as 6. The authors argue, however, that there are many real-world parallels of the game where it is necessary to design strategies that optimise some kind of risk-reward balance. The example they give is the race in the 1960s and 70s between the US and the Soviet Union to get manned spacecraft on the moon. Once the US had successfully completed the Apollo 8 mission, the Soviet Union took high risks in their own programme in order to try to catch up before the US were due to launch Apollo 11. Maybe this is stretching things a bit, and it would be easier to come up with examples from a sporting context where managers and players have to decide between riskier and safer strategies depending on what their actual objectives are.

Stuart Coles

Author

I joined Smartodds in 2004, having previously been a lecturer of Statistics in universities in the UK and Italy. A famous quote about statistics is that “Statistics is the art of lying by means of figures”. In writing this blog I’m hoping to provide evidence that this is wrong.