When not to use Statistics

by Stuart Coles | Sep 1, 2023 | Latest News

From a statistical point of view, a striking fact about the recent trial of Lucy Letby for the murder and attempted murder of multiple babies in her care is that neither the prosecution nor the defence presented statistical evidence to support its case. Given the circumstances, this is perhaps to be expected of the defence. But for the prosecution, it may seem surprising that the apparent implausibility of Letby being coincidentally involved in the care of so many babies who suffered catastrophic events wasn’t used as evidence of her guilt. I have no knowledge of what the prosecution’s reasoning was, but one possible explanation is a historical case in the Netherlands with a number of similarities.

In 2003 a Dutch court found Lucia de Berk, a paediatric nurse at the Juliana Children’s Hospital in Den Haag, guilty of the murder of four patients and the attempted murder of three others. A child had died while in her care, and a search through staffing records pointed to a total of ten suspicious incidents in which patients had either died or required resuscitation while de Berk was on duty. Police said the probability of such a sequence of events was one-in-7 billion. She was arrested on 13 December 2001 and labelled an “angel of death” in the newspapers.

During the trial, an expert witness for the prosecution recalculated the chance of de Berk being involved, purely by coincidence, when each of these events occurred as one-in-342 million. As a probability this is actually around twenty times larger than the one-in-7 billion originally stated by the police, but still such a small number as to imply almost certain guilt. And de Berk was subsequently convicted and sentenced to life in prison. She was not convicted of three other alleged murders, since there was insufficient medical evidence to support the allegations in those incidents.
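
To see how such tiny numbers can arise, consider a deliberately simplified sketch of this kind of calculation. The figures below are entirely hypothetical – they are not the data from the case – and the model simply assumes that every shift on a ward is equally likely to contain an incident:

```python
from math import comb

# Hypothetical, illustrative figures -- not the real case data.
total_shifts = 1000   # shifts on the ward over the period
nurse_shifts = 250    # shifts worked by one particular nurse
incidents = 9         # incidents, all of which fell on her shifts

# If incidents are equally likely to land on any shift, the chance that
# all of them land on this nurse's shifts is hypergeometric:
# C(nurse_shifts, incidents) / C(total_shifts, incidents).
p = comb(nurse_shifts, incidents) / comb(total_shifts, incidents)
print(f"one in {1 / p:,.0f}")  # roughly one in 290,000
```

Multiplying several probabilities of this size together – for incidents on different wards, say – quickly produces numbers in the one-in-hundreds-of-millions range. Whether the assumptions behind such a calculation are justified is exactly what was later disputed.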

However, on appeal, de Berk’s conviction was extended to those additional three cases, with the court applying a “chain evidencing” argument. The prosecution argued that in two of the cases there was strong medical evidence proving de Berk had poisoned her patients. Then, although equally strong evidence was absent in the remaining cases, an implausibility-of-coincidence argument – the “chain evidencing” – was used to imply her guilt in those cases too.

But this line of reasoning requires two things: first, that the medical evidence in the two cases is as strong as claimed; second, that the probability of such events happening by chance really is very small.

After several appeals, both aspects of this argument were shown to be faulty. First, the medical evidence for the two initial deaths was shown to be incorrect. In at least one of the cases, de Berk simply could not have been present at the time she was alleged to have poisoned the victim, and more detailed medical examination found that both deaths could easily have been due to natural causes. Second, an alternative analysis by the statisticians Richard D. Gill and Piet Groeneboom cast considerable doubt on the one-in-342 million chance of de Berk being involved in so many fatal or near-fatal incidents by coincidence. By one argument they suggest the number could actually be as large as one-in-seven, though they settle on one-in-26 as a conservative estimate. Either way, their analyses provide a completely different interpretation of the plausibility of de Berk’s innocence from the one-in-342 million chance suggested by the prosecution.

There are a number of reasons why Gill and Groeneboom’s numbers differed so dramatically from the prosecution’s. Fundamentally, they argued that the rate of fatal incidents experienced by nurses quite plausibly differs from one nurse to another. In statistical terminology, the rate of incidents is heterogeneous across nurses. They also presented a number of arguments for why this should be so. As an example, they say it is “folk knowledge” that terminally ill patients tend to die preferentially on the shifts of nurses with whom they feel more comfortable. Assuming that’s so, a more sympathetic nurse is likely to experience more deaths at work than a less sympathetic one.

Subsequently, Gill and Groeneboom suggested an alternative statistical model that allows for such differences between nurses. This leads to considerably different probability calculations from those based on the prosecution’s analysis. Although the average number of expected incidents remains the same, the variability around that number increases dramatically. As a consequence, the number of incidents coinciding with de Berk’s working pattern becomes far less implausible, even under the assumption that there was no foul play. In other words, once you accept the possibility that not all nurses are identical, it’s entirely plausible that any one nurse could have experienced as many patient deaths as de Berk did just by chance. This re-analysis of the statistical data, together with a weakening of the medical evidence, is what led to de Berk’s eventual acquittal on retrial.
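
This effect is easy to demonstrate with a minimal simulation. The sketch below uses made-up numbers rather than the data from the case, and a Gamma distribution to represent variation in rates between nurses – it illustrates the general idea, not Gill and Groeneboom’s actual model. Both models give each nurse the same average incident rate; only the spread differs:

```python
import numpy as np

rng = np.random.default_rng(42)
n_nurses, n_sims = 30, 200_000
mean_rate = 0.5   # average incidents per nurse; illustrative only
threshold = 7     # an "implausibly" high incident count

# Homogeneous model: every nurse has the same Poisson incident rate.
homog = rng.poisson(mean_rate, size=(n_sims, n_nurses))

# Heterogeneous model: rates vary between nurses (Gamma-distributed
# with the same mean of 0.5), making counts negative binomial overall.
rates = rng.gamma(shape=0.5, scale=1.0, size=(n_sims, n_nurses))
heterog = rng.poisson(rates)

# Probability that at least one nurse reaches the threshold by chance.
p_homog = (homog.max(axis=1) >= threshold).mean()
p_heterog = (heterog.max(axis=1) >= threshold).mean()
print(f"homogeneous:   {p_homog:.1e}")    # of the order of 1e-5
print(f"heterogeneous: {p_heterog:.1e}")  # a few in a hundred
```

Under these illustrative settings the heterogeneous probability comes out more than a thousand times larger than the homogeneous one, even though the expected number of incidents per nurse is identical in the two models.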

Now, I have no idea whether or not the eventual acquittal of de Berk, partly on the grounds of an unreliable initial statistical analysis, was part of the prosecution’s reasoning in the Letby trial for not presenting statistical evidence, but the principles it revealed may well have been. The prosecution case therefore relied on each count of murder and attempted murder being considered on its own merits, based on medical, forensic and circumstantial evidence, without any attempt to link the cases via statistical association. This strategy most likely means that each of those individual convictions is safer in legal terms, but it may also explain why Letby was found not guilty of a number of the other charges she faced. If a “chain evidencing” argument had been used by the prosecution, it might well have added weight to the evidence in those additional cases, perhaps to a sufficient level to have led the jury to find Letby guilty in those cases as well. The downside of such a strategy would have been that any potential weaknesses in the statistical analysis – as in the de Berk trial – would weaken the evidence for all of the charges.

Postscript: The Royal Statistical Society has recently written a report – “Healthcare Serial Killer or Coincidence?” – looking at precisely the statistical issues raised by the de Berk and Letby cases, among others. As they say in the report, events that one might think are implausible as coincidences happen all the time. It’s virtually impossible that you, as an individual, will win the lottery. Yet someone wins almost every week.
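
The arithmetic behind that observation is easy to check. For a 6-from-59 lottery, the chance that a single ticket wins the jackpot is one in C(59, 6), or about one in 45 million; but if tens of millions of tickets are sold – the sales figure below is hypothetical – a jackpot winner somewhere becomes close to a coin flip:

```python
from math import comb

p_ticket = 1 / comb(59, 6)    # one in 45,057,474 for a single ticket
tickets_sold = 30_000_000     # hypothetical number of tickets in a draw

# Probability that at least one of the tickets sold wins the jackpot.
p_someone = 1 - (1 - p_ticket) ** tickets_sold
print(f"P(your ticket wins): {p_ticket:.1e}")   # ~2.2e-08
print(f"P(someone wins):     {p_someone:.2f}")  # ~0.49
```

An event that is essentially impossible for any named individual is nonetheless routine across the population as a whole – precisely the distinction that matters when judging how surprising a cluster of incidents around one particular nurse really is.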

Stuart Coles

Author

I joined Smartodds in 2004, having previously been a lecturer in Statistics at universities in the UK and Italy. A famous quote about statistics is that “Statistics is the art of lying by means of figures”. In writing this blog I’m hoping to provide evidence that this is wrong.