Bayer Leverkusen’s Unbeaten Season: Bayesian Skepticism in Football


May 18, 2024. BayArena, Leverkusen. Bayer 04 Leverkusen completes the German football season with a 2-1 win over Augsburg, finishing the Bundesliga campaign with a 28-6-0 record (28 wins, 6 draws, no losses). Unbeaten. The first team in Bundesliga history to win the league without a single loss across 34 matches. The crowd has been celebrating since April. The retrospective narratives in mainstream coverage will focus on Xabi Alonso’s tactical brilliance, Florian Wirtz’s individual genius, and the team’s remarkable resilience in late-game situations. The xG ledger, sitting in StatsBomb’s data feed, tells a more layered story. Leverkusen’s underlying xG profile suggested they were one of the three or four best teams in Europe that season. The unbeaten run, on top of that, was an extreme positive tail outcome: a season in which the actual results substantially exceeded what the same underlying performance would have produced on average. The team was excellent. The unbeaten record was also, in a real and measurable sense, lucky. Bayesian skepticism in football analytics is the framework for holding both claims at once.

Bayer Leverkusen’s 2023-24 season is one of the cleaner case studies in Bayesian thinking applied to football coverage. The team played at an elite level. The team also produced an outcome that performance-based models rate as far less likely than its underlying quality alone would suggest. Holding both (the genuine excellence and the variance that turned excellence into history) is what Bayesian analytical writing tries to do. The alternative is the news cycle’s preferred frame: “they were destined to do this,” or “the season was a fluke.” Neither is correct. The Bayesian frame says both contain pieces of truth, and the writer’s job is to keep both in view simultaneously.

I have been writing about football analytics from London since 2014, and the analytical posture that has most shaped my reading of unusual seasons is the one this article is about: Bayesian skepticism in football. What it means, how it applies to extraordinary seasons like Leverkusen’s unbeaten run, where the framework breaks, and how to hold both excellence and luck in a single piece of writing are the subjects that follow.

The origin: where Bayesian thinking entered football

Bayesian statistics as a formal framework dates to Thomas Bayes’ 18th-century work. It was applied widely through 20th-century probability theory and became embedded in modern analytics through the work of researchers like Nate Silver (FiveThirtyEight), Aaron Brown (financial markets), and the broader probabilistic-modeling community. The football application emerged primarily in the 2010s as analysts began applying probabilistic updating to xG, possession value, and team-quality estimation.

The core Bayesian insight is that we should update our beliefs about a team’s true quality based on the evidence we observe, while accounting for our prior expectations and the inherent variance in any single result. A team that wins 28 of 34 matches has produced evidence about its quality. The Bayesian framework asks how much we should update from our prior expectation given that evidence, accounting for how much the result could be explained by variance versus by genuine quality.
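One way to make that update concrete is a Beta-Binomial model of a team’s per-match win probability. This is a minimal sketch, not a production rating model: the prior Beta(12, 8) is an assumed stand-in for “probably wins about 60% of its matches, held with the weight of 20 matches,” and draws are lumped in with losses as non-wins.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about a per-match win probability, stored as Beta(alpha, beta)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, wins: int, non_wins: int) -> "BetaBelief":
        # Conjugate update: add observed wins and non-wins to the pseudo-counts.
        return BetaBelief(self.alpha + wins, self.beta + non_wins)


# ASSUMPTION: illustrative prior, mean 0.60 with the weight of 20 matches.
prior = BetaBelief(alpha=12.0, beta=8.0)

# Evidence from the article: 28 wins in 34 matches (6 draws counted as non-wins).
posterior = prior.update(wins=28, non_wins=6)
raw_rate = 28 / 34

print(f"prior mean     {prior.mean:.3f}")
print(f"raw win rate   {raw_rate:.3f}")
print(f"posterior mean {posterior.mean:.3f}")
```

With these assumed numbers the posterior win rate lands between the prior and the raw 28-from-34 rate. A heavier prior (larger alpha and beta) would pull it further back toward 0.60; a lighter one would let the season’s results dominate.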

The Bayesian application to specific extraordinary seasons emerged through the work of writers like Mark Carey at The Athletic, Jonathan Liew at The Guardian, and various analyst-bloggers. The framework has been particularly useful for evaluating unbeaten runs, miraculous comebacks, and other extreme tail outcomes that the cycle treats as “destiny” but that the underlying data treats as combinations of quality and variance.

How Bayesian skepticism in football works

The basic mechanic asks: given the underlying performance metrics (xG, possession value, defensive solidity) and the actual outcome (wins, losses, draws), how much should we update our belief about the team’s true quality?

For Leverkusen 2023-24 specifically: the team’s underlying xG profile suggested they were one of the three or four best teams in Europe. By a strictly performance-based projection, that profile would produce, in a 34-match Bundesliga season, an expected win-draw-loss record in the 24-8-2 range. The actual outcome (28-6-0) was about four wins and ten points above that projection (90 points against 80), with the unbeaten-streak component being the most extreme departure.
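A common way to turn an xG profile into an expected record is to treat each side’s goals as independent Poisson counts. The sketch below does that under stated assumptions: the per-match xG figures (2.0 for, 0.9 against) are illustrative placeholders, not Leverkusen’s actual numbers, and the independent-Poisson model is a simplification that real projection systems refine.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals are Poisson with the given rate."""
    return exp(-rate) * rate**k / factorial(k)


def match_probs(xg_for: float, xg_against: float, max_goals: int = 10):
    """Win/draw/loss probabilities from per-match xG, assuming independent Poisson goals."""
    win = draw = loss = 0.0
    for goals_for in range(max_goals + 1):
        for goals_against in range(max_goals + 1):
            p = poisson_pmf(goals_for, xg_for) * poisson_pmf(goals_against, xg_against)
            if goals_for > goals_against:
                win += p
            elif goals_for == goals_against:
                draw += p
            else:
                loss += p
    return win, draw, loss


def expected_record(xg_for: float, xg_against: float, matches: int = 34):
    """Expected wins, draws and losses over a season of identical matches."""
    win, draw, loss = match_probs(xg_for, xg_against)
    return matches * win, matches * draw, matches * loss


def points(wins: float, draws: float, losses: float = 0.0) -> float:
    """League points: three for a win, one for a draw."""
    return 3 * wins + draws


# ASSUMPTION: hypothetical per-match xG, chosen only to illustrate the mechanics.
wins, draws, losses = expected_record(xg_for=2.0, xg_against=0.9)
print(f"expected record: {wins:.1f}-{draws:.1f}-{losses:.1f}")

# Gap between the article's projected record and the actual one, in points.
print(points(28, 6, 0) - points(24, 8, 2))
```

The last line is the part that uses the article’s own figures: 28-6-0 is worth 90 points, 24-8-2 is worth 80, and the ten-point gap is the portion of the season that the performance-based projection does not explain.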

The Bayesian update on Leverkusen’s true quality, given that outcome, is: yes, the team was elite. But the unbeaten record itself came from running positive on close matches, one to three of which they would have lost in expectation. The team’s genuine quality justified the league title and the deep cup runs at home and in Europe. The exact unbeaten record was a tail outcome that the Bayesian framework treats as variance riding on quality, not as evidence of even higher quality than the xG profile suggested.
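How much of a tail is “unbeaten”? A back-of-envelope sketch: if a team is expected to lose k of its 34 matches, and we assume (as a simplification) that every match carries the same independent loss probability k/34, then the chance of going a full season without defeat is (1 - k/34)^34.

```python
def unbeaten_probability(expected_losses: float, matches: int = 34) -> float:
    """Chance of zero losses in a season.

    ASSUMPTION: every match has the same, independent loss probability,
    equal to expected_losses / matches. Real fixtures vary in difficulty.
    """
    loss_prob = expected_losses / matches
    return (1.0 - loss_prob) ** matches


# The article's range: one to three expected losses over 34 matches.
for k in (1, 2, 3):
    print(f"{k} expected losses -> {unbeaten_probability(k):.1%} chance of an unbeaten season")
```

Under that assumption the unbeaten season comes out at roughly 36%, 13% and 4% for one, two and three expected losses. Unlikely, but not a miracle, which is the sense in which the record was a tail outcome riding on real quality.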

The critical component: the prior matters

The single most important conceptual feature of Bayesian thinking in football is that your starting expectation (the prior) shapes how strongly you update on the evidence. An exceptional season from a team that, in August, carried a top-five-in-Europe prior (Manchester City, Real Madrid, etc.) is less surprising than the same season from a team that started with a top-20 prior.

Leverkusen’s 2023-24 prior, in August, was probably “top-eight in Europe, top-three in the Bundesliga.” The xG performance during the season pushed them toward “top-five in Europe, clear best in the Bundesliga.” The unbeaten record was the variance on top of that updated estimate. The Bayesian frame separates the genuine update (the xG-driven re-estimation of their quality) from the variance-driven tail (the literal unbeaten streak).
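The effect of the prior can be sketched with a normal-normal update on a single quality number, here xG difference per match. Every figure below is hypothetical and chosen only to show the mechanics: the same season of evidence produces different posteriors depending on where the preseason belief started.

```python
def posterior_normal(prior_mean: float, prior_sd: float,
                     obs_mean: float, obs_sd: float):
    """Combine a normal prior with a normal observation (precision weighting).

    Returns the posterior mean and standard deviation.
    """
    prior_precision = 1.0 / prior_sd**2
    obs_precision = 1.0 / obs_sd**2
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs_mean)
    return post_mean, post_var**0.5


# ASSUMPTION: hypothetical season evidence, +1.1 xG difference per match.
season_xgd, season_sd = 1.1, 0.3

# ASSUMPTION: two hypothetical August priors with equal confidence.
strong_prior = posterior_normal(prior_mean=0.6, prior_sd=0.3,
                                obs_mean=season_xgd, obs_sd=season_sd)
midtable_prior = posterior_normal(prior_mean=0.0, prior_sd=0.3,
                                  obs_mean=season_xgd, obs_sd=season_sd)

print(f"strong prior    -> posterior {strong_prior[0]:.2f}")
print(f"mid-table prior -> posterior {midtable_prior[0]:.2f}")
```

Both posteriors move toward the season’s evidence, but the team that started with the stronger prior ends up rated higher on identical data. Note that the update runs on the performance metric, not on the win-loss record.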

Extraordinary outcomes in football combine genuine excellence with measurable variance. Bayesian skepticism is the framework for holding both halves of the explanation together.

Bayesian vs alternative interpretations: a comparison

| Frame | How it explains Leverkusen 2023-24 | What it gets right | What it misses |
| --- | --- | --- | --- |
| Destiny narrative | “They were destined to win” | Captures the team’s genuine quality | Treats variance as if it were skill; predicts continued dominance that doesn’t materialize |
| Fluke narrative | “They got lucky” | Captures that the unbeaten record exceeded expectation | Underrates the genuine elite-level performance |
| Tactical narrative | “Xabi Alonso’s genius made it inevitable” | Captures the coaching quality | Conflates manager-system effects with team identity |
| Bayesian frame | “Elite quality + favorable variance” | Holds both pieces together | Less narratively satisfying than the alternatives |

The Bayesian frame is harder to write compellingly. The cycle’s alternatives — destiny or fluke — produce cleaner narratives. The Bayesian version requires holding nuance through 1,500 words of prose. The trade-off is between narrative clarity and analytical accuracy.

What the data needs

Bayesian football writing requires underlying performance metrics (xG, possession value, defensive xG against), outcome data, and prior expectations based on roster, manager, and historical context. The standard sources — StatsBomb, FBref, Opta — provide the performance data. The prior expectations come from preseason projections (which themselves are Bayesian-flavored, drawing on returning production, transfer activity, manager history).

Building the analysis

  1. Establish the preseason prior. What was the team expected to do based on roster and manager history?
  2. Pull the season’s underlying performance metrics. xG, possession value, defensive solidity, set-piece performance.
  3. Compare actual outcomes to expected outcomes. The gap is the variance component.
  4. Update the quality estimate based on the underlying performance, not the variance.
  5. Hold both pieces — quality and variance — in the writing.
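The five steps above can be sketched as one small function working in points per game. The expected and actual points come from the records quoted in this article (24-8-2 and 28-6-0); the preseason prior of 68 points and the prior weight of 17 matches are assumptions made up for illustration, and the blending rule is a simple weighted average rather than any particular published model.

```python
from dataclasses import dataclass


@dataclass
class SeasonReview:
    prior_ppg: float        # step 1: preseason expectation
    performance_ppg: float  # step 2: what the underlying metrics projected
    actual_ppg: float       # what actually happened
    variance_points: float  # step 3: actual minus expected
    updated_ppg: float      # step 4: quality estimate after the season


def review_season(prior_points: float, expected_points: float,
                  actual_points: float, matches: int = 34,
                  prior_weight: float = 17.0) -> SeasonReview:
    """Separate the quality update from the variance component of a season."""
    prior_ppg = prior_points / matches
    performance_ppg = expected_points / matches
    actual_ppg = actual_points / matches

    # Step 3: the gap between outcome and underlying performance.
    variance_points = actual_points - expected_points

    # Step 4: update on underlying performance only, never on the raw outcome.
    updated_ppg = ((prior_weight * prior_ppg + matches * performance_ppg)
                   / (prior_weight + matches))

    return SeasonReview(prior_ppg, performance_ppg, actual_ppg,
                        variance_points, updated_ppg)


# ASSUMPTION: prior_points=68 is a hypothetical preseason projection.
leverkusen = review_season(prior_points=68, expected_points=80, actual_points=90)

# Step 5: report both pieces side by side.
print(f"updated quality: {leverkusen.updated_ppg:.2f} points per game")
print(f"variance component: {leverkusen.variance_points:+.0f} points")
```

The design choice that matters is in step 4: the updated estimate never sees the 90 points, only the 80 that the performance metrics supported. The remaining ten are reported separately as variance.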

Where this gets weird: common mistakes

The hindsight inflation trap. After a great season, it’s easy to retrospectively inflate the prior — “we always knew they’d do this.” Reading back the preseason coverage usually shows the prior was much lower.

Single-season conclusions. An extraordinary single season is one data point. The next season’s data updates the picture further. Leverkusen 2024-25 has, as of late season, produced numbers more consistent with the underlying quality than with the unbeaten outlier.

Variance vs quality conflation. The cycle wants to attribute extraordinary outcomes to extraordinary quality. The Bayesian frame says some outcomes are quality and some are variance. Failing to separate them is the most common analytical error in retrospective coverage.

Bayesian math without contextual grounding. Pure probability calculations can produce conclusions that ignore football-specific factors (injury history, scheme matchups, motivation). The framework works best when grounded in football context, not just probability math.

When Bayesian skepticism shines

Retrospective season writing. A team that has just produced an extraordinary season benefits from Bayesian framing that acknowledges both quality and variance. The framework produces writing that ages better than the destiny-or-fluke alternatives.

Predictive writing. A team coming off an extraordinary season should be projected based on their underlying quality, not their outcome. The Bayesian frame tends to produce more accurate next-season projections than either the conventional “they’ll do it again” or a blanket “they’ll regress to the mean.”

Player evaluation. A player with one extraordinary season and three solid seasons is better evaluated through Bayesian aggregation than by either his peak or his average alone.
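One simple form of Bayesian aggregation for a scorer is a Gamma-Poisson model of goals per 90 minutes. The sketch below uses an entirely hypothetical player (one 22-goal season and three seasons of 10 to 12) and an assumed prior of 0.30 goals per 90 held with the weight of ten full matches.

```python
def gamma_poisson_rate(goals_by_season, nineties_by_season,
                       prior_goals: float = 3.0, prior_nineties: float = 10.0) -> float:
    """Posterior mean scoring rate per 90 minutes.

    ASSUMPTION: prior is Gamma(prior_goals, prior_nineties), i.e. a prior rate of
    prior_goals / prior_nineties backed by prior_nineties matches of pseudo-data.
    """
    total_goals = sum(goals_by_season)
    total_nineties = sum(nineties_by_season)
    return (prior_goals + total_goals) / (prior_nineties + total_nineties)


# ASSUMPTION: hypothetical player, one outlier season followed by three solid ones.
goals = [22, 11, 10, 12]
nineties = [30, 30, 30, 30]

peak_rate = max(g / n for g, n in zip(goals, nineties))
raw_rate = sum(goals) / sum(nineties)
posterior_rate = gamma_poisson_rate(goals, nineties)

print(f"peak season  {peak_rate:.2f} per 90")
print(f"raw career   {raw_rate:.2f} per 90")
print(f"posterior    {posterior_rate:.2f} per 90")
```

The posterior sits well below the peak season and slightly below the raw career rate, because the prior contributes a little extra skepticism. With fewer minutes played, that prior would count for more.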

Cup-run analysis. Single-elimination knockout football is high-variance. A team that wins a cup is, partly, the team that ran positive on variance during the tournament. Bayesian framing separates the team’s genuine quality from the tournament’s variance.
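The variance in knockout football is easy to see with a product of per-round win probabilities. This sketch assumes each round is independent and uses hypothetical round-by-round probabilities; real ties depend on the draw, two-legged formats, and penalties.

```python
from math import prod


def title_probability(round_win_probs) -> float:
    """Chance of winning every round, assuming the rounds are independent."""
    return prod(round_win_probs)


# ASSUMPTION: hypothetical five-round cup.
strong_side = title_probability([0.70] * 5)  # clear favourite in every round
coin_flip_side = title_probability([0.50] * 5)

print(f"70% favourite each round -> {strong_side:.1%} to lift the cup")
print(f"50/50 each round         -> {coin_flip_side:.1%} to lift the cup")
```

Even a side that is a 70% favourite in every single round lifts the trophy only about one time in six under these assumptions, so a cup win says less about true quality than a 34-match league table does.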

A working example: Leicester 2015-16 vs Leverkusen 2023-24

Leicester’s 2015-16 Premier League title, arguably the most extreme positive outcome in modern football, is the comparison case study. Leicester’s underlying xG profile that season suggested they were a mid-table team that should have finished 7th-9th. The actual outcome (champion) was an extreme tail event. The Bayesian frame says Leicester’s genuine quality was about 7th-place level; the title came from sustained positive variance that the underlying performance never justified.

Leverkusen 2023-24 is a different shape. Their underlying xG profile suggested the best team in the Bundesliga and a top-five European team. The unbeaten record was positive variance on top of genuine elite quality. The variance discount the Bayesian frame applies to Leverkusen is much smaller than the one it applies to Leicester: the team was genuinely elite, and the variance just polished the outcome.

Both seasons are extraordinary. The variance components are very different, and careful retrospective writing distinguishes them. The cycle’s alternative is to treat both as “fairy tales” or both as “destiny,” which is the wrong frame for either.

The limits

Bayesian skepticism cannot predict next-season outcomes precisely. Variance is variance; it can recur or not.

Bayesian framing cannot fully resolve the system-vs-personnel question. The manager’s contribution to the team’s quality is real but partially independent of the players.

Bayesian writing can feel deflating to fans of teams whose extraordinary seasons are partly attributed to variance. The framework is honest; honesty is sometimes unpopular.

One additional limit: the Bayesian framework depends on having reliable underlying performance metrics. xG and possession value models have known limitations; the Bayesian inference is only as good as the inputs.

FAQ

Was Leverkusen 2023-24 lucky or great?

Both. The team was genuinely one of the best in Europe that season. The unbeaten record was positive variance on top of that quality.

Will Leverkusen repeat?

The Bayesian projection says they should remain elite but the specific unbeaten outcome was unlikely to recur. The 2024-25 season has, as of late spring, produced numbers more consistent with elite-but-not-unbeaten quality.

How does this apply to other extraordinary seasons?

Any season that produces outcomes substantially above the underlying performance is a Bayesian case study. The framework helps separate quality from variance in coverage that ages better than narrative-driven alternatives.

Where can I see the underlying performance data?

StatsBomb (commercial), FBref (public), Opta-derived data via various commercial feeds. The Athletic’s football coverage frequently integrates these into Bayesian-style writing.

Sources and further reading

  • StatsBomb — the commercial provider for football performance data.
  • FBref — public underlying performance metrics.
  • The Athletic football coverage — Mark Carey, Tom Worville and others writing Bayesian-flavored retrospectives.
  • FiveThirtyEight — the historical home of Bayesian-style sports writing in English.

The Leverkusen unbeaten season — historic, genuine, partly variance — is the kind of outcome football analytics writing in 2026 is finally equipped to describe honestly. The team was elite. The record was a tail. Both are true. For the broader frame on reading extraordinary football seasons carefully, our guide to expected goals is the natural starting point.