What Analysts Mean by Regression to the Mean in Sports

[Image: an empty basketball court at dusk, illustrating how performance tends to drift back toward its long-term average.]

In November of one recent NBA season, a veteran shooter went on a stretch nobody who watched closely thought was sustainable. He hit 47% of his threes across fifteen games on respectable volume, climbed into the league’s top ten in efficiency, and made the rounds on analytics Twitter as a case study in “finally figuring it out.” By February, he was at 34%. By April, his season-end three-point percentage had drifted back to his career mark, almost to the decimal. The stretch was real. The conclusion drawn from it was the kind of mistake that has its own name.

The name is regression to the mean, and it is the single most quietly misunderstood concept in sports analytics. People hear it and assume it is a verdict, a punishment, or a bet against the player. None of that is true. Regression to the mean is a description of how outcomes work when there is variance involved. The math does not care about the player. The math is just the math.

The piece below is the working version of the concept. It covers what the term actually means, where it shows up across sports, the myths that get repeated about it, and the short workflow we use before saying “this is real” or “this is going to regress.”

Quick read: regression to the mean in 60 seconds

  • What it is: Performance that is unusually high or low in a small sample tends to drift back toward the long-term average over a larger sample.
  • Why it happens: Most performance metrics contain a mix of skill and luck. The extreme stretches usually had above-average luck on top of underlying skill.
  • Where it applies: Anywhere variance exists, including hot shooting in the NBA, finishing rates in soccer, turnover differential in the NFL, and BABIP in baseball.
  • Where it does not: Genuine talent shifts (rookie improvement, role change, new scheme) are not regression. They are change.
  • How to use it: As a prior, not a verdict. The longer the unusual stretch holds, the more weight the skill explanation deserves.

The math, in plain language

Almost every performance metric in sports is a mix of two things: the underlying skill of the player or team, and the luck that surrounded the games being measured. A shooter who makes 38% of his threes across a career is not making exactly 38% in every fifteen-game stretch. Sometimes the same shooter goes 47% for fifteen games. Sometimes 28%. The career number is the steady-state average. The shorter stretches are samples drawn from that average, with variance added on top.

Regression to the mean is the observation that, when you measure performance again over a fresh sample, the extreme stretches tend to come back toward the long-term average. The shooter who hit 47% across fifteen games will almost certainly not hit 47% across the next fifteen. The shooter who went 28% across the same span will almost certainly not stay at 28%. Both stretches were real. Neither will repeat at the same rate. Nothing corrects or cancels the earlier run; the unusual luck simply fails to repeat, so the fresh sample lands closer to the underlying skill.
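The claim can be checked with a small simulation. The sketch below assumes a shooter whose true rate is fixed at 38%, treats every attempt as an independent coin flip, and assumes six attempts per game (90 per fifteen-game stretch). The function names and those parameters are illustrative choices, not figures from any tracking data.

```python
import random
from statistics import mean


def stretch_pct(true_pct, attempts, rng):
    """Shooting percentage over one stretch of independent attempts."""
    makes = sum(rng.random() < true_pct for _ in range(attempts))
    return makes / attempts


def hot_then_next(true_pct=0.38, attempts=90, hot_cutoff=0.47,
                  trials=10_000, seed=7):
    """Simulate pairs of back-to-back stretches for the same shooter.

    Keeps only the pairs whose first stretch was hot (>= hot_cutoff) and
    returns (number of hot stretches, mean hot pct, mean pct in the
    stretch that followed).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hot, following = [], []
    for _ in range(trials):
        first = stretch_pct(true_pct, attempts, rng)
        second = stretch_pct(true_pct, attempts, rng)
        if first >= hot_cutoff:
            hot.append(first)
            following.append(second)
    return len(hot), mean(hot), mean(following)


if __name__ == "__main__":
    count, hot_avg, next_avg = hot_then_next()
    print(f"hot stretches found: {count}")
    print(f"average during the hot stretch: {hot_avg:.3f}")
    print(f"average over the next stretch:  {next_avg:.3f}")
```

Under these assumptions the follow-up stretches average close to 38% rather than 47%, even though nothing about the simulated shooter changed between the two stretches.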

The term comes from Francis Galton’s 1886 work on heredity in human height, which is the original cited example. Galton noticed that the tallest fathers tended to have sons who were tall but slightly shorter than them, and the shortest fathers tended to have sons who were short but slightly taller. The trait regressed toward the population mean across generations. The same principle, applied to sports, explains why nearly every “in the best form of his career” piece looks embarrassing six months later.

Where regression to the mean shows up across sports

The table below is the version of the concept we keep in mind when reading any hot-streak or cold-stretch coverage. Each row is a metric where regression is reliably observable, and where ignoring it produces a particular kind of bad analysis.

| Sport | Metric where it shows up | Typical hot-streak story | What usually happens later |
|---|---|---|---|
| NBA | Three-point percentage | Player shoots 45%+ for a month | Drifts back within 3-4 percentage points of career average |
| NBA | Team net rating in short windows | Bench unit posts +12 in 200 minutes | Settles within 4-5 points over the next 600 minutes |
| Soccer | Goals vs xG (finishing) | Striker scores 11 from 6 xG in 10 matches | Returns to roughly the underlying xG rate over 20+ matches |
| Soccer | Team points vs xG-based expected points | Mid-table side outperforms xG by 6 points | Closes the gap by season end in most cases |
| NFL | Turnover differential | Team forces 12 turnovers in 4 games | Reverts toward league-average turnover rate |
| NFL | Third-down conversion rate | Offense converts 56% over 5 games | Drifts toward ~40% league baseline |
| MLB | BABIP (batting average on balls in play) | Hitter posts .380 BABIP for a month | Returns toward ~.300 league baseline |
| WNBA | Hot shooting stretches | Wing shoots 44% from three across a 10-game span | Settles near career average over the season |

Notice what each row has in common. The hot-streak version produces the article. The regression version produces the embarrassed follow-up that almost nobody writes. A useful analytical site is one that names the regression risk inside the original piece, instead of waiting for the correction.
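One rough way to size up a row like the striker's is to ask how often the hot-streak number would occur by luck alone. The sketch below assumes goals follow a Poisson distribution with a mean equal to the accumulated xG, which is a common simplification rather than an exact model. `prob_at_least` is a hypothetical helper name.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least(goals, xg):
    """P(scoring `goals` or more) for an average finisher on `xg` expected goals.

    Assumes goals are Poisson-distributed with mean xg (a simplification).
    """
    below = sum(poisson_pmf(k, xg) for k in range(goals))
    return 1.0 - below


if __name__ == "__main__":
    # The striker row from the table: 11 goals from 6 xG.
    p = prob_at_least(11, 6.0)
    print(f"chance of 11+ goals from 6 xG by luck alone: {p:.3f}")
```

Under that assumption the run is roughly a 4% event for an average finisher. That is rare for any one striker, but with many strikers and many ten-match windows in a season, a few such runs are likely to appear somewhere, which is why the streak alone is weak evidence of elite finishing.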

The five myths people repeat about regression

The misuses are as common as the uses. Each of these gets repeated weekly in sports media, and each is wrong in a specific way.

Myth one: “regression to the mean is just a way of rooting against players.” It is not. It is a description of how variance behaves. The same math applies to bad stretches regressing upward as to good stretches regressing downward. The framework is not partisan. It is arithmetic.

Myth two: “if it regresses, it must not have been real.” The hot streak was real. The 47% three-point shooting happened. The stretch simply carried more good luck than a longer sample is likely to repeat. The streak being real and the regression being real are not in conflict. Both are true.

Myth three: “once you regress, you stay regressed.” No. Regression is not a one-way trip. A player who falls from 47% to 34% across a stretch has usually overshot his real talent level on the way down, and from there the expected move is back up. The mean is the destination, not the bottom.

Myth four: “regression doesn’t apply to elite players.” It does. The mean of an elite player is higher, so the regression target is higher. Stephen Curry shooting 50% from three across a month will probably regress toward his career mean (around 42%), not toward league average. The principle applies. The destination is different.

Myth five: “any stretch of performance is a small sample and will regress.” Wrong in the other direction. Some stretches reflect real underlying change — a rookie improving, a player taking a new role, a team installing a new scheme. Those are not regression candidates. They are genuine signal. Distinguishing the two is the actual analytical skill.
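Myths four and five both come down to how much weight a stretch deserves against the player's own baseline. Here is a minimal sketch of that weighting, assuming a beta-binomial style blend in which the career rate acts as the prior. `prior_attempts` (how many attempts' worth of confidence the career mark carries) is a hypothetical tuning value, not an established constant.

```python
def shrunk_estimate(stretch_makes, stretch_attempts,
                    career_pct, prior_attempts=500):
    """Blend a stretch with the player's own career rate.

    The career rate is treated as `prior_attempts` worth of earlier
    evidence (an assumed value); the stretch is added on top of it.
    """
    prior_makes = career_pct * prior_attempts
    return (prior_makes + stretch_makes) / (prior_attempts + stretch_attempts)


if __name__ == "__main__":
    # Elite shooter (career 42%) hitting 50% over a month of 120 attempts.
    month = shrunk_estimate(60, 120, career_pct=0.42)
    # The same 50% rate held for 600 attempts.
    long_run = shrunk_estimate(300, 600, career_pct=0.42)
    print(f"after one month:    {month:.3f}")
    print(f"after 600 attempts: {long_run:.3f}")
```

The estimate starts near the player's own career mark, not the league average, and moves toward the hot rate only as the stretch lengthens. A genuine change in role or skill would call for a new prior, which this sketch does not attempt to detect.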

A decision framework: when to invoke regression honestly

The table below is the workflow we run before writing “this is going to regress” in a piece. Each row gives the kind of stretch, the regression likelihood, and the sample window over which the regression usually plays out.

| Pattern observed | Regression likelihood | Typical regression window | What to write |
|---|---|---|---|
| Career-average shooter going hot (15 games) | Very high | 20-30 games | “The math suggests this drifts back; watch the next ten games” |
| Striker overperforming xG by 5+ goals | Very high | 10-15 additional matches | “The underlying numbers do not support this finishing rate” |
| Team forcing turnovers far above baseline | High | Rest of season | “Turnover differential is unstable and likely to settle” |
| Rookie improving across consecutive months | Low | n/a | “This looks like genuine development, not variance” |
| Team net rating shift after coaching change | Low | n/a | “The change in process suggests this is real” |
| Player taking on new role with new shot diet | Low to moderate | Depends on role fit | “Watch the shot quality, not just the percentage” |
| Mid-table soccer team beating xG over 8 matches | High | 10-15 matches | “Expect the points-per-xG ratio to revert” |
| Elite player on a cold stretch (under his career averages) | Very high (upward) | 15-20 games | “This will bounce back; the skill profile has not changed” |

The pattern is that regression is most reliable when the underlying skill profile is stable and the surface results are extreme. It is least reliable when something structural has changed — coaching, role, age, injury recovery. The framework is not asking “will this revert.” It is asking “is there a reason this stretch should not revert.”

Where regression to the mean does not apply

The most common analytical error involving regression is invoking it when something has actually changed. Rookies who break out are not regression candidates in the same way that veterans on hot stretches are. Players who switch roles inside an offense should not be measured against their old shot diet’s baseline. Teams that change coordinators should not be measured against the previous scheme’s outputs.

The distinction is between two kinds of unusual performance. The first kind is the same person doing the same thing better or worse than usual; that is the regression candidate. The second kind is the person doing something different; that needs a new baseline, not the old one. A young player whose minutes doubled and whose role shifted from spot-up to primary creator is not regressing if his efficiency drops. He is doing a new job, and the new job has its own statistical equilibrium that the data has not yet stabilized around.

This is also where survivorship bias quietly enters the conversation. The players who post breakouts and never regress are the ones we remember. The much larger group of players who post similar-looking breakouts and regress hard get forgotten. Reading the regression literature without that correction overweights the dramatic non-regression cases. For the broader frame on how the vocabulary of analytics works (including how regression fits next to terms like xG, EPA, and BPM), our field guide to sports analytics terms is the natural companion read.

Frequently asked questions

How long does regression usually take?

It depends on the metric and the sport. NBA three-point percentage regression usually plays out across 20-30 games for individual shooters. Soccer xG-vs-goals regression for strikers tends to need roughly 10-20 additional matches. NFL turnover differential regression typically requires most of a remaining season to play out fully. The general rule: the more variance a metric carries per observation, the longer the regression window. Public sources like Basketball Reference and FBref publish stabilization notes for most of the core metrics.
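The general rule can be made concrete with a reliability calculation: the share of the spread in observed numbers that comes from real skill differences rather than noise. The sketch below assumes a binomial metric (makes out of attempts) and a hypothetical spread of true talent across players. `talent_sd` and the 36% league rate are illustrative inputs, not published figures.

```python
def reliability(attempts, league_pct, talent_sd):
    """Share of observed variance explained by true talent after `attempts`."""
    talent_var = talent_sd ** 2
    noise_var = league_pct * (1 - league_pct) / attempts  # binomial sampling noise
    return talent_var / (talent_var + noise_var)


def attempts_for_half_signal(league_pct, talent_sd):
    """Attempts needed before skill and luck contribute equally (reliability 0.5)."""
    return league_pct * (1 - league_pct) / talent_sd ** 2


if __name__ == "__main__":
    # Assumed inputs: 36% league rate, true-talent spread of 3 percentage points.
    n = attempts_for_half_signal(0.36, 0.03)
    print(f"attempts until half signal: {n:.0f}")
    print(f"reliability at 90 attempts: {reliability(90, 0.36, 0.03):.2f}")
```

A larger noise term per attempt, or a narrower spread of talent, pushes that attempt count up, which matches the rule that noisier metrics take longer to settle.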

Can regression be partly avoided through skill?

The mean itself can shift through genuine improvement, which is not regression. Adding skill raises the long-term average and gives the regression a higher target. But within a given skill level, the variance around the mean is mathematical, not optional. A shooter cannot will himself out of cold stretches any more than he can will himself into hot ones. Both are part of operating around an underlying skill level that, in turn, can shift.

Why does regression seem to “happen faster” for some players?

Sample size, mostly. A player taking 25 attempts per game stabilizes his shooting numbers faster than a player taking 5 per game. High-volume players reach their underlying averages sooner because each week adds more observations. Low-volume players can sustain weird-looking numbers for longer simply because each new sample is small. The math is doing the same thing in both cases. The pace is different.
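The volume effect follows from the standard error of a proportion, which shrinks with the square root of the number of attempts. A short sketch, assuming independent attempts and a hypothetical 38% shooter observed over ten games:

```python
import math


def standard_error(true_pct, attempts):
    """Standard error of an observed shooting percentage (binomial model)."""
    return math.sqrt(true_pct * (1 - true_pct) / attempts)


if __name__ == "__main__":
    games = 10
    for per_game in (5, 25):
        se = standard_error(0.38, per_game * games)
        # Under a normal approximation, about 95% of ten-game stretches
        # land within two standard errors of the true rate.
        print(f"{per_game:>2} attempts/game: 38% +/- {2 * se:.1%}")
```

Five times the volume narrows the band by a factor of about 2.2 (the square root of five), so the high-volume player's numbers look settled well before the low-volume player's do.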

Is “regression to the mean” the same as “the mean reverting”?

Close, but not identical. Mean reversion in finance and economics specifically describes prices or asset values cycling around a long-term equilibrium, often with a known driver. Regression to the mean in sports is a more general statistical observation that does not require any mechanism — just variance around a stable underlying skill. The two terms get used interchangeably in casual writing. Statisticians draw a sharper line.

The takeaway, in one paragraph

Regression to the mean is not a punishment or a verdict. It is the quiet observation that luck, given enough samples, averages out and leaves the skill showing. Hot streaks cool. Cold stretches warm. The career averages do most of the work in the long run. The discipline is asking, inside every hot-streak piece, whether the stretch should keep happening, and writing the regression risk into the original article instead of waiting for a follow-up that nobody publishes. For the broader vocabulary this concept lives inside, our sports analytics field guide is where most readers will want to start.