Which Sports Stats Age Well — and Which Collapse Overnight

An outdoor baseball scoreboard with trees and fence behind it, used to illustrate how some sports metrics outlast the eras that produced them.

Pick up an NFL broadcast from 2009 and the on-screen quarterback graphic will lead with passer rating — that opaque, multi-input formula introduced in 1973 that scaled completion percentage, yards per attempt, touchdowns, and interceptions through a series of coefficients almost nobody could recite.

Pick up an NFL broadcast from 2025 and the same graphic position now displays QBR, EPA per dropback, completion percentage over expectation, or some combination. Passer rating still appears, but it has been demoted to a parenthetical. Nothing about its formula has changed. The world around it learned to ask harder questions, and the metric stopped answering most of them.

This is the quiet attrition that defines sports analytics. Most metrics that get popular eventually get retired or demoted. The ones that survive several decades of scrutiny are doing something the rest were not. Knowing the difference saves writers and fans from arguing in the language of yesterday’s consensus while the field has already moved on.

The piece below is the working version of that distinction. It covers what makes a sports metric age well, which ones have done so, which ones collapsed, and the short workflow we use before recommending any newer metric for the long term.

Quick read: durability in 60 seconds

  • What ages well: Metrics that capture stable underlying skill, scale with sample size, and survive new data being added to the field.
  • What collapses: Metrics tied to era-specific play styles, opaque formulas, or single-input shortcuts that newer data exposes.
  • The classic survivors: Batting average on balls in play (BABIP), true shooting percentage, expected goals, EPA.
  • The biggest casualties: Passer rating, PER, raw plus-minus, fielding percentage.
  • How to spot a metric that will not last: Watch how often its rankings agree with newer, better-built metrics. Persistent disagreement is the early signal.

What “aging well” means for a sports metric

A metric ages well when, five or ten years after its introduction, it is still being cited by the careful analysts in its field. That is the survival test. Inside the test sit three more specific traits.

The first is methodological transparency. A metric whose formula is clear can be audited, criticized, and incrementally improved. A metric whose formula is opaque (passer rating is the canonical example: published, but built from coefficients and caps that are hard to interpret) accumulates objections without a way to address them. Transparency does not guarantee survival. Opacity nearly guarantees eventual demotion.
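To make the opacity point concrete, here is a minimal sketch of the traditional NFL passer rating as it is commonly published. The function and argument names are our own labels, chosen for illustration; the coefficients and the 2.375 cap are the standard ones.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """Traditional NFL passer rating on the 0 to 158.3 scale."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(value):
        # Each of the four components is capped between 0 and 2.375.
        return max(0.0, min(value, 2.375))

    # Four per-attempt components, each rescaled by its own coefficients.
    a = clamp((completions / attempts - 0.3) * 5)        # completion rate
    b = clamp((yards / attempts - 3) * 0.25)             # yards per attempt
    c = clamp((touchdowns / attempts) * 20)              # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)   # interception rate

    # Average the components and rescale so the ceiling is about 158.3.
    return (a + b + c + d) / 6 * 100
```

Every input is public and the arithmetic is simple, yet nothing in the result tells a reader why the offsets are 0.3 and 3, or why the cap is 2.375. That is the sense in which a published formula can still be opaque.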

The second is structural validity. The metric captures something real about the sport, not an accident of how the sport happened to be played in a particular era. Batting average on balls in play has survived for more than two decades because the underlying phenomenon, variance in batted-ball outcomes, is permanent. Raw save percentage in hockey, by contrast, treats every shot faced as equally difficult, which made sense when public shot-quality data did not exist and now reads as obviously incomplete.

The third is compatibility with new data. A metric that can absorb tracking data, biomechanics data, and modern public play-by-play sources without losing meaning ages better than one that was designed for the box score and cannot extend. The metrics that survived the 2010s tracking-data revolution were the ones whose architecture had room to update. The ones that did not had to be replaced.

Stats that have aged well

The table below lists metrics that, by 2026, are still being cited by the careful analytical writers and front offices in their sports. Most are roughly two decades old or older; the youngest have more than ten years of public use. Each has survived multiple waves of newer competing metrics.

Metric | Sport | Introduced | Why it lasted
BABIP (Batting Average on Balls in Play) | MLB | ~1999, Voros McCracken | Cleanly isolated luck from pitcher skill; framework survived modern batted-ball tracking data
True shooting percentage | NBA | ~2000s, Dean Oliver lineage | Single transparent formula; folds in threes and free throws honestly
Expected goals (xG) | Soccer | ~2012-2014 public versions | Captures shot quality independently of finishing variance
EPA (Expected Points Added) | NFL | ~2006 academic, public ~2014 | Adds leverage to per-play evaluation; extends naturally with new data
DVOA | NFL | 2003, Football Outsiders | Opponent-adjusted, situation-aware; methodology published openly
Pythagorean expectation | MLB, NFL, NBA | 1980s, Bill James | Simple, transparent, predicts season win totals from runs or points scored and allowed
OPS+ and similar park/era adjustments | MLB | ~1990s | Normalizes for era and stadium effects; ages with the sport’s eras
Pace-adjusted scoring rates (per 100, per 90) | NBA, soccer, hockey | 1980s onward | Removes pace as a confounder; the framework everything else is built on

The pattern is striking. Every surviving metric on this list has a clear formula, isolates a specific signal from noise, and was built to be extended. None of them tried to do everything at once. The survivors are the ones that did one job cleanly.
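Several of the survivors fit in a line or two of arithmetic, which is part of why they can be audited. The sketch below writes out four of them using their commonly published forms. The function names are illustrative, the 0.44 free-throw weight is the conventional public approximation, and the exponent of 2 is Bill James's original baseball value (other sports typically use a different exponent).

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: non-homer hits per ball in play."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def true_shooting_pct(points, fga, fta):
    """Points per scoring attempt, halved so it reads like a percentage."""
    scoring_attempts = fga + 0.44 * fta  # 0.44 approximates possessions used by free throws
    if scoring_attempts <= 0:
        raise ValueError("no scoring attempts")
    return points / (2 * scoring_attempts)


def pythagorean_win_pct(scored, allowed, exponent=2.0):
    """Expected winning percentage from runs (or points) scored and allowed."""
    if scored <= 0 and allowed <= 0:
        raise ValueError("need some scoring to estimate a win rate")
    return scored**exponent / (scored**exponent + allowed**exponent)


def per_100_possessions(total, possessions):
    """Pace adjustment: rescale a raw total to a per-100-possession rate."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100 * total / possessions
```

Each function takes box-score inputs and isolates one signal. None of them needs a lookup table or a hidden weight to reproduce.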

Stats that collapsed overnight

The table below covers the opposite. Each entry was, at some point, central to mainstream sports analysis. Each has since been retired or demoted to a parenthetical by the people who pay attention.

Metric | Sport | Peak relevance | What killed it
Passer rating (traditional) | NFL | 1980s-2000s | Opaque formula tied to 1970s passing environment; outperformed by QBR and EPA
PER (Player Efficiency Rating) | NBA | 2000s-2010s | Overweighted volume scoring; rarely disagreed with eye test; replaced by BPM, RAPM
Raw plus-minus | NBA | 2000s broadcasts | Did not adjust for teammates or opponents; replaced by RAPM, on/off splits
Fielding percentage | MLB | Pre-2000s | Counted error frequency without measuring range; replaced by UZR, DRS, OAA
Save percentage (hockey, raw) | NHL | 1990s-2010s | Counted all shots equally regardless of difficulty; replaced by GSAx
Possession percentage (sole metric) | Soccer | 2000s-2010s | Counted ball control without territory or threat; complemented by xG, field tilt
Rushing yards per game (offense) | NFL | 1970s-2000s | Did not adjust for pace, score state, or opponent; supplanted by EPA on rushes
Total bases per game | MLB | 20th century | Bundled too many distinct skills; replaced by isolated power, wOBA, OPS+

The shared failure mode is the same as the shared success mode, just inverted. Each of these metrics tried to summarize something complex in a single number that papered over the inputs. As more granular data became available, the gap between what these metrics implied and what the better-built versions said grew too large to ignore.

Three signs a metric is about to be retired

If a metric is still in use but on its way out, the warning signs usually appear two or three years before the formal demotion. The signs below are the ones we watch for.

Sign one: persistent disagreement with newer, better-built metrics. When a player ranks 12th by PER but 25th by BPM and 28th by RAPM, and the pattern repeats across many players, the older metric is the outlier. Disagreement is not by itself fatal — sometimes the older metric is catching something the new one misses — but persistent, systematic disagreement is the first sign that the older metric is measuring an artifact of its formula rather than a stable trait.
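One rough way to check for this kind of disagreement is to rank the same players by both metrics and count how many move by more than a few places. The sketch below assumes each metric is a plain dictionary of player to score with higher meaning better; the function names and the default `min_gap` of 3 are illustrative choices, not an established standard.

```python
def rank_by(scores):
    """Return {player: rank}, with rank 1 for the highest score.

    Ties keep their input order because Python's sort is stable.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {player: position for position, player in enumerate(ordered, start=1)}


def rank_gaps(old_metric, new_metric):
    """Rank change per player (new rank minus old rank) for shared players."""
    shared = [player for player in old_metric if player in new_metric]
    old_ranks = rank_by({p: old_metric[p] for p in shared})
    new_ranks = rank_by({p: new_metric[p] for p in shared})
    return {p: new_ranks[p] - old_ranks[p] for p in shared}


def share_disagreeing(old_metric, new_metric, min_gap=3):
    """Fraction of shared players whose rank moves by at least min_gap places."""
    gaps = rank_gaps(old_metric, new_metric)
    if not gaps:
        raise ValueError("the two metrics share no players")
    flagged = sum(1 for gap in gaps.values() if abs(gap) >= min_gap)
    return flagged / len(gaps)
```

A single large gap says little. A high share across a full league, repeated season after season, is the systematic pattern described above.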

Sign two: defenders shift from “it works” to “it’s still useful as a starting point.” Watch the language careful analysts use about a metric. When the framing changes from “the data shows” to “as a rough first pass” to “interesting historically,” the metric is being escorted toward retirement. The shift in framing usually precedes the formal demotion by one to two years.

Sign three: the official broadcast graphics stop leading with it. NFL broadcasts moved passer rating out of the lead QB graphic gradually across 2018-2024. NBA broadcasts moved PER out of the standard player breakdown by 2017. ESPN soccer broadcasts started leading with xG over possession percentage by 2020. Mainstream broadcast positioning lags analytical consensus by several years, so when the broadcast catches up, the demotion is already complete in the analytical community.

A decision framework for adopting new metrics

The table below is the workflow we use before recommending a newer metric to readers. The framework is borrowed from our field guide to sports analytics terms and refined to evaluate durability specifically.

Question to ask | What the answer reveals
Is the formula public and auditable? | Opaque formulas rarely survive a generation; published ones can be improved
Does it isolate one specific signal cleanly? | Single-purpose metrics age better than do-everything composites
Does it extend naturally with new data? | Metrics built only on the box score struggle once tracking data arrives
Does it stabilize over reasonable samples? | If it requires 5,000 observations to mean anything, broad adoption is unlikely
Does it occasionally disagree with consensus and turn out to be right? | Vindicated disagreement is the strongest signal of analytical value
Are careful analysts still citing it 5+ years after it appeared? | The single best predictor of further longevity
Does the metric’s name resist viral simplification? | Metrics whose name fits a tweet often get used past their range

A metric that scores well on five or more of these is a candidate for the long-term toolkit. A metric that fails three or more is unlikely to outlast the next wave of public data.
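The scoring rule can be sketched as a small function. The key names below are hypothetical shorthand for the seven questions, and the yes/no answers still have to come from human judgment; the code only applies the thresholds.

```python
# Hypothetical shorthand keys for the seven checklist questions.
CHECKLIST = (
    "formula_public",
    "isolates_one_signal",
    "extends_with_new_data",
    "stabilizes_on_reasonable_samples",
    "vindicated_disagreement",
    "cited_after_five_years",
    "name_resists_simplification",
)


def durability_verdict(answers):
    """Apply the five-or-more / three-or-more rule to yes/no answers."""
    missing = [question for question in CHECKLIST if question not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")

    passed = sum(1 for question in CHECKLIST if answers[question])

    # With seven questions, passing five or more and failing three or more
    # are mutually exclusive and cover every case.
    if passed >= 5:
        return "long-term candidate"
    return "unlikely to last"
```

Because there are exactly seven questions, every metric lands on one side of the line or the other; there is no undecided middle band.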

What this means for the rest of the conversation

Most of the metrics that survive the durability test share a common discipline. They were built to answer one question well, not many questions ambiguously. They published their formulas, accepted critique, and incorporated improvements. They were used by analysts who named the limits of the metric inside the same piece that used it. These are habits, not formulas. The habits are what let any given metric age well.

The framework also explains, retroactively, why some metrics that looked promising in the late 2010s did not survive. RAPTOR, FiveThirtyEight’s blended NBA model, was excellent for several years and was effectively retired when FiveThirtyEight shut down its NBA coverage in 2023. The model itself was sound. The institutional support around it was not. Durability requires both the analytical structure and the editorial willingness to keep using and improving the metric. Either alone is not enough.

For the related conversation about how regression to the mean and small samples interact with metric design, our regression to the mean piece and the small samples piece are the natural companion reads.

The careful public sources for verifying these claims are Basketball Reference for NBA history, FBref for soccer, and the Pro Football Reference ecosystem for football. Each publishes the methodology and historical adoption notes that let you audit them without taking our word for it.

Frequently asked questions

Are all old metrics bad?

No. The metrics that survive multiple decades — pythagorean expectation, BABIP, pace-adjusted scoring rates, true shooting percentage — are old. Their age is part of the credential. The aging-well test is not “is it new” but “did it earn its place and keep earning it.” Plenty of old metrics still hold up. Plenty of new ones will not.

What about proprietary team metrics?

Most NBA and NFL front offices use internal models that probably outperform their public-facing equivalents. The aging-well framework applies internally too — proprietary metrics get retired and replaced just as often, the public just does not see the transition. The honest move when citing public metrics is to acknowledge that the proprietary frontier is ahead and that the public version is one or two generations behind.

How do I tell if a metric I learned years ago has been demoted?

Cross-check it against newer alternatives in the same sport. If your go-to NBA player evaluation is PER, look up the same players in BPM and EPM and see how the rankings compare. Persistent disagreement of three or more ranking positions, repeated across many players, is the warning sign. The same check works for soccer (possession percentage vs xG and field tilt), football (passer rating vs QBR and EPA), and baseball (batting average vs wOBA and OPS+).

Is there a metric introduced in the last five years that has clearly aged well so far?

Expected goals on target (xGOT) in soccer is the cleanest candidate. It extends standard xG by valuing shot placement, which fills a known gap, and it has been adopted by FBref, StatsBomb, and Opta in slightly different forms. Five years is not a long enough horizon to declare permanent survival, but xGOT meets every test on the durability framework so far. Whether it survives the next decade depends on whether the next generation of public data renders the placement information redundant in some other way.

The takeaway, in one paragraph

Sports metrics age well when they do one job cleanly, publish their formula, and survive new data being added to the field. They collapse when their architecture cannot extend, their inputs were tied to an era’s play style, or their rankings stop matching the careful analytical consensus. The durability framework is not magic. It is a checklist that, applied honestly, prevents recommending the wrong metric to a reader who will still be quoting it three years from now. For the broader vocabulary this concept sits inside, our sports analytics field guide is the natural next read.