What Makes a Sports Metric Useful, Not Just Popular

In 2007, John Hollinger’s Player Efficiency Rating sat at the top of nearly every NBA conversation that pretended to be analytical. PER got cited on broadcasts, anchored MVP arguments, and showed up in trade-deadline columns as if the decimal had finished the argument. By 2015, the same metric had been quietly demoted to a starting point at best and an embarrassment at worst. Nothing about PER itself had changed. What changed was that the field around it learned to ask harder questions, and PER stopped answering most of them.

That arc, from canonical to optional in under a decade, is the cleanest example of the gap between popular and useful in sports analytics. Popularity is what gets the metric on the screen. Usefulness is what keeps it in the argument once the next better tool arrives. Most public metrics that exist today will fail that second test. A few — EPA in football, true shooting in basketball, expected goals in soccer — have earned the right to stay in the conversation. The difference is not aesthetic. It is structural.

This piece is the editorial frame that sits underneath every metric-focused article on this site. It is the question we ask before we recommend a stat to a reader: not whether the number is impressive, but whether it survives the work it is being asked to do.

The popularity problem

Popularity in sports metrics moves on a different clock than usefulness. A new stat gets attention because it produces a single number that ranks players or teams cleanly. The ranking gets cited. The citation makes the ranking feel definitive. By the time the analytical community has worked out what the metric is actually measuring, the metric is already in a hundred articles, two podcast intros, and a graphic on Sunday Night Football. The reverse-engineering takes years.

This pattern is not new. Bill James spent the 1980s arguing against batting average as a primary hitting metric. He was right by about 1985. The popular version of the conversation needed another twenty years to catch up. The same lag exists today between, for example, the public obsession with on-court plus-minus and the careful version that adjusts for teammate quality and lineup context.

None of this is the fault of the metrics themselves. It is the fault of the speed at which sports media has to publish. A stat that produces a clean number is broadcast-ready. A stat that requires three paragraphs of caveats is not. The popular metric wins the screen. The useful metric wins the argument that happens after the show ends.

Three tests a useful sports metric has to survive

The metrics that age well share a structure. They do three things the popular ones often cannot. The framework below is the one we use internally before recommending any number to a reader.

Test one: stability. A useful metric stabilizes inside a meaningful sample. True shooting percentage starts looking like signal after a few hundred field-goal attempts. Expected goals stabilizes for a team across roughly twenty matches. Pressure rate in the NFL stabilizes for a pass-rusher across a season. A metric that swings wildly inside small samples is, by definition, not yet measuring anything you can argue from. Basketball Reference publishes stabilization curves for most of its core stats, and the careful reader checks them before quoting a number from twelve games.
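
One common way to put a number on stability is a split-half check: split each player's attempts into two halves, compute the stat on each half, and see how well the halves correlate across players. The sketch below is a minimal illustration on synthetic data, not the method any particular site uses. The skill range, the player count, and the function names (`split_half_reliability`, `simulate_players`) are assumptions made for the example, and it uses plain make-or-miss shooting rather than TS% to keep the arithmetic simple.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(players):
    """Correlate each player's rate on even-indexed attempts with the odd-indexed ones."""
    first = [sum(shots[0::2]) / len(shots[0::2]) for shots in players]
    second = [sum(shots[1::2]) / len(shots[1::2]) for shots in players]
    return pearson(first, second)


def simulate_players(n_players, n_attempts, rng):
    """Synthetic data: each player has a hidden make probability (assumed 40-60%)."""
    players = []
    for _ in range(n_players):
        skill = rng.uniform(0.40, 0.60)
        players.append([1 if rng.random() < skill else 0 for _ in range(n_attempts)])
    return players


rng = random.Random(7)  # fixed seed so the run is repeatable
for n_attempts in (20, 100, 400):
    r = split_half_reliability(simulate_players(200, n_attempts, rng))
    print(f"{n_attempts:>4} attempts per player: split-half r = {r:.2f}")
```

The correlation should climb as attempts per player grow. A stat quoted from a sample where the two halves barely agree is mostly noise, which is the practical meaning of "not yet stabilized."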

Test two: falsifiability. A useful metric makes predictions you can check. EPA forecasts which offenses will outscore opponents over the rest of the season, and the forecasts can be graded. xG forecasts which teams will outscore opponents over a stretch of fixtures. RAPM forecasts how a player’s presence will affect the team’s net rating in samples large enough to mean something. A metric that cannot generate a falsifiable claim is not a metric. It is a label dressed as one.
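
Grading a forecast can be as simple as a Brier score: the mean squared gap between the stated probability and what happened, compared against a naive baseline. The sketch below assumes the metric has already been turned into win probabilities. The four forecasts and outcomes are invented for illustration, and `brier_skill_score` is a name chosen for this example rather than a reference to any specific library.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)


def brier_skill_score(model_probs, baseline_probs, outcomes):
    """1 - model/baseline: positive means the model beat the baseline."""
    baseline = brier_score(baseline_probs, outcomes)
    if baseline == 0:
        raise ValueError("baseline is already perfect; skill score is undefined")
    return 1.0 - brier_score(model_probs, outcomes) / baseline


# Invented example: four games, a metric-based forecast vs. a coin-flip baseline.
model_forecasts = [0.8, 0.6, 0.3, 0.7]   # hypothetical probabilities of a home win
coin_flip = [0.5, 0.5, 0.5, 0.5]
home_won = [1, 1, 0, 0]

print(f"model Brier:    {brier_score(model_forecasts, home_won):.3f}")
print(f"baseline Brier: {brier_score(coin_flip, home_won):.3f}")
print(f"skill score:    {brier_skill_score(model_forecasts, coin_flip, home_won):.2f}")
```

A metric whose forecasts cannot beat the baseline over a reasonable number of games has failed the test, however clean its rankings look.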

Test three: disagreement. A useful metric occasionally disagrees with the obvious read of the game. This is the test most popular metrics fail. PER usually agreed with the conventional wisdom. So did fielding percentage in baseball. So does Player Impact Estimate in many NBA contexts. A metric that confirms what you already thought is decorative. A metric that tells you a player you assumed was struggling is actually generating possessions efficiently — and turns out to be right — is doing analytical work. The disagreement is the contribution.
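
The disagreement test can be tracked with a simple ledger: log each case where the metric and the consensus split, then count how often the metric's side held up. The sketch below is one hypothetical way to keep that record. The `Call` fields and the sample entries are invented for illustration and do not describe real players.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    subject: str
    metric_positive: bool      # did the metric rate the subject favorably?
    consensus_positive: bool   # did the conventional read rate it favorably?
    outcome_positive: bool     # how it turned out, judged later


def disagreement_record(calls):
    """Return (disagreements, vindicated, vindication_rate); rate is None with no disagreements."""
    disagreements = [c for c in calls if c.metric_positive != c.consensus_positive]
    vindicated = [c for c in disagreements if c.metric_positive == c.outcome_positive]
    rate = len(vindicated) / len(disagreements) if disagreements else None
    return len(disagreements), len(vindicated), rate


# Hypothetical ledger entries.
ledger = [
    Call("Player A", True, False, True),    # metric liked him, consensus did not, metric was right
    Call("Player B", True, True, True),     # agreement: tells us nothing about this test
    Call("Player C", False, True, True),    # metric disagreed and was wrong
    Call("Player D", False, True, False),   # metric disagreed and was right
]

total, wins, rate = disagreement_record(ledger)
print(f"{wins} of {total} disagreements vindicated")
```

A metric that never lands in the disagreement list is decorative. One that lands there often with a poor vindication rate is just contrarian noise.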

How the best sports metrics earn their place

The table below maps the central metrics covered on this site against the three tests. None of them are perfect. All of them survive enough of the framework to be worth recommending.

| Metric | Sport | Stability | Falsifiability | Disagreement |
| --- | --- | --- | --- | --- |
| True shooting percentage (TS%) | NBA / WNBA | Strong: ~300 FGA | Strong: efficiency predicts team offense | Frequently identifies efficient role players overlooked by box scores |
| Expected goals (xG) | Soccer | Strong for teams: ~20 matches | Strong: predicts league finish within 3 places ~75% of the time | Routinely flags overperforming or underperforming sides before the table catches up |
| Expected Points Added (EPA) | NFL / college football | Strong: ~250 plays | Strong: forecasts offensive efficiency rest of season | Identifies process-strong offenses obscured by red-zone luck |
| DVOA | NFL | Strong: stabilizes over half a season | Moderate: weekly forecasts beat market consensus narrowly | Often diverges from win-loss records when opponent-adjusted |
| RAPM and on/off splits | NBA / WNBA | Weak under 2,000 possessions | Moderate: lineup forecasts beat box-score-only models | Surfaces defensive contributions box scores miss |
| Possession value models (xT, VAEP) | Soccer | Moderate: full season required | Strong: better team forecasts than xG alone | Highlights passing chains that produce threat without producing shots |

Notice what is missing. PER does not appear because, although it stabilizes well, it rarely disagrees with the eye test and produces weaker predictive claims than the alternatives. Raw plus-minus does not appear because it fails stability in any reasonable sample. Possession percentage in soccer is absent because it tracks territory rather than chance quality. Each of those metrics has been useful at some point in the history of its sport. None of them earns its place in 2026 once the framework is applied.

Where popular metrics break in real arguments

The three tests get teeth when you watch them fail in public coverage. Four cases are worth naming explicitly.

The first is the persistent use of raw plus-minus in NBA Twitter arguments. Plus-minus stabilizes only over many thousands of possessions and is heavily dependent on teammate quality. A backup forward who plays exclusively with the starters will post a plus-minus that looks superstar-adjacent and tells you almost nothing about him. The metric is popular because it is on every box score. It is useful only when paired with on/off splits, lineup minutes, and opponent context. Our garbage-time tax piece walks through how the same problem inflates team net rating.
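
The gap between raw plus-minus and an on/off split is easy to see in a few lines. The sketch below uses invented stint data for a hypothetical bench forward. The `Stint` structure and its field names are assumptions made for the example, and the calculation is a plain on/off comparison per 100 possessions, not RAPM, so it makes no adjustment for teammate or opponent quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stint:
    player_on: bool        # was the player on the court for this stretch?
    points_for: int
    points_against: int
    possessions: int


def raw_plus_minus(stints):
    """Scoring margin while the player is on the court, with no context at all."""
    return sum(s.points_for - s.points_against for s in stints if s.player_on)


def net_rating(stints):
    """Point margin per 100 possessions; None if there are no possessions."""
    possessions = sum(s.possessions for s in stints)
    if possessions == 0:
        return None
    margin = sum(s.points_for - s.points_against for s in stints)
    return 100.0 * margin / possessions


def on_off_split(stints):
    """Return (on-court net rating, off-court net rating, difference)."""
    on_net = net_rating([s for s in stints if s.player_on])
    off_net = net_rating([s for s in stints if not s.player_on])
    diff = None if on_net is None or off_net is None else on_net - off_net
    return on_net, off_net, diff


# Invented stints: the team is good with him and equally good without him.
stints = [
    Stint(True, 110, 100, 100),
    Stint(True, 55, 50, 50),
    Stint(False, 66, 60, 60),
]

on_net, off_net, diff = on_off_split(stints)
print(f"raw plus-minus: {raw_plus_minus(stints):+d}")
print(f"on: {on_net:+.1f}  off: {off_net:+.1f}  on/off: {diff:+.1f}")
```

In this made-up case the raw number is healthy while the on/off difference is zero, which is the pattern the backup forward example describes: the margin belongs to the lineup, not to him.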

The second is possession percentage in soccer. A team with 62% of the ball that loses 1-0 to a side that produced the better chances has not “controlled the game.” Possession is a territorial clue, not a quality clue. xG, field tilt, and progressive carries do the actual work. The popularity of possession survived because it produces a clean number that fits a graphic. The usefulness expired the moment public xG became reliable. The full unpacking lives in our possession trap piece.

The third is traditional passer rating, still commonly called quarterback rating in football coverage. The formula is opaque, the inputs are partial, and the NFL adopted it in 1973 against a passing environment that no longer exists. ESPN’s QBR and Football Outsiders’ DVOA-based passing measures have outperformed the traditional rating in every public comparison published since 2012. The traditional rating persists because broadcasters say it on screen. The useful version requires more words and shows up in fewer graphics.

The fourth is fielding percentage in baseball, which has been irrelevant for analytical purposes since the early 2000s but still appears next to player names on telecast graphics. Defensive runs saved, ultimate zone rating, and Statcast’s outs above average all do better work. The popular metric survives by inertia. The useful ones live in a dropdown menu.

Why this matters for the rest of the site

We run this framework before publishing because recommending the wrong metric to a reader is, in the long run, worse than recommending none. A reader who learns to cite PER as if it were definitive carries outdated arguments into new debates. The job of an editorial site in this corner of sports media is partly to slow down the metabolism: to make sure the metrics our pieces lean on are the ones that survive five years of arguments, not the ones that win this week’s tweet.

For the broader frame on which all of this sits, our analytics primer covers the conceptual ground. For sport-specific applications of the framework above, the NBA advanced stats field guide, the expected goals explainer, and the EPA explained piece each apply the same structure inside a specific sport.

Frequently asked questions

Is “useful” subjective?

Less than it sounds. The three tests above are quantitative or near-quantitative. Stability is a sample-size question with published answers for most major metrics. Falsifiability is a prediction-grading question. Disagreement is verifiable by checking how often the metric overturns the conventional read and is later vindicated. The judgment lies in weighting those three when they conflict, not in whether they apply.

Does popularity ever indicate usefulness?

Sometimes. xG became popular because it survived scrutiny. EPA became popular for the same reason. The trap is treating popularity as a proxy. A new metric that goes viral has not yet been tested. The work happens in the months and years after the initial wave of citations. A metric still standing five years later, having survived analyst pushback and methodological critique, has earned its popularity. A metric that peaks in attention during its first season and disappears from serious writing two years later has not.

What about proprietary metrics from team analytics departments?

Proprietary metrics often pass all three tests internally but cannot be evaluated publicly. PFF, StatsBomb, and the in-house models at NBA front offices probably outperform their public-facing equivalents. The honest acknowledgment is that public analytics will always lag the proprietary frontier. The useful editorial move is to be explicit about which version of a metric is being cited and what limits the public version has.

How do I know when a metric has “earned” disagreement?

By tracking its predictive record on the cases where it disagreed with consensus. xG flagged Borussia Dortmund’s underperformance in 2023-24 a full season before they changed managers. RAPM has been ahead of public opinion on several role players who later signed major contracts. The disagreement counts only when it is followed by vindication. Otherwise the metric is just generating noise that happens to be contrarian.

The takeaway, in one paragraph

Popular metrics produce clean numbers. Useful metrics produce honest arguments. The three tests — stability, falsifiability, disagreement — are the cheapest way to tell the two apart before a citation makes the gap permanent. Most public metrics fail at least one. The ones that survive all three are the ones worth building a piece around. For the next layer of the editorial frame this site uses, our working primer on sports analytics covers the conceptual scaffolding that holds the framework above together.