Fourth and one, ball on the Eagles’ 27, two minutes left in Super Bowl LVII, and Patrick Mahomes drops back, slides up, and sails the ball into the corner of the end zone where Skyy Moore drags one foot inbounds for a touchdown. The Chiefs lead by three. The stadium does what stadiums do in those moments. And somewhere, in a darker room with three browser tabs open, an analytics writer notes that the pre-snap expected points on that play were +2.4, the play itself generated +4.9 expected points added, and the decision to throw — against a defense that had stuffed the run on third-and-one earlier in the half — was, by every framework that exists, the right call.
That last paragraph is the strange marriage at the center of modern football analytics. The play happened. The play also has a number attached to it. The number does not replace the joy of the catch, the violence of the line of scrimmage, or the cold competence of a coordinator who knew, going in, that the Eagles’ three-receiver dime package was statistically vulnerable to in-breaking routes from the slot. The number describes, with a precision the broadcast cannot, exactly how much that single play moved the offense’s expected points. It is called Expected Points Added, or EPA, and it has, over the last fifteen years, quietly become the single most important measurement in NFL coverage.
I have been writing about football analytics since 2014, mostly for a rotation of team blogs and my own newsletter, and the metric I have spent the most time defending, explaining, and arguing about is the one most fans still do not really understand. This article covers Expected Points Added: where it came from, how it works, where it breaks, and how to use it without becoming the worst guy at the bar.
The origin: where EPA came from
The seeds of Expected Points Added were planted in academic operations research decades before the metric had a name, the most-cited early work being Virgil Carter and Robert Machol’s 1971 study in Operations Research, which tried to model what a yard line was actually “worth” in expected points. The idea was elegant and largely ignored. A first-and-ten at your own 20 has, on average, a different expected outcome than a first-and-ten at midfield. Quantify that difference, and you have a way of evaluating any play by its expected value.
The version most fans encounter today owes its public existence to Brian Burke, who launched Advanced NFL Stats in the mid-2000s and built what was, at the time, the most coherent public-facing EPA model in football. Burke’s contribution was less the math — the math had existed for thirty years — and more the insistence that the public should be reading the game this way. He published win probability charts after every game, ran EPA splits by team and quarterback, and translated decades of operations research into the kind of writing that broke through the talk-radio fog.
Over the 2010s, two parallel developments accelerated EPA’s adoption. The first was the public release of NFL play-by-play data at scale, eventually formalized in the nflfastR R package, which gave every public analyst a clean, free, reproducible dataset. The second was a generation of coaches (Doug Pederson, Sean McVay, Kyle Shanahan) who took the math seriously, internally, and started making decisions on fourth down that looked insane to a Sunday afternoon viewer but were, by EPA’s lights, obvious. Pederson’s Philly Special, his 2018 Super Bowl trick-play call on fourth down, was the moment EPA went mainstream. Not because the play was an EPA play, exactly, but because the league finally understood that the decision to go for it was the entire story.
How it works: EPA in plain language
EPA, at its simplest, is the difference between the expected points before a play and the expected points after it. Every snap in football, given the down, distance, yard line, time remaining, and game state, has an expected point value — a number derived from twenty-plus years of historical NFL data that says, on average, how many points a team in that situation tends to score on the current drive.
Imagine a first-and-ten at the offense’s own 25-yard line. The historical expected points on that snap is roughly +0.4, meaning that, on average, a team in that spot ends up scoring about 0.4 points before they punt or turn over the ball. Now the offense throws a 25-yard completion. They are now first-and-ten at midfield. Expected points: +1.5. EPA on that play: +1.1. The offense improved their expected point output by 1.1 points on a single snap. Run that through every play of every game, and you have a play-by-play measurement of who actually moved the needle.
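The arithmetic in that example fits in a few lines. This is a minimal sketch, not how nflfastR or any published model computes its values: the function name `epa` and the `possession_changed` flag are illustrative assumptions, and the expected-points inputs are the rough figures from the example, not model output.

```python
def epa(ep_before: float, ep_after: float, possession_changed: bool = False) -> float:
    """Expected points added on one play, from the offense's point of view.

    ep_before: expected points for the team with the ball before the snap.
    ep_after:  expected points for whichever team has the ball after the play.
    possession_changed: True if the other team ends the play with the ball
        (assumed convention: their expected points count against the offense).
    """
    if possession_changed:
        ep_after = -ep_after
    return ep_after - ep_before


# First-and-ten at the own 25 (about +0.4) becomes first-and-ten at midfield (about +1.5).
completion = epa(0.4, 1.5)
print(round(completion, 2))  # 1.1
```

The sign flip on a change of possession is why turnovers grade so badly: the offense loses its own expected points and hands the opponent theirs on the same snap.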
Sacks are massive negative EPA events. So are turnovers, especially in the red zone. So are penalties that move a team out of field-goal range. A six-yard run on third-and-five that converts the down is a small but real positive EPA play — about +0.7. A six-yard run on third-and-twelve that fails to convert is a strongly negative EPA play — closer to -1.6. The metric refuses to treat all yards the same, because the game itself does not.
The single most important thing to understand about EPA is that, in its most quoted form, it is a per-play efficiency metric. EPA per play does not care how many plays a team runs. It cares about the average value generated per snap. That distinction is the difference between a stat that rewards a team for simply piling up snaps and a stat that actually tells you whether they’re playing well.
The critical component: success rate alongside EPA
EPA on its own can be deceiving. A single explosive play — a 75-yard touchdown bomb on third-and-fifteen — generates a massive EPA spike that distorts a team’s per-play average for an entire game. A team can post a strong EPA per play number on the back of two big plays while otherwise grinding out three-and-outs.
The standard correction is to read EPA alongside success rate. Success rate is the percentage of plays that generate positive EPA — that is, plays that, by the model, improved the team’s expected scoring position. Read together, EPA per play and success rate tell you two different stories. A team with high EPA per play and low success rate is a feast-or-famine offense, surviving on chunks. A team with high EPA per play and high success rate is the genuinely scary one: they’re moving the ball consistently, not just occasionally.
The 2023 Detroit Lions were the textbook example of an offense that won by success rate. The 2024 Tampa Bay Buccaneers, in the back half of the season, were a feast-or-famine model. Same EPA per play, very different stories on tape. A writer who quotes only EPA, without the success rate alongside, is, in my experience, writing a less honest piece than they realize.
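To make the pairing concrete, here is a small sketch with two made-up ten-play samples. The numbers are invented for illustration and do not describe any real team; the helper names `epa_per_play` and `success_rate` are my own, not part of any package.

```python
from statistics import mean


def epa_per_play(epas: list[float]) -> float:
    """Average EPA across a set of plays."""
    return mean(epas)


def success_rate(epas: list[float]) -> float:
    """Share of plays with strictly positive EPA."""
    return sum(1 for e in epas if e > 0) / len(epas)


# Two hypothetical ten-play samples with the same total EPA (+2.0).
steady = [0.2] * 10                        # small gain on every snap
boom_or_bust = [4.5, 3.5] + [-0.75] * 8    # two explosive plays, eight failures

for name, plays in [("steady", steady), ("boom_or_bust", boom_or_bust)]:
    print(name, round(epa_per_play(plays), 2), round(success_rate(plays), 2))
# steady 0.2 1.0
# boom_or_bust 0.2 0.2
```

Both samples average +0.20 EPA per play. Only the success rate shows that one of them failed on eight snaps out of ten.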

EPA vs the alternatives: a comparison
EPA is not the only advanced metric available to public NFL analysis, and the writers who use it well usually understand where it sits in the broader toolkit. A short comparison:
| Metric | What it measures | Where it shines | Where it breaks |
|---|---|---|---|
| EPA per play | Per-snap expected point change | Coach decisions, QB play, offensive efficiency | Sensitive to small samples and explosive plays |
| DVOA (Football Outsiders) | Opponent-adjusted efficiency vs league average | Season-long team comparison, opponent context | Black-box adjustments, not open-source |
| Success rate | Percentage of plays with positive EPA | Consistency over chunk plays | Treats all positive plays equally, masks magnitude |
| CPOE (Completion % Over Expected) | QB-specific passing accuracy vs expectation | Quarterback evaluation | Doesn’t capture decision-making or sack avoidance |
| PFF grades | Subjective film-based player grading | Position-level evaluation, blocking, coverage | Proprietary, not reproducible from raw data |
The honest version of NFL analytics writing reads two or three of these in parallel, not just one. A quarterback can post an elite EPA per play while throwing checkdowns to running backs racing into open field — CPOE will catch that. A team’s DVOA might rank them third in the league while their EPA per play tells a quieter story about a soft schedule. Cross-checking is the practice.
What the data needs: inputs
EPA is built on play-by-play data. The minimum inputs for an EPA model are: down, distance, yard line, time remaining, and play outcome. Most public models add quarter, score differential, and timeout state. The most sophisticated proprietary models layer in personnel groupings, formation, and pre-snap motion.
The public version of EPA available through nflfastR uses play-by-play data going back to 1999 to train its model, which means the values are calibrated against twenty-plus years of NFL outcomes. That has implications. The expected point values reflect the league as it was when the data was collected — a league that has shifted dramatically toward more passing, more spread formations, more fourth-down aggression. Older models, trained on pre-2010 data, would under-value the modern passing game. The best public models retrain on a rolling window to account for this drift.
The dirty secret of public EPA is that the model itself is not standardized. Different analysts use different cutoffs for end-of-game garbage time. Different models use different smoothing techniques. Two writers quoting “EPA per play” on the same team’s offense can produce different numbers, depending on which version of nflfastR they pulled and which filters they applied. The difference is usually small, but on a single-game basis it can be misleading. Always check the source.
Building the analysis: a weekly workflow
I will not pretend the workflow most analytics writers use is glamorous, because it is not. It is some version of: pull the play-by-play data for the games that just happened, filter for the situations you care about, group by team or player, sort by the metric you’re using, and start asking why the leaders are the leaders.
A practical weekly workflow during the regular season:
- Pull data on Tuesday morning, after the Monday night game is in the books. nflfastR refreshes once the official play-by-play is posted.
- Calculate EPA per play and success rate on offense and defense, for every team, for the games that week and for the season to date.
- Filter by situation. Early downs (first and second) tell you about base offense. Third down tells you about situational execution. Red zone tells you about the part of the field where math gets distorted.
- Compare to the season-long baseline. A team that had a +0.15 EPA per play game when their season average is +0.05 had an above-baseline week. A team that posted +0.05 when their season average is +0.18 quietly under-performed.
- Cross-reference with the tape. The numbers will tell you which teams over- or underperformed. The film tells you why.
Steps four and five are where most weekly NFL coverage falls apart. The numbers without the film are a leaderboard. The film without the numbers is a hot take. The combination is journalism.
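The grouping and baseline comparison in that workflow can be sketched in plain Python. Everything here is an assumption for illustration: the team labels, the `(team, week, epa)` record layout, and the `weekly_vs_season` helper are hypothetical, and a real pipeline would load play-by-play rows from a data source instead of a hard-coded list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical play records: (team, week, epa). These rows only exist to
# show the grouping logic; they are not real play-by-play data.
plays = [
    ("TEAM_A", 1, 0.30), ("TEAM_A", 1, -0.10), ("TEAM_A", 2, 0.40), ("TEAM_A", 2, 0.20),
    ("TEAM_B", 1, 1.20), ("TEAM_B", 1, -0.60), ("TEAM_B", 2, -0.50), ("TEAM_B", 2, -0.30),
]


def weekly_vs_season(plays, week):
    """Return {team: (week EPA/play, season-to-date EPA/play, difference)}."""
    season = defaultdict(list)
    this_week = defaultdict(list)
    for team, wk, epa in plays:
        if wk <= week:            # season to date, up to and including this week
            season[team].append(epa)
        if wk == week:
            this_week[team].append(epa)
    report = {}
    for team, epas in this_week.items():
        wk_avg, season_avg = mean(epas), mean(season[team])
        report[team] = (round(wk_avg, 3), round(season_avg, 3), round(wk_avg - season_avg, 3))
    return report


for team, row in sorted(weekly_vs_season(plays, week=2).items()):
    print(team, row)
# TEAM_A (0.3, 0.2, 0.1)
# TEAM_B (-0.4, -0.05, -0.35)
```

A positive third value marks an above-baseline week and a negative one a below-baseline week. That sorted list is the starting point for deciding which games to pull up on film.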
Where this gets weird: common mistakes
EPA, like every metric, has known failure modes. The writers who deploy it well usually name those failures out loud.
Garbage time pollution. A team trailing by 21 points in the fourth quarter is operating in a different statistical regime than a team in a competitive game. Pass attempts in those situations are not representative: the defense knows what’s coming and can either sell out against the pass or sit back in soft coverage and concede short gains, so the results say little about the quality of either unit. Some EPA filters explicitly remove “garbage time,” usually defined as plays where win probability falls outside the 5-95% band. Some don’t. Always check.
Small-sample volatility. A single game of EPA per play is not a season. A quarterback who posts +0.45 EPA per play in week one has not become Patrick Mahomes; they have had a good day against a defense that was missing their starting cornerback. Three or four games of stable data is a hint. Eight or ten is an argument. A full season is a verdict.
Offensive line confounds. EPA per play credited to a quarterback is, in reality, a joint product of the QB, the offensive line, the receivers, the play-caller, the opposing defense, and a hundred smaller variables. A QB who plays behind a strong line will look better than the same QB behind a porous one. Separating signal from environment is the hardest unsolved problem in public analytics.
Defensive EPA is undermeasured. Offensive EPA is mature. Defensive EPA is improving but still primarily measures outcomes — yards allowed, points allowed — rather than process. A defense that consistently forces opponents into low-percentage situations but gets unlucky on the conversion rate can look mediocre by EPA while actually playing very well.
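The garbage-time filter described above is simple to express. A rough sketch, assuming each play carries a pre-snap win probability under a hypothetical `"wp"` key; the 5-95% band is the common convention mentioned earlier, not a standard every analyst follows, and the plays are invented.

```python
def drop_garbage_time(plays, low=0.05, high=0.95):
    """Keep plays whose pre-snap win probability sits inside [low, high]."""
    return [p for p in plays if low <= p["wp"] <= high]


# Hypothetical plays: "wp" is the offense's pre-snap win probability.
plays = [
    {"desc": "competitive snap", "wp": 0.55, "epa": 0.30},
    {"desc": "down 21 late", "wp": 0.02, "epa": 1.10},
    {"desc": "up 24 late", "wp": 0.99, "epa": -0.40},
    {"desc": "one-score game", "wp": 0.35, "epa": -0.20},
]

kept = drop_garbage_time(plays)
all_avg = sum(p["epa"] for p in plays) / len(plays)
kept_avg = sum(p["epa"] for p in kept) / len(kept)
print(len(kept), round(all_avg, 3), round(kept_avg, 3))  # 2 0.2 0.05
```

In this toy sample the filter moves EPA per play from +0.20 to +0.05. That is how two writers can quote different numbers for the same offense.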
When EPA shines: use cases
Where EPA has earned its keep in modern coverage:
Quarterback evaluation across teams. Counting stats are unfair when comparing a quarterback on a high-volume offense to one on a methodical, run-heavy team. EPA per play normalizes that. It is, in my experience, the single most honest QB stat in the public toolkit.
Coach decision-making, especially on fourth down. Every fourth-and-one decision has a calculable EPA differential between going for it and punting. The public models — Ben Baldwin’s 4th Down Calculator is the canonical one — have made it possible to evaluate, in real time, whether a coach is making analytically sound calls. The data has driven a real, measurable shift in NFL fourth-down behavior since 2018.
Offensive efficiency comparisons across eras. With the appropriate caveats about model drift, EPA per play is a reasonable way to ask whether the 2007 Patriots offense was actually as historic as it felt, or whether the 2023 49ers were operating at a level we have not seen before. The answers are sometimes surprising.
Identifying overperforming and underperforming teams. A team’s win-loss record is largely a function of EPA per play and turnover luck. A 6-3 team with a -0.05 EPA per play and a +6 turnover margin is, in all likelihood, going to regress. A 4-5 team with a +0.08 EPA per play and a -7 turnover margin is, in all likelihood, going to win some games down the stretch. The public models predict this kind of regression with surprising reliability.
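The go-versus-punt comparison behind the fourth-down use case comes down to a probability-weighted average. The sketch below is an assumption-laden illustration, not the logic of any published calculator: the function names are hypothetical, and the conversion probability and expected-points inputs are invented. Public tools estimate each of those inputs with fitted models.

```python
def go_for_it_value(p_convert: float, ep_success: float, ep_fail: float) -> float:
    """Expected points of attempting the conversion.

    ep_success: expected points if the offense converts.
    ep_fail: expected points, from the offense's view, after a failed attempt
        (normally negative, because the opponent takes over).
    """
    return p_convert * ep_success + (1 - p_convert) * ep_fail


def go_minus_punt(p_convert: float, ep_success: float, ep_fail: float, ep_punt: float) -> float:
    """Positive means going for it is worth more expected points than punting."""
    return go_for_it_value(p_convert, ep_success, ep_fail) - ep_punt


# Illustrative inputs only: a 70% conversion chance near midfield.
edge = go_minus_punt(p_convert=0.70, ep_success=2.5, ep_fail=-2.0, ep_punt=-0.4)
print(round(edge, 2))  # 1.55
```

The result is only as good as its inputs. Shade the conversion probability down or make the failure cost steeper and the edge shrinks, which is why the same decision can grade differently across tools.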
The limits: what EPA cannot tell you
The honest version of EPA writing names the limits.
EPA cannot tell you who is going to win on Sunday. It can tell you who has been the better team in the games already played. The translation from one to the other is non-trivial. Football is high-variance, low-sample, and dominated by quarterback health, weather, and a hundred other variables that the model does not see.
EPA cannot tell you why a play worked. It can tell you that the play generated +1.4 expected points. The reason — the cornerback bit on the underneath route, the play action sucked the safety down, the right tackle finally won his rep against the edge rusher — lives on the film, not in the spreadsheet.
EPA cannot, on its own, evaluate individual players in non-quarterback positions reliably. The contributions of an offensive lineman, a slot receiver, or a tight end are real but largely invisible to a metric that credits the player who touches the ball. The combination of EPA with PFF grades and snap-by-snap charting is, currently, the best public toolkit for non-QB evaluation, and it is still imperfect.
EPA cannot replace watching the game. It supplements the watching. It tells you which moments to rewind. It tells you which decisions to interrogate. It does not tell you whether the game was beautiful, which is, in the end, why most of us watch football in the first place.
A working example: Super Bowl LVIII fourth-down decisions
The 2024 Super Bowl, Chiefs versus 49ers, produced a fourth-down case study that EPA writers will be talking about for years. In overtime, leading by three with the ball at the San Francisco 47-yard line, Andy Reid faced fourth-and-one. The conventional decision, to punt and play field position, had a calculable EPA. The aggressive decision, to go for it, had a different one. The 4th Down Calculator, in real time, put the conversion attempt at +0.32 EPA over the punt. Reid went for it. Mahomes converted with a sneak. The Chiefs won the game three plays later. The decision, as graded by every public EPA tool, was correct before the play happened. The result, after the play, confirmed it. The conversation on Monday morning was, predictably, about Mahomes’s greatness and the Chiefs’ dynasty. The decision-math conversation was a quieter parallel track that most mainstream coverage skipped.
That gap — between the decision and the result, between the math and the narrative — is where EPA does its most useful work. Every Sunday produces ten or fifteen comparable decisions that don’t make the highlight reel: a fourth-and-two near midfield in the third quarter of a tight divisional game, a punt-versus-go call at the opponent’s 35 in the second quarter of a blowout. EPA gives coaches, analysts, and fans a common vocabulary for evaluating those decisions in real time. The math has, slowly, changed the way fourth down is played in the NFL. Conversion rates are up across the league. Punts are down. The math worked, and the league responded.
One final note on the limits, less methodological and more practical. EPA, like every metric that becomes part of mainstream coverage, is increasingly used as a rhetorical weapon as much as an analytical tool. The number gets quoted to win arguments rather than understand games. A piece that quotes a quarterback’s EPA per play without naming the sample, the opponent quality, or the success rate alongside is, in 2026, sometimes participating in the cycle the metric was built to escape. The discipline is the same one analytics writers have been arguing for since the 2000s: use the number to teach, not to win.
Frequently asked questions
What is a good EPA per play number for an NFL offense?
League average EPA per play tends to hover near zero, by construction. An elite offense posts a season EPA per play of +0.15 or higher. The 2023 Detroit Lions, 2022 Buffalo Bills, and 2018 Kansas City Chiefs were in that territory. A poor offense lives in negative territory; the worst offenses in any season are around -0.15 EPA per play. The Super Bowl winner usually finishes top-five in offensive EPA per play, with a small number of exceptions where elite defense carried the team.
How does EPA differ from DVOA?
Both metrics are built play by play. EPA measures the expected point change on each snap. DVOA compares each play to a league baseline for the same situation and adjusts for the strength of the opponent, which EPA does not do in its base form. The two metrics usually agree on which teams are good. They sometimes disagree at the margins, which is where the analytical conversation gets interesting.
Can I trust single-game EPA per play numbers?
Cautiously. A single game is a small sample, and EPA per play in a single game can be inflated by one or two explosive plays. Pair single-game EPA with success rate, and treat the result as a hint rather than a verdict. The numbers start to settle after three or four games, and eight to ten games make a much firmer case.
Where can I see EPA data myself?
The most accessible public source is RBSDM.com, which presents nflfastR-derived EPA data in a clean, filterable interface. For raw data work, the nflfastR package itself is the standard. ESPN’s win probability charts are partially EPA-derived. Pro Football Focus has its own version of EPA-style metrics behind their paywall.
Sources and further reading
- Brian Burke’s Advanced NFL Stats archive — the foundational public-facing EPA work, now historical but still illuminating.
- nflfastR documentation — the open-source R package that powers most modern public EPA analysis.
- RBSDM.com — clean public-facing EPA leaderboards and filters.
- Brian Fremeau and Bill Connelly’s archive at Football Outsiders — the long-running advanced-stats brain trust that paved the way.
- Ben Baldwin’s 4th Down Calculator — real-time EPA-based decision evaluation, and the tool most quoted in Monday-morning coach criticism.
The play that opened this article — Mahomes to Moore, fourth-and-one, two minutes left — was a +4.9 EPA event. Whether you watched the catch as a Chiefs fan, an Eagles fan, or someone who happened to have the broadcast on for the commercials, the number describes what you felt. The metric is not a replacement for the moment. It is the only honest way of comparing this moment to the next one. For the broader frame on how to read sports numbers without becoming insufferable, our working primer on sports analytics is the natural next read.



