February 4, 2018. Fourth-and-goal from the one-yard line, late second quarter of Super Bowl LII. Philadelphia leads New England 15-12. The conventional decision (kick the field goal, settle for the safe six-point lead at the half) has been the orthodox call for sixty years of professional football. Doug Pederson, working off a single laminated card his analytics staff had handed him before the game, instead signals for the offense to stay on the field. The play that follows is the Philly Special: running back Corey Clement takes a direct snap and pitches to backup tight end Trey Burton, who throws to a quarterback standing wide open in the end zone. Nick Foles becomes the first quarterback to catch a touchdown pass in a Super Bowl. The Eagles win the championship. The chalkboard wins the era. By the public models that existed at the time, the pre-snap numbers had favored going for it by a clear margin. The math had said this for years. Pederson was, on the night, the first head coach to fully trust it on the biggest stage.
The fourth-down revolution in the NFL is the cleanest case study in modern sports of an analytical recommendation becoming, slowly and grudgingly, the new operating consensus. Twenty years ago, the league’s conventional wisdom on fourth down was almost uniformly conservative: kick the field goal when in range, punt almost everywhere else, go for it only in late-game desperation. Through the years that followed, the public models and research (Brian Burke’s Advanced NFL Stats, the Football Outsiders DVOA team, an academic literature stretching back to the 1970s) kept saying the same thing: that NFL coaches were dramatically too conservative on fourth down, that the math overwhelmingly favored more aggressive decisions, and that the league was leaving points on the field every Sunday. The gap between what the math said and what the coaches did was, for most of the 2000s and 2010s, the most-quoted analytical fact in football. It is also, in 2026, the gap that the league has slowly, partially, and measurably closed.
I have been writing about football analytics since 2018, and this is the single trend in NFL coaching that has shaped public coverage most measurably. This article unpacks the fourth-down revolution: how analytics rewired coaching decisions, what the math actually said, where the resistance came from, and where the math still has work to do.
The origin: where fourth-down math came from
The paper that opened the modern academic conversation about NFL fourth-down decision-making is economist David Romer’s 2006 “Do Firms Maximize? Evidence from Professional Football.” Romer’s argument, dressed in economics-journal language, was straightforward: NFL coaches systematically failed to maximize their expected point output, particularly on fourth down. The data showed coaches kicking field goals from situations where going for it produced higher expected points, and punting from situations where going for it or kicking a field goal both produced higher expected points than the punt. The pattern held across hundreds of fourth-down situations, across multiple seasons of data. The conclusion was that NFL coaches were not rational expected-value maximizers; they were loss-averse decision-makers operating under the social pressure of conservative orthodoxy.
Romer’s paper did not immediately change much. NFL head coaches in 2006 were not reading the Journal of Political Economy, where it appeared. The translation work happened over the next decade, primarily through Brian Burke at Advanced NFL Stats and the Football Outsiders team. Burke published an interactive fourth-down decision tool in 2009 that, for the first time, gave fans and writers a public-facing way to evaluate any specific fourth-down situation against the model’s recommendation. The tool was widely cited. The coaches mostly ignored it.
The Eagles’ run to Super Bowl LII after the 2017 season, and Pederson’s increasingly aggressive use of fourth-down attempts throughout that postseason, was the moment the public-facing arguments started arriving in actual game decisions. Within a few years, multiple head coaches (Andy Reid, Sean McVay, Kyle Shanahan, Brian Daboll) had visibly increased their fourth-down go rates relative to historical baselines. By 2022, ESPN was using Ben Baldwin’s 4th Down Calculator on broadcasts to grade decisions in real time. By 2024, the league’s average go-for-it rate on fourth-and-short situations had roughly doubled relative to the 2010-2015 baseline. The math had won, mostly, in roughly fifteen years.
How fourth-down math works: in plain language
The core calculation compares the expected points (or expected win probability) of three options: going for it, kicking a field goal, or punting. Each option has a calculable expected value based on historical NFL data for similar situations.
For going for it, the calculation requires two probabilities: the conversion success rate (which varies by yards-to-go) and the expected points conditional on conversion or failure. A fourth-and-one in the opponent’s territory has a historical conversion rate of about 67%. If converted, the team retains possession with first-and-ten, generating expected points of approximately +1.8 from the new position. If unconverted, the opponent takes possession at the spot, with expected points for them of about +1.1. The net expected points of going for it is therefore (0.67 × 1.8) – (0.33 × 1.1) ≈ +0.84.
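The go-for-it arithmetic above can be written as a small function. This is a minimal sketch using only the figures quoted in the paragraph; the function and parameter names are illustrative, not taken from any published model.

```python
def go_for_it_ev(p_convert: float, ep_if_converted: float, opp_ep_if_failed: float) -> float:
    """Expected points of going for it, from the offense's point of view.

    A conversion is worth the offense's expected points at the new first down;
    a failure costs the expected points the opponent gains by taking over.
    """
    return p_convert * ep_if_converted - (1.0 - p_convert) * opp_ep_if_failed


# Figures quoted in the text for a fourth-and-one in opponent territory.
ev_go = go_for_it_ev(p_convert=0.67, ep_if_converted=1.8, opp_ep_if_failed=1.1)
print(f"{ev_go:+.2f}")  # prints +0.84
```

Keeping the three inputs separate makes it easy to see how much the answer moves when any one of them changes.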
For the field goal, the expected value depends on the distance. A 40-yard field goal in modern conditions converts at about 88%; if made, it’s worth 3 points minus the opposing team’s expected return value of about 0.5 points after the ensuing kickoff. If missed, the opponent takes over at the spot of the kick, with their expected points being substantial. The net expected value is roughly +2.0 in this scenario.
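The field-goal side can be sketched the same way. The text gives the make rate (88%) and the kickoff cost (about 0.5 points) but not the opponent's expected points after a miss, so the 1.7 used below is an assumption, chosen only because it reproduces the roughly +2.0 quoted above.

```python
def field_goal_ev(p_make: float, opp_ep_after_kickoff: float,
                  opp_ep_after_miss: float, points: float = 3.0) -> float:
    """Expected points of a field-goal attempt, from the kicking team's view."""
    value_if_made = points - opp_ep_after_kickoff   # 3 points minus the kickoff cost
    value_if_missed = -opp_ep_after_miss            # opponent takes over in good position
    return p_make * value_if_made + (1.0 - p_make) * value_if_missed


# 0.88 and 0.5 come from the text; 1.7 is an ASSUMED value for the miss case.
ev_fg = field_goal_ev(p_make=0.88, opp_ep_after_kickoff=0.5, opp_ep_after_miss=1.7)
print(f"{ev_fg:+.2f}")  # prints +2.00
```

Lowering `p_make` for a longer kick or a weaker kicker shows how quickly the field goal's value falls with distance.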
For the punt, the expected value depends on field position and punter quality. A 45-yard punt from the opponent’s 40-yard line is, mechanically, not viable (the ball would go through the end zone for a touchback). A 35-yard punt from midfield is, depending on coverage, worth roughly +0.3 net expected points.
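The decision: take the option with the highest expected value, with every option valued from the same spot on the field. The figures above are illustrative and come from different field positions, so they cannot be ranked against one another directly; the +2.0 field-goal value, for instance, belongs to a 40-yard attempt, not necessarily to the kick available on the fourth-and-one in the first example. Run the three calculations for one situation and the rule is mechanical: if going for it (+0.84 in the example) exceeds the field-goal value from that spot, and the punt is either unavailable or worth less, the math says go.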
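The decision rule itself is just a maximum over the available options. In the sketch below, only the +0.84 comes from the text; the field-goal value is hypothetical, and all values are assumed to have been computed for the same down, distance, and yard line.

```python
from typing import Optional


def best_option(option_values: dict[str, Optional[float]]) -> str:
    """Return the available option with the highest expected value.

    A value of None marks an option that is not realistically available
    from this spot (for example, a punt from deep in opponent territory).
    """
    available = {name: ev for name, ev in option_values.items() if ev is not None}
    if not available:
        raise ValueError("at least one option must be available")
    return max(available, key=available.get)


# Hypothetical values for ONE spot on the field. Only the +0.84 is from the
# text; the field-goal figure is invented for illustration.
same_spot = {"go": 0.84, "field_goal": 0.60, "punt": None}
choice = best_option(same_spot)
print(choice)  # prints go
```

The comparison is only meaningful when all three values describe the same situation; values taken from different yard lines cannot be ranked this way.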
The single most important insight of the framework is that field-goal value is highly position-dependent and is overweighted by the human eye. A 50-yard field goal feels safer than a fourth-and-one conversion attempt. Statistically, in many specific cases, the conversion attempt is the better expected-value play, especially when the field-goal kicker is below league-average from the relevant distance.
The critical component: win probability vs expected points
The fourth-down math has two parallel framings that occasionally produce different recommendations. The expected-points framework asks which decision generates the highest expected point output. The win-probability framework asks which decision maximizes the probability of winning the game.
Most of the time, the two frameworks agree. But in high-leverage end-of-game situations, they diverge meaningfully. Late in a one-score game, a team’s win probability becomes much more sensitive to specific game-state factors — time remaining, timeouts, the opposing offense’s strength. A decision that maximizes expected points (say, kicking a field goal to extend a lead from one to four points with three minutes left) may slightly decrease win probability (because the alternative, going for it to potentially clinch the game with a touchdown, has a higher probability-of-victory profile even if its expected-point output is slightly lower).
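A toy example makes the divergence concrete. Every number below is hypothetical, chosen to mirror the late-game scenario described above: a team leading by one with a choice between a short field goal and a fourth-and-goal attempt. Points are counted for this possession only, which is a simplification.

```python
# Each outcome: (probability, points scored on this possession, win probability afterwards)
Outcome = tuple[float, float, float]


def expected_points(outcomes: list[Outcome]) -> float:
    return sum(p * pts for p, pts, _ in outcomes)


def expected_win_prob(outcomes: list[Outcome]) -> float:
    return sum(p * wp for p, _, wp in outcomes)


# HYPOTHETICAL numbers for illustration only.
options: dict[str, list[Outcome]] = {
    "field_goal": [(0.95, 3.0, 0.80), (0.05, 0.0, 0.60)],  # make / miss
    "go":         [(0.40, 7.0, 0.97), (0.60, 0.0, 0.70)],  # touchdown / stopped
}

by_ep = max(options, key=lambda name: expected_points(options[name]))
by_wp = max(options, key=lambda name: expected_win_prob(options[name]))
print(by_ep, by_wp)  # prints field_goal go
```

With these inputs the field goal is worth slightly more in expected points (2.85 against 2.80), while going for it is worth more in win probability (about 0.81 against 0.79), which is the kind of split the two framings can produce late in close games.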
Ben Baldwin’s 4th Down Calculator, the leading public-facing tool, defaults to win-probability optimization, which is generally considered the more sophisticated framing. The expected-points framework is easier to explain and remains the entry-level analytical argument. The win-probability extensions are where the analytical conversation has matured.

Fourth-down decision tools vs the alternatives: a comparison
The major public approaches to fourth-down decision-making:
| Approach | What it does | Where it shines | Where it breaks |
|---|---|---|---|
| Ben Baldwin’s 4th Down Calculator | Real-time win-probability optimization | Public-facing tool; widely cited | Limited situational tailoring (specific QB strength, etc.) |
| EdjAnalytics decision tool | Win-probability with proprietary adjustments | Used inside multiple NFL clubs | Proprietary; less public transparency |
| NFL Game Charge | League-published in-broadcast decision grading | Mainstream coverage integration | Smoothed for general audience; less granular |
| Romer-style EP framework | Expected-points-based recommendation | Conceptual clarity, academic origin | Misses end-game leverage effects |
| Old-school coaching gut | Pattern-matched intuition from years of football | Captures some scheme/personnel context | Systematically too conservative; outperformed by models |
The reality of fourth-down coaching in 2026 is some hybrid of the calculator output and the coaching gut. The hybrid is, in my opinion, the right approach — the calculator captures the broad probability landscape, while the coach captures the specific personnel and matchup considerations that the model doesn’t see. The hybrid coaches (Andy Reid, Sean McVay, Kyle Shanahan, Mike Vrabel) have been the most consistently aggressive fourth-down decision-makers of the modern era.
What the data needs: inputs
The fourth-down decision model requires several layers of historical data. The minimum inputs are play-by-play data for the full universe of fourth-down situations (1999 to present), field-goal accuracy curves by distance for league-average and specific kickers, punt result distributions by field position and punter quality, and conversion success rates by yards-to-go.
The leading public data source is nflfastR, which provides the play-by-play data and most of the contextual variables. Ben Baldwin’s calculator uses nflfastR-derived models, supplemented with his own win-probability framework. Commercial alternatives — PFF’s decision-grading tool, EdjAnalytics’ proprietary version — use similar data with additional weight on specific play personnel.
The harder inputs to acquire are the situational adjustments: a quarterback’s specific short-yardage conversion rate (which can differ from league average by 5-10 percentage points for elite vs poor short-yardage QBs), an opposing defense’s specific stopping rate (which similarly varies), and weather/stadium effects on field-goal accuracy. The fullest models incorporate these; the public versions usually do not, which is part of why the calculator’s recommendations occasionally diverge from optimal coaching decisions when context matters.
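One way to see why a 5-10 point swing in conversion rate matters is to compute the break-even conversion rate, the rate at which going for it ties the best alternative. The sketch below solves p × S − (1 − p) × F = A for p. The additive adjustment for offense and defense is a simplifying assumption for illustration, not how any particular model does it, and the 0.3 alternative value is used here as a hypothetical.

```python
def break_even_conversion_rate(alt_ev: float, ep_if_converted: float,
                               opp_ep_if_failed: float) -> float:
    """Conversion rate at which going for it equals the alternative's value."""
    return (alt_ev + opp_ep_if_failed) / (ep_if_converted + opp_ep_if_failed)


def adjusted_conversion_rate(league_rate: float, offense_delta: float = 0.0,
                             defense_delta: float = 0.0) -> float:
    """Shift the league-average rate for a specific matchup (ASSUMED additive).

    offense_delta > 0 means a better-than-average short-yardage offense;
    defense_delta > 0 means a better-than-average short-yardage defense.
    """
    return min(1.0, max(0.0, league_rate + offense_delta - defense_delta))


threshold = break_even_conversion_rate(alt_ev=0.3, ep_if_converted=1.8, opp_ep_if_failed=1.1)
weak_offense = adjusted_conversion_rate(0.67, offense_delta=-0.10)
print(f"break-even {threshold:.3f}, adjusted rate {weak_offense:.2f}")
```

Comparing the adjusted rate with the threshold shows whether a matchup adjustment is large enough to flip the recommendation or merely narrows the margin.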
Building the analysis: a working framework
The practical workflow for evaluating fourth-down decisions in writing:
- Identify the situation: down, distance, yard line, score, time remaining, timeouts.
- Pull the model’s recommendation from the 4th Down Calculator or equivalent. Note both the win-probability-optimal and expected-points-optimal calls — they sometimes differ.
- Compare to what the coach actually did. A coach who went for it when the model recommended a field goal is a different kind of analytical story than one who punted when the model recommended going for it.
- Adjust for specific context: quarterback short-yardage history, opposing defense’s recent fourth-down stopping rate, weather, kicker reliability. The adjustments are usually small individually but can compound.
- Write the piece around the decision, not the outcome. An analytically sound decision that failed is not a wrong call; the variance just hit. The piece should evaluate the decision based on what was knowable in advance.
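The workflow above can be sketched as a small record plus a classifier. The field names and the aggressiveness ordering (punt, then field goal, then go) are assumptions made for illustration. The play's outcome is stored but deliberately not used in the grading, in keeping with the last step.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMED ordering from least to most aggressive.
AGGRESSION = {"punt": 0, "field_goal": 1, "go": 2}


@dataclass(frozen=True)
class FourthDown:
    distance: int             # yards to go
    yards_to_end_zone: int
    score_margin: int         # offense's score minus opponent's
    seconds_left: int
    timeouts: int
    model_call: str           # "punt", "field_goal", or "go"
    coach_call: str
    converted: Optional[bool] = None  # outcome; never read by story_type


def story_type(play: FourthDown) -> str:
    """Classify the decision relative to the model, ignoring the outcome."""
    diff = AGGRESSION[play.coach_call] - AGGRESSION[play.model_call]
    if diff == 0:
        return "agreed with model"
    return "more aggressive than model" if diff > 0 else "more conservative than model"


play = FourthDown(1, 45, -3, 600, 3, model_call="go", coach_call="punt")
print(story_type(play))  # prints more conservative than model
```

Because `converted` plays no part in the classification, a failed attempt and a successful one from the same situation get the same grade.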
Where this gets weird: common mistakes
The common pitfalls of fourth-down writing:
Outcome bias. A coach who went for it on fourth-and-one and failed gets blamed; the same coach who succeeded gets credit. The decision is the same in both cases. The model’s expected-value calculation didn’t change based on the result. Outcome-biased coverage is the most common failure mode in fourth-down analysis.
Ignoring specific context. The model’s general recommendation is built on league-average inputs. A team with an elite short-yardage running game (peak Bills, peak 49ers) has higher conversion rates than league average and should go for it more aggressively than the calculator suggests. A team with a struggling short-yardage offense should be slightly more conservative. The careful analysis names these adjustments.
Garbage-time pollution. A coach trailing by 17 with five minutes left makes fourth-down decisions that are essentially forced. Evaluating those decisions against the standard win-probability model can produce misleading conclusions. Filter for competitive game-state when possible.
Treating the calculator as gospel. The model is a strong baseline. It is not infallible. Specific personnel matchups, scheme mismatches, and momentum considerations (which are at best partially measurable and which the model usually does not capture well) can shift the optimal decision in either direction. A coach who deviates from the calculator’s recommendation with reasonable justification is making analytical judgment calls, not ignoring the math.
The narrative cycle around aggressive coaches. A coach who goes for it often gets credit when it works and disproportionate blame when it doesn’t. The pattern produces media incentives for coaches to be slightly less aggressive than the math recommends, because the asymmetric scrutiny is real. Some of the most analytically sophisticated coaches have explicitly discussed this trade-off in press conferences.
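The garbage-time filter mentioned above is simple to apply once each play carries a pre-snap win probability. The sketch below assumes plays are plain dictionaries with a `pre_snap_wp` key; both that key name and the 5%/95% cutoffs are hypothetical choices, not a standard.

```python
def competitive_only(plays: list[dict], low: float = 0.05, high: float = 0.95) -> list[dict]:
    """Keep only plays where the offense's pre-snap win probability is in [low, high]."""
    return [play for play in plays if low <= play["pre_snap_wp"] <= high]


# HYPOTHETICAL plays for illustration.
plays = [
    {"id": 1, "pre_snap_wp": 0.48, "coach_call": "go"},
    {"id": 2, "pre_snap_wp": 0.02, "coach_call": "go"},    # trailing big, late: forced
    {"id": 3, "pre_snap_wp": 0.97, "coach_call": "punt"},  # game already in hand
    {"id": 4, "pre_snap_wp": 0.71, "coach_call": "field_goal"},
]

kept = competitive_only(plays)
print([play["id"] for play in kept])  # prints [1, 4]
```

Tightening or loosening the cutoffs changes the sample, so it is worth stating the thresholds whenever a go-rate or decision grade is reported.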
When fourth-down math shines: use cases
The applications where the framework has earned its keep:
Coach evaluation across seasons. A coach’s fourth-down decision-making, evaluated against the calculator’s recommendations across an entire season, produces a more reliable read on their analytical sophistication than any single game’s calls. Andy Reid’s career-long aggressiveness, Sean McVay’s evolution from conservative to aggressive, Bill Belichick’s situational adaptability — these patterns are real in the data.
Identifying coaches who are leaving points on the field. A head coach whose decisions lag the model’s recommendation by 0.3+ expected points on each call where the two disagree is, across the ten to fifteen such calls a season might contain, leaving roughly 3-5 expected points on the table. That is about a field goal or more given away over the season, and the math identifies these patterns more cleanly than mainstream coverage typically does.
In-game decision support. Modern NFL teams have analysts on the sideline with real-time decision tools. The head coach can ask “what does the chart say” and get a probability-optimal answer within seconds. The integration of analytical tools into in-game decision-making has been one of the most measurable infrastructure changes in the sport over the last decade.
Postseason scenario analysis. Playoff games concentrate the leverage of fourth-down decisions. A coach who is aggressive in the right situations can swing the win-probability of an entire playoff run by several percentage points. The retroactive analysis of Super Bowl decisions — both winning and losing — is one of the more rigorous applications of the framework.
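The coach-evaluation use cases above come down to one quantity: the gap between the best available option and the option the coach chose, summed over many decisions. A minimal sketch, with hypothetical expected-value numbers:

```python
def ep_left_on_field(decisions: list[tuple[dict[str, float], str]]) -> tuple[float, float]:
    """Return (total, per-decision average) expected points forgone.

    Each decision is (option_values, chosen), where option_values maps each
    available option to its expected value for that situation.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    lost = [max(values.values()) - values[chosen] for values, chosen in decisions]
    total = sum(lost)
    return total, total / len(lost)


# HYPOTHETICAL season sample for illustration.
decisions = [
    ({"go": 0.84, "field_goal": 0.60}, "field_goal"),  # gave up 0.24
    ({"go": 0.10, "punt": 0.30}, "punt"),              # matched the best option
    ({"go": 1.20, "field_goal": 0.90}, "go"),          # matched the best option
]

total_lost, per_decision = ep_left_on_field(decisions)
print(f"total {total_lost:.2f}, per decision {per_decision:.2f}")  # prints total 0.24, per decision 0.08
```

The same function works with win-probability values in place of expected points, which is the more appropriate unit for late-game and playoff decisions.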
A working example: the 2023 Detroit Lions
The 2023 Detroit Lions are one of the cleanest fourth-down case studies of the modern era. Head coach Dan Campbell went for it on fourth down at a rate well above league average, with several high-profile decisions during the team’s run to the NFC Championship game that the 4th Down Calculator graded as optimal but mainstream coverage second-guessed in real time. The most-discussed sequence came in the NFC Championship game against San Francisco, in which Campbell elected to go for it on a fourth-and-three within field-goal range in the second half, failed to convert, and was widely criticized when the 49ers scored on the ensuing possession.
The post-game analytical writing — at The Athletic, at the various analytics-friendly podcasts, at the calculator-driven Twitter community — was largely supportive of Campbell’s decision. The 4th Down Calculator had recommended going for it. The expected-value math favored the call. The outcome was the variance, not the process. The piece that ran in the New York Times framing Campbell as “reckless” was a textbook case of outcome bias dominating analytical judgment.
The deeper case study is the multi-season pattern. Across 2022, 2023, and 2024, Campbell’s fourth-down go-rate substantially exceeded league average, and his actual conversion success rate was about league average or slightly above. The expected-value gain from his aggressiveness, summed across three seasons, was roughly 18-22 net expected points — the equivalent of one to two extra wins per season. The Lions’ rise from perennial bottom-tier to consistent playoff contender during this period had many causes; the fourth-down arithmetic was one of them.
The limits: what fourth-down math cannot tell you
The honest version of this writing names the limits.
Fourth-down math cannot predict individual play outcomes. The model gives you the probability of conversion, the expected points of each option, and the win-probability impact of the decision. It cannot tell you whether the specific play call will work. That’s why the variance exists.
Fourth-down math cannot fully capture personnel matchups. A team with an elite short-yardage running back facing a defense with a porous interior should go for it more aggressively than the league-average calculator suggests. The model knows the average; the coach knows the specific matchup. The hybrid coaches do better than the calculator alone.
Fourth-down math cannot model emotional and momentum effects, to the extent those are real. The data does not strongly support large momentum effects in football — the literature on this is mixed — but they may exist in specific situations, and the calculator does not try to capture them. Coaches who claim to integrate momentum into their decision-making are working off intuition that may or may not be empirically valid.
Fourth-down math cannot eliminate the asymmetric scrutiny problem. Even an analytically-aggressive coach making technically correct decisions will, when those decisions fail, face coverage that questions their judgment. The math says the decisions are correct; the media incentives say to be slightly more conservative. The tension is structural and the math alone cannot resolve it.
One additional limit: the league-wide adoption of the math has, by 2026, started to close the gap between what the model recommends and what most coaches do. The arbitrage that existed in the 2010s — when bold coaches could exploit conservative orthodoxy across the league — has shrunk meaningfully as more teams hire analytics staffs and use real-time decision tools. The next frontier is probably in situational tailoring rather than general aggressiveness, but the public-facing data infrastructure for that work is still catching up.
Frequently asked questions
What is a “good” fourth-down go-rate for an NFL head coach?
The league-average go-rate in situations where the calculator favors going for it is roughly 60-65% in 2024-25, up from about 30-35% in 2014-15. Elite analytical coaches go for it on 80%+ of model-favored opportunities. The remaining gap between observed and optimal is partly explainable by personnel and matchup considerations the calculator doesn’t see; some of it is residual conservatism the analytics community continues to push back on.
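The go-rate described here is conditional: it counts only the situations where the model favored going for it. A minimal sketch, with a hypothetical list of (model call, coach call) pairs:

```python
def go_rate_when_favored(calls: list[tuple[str, str]]) -> float:
    """Share of model-favored 'go' situations in which the coach actually went for it."""
    coach_calls = [coach for model, coach in calls if model == "go"]
    if not coach_calls:
        raise ValueError("no situations where the model favored going for it")
    return sum(1 for coach in coach_calls if coach == "go") / len(coach_calls)


# HYPOTHETICAL (model_call, coach_call) pairs for illustration.
calls = [
    ("go", "go"),
    ("go", "punt"),
    ("go", "go"),
    ("punt", "punt"),        # ignored: the model did not favor going for it
    ("go", "field_goal"),
]

rate = go_rate_when_favored(calls)
print(f"{rate:.0%}")  # prints 50%
```

Conditioning on the model's recommendation is what separates this figure from a raw go-for-it rate, which also reflects how often a team finds itself in favorable situations.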
Why did Bill Belichick get a reputation for analytical aggressiveness?
Belichick was, as of 2009, one of the first NFL head coaches to publicly go against the conservative orthodoxy. His famous fourth-and-two decision against the Colts that year, in which he went for it from his own 28-yard line and failed, drew widespread criticism. Brian Burke’s win-probability analysis at Advanced NFL Stats at the time favored going for it; the cultural reaction ran far ahead of anything the numbers could justify. Belichick’s reputation as an analytical coach was earned partly through that and similar high-profile situational choices.
Does the fourth-down math apply to college football?
Yes, with adjustments. The same expected-value framework applies, but the specific conversion rates, field-goal accuracy, and punt outcomes differ between the NFL and college football. College football kickers are less accurate; offenses tend to convert short-yardage at slightly higher rates; the win-probability calculations require their own calibration. The cfbfastR ecosystem has produced public tools modeled on the NFL versions for the college game.
Where can I see fourth-down recommendations in real time?
Ben Baldwin’s 4th Down Calculator, hosted at rbsdm.com, provides real-time recommendations for live NFL games. ESPN’s broadcasts increasingly surface in-game decision grades. The NFL’s own Game Charge feature publishes post-game decision evaluations. PFF and EdjAnalytics offer subscription versions with deeper situational adjustments.
Sources and further reading
- Ben Baldwin’s 4th Down Calculator — the public-facing tool that has shaped most of the modern fourth-down conversation.
- Brian Burke’s Advanced NFL Stats archive — the foundational decade-plus of public writing on fourth-down decision-making.
- David Romer’s “Do Firms Maximize?” — the 2006 academic paper that opened the modern analytical conversation about NFL coaching decisions.
- EdjAnalytics — the commercial provider used by multiple NFL clubs for in-game decision support.
- Bill Barnwell’s writing — long-form NFL analysis that consistently integrates fourth-down decision evaluation.
The Philly Special, the laminated card, the fourth-and-goal from the one-yard line that ended in a Super Bowl touchdown: these were not the moments the fourth-down math began. They were the moments the math finally arrived in the broadest cultural conversation. The data had been telling the same story since the early 2000s. The coaches finally listened. The next decade of NFL coaching is going to be about the situational refinements that the next-generation models are starting to produce. For the broader frame on how expected-value thinking applies across football analysis, our guide to EPA is the natural companion piece.



