SP+ Explained: Bill Connelly’s Index and Reading Saturday Chaos Through Math

November 10, 2007, a Saturday afternoon in Columbus, Ohio. The Illini, sizable underdogs on the road, beat top-ranked Ohio State 28-21, and the BCS title race blows up. Two weeks later, Kansas, undefeated and long overlooked nationally, plays Missouri in the Border War for the chance to keep alive a national championship dream that the polls had been quietly underestimating for two months. The era of “you have to actually beat someone good to be considered good” was breaking down in front of a Saturday afternoon audience that did not, mostly, have the vocabulary for what was happening. Somewhere in Lawrence or Tuscaloosa or central Texas, a young writer named Bill Connelly was already a year into the side project that would, fifteen years later, become the most-cited advanced metric in college football: SP+.

The college football season is a story problem with 134 unknowns. Twelve regular-season games each, played against schedules with wildly different difficulties, in a sport where transfer portal volatility, conference realignment, and a 25-percent annual roster turnover make every comparison a sample-size nightmare. The NFL, with thirty-two relatively comparable franchises and a tightly structured schedule, looks orderly by comparison. College football, with FBS, FCS, Power Four, Group of Five, and the gradient of competition that runs from Alabama to Akron, has spent decades trying to answer one question: which team is actually the best? The polls had opinions. The computer rankings had algorithms. Bill Connelly built a different kind of tool, and it has, slowly, become the analytical backbone of modern college football coverage.

I have been writing about football analytics since 2014, and SP+ is the metric I find myself reaching for most often when a college football conversation gets loose. This article covers what it is, where it came from, what it does well, where it breaks, and how to read it without falling for the bowl-game noise.

The origin: where SP+ came from

SP+ began as the F/+ system at Football Outsiders in the late 2000s, the brainchild of Bill Connelly, who had been blogging college football for SB Nation and writing for various corners of the football-analytics community. The F/+ system combined Brian Fremeau’s drive-based FEI ratings with Connelly’s own play-by-play S&P system, producing a hybrid that captured both per-drive efficiency and per-play execution.

Connelly’s specific contribution was the S&P side — a play-by-play metric built on three core components: Success Rate (the percentage of plays that meet a down-and-distance success threshold), Explosiveness (the average yardage of successful plays), and Equivalent Points (a college-football analog of the NFL’s expected points model). Combined and opponent-adjusted, S&P became the leading public play-by-play advanced metric for college football.
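To make the first two components concrete, here is a minimal sketch of Success Rate and Explosiveness on a handful of plays. The down-by-down thresholds (50% of the needed yards on first down, 70% on second, 100% on third and fourth) are a commonly cited definition and are an assumption here; the exact cutoffs inside SP+ may differ. The `Play` class and the sample drive are hypothetical.

```python
from dataclasses import dataclass

# Assumed success thresholds: share of yards-to-go that must be gained, by down.
SUCCESS_SHARE = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


@dataclass
class Play:
    down: int
    distance: int       # yards to go
    yards_gained: int


def is_success(play: Play) -> bool:
    """A play succeeds if it gains the required share of yards to go."""
    return play.yards_gained >= SUCCESS_SHARE[play.down] * play.distance


def success_rate(plays) -> float:
    """Fraction of plays that meet the success threshold."""
    return sum(is_success(p) for p in plays) / len(plays)


def explosiveness(plays) -> float:
    """Average yards gained on successful plays only."""
    gains = [p.yards_gained for p in plays if is_success(p)]
    return sum(gains) / len(gains) if gains else 0.0


# Hypothetical four-play sample.
drive = [
    Play(down=1, distance=10, yards_gained=6),   # success: 6 >= 5
    Play(down=2, distance=4, yards_gained=2),    # failure: 2 < 2.8
    Play(down=3, distance=2, yards_gained=15),   # success: 15 >= 2
    Play(down=1, distance=10, yards_gained=3),   # failure: 3 < 5
]

print(success_rate(drive))    # 0.5
print(explosiveness(drive))   # 10.5
```

The two numbers answer different questions: how often an offense stays on schedule, and how much it gains when it does.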

The “SP+” name emerged when Connelly moved the system to ESPN in the late 2010s. The metric was renamed, refined, and expanded to include preseason projections built on returning production, recruiting, transfer-portal additions, and recent-history baselines. By the time Connelly was writing the system into ESPN columns and SB Nation deep dives, SP+ had become the closest thing college football had to a public-facing DVOA — a single number that captured a team’s per-play quality, adjusted for opponent strength, and that updated weekly as the season unfolded.

The Football Outsiders era ended with the site’s pause, but SP+ continued under ESPN’s stat database. Connelly’s own newsletter, Study Hall, and ESPN’s college football coverage now serve as the primary public outlets for the metric and the writing that interprets it.

How it works: SP+ in plain language

SP+ is, structurally, an opponent-adjusted per-play efficiency rating, expressed in points-per-game equivalent. The simplest way to read it: if Team A’s SP+ is +20.0 and Team B’s is +5.0, the model thinks Team A is, on a neutral field, a 15-point favorite. The number folds together offensive efficiency, defensive efficiency, and special teams performance, weighted by their predictive value.
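The neutral-field reading is simple subtraction, sketched here with a hypothetical helper name. It ignores home-field advantage and anything else a full projection would layer on top.

```python
def neutral_field_spread(rating_a: float, rating_b: float) -> float:
    """Projected margin for team A over team B on a neutral field.

    A positive result means team A is favored by that many points.
    """
    return rating_a - rating_b


print(neutral_field_spread(20.0, 5.0))   # 15.0
```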

The components, in order of impact:

Offensive SP+. The opponent-adjusted offensive rating. A team’s offense generates Success Rate, Explosiveness, and Equivalent Points on every snap. The model takes the raw output, adjusts for the quality of the defenses faced, and produces a per-play efficiency number expressed as points above league average. Elite offenses (Oregon 2024, Georgia 2022) post +20 or higher. The worst Power Four offenses live below +5.

Defensive SP+. Same structure, opposite direction. A team’s defense allows Success Rate, Explosiveness, and Equivalent Points; the model adjusts for offensive quality faced. Elite defenses (Michigan 2023, Georgia 2021) suppress opponents to a defensive rating of -15 or lower (lower is better here). Bad Group of Five defenses can post defensive SP+ in the high single digits or above.

Special Teams SP+. Field-goal accuracy, kickoff and punt return value, punt accuracy. A smaller component, but in close games it can matter. The math gives special teams roughly one-tenth the weight of offense and defense combined.

Combine the three with appropriate weights, and the result is a single SP+ number that lets you compare Alabama to Akron, the SEC to the MAC, and 2007 Hawaii to 2024 Boise State, all on the same scale.

The critical component: opponent adjustment

The single most important feature of SP+ — and the feature that separates it from any per-play stat you can pull off a stats page — is opponent adjustment. College football is not the NFL. A team’s raw offensive output is almost meaningless without knowing which defenses they played against. A team can post 45 points per game on the surface and be playing the eighth-worst slate of defenses in FBS, in which case their offensive quality is far more pedestrian than the box score suggests.

The opponent adjustment is iterative. The model assigns each team an initial rating based on their per-play performance. It then re-rates every team in the country based on the new ratings of their opponents, which produces a new round of opponent quality estimates. The process repeats until the ratings stabilize. The output is a set of team ratings in which each team’s quality has been re-evaluated against the actual quality of the teams they faced, recursively, across the entire season.
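The loop described above can be sketched with a deliberately simple stand-in. This is not Connelly's algorithm: it rates teams on final scoring margin rather than per-play efficiency, and the update rule, the damping step, and the four-team schedule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def opponent_adjusted_ratings(games, tol=1e-9, max_iter=10_000):
    """Iteratively rate teams from (team_a, team_b, margin_for_a) results."""
    results = defaultdict(list)              # team -> [(opponent, margin), ...]
    for a, b, margin in games:
        results[a].append((b, margin))
        results[b].append((a, -margin))

    ratings = {team: 0.0 for team in results}    # start everyone at average
    for _ in range(max_iter):
        updated = {}
        for team, played in results.items():
            # Target rating: average of (margin + current opponent rating).
            target = sum(m + ratings[opp] for opp, m in played) / len(played)
            # Damping (averaging old and new) keeps the loop from oscillating.
            updated[team] = 0.5 * ratings[team] + 0.5 * target
        mean = sum(updated.values()) / len(updated)
        updated = {t: r - mean for t, r in updated.items()}   # league average stays 0
        shift = max(abs(updated[t] - ratings[t]) for t in updated)
        ratings = updated
        if shift < tol:                       # ratings have stabilized
            break
    return ratings


# Hypothetical schedule: B and C have the same raw average margin (0),
# but B's loss came against the strongest team and C's against B.
games = [("A", "B", 7), ("B", "C", 7), ("C", "D", 7), ("A", "D", 21)]
for team, rating in sorted(opponent_adjusted_ratings(games).items()):
    print(team, round(rating, 2))
```

In this toy schedule B and C look identical before adjustment, and the iteration separates them purely on who they played.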

College football’s structural chaos — 134 FBS teams, wildly varied schedules — is exactly the problem SP+’s opponent adjustment was built to solve.

SP+ vs the alternatives: a comparison

Public college football has more advanced metrics now than at any point in history. A short comparison:

| Metric | What it measures | Where it shines | Where it breaks |
| --- | --- | --- | --- |
| SP+ (Connelly) | Opponent-adjusted per-play efficiency | Team comparison, predictive value, bowl projections | Slow to react to mid-season identity shifts |
| FEI (Fremeau) | Opponent-adjusted per-drive efficiency | Pacing-neutral team strength | Less reactive to single-play volatility |
| FPI (ESPN) | Predictive rating with returning production | Preseason projections, in-season win probabilities | Less transparent than public-facing alternatives |
| EPA per play | Per-snap expected point change | Game-level efficiency, QB evaluation | Not opponent-adjusted by default |
| Massey Composite | Average of many computer ratings | Quick consensus check | Smoothed; loses methodological distinction |

The honest reading of a college football team’s season uses two or three of these in parallel. SP+ is the analytics-conversation default, but the agreements and disagreements between SP+, FEI, and FPI are themselves informative. A team that ranks 8th in SP+, 11th in FEI, and 6th in FPI is, by consensus, a top-ten team. A team that ranks 8th in one and 25th in another deserves a longer conversation about why.

What the data needs: inputs

SP+ runs on play-by-play data, which is, in college football, a notoriously messier dataset than the NFL’s. The official statistics provided by NCAA, schools, and conference networks have known inconsistencies — pass-versus-rush labeling, sack accounting, garbage-time delineation. The serious advanced-stats community uses cleaned versions of this data, often via the cfbfastR package, which mirrors the NFL’s nflfastR and provides the closest thing college football has to a clean public play-by-play feed.

The model’s inputs, by component:

For the per-play components: down, distance, yard line, score, time remaining, play type, play outcome. The same skeleton that powers NFL EPA, applied to college data.

For the opponent adjustment: the full schedule and outcomes of every FBS team, with iterative ratings that converge over multiple passes.

For the preseason and early-season variants: returning production (the percentage of a roster’s production from the prior season that returns), recruiting class composite ratings (typically 247Sports or Rivals composites), transfer portal additions and subtractions, and a recent-history baseline (usually three to five prior seasons of SP+ for the program).

The result is a metric that updates weekly as games are played and the model rebalances. Early-season SP+ is partially preseason projection; late-season SP+ is almost entirely play-by-play data. By bowl season, the projections have fully converged.
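One way to picture that hand-off from projection to data is a weight that slides from the preseason number to the in-season number as weeks accumulate. The linear schedule and the `full_weight_week` knob below are hypothetical; the actual SP+ weighting is not described in this article.

```python
def blended_rating(preseason: float, in_season: float, week: int,
                   full_weight_week: int = 10) -> float:
    """Blend a preseason projection with an in-season rating.

    The weight on in-season data rises linearly from 0 at week 0 to 1 at
    full_weight_week. Both the linear shape and the default of 10 are
    illustrative assumptions, not published SP+ values.
    """
    data_weight = min(max(week, 0) / full_weight_week, 1.0)
    return (1 - data_weight) * preseason + data_weight * in_season


# Hypothetical team projected at +18.0 that has played like a +10.0 team.
for week in (0, 3, 6, 10, 14):
    print(week, round(blended_rating(18.0, 10.0, week), 1))
```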

Building the analysis: a working framework

The practical workflow for using SP+ in college football writing:

  1. Start with the current SP+ rankings. ESPN publishes them weekly. Connelly’s Study Hall newsletter publishes deeper breakouts.
  2. Look at offensive and defensive splits separately. A 9-2 team that rates 25 points better than average on offense and 5 points better than average on defense is a different team than a 9-2 team that rates 12 points better on offense and 18 points better on defense. Both win games; they win them differently, and they project differently against an elite opponent.
  3. Check the schedule-adjusted record. SP+ implicitly contains a “second-order win total” — how many games a team should have won, given their per-play efficiency. A team whose actual record is meaningfully better than their second-order record is, on average, going to regress.
  4. Cross-reference with returning production and recruiting. A team whose SP+ is buoyed by returning production from a senior-heavy roster has a different trajectory than a team whose SP+ is built on freshman impact.
  5. For matchups, look at offensive SP+ vs defensive SP+. A high-offensive-SP+ team against a high-defensive-SP+ defense is the classic strength-on-strength game. The math has reasonably good predictive value here, especially for spread and total betting markets.
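Step 3 is the easiest of these to sketch. A second-order win total can be approximated as the sum of single-game win expectancies, which is then compared with the actual record. The per-game expectancies below are invented for illustration; in practice they would come from the model's postgame numbers.

```python
def second_order_wins(win_expectancies) -> float:
    """Sum of single-game win expectancies (each between 0 and 1)."""
    return sum(win_expectancies)


# Hypothetical six-game stretch: (postgame win expectancy, actually won?)
games = [(0.95, True), (0.80, True), (0.55, True),
         (0.45, True), (0.30, True), (0.60, False)]

expected = second_order_wins(we for we, _ in games)
actual = sum(won for _, won in games)
gap = actual - expected     # positive: the record is ahead of the play

print(f"actual wins: {actual}, second-order wins: {expected:.2f}, gap: {gap:+.2f}")
```

A team more than a win ahead of its second-order total, as in this sample, is the kind of regression candidate the step describes.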

Where this gets weird: common mistakes

SP+ has known failure modes, and the writers who quote it well usually name them.

Mid-season identity shifts are real and the model is slow. A team that fires its offensive coordinator after week six and rebuilds its scheme is, by week ten, a different team. SP+ weights all weeks roughly equally over the course of the season (with some recency adjustments), which means a team that has fundamentally improved in November will look worse in the rankings than the eye test suggests. The lag is small but real.

Garbage-time pollution is everywhere in college football. A 56-7 Saturday afternoon win by an SEC powerhouse over a Group of Five opponent generates a meaningful chunk of play data that is, structurally, not predictive of what the team will do against a peer. The major SP+ implementations filter for garbage time, but the cutoffs vary, and the model can still be flattered or punished by lopsided games. Always check.
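A garbage-time filter is usually a margin cutoff that tightens as the game goes on. The sketch below uses illustrative per-quarter limits; they are assumptions, not the cutoffs of any particular SP+ implementation, which is exactly why the text says to check.

```python
# Assumed cutoffs: a play is garbage time when the score margin exceeds
# the limit for its quarter. Swap in the cutoffs your data source uses.
GARBAGE_MARGIN = {1: 28, 2: 24, 3: 21, 4: 16}


def is_garbage_time(quarter: int, margin: int) -> bool:
    """True if the play happened with the game effectively decided."""
    limit = GARBAGE_MARGIN.get(quarter)      # no limit defined for overtime
    return limit is not None and abs(margin) > limit


def competitive_plays(plays):
    """Keep only plays run while the game was still competitive."""
    return [p for p in plays if not is_garbage_time(p["quarter"], p["margin"])]


# Hypothetical plays; margin is offense score minus defense score at the snap.
plays = [
    {"quarter": 1, "margin": 7, "yards": 12},
    {"quarter": 3, "margin": 28, "yards": 40},    # dropped: 28 > 21
    {"quarter": 4, "margin": -17, "yards": 9},    # dropped: 17 > 16
    {"quarter": 4, "margin": 10, "yards": 3},
]

print(len(competitive_plays(plays)))   # 2
```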

Quarterback injuries can render ratings meaningless mid-season. College rosters rarely have the backup quarterback continuity that NFL teams do. A team whose starter goes down can start playing like a team two SP+ tiers lower within a week, and the model takes several games to catch up. Late-season ratings are most reliable when the same quarterback has played most of the snaps.

The Group of Five conferences are still underweighted. The opponent-adjustment math depends on cross-conference games to calibrate quality differentials. When a Group of Five team plays few-to-no Power Four opponents in a season, their rating is anchored to a smaller, noisier set of comparison points. A 12-0 American Conference team is, by SP+, almost always rated lower than their record suggests, and the gap is wider than fans often want to acknowledge.

When SP+ shines: use cases

The strongest applications:

Bowl game and CFP projections. SP+ has produced more accurate bowl-game projections than the betting markets in multiple years of post-hoc analysis. The opponent adjustment is particularly powerful when teams from disparate conferences meet on a neutral field. The math holds up.

Strength-of-schedule arguments. The College Football Playoff selection process now leans more heavily on opponent quality than the old BCS did. SP+ provides the cleanest single-number summary of how good a team’s opponents actually were, and it has become a fixture in public discussion of the CFP committee’s deliberations.

Identifying overperforming and underperforming teams. A team whose record exceeds their SP+ rating projection is, on average, going to regress. A team whose record undershoots their SP+ is, on average, going to win more games down the stretch. The model’s correlation with future performance has, over more than a decade, been consistent.

Cross-era and cross-conference comparison. The 2022 Georgia Bulldogs vs the 2009 Alabama Crimson Tide is a debatable comparison in the abstract. SP+ provides a defensible scale on which to make the argument. Same with comparing the SEC to the Big Ten in any given year — the conference-level SP+ averages produce arguments that hold up better than gut-feel rankings.

The limits: what SP+ cannot tell you

The honest version of this writing names the limits.

SP+ cannot tell you who is going to win on Saturday. It can tell you who has been the better team in the games already played, and what the projection looks like for the next game. The translation is non-trivial, especially in a sport where weather, injuries, and emotional factors regularly drive single-game variance.

SP+ cannot capture coaching adjustments in real time. A team that has changed coordinators, schemes, or starting quarterbacks in the last three weeks will not have those changes reflected in the rating until the data catches up. Mid-season turnarounds are real. The metric lags them.

SP+ cannot account for matchup-specific issues that don’t show up in the broader profile. A team whose offense relies on a power-running game can post elite SP+ all season and run into a defense built specifically to stop power running. The matchup math underneath the headline rating is sometimes the more interesting story.

SP+ cannot replace watching the games. It tells you which teams to take seriously. It tells you which matchups to circle. It does not capture the texture of a Saturday afternoon in Tuscaloosa or a night game in Eugene, and it is not trying to. The metric is a translation tool. The translation needs the original to be worth anything.

A working example: Michigan’s 2023 national title run

Michigan’s 2023 College Football Playoff title run, capped by the win over Washington in Houston, is a useful test case for SP+. The Wolverines entered the playoff at +28.6 in SP+, ranking second nationally; Washington entered at +23.2, ranking fifth. The model favored Michigan by roughly five and a half points on a neutral field, with a win probability in the 65-68% range. The Wolverines won the title game by 21. Michigan covered the spread. The model’s favorite was correct. The margin was wider than the projection, a gap attributable primarily to Washington’s special teams and red-zone offense both performing below their season averages.
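The step from a point spread to a win probability can be sketched by treating the final margin as normally distributed around the projected spread. The 13-point standard deviation below is an assumption picked because it lands inside the range quoted above; a larger value pulls the probability closer to 50%, and the article does not say what SP+ itself uses.

```python
from statistics import NormalDist


def win_probability(spread: float, margin_sd: float = 13.0) -> float:
    """Chance the favorite wins, given its projected margin.

    Models the final margin as Normal(spread, margin_sd) and returns the
    probability that it is above zero. margin_sd is an assumed value.
    """
    return NormalDist().cdf(spread / margin_sd)


spread = 28.6 - 23.2                 # Michigan minus Washington, about 5.4 points
print(round(spread, 1))              # 5.4
print(round(win_probability(spread), 2))
```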

The more interesting SP+ analysis of that postseason was around the matchup that didn’t happen: a hypothetical Georgia versus Michigan title game, which had been the consensus expectation through October. SP+ had Georgia at +29.4 and Michigan at +28.6, a near-tie, with the model effectively unable to separate them. The committee’s decision to leave Georgia out after its SEC Championship loss to Alabama, in favor of one-loss Texas and Alabama alongside unbeaten Michigan and Washington, was, in SP+ terms, a coin flip among elite contenders. The model’s neutral-field projections for the games that did happen turned out to be reasonably accurate. The deeper lesson was that in any given playoff cycle, the difference between the second and fifth best team in the country, by SP+, is often smaller than the seeding suggests.

One more limit to name: SP+ depends, like all rating systems, on the integrity of the underlying play-by-play data. College football’s data ecosystem remains messier than the NFL’s, with periodic inconsistencies in how garbage time is defined, how sacks are categorized, and how overtime possessions are weighted. The model is robust to most of this noise, but in a season where data-feed issues are unusually pronounced, the ratings can drift in ways that aren’t obvious from the leaderboard alone. The serious analytics community has been cleaning this up year over year, and the data is meaningfully better now than it was in 2015. It is still not perfect.

The deeper limit is that college football itself, as a structural product, is changing faster than the model can adapt. NIL deals, the transfer portal, and conference realignment have made year-to-year continuity weaker than in any other major American sport. A program that ranks +18 in SP+ entering a season can lose its starting quarterback to the portal in March and find itself rebuilt by August. SP+’s preseason projections account for transfer additions and subtractions, but the depth of roster transformation in 2026 is testing the boundaries of what any predictive model can do. The number is still useful. The number is also living through a period of unusual structural volatility, and the careful writer names that out loud.

Frequently asked questions

How is SP+ different from ESPN’s FPI?

FPI is ESPN’s in-house predictive rating system. It is a separate model from SP+, and it adjusts for variables like roster experience, returning starters, and home-field advantage. SP+ is purer in its derivation: it’s a measurement of play-by-play efficiency, adjusted for opponent strength. FPI is more explicitly predictive; SP+ is more descriptive. The two are usually in agreement on the top of the rankings and can diverge on the middle of the FBS landscape.

Can I trust early-season SP+ ratings?

Cautiously. Early-season SP+ is heavily weighted toward preseason projections, which fold in returning production, recruiting, and prior-season ratings. By week six or seven, the play-by-play data dominates. Mid-season ratings are usually reliable for the top 25; ratings in the 50-90 range can shift meaningfully week to week as opponent adjustments propagate.

How well does SP+ predict bowl games?

SP+ has historically beaten the closing point spread in 53-55% of bowl games when the model and the market disagree by more than 3 points. That’s a real edge but not a guarantee. The model is most reliable when the matchup involves teams from disparate conferences (where opponent adjustment is most valuable). It’s less reliable when injuries, opt-outs, or motivation issues are heavily weighted in the matchup.
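To put the 53-55% figure in context, compare it with the break-even rate against the spread. The sketch assumes standard -110 pricing (risk 110 to win 100), which is an assumption about the market and not something the figure above specifies.

```python
def break_even_win_rate(risk: float = 110.0, to_win: float = 100.0) -> float:
    """Win rate at which a bettor neither gains nor loses over time."""
    return risk / (risk + to_win)


def return_per_unit_risked(win_rate: float, risk: float = 110.0,
                           to_win: float = 100.0) -> float:
    """Expected profit per unit risked at a given long-run win rate."""
    return (win_rate * to_win - (1 - win_rate) * risk) / risk


print(round(break_even_win_rate(), 4))            # 0.5238
for rate in (0.53, 0.55):
    print(rate, round(return_per_unit_risked(rate), 3))
```

Under that pricing assumption, 53% clears break-even only narrowly, which is why the edge is real but thin.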

Where can I see SP+ ratings myself?

ESPN’s college football SP+ page publishes the current ratings, updated weekly. Bill Connelly’s Study Hall newsletter on Substack provides deeper analytical context. The cfbfastR R package lets you build your own play-by-play analysis on the same data SP+ uses.

Sources and further reading

The Illinois upset of Ohio State in 2007, the moment that fractured the BCS title race and launched a thousand “what’s wrong with college football’s selection system” columns, is, in retrospect, exactly the kind of game SP+ was built to contextualize. The Buckeyes were a +14 SP+ team that year. Illinois was around +5. On a neutral field, that’s a nine-point spread, which the math translates into roughly a 25-30% chance for the underdog. On the road in Columbus, Illinois’s chances were somewhat lower than that, but far from zero. It happened. The poll voters acted as though it should never have. The analytics community, in the years that followed, built a vocabulary that could explain why it did. For the conceptual frame on reading all of these metrics together, our primer on sports analytics is the natural next read.