ProFootballLogic

The ProFootballLogic Method

Thursday, July 5, 2012

The goal of ProFootballLogic.com is to provide football fans with the most accurate and insightful description possible of why football games are won or lost. Conventional football wisdom offers a variety of explanations on this topic, but is often littered with contradictions and unsubstantiated claims. Highly regarded individuals in the sport or media will say that "establishing the run and playing strong defense" is the key to success one week, then state that "it's a quarterback driven league" the next. They will say that last week's winners "carried momentum into the game and always come through in the clutch," then fail to predict next week's winners based on the same reasoning. So where are the facts amidst all the claims? By analyzing the data, certainly we can come to more logical and useful conclusions than conventional wisdom provides.

Even the most traditional football opinions are based at least partially on stats. Basic stats are everywhere in football news and analysis, but because they are split across so many different categories, they fail to give us a full picture. We can use intuition to roughly estimate the impact of any given play on a game's outcome, and can even more roughly use basic stat totals to get a general sense of a game or a team. But it's very difficult to weigh the relative importance of yards, touchdowns, field goals, punts, sacks, or turnovers, and a stat line doesn't tell us whether a 5-yard gain came on 3rd and 4, converting a first down, or on 3rd and 6, falling short.

Using EPA to Universally Estimate Success of Each Play


The best way to analyze football stats is instead to convert each play into the same unit of measurement -- points. To do this, we must assign every down/distance/yard-line combination a value in terms of the average number of net points a team can expect to gain from their current position, based on a league average.

For instance, a team with 1st and goal on an opponent's 1 yard line can expect to gain almost 7 points, as they are very likely to score a touchdown (the expected value of a score is also a fraction below its face value, because the team receiving the ensuing kickoff starts in a slightly better than neutral position). Likewise, a team near their own end zone on a late down can be expected to net negative points, as their opponent is more likely to score next.

Then, a play's success can be measured by the number of expected points gained or lost during the play. This technique is rare but not unique to ProFootballLogic, and is generally known as EPA, or expected points added. With this stat, we can see a better representation of how a game was won by looking at which types of plays a team benefited from the most. At the end of a game, the difference in the two teams' total EPA will equal the difference in real points, but the EPA breakdown provides much more insight.
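To make the calculation concrete, here is a minimal Python sketch of EPA for single plays. The `EP_TABLE` values, `KICKOFF_EP`, and the `epa` helper are hypothetical placeholders chosen only to match the qualitative description above (close to 7 points at the opponent's 1, slightly less once the ensuing kickoff is counted); they are not ProFootballLogic's actual numbers, which would be estimated from league-wide play-by-play data.

```python
# Toy expected-points (EP) table keyed by (down, yards_to_go, yards_from_own_goal).
# These numbers are hypothetical placeholders, not ProFootballLogic's values.
EP_TABLE = {
    (1, 10, 25): 0.9,   # 1st and 10 at own 25
    (2, 5, 30): 1.2,    # 2nd and 5 at own 30
    (1, 10, 45): 1.9,   # 1st and 10 at own 45
    (1, 10, 55): 2.4,   # 1st and 10 at opponent's 45
    (1, 1, 99): 6.3,    # 1st and goal at opponent's 1
}
KICKOFF_EP = 0.6  # hypothetical EP for a team about to receive a kickoff


def epa(ep_before, ep_after, points=0.0, possession_changes=False):
    """Expected points added by one play, from the offense's point of view.

    ep_before: EP of the offense's situation before the snap.
    ep_after: EP of the next situation for whichever team has the ball then.
    points: points the offense scored on the play.
    possession_changes: True after a turnover or a score, when the other team
        has the ball next, so the next situation's EP counts against the offense.
    """
    next_value = -ep_after if possession_changes else ep_after
    return points + next_value - ep_before


# 5-yard run on 1st and 10 from the own 25.
run = epa(EP_TABLE[(1, 10, 25)], EP_TABLE[(2, 5, 30)])
# 1-yard touchdown (counted as 7 points), after which the opponent receives a kickoff.
touchdown = epa(EP_TABLE[(1, 1, 99)], KICKOFF_EP, points=7, possession_changes=True)
# Interception at the opponent's 45: the defense takes over 1st and 10 at its own 45.
interception = epa(EP_TABLE[(1, 10, 55)], EP_TABLE[(1, 10, 45)], possession_changes=True)
```

When possession changes, whether on a turnover or after a score, the next situation belongs to the other team, so its expected points count against the offense. That bookkeeping is what lets net EPA line up with the real point difference over a full game.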

First, we can immediately tell how successful each team was with the ball, and whether the points they actually scored were a result of offensive success or of the defense creating turnovers and good field position. Further, we can see exactly which types of plays teams had success on, whether passing, running, punting, forcing turnovers, etc.
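A per-team breakdown like this is just a grouping of play-level EPA. The sketch below assumes a hypothetical play log of `(offense, defense, play_type, epa)` tuples with made-up values for two generic teams; the offense is credited with each play's EPA and the defense with its negative, and `epa_breakdown` and `team_net` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical play log: (offense, defense, play_type, epa). Values are made up.
plays = [
    ("A", "B", "pass", 1.8),
    ("A", "B", "run", -0.4),
    ("B", "A", "pass", -2.9),   # e.g. an interception thrown by B
    ("B", "A", "run", 0.6),
    ("A", "B", "punt", 0.3),
]


def epa_breakdown(plays):
    """Sum EPA by (team, side, play_type); the defense is credited with -epa."""
    totals = defaultdict(float)
    for offense, defense, play_type, value in plays:
        totals[(offense, "offense", play_type)] += value
        totals[(defense, "defense", play_type)] -= value
    return dict(totals)


def team_net(totals, team):
    """A team's net EPA across all play types and both sides of the ball."""
    return sum(v for (t, _, _), v in totals.items() if t == team)


breakdown = epa_breakdown(plays)
for key in sorted(breakdown):
    print(key, round(breakdown[key], 2))
```

Here team A's net EPA equals its own offensive EPA minus team B's offensive EPA, and it is exactly the negative of team B's net, which is the same accounting that makes the EPA difference match the scoreboard difference.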

EPA stats do a great job of explaining how games are won, but we need to dig deeper to understand why they are won, and what they can tell us about the future. Was that huge passing performance a great job by the offense or a terrible job by the defense? And does that win generated by a pick-six and punt return touchdown really mean that team will win in the future?

How Random Variation Affects Predictability

To answer these questions, and come up with the most accurate rating system possible, we need to study perhaps the most misunderstood aspect of football, and sports in general -- random variation. As humans, we are inclined to draw simple connections between events. That quarterback must have thrown that interception because he doesn't have enough experience in the playoffs. And that running back must have had a huge day because of the "bulletin board material" the defense provided prior to the game. But throw 10 darts at a dartboard and try to come up with an explanation as to why some were more accurate than others.

Surely, there is a physical reason why each dart went where it did if you were able to examine it closely enough, but for our purposes, we have to label anything we cannot predict better than a coin flip as "random" or "luck". Add other complicating factors such as varying play calls, close referee decisions, and the fact that each play combines the individual variation of up to 22 players, and it should be easy to see that random variation offers a better explanation than conventional wisdom for why the better team doesn't simply win every game it plays.

So what is the best way to model something that we know we cannot know for certain? The answer is a normal distribution estimate of probability. The normal distribution, or bell curve, describes how likely each result is given the average result and how much variation is involved. Flip a coin 10 times, count the number of heads, repeat the process 100 times, and the resulting counts will roughly trace a bell curve centered on 5 heads, showing the odds for each number of heads.
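The coin-flip experiment is easy to simulate. This standard-library Python sketch (`coin_flip_experiment` and `normal_vs_exact` are illustrative names) tallies heads over repeated trials and compares the exact binomial probability of each count with a normal approximation that has mean n/2 and standard deviation sqrt(n)/2, using a continuity correction.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def coin_flip_experiment(flips=10, trials=100, seed=0):
    """Flip `flips` fair coins, count the heads, and repeat `trials` times.
    Returns a Counter mapping number of heads -> number of trials with that count."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(trials))


def normal_vs_exact(flips=10):
    """For each head count k, return (k, exact binomial probability,
    normal approximation with mean flips/2 and sd sqrt(flips)/2)."""
    bell = NormalDist(mu=flips / 2, sigma=math.sqrt(flips) / 2)
    rows = []
    for k in range(flips + 1):
        exact = math.comb(flips, k) / 2 ** flips
        approx = bell.cdf(k + 0.5) - bell.cdf(k - 0.5)  # continuity correction
        rows.append((k, exact, approx))
    return rows


if __name__ == "__main__":
    counts = coin_flip_experiment()
    for k, exact, approx in normal_vs_exact():
        print(f"{k:2d} heads  observed {counts[k]:3d}  exact {exact:.3f}  normal {approx:.3f}")
```

With only 100 trials the observed column is noisy, but the exact and normal columns agree closely, which is the sense in which the bell curve models the count of heads.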

Many things in nature approximately display a normal distribution, from heights among a human population to how close you throw that dart to the bullseye. This is because complex things are often governed by a combination of many small random variations. And sure enough, the normal distribution can closely model just about everything in football, from pass yards, to interceptions, to wins.

The Formation of Predictive Ratings


EPA stats follow the normal distribution as well, and because certain aspects of the game show more random variation than others, we can study them to estimate the difference between past success and real quality going forward.

Consider the scenario of a team fumbling a snap, and the defense recovering the ball. This single play can typically net the defense about a 4 point advantage in the game, and can often lead to a win that otherwise would not have been gained. But if that team does win the game by 4 points, does that really mean they are a better team? Certainly, one wouldn't think so, and the stats back it up. Because these plays are so rare, it's barely detectable that offenses that have lost fumbled snaps in the past are more likely to do so again in the future. And despite the media certainly claiming that "the pass rusher caused the fumble because the quarterback was concentrating on him instead of the ball," defenses recovering lost fumbled snaps are no more likely to do so again.

This is a more extreme case of "past success does not mean future success," but it turns out that most facets of the game display this effect to some extent. No team that rushes for 200 yards in the first week of the season can be expected to average 200 yards per game. In general this effect can be considered the "small sample size" problem. By finding which types of plays offer more repeatable results, we can judge how much of each category's past success to attribute to luck, and so come up with the most accurate ratings of teams.
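A quick simulation shows the effect. In this sketch every parameter is a hypothetical assumption (true team rushing averages drawn from a normal distribution with mean 115 and standard deviation 15 yards, plus game-to-game noise with standard deviation 40 yards). Each simulated season, it finds the team with the most rushing yards in week 1 and compares that game to the same team's average over its remaining games.

```python
import random
import statistics


def week_one_leader(n_teams=32, n_games=16, league_mean=115.0, team_sd=15.0,
                    game_sd=40.0, seasons=500, seed=1):
    """Simulate rushing yards and return (avg week-1 yards of the week-1 leader,
    avg yards of that same team over its remaining games), across many seasons.
    All parameters are hypothetical, chosen only to illustrate regression to the mean."""
    rng = random.Random(seed)
    week_one, rest = [], []
    for _ in range(seasons):
        # Each team's true per-game rushing average.
        quality = [rng.gauss(league_mean, team_sd) for _ in range(n_teams)]
        # Each game's result = true average + random game-to-game variation.
        games = [[rng.gauss(q, game_sd) for _ in range(n_games)] for q in quality]
        leader = max(range(n_teams), key=lambda t: games[t][0])
        week_one.append(games[leader][0])
        rest.append(statistics.fmean(games[leader][1:]))
    return statistics.fmean(week_one), statistics.fmean(rest)


if __name__ == "__main__":
    first, later = week_one_leader()
    print(f"week-1 leader: {first:.0f} yards in week 1, {later:.0f} per game afterward")
```

Under these assumptions the week-1 leader's later games fall well short of its week-1 total but stay somewhat above the league mean, because part of that big week reflected real quality and part of it was luck.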

ProFootballLogic has analyzed years of data to estimate the most likely normal distributions governing both the variation in team quality and the random variation within a game for many different play types. These distributions can be used to estimate how likely any team's success or failure in a given play category is to be repeated.
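One standard way to turn those two distributions into a prediction is the normal-normal model: if true team quality in a category varies with standard deviation `team_sd` around the league mean, and a single game's result varies with `game_sd` around the team's true quality, then the best estimate of true quality shrinks the team's observed average toward the league mean. This sketch uses that textbook formula as a stand-in; the `shrink` helper and the example numbers are hypothetical, not the site's fitted values.

```python
def shrink(observed_mean, n_games, league_mean, team_sd, game_sd):
    """Posterior mean of a team's true per-game value under a normal-normal model.

    Assumes true quality ~ Normal(league_mean, team_sd**2) and each game's
    result ~ Normal(true quality, game_sd**2). Returns (estimate, weight), where
    weight is the share of the observed deviation expected to repeat.
    """
    sample_var = game_sd ** 2 / n_games          # noise in an n-game average
    weight = team_sd ** 2 / (team_sd ** 2 + sample_var)
    return league_mean + weight * (observed_mean - league_mean), weight


# Hypothetical rare-event category (like lost fumbled snaps): teams barely differ,
# so after 8 games almost none of a team's deviation is expected to repeat.
_, rare_weight = shrink(0.4, 8, 0.0, team_sd=0.05, game_sd=0.5)    # weight = 2/27, about 0.07
# Hypothetical passing-like category: real differences between teams are larger.
estimate, pass_weight = shrink(6.0, 8, 0.0, team_sd=4.0, game_sd=8.0)  # weight = 2/3, estimate about 4.0
```

The weight grows with the number of games observed and with how much teams genuinely differ in that category, which is exactly the "small sample size" trade-off described above.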

With this information, ratings are then estimated for all teams at once by optimizing for the most likely scenario, while factoring in all games' results, strength of opponent, and applicable home field advantage. Ratings indicate the average result expected in EPA from the team for a given play type against a league average opponent at a neutral site, and the sum of all play types equals a total rating for each team. Because net EPA equals point difference, the difference between any two teams' ratings simply represents the expected point spread should they play each other at a neutral site.
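The joint fit can be sketched as a least-squares problem: each game's margin (in points, or in net EPA for a single play type) is modeled as one team's rating minus the opponent's rating, plus a home-field term that applies only at non-neutral sites. The `fit_ratings` function, its gradient-descent solver, and the sample games are assumptions for illustration; plain least squares also omits the shrinkage toward the league mean discussed above, so this is a simplified stand-in rather than ProFootballLogic's actual optimizer.

```python
def fit_ratings(games, lr=0.5, iterations=5000):
    """Least-squares team ratings from game margins via gradient descent.

    games: list of (team, opponent, margin, neutral) tuples, where margin is
        team's points minus opponent's points (or net EPA for one play type)
        and `team` is the home team unless neutral is True.
    Model: margin ~= rating[team] - rating[opponent] + (0 if neutral else hfa).
    Ratings start at 0 and each update's rating gradients sum to zero, so the
    league-average team stays at 0.
    """
    teams = sorted({t for team, opp, _, _ in games for t in (team, opp)})
    rating = {t: 0.0 for t in teams}
    hfa = 0.0
    n = len(games)
    for _ in range(iterations):
        grad = dict.fromkeys(teams, 0.0)
        grad_hfa = 0.0
        for team, opp, margin, neutral in games:
            home_edge = 0.0 if neutral else hfa
            resid = (rating[team] - rating[opp] + home_edge - margin) / n
            grad[team] += resid
            grad[opp] -= resid
            if not neutral:
                grad_hfa += resid
        for t in teams:
            rating[t] -= lr * grad[t]
        hfa -= lr * grad_hfa
    return rating, hfa


# Hypothetical results, constructed to be consistent with A=+3, B=0, C=-3, hfa=2.
games = [
    ("A", "B", 5, False),
    ("B", "C", 5, False),
    ("C", "A", -4, False),
    ("A", "C", 8, False),
    ("B", "A", -3, True),
]
ratings, hfa = fit_ratings(games)
spread = ratings["A"] - ratings["C"]  # expected neutral-site margin of A over C
```

Because the least-squares solution is linear in the margins, fitting each play type separately and adding the ratings gives the same totals as fitting total net EPA directly, mirroring how play-type ratings sum to a team's total rating. The difference between two teams' ratings is then the predicted neutral-site spread; in the sample data, A is a 6-point favorite over C.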
