Explanation of FO's stats
DVOA EXPLAINED
DVOA is a method of evaluating teams, units, or players. It takes every single play during the NFL season and compares each one to a league-average baseline based on situation. DVOA measures not just yardage, but yardage towards a first down: five yards on third-and-4 are worth more than five yards on first-and-10 and much more than five yards on third-and-12. Red zone plays are worth more than other plays. Performance is also adjusted for the quality of the opponent. DVOA is a percentage, so a team with a DVOA of 10.0% is 10 percent better than the average team, and a quarterback with a DVOA of -20.0% is 20 percent worse than the average quarterback. Because DVOA measures scoring, defenses are better when they are negative. For more detail, read below.
The majority of the ratings featured on FootballOutsiders.com are based on DVOA, or Defense-adjusted Value Over Average. DVOA breaks down every single play of the NFL season to see how much success offensive players achieved in each specific situation compared to the league average in that situation, adjusted for the strength of the opponent.
The NFL determines the best players by adding up all their yards no matter what situations they came in or how many plays it took to get them. Now why would they do that? Football has one objective -- to get to the end zone -- and two ways to achieve that, by gaining yards and getting first downs. These two goals need to be balanced to determine a player's value or a team's performance. All the yards in the world aren't useful if they all come in eight-yard chunks on third-and-10.
The popularity of fantasy football only exaggerates the problem. Fans have gotten used to judging players based on how much they help fantasy teams win and lose, not how much they help real teams win and lose. But fantasy scoring skews things by counting the yard between the one and the goal line as 61 times more important than all the other yards on the field. Let's say, for example, that Anquan Boldin catches a pass on third-and-15 and goes 50 yards but gets tackled two yards from the goal line, and then Tim Hightower takes the ball on first-and-goal from the two-yard line and plunges in for the score. Or, let's say that the Cardinals are playing the Falcons. The Falcons take a touchback on the opening kickoff, and the Arizona defense stuffs the Falcons running game twice, and on third-and-10 Matt Ryan throws the ball into the arms of Adrian Wilson, who gets taken down by Michael Turner at the two-yard line. Then on the ensuing first-and-goal, Hightower scores a touchdown.
Has Hightower done something special? Not really. When an offense gets the ball on first-and-goal at the two-yard line, they are going to score a touchdown five out of six times. In the first situation, Hightower is getting the credit that primarily belongs to the passing game. In the second situation, Hightower is getting the credit that primarily belongs to the defense.
DVOA does a better job of distributing credit for scoring points and winning games. It uses a value based on both total yards and yards towards a first down, based on work done by Pete Palmer, Bob Carroll, and John Thorn in their seminal book, The Hidden Game of Football. On first down, a play is considered a success if it gains 45 percent of needed yards; on second down, a play needs to gain 60 percent of needed yards; on third or fourth down, only gaining a new first down is considered success.
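The success thresholds above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text; the function name and arguments are hypothetical, and it covers the basic success test only, not the fuller "success points" system.

```python
def is_successful_play(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Basic success test: 45% of needed yards on first down,
    60% on second down, a new first down on third or fourth down."""
    if down == 1:
        required_pct = 45
    elif down == 2:
        required_pct = 60
    elif down in (3, 4):
        required_pct = 100
    else:
        raise ValueError("down must be 1, 2, 3, or 4")
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return yards_gained * 100 >= required_pct * yards_to_go


# Five yards succeeds on first-and-10 and third-and-4, but not on third-and-12.
print(is_successful_play(1, 10, 5))
print(is_successful_play(3, 4, 5))
print(is_successful_play(3, 12, 5))
```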
We then expand upon that basic idea with a more complicated system of "success points." A successful play is worth one point, an unsuccessful play zero points. Extra points are awarded for big plays, gradually increasing to three points for 10 yards, four points for 20 yards, and five points for 40 yards or more. There are fractional points in between. (For example, eight yards on third-and-10 is worth 0.63 "success points.") Losing four yards is -1 point, losing 12 yards is -1.8 points, an interception is -6 points, and a fumble is worth anywhere from -1.70 to -3.98 points depending on how often a fumble in that situation is lost to the defense - no matter who actually recovers the fumble. Red zone plays are worth 20 percent more, and there is a bonus given for a touchdown.
(The system is a bit more complex than the one in Hidden Game thanks to a number of improvements since we launched the site in 2003.)
Every single play run in the NFL gets a "success value" based on this system, and then that number gets compared to the average success values of plays in similar situations for all players, adjusted for a number of variables. These include down and distance, field location, time remaining in game, and current scoring lead or deficit. Teams are always compared to one standard, as the team made its own choice whether to pass or rush. However, when it comes to individual players, rushing plays are compared to other rushing plays, passing plays to other passing plays, tight ends get compared to tight ends and wideouts to wideouts.
Imagine two running backs who each gain three yards. If Player A gains three yards under a set of circumstances where the average NFL running back gains only two yards (for example, third-and-1), it can be argued that Player A has a certain amount of value above others at his position. Likewise, if Player B gains three yards on a play where, under similar circumstances, an average NFL back would be expected to gain five yards (for example, second-and-15), it can be argued that Player B has negative value relative to others at his position.
Once we have all our adjustments, we can find the difference between this player's success and the expected success of an average running back in the same situation (or between this defense and the average defense in the same situation, etc.). Add up every play by a certain team or player, divide by the total baseline for success in all those situations, and you get VOA, or Value Over Average.
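As a rough sketch of the arithmetic described above, VOA can be written as the summed difference between actual and baseline success values, divided by the summed baseline. The function name and the sample numbers are hypothetical, and the real baselines depend on the situational adjustments described earlier, which are not modeled here.

```python
def value_over_average(plays):
    """plays: iterable of (success_value, baseline_value) pairs,
    one pair per play. Returns VOA as a fraction (0.05 means 5%)."""
    plays = list(plays)
    total_difference = sum(actual - baseline for actual, baseline in plays)
    total_baseline = sum(baseline for _, baseline in plays)
    return total_difference / total_baseline


# Hypothetical player: slightly above baseline on one play, slightly below on another.
sample = [(1.2, 1.0), (0.9, 1.0)]
print(f"{value_over_average(sample):.1%}")
```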
Of course, the biggest variable in football is the fact that each team plays a different schedule. By adjusting each play based on the defense's average success in stopping that type of play over the course of a season, we get DVOA, or Defense-adjusted Value Over Average. Rushing and passing plays are adjusted based on down and location on the field; receiving plays are also adjusted based on how the defense performs against passes to running backs, tight ends, and wide receivers. Defenses are adjusted based on the average success of the offenses they are facing. (Yes, this is still called DVOA, for the sake of simplicity.)
The biggest advantage of DVOA is the ability to break teams and players down to find strengths and weaknesses in a variety of situations. In the aggregate, DVOA may not be quite as accurate as some of the other, similar "power ratings" formulas based on comparing drives rather than individual plays, but, unlike those other ratings, DVOA can be separated not only by player but also by down, or by week, or by distance needed for first down. This can give us a better idea of not just which team is better but why, and what a team has to do in order to improve itself in the future. You will find DVOA used by Football Outsiders in a lot of different ways. Because it takes every single play into account, it can be used to measure a player or a team's performance in any situation. All Minnesota third downs can be compared to how an average team does on third down. JaMarcus Russell or David Garrard can each be compared to how an average quarterback performs in the red zone, or with a lead, or in the second half of the game.
Since it compares each play only to plays with similar circumstances, it gives a more accurate picture of how much better a team really is compared to the league as a whole. The list of top DVOA offenses on third down, for example, is more accurate than the conventional NFL conversion statistic because it takes into account that converting third-and-long is more difficult than converting third-and-short, and that a turnover is worse than an incomplete pass because it doesn't provide the opportunity to move the other team back with a punt on fourth down.
One of the hardest parts of understanding a new statistic is grasping the idea of what numbers represent good performance or bad performance. We try to make that easy with DVOA, because it gets compared to average. Therefore, 0% always represents league-average. A positive DVOA represents that the offense is more likely to score, and a negative DVOA represents that the defense is more likely to stop them. This is why the best offenses have positive DVOA ratings and the best defenses have negative DVOA ratings.
Ratings for teams and starting players generally follow that scale, with the best being around 30% and the worst being around -30% (opposite for defense). However, because the baseline represents four years of play (2002-2005) no year will average exactly 0%. Over the past four years, offensive levels have bounced back and forth, so in 2002 and 2004 the league average was positive, and in 2003 and 2005 it was negative. In 2006 it was at 0%, and in 2007 it was positive again.
Team DVOA totals combine offense and defense, and the team total is given by offense minus defense to take into account that better defenses are more negative. (Special teams performance is also added, as described below.)
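The sign convention is easy to get backwards, so here is a minimal sketch of the team total as described: offense minus defense, plus special teams. The function name and the sample ratings are hypothetical.

```python
def total_team_dvoa(offense: float, defense: float, special_teams: float) -> float:
    """All inputs are DVOA percentages. Defense is subtracted because
    a better defense has a more negative rating."""
    return offense - defense + special_teams


# Hypothetical team: good offense (+15.0%), good defense (-10.0%), decent special teams (+2.0%).
print(total_team_dvoa(15.0, -10.0, 2.0))
```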
DPAR EXPLAINED
(Note: We still have yet to update this article to reflect the change in our stats from Points to Yards Above Replacement. Those updates are coming soon, but the basic ideas behind DPAR are the same as the basic ideas behind the current stat, DYAR.)
After using DVOA for a few months, we came across a strange phenomenon: well-regarded players, particularly those known for their durability, had DVOA ratings that came out around average. The reason is that DVOA, by virtue of being a percentage or rate statistic, doesn’t take into account the cumulative value of having a player producing at a league-average level over the course of an above-average number of plays. By definition, an average level of performance is better than that provided by half of the league and the ability to maintain that level of performance while carrying a heavy work load is very valuable indeed. In addition, a player who is involved in a high number of plays can draw the defense’s attention away from other parts of the offense, and, if that player is a running back, he can take time off the clock with repeated runs.
Let's say you have a running back who carries the ball 300 times in a season. What would happen if you were to remove this player from his team's offense? What would happen to those 300 plays? Well, the player would not be replaced by thin air. This is why you have to compare performance to some kind of baseline; two yards is not two yards better than the alternative. On the other hand, while comparing players to the league average works on a per play basis, it doesn't work on a total basis because a player removed from an offense is not generally replaced by a similar player. Those 300 plays will generally be given to a significantly worse player, someone who is the backup because he doesn't have as much experience and/or talent.
To take this into account, we borrowed the concept of replacement level from Baseball Prospectus. Using a scale similar to the scale BP uses to determine baseball's replacement level, we've determined that a replacement level player has a DVOA of roughly -13.3%. (If you want to know why, it is explained in the original article introducing PAR.) Instead of determining value by comparing each play's "success value" to the average, as in DVOA, each play is instead compared to a number roughly 13.3% below the average success value of similar plays. That gives us value over a replacement level player, a better representation of a player's total contribution to his team on all his plays.
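To see why a cumulative measure rewards durability, here is a minimal sketch that compares each play to a baseline 13.3% below average and sums the differences. The function name, the single league-wide constant, and the sample data are assumptions for illustration; position-specific replacement levels are not modeled.

```python
REPLACEMENT_LEVEL = -0.133  # general replacement level from the text


def value_over_replacement(plays):
    """plays: iterable of (success_value, average_baseline) pairs.
    Each play is compared to a baseline 13.3% below the average."""
    return sum(
        actual - baseline * (1 + REPLACEMENT_LEVEL)
        for actual, baseline in plays
    )


# Hypothetical exactly-average back (0% VOA) with 300 carries versus one with 100.
workhorse = [(1.0, 1.0)] * 300
part_timer = [(1.0, 1.0)] * 100
print(round(value_over_replacement(workhorse), 1))
print(round(value_over_replacement(part_timer), 1))
```

Both players are average on a per-play basis, but the one with more plays accumulates three times the value over replacement.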
Actually, while in general replacement level is -13.3%, technically it is different for each position depending on whether we are measuring passing, rushing, or receiving. And, of course, the real replacement player is different for each team in the NFL. (Kansas City started 2005 with Larry Johnson as the backup running back, while Houston had Vernand Morency. Big difference there.) No starter can be blamed for the poor performance of his backup, so we create a general replacement level for use across the league.
Of course, giving a number of "success value points over replacement level" would be fairly useless to the average fan and even the non-average fan. If we told you that Ben Roethlisberger was worth 119.5 success value points over replacement in 2005, you would have no idea what the heck we were talking about. So we translate those success value points into a number that represents actual points. After working through statistics from the past five seasons, our best approximation is that a team made up entirely of replacement-level players would be outscored 407 to 260, finishing with a 4-12 record. Conveniently, this is close to the average record of the last four expansion teams. But part of the reason this team gives up so many more points than it scores is that it has replacement-level special teams. Those replacement level special teams are worth -27 points, making the actual baseline for determining offensive value 274 points (the baseline for defensive value is 394 points).
With a bit of math, it works out that each "success value point" over replacement level is worth about .48 actual points above this offensive baseline. We also adjust this number for the strength of the opponents each player has faced. Now I can tell you that Ben Roethlisberger was worth 57.4 points more than a replacement level quarterback in 2005, or 57.4 DPAR (Defense-adjusted Points Above Replacement). Tom Brady was worth 104.0 DPAR, Kyle Orton was worth -38.9 DPAR, and so on.
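The conversion can be checked with simple arithmetic. The sketch below applies only the 0.48 multiplier quoted above; the function name is hypothetical, and the opponent adjustment mentioned in the text is not modeled.

```python
POINTS_PER_SUCCESS_POINT = 0.48  # approximate conversion factor from the text


def success_points_to_points(success_points_over_replacement: float) -> float:
    """Convert success value points over replacement into actual points."""
    return success_points_over_replacement * POINTS_PER_SUCCESS_POINT


# Ben Roethlisberger, 2005: 119.5 success value points over replacement.
print(round(success_points_to_points(119.5), 1))
```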
HOW CAN A 16-GAME SEASON BE SIGNIFICANT?
Football statistics can't be analyzed in the same way baseball statistics are. After all, there are only 16 games in a season. Baseball has ten times more, and even the NBA offers five times more. The more games, the more events to analyze, and the more events to analyze, the more statistical significance.
That is true, but the trick is to consider each play in an NFL game as a separate event. For example, Eli Manning played only 16 games in 2005, but in those 16 games he had 586 passing plays (including sacks) and 29 rushing plays (including scrambles) for a total of 615 events. Manny Ramirez in 2005 played in 152 games and had 650 plate appearances. For the most part, a quarterback who plays a full season will have almost the same number of plays as a baseball hitter who plays in most of his team's games.
A running back will have fewer plays than a quarterback, and wide receivers and tight ends will have even fewer. But there should still be enough plays with most starting running backs and receivers to allow for analysis with some significance. As an example, LaDanian Tomlinson ran the ball 339 times in 2005, and was the target of 77 passes (including incompletes), for a total of 416 plays. In general, a starting running back will have 375-450 plays over 16 games. Receivers are used a bit less, and therefore their stats are likely not as accurate. In general, starting wide receivers have 75-150 pass targets over a full season.
ISSUES WITH DVOA/DPAR
You need to have the entire play-by-play of a season in order to compute it, so it is useless for comparing players of today to players of history. As of this writing, we have processed nine seasons, 1997-2005.
DVOA is limited by what's included in the official NFL play-by-play, so we can't say which teams have the best offensive DVOA when play-faking, or the best defensive DVOA against three-receiver sets. Since play-by-play lists tackles, sacks, and interceptions, but not attempted tackles, or attempted sacks or interceptions, we don't have individual DVOA or DPAR for defensive players at this point. We're working on these issues with the Football Outsiders game charting project.
DVOA is still far away from the point where we can use it to represent the value of a player separate from the performance of his ten teammates that are also involved in each play. That means that when we say, "Larry Johnson has a DVOA of 27.6%," what we are really saying is "Larry Johnson, playing in the Kansas City offensive system with the Kansas City offensive line blocking for him and Damon Huard selling the fake when necessary, has a DVOA of 27.6%."
With fewer situations to measure, the numbers spread out a bit more, so you'll see more extreme DVOA ratings for part-time players and for measurements of teams in more specific situations (for example, passing on third downs). The charts listing players in order of DVOA have cut-offs for number of attempts, because players with just a handful of plays end up with absurd VOA and DVOA numbers. (In 2002, for example, Henry Burris had a -103% passing DVOA.)
Passing statistics include sacks as well as fumbles on aborted snaps. Receiving statistics include all passes intended for the receiver in question, including those that are incomplete or intercepted. At some point, we hope to be able to determine just how much impact different receivers have on completes vs. incomplete passes, but various regression analyses make it clear that both quarterback and receiver have an impact on whether a pass is complete or not. The word passes refers to both complete and incomplete pass attempts.
Unless we say otherwise, all references to third down also include the handful of rushing and passing plays that take place on fourth down (primarily fourth-and-1).
DVOA FOR SPECIAL TEAMS
The problem with a system based on measuring both yardage and yardage towards a first down, of course, is what to do with plays that don't have the possibility of a first down. Special teams are an important part of football and we needed a way to add that performance to the team DVOA ranking. Our special teams metric includes five separate measurements: field goals (and extra points), net punting, punt returns, net kickoffs, and kick returns.
The foundation of most of these special teams ratings is the concept that each yard line has a different value based on how the likelihood of scoring changes with better field position. In Hidden Game, the authors suggested that the value of field position for the offense existed on a straight line with your own goal line being worth -2 points, the 50-yard line 2 points, and the opposing goal line 6 points. (-2 points isn't just the value of a safety; it also reflects the fact that when you are backed up in your own zone, you are likely going to see your drive stall, and you'll need to punt and give the ball to the other team in good field position. Thus, the defense is more likely to score next.) We use a more refined set of values based on our research, but the idea is the same.
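The straight-line model from Hidden Game is simple enough to write down directly. This sketch implements only that linear version, not the more refined values mentioned above; the function name and the "yards from own goal line" convention are assumptions.

```python
def linear_field_position_value(yards_from_own_goal: float) -> float:
    """Hidden Game straight-line model: -2 points at your own goal line,
    2 points at midfield, 6 points at the opposing goal line."""
    if not 0 <= yards_from_own_goal <= 100:
        raise ValueError("yards_from_own_goal must be between 0 and 100")
    # The line rises 8 points over 100 yards, starting at -2.
    return -2 + yards_from_own_goal * 8 / 100


for spot in (0, 25, 50, 100):
    print(spot, linear_field_position_value(spot))
```

Under this model the break-even point, where field position is worth zero points, falls at a team's own 25-yard line.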
The special teams ratings compare each kick or punt to the league average, based on the point value of field position at the position of each kick, catch, and return. We've determined a league average for how far a kick goes based on the yard line from where the kick occurs (almost always the 30-yard line for kickoffs, variable for punts) and a league average for how far a return goes based on both the yard line where the ball is caught and the distance that it traveled in the air.
The kicking or punting team is rated based on net points compared to average, taking into account both the kick and the return if there is one. Because the average return is always positive, punts that are not returnable (touchbacks, out of bounds, fair catches, and punts downed by the coverage unit) will rate higher than punts of the same distance which are returnable. (This is also true of touchbacks on kickoffs.) There are also separate individual ratings for kickers and punters that are based only on distance and whether the kick is returnable, otherwise assuming an average return in order to judge the kicker separate from the coverage. For the return team, the rating is only based on how many points the return is worth compared to average, based on the location of the catch and the distance the ball traveled in the air. Return teams are not judged on the distance of kicks, nor are they judged on kicks that cannot be returned.
Field goal kicking is measured differently. Measuring kickers by field goal percentage is a bit absurd, as it assumes that all field goals are of equal difficulty. In our metric, each field goal is compared to the average number of points scored on all field goal attempts from that distance. The value of a field goal increases as distance from the goal line increases.
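A minimal sketch of that comparison: the value of an attempt is the points actually scored minus the average points scored on attempts from that distance. The function name and the league success rates used below are hypothetical, chosen only to show why a long make is worth more than a short one.

```python
FIELD_GOAL_POINTS = 3


def field_goal_value(made: bool, league_make_rate: float) -> float:
    """Points scored on this attempt minus the league-average points
    scored on attempts from the same distance."""
    expected_points = FIELD_GOAL_POINTS * league_make_rate
    actual_points = FIELD_GOAL_POINTS if made else 0
    return actual_points - expected_points


# Hypothetical rates: 95% from short range, 50% from long range.
print(round(field_goal_value(True, 0.95), 2))   # short make
print(round(field_goal_value(True, 0.50), 2))   # long make
print(round(field_goal_value(False, 0.95), 2))  # short miss
```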
Kickoffs, punts, and field goals are then adjusted based on weather and altitude. It will surprise no one to learn that it is easier to kick the ball in Denver or a dome than it is to kick the ball in Buffalo in December. Because we do not yet have enough data to tailor our adjustments specifically to each stadium, each one is assigned to one of four categories: Cold, Warm, Dome, and Denver/Mexico. Beginning this year, there's an additional adjustment dropping the value of field goals in Florida and raising the value of punts in San Francisco.
Once we've totaled how many points above or below average can be attributed to special teams, another formula then transforms these numbers from points to DVOA so the ratings can be added to offense and defense to get total team DVOA.
There are three aspects of special teams that don't show up in our numbers because a team has little or no influence on them -- and yet, these plays do have an impact on wins and losses. The first is the length of kickoffs by the opposing team, because no matter how strong your return man is, you can't make the other guy kick it shorter. The other two are field goals against your team, and punt distance against your team. Research shows no indication that teams can influence the accuracy or strength of field-goal kickers and punters, except for blocks. And although blocked field goals and punts are definitely skillful plays, they are so rare that they have no correlation to how well teams have played in the past or will play in the future. Special teams ratings also do not include two-point conversions or onside kick attempts, which like blocks are so infrequent as to be statistically insignificant in judging future performance.
ADJUSTED LINE YARDS EXPLAINED
(Note: The Adjusted Line Yards formula was substantially overhauled in the summer of 2005. Adjusted Line Yards in articles from 2003 and 2004 are based on a different formula and will look smaller.)
One exception to the use of DVOA/DYAR, and the use of "play success" instead of raw yardage, is the rating system for offensive and defensive lines. Actually, these are only measures of running plays, and of course the defensive numbers don't measure just the defensive line, but the whole front seven against the run.
One of the most difficult goals of statistical analysis in football is somehow isolating how much responsibility for a play lies with each of the 22 men on the field. Nowhere is this as obvious as the running game, where one player runs while up to nine other players -- including wideouts, tight ends, and fullback -- block in different directions. None of the statistics we use for measuring rushing -- yards, touchdowns, yards per carry -- differentiate between the contribution of the running back and the contribution of the offensive line. Neither do our advanced metrics DVOA and DYAR.
We have enough data amassed that we can try to separate the effect that the running back has on a particular play from the effect of the offensive line (and other offensive blockers) and the effect of the defense. A team might have two running backs in its stable: RB A, who averages 3.0 yards per carry, and RB B, who averages 3.5 yards per carry. Who is the better back? Imagine that RB A doesn't just average 3.0 yards per carry, but gets exactly 3 yards on every single carry, while RB B has a highly variable yardage output: sometimes 5 yards, sometimes -2 yards, sometimes 20 yards. The difference in variability between the runners can be exploited to not only determine the difference between the runners, but the effect the offensive line has on every running play.
We know that at some point in every long running play, the running back has gotten past all of his offensive line blocks. From here on, the rest of the play is dependent on the runner's own speed and elusiveness, combined with the speed and tackling ability of the defensive players. If Tiki Barber breaks through the line for 50 yards, avoiding tacklers all the way to the goal line, his offensive line has done a great job -- but they aren't responsible for most of that run. How much are they responsible for?
For each running back carry, we calculated the probability that the back involved would run for the specific yardage on that play, based on that back's average yardage per carry and the variability of their yardage on every play. We also calculated the probability that the offense would get the yardage based on the team's rushing average and variability without the back involved in the play, and the probability that the defense would give up the specific amount of yardage based on its average rushing yards allowed per carry and variability. For example, based on his rushing average and variability, the probability in 2004 that Tiki Barber would have a positive carry was 80% while the probability that the Giants would have a positive carry without Barber running was only 73%.
Yardage ends up falling into roughly the following combinations: Losses, 0-4 yards, 5-10 yards, and 11+ yards. In general, the offensive line is 20% more responsible for lost yardage than it is for yardage gained up to four yards, but 50% less responsible for yardage gained from 5-10 yards, and not responsible for yardage past that. Thus, the creation of Adjusted Line Yards.
Adjusted Line Yards take every carry by a running back and apply those percentages. (We don't include carries by receivers, which are usually based on deception rather than straight blocking, or carries by quarterbacks, which are generally busted passing plays except in Atlanta.) Those numbers are then adjusted based on down, distance, and situation as well as opponent (similar to DVOA) and then normalized so that the league average for Adjusted Line Yards per carry is the same as the league average for RB yards per carry (currently, we use 4.08).
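The yardage weights can be sketched as a function of a single carry's yards. This covers only the responsibility percentages (120% for losses, 100% for yards 0-4, 50% for yards 5-10, nothing beyond); the function name is hypothetical, and the later adjustments for situation and opponent and the league normalization are not modeled.

```python
def weighted_line_yards(yards: int) -> float:
    """Apply the offensive line's share of responsibility to one carry."""
    if yards < 0:
        return 1.2 * yards                      # losses count 120%
    first_four = min(yards, 4)                  # yards 0-4 count in full
    five_to_ten = max(0, min(yards, 10) - 4)    # yards 5-10 count 50%
    return first_four + 0.5 * five_to_ten       # yards 11+ count 0%


# A 50-yard breakaway earns the line no more credit than a 10-yard run.
for carry in (-3, 3, 8, 10, 50):
    print(carry, round(weighted_line_yards(carry), 1))
```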
Runs are listed by the NFL in seven different directions: left/right end, left/right tackle, left/right guard, and middle. Further research showed no statistically significant difference between how well a team performed on runs listed middle, left guard, and right guard, so we also list runs separated into five different directions. Note that there may not be a statistically significant difference between right tackle and middle/guard either, but until we can research further (and for the sake of symmetry) we do still split out runs behind the right tackle separately.
The system is far from perfect. We don't know when a guard is pulling and when a guard is blocking straight ahead. We know that some runners are just inherently better going up the middle, and some are better going side to side, and we can't measure how much that impacts these numbers. We have no way of knowing the blocking contribution made by fullbacks, tight ends, or wide receivers.
Other numbers we use to measure the running game:
10+ Yards gives the percentage of the team's rushing yards that come from double-digit runs, past the first 10 yards of each run. So for a 15-yard run, five yards are counted; for an 80-yard run, 70 yards are counted. This number gives you an idea of how much of a team's running game was based on the breakaway speed of the running backs -- not to mention the opportunity provided by getting past the front seven with a lot of field in front of you. After all, you can only run 80 yards if you're on your own 20. This number is not adjusted in any way.
Power success measures the success of specific running plays rather than the distance. This number represents how often a running attempt on third or fourth down, with two yards or less to go, achieved a first down or touchdown. Since quarterback sneaks, unlike scrambles, are heavily dependent on the offensive line, this percentage does include runs by all players, not just running backs. This is the only stat given that includes quarterback runs. It is not adjusted based on game situation or opponent.
Stuffed measures the percentage of runs that result in (on first down) zero or negative gain or (on second through fourth down) less than one-fourth the yards needed for another first down. Note that this is slightly different from the definition of "stuffed" used by STATS, Inc.
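The three definitions above translate into short functions. This is a sketch under the definitions as written here; the function names and the tuple layouts are hypothetical, and the sample plays are made up.

```python
from typing import Sequence, Tuple


def ten_plus_yards_share(run_yards: Sequence[int]) -> float:
    """Share of rushing yards gained past the first 10 yards of each run."""
    beyond_ten = sum(max(0, yards - 10) for yards in run_yards)
    return beyond_ten / sum(run_yards)


def power_success_rate(plays: Sequence[Tuple[int, int, bool]]) -> float:
    """plays: (down, yards_to_go, converted) for each run. Counts only
    third or fourth down with two yards or less to go."""
    qualifying = [converted for down, to_go, converted in plays
                  if down in (3, 4) and to_go <= 2]
    return sum(qualifying) / len(qualifying)


def is_stuffed(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """First down: zero or negative gain. Later downs: less than
    one-fourth of the yards needed for another first down."""
    if down == 1:
        return yards_gained <= 0
    return 4 * yards_gained < yards_to_go


print(ten_plus_yards_share([15, 80, 3, 2]))
print(power_success_rate([(3, 1, True), (4, 2, False), (3, 5, True), (1, 10, False)]))
print(is_stuffed(2, 8, 1), is_stuffed(2, 8, 2))
```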
DRIVE STATS
The stats section of our website also features drive stats compiled by Jim Armstrong. These stats are computed from NFL Drive Charts and are not adjusted for strength of schedule or situation. Take-a-knee drives at the end of a half are discarded. Drive stats are generally self-explanatory, giving each team's total number of drives as well as average yards per drive, points per drive, touchdowns per drive, punts per drive, turnovers per drive, interceptions per drive, and fumbles lost per drive. LOS/Drive represents average starting field position (line of scrimmage) per drive from the offensive point of view. Drive stats are given for offense and defense, with NET representing simply offense minus defense.
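For completeness, the per-drive averages and NET reduce to a division and a subtraction. The function names and the season totals below are hypothetical.

```python
def per_drive(total: float, drives: int) -> float:
    """Average of any counting stat (yards, points, punts, ...) per drive."""
    return total / drives


def net(offense_value: float, defense_value: float) -> float:
    """NET is simply the offensive figure minus the defensive figure."""
    return offense_value - defense_value


# Hypothetical team: 5,400 yards gained on 180 drives, 4,950 allowed on 180 drives.
offense_yards_per_drive = per_drive(5400, 180)
defense_yards_per_drive = per_drive(4950, 180)
print(net(offense_yards_per_drive, defense_yards_per_drive))
```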
A NOTE ON PLAY-BY-PLAY DATA
Our data may differ slightly from official NFL numbers due to discrepancies in different play-by-play reports. In addition, we've adjusted clock plays, with kneels no longer counting as rush attempts and spikes no longer counting as pass attempts. We also count most aborted snaps as passing plays, not rushing plays, unless the play-by-play specifies that the play was an aborted handoff.
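These counting rules could be sketched as a small classifier. The play-type labels ("kneel", "spike", "aborted_snap", and so on) are hypothetical stand-ins rather than the actual play-by-play format, and the text says only that most aborted snaps are treated as passes, so the rule below is a simplification.

```python
from typing import Optional


def classify_play(play_type: str, aborted_handoff: bool = False) -> Optional[str]:
    """Return "pass", "rush", or None (not counted as an attempt)."""
    if play_type in ("kneel", "spike"):
        return None  # clock plays are not counted as rush or pass attempts
    if play_type == "aborted_snap":
        # Counted as a pass unless the play-by-play specifies an aborted handoff.
        return "rush" if aborted_handoff else "pass"
    if play_type in ("rush", "pass"):
        return play_type
    raise ValueError(f"unknown play type: {play_type}")


print(classify_play("kneel"))
print(classify_play("aborted_snap"))
print(classify_play("aborted_snap", aborted_handoff=True))
```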