redman
Footballguy
I'm a pretty poor number cruncher, so I pose this question to the forum.
Most people (me included) jump all over anyone who points out that "such-and-such player would only have this yards per carry/reception/attempt average if you removed this big play." The logic of the criticism is that you can't simply ignore big plays: they count towards the player's statistics, and they indicate the player's ability to break a big play every once in a while, i.e. "past performance may be indicative of future performance".
In statistics, however, there are such things as outliers, meaning unusually large or small values that are exceptions to the rest of the data being studied. In addition, it can be useful to figure out what yardage is most indicative of the "average run" that a RB has. For example, Terrell Davis averaged 4.7 yards per carry in 1995 while Barry Sanders averaged 4.8, but I'm sure that a close examination of their carries would reveal that Barry had a greater proportion of both runs for lost yards and runs over 20 yards than Davis did. They were very different RBs.
Certain methodologies let you remove both the largest and the smallest numbers from your data set before calculating the mean. However, taking a RB's longest and shortest runs out of the equation is problematic, because on any given carry a RB can gain far more yards past the line of scrimmage than he can lose behind it. So while the impact of a long run on his average would be diminished, it would still be there.
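To put rough numbers on that point, here is a minimal Python sketch of a trimmed mean. The carry list is made up for illustration (it is not real game data), and `trimmed_mean` is just a hypothetical helper name, not something from a stats package.

```python
from statistics import mean, median

# Hypothetical yards gained on 15 carries in one game (invented numbers).
carries = [-3, -1, 0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 9, 14, 62]

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values to trim")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

plain_avg = mean(carries)                 # every carry counts
trimmed_avg = trimmed_mean(carries, k=1)  # drop the -3 and the 62
middle = median(carries)                  # the middle carry once sorted

print(f"plain: {plain_avg:.1f}, trimmed: {trimmed_avg:.1f}, median: {middle}")
```

With this invented game, dropping the shortest and longest run pulls the average down a lot, but the trimmed average still sits above the middle carry because the 9 and the 14 remain in the data.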
The question, then, is how do you go about figuring out what length of run is the most indicative of a RB's carries? Taking the median would seem like a logical way, except that the NFL measures rushing yards in whole numbers on each carry, which leads to pretty homogeneous results when comparing RBs. And anyway, I don't know of a source that compiles, orders and lists all of a RB's runs each game by distance. That seems like a lot of work. Any ideas here?
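If someone did have a carry-by-carry list, one possible approach would be to look at the median alongside the share of runs for a loss and the share of runs over 20 yards. A rough Python sketch, with two invented backs (the numbers are hypothetical, not Davis's or Sanders's actual carries) and a made-up helper called `profile`:

```python
from statistics import mean, median

# Two hypothetical backs with the same yards-per-carry average (invented data).
steady = [2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 10]
boom_bust = [-4, -2, -1, 0, 0, 1, 1, 2, 2, 3, 22, 33]

def profile(carries, long_run=20):
    """Summarize a list of carries: mean, median, and the share of
    runs for a loss and of runs longer than `long_run` yards."""
    n = len(carries)
    return {
        "mean": mean(carries),
        "median": median(carries),
        "loss_share": sum(1 for y in carries if y < 0) / n,
        "long_share": sum(1 for y in carries if y > long_run) / n,
    }

steady_profile = profile(steady)
boom_bust_profile = profile(boom_bust)

for name, p in (("steady", steady_profile), ("boom/bust", boom_bust_profile)):
    print(name, p)
```

In this made-up example both backs average 4.75 yards per carry, but the medians and the loss and long-run shares come out very different, which is the kind of distinction the plain average hides.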