Fantasy Football - Footballguys Forums


Regression to the Mean

Read this article yesterday, Chase, and thought it was well worth the read, even though I am familiar with the concept. Those who are not familiar should be able to wrap their heads around it after reading. You don't have to be a math whiz to understand the philosophy behind it, although it is one that I rarely bring up in conversation with average fantasy gamers.

 
There's a piece I really think you should emphasize more, which is that the better player is still the better player. Because of regression to the mean, you won't project Brees to have 5400 yards again, but you're more likely to project him for 4800 than almost anyone else. A lot of people fallaciously believe that the fact that Adrian Peterson's stats are likely to regress to the mean suggests that he shouldn't be the 1.01 overall pick.

 
This is a good point. After all, Brees regressed in 2012 and still led the league in passing yards and passing touchdowns.

 
At what point do you think you know 'the mean' for an individual player though?

A small sample size is more likely than a big one to be wrong, but a small sample size is also more likely to be close to right than completely wrong.

I see Chase had a go at this question for Cecil Shorts using historical data (though I disagree that his draft position is a negative -- since he came out of a DIII school), but when do you start to trust that the actual on-field performance data reflects the player's overall ability?

 
When you want to know if someone's way works, look at their track record. I'm sure Chase uses this; if it works for him, maybe it can work for you. But many people lack the intellect to understand what you are writing about.

 
At what point do you think you know 'the mean' for an individual player though? A small sample size is more likely than a big one to be wrong, but a small sample size is also more likely to be close to right than completely wrong. I see Chase had a go at this question for Cecil Shorts using historical data (though I disagree that his draft position is a negative -- since he came out of a DIII school), but when do you start to trust that the actual on-field performance data reflects the player's overall ability?
You never know the mean for a player, because a player's mean is constantly changing based on situation (i.e., supporting cast, age, coaching philosophy, etc.). But that doesn't mean you can't get a good idea of a range of means, and more importantly, know when something is far outside the range of what the player's true mean is.

Over his first two seasons, Felix Jones averaged 6.5 yards per carry on 146 carries. That obviously wasn't his mean, and he averaged 4.3 yards per carry in his third year on 185 carries.

Adrian Peterson averaged 6.0 yards per carry last year. That's not his mean, and I wouldn't project anything near 6.0 YPC for 2013. Ditto C.J. Spiller, although I personally would project him for a higher YPC average than Peterson. (Of course, to the extent Spiller receives fewer carries, he's more likely than Peterson to achieve extreme results in YPC in both directions. So he's probably more likely than Peterson to average under 4.0 or over 5.0 YPC if he winds up with 50-100 fewer carries.)
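
To put rough numbers on the fewer-carries point, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 4.5 true YPC, the 6-yard per-carry standard deviation, and the carry counts are made up, and it leans on a normal approximation for a season-long average. It gives both backs the same true ability and only changes the workload.

```python
from math import sqrt
from statistics import NormalDist

def prob_outside(true_ypc, per_carry_sd, carries, low=4.0, high=5.0):
    """Chance that a season YPC finishes below `low` or above `high`.

    Uses a normal approximation: the season average has standard error
    per_carry_sd / sqrt(carries).
    """
    season = NormalDist(mu=true_ypc, sigma=per_carry_sd / sqrt(carries))
    return season.cdf(low) + (1.0 - season.cdf(high))

# Hypothetical inputs: same true ability for both backs, different workloads.
heavy = prob_outside(true_ypc=4.5, per_carry_sd=6.0, carries=300)
light = prob_outside(true_ypc=4.5, per_carry_sd=6.0, carries=200)

print(f"300 carries: {heavy:.0%} chance of finishing under 4.0 or over 5.0")
print(f"200 carries: {light:.0%} chance of finishing under 4.0 or over 5.0")
```

Under these assumptions the lighter workload produces the extreme result more often, which is the point being made about Spiller versus Peterson.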

Matt Stafford threw 727 passes last year. That's not his mean, either.

Regarding Cecil Shorts, I think you're talking more about small sample size questions than regression to the mean. Draft status certainly is relevant to predicting a player's career, and that doesn't go away after one or two seasons. But when I say draft status matters, I mean that a group of 40 players selected in the 4th round will, on average, be worse than a group of 40 players selected in the 3rd round. For any one individual player, it might not mean very much. On the other hand, I think it's still very much a question whether or not Shorts is a very talented wide receiver or whether he's a receiver who simply benefited from being in a good situation. Is he Kevin Johnson/Jacquez Green/Johnny Knox, or is he a future Pro Bowl receiver? It's hard to say over a 12-game sample.

 
Man, between this and that other thread on defenses and their ADP value or whatever, this seems like more of a math class (and I had a minor in math after my freshman year of college, so not like I am afraid of a little math). I would rather spend my time evaluating the players and their situations than going over all these numbers and trying to calculate mythical numbers that are only going to be based on my (or someone else's) ranking/projections anyway.

I would say the best fantasy players are the ones who look at these SMALL sample sizes anyway. If you have a player with a large sample size who does well, he is going to get drafted high. If you can decipher which players will be good going forward out of the crop of guys with small sample sizes, then you are gonna have success since you can get a lot of those guys later in the draft, and the later picks are the ones that win leagues in both redraft and dynasty. Numbers alone aren't going to come close to helping you do that.

But hey, if some of you out there are able to combine both your views of players along with these numbers to make final decisions, more power to ya. I just can't help but think that the majority of the number crunching is just a waste of time. But if it works for YOU, well, keep doing it.

 
Agreed.

I see stats used non-stop in football discussions (FF and otherwise). Stats and math are used way too much and used incorrectly a large chunk of the time. In FF it is even worse since winning and losing is based off of end game numbers. The reason they are overused is that most people either haven't seen enough of players or don't know enough about football/positions/coaching etc., so they use stats.

I'm not saying stats and specifically, regression to the mean, don't have value and a place in our projections but it should be used as extra info not as a core piece of the puzzle.

" But if it works for YOU, well, keep doing it. "

PS: This wasn't directed at the OP or even anyone on this forum, mostly just a general problem that gets under my skin.

 
I love when people use all kinds of statistical analysis when trying to offer me trades. I guess the main reason would be, WHY are you offering me the deal if you think your side is better??

 
At what point do you think you know 'the mean' for an individual player though?
You will never know a player's individual mean, but that's not really what regression to the mean is about. (If you knew a player's individual mean, you wouldn't have to do any regression: you'd just project him to achieve his individual mean.) The mean we're talking about is really the mean of all the players who, all things considered, seem most similar to the player in question in terms of physical talent, team situation, etc. (And since our initial expectations about a player will be determined by how players similar to him have performed, you can think of "regression to the mean" as "regression to our initial expectations.")

To keep things somewhat simple so that we can use an example straight from Chase's article, let's say we're doing Year N projections for a quarterback that we really don't know anything about. All we know about him is that he's an NFL quarterback, so our initial expectation is that he'll be an average NFL quarterback. (This is a reasonable expectation because most NFL quarterbacks are somewhere around average, and anyway there are about as many significantly above average QBs as significantly below average QBs, so the two latter groups generally cancel each other out.)

In Chase's article, go down to the table where he lists 12 NFL QBs (Brew Drees through QB12). If the universe of NFL quarterbacks consists only of those 12 QBs, our projection for any randomly selected QB that we know nothing about (and who is therefore equally likely to be any of the listed QBs) would be 3,920 yards. That's the (equally) weighted average of all 12 QBs' expected performances, and so that's our Year N projection. Now suppose that in Year N, our mystery QB actually ends up in the 4,800-4,959 yard range. As Chase states in the article,

One time it will be QB6 (4800), and we'll project 4,000 yards the next year. Four times it will be QB3 (4880), and we'll project 4480 for the next season. And six times it will be Drees, and we'll project 4800 yards the next year.

So what does that mean? In Year N, the 11 QB seasons that landed in the 4800-4959 range averaged 4,829 yards. In Year N+1, we'd project a weighted average of 4,610 yards for those quarterbacks.

So our initial expectation for Year N was 3,920 yards. The QB actually threw for 4,829 yards in Year N, and our projection for him in Year N+1 will be 4,610 yards. Note that our new projection (4,610) is in between our previous expectation (3,920) and his actual performance (4,829). That's what regression to the mean is about. (Remember that "the mean" is our prior expectations.)

Regression to the mean says that, if we're just going by stats (and not taking into account information that doesn't show up in the stats, like a player was playing on a broken leg last year but will be healthy this year), our projection for this season should be somewhere in between our projection for last season and his actual performance last season. And when we take into account stuff like the typical variance of results over the sample size that we've observed, we can calculate precisely where in between those numbers we should be, i.e., how much regression is warranted.

If all this stuff (updating our projections using a combination of previous expectations and actual observed data) reminds you of Bayes' theorem, you win a gold star. Regression to the mean, as we're using the phrase here, is a straightforward consequence of Bayes' theorem.

(None of this accounts for the fact that, in the NFL, numerous factors that affect a player's performance will not show up in the stats. That's a separate issue. It suggests that regression to the mean isn't everything when it comes to doing projections, and of course it's not. Football players are not dice. A significant part of doing good projections has nothing to do with highfalutin mathematical concepts like regression to the mean; but understanding such concepts can at least be a small help.)
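
For anyone who wants to check the arithmetic, here is a small Python sketch of the weighted average quoted from Chase's article, plus how much of the "surprise" survives into the new projection. The numbers are the ones quoted above; the variable names are my own, and the "kept" fraction is just one convenient way to summarize the result, not something taken from the article.

```python
# Year N+1 projections for the QB seasons that landed in the 4,800-4,959 range,
# weighted by how often each QB lands there (counts and yards as quoted above).
outcomes = [
    (1, 4000),  # QB6: one time, projected 4,000 the next year
    (4, 4480),  # QB3: four times, projected 4,480
    (6, 4800),  # Drees: six times, projected 4,800
]

total_seasons = sum(times for times, _ in outcomes)
projection = sum(times * yards for times, yards in outcomes) / total_seasons

prior = 3920     # initial expectation for a QB we know nothing about
observed = 4829  # average Year N result for this group

# Share of the gap between prior and observed that carries into Year N+1.
kept = (projection - prior) / (observed - prior)

print(f"Year N+1 projection: {projection:.1f} yards")
print(f"Kept {kept:.0%} of the surprise, regressed away {1 - kept:.0%}")
```

The projection comes out between the prior and the observed result, which is exactly the "in between" behavior described above.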

A small sample size is more likely than a big one to be wrong, but a small sample size is also more likely to be close to right than completely wrong.
That depends on how big the sample size is, and how far away from the mean it is. It's a sliding scale. For any given sample size, the further away it is from the mean, the more likely it is to be anomalous. And for any given result, the smaller the sample is, the more likely it is to be anomalous.

To take an extreme example, suppose we expect Giovani Bernard to average 4.2 yards per carry. On his first carry, he rushes for 50 yards. That is very likely quite anomalous: we should update our expectation from 4.200000000 only to 4.20000001 or so. In other words, a sample size that small and that far away from our previous expectation is not very likely at all to be close to right rather than completely wrong.

 
To take an extreme example, suppose we expect Giovani Bernard to average 4.2 yards per carry. On his first carry, he rushes for 50 yards. That is very likely quite anomalous: we should update our expectation from 4.200000000 only to 4.20000001 or so. In other words, a sample size that small and that far away from our previous expectation is not very likely at all to be close to right rather than completely wrong.
It's worth pointing out something here which many people miss. The ypc expectation for future carries is only marginally adjusted upwards. The yardage expectation for the whole season, including the 50-yarder he just ripped off, goes up by nearly the full 50 yards. So if you project him for 250 carries and 4.2 ypc (1050 yards), after the 50-yarder you should probably project him to get another 249 carries at 4.2 ypc, for 1096 yards and a season average of 4.38 ypc.
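
The arithmetic, spelled out as a short Python sketch. It makes the same simplifying assumption as the post: the expectation for the remaining carries stays at 4.2 ypc, ignoring the tiny upward adjustment. Variable names are mine.

```python
projected_carries = 250
expected_ypc = 4.2

# Preseason projection: 250 carries at 4.2 ypc.
preseason_yards = projected_carries * expected_ypc

# After a 50-yard first carry, the other 249 carries are still expected at 4.2.
first_carry = 50
remaining_yards = (projected_carries - 1) * expected_ypc
updated_yards = first_carry + remaining_yards
updated_ypc = updated_yards / projected_carries

print(f"Preseason: {preseason_yards:.0f} yards")
print(f"Updated:   {updated_yards:.0f} yards, {updated_ypc:.2f} ypc")
```

So the big run is banked, not "wasted": the season total moves up by almost the full 50 yards even though the per-carry expectation going forward barely changes.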

This will come up in the subscriber contest again this year, where someone will complain that Bernard "wasted" his 50-yarder in the first week.

 
I don't look at NFL averages when I'm figuring out what an individual player's projection should be. I use players I think are his historic comparables and try to extrapolate to his situation. So I was trying to suggest that if I've got Jamaal Charles with an expected career mean of 4.7 after his rookie season and he rips off a 6.0 season in his 2nd year there's no way for me to know, mathwise, how I should handle that 6.0. It still comes down to my judgment in the end. Is he really an uber elite player or did he just have a fluky year? IOW, regressing top end and elite players against the NFL mean, instead of the mean expected from a player of their ability, is going to understate their future performance (errr.... right?). That's what I was trying to point out with Cecil Shorts. Based on his best comps, I thought he was a good prospect coming into the league. Now that I see what he can do with opportunity I'm convinced. You don't have to be, but I am. So I don't think mixing him in with a bunch of marginally comparable players and using their performance as a guide for Shorts is particularly helpful. If you're going to regress his performance you need to use comparable players. Which comes down to judgment again. Always possible I'm mixing and matching concepts here. So if one of the true stats guys wants to correct that have a go at it.

ETA: tried to clean that up for clarity.

 
To take an extreme example, suppose we expect Giovani Bernard to average 4.2 yards per carry. On his first carry, he rushes for 50 yards. That is very likely quite anomalous: we should update our expectation from 4.200000000 only to 4.20000001 or so. In other words, a sample size that small and that far away from our previous expectation is not very likely at all to be close to right rather than completely wrong.
It's worth pointing out something here which many people miss. The ypc expectation for future carries is only marginally adjusted upwards. The yardage expectation for the whole season, including the 50-yarder he just ripped off, goes up by nearly the full 50 yards. So if you project him for 250 carries and 4.2 ypc (1050 yards), after the 50-yarder you should probably project him to get another 249 carries at 4.2 ypc, for 1096 yards and a season average of 4.38 ypc. This will come up in the subscriber contest again this year, where someone will complain that Bernard "wasted" his 50-yarder in the first week.
Yes, good point.

 
I don't look at NFL averages when I'm figuring out what an individual player's projection should be. I use players I think are his historic comparables and try to extrapolate to his situation. So I was trying to suggest that if I've got Jamaal Charles with an expected career mean of 4.7 after his rookie season and he rips off a 6.0 season in his 2nd year there's no way for me to know, mathwise, how I should handle that 6.0. It still comes down to my judgment in the end. Is he really an uber elite player or did he just have a fluky year? IOW, regressing top end and elite players against the NFL mean, instead of the mean expected from a player of their ability, is going to understate their future performance (errr.... right?). That's what I was trying to point out with Cecil Shorts. Based on his best comps, I thought he was a good prospect coming into the league. Now that I see what he can do with opportunity I'm convinced. You don't have to be, but I am. So I don't think mixing him in with a bunch of marginally comparable players and using their performance as a guide for Shorts is particularly helpful. If you're going to regress his performance you need to use comparable players. Which comes down to judgment again. Always possible I'm mixing and matching concepts here. So if one of the true stats guys wants to correct that have a go at it. ETA: tried to clean that up for clarity.
Yes, you've got the basic concepts right, though I can add a few details tomorrow when I'm back on my desktop.
 
Has anyone actually compared the method described in Chase's article against a more individual projection that uses the last 3 seasons of a player's performance (or his whole career, or other methods) for the expected mean?

I am not convinced averaging Adrian Peterson to an average RB in the league is the best way to predict what he might do next season.

I am thinking to myself surely this has been tested hasn't it?

If it hasn't I am not sure why we should expect it to be any better, and the results for outlier players (top performers) are perhaps going to be worse but I haven't tested this either.

I like looking at MT's numbers after I finish mine. His projections for the most part are lower than what I projected. I think using regression to the mean is the main reason for this. Especially in an NFL that has kept adding more total offensive plays on a pretty steady trend for some time now, I am not sure how this accounts for that. It seems like it would be behind the trend a few seasons depending on how many seasons you use for that time frame.

 
I am not convinced averaging Adrian Peterson to an average RB in the league is the best way to predict what he might do next season.
Note post #14 by Maurile where he says things like:
The mean we're talking about is really the mean of all the players who, all things considered, seem most similar to the player in question in terms of physical talent, team situation, etc. (And since our initial expectations about a player will be determined by how players similar to him have performed, you can think of "regression to the mean" as "regression to our initial expectations.")
The mean of an average RB in the league is not what you would be using dealing with Adrian Peterson. Unless of course you believe the total of Peterson and his situation is that of an average RB in the league.

The easiest example with Peterson (also mentioned in MT's post) is Peterson's 2012 6.0 ypc. Do you think that's average production for someone like Peterson who was in Peterson's situation? Or do you think it was more of an outlier, even for an Adrian Peterson?

If you think the latter, then even if you think Peterson's 2013 situation is exactly the same, you should probably project a lower ypc for him this year.
 
I do not think or believe any of those things you suggest.

I think it is very obvious that 6ypc is an outlier and not a realistic expectation. If anything I tend to be very conservative and close to league averages with most players in projected ypc unless they have a recent track record higher than that average to make me project them higher in that category.

That is not what Chase has done when creating these projections however: http://subscribers.footballguys.com/apps/article.php?article=stuart_starting_point_rb_proj

And while I could be wrong it seems to me that MT does quite a bit more than match some players by similarity scores when making a projection. I also do not know if doing this will create more accurate projections than a last 3 year method would.

 
I like looking at MT's numbers after I finish mine. His projections for the most part are lower than what I projected. I think using regression to the mean is the main reason for this.
I think staff members' projections are generally lower because they don't remove injuries. Projections should be made on a per game basis, since we choose starters that way. When people use injuries as a reason for regression to the mean, I don't think they are looking at things the right way, at least for the purpose of FF.

 
I respect MT's work. Have for a long time now. That is why I like to check my numbers against his once I finish mine, as I usually have to come back to earth quite a bit. On the flip side of that, my projections tend to be higher across the board than most. I think that is ok since that is happening with all of my player projections. Part of that is I have been bumping passing yardage because the league has been trending upwards in passing yardage for quite some time now. I am wondering if this won't be reaching its peak soon, however, and we may have a year where numbers fall back again before stabilizing. I do not think the plays and passing yardage can continue going up indefinitely. How many more plays will coaches like Chip Kelly add to that over the next few seasons? At some point soon I think this will plateau, but I do not think we are quite to that point yet.

As far as injuries go, I will not project for those but players who have a risky history get passed over until they are the only players remaining from their tier before I would consider them.

I would always prefer projections to be in the base stats: number of rushing attempts, ypc, TDs, etc. Having these numbers already converted into PPG by one scoring system makes them MUCH less useful than the raw numbers would be. You can convert the raw numbers into PPG for your scoring system then if you want to. You cannot do that if you do not have the base information. For example, from Chase's article I linked, I know he has AD projected for 285 FP by a scoring system of his choosing, but I have no idea how many rushing yards or TDs or catches or any of that is. It isn't useful to me. I am not able to take that and convert it for a non-PPR league, for example.
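
A quick Python sketch of why the base stats matter. The scoring rules and the stat line below are made-up placeholders (not Chase's projection or any particular league's settings); the point is only that one raw projection converts to any scoring system, while a single fantasy point total cannot be converted back.

```python
# Hypothetical scoring rules; swap in your own league's settings.
STANDARD = {"rush_yd": 0.1, "rec_yd": 0.1, "td": 6.0, "rec": 0.0}
PPR = {**STANDARD, "rec": 1.0}  # same rules plus one point per reception

def fantasy_points(stats, scoring):
    """Convert a base-stat projection into points under one scoring system."""
    return sum(points * stats.get(stat, 0) for stat, points in scoring.items())

# Made-up base-stat projection for a running back.
projection = {"rush_yd": 1400, "rec_yd": 250, "td": 12, "rec": 35}

standard_total = fantasy_points(projection, STANDARD)
ppr_total = fantasy_points(projection, PPR)

print(f"Standard: {standard_total:.1f} FP")
print(f"PPR:      {ppr_total:.1f} FP")
```

The two totals differ by exactly the reception count, which is information you cannot recover if all you are handed is one of the totals.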

I think when you work with condensed numbers/advanced stats that you need to be very careful because any of the errors can very easily throw your whole sample out of whack.

Now if one could do their game by game projections in advance of the season or in May?

 
I would always prefer projections to be in the base stats: number of rushing attempts, ypc, TDs, etc. Having these numbers already converted into PPG by one scoring system makes them MUCH less useful than the raw numbers would be. You can convert the raw numbers into PPG for your scoring system then if you want to. You cannot do that if you do not have the base information. For example, from Chase's article I linked, I know he has AD projected for 285 FP by a scoring system of his choosing, but I have no idea how many rushing yards or TDs or catches or any of that is. It isn't useful to me. I am not able to take that and convert it for a non-PPR league, for example.
I agree here. I was just saying I think most of their projections look at end of year stats rather than per game stats.

 
I don't look at NFL averages when I'm figuring out what an individual player's projection should be. I use players I think are his historic comparables and try to extrapolate to his situation.
This raises an important point that often goes unstated.

Regression analysis is a mathematical tool. We feed a bunch of data into a computer, we make it buzz and whir for the better part of a few milliseconds, and it returns to us some statistical projections based on a bunch of fancy math formulas.

Whenever we're using any mathematical tool, we have to be cognizant of what data we're feeding into it and what we're leaving out. The more useful data that we feed into it, assuming we're doing things correctly, the better the output will be. But real life is extremely complicated in all kinds of ways, so our mathematical models will necessarily be gross simplifications. We simplify in part by leaving out a lot of data that it's not practical to include. A simplified model will never be perfect, but general insights can be gained from such models that are likely to apply to more complicated scenarios as well; so they can be quite useful as tools.

Certain principles are much easier to understand in the context of simplified situations. They are certainly much easier to explain in the context of simplified situations. So when trying to explain a certain concept, we might often say something like: "Pretend that all you know about a player is that he caught 55 passes for 979 yards last season." The statement means that that's all we're feeding into our mathematical tool, which will therefore remain ignorant of the player's height, his speed, his route-running skills, his quarterback's accuracy, and what he had for breakfast this morning. There's a lot we're not telling our computer about him, but nonetheless "55 catches for 979 yards" is at least something to go on, and it's definitely enough to demonstrate the essence of a concept like regression to the mean.

Regression analysis can have lots of parameters or just a few. It's much easier to perform when it has just a few, so that's how it's often done, especially when it's done for expositional purposes.

A very simplistic version of regression to the mean might go like this: for any WR who caught at least 35 passes in any season during the past decade, we feed into our computer a list of his receptions and receiving yards that season, and his receptions the following season. Then we ask our computer, "Computer, what formula that predicts a player's receptions in Year N+1 based on his receptions and receiving yards in Year N will give us the smallest total error?" It will tell us something like "15 + 0.7 * (receptions in Year N) + 0.02 * (receiving yards in Year N)". [I'm just making that answer up; I'm not sure whether it's close to the real answer.]

A more complicated, and probably more accurate, version would feed into the computer not just last year's catches and yards, but also other data from last season (yards after the catch, drop rate, etc.) as well as data from previous seasons, maybe data from college production (especially for younger players), maybe data about the player's speed or hands or jumping ability and so on.

So if your point about Cecil Shorts is that you don't want to do projections for him by telling a computer "pretend all you know about him is that he caught 55 passes for 979 yards last season," I don't think anyone would disagree with you. That doesn't mean that the concept of regression to the mean is useless. It just means that it's not a substitute, especially in its crudest form, for a fuller, more nuanced analysis.

The concept from regression to the mean that will always be relevant and useful, however, is this: when a player outperforms your previous expectations, it might be because your expectations were right on the mark and his outlying performance was strictly a consequence of a small sample size; or it might be because you had previously underrated him, and his performance was not an outlier at all, but accurately reflects his true ability. Most likely, though, the truth is somewhere in the middle of those two possibilities. Therefore, your future expectations should be higher than your previous expectations were, but lower than a continuation of the player's actual recent performance.
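
For anyone who wants to see the "ask the computer for the formula with the smallest total error" step in miniature, here is a rough sketch. It is not the formula from the post (which was admittedly made up) and not anyone's actual projection model: it uses a single predictor instead of two, ordinary least squares as the error measure, and seven invented data points. The names `fit_line` and `predict` are just for illustration.

```python
# Toy (made-up) data: receptions in Year N and in Year N+1 for seven receivers.
year_n = [40, 55, 60, 72, 85, 98, 110]
year_n_plus_1 = [48, 50, 63, 66, 80, 84, 95]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(year_n, year_n_plus_1)


def predict(receptions_year_n):
    """Projected Year N+1 receptions from the fitted line."""
    return intercept + slope * receptions_year_n


print(f"Year N+1 = {intercept:.1f} + {slope:.2f} * (receptions in Year N)")
print(f"110 catches last year -> {predict(110):.1f} projected")
print(f" 40 catches last year -> {predict(40):.1f} projected")
```

With a slope between 0 and 1, the fitted line pulls the 110-catch receiver down and the 40-catch receiver up, which is regression to the mean showing up in the coefficients.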

So I was trying to suggest that if I've got Jamaal Charles with an expected career mean of 4.7 after his rookie season and he rips off a 6.0 season in his 2nd year there's no way for me to know, mathwise, how I should handle that 6.0. It still comes down to my judgment in the end. Is he really an uber elite player or did he just have a fluky year?
Since the truth is probably somewhere in between, your revised expectations for his future YPC should probably be higher than 4.7 but lower than 6.0. Where in between those numbers should it be? There's a sliding scale. The more confident you were previously in your estimate of 4.7 YPC, the closer to 4.7 it should be. The greater the sample size that the 6.0 YPC was generated from, the closer to 6.0 it should be. In fact, if you can specify those criteria precisely, there are mathy ways to estimate exactly where in between 4.7 and 6.0 your new projection should be. (As always, the math would be based only on statistical information; extraneous factors, like injury news or a new offensive system or whatever, would have to be accounted for separately.)
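
One of the simpler "mathy ways" alluded to above is a weighted average in which the prior estimate counts as a certain number of carries' worth of evidence. The sketch below assumes that approach; the carry counts and prior weights are hypothetical numbers picked for illustration, and `blended_ypc` is an invented name, not a standard function.

```python
def blended_ypc(prior_ypc, prior_weight, observed_ypc, observed_carries):
    """Weighted average of a prior estimate and newly observed data.

    prior_weight is in "equivalent carries": how many carries of evidence
    we treat the prior estimate as being worth (an assumption we choose).
    """
    total = prior_weight + observed_carries
    return (prior_ypc * prior_weight + observed_ypc * observed_carries) / total


# Hypothetical inputs: prior of 4.7 YPC, then 6.0 YPC observed over 200 carries.
confident_prior = blended_ypc(4.7, 600, 6.0, 200)  # we trusted 4.7 a lot
shaky_prior = blended_ypc(4.7, 150, 6.0, 200)      # we trusted 4.7 only a little
bigger_sample = blended_ypc(4.7, 600, 6.0, 400)    # same prior, more carries at 6.0

print(round(confident_prior, 2), round(shaky_prior, 2), round(bigger_sample, 2))
```

The more weight on the prior, the closer the answer sits to 4.7; the more carries behind the 6.0, the closer it sits to 6.0. Injury news, scheme changes and the like still have to be handled outside the formula.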

IOW, regressing top end and elite players against the NFL mean, instead of the mean expected from a player of their ability, is going to understate their future performance (errr.... right?).
Right. Part of the challenge is figuring out whether a player is truly elite, or whether his elite-looking performance was merely a sham. But if you know that he's truly elite, you wouldn't be regressing him toward the overall NFL mean; you'd be regressing him toward the mean of the true elites.

 
And while I could be wrong it seems to me that MT does quite a bit more than match some players by similarity scores when making a projection.
I actually don't use similarity scores at all. I think that using similarity scores would probably be the best way to do projections for rookies. I don't do it simply because I haven't spent the time yet to build that model. So for rookies, I just kind of wing it. But in the future I do plan to use something like similarity scores for them.

For veterans, I do not use similarity scores because once a player has at least 30 or so runs or receptions, I think using information from his own individual past performance is more useful than using information from the performances of other players who were kind of like him in some ways, and this becomes more and more true with each additional run or reception he notches on his belt.

Another note here. A player's rushing yards will be his rushes times his yards per carry. His receiving yards will be his targets times his catch rate times his yards per reception.

In my opinion, regression analysis or something similar (I prefer Bayesian inference analysis to more traditional least-squares regression) can be very useful for projecting yards per carry, receptions per target, and yards per reception, but is nearly useless for projecting the number of carries or targets. (I project carries and targets by considering what type of offense the player is in and what role I think he'll have in it, stuff that I don't think regression analysis can very usefully help with.) Moreover, when we're projecting rushing yards for a player, getting his yards per rush right is generally far less important than getting his number of carries right. A running back whose long-term YPC is in the 75th percentile (about 4.35) will average only about 10% more yards per carry than one whose long-term YPC is in the 25th percentile (about 3.9). But RBs typically vary in their number of carries by far, far more than 10%.

So in projecting an RB's expected yards this season, I think something like regression to the mean can be pretty helpful for the relatively minor component of the projection, but nearly useless for the more important component.

Like I said in a previous post, I don't think anyone will tell you that regression to the mean is the be-all and end-all in doing player projections. Similarly, I don't think a player's speed is the be-all and end-all, or that his draft position is, or that his Wonderlic score is. I don't think any one piece of data or any one type of analysis is the holy grail. All of these factors, and many more, are tiny pieces of a large puzzle. All deserve consideration. I think Chase's article did a good job of explaining the concept of regression to the mean. Maybe somebody else can write an article on the importance of a player's speed. Our job as fantasy owners is to make whatever use we can out of all the various concepts and information available. Understanding regression to the mean can be only a small part of that, but every little bit helps.
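
To put numbers on the carries-versus-YPC point from the post above: the two YPC figures (4.35 and 3.9) come from the post, while the carry totals below are hypothetical workloads chosen only to show the scale of the two effects.

```python
def rushing_yards(carries, ypc):
    """Rushing yards as carries times yards per carry."""
    return carries * ypc


# YPC figures from the post: 75th percentile vs. 25th percentile.
ypc_edge = 4.35 / 3.9 - 1          # fractional YPC advantage of the better back

# Hypothetical workloads: a plodding workhorse vs. an efficient committee back.
workhorse = rushing_yards(300, 3.9)
committee = rushing_yards(180, 4.35)
carry_edge = 300 / 180 - 1         # fractional workload advantage

print(f"YPC edge: {ypc_edge:.1%}, carry edge: {carry_edge:.1%}")
print(f"Workhorse: {workhorse:.0f} yards, committee back: {committee:.0f} yards")
```

Under these made-up workloads the less efficient back still finishes well ahead, because the gap in carries dwarfs the gap in yards per carry.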

 
Does this mean that if a coin is flipped and comes up heads that the next flip is more likely to be tails? Just kidding, very good thread.

 
Does this mean that if a coin is flipped and comes up heads that the next flip is more likely to be tails? Just kidding, very good thread.
It means it's more likely to be heads again.

Before the coin was flipped, we were expecting 0.5 heads per flip. On the first flip, however, we actually got 1.0 heads. So our revised projection going forward should be somewhere between our previous projection and the obtained result: somewhere between 0.5 and 1.0 heads per flip.

It should probably be around 0.5000001 heads per flip, give or take.

The reason for the increase is that the result of the first flip must cause us to increase our estimate of the probability that it's actually a two-headed coin, or is otherwise somehow biased in favor of heads (while decreasing our estimate of the probability that it's biased in favor of tails). Suppose that one out of every million coins is double-headed while another one out of every million coins is double-tailed, and the remaining 999,998 out of every million coins are perfectly fair. Before the first flip, the double-headed and double-tailed coins were equally probable, and therefore canceled each other out, so our projection would be exactly 0.5 heads per flip. After the first flip, we've completely ruled out that the coin is double-tailed, so the double-headed and double-tailed possibilities no longer cancel each other out, and we should project more than 0.5 heads per flip. (If we are 100% certain that the coin is fair, then this doesn't apply, and we would still project 0.5 heads per flip.)

The "regression to the mean" way of thinking about this is to say that we've observed a rate of 1.0 heads per flip over a small sample size, and our projection going forward should be revised downward from that to something less than 1.0 heads per flip, toward the mean (i.e., our previous expectation) of 0.5 heads per flip. In this case, way, way down, nearly all the way back to the mean.
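
The one-in-a-million scenario above can be worked through exactly with Bayes' theorem. This is a small sketch using the three coin types and the proportions from the post; the dictionary and function names are invented for the example.

```python
from fractions import Fraction

# Prior beliefs from the post: 1 in a million double-headed,
# 1 in a million double-tailed, the rest perfectly fair.
priors = {
    "double_headed": Fraction(1, 1_000_000),
    "double_tailed": Fraction(1, 1_000_000),
    "fair": Fraction(999_998, 1_000_000),
}

# Chance of heads on any single flip for each coin type.
p_heads = {
    "double_headed": Fraction(1),
    "double_tailed": Fraction(0),
    "fair": Fraction(1, 2),
}


def prob_next_heads(beliefs):
    """Overall chance of heads, averaging over what the coin might be."""
    return sum(beliefs[kind] * p_heads[kind] for kind in beliefs)


def update_on_heads(beliefs):
    """Bayes' theorem: revise beliefs after seeing one flip land heads."""
    evidence = prob_next_heads(beliefs)
    return {kind: beliefs[kind] * p_heads[kind] / evidence for kind in beliefs}


before = prob_next_heads(priors)
posterior = update_on_heads(priors)
after = prob_next_heads(posterior)

print("Before any flips:", float(before))
print("After one heads: ", float(after))
```

Under exactly those proportions the projection moves from 0.5 to 0.500001 heads per flip: the double-tailed coin is ruled out, the double-headed coin becomes twice as likely as before, and everything else stays almost where it was.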

 
Thanks for the reply MT.

What are your thoughts on total number of plays in the NFL 2013 season?

2012 17788pa 13925ra 31713 plays
2011 17410pa 13971ra 31381 plays
2010 17269pa 13920ra 31189 plays
2009 17033pa 14088ra 31121 plays
2008 16526pa 14119ra 30645 plays
2007 17045pa 13986ra 31031 plays
2006 16389pa 14447ra 30836 plays
2005 16464pa 14375ra 30839 plays
2004 16354pa 14428ra 30782 plays -enforcement of 5yd rule

2003 16493pa 14508ra 31001 plays
2002 17292pa 14102ra 31394 plays

2001 16181pa 13666ra 29847 plays
2000 16322pa 13677ra 29999 plays
1999 16760pa 13548ra 30308 plays
1998 15489pa 13568ra 29057 plays
1997 15729pa 13639ra 29368 plays
1996 15966pa 13594ra 29560 plays
1995 16699pa 13199ra 29898 plays

1994 15056pa 12550ra 27606 plays
1993 14414pa 12684ra 27098 plays
1992 13408pa 12291ra 25699 plays

I think 17000 passing attempts is a floor but I could see rushing attempts increase to 14000 again if enough teams commit to it more. Should we expect the total number of plays to increase again? 31000 plays has been the most frequent. But rising again the last 2 seasons. What if the league runs 32000 plays in 2013?
 
I think if we get to 14000 rush attempts, the total number of plays will decrease. The number of plays run per team game has been pretty steady throughout NFL history, and I wouldn't assume it's just going to increase in 2013.

 
What are your thoughts on total number of plays in the NFL 2013 season?
That's one of the things I'm changing in my projections method this season.

As an initial comment, I'd say that it's generally unimportant to accurately project the league-wide number of plays. If I'm 10% too high for every team across the board, or 10% too low, it's not going to affect my rankings or my auction values. When you multiply everyone's absolute value by 1.1, you're not changing their relative values at all, and fantasy football is all about relative values (with the rare exception of bonuses for 100 yards rushing/receiving or 300 yards passing, but that's such a small effect).
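
A quick way to convince yourself of the "multiply everyone by 1.1" point: the sketch below uses four invented players with invented point totals, scales all of them by 10%, and checks that the ranking and each player's share of the total are unchanged. The names `ranking` and `shares` are just for this example.

```python
# Hypothetical projected fantasy points (made-up players and numbers).
projections = {"RB A": 250.0, "RB B": 210.0, "WR C": 190.0, "QB D": 160.0}


def ranking(values):
    """Player names ordered from highest projection to lowest."""
    return sorted(values, key=values.get, reverse=True)


def shares(values):
    """Each player's fraction of the total projected points."""
    total = sum(values.values())
    return {name: points / total for name, points in values.items()}


# Everyone projected 10% too high across the board.
inflated = {name: points * 1.1 for name, points in projections.items()}

print(ranking(projections))
print(ranking(inflated))
```

Since auction values are usually derived from relative shares like these, an across-the-board miss washes out; it only starts to matter for threshold bonuses such as 100-yard games.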

Nonetheless, I think it's a good habit to strive for as much accuracy as possible, so I'd prefer not to be off by 10% across the board even if it wouldn't affect my rankings.

My next comment is that I've always found it very difficult to accurately project the number of plays that a team will run.

In the past, I've tried to look for league-wide correlations between offensive plays and other stuff. Do teams run more plays in high-scoring games than in low-scoring games? Do teams run more plays in close games than in blowouts? Do teams run more plays when they're ahead than when they're behind? Etc. I was never able to come up with any such correlations that had much predictive value. Everything I tried was hardly better than just predicting the league-wide average number of plays for each team in each game.

By the very end of last season, I found something that worked much better. In hindsight, it's a total "duh!" What was I thinking before?

Instead of trying to come up with a single formula that works for every team, I now look at each team separately and come up with a formula that works just for that team. The Saints are completely different from the Bills, and the Patriots are completely different from the Seahawks, so why try a one-size-fits-all approach? (When doing weekly projections, I revise each team's formula each week, always basing it on the previous 16 regular-season games - well, technically the previous 17 weeks, so occasionally 15 or 17 games. For my preseason projections, I treat the season as a single representative game and then multiply the result by 16. So if I think the Seahawks will win 11 games this year with an average margin of victory of 6 points and an average total score by both teams of 39 points, then I figure their projected plays as if they were playing a single game that they had an 11/16 chance of winning, a betting line of -6 points, and an over-under of 39 points; then I multiply by 16 to get a full season's projection.)

So I don't project league-wide plays directly. I project each team's plays and then whatever they add up to would be my league-wide projection, I guess.

The method I've described is just the default starting point. I manually adjust any teams whose next game is expected to be different in some relevant way from the previous 16 games. Take the Eagles this season, for example. To account for the Chip Kelly effect, I'm currently tacking on an extra 60 plays above what I'd project if Andy Reid were still there. (I'm not sure that's enough, and may revise further upward if we hear of super high-tempo practices during training camp.)

So to answer your question, I'm not directly looking at league-wide trends and projecting the league-wide number of plays based on those trends.

But I am indirectly accounting for trends by making manual adjustments on a team-by-team basis when a team is expected to use more no-huddle than last year, etc. (which is part of what drives the trend). And by always basing my formulas on the previous 17 weeks, I am always staying somewhat current. If the league-wide number of plays goes up in 2013, by halfway through the season I'll be basing my projected plays 50% on 2013 patterns rather than 2012 patterns, so I'll automatically be catching up to some extent even before making any manual adjustments.

 
I think if we get to 14000 rush attempts, the total number of plays will decrease. The number of plays run per team game has been pretty steady throughout NFL history, and I wouldn't assume it's just going to increase in 2013.
That is sort of what I was thinking: if rushing attempts rose to 14000 again, that might grind enough clock that passing attempts would decrease somewhat. One team that I think will increase its rushing attempts in 2013 is the Titans. There may be some other teams that do as well.

At the same time I am not assuming anything. That is why I asked a question.

I agree that the plays have been pretty steadily near 31000 since 2002, so for a little over a decade around that level now. At the same time I also see total plays steadily increasing almost every year, and last season they came close to 32000, so what if that upward trend does continue? Why do you think it will not?

Good stuff MT. I appreciate the thoughts on the Kelly offense and overall. I do much the same team by team (I am about half done now) but then at the end I look at total plays and +/- some plays from teams to meet the overall play expectation. 1k plays = 31.25 plays/team. If all of your projections add up to 31000 plays and the NFL runs 32000 you will be off by about half a game. Which isn't a huge difference except that you might be 15-20 carries low on a lead RB and similarly with some of the pass attempts. Or likewise too high if you projected 32000 and it falls back closer to 31000.
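
The "1k plays = 31.25 plays/team" and "about half a game" figures above can be checked with a few lines of arithmetic. The only inputs assumed here are 32 teams and a 16-game schedule; the 31000 and 32000 totals are the ones from the post.

```python
TEAMS = 32            # NFL teams
GAMES_PER_TEAM = 16   # regular-season games per team


def per_team_gap(league_gap_plays):
    """Spread a league-wide miss in total plays evenly across teams."""
    return league_gap_plays / TEAMS


gap = per_team_gap(32000 - 31000)                    # plays per team per season
plays_per_team_game = 31000 / TEAMS / GAMES_PER_TEAM
gap_in_games = gap / plays_per_team_game             # miss, in games' worth of plays

print(f"{gap} plays per team, about {gap_in_games:.2f} of a game")
```

So a 1000-play miss at the league level works out to roughly half a game of offense per team over the season.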

 
Man, between this and that other thread on defenses and their ADP value or whatever, this seems like more of a math class (and I had a minor in math after my freshman year of college, so not like I am afraid of a little math). I would rather spend my time evaluating the players and their situations than going over all these numbers and trying to calculate mythical numbers that are only going to be based on my (or someone else's) ranking/projections anyway. I would say the best fantasy players are the ones who look at these SMALL sample sizes anyway. If you have a player with a large sample size who does well, he is going to get drafted high. If you can decipher which players will be good going forward out of the crop of guys with small sample sizes, then you are gonna have success since you can get a lot of those guys later in the draft, and the later picks are the ones that win leagues in both redraft and dynasty. Numbers alone aren't going to come close to helping you do that. But hey, if some of you out there are able to combine both your views of players along with these numbers to make final decisions, more power to ya. I just can't help but think that the majority of the number crunching is just a waste of time. But if it works for YOU, well, keep doing it.
Agreed. I see stats used non-stop in football discussions (FF and otherwise). Stats and math are used way too much and used incorrectly a large chunk of the time. In FF it is even worse since winning and losing is based off of end game numbers. The reason they are overused is because most people either haven't seen enough of players or don't know enough about football/positions/coaching etc. so they use stats. I'm not saying stats and specifically, regression to the mean, don't have value and a place in our projections but it should be used as extra info not as a core piece of the puzzle. "But if it works for YOU, well, keep doing it." PS: This wasn't directed at the OP or even anyone on this forum, mostly just a general problem that gets under my skin.
I agree somewhat. I think people put too much emphasis on using past stats to predict a specific player's future performance.

However, I think the area where a solid understanding of math and running a few numbers can be a real advantage for a fantasy owner is in devising a strategy regarding how to approach each position. There are so many different scoring setups and so many different alignments in terms of lineup requirements. Basically, I think the math is huge in determining stuff like "do I need to make sure I get a top TE?" "Should I draft a QB early?" Stuff like that...

The cross position comparisons (WR A vs. RB X) really do require some math in my opinion. The WR A vs. WR B type stuff doesn't take too much math and should be based more on talent, situation, etc.
 
I am somewhere in the middle - I think math and statistical analysis have a place in creating fantasy football projections, but I don't believe that focusing on "regression to the mean" is particularly helpful. I don't believe there is a normal distribution of expected passing yards - I think they are skewed individually based as much on situation (both offensive and defensive philosophy) as on a given skill set.

For starters, the sample size from which you are generating the stats is very small, and the games you are projecting are also a small set. This creates the potential for wide variances on both sides. Unlike baseball, where in a single season a batter is likely to see 500+ plate appearances, in football, a player typically gets 16 games.

In baseball this type of broad statistical analysis works better because over the 500+ plate appearances you expect a player to see a normal distribution of pitcher-types, parks and defenses. This provides greater confidence that the same batter will see similar situations in future seasons. Nonetheless, good baseball projections start with park-neutral stats - adjusting for advantages or disadvantages gained from any particular park - which are noticeable over the course of a season. Then, you would adjust the projections based on a weighted park-adjustment for the coming season.

Football projections should be doing something similar, imo. The disadvantage of only playing 16 games turns into an advantage here. In football, instead of assuming that a player faced all types of defenses (or offenses if doing IDP projections) - you know exactly which defenses he faced. There were only 13 teams the player faced the year before. So, it becomes much easier to create "defense-neutral" stats for a given player.

From a broad view, you can simply compare how a player performed against a team relative to how other players at that position performed against the same team. Not every 4000 yard season is created equally.

Taking a quick look at two QBs: Dalton and Bradford. Last year Bradford averaged 231 yards per game, and Dalton 229. But Bradford got his yards against teams that averaged giving up 225 yards per game (so, about 2.7% above average), while Dalton got his against teams giving up 230 yards - so very slightly below average. Based on that alone, I would expect Bradford to outperform Dalton by a wider margin in 2013.

When we look at the defenses for 2013 that these guys will face, Dalton actually faces tougher defenses this year - averaging 225 yards/gm, so I think it is reasonable to expect Dalton's yards to decline this year.

Conversely, the defenses Bradford will face in 2013, are statistically worse than in 2012 - averaging 228 yards/gm. Based on that, I expect Bradford's total passing yards to increase this year - creating the larger gap between him and Dalton. (From a fantasy perspective, there is a caveat here, even in a general pre-season projection: In Bradford's first 12 games (covering through week 13) the teams he faces are better than last year, giving up only 222 yards/game, which puts his total yards projection very close to Dalton's for the same time frame.)
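
Here is one simple way to turn the Bradford/Dalton comparison above into numbers. It assumes a plain ratio adjustment (how a QB did relative to what his opponents usually allowed, scaled to what next year's opponents allow), which may not be exactly the method the poster has in mind. All yardage inputs are the ones quoted in the post; the function name is invented.

```python
def opponent_adjusted_ypg(player_ypg, faced_allowed_ypg, upcoming_allowed_ypg):
    """Scale a QB's yards/game by opponent strength.

    ratio > 1 means he beat what his opponents normally allowed; that ratio
    is then applied to what the upcoming opponents allowed on average.
    """
    ratio = player_ypg / faced_allowed_ypg
    return ratio * upcoming_allowed_ypg


# Inputs from the post: 2012 yards/game, 2012 opponents' average yards
# allowed, and the same average for the 2013 schedule.
bradford = opponent_adjusted_ypg(231, 225, 228)
dalton = opponent_adjusted_ypg(229, 230, 225)
bradford_first_12 = opponent_adjusted_ypg(231, 225, 222)

print(f"Bradford: {bradford:.1f} ypg, Dalton: {dalton:.1f} ypg")
print(f"Bradford vs. his first 12 opponents: {bradford_first_12:.1f} ypg")
```

On these inputs the 2-yard gap from last season widens to about 10 yards per game over the full schedule, while Bradford's tougher first 12 games pull him back under his 2012 average and much closer to Dalton.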

You can get as granular as you have data, looking at various rates compared to your competition: yards/att, INTs, TDs, etc. If you were really motivated, you could look at how teams defended against similar offenses to try to glean greater clarity - comparing how teams defended the Redskins might not be particularly useful if you are projecting a more traditional pocket-passer.

Obviously there are a myriad of other factors that go into how a player will perform: how much natural improvement, injuries, teammate injuries, changes in offensive/defensive philosophy, changes in offensive/defensive personnel. Some may be apparent before the season starts, while others will take time to see. This is why I think the week-to-week projections in the season are far more important than pre-season season-long projections. Those are the projections that matter, and should be evolving, with heavy emphasis on the current season after about 4 weeks.

 
At what point do you think you know 'the mean' for an individual player though?
You will never know a player's individual mean, but that's not really what regression to the mean is about. (If you knew a player's individual mean, you wouldn't have to do any regression: you'd just project him to achieve his individual mean.) The mean we're talking about is really the mean of all the players who, all things considered, seem most similar to the player in question in terms of physical talent, team situation, etc. (And since our initial expectations about a player will be determined by how players similar to him have performed, you can think of "regression to the mean" as "regression to our initial expectations.")

To keep things somewhat simple so that we can use an example straight from Chase's article, let's say we're doing Year N projections for a quarterback that we really don't know anything about. All we know about him is that he's an NFL quarterback, so our initial expectation is that he'll be an average NFL quarterback. (This is a reasonable expectation because most NFL quarterbacks are somewhere around average, and anyway there are about as many significantly above average QBs as significantly below average QBs, so the two latter groups generally cancel each other out.)

In Chase's article, go down to the table where he lists 12 NFL QBs (Brew Drees through QB12). If the universe of NFL quarterbacks consists only of those 12 QBs, our projection for any randomly selected QB that we know nothing about (and is therefore equally likely to be any of the listed QBs) would be 3,920 yards. That's the (equally) weighted average of all 12 QBs' expected performances, and so that's our Year N projection.

Now suppose that in Year N, our mystery QB actually ends up in the 4,800-4,959 yard range. As Chase states in the article,

One time it will be QB6 (4800), and we'll project 4,000 yards the next year. Four times it will be QB3 (4880), and we'll project 4480 for the next season. And six times it will be Drees, and we'll project 4800 yards the next year.

So what does that mean? In Year N, the 11 QB seasons that landed in the 4800-4959 range averaged 4,829 yards. In Year N+1, we'd project a weighted average of 4,610 yards for those quarterbacks.

So our initial expectation for Year N was 3,920 yards. The QB actually threw for 4,829 yards in Year N, and our projection for him in Year N+1 will be 4,610 yards. Note that our new projection (4,610) is in between our previous expectation (3,920) and his actual performance (4,829). That's what regression to the mean is about. (Remember that "the mean" is our prior expectations.)

Regression to the mean says that, if we're just going by stats (and not taking into account information that doesn't show up in the stats, like a player was playing on a broken leg last year but will be healthy this year), our projection for this season should be somewhere in between our projection for last season and his actual performance last season. And when we take into account stuff like the typical variance of results over the sample size that we've observed, we can calculate precisely where in between those numbers we should be - i.e., how much regression is warranted.

If all this stuff - updating our projections using a combination of previous expectations and actual observed data - reminds you of Bayes' theorem, you win a gold star. Regression to the mean, as we're using the phrase here, is a straightforward consequence of Bayes' theorem.

(None of this accounts for the fact that, in the NFL, numerous factors that affect a player's performance will not show up in the stats. That's a separate issue. It suggests that regression to the mean isn't everything when it comes to doing projections - and of course it's not. Football players are not dice. A significant part of doing good projections has nothing to do with highfalutin mathematical concepts like regression to the mean; but understanding such concepts can at least be a small help.)
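
The 4,610 figure quoted from Chase's article is just a weighted average of the three next-year projections, weighted by how often each quarterback turns out to be the one who landed in the 4,800-4,959 range. A short sketch of that arithmetic, using only the counts and projections quoted above (the variable and function names are invented):

```python
# (how many of the 11 seasons, next-year projection) as quoted from the article:
# QB6 once, QB3 four times, Drees six times.
groups = [(1, 4000), (4, 4480), (6, 4800)]


def weighted_mean(count_value_pairs):
    """Average of the values, weighted by how often each one occurs."""
    total_count = sum(count for count, _ in count_value_pairs)
    weighted_sum = sum(count * value for count, value in count_value_pairs)
    return weighted_sum / total_count


projection = weighted_mean(groups)
print(f"Year N+1 projection: {projection:.1f} yards")
```

The result sits between the 3,920-yard prior expectation and the 4,829-yard observed average, which is exactly the "somewhere in between" pattern the post describes.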

A small sample size is more likely than a big one to be wrong, but a small sample size is also more likely to be close to right than completely wrong.
That depends on how big the sample size is, and how far away from the mean it is. It's a sliding scale. For any given sample size, the further away it is from the mean, the more likely it is to be anomalous. And for any given result, the smaller the sample is, the more likely it is to be anomalous.

To take an extreme example, suppose we expect Giovani Bernard to average 4.2 yards per carry. On his first carry, he rushes for 50 yards. That is very likely quite anomalous: we should update our expectation from 4.200000000 only to 4.20000001 or so. In other words, a sample size that small and that far away from our previous expectation is not very likely at all to be close to right rather than completely wrong.
my brain went :tfp:

:P

good stuff though,thanks!

if you're saying that A. Peterson will undoubtedly go back to being a 1400 yard RB this season, I agree. history tells us he won't come close to repeating the numbers this season..( regression the year after 2k season)..

to those who think Peterson is still the #1 RB at 1.01, I'll offer D. Martin as an alternative.. he gets back two Pro Bowl linemen, and his role will be expanded even further in year 2.. so while ADP will regress, D. Martin should progress

 
Tanner9919 said:
At what point do you think you know 'the mean' for an individual player though?
You will never know a player's individual mean, but that's not really what regression to the mean is about. (If you knew a player's individual mean, you wouldn't have to do any regression: you'd just project him to achieve his individual mean.) The mean we're talking about is really the mean of all the players who, all things considered, seem most similar to the player in question in terms of physical talent, team situation, etc. (And since our initial expectations about a player will be determined by how players similar to him have performed, you can think of "regression to the mean" as "regression to our initial expectations.") To keep things somewhat simple so that we can use an example straight from Chase's article, let's say we're doing Year N projections for a quarterback that we really don't know anything about. All we know about him is that he's an NFL quarterback, so our initial expectation is that he'll be an average NFL quarterback. (This is a reasonable expectation because most NFL quarterbacks are somewhere around average, and anyway there are about as many significantly above average QBs as significantly below average QBs, so the two latter groups generally cancel each other out.) In Chase's article, go down to the table where he lists 12 NFL QBs (Brew Drees through QB12). If the universe of NFL quarterbacks consist only of those 12 QBs, our projection for any randomly selected QB that we know nothing about (and is therefore equally likely to be any of the listed QBs) would be 3,920 yards. That's the (equally) weighted average of all 12 QBs' expected performances, and so that's our Year N projection. Now suppose that in Year N, our mystery QB actually ends up in the 4,800-4,959 yard range. As Chase states in the article,

One time it will be QB6 (4800), and we'll project 4,000 yards the next year. Four times it will be QB3 (4880), and we'll project 4480 for the next season. And six times it will be Drees, and we'll project 4800 yards the next year.So what does that mean? In Year N, the 11 QB seasons that landed in the 4800-4959 range averaged 4,829 yards. In Year N+1, we'd project a weighted average of 4,610 yards for those quarterbacks.

So our initial expectation for Year N was 3,920 yards. The QB actually threw for 4,829 yards in Year N, and our projection for him in Year N+1 will be 4,610 yards. Note that our new projection (4,610) is in between our previous expectation (3,920) and his actual performance (4,829). That's what regression to the mean is about. (Remember that "the mean" is our prior expectations.) Regression to the mean says that, if we're just going by stats (and not taking into account information that doesn't show up in the stats, like a player was playing on a broken leg last year but will be healthy this year), our projection for this season should be somewhere in between our projection for last season and his actual performance last season. And when we take into account stuff like the typical variance of results over the sample size that we've observed, we can calculate precisely where in between those numbers we should be — i.e., how much regression is warranted. If all this stuff — updating our projections using a combination of previous expectations and actual observed data — reminds you of Bayes' theorem, you win a gold star. Regression to the mean, as we're using the phrase here, is a straightforward consequence of Bayes' theorem. (None of this accounts for the fact that, in the NFL, numerous factors that affect a player's performance will not show up in the stats. That's a separate issue. It suggests that regression to the mean isn't everything when it comes to doing projections — and of course it's not. Football players are not dice. A significant part of doing good projections has nothing to do with highfalutin mathematical concepts like regression to the mean; but understanding such concepts can be at least be a small help.)

A small sample size is more likely than a big one to be wrong, but a small sample size is also more likely to be close to right than completely wrong.
That depends on how big the sample size is, and how far away from the mean it is. It's a sliding scale. For any given sample size, the further away it is from the mean, the more likely it is to be anomalous. And for any given result, the smaller the sample is, the more likely it is to be anomalous.

To take an extreme example, suppose we expect Giovani Bernard to average 4.2 yards per carry. On his first carry, he rushes for 50 yards. That is very likely quite anomalous: we should update our expectation from 4.200000000 only to 4.20000001 or so. In other words, a sample size that small and that far away from our previous expectation is not very likely at all to be close to right rather than completely wrong.
my brain went :tfp:

:P

good stuff though, thanks!

if you're saying that A. Peterson will undoubtedly go back to being a 1400 yard RB this season, I agree. history tells us he won't come close to repeating the numbers this season..( regression the year after 2k season)..

to those who think Peterson is still the #1 RB at 1.01, I'll offer D. Martin as an alternative..he gets back two pro bowl linemen, and his role will be expanded even further in year 2..so while ADP will regress, D. Martin should progress
I can only shake my head at Tanner9919's post here.

First, MT is projecting 1700 rushing yards for AP.

Second, you're using history to say that players gain much less than 2000 yards in year N+1......man, that's really going out on a limb there. But note that AP isn't your typical RB. He's the best RB in the league right now and will go down as one of the best of all time. If there's any RB that can go for 2000+ in year N+1, it's Peterson.

To think that Peterson will be a 1400 rushing yard back is just laughable. For that to happen, two things have to fall into place: 1) significantly fewer carries, and 2) a YPC in the 4.3 to 4.5 range. He's averaged 21 touches a game for his entire career. What makes you think that's going to change? And his WORST YPC is 4.4. WORST. He has averaged 5.0 YPC for his whole career. So I don't see how you can't project AT LEAST 1500+ yards rushing and 1750 total yards, along with 13-18 total TDs. Will Peterson be the #1 back this year? I think so, but worst case scenario, he's a top 4 back and that's money in the bank.

You're assuming Martin will improve on his stats last year. Remember that Martin accumulated 26% of his total yards and 50% of his TDs in TWO games, one of them being a historic one against a pathetic OAK run defense. In his other 14 games, he touched the ball 307 times for 1440 total yards and 6 TDs. He had only 4 ypc in these games. Martin had 9 plays of 30+ yards. That will be hard to duplicate (you can't regress Peterson and not regress Martin.....or that's cherry picking). Martin also touched the ball 368 times last year. Do you really think he's going to touch it more than that AND keep his YPC and YPR the same? That's laughable.

Sure, TB is getting their better linemen back, so my projection for Martin is better than the 4 YPC that he had in his non blow-up games. But to assume Martin's previous year is the benchmark for this year.....that's very risky.

I will gladly take Peterson's track record and talent over Martin all day and every day for 2013. Dynasty, that's a different story.

 
Great article Chase, and MT and others, I really like the analysis. But, as I have stated in the other Regression thread, I disagree with how this concept is generally used in FF.

Chase's article is a great example, particularly this quote: "When an impressive feat is hit, there's a good bit of luck involved." This is, I believe, the most important part of the article.

The article uses examples of fictitious QBs and their stats and predicts their future performance based upon situations (age, etc.) not changing. This is emphasized in the article, and I understand it is used to make a point. The problem I have is the quote above: the article is predicated on the assumption, stated as fact, that impressive feats require a good deal of luck. This may be true or it may not be (I tend to think it is not true, but I will admit I very well could be wrong). But my point is that we really don't know. Situations change all the time in the NFL. Something as seemingly benign as bringing in a new left guard could add a few hundred yards to a QB's stats. Coaches come and go all the time. Players are cut and signed all the time. Rules change favoring the passing game. All of which means players' situations are always changing.

My issue with using the Regression to the Mean concept as applied to FF is that I think we too often use the concept as a crutch to lower (or raise) projections, instead of looking at the underlying reasons why we think numbers will change. To say, as Chase did in his article, that Brees threw for 5400 yards in 2011 so we could not project him to do that again is not, IMO, the correct way to look at it. If you did look at it that way, you probably didn't project him for his 2012 stats of 5200 and 43 TD's. Instead of saying that his 2011 season was an outlier so he must come down to earth, how about saying that he lost his head coach and his deep threat, so his numbers might drop a little? I would much prefer to hear the reasons why we think a player will regress than hear he will regress because of a statistical anomaly.

I have a good buddy who got with some really smart MIT guys and built a simulator to predict baseball games. They took years of past data and performed all kinds of analyses on it and came up with what they thought to be a great predictor of baseball games. My one comment early on to them was, "Don't fit the tool to the data." Well, after losing a pretty big amount of money, they agreed that is in fact what they did. I feel like using Regression to the Mean in FF analysis as a statistical estimator of performance is similar. We are using a tool and fitting it to the data without taking into account the endless factors that really go into performance.

Please don't misinterpret what I am saying. I agree there is some value to looking at historical data and using it to make projections, but I think it should be in more general terms. When we start looking at individual players and we try to make predictions about their performance, and we discount their projected numbers because we think their previous numbers were just too high, I think we do ourselves a disservice.

 
Tom: It's finally here! Week one of the NFL season!

Jerry: I know, I'm really excited. I have Montee Ball on my fantasy team and he's starting tonight against the Ravens.

Tom: Yeah. Oh, look, the game is starting.

Jerry: Broncos ball. Kickoff is a touchback. Here's the first offensive play.

Tom: Ooh, a handoff to Ball and . . . LOOK AT HIM GO!

Jerry: Wow, a 40-yard run on his first play. That's a great sign! Nothing much has changed between the last play and this one coming up. It's the same offensive line, same system, same coaching, and so on, so I don't see why he won't break off another 40-yard run.

Tom: LOL.

Jerry: I'm serious.

Tom: He's not going to break off another 40-yard run. Trust me.

Jerry: Why not? What's changed since the last run?

Tom: Nothing has to have changed. It's just that 40-yard runs are pretty rare. They don't happen very often. He got lucky that his first carry was a 40-yarder, but he's very unlikely to repeat it on his second carry.

Jerry: Lucky? Are you watching the same game I am? That wasn't luck. He made a great cut at the line then accelerated through the hole, broke a tackle, and outran the linebacker until the safety got the angle on him.

Tom: Well, yeah, it was a very nice run, but --

Jerry: But nothing. It was not luck. That was all skill. The line blocked skillfully and Montee Ball ran skilfully. It's not like his skill is going to deteriorate all of a sudden after one play. So name one reason why he's unlikely to run for 40 yards again on the next play.

Tom: Regression to the mean.

Jerry: That's a cop-out. It's not a real reason. I mean, give me a reason that makes sense. Did he injure himself on the first play? Did his offensive line get tired running down the field? Did the defense pick up a tell on him that will help them diagnose the play better? Give me a real reason!

Tom: There are a great many reasons that he might not rush for 40 yards on the next play. Maybe his blocker will slip. Maybe Ball will fumble. Maybe he won't be able to bounce it outside this time because the defensive end will have better contain. Maybe the middle linebacker won't hesitate on his first step this time. I could go on forever. There are about a million reasons why somebody -- even Montee Ball -- might not rush for 40 yards on any given carry. The reasons why he might not do it are far more numerous than the reasons why he might do it. That's why 40-yard rushes are so rare, and that's all I mean by "regression to the mean."

Jerry: But you don't know that 40-yard rushes are rare for Montee Ball. Sure, they're rare for dopes like Adrian Peterson, but they might be common occurrences for Ball.

Tom: Here's why that's unlikely. For Ball to commonly run for 40 yards like it's nothing out of the ordinary -- for that to be anywhere near his average run -- he'd have to be the most awesome running back in the history of awesomeness. That's possible, but it's a needlessly extraordinary way to explain his 40-yard run on his first carry. I mean, either he's the best running back ever and that was an ordinary carry for him, or he's a fairly ordinary running back and that was an exceptional carry. Which is more likely? Well, the best running back ever occurs about once per universe, while 40-yard carries by ordinary running backs happen several times a week during the NFL season. Since "once per universe" is a lot less common than "several times a week," I'm going to bet on the second possibility.

Jerry: I guess that makes sense.

Tom: It's the same with Drew Brees throwing for over 5400 yards in 2011. Either he's the best quarterback in the history of ever and that was an ordinary season for him, or he's merely a future Hall of Fame quarterback (but lacking super powers) and he got a bit lucky in 2011.

Jerry: But that's what I take issue with. It's not luck. He's tremendously skilled.

Tom: Semantics. I'm using "luck" as a synonym for "positive variance." Brees doesn't get the exact same result on every play or in every season. Some seasons end up being statistically better than others. There's variance. The ones that are particularly good constitute positive variance, and the ones that are particularly bad constitute negative variance. Moreover, an awful lot of his variance has nothing to do with his own skill. It has to do with injuries, weather, strength of schedule, playcalling, and about a zillion other things. I like to call all the stuff that's out of his control "luck," but you can call it something else if you want to. Either way, it's variance. The point is that throwing for 5,400 yards is probably on the positive side of Brees' variance; and if so, he's likely to regress to the mean rather than repeating the feat.

Jerry: Maybe he will, but that's another thing I take issue with. It seems like a cop-out to just say "regression to the mean" without giving a more concrete reason why.

Tom: You can think of it as a burden-of-proof issue. Since 5,400-yard seasons are so exceedingly rare, I don't think a person has to explain why Brees won't throw for 5,400+ yards. Rather, a person has to explain why Brees will throw for 5,400+ yards. The default position is that he won't. The reason that's the default position is because, as with Montee Ball's 40-yard run, there are many, many more reasons why a person (even Drew Brees) might not throw for 5,400+ yards than there are reasons why he might. The reasons are too numerous to list, which is why we say "regression to the mean" as shorthand.

Jerry: Whatever. You may not have noticed with all your blabbing, but it's almost halftime and Montee Ball has rushed for 480 yards on 12 carries.

Tom: :tebow:

 
The issue is how many samples do we need before we stop saying it's "luck" and start saying it's just the way it is.

Obviously, one run is not enough. I tend to think an entire season may be enough.

And I'm just not really understanding the Brees example. To say that 5400 yards is an anomaly, but the next year he throws for 5200, doesn't seem to make a lot of sense. We are talking about a 4% difference in his yards from one year to the next. I'd be very happy if all my projections were off by only 4%.

ETA: Assuming a team runs about 70 plays per game (I am really not sure of the number), then we are talking about 1,120 plays for the year. That's a lot more than one and a lot closer to being "just the way it is" as opposed to "luck."

 
To say that 5400 yards is an anomaly, but the next year he throws for 5200, doesn't seem to make a lot of sense.
As Steve Spurrier once said, "Hindsight is 50/50."
It's not hindsight. The only people who didn't project Brees for over 5000 yards again in 2012 were those who were so hung up on Regression to the Mean and calling his 2011 season "luck," that they over-discounted his 2011 performance.

 
If things go well for Brees, then yeah, he can throw for 5,000 yards in 2013, too. But regression to the mean understands that things don't always go well all the time. Brees could get hurt, in which case he would way undershoot his projections. The Saints could pass way less, in which case he's not going to hit 5000 yards again.

You might say "there's no way the Saints will pass less" but you only need to go back to 2009 when Brees threw 514 passes. Regression to the mean says there are things outside of a player's control that impact his stats: if those things all point in one direction in Year N, you should lower your expectations in Year N+1.

In 2009, the Saints scored 9 D/ST touchdowns. That both limits the number of drives for the Saints offense and reduces their urgency to score points. In 2012, they scored 5 D/ST TDs. The 2009 defense allowed 341 points, while the 2012 defense allowed 454 points. The 2009 Saints average drive started at the 31.4, while the average drive started at the 24.5 last year. The Saints called 370 rushing plays last year and 468 in 2009.

Brees has no control over any of those things. If you think the Saints won't have the worst average starting field position in 2013, then maybe Brees passes for fewer yards. If you think the Saints will outscore opponents by 10.6 points instead of 0.4 points on a per-game basis, then maybe Brees passes for fewer yards.

Regression to the mean involves looking at all the external factors and getting a sense of how they impacted the numbers a player produced. If you think saying "I love Brees, but I'm only going to project him for 4700 yards this year because he might get hurt, and the defense might be better under Rob Ryan, and the team might run more if the team is winning, and the Saints could get some better luck in turnovers/average starting field position which would mean less passes" is better than saying "regression to the mean", okay, but they're really the same thing.

 
One more thing. I'm all on board with considering "luck" as a factor in stats. For example, two years ago when Victor Cruz had all those long TD's, everyone agreed he wouldn't do it again because he got a bit "lucky" that year. There were a few specific plays you could point to that really stood out, and we all agreed his stats were fluky.

But I don't agree that you can apply that same reasoning to any player who has a great year or who has a year. Sometimes things come together in the right ways for a team/player and it all just works.

 
But they are not the same thing. I agree with you that they are the same thing in principle, but not in practice.

My biggest problem with using the expression, "Regression to the Mean" is that too many people throw it out as a reason someone's stats will decline (a quick look at almost any thread in the Shark Pool talking about player stats will show this) without giving any argument other than Regression. That does not allow any discussion on the topic. I would MUCH rather have someone post the bolded part above and give the actual reasons why they think someone's stats will regress. That way we can all discuss it. Because I don't think the defense will be better under Ryan. Let's talk about that.

So I agree with everything you posted above. My plea is, and has been since last year's thread, that we start talking about the reasons instead of a concept that is too often used as a cop out.

 
My plea is, and has been since last year's thread, that we start talking about the reasons instead of a concept that is too often used as a cop out.
But this is a thread about the concept. I agree you should question "regression to the mean" as the only argument in a player spotlight thread. It would be great if someone could clearly state all the luck factors they saw in year N.

However, if all I had to go on was year N stats and a sense of how much of an outlier those stats are, I'd be ok with simply saying "regression to the mean" as an argument for year N+1 projections. Then it's a question of how much additional data is needed to say, "I no longer believe that was an outlier that has to regress to the mean next year."

 
This is the Internet age. There is no reason why you would ever just have "N stats." This is a FF community. There is no reason why we shouldn't be delving into reasons for stat variations instead of just saying that stats will vary.

 
The issue is how many samples do we need before we stop saying it's "luck" and start saying it's just the way it is.
Twenty-six.

Just kidding. There are ways to use math to answer your question precisely, but it's all full of sliding scales, so the short answer is: it depends.

First, whether something is "luck" or "just the way it is" is not a binary set of possibilities. Almost everything important in football is a combination of both. When Drew Brees attempts a pass, he does not get the same result every time. Some of his passes are complete and some are incomplete, so there is some variance. Moreover, some of the variance is caused by factors under his control, and some is not. Because of the variance caused by factors under his control, we can be pretty confident that every time he drops back to pass, he should be expected to get a better result (or at least not a worse result) than a random grandma off the street would. What he does takes skill, and Drew Brees has lots of it. But skill doesn't explain all of his results. (If it did, since his skill is pretty much constant, his results would be a lot more constant. But in fact, from play to play and even from season to season, his results vary quite a bit.)

So how do we know how much of a result we should reasonably attribute to "luck" and how much we should attribute to "just the way it is"? As I said before, it depends.

It depends on the size of the sample we've observed. It depends on how much variance we can expect between samples of that size. It depends on how greatly the observed sample differs from our prior expectations (and it depends on the reasonableness of those prior expectations, which is the one element that may be outside the purview of math).
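To give a rough feel for that sliding scale, here is a small Python sketch using a textbook normal-normal shrinkage model, where the weight placed on the observed sample is n / (n + k) and k is the ratio of per-observation variance to the variance of our prior. This is one standard way to formalize the idea, not necessarily the math anyone in this thread has in mind, and the two variance numbers are invented purely for illustration.

```python
def weight_on_sample(n, per_obs_var, prior_var):
    """Weight given to the sample mean (vs. the prior) under a normal-normal model."""
    k = per_obs_var / prior_var  # how noisy one observation is relative to our prior uncertainty
    return n / (n + k)


# Assumed, illustrative values only (not taken from any real data)
per_obs_var = 36.0  # variance of a single carry, in yards squared
prior_var = 0.09    # how uncertain we are about a back's true yards per carry

for n in (1, 20, 300):
    print(n, round(weight_on_sample(n, per_obs_var, prior_var), 3))
# 1 0.002
# 20 0.048
# 300 0.429
```

Under those assumptions a single carry gets almost no weight, while a full season's worth of carries gets a lot more, though still well short of 100%. Change the assumed variances and the answer moves, which is the "it depends" part.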

Ultimately, since everything depends, I think there are situations where the point you're making is very important and right on the money. I just don't think the Drew Brees 2011 season is the best example of such a situation.

It's not the worst example, either, which is why I invented the Montee Ball situation. Ultimately, regression to the mean can be very helpful in certain situations and much less helpful in other situations. I used the Montee Ball situation to show that, at least in certain contexts, it's silly to downplay or dismiss it. We really don't need to list any specific reasons why we shouldn't expect Montee Ball to gain 40 yards on his next carry. Regression to the mean is very powerful in this situation because all of the factors I listed are leaning toward the "luck" end of the spectrum instead of the "just the way it is" end. The observed sample consisted of just a single run. The results of any single run will frequently vary quite widely from the norm. And before the game started, we were not expecting him to run for 40 yards on his first carry (which suggests that the 40-yard run was an outlier). It's silly to spend time looking for extra reasons why his second carry probably won't live up to his first (maybe his blocking won't be as good, etc.). Just pointing to regression to the mean isn't a cop-out: it's efficient.

At the other end of the extreme, I do think there are situations where we should pretty much ignore regression analysis and look instead only to the kinds of factors you mentioned -- not because the principles underlying regression would give us unreasonable answers if we used them correctly, but because using them correctly is completely impractical and the most natural simplifications would sacrifice way too much accuracy.

As an example, let's try to project what percentage of the Seahawks' targets will go to Golden Tate this season. One simplified way we could do this is to look at what percentage he got last year, and then ask our computer: "Using some fancy math, can you tell me what percentage of targets a player will typically get in Year N+1, given that he got X% in Year N?" The computer will give us an answer, but it won't necessarily be a very good one. There are way too many factors it's not considering. There are some very obvious ones, like some of the computer's data will include players who retired between Year N and Year N+1, so we need to tell the computer to ignore those guys (unless Tate retires tomorrow, in which case we need to tell the computer to focus only on those guys). That's easy enough to account for. Slightly harder, but still doable, is to account for things like maybe Tate became more effective toward the end of the season last year, so his role should expand this season. We can look at other receivers who became more effective toward the end of previous seasons and see how that affected their targets next season. Our analysis is starting to get more complex at this point, but it's still not going to do a very good job of projecting Tate's targets in 2013. For that, we'd have to somehow take into account the Percy Harvin signing. There are statistical, data-driven ways to do that, I guess, but at this point we're making things very complicated, and we're really narrowing down the amount of data that can meaningfully drive our analysis. (How many times in the past has there been a situation comparable to the Percy Harvin signing in its likely effects on a Golden Tate-type of receiver?) The smaller the sample size of our true comparable data, the less predictive value it will have.

To project what percentage of the Seahawks' targets Golden Tate will get in 2013, I'd pretty much throw any fancy math out the window and just try to envision how Percy Harvin's presence will affect the distribution of targets on the Seahawks. What role will Harvin, Rice, and Tate play in the offense (and how likely is it that Baldwin or someone else will be a factor)? Will the team use fewer two-TE sets? More three-WR sets? And so on.

In other words, I think there are contexts where regression analysis or something similar is pretty much all you need in order to make the best projection possible, and I think there are contexts where regression analysis is impractical to use well and should be ignored in favor of other (less mathy, more intuitive) types of analysis.

I personally think that the Drew Brees example is a little closer to the Montee Ball situation than the Golden Tate situation when it comes to wondering whether he was likely to repeat his 2011 performance in 2012, but it's admittedly somewhere in between them.

 
Last edited by a moderator:
The issue is how many samples do we need before we stop saying it's "luck" and start saying it's just the way it is.
Twenty-six.

Just kidding. There are ways to use math to answer your question precisely, but it's all full of sliding scales, so the short answer is: it depends.

First, whether something is "luck" or "just the way it is" is not a binary set of possibilities. Almost everything important in football is a combination of both. When Drew Brees attempts a pass, he does not get the same result every time. Some of his passes are complete and some are incomplete, so there is some variance. Moreover, some of the variance is caused by factors under his control, and some is not. Because of the variance caused by factors under his control, we can be pretty confident that every time he drops back to pass, he should be expected to get a better result (or at least not a worse result) than a random grandma off the street would. What he does takes skill, and Drew Brees has lots of it. But skill doesn't explain all of his results. (If it did, since his skill is pretty much constant, his results would be a lot more constant. But in fact, from play to play and even from season to season, his results vary quite a bit.)

So how do we know how much of a result we should reasonably attribute to "luck" and how much we should attribute to "just the way it is"? As I said before, it depends.

It depends on the size of the sample we've observed. It depends on how much variance we can expect between samples of that size. It depends on how greatly the observed sample differs from our prior expectations (and it depends on the reasonableness of those prior expectations, which is the one element that may be outside the purview of math).
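To put rough numbers on the sample-size piece, here is a minimal Python sketch. It assumes every pass attempt is an independent trial with the same fixed completion probability (a simplification, since real attempts are neither independent nor identical), and the 65% rate and the attempt counts are illustrative placeholders rather than anyone's actual stats. The `luck_sd` name is made up for this example.

```python
import math

def luck_sd(true_rate: float, attempts: int) -> float:
    """Standard deviation of an observed completion rate from chance alone,
    assuming independent attempts with a constant true rate (binomial model)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0.0 <= true_rate <= 1.0:
        raise ValueError("true_rate must be between 0 and 1")
    return math.sqrt(true_rate * (1.0 - true_rate) / attempts)

# Hypothetical passer whose true completion rate never changes from 65%.
for attempts in (1, 30, 600):
    sd = luck_sd(0.65, attempts)
    print(f"{attempts:>4} attempts: +/- {sd:.1%} from luck alone")
```

Under those assumptions, the luck-driven swing is about 48 percentage points for a single attempt, about 9 for a 30-attempt game, and about 2 for a 600-attempt season. Same skill every time, but the smaller the sample, the more of what we see is noise.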

Ultimately, since everything depends, I think there are situations where the point you're making is very important and right on the money. I just don't think the Drew Brees 2011 season is the best example of such a situation.

It's not the worst example, either, which is why I invented the Montee Ball situation. Ultimately, regression to the mean can be very helpful in certain situations and much less helpful in other situations. I used the Montee Ball situation to show that, at least in certain contexts, it's silly to downplay or dismiss it. We really don't need to list any specific reasons why we shouldn't expect Montee Ball to gain 40 yards on his next carry. Regression to the mean is very powerful in this situation because all of the factors I listed are leaning toward the "luck" end of the spectrum instead of the "just the way it is" end. The observed sample consisted of just a single run. The results of any single run will frequently vary quite widely from the norm. And before the game started, we were not expecting him to run for 40 yards on his first carry (which suggests that the 40-yard run was an outlier). It's silly to spend time looking for extra reasons why his second carry probably won't live up to his first (maybe his blocking won't be as good, etc.). Just pointing to regression to the mean isn't a cop-out: it's efficient.
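One common way to formalize that intuition is a simple shrinkage estimate, sketched below in Python. This is one textbook approach, not necessarily the best one for football. The function name is hypothetical, and both the 4.2-yard prior and the constant of 300 carries are placeholder assumptions chosen for illustration, not fitted values.

```python
def regress_to_mean(observed_avg: float, n: int, prior_mean: float, k: float) -> float:
    """Blend an observed average with a prior expectation.

    The observed sample gets weight n / (n + k); the prior gets the rest.
    k is an assumed constant: the sample size at which we would trust the
    observation and the prior equally.
    """
    if n < 0 or k <= 0:
        raise ValueError("n must be >= 0 and k must be > 0")
    weight = n / (n + k)
    return weight * observed_avg + (1.0 - weight) * prior_mean

PRIOR_YPC = 4.2   # assumed prior yards per carry (illustrative only)
K_CARRIES = 300   # assumed "equal trust" sample size (illustrative only)

# A single 40-yard carry barely moves the projection off the prior.
one_carry = regress_to_mean(40.0, n=1, prior_mean=PRIOR_YPC, k=K_CARRIES)

# The same 40-yard average sustained over 300 carries moves it halfway.
many_carries = regress_to_mean(40.0, n=300, prior_mean=PRIOR_YPC, k=K_CARRIES)

print(round(one_carry, 2), round(many_carries, 2))
```

With these placeholder inputs, one 40-yard carry nudges the projection for the next carry from 4.2 yards to about 4.32. All three "it depends" factors show up in the formula: sample size is `n`, the variance we expect is folded into `k`, and the prior expectation is `prior_mean`.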

Really good stuff MT, and I'm on board with pretty much everything you said. I think we probably draw the "luck" line differently, and maybe that's where the big difference is.

For example, how can you say the Brees example is closer to the Ball example, when Brees' stats in 2012 were pretty close to 2011? Would you say in hindsight you were wrong? I would think that if 5400 yards was such an outlier, we would project him for around 4800 or so yards the following year, not 5200.
