Fantasy Football - Footballguys Forums

NFL data analysis. How far back is relevant?

Gawain

Footballguy
There's some great discussion in the dynasty value thread about looking at Kareem Hunt through predictive data.

For instance, the history of RBs that go for 1,000 yards and were drafted in the third round or later isn't great. http://pfref.com/tiny/Oj1wR
However, the history of RBs that run for over 1,200 yards (75 yards/game) in their first season shows a more positive career trend. http://pfref.com/tiny/9YOt8

Setting aside which kinds of data analysis are actually relevant, how far back in NFL history can we go before the data stops being useful?
There are 45 guys that played in at least 8 games and went for 75+ Y/G their rookie season since 1950. Is Alan Ameche germane to a discussion about Kareem Hunt? What about Curt Warner?
http://pfref.com/tiny/p1axh

Go too far back and the game isn't representative. Eliminate too many guys and the sample is too small to mean anything.

 
I think trying to use past generalizations to predict specific future outcomes is futile. You can use those past numbers to prognosticate and perhaps create risk tiers (which I used to be just as guilty of doing), but in the end, looking at the individual’s skill set as its own entity and trying to determine repeatability, as well as potential improvement in weaknesses, has been much more successful for me.

Right now, only Hunt is representative of Hunt.  IMO.

.

 
2000 seems to be a very common cutoff. You get full careers and current careers.

What is the question (in context) that you’re trying to solve?

 
I remember my first course in business statistics back in 1997, learning about regression analysis and thinking that all I had to do was to create the perfect variable set and I'd have the answer to fantasy football in the palm of my hand. 20+ years later, with one lousy championship...that variable set continues to be elusive (I don't think I did that well in the class either)

For my own analysis, I'm trying to come to a decision about future value by comping Diggs and Thielen.
The Kareem Hunt discussion also got me thinking about predicting future value from historical comps.
Someone (ZWK?) is tweaking their dynasty values using historical comps, and the concept is intriguing.

 
Um, young kids have zero reference points for players outside of their own lifetime, so they think anything they don't know about is irrelevant, which is nonsense.

Relevant data is relevant no matter how dated.

Basically the counterargument is 'I don't know what I don't know,' which is true for any argument, but it isn't a valid reason to keep dated information out of a discussion just because the young person isn't aware of it. You don't know what you don't know, but that doesn't mean I can't know what you don't know, kid.

 
Data might become irrelevant over time though, with the way the league is changing.  In this context, back before the RB committee/timeshare trends in the NFL there were probably a lot more players who had high yardage stats due to volume alone.

 
You mean back when being a 1000 yd rusher was a really big deal?

 
There's some great discussion in the dynasty value thread about looking at Kareem Hunt through predictive data.

For instance, the history of RBs that go for 1,000 yards and were drafted in the third round or later isn't great. http://pfref.com/tiny/Oj1wR
However, the history of RBs that run for over 1,200 yards (75 yards/game) in their first season shows a more positive career trend. http://pfref.com/tiny/9YOt8

Setting aside which kinds of data analysis are actually relevant, how far back in NFL history can we go before the data stops being useful?
There are 45 guys that played in at least 8 games and went for 75+ Y/G their rookie season since 1950. Is Alan Ameche germane to a discussion about Kareem Hunt? What about Curt Warner?
http://pfref.com/tiny/p1axh

Go too far back and the game isn't representative. Eliminate too many guys and the sample is too small to mean anything.
The thing is, for a sample to be useful you really want several thousand examples, and the fewer examples you have, the noisier the result becomes. Not very predictive at all.

You are right though that eras change. I am not sure a team would give Eric Dickerson the ball 390 times as a rookie in today's NFL.

The end of the 14-game schedule is a reasonable cutoff. 2000 to now seems simple and fine for today.

How relevant the data is really depends on the question you're trying to answer.
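
To put a rough number on the sample-size point, here is a minimal sketch using a Wilson score interval for a proportion. The 40% "hit rate" is a made-up figure purely for illustration, not something pulled from the linked queries, and `wilson_interval` is just a helper name chosen for this example. It also assumes the players in the sample are comparable to each other, which is exactly what this thread is arguing about.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical: the same observed 40% hit rate at three sample sizes.
for n in (45, 450, 4500):
    hits = round(0.4 * n)
    lo, hi = wilson_interval(hits, n)
    print(f"n={n:5d}  observed=40%  95% interval = {lo:.1%} to {hi:.1%}")
```

With only 45 players the plausible range around the observed rate is wide, and it narrows steadily as the sample grows. That is the trade-off with a tight era cutoff: more relevant comps, but far more noise.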

 
Data might become irrelevant over time though, with the way the league is changing.  In this context, back before the RB committee/timeshare trends in the NFL there were probably a lot more players who had high yardage stats due to volume alone.
The purpose of analytics is to gain an advantage by using the 'best' past data to predict future outcomes. I understand that people who consider a recent timeline the 'best' data will want a relevancy cutoff, but that doesn't hold when the data is based on unique individuals.

Arbitrary timelines are easier.

This narrative of throwing out old data is bubbling up around many sites and is gaining momentum for some reason. It seems like a popularity contest, or a way to make someone's model work.

You have to ask what you are really seeking from an analytic model.

I would not throw away a timeline model, but I would use it as a tool and cross-tabulate data from outside the timeline to dig further and really gain an advantage.

 
There are far too many variables to make even going back to last year a viable point for analysis.

It's not just about his age or what round he was drafted in. Even if you input every combine number, his SPARQ score, and his height and weight down to the last pimple on his butt, and even if you did find comps in the historical data, it wouldn't provide enough information to make any sort of projection for Hunt, because then you need to start looking at the team around him. How many of those teams changed QBs and offensive coordinators? What kind of turnover was there at the skill positions, or along the line? Etc.

My only point is that I personally find that looking for historical comps eventually requires you to make subjective judgments that ultimately do little more than reinforce your opinions and don't yield anything beyond hope and/or uncertainty.

 
I think trying to use past generalizations to predict specific future outcomes is futile. You can use those past numbers to prognosticate and perhaps create risk tiers (which I used to be just as guilty of doing), but in the end, looking at the individual’s skill set as its own entity and trying to determine repeatability, as well as potential improvement in weaknesses, has been much more successful for me.

Right now, only Hunt is representative of Hunt.  IMO.

.
:goodposting:

I think people must have been absent on the day "independent events" were discussed in their statistics/probability classes.

 
But is a player's career an independent event?
I think that what's happening is a fairly logical extension of the "regression to the mean" type of analysis that was popular a few years ago.

However, it is very difficult to capture the "mean" value of someone's career.
McCaffrey is a great example right now. 80+ receptions for a RB is rare. The guys who have caught 80 balls as a RB are few and far between. The guys who have done it more than once can be counted on one hand. So, what's McCaffrey's mean value as a receiving RB?

Is it a Keith Byars circa 1990? Doesn't seem to fit a physical comp.
Is it a Faulk circa 2002? Doesn't fit a RB load comp.
Is it a Reggie Bush circa 2006? Fits the curve the best.

Each situation in the NFL is different, just like each production run in China is different. However, we still use AQL sampling to learn about the whole when it comes to production. By sampling like instances in the NFL, there's value to be gained. I'm just not smart enough to make that next leap and figure out what's "like" enough.

 
Each production run in China is still putting out the same product built to the same specs. Each NFL player is a special, unique snowflake and there isn't another one like it anywhere in the world. Throw in 25 or so other offensive teammates who are also unique, special snowflakes with no others like them in the whole world, and all the AQL in the world is only going to lead you right back to where you started: with, like, your opinion, man.

 
:goodposting:

 
But is a player's career an independent event?
Yes. What 30 other RBs from Toledo accomplished or what 30 other RBs that were drafted in Round 3 or what 30 other RBs that had the same metrics as him accomplished during their careers has no effect on what Kareem Hunt will accomplish in his.  

The best the data can show is that certain milestones/achievements may be difficult to obtain or maintain generally. But even if 28 out of 30 players that fit one of the above criteria could not follow up on a rookie season like Hunt had, that does not mean Hunt will not be the third out of 31 players to accomplish it.

I think the element of "causation" is missing. Did those other 28 RBs fail specifically because they were drafted in Round 3? No, of course not. I think we all know that, generally speaking, 3rd round RBs are unlikely to become multiple Pro Bowlers, but that's not necessarily predictive for a specific, independent player who was drafted in Round 3.
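
For what it's worth, the 28-out-of-30 example can be turned into a rough number. This is only a sketch using Laplace's rule of succession, which assumes a flat prior and treats Hunt as interchangeable with the 30 comps. That is a strong assumption and precisely the one being disputed here. `rule_of_succession` is a helper name made up for this example.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's estimate of the chance the next trial is a success."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)

# The post's hypothetical: 2 of 30 comps followed up their rookie season.
p = rule_of_succession(2, 30)
print(f"estimated chance for player 31: {p} (about {float(p):.1%})")
```

The estimate comes out low but well above zero, which is the point being made above: a poor base rate for the group tells you the odds are long, not that any one player is ruled out.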

 
Each situation in the NFL is different, just like each production run in China is different. However, we still use AQL sampling to learn about the whole when it comes to production. By sampling like instances in the NFL, there's value to be gained. I'm just not smart enough to make that next leap and figure out what's "like" enough.
There is no "enough", at least when it comes to statistical analysis in the NFL. The sample sizes are too small (even given a full career of 16-game seasons, much less after a season or two, which is what we're all after here) ... but, more importantly, the variables aren't nearly independent enough.

It doesn't take a sample larger than a few hundred Mike Trout at-bats or Steph Curry 3FG attempts to tease out the signal from the noise and draw the conclusion that they're both among the best of all time at what they do. But Emmitt Smith had 4,409 career carries and reasonable people still debate whether he was one of the GOAT at his position, or an above-average talent carried by an all-time OL. Because baseball and basketball are largely one-on-one duels, but football is an intimately choreographed 22-man ballet.

Data are still data, and I think even a season or two's worth of it for a given player allows us to do better than throw up our hands and say "nobody knows nothin'" (for instance, the "talent shows itself early" mantra is strongly supported by historical data, which is why I'm much higher on someone like Smith-Schuster than one season of stats would otherwise allow). But in general I've gotten into more trouble by being overconfident in my ability to derive signal out of NFL data than I have by being underconfident and missing out.

 