Fantasy Football - Footballguys Forums

Quantitative analysis question

Ryan99

Footballguy
First of all, hi, I'm new. One of my interests in fantasy football is in statistical analysis, so this post is going to be somewhat technical. I was thinking of a way to analyze what position to draft first (or second, third etc.). Anyway, this is what I came up with. If this type of analysis exists somewhere, please point me towards it.

Rather than trying to predict how well a player will do by looking at his previous years' stats, matchups, etc. (this type of prediction is really hard and prone to large errors), it might be useful to predict a player's performance by his ADP. For instance, by ADP Arian Foster is the number one RB this year. Historically, how have #1 RBs done?

For each of the past 5 years I assembled players' preseason ADP and their in-season stats. To keep injuries from skewing the data, I looked at per-game fantasy point production rather than whole-season point production. For each ADP spot (by position, so QB1, QB2, QB3 etc., same for RB, TE and WR) I calculated the median points per game over the past 5 years (I used the median to limit the influence of particularly bad or good seasons, which would skew an average). I then plotted ADP versus PPG and fit a straight line. The equation for this line gives me median PPG as a function of ADP for each position. I can then use this equation (a different equation for each position) to predict, for instance, how many PPG the 5th-ranked TE will get.
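For anyone who wants to reproduce the median-and-line step outside a spreadsheet, here is a minimal Python sketch. It uses only the first five QB rows from the table posted later in this thread, and the names `fit_line` and `predict_ppg` are illustrative, not taken from the original program.

```python
from statistics import median

# Per-game points for QB ADP slots 1-5, seasons 2011 down to 2007,
# copied from the table posted later in this thread.
ppg_by_slot = {
    1: [20.7, 24.0, 24.1, 3.0, 20.9],
    2: [32.5, 20.9, 20.3, 19.1, 17.5],
    3: [27.8, 22.0, 21.5, 19.7, 19.5],
    4: [19.2, 23.2, 24.7, 23.2, 30.6],
    5: [30.0, 18.2, 20.3, 8.6, 10.6],
}

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slots = sorted(ppg_by_slot)
medians = [median(ppg_by_slot[s]) for s in slots]  # one median per ADP slot
slope, intercept = fit_line(slots, medians)

def predict_ppg(slot):
    """Predicted median PPG for a positional ADP slot, from the fitted line."""
    return slope * slot + intercept
```

With only five slots the fit is just a demonstration; the full analysis would feed in every slot for each position.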

So after doing this, I got this year's ADP and wrote a program to simulate an n-owner snake draft (my leagues have 12 owners). By default, each owner drafts the player with the highest remaining ADP (assuming the team needs that position). I then used the above equations to calculate the PPG for each team. No surprise, the first team to draft winds up with the best team.
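The draft simulation described above could look roughly like the sketch below. It is an assumption about how such a program might be structured, not the original code; `snake_order`, `draft_by_adp`, and the `roster` dictionary of starting slots are hypothetical names.

```python
def snake_order(n_teams, n_rounds):
    """Yield team indices pick by pick, reversing direction every round."""
    for rnd in range(n_rounds):
        teams = range(n_teams)
        yield from (teams if rnd % 2 == 0 else reversed(teams))

def draft_by_adp(players, n_teams, roster):
    """Each pick takes the best remaining player at a position the team still needs.

    players: list of (name, position), sorted by overall ADP, best first.
    roster:  dict mapping position -> number of slots per team (assumed format).
    """
    n_rounds = sum(roster.values())
    available = list(players)
    teams = [[] for _ in range(n_teams)]
    for t in snake_order(n_teams, n_rounds):
        # Count how many players this team already has at each position.
        counts = {}
        for _, pos in teams[t]:
            counts[pos] = counts.get(pos, 0) + 1
        # Take the first available player who fills an open slot.
        for i, (name, pos) in enumerate(available):
            if counts.get(pos, 0) < roster.get(pos, 0):
                teams[t].append(available.pop(i))
                break
    return teams
```

For a 12-owner league you would call `draft_by_adp(players, 12, roster)` with the full ADP list; if the pool runs out of a needed position, that pick is simply skipped in this sketch.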

I used this score, by draft position, as a baseline. It is the score you'd expect from a certain draft position by just drafting by ADP, assuming everyone else does the same. Then, one at a time, for each draft spot (1 through 12), I used a different draft strategy and compared the new score to the baseline. Some of the strategies I tried were RB RB, QB RB, WR WR, etc.
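Here is a hedged sketch of the baseline-versus-strategy comparison. The pool, roster, and line coefficients below are toy values invented for illustration (they are not the fitted numbers from this analysis), and `simulate` and `team_ppg` are hypothetical names.

```python
def simulate(pool, n_teams, roster, forced=None):
    """Snake draft by ADP; `forced` maps a team index to the positions it
    must take in its first rounds, e.g. {1: ["RB", "RB"]}.

    pool: list of (position, positional_rank) in overall ADP order.
    """
    forced = forced or {}
    available = list(pool)
    teams = [[] for _ in range(n_teams)]
    for rnd in range(sum(roster.values())):
        order = range(n_teams) if rnd % 2 == 0 else reversed(range(n_teams))
        for t in order:
            have = [pos for pos, _ in teams[t]]
            plan = forced.get(t, [])
            want = plan[rnd] if rnd < len(plan) else None
            for i, (pos, _) in enumerate(available):
                if want is not None and pos != want:
                    continue  # strategy demands a different position this round
                if have.count(pos) < roster.get(pos, 0):
                    teams[t].append(available.pop(i))
                    break
    return teams

def team_ppg(team, lines):
    """lines: position -> (slope, intercept) of the PPG-versus-rank line."""
    return sum(lines[pos][0] * rank + lines[pos][1] for pos, rank in team)

# Toy inputs, made up for this example only.
pool = [("RB", 1), ("QB", 1), ("RB", 2), ("RB", 3), ("QB", 2), ("RB", 4)]
roster = {"QB": 1, "RB": 2}
lines = {"QB": (-0.5, 21.0), "RB": (-1.5, 17.0)}

baseline = simulate(pool, 2, roster)
rb_rb = simulate(pool, 2, roster, forced={1: ["RB", "RB"]})
gain = team_ppg(rb_rb[1], lines) - team_ppg(baseline[1], lines)
```

In this toy league the second drafter gains PPG by opening RB RB, but that is a property of the invented coefficients, not evidence for the result reported in this thread.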

As it turns out, RB RB was the best strategy, outperforming the baseline for every draft spot (except those that were already drafting RB RB just by ADP). One spot (10th I think) outperformed the baseline by a full 5 PPG. And this was with 6 fantasy points for a passing touchdown. So even in a league where QBs dominate the top scoring spots, RB RB still does better than drafting QB early.

Wow, that was long. Hopefully people who are interested in this sort of thing could follow it. Anyway, a few questions: Have you seen an analysis like this before? Do you have any suggestions on how to improve this / make it more useful / make it more accurate? How can I account for injury rate? (I assume QBs are less likely to get injured than RBs and WRs, an effect I did not account for.) One thing that I think is particularly elegant about this analysis is that it naturally takes into account the idea that the top QB is more likely to finish near the top of QBs than the top RB is to finish near the top of RBs, since it compares ADP to actual in-season production.

Yeah first post!

 
There are a few sites I visit that are based on advanced statistics and try to figure out how to moneyball sports. My issue is that there are too many variables in 11-on-11 football. Baseball can be tracked individually and basketball advanced stats are close to solved, but football is a long way away, because too much can happen in one play and there's no way to account for things like down and distance with stats. Basing my draft decisions on historical ADP just doesn't sound rational. Here are some sites you might be interested in for more angles.

http://www.pro-football-reference.com/

http://footballoutsiders.com/

http://harvardsportsanalysis.wordpress.com/papers/

http://fantasydouche.com/

 
I don't know how useful it is, but it's an interesting approach, so thx for the post.

The only thing I can suggest for injuries is to use games played or started, with 16 as the baseline, but you'd have to use your judgement.
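One way to read that suggestion, sketched in Python: scale per-game production by the share of a 16-game season that players at a given ADP slot have historically played. The games-played numbers in the usage line are made up, and both function names are hypothetical.

```python
SEASON_GAMES = 16  # baseline season length suggested above

def availability(games_played_history):
    """Average share of the season actually played, e.g. over past years."""
    shares = [g / SEASON_GAMES for g in games_played_history]
    return sum(shares) / len(shares)

def expected_season_points(ppg, games_played_history):
    """Per-game points scaled up to a season and discounted for missed games."""
    return ppg * SEASON_GAMES * availability(games_played_history)

# Hypothetical games-played history for one ADP slot (not real data).
example = expected_season_points(16.1, [16, 12, 16, 8, 16])
```

This treats missed games as zero points and ignores whatever a replacement player would score, so it is where the judgement call comes in.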

 
First off, I really appreciate the time you took to do your analysis and write this post. I'm really big into statistics, so I always enjoy reading something new. It's interesting that RB RB comes out as the best strategy. I've been doing mocks all day where I do that, and I've liked my team very much. I would be intrigued to see how this works in leagues with different roster and/or scoring settings.

My only issue with it is the linear regression you used. I tried that last offseason, and it looked like a bit of a mess to me when I saw it with the data points. Reason being, the fantasy points are going to be all over the place, so while there is a "best fit line", a line doesn't really fit the data too well. Perhaps using the median made the data more linear?

As a sidenote, what program did you use? Thanks again for the work!

 
Thanks for the work. Do you have data for average points scored per position on a strict ADP draft that you could share? It's fairly obvious that number 1 was going to be highest when some previous years had LT or Faulk (depending how far back you went). But I'd like to see the 2-11 positions.

 
'My only issue with it is the linear regression you used. I tried that last offseason, and it looked like a bit of a mess to me when I saw it with the data points. Reason being, the fantasy points are going to be all over the place, so while there is a "best fit line", a line doesn't really fit the data too well. Perhaps using the median made the data more linear?'

The data wasn't very linear; the R^2 values ranged from .5 to .7. Perhaps a different curve would have worked better (a 2nd order polynomial maybe), but you have to fit some sort of curve, because if you don't, some lower ranked ADP spots score higher than some higher ranked ones. For instance, over the past 5 years the highest median PPG for QBs comes from the QB4 spot (more on this later). Using this as is doesn't make sense, because it's saying that if you're drafting the first QB, don't take the guy you have ranked 1st, take the guy you have ranked 4th. So fitting the line was just to make sure that a later ranked player scores fewer points.
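For reference, R^2 for any fitted curve is one minus the residual sum of squares over the total sum of squares. A minimal sketch with an illustrative function name (the spreadsheet computes the same quantity for a trendline):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A value of 1 means the curve hits every median exactly, and 0 means it does no better than predicting the overall mean for every ADP spot.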

Something very interesting about the QB 4 spot. In 4 of the past 5 years, the QB 4 by ADP finished first, and the other year he finished 9th. The 4th QB by ADP this year? Matt Stafford. Just sayin'.

'As a sidenote, what program did you use? Thanks again for the work!'

I compiled the data and fit the lines using Excel and ran the mock drafts using a short program I wrote in Python (a nice, simple language).

 
'Thanks for the work. Do you have data for average points scored per position on a strict ADP draft that you could share? It's fairly obvious that number 1 was going to be highest when some previous years had LT or Faulk (depending how far back you went). But I'd like to see the 2-11 positions.'
Is there a way to attach files? I'm not seeing one. Anyway, I'd be happy to share my data, but it's fairly limited. First, it's just for the past 5 years, and it's only ~24 QBs and TEs and ~60 RBs and WRs each year. More significantly, it isn't yearly stats, just end-of-year fantasy points and points per game using my league's scoring system. So you can't just recompute the points using a different scoring system. Does anyone know of a site that has stats along with ADP in the same list that can just be copied into Excel?
 
I was just interested to see how much of an advantage top picks are over mid to late picks per ADP. Any way you could restructure the order to include a bonzai draft (reverse only the third round so 12 picks first on the second/third/fourth rounds) and see if it brings average team scores closer?

 
Also, I've never been able to find a reliable site for data gathering that can be copied and pasted. I mostly only do it for baseball, however, so there might be a football site. I spent probably 10 hours inputting data into Numbers (Apple's Excel) in prep for baseball leagues before I did any real draft prep.

 
if you got numbers and you want real info... use SPSS.

Excel is to stats what dry humping is to porn.

 
'if you got numbers and you want real info... use SPSS. Excel is to stats what dry humping is to porn.'
Don't know SPSS. If you're doing serious computational statistics and need more than just built-in stuff, check out the language 'R'. It's supposed to be the best statistics language bar none. I don't like using Excel, but it's what I can use. What I'd really like is a site that I can get comma-separated stats from, including ADP, so that I can import it into Python and do some serious programming with it.
 
'I was just interested to see how much of an advantage top picks are over mid to late picks per ADP. Any way you could restructure the order to include a bonzai draft (reverse only the third round so 12 picks first on the second/third/fourth rounds) and see if it brings average team scores closer?'
Here's some copy pasted stuff

QB
ADP   2011   2010   2009   2008   2007   Median
1     20.7   24     24.1   3      20.9   20.9
2     32.5   20.9   20.3   19.1   17.5   20.3
3     27.8   22     21.5   19.7   19.5   21.5
4     19.2   23.2   24.7   23.2   30.6   23.2
5     30     18.2   20.3   8.6    10.6   18.2
6     21.1   20     18.2   13.3   17.6   18.2
7     18.8   21.2   20.4   10.4   14.1   18.8
8     0      17.6   19.6   18.1   18.9   18.1
9     20.7   9.6    16.3   9.4    13.7   13.7
10    17.2   16.9   21     14.6   23     17.2
11    26.3   9.2    17.4   15.4   11.6   15.4
12    15.6   19     15.4   20.2   6.8    15.6
13    21.2   18.3   21.8   3.4    15.6   18.3
14    9.9    17.1   21.3   10.9   14.4   14.4
15    13     15.6   13.9   9.7    21.5   13.9
16    15.2   20.5   2.6    14.9   18.7   15.2
17    16.2   20.2   17.8   12.8   24.5   17.8
18    12.9   12.3   13     18.4   5.6    12.9
19    11.5   14.6   16.1   21.4   7.5    14.6
20    12.2   12.1   9.9    21.4   11.3   12.1
21    14.5   14.6   8.4    14.1   13.3   14.1
22    18.1   18.9   15.5   12.7   7.8    15.5
23    16.7   8.4    15.5   20.1   11.7   15.5
24    25.3   14.2   5.1    14.1   13.9   14.1

RB
ADP   2011   2010   2009   2008   2007   Median
1     16.1   15.2   17.8   14.8   19.8   16.1
2     19.2   16.4   17.8   15.4   14.1   16.4
3     19.7   14.2   13.4   16.4   13.2   14.2
4     11.4   14.9   10.7   10.1   13.1   11.4
5     7.7    14.9   13.8   16     16.1   14.9
6     19.6   13.5   11.5   12.7   8.7    12.7
7     10.9   12.9   13.9   13.7   20     13.7
8     16.9   10.3   21.8   14.1   10.7   14.1
9     15.6   14     11.4   13.3   7.6    13.3
10    13.5   8.3    17.1   10.5   12.7   12.7
11    11.3   6.7    9.3    10.1   8.2    9.3
12    12.8   15.3   8      14.3   9.9    12.8
13    14.9   4.5    8.5    9.1    13.3   9.1
14    9.4    10.9   10.6   9      11.6   10.6
15    8.5    11.1   13.4   12.8   19.4   12.8
16    7.6    21.2   11.8   15.6   11.9   11.9
17    10.7   15.9   13.8   15.7   9.5    13.8
18    13.7   10.1   11.4   3.1    9.5    10.1
19    14.9   13.7   16.4   10.2   13.8   13.8
20    14     11.2   5.2    17.1   14.5   14
21    4.3    4.5    6.2    8.1    13.3   6.2
22    8.4    7      13.8   9.9    3.4    8.4
23    8.9    12.7   9.2    9.3    16.7   9.3
24    10.9   8.3    4.8    5.9    9.4    8.3
25    11.3   3.6    14.6   4.6    13.1   11.3
26    6.6    9      10.6   10.6   8.9    9
27    10.1   12.7   3.6    11.4   14.9   11.4
28    9.2    6.2    8.3    16.1   6.7    8.3
29    17.4   6.1    2.5    5.9    7.7    6.1
30    14.8   5.4    5.7    5.7    6.7    5.7
31    12.6   4.1    7.1    18.2   6      7.1
32    10.9   6.3    13.6   2      4.5    6.3
33    5.3    8.7    12.1   7.6    4.1    7.6
34    5.6    9.7    5.7    7.6    8.6    7.6
35    8.7    6.6    9.3    14.6   10.8   9.3
36    7      8.2    7.9    7.3    6.7    7.3
37    9.8    11.8   8.8    7.7    7.9    8.8
38    6.9    8.7    8.1    7      10.3   8.1
39    9      7.2    6.7    6.9    2.4    6.9
40    11.9   6.6    6.7    3.7    10.2   6.7
41    8.3    10     10     0.7    7.9    8.3
42    10.2   17.5   11.4   6      3      10.2
43    8.1    6.1    5.8    3.5    7.7    6.1
44    12.8   1      1.9    6.9    4.4    4.4
45    8.5    3.2    10.9   3.8    -0.1   3.8
46    4      1.5    1.6    11.1   5.1    4
47    3.3    4      2.8    14.6   2      3.3
48    3.4    5.4    9.3    6.6    4.3    5.4
49    7.4    4.4    9      8.9    4.3    7.4
50    1.1    0.1    3.1    7.5    7.4    3.1
51    4.2    0.7    9.5    8.5    4.4    4.4
52    9.7    2.6    4.3    4.1    5.1    4.3
53    4.1    2.4    5.1    4.5    0      4.1

TE
ADP   2011   2010   2009   2008   2007   Median
1     9.7    9.9    8      8      9.8    9.7
2     7.9    13.7   10.3   3.8    9.9    9.9
3     8.3    7.3    8.5    7.7    5.9    7.7
4     4.6    8.2    11.4   6.6    6.3    6.6
5     7.5    10.4   7.4    8.6    5.6    7.5
6     12.6   4.9    6.9    11.2   8.9    8.9
7     6      7.2    8      6.3    8.3    7.2
8     8.8    5.7    10.6   4.4    5.8    5.8
9     3.3    6.9    6.6    3.2    10.3   6.6
10    15.1   4.4    5.7    5.3    3.1    5.3
11    6.1    7      8.2    2.5    4.1    6.1
12    1.9    7      4.3    6.1    6.8    6.1
13    10.3   2.8    7.9    6.4    6.5    6.5
14    7.7    6.5    7      2.3    9.3    7
15    4      5.1    6      4.3    5.9    5.1
16    7.2    7.2    9.4    4.9    4.4    7.2
17    5.4    4.7    7.7    5.7    4.1    5.4

WR
ADP   2011   2010   2009   2008   2007   Median
1     8.9    13.3   12.5   10.2   10.5   10.5
2     15.9   4.4    12.6   10.4   12.1   12.1
3     11.5   11.2   13.1   9.6    6.5    11.2
4     11.4   11.8   9.5    6.5    14.8   11.4
5     10.2   10     12.1   13     12.9   12.1
6     10.6   13     8.3    13.5   10.4   10.6
7     11.6   9.1    9.5    9.5    13.2   9.5
8     10.5   9.8    11.6   8.7    9.6    9.8
9     7.9    12.1   9.8    6.7    12.7   9.8
10    10.2   10.3   9.2    7.2    4      9.2
11    7.8    10.1   12.8   12.2   12.1   12.1
12    9.3    9.4    7.8    6.4    12.1   9.3
13    6.1    8      7.3    9.5    14.7   8
14    9.7    8.6    7.9    7.4    7.1    7.9
15    9.8    4.9    9.9    11.1   8.2    9.8
16    13.9   7.1    11.1   15     17.9   13.9
17    7.4    8.1    12.1   4      11     8.1
18    8.6    12.7   6.6    12.2   9.8    9.8
19    2.1    13.5   0      6.7    7.3    6.7
20    11.6   7.1    5.6    11.3   6.9    7.1
21    9.2    8.7    3      7.6    8.4    8.4
22    7.5    9.9    12.6   9.4    9.7    9.7
23    6.5    6.7    5.8    9.6    7.1    6.7
24    11.6   9.3    9.6    5.5    5      9.3
25    9      5.7    4.2    7.3    4.9    5.7
26    11     11.1   6.6    8.4    13.6   11
27    14.6   9.5    6.4    11     7.7    9.5
28    7      11.1   7      8.4    8.8    8.4
29    4.2    10.1   10     6      5.5    6
30    9      7.4    8.8    9.1    3.7    8.8
31    6.7    7.8    7.3    1.7    7.9    7.3
32    2.7    3.7    5.8    12.5   8.7    5.8
33    7.2    8.6    5.7    8.2    0      7.2
34    8.3    9.1    4.2    5.1    10.7   8.3
35    11     6      8.8    2.3    3.6    6
36    10     6.4    7.7    9.4    12.3   9.4
37    3.9    1.1    9.2    1.8    8.5    3.9
38    5.5    6.2    4.8    8.6    9.3    6.2
39    5.5    7.9    5.4    3.3    6.6    5.5
40    3.8    7.9    6.4    2.4    3      3.8
41    7.9    5.7    3.8    4.8    4.3    4.8
42    0.8    6.1    6.3    3.2    7.1    6.1
43    2.2    0.7    1.6    8.3    4.8    2.2
44    7      2.1    5.3    3.9    4.8    4.8
45    8.3    3.5    5.4    3.7    11.6   5.4
46    4.4    4.9    6.9    7      5.8    5.8
47    13     9.9    6.2    3.1    5      6.2
48    4.3    10.4   8.1    8      6.2    8
49    5.3    4      6.8    6      4.3    5.3
50    7.8    5.4    4.8    4.3    9.5    5.4
51    6.2    2.9    12.1   1.6    4.8    4.8
52    5.7    13.1   5.7    7.2    1.5    5.7
53    2.2    6.5    7.7    7.6    1.5    6.5
54    1.9    2.4    2.7    8.8    11.6   2.7
55    6.8    4.4    11.1   0.5    3.5    4.4
56    6.1    3.2    1.9    3      3.1    3.1
57    3.8    3.8    8.2    2.1    8.2    3.8
58    4      5.3    2.7    2.6    5.4    4
59    2.7    7.8    4.5    10     1      4.5
 
I couldn't tell you what 'R' offers that SPSS doesn't, all within one neat but expensive program (of course, it doesn't have to cost a thing). I was heavy into statistics when I worked for the government, where I mastered SPSS, but I feel like I wouldn't know how to do even a simple z-score if you put it in front of me right now. SPSS is the way to go for stats, and you can find online tutorials on how to do anything you want with your data sets.
 
You definitely shouldn't use a straight line to fit the data. The model would break down in exactly the place where you need it to work, i.e., the top of the draft. Presumably a logarithmic curve would work better. Also, while I understand what you're doing, it's important to recognize that there are some legitimate concerns about smoothing the data. If, for example, lower ranked RBs are more likely to become RB1s than lower ranked WRs or QBs, then that would be a reason to go against RB-RB despite the data presented.
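A logarithmic curve can be fit with the same least-squares machinery as a line, just by regressing PPG on ln(rank) instead of rank. A minimal sketch, assuming the form ppg = a + b * ln(rank); the function name is illustrative and the data are the first twelve RB medians from the table posted earlier in the thread.

```python
import math

def fit_log_curve(ranks, ppg):
    """Least-squares fit of ppg = a + b * ln(rank); returns (a, b)."""
    xs = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ppg) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ppg))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Median PPG for RB ADP slots 1-12, from the table earlier in the thread.
rb_medians = [16.1, 16.4, 14.2, 11.4, 14.9, 12.7,
              13.7, 14.1, 13.3, 12.7, 9.3, 12.8]
a, b = fit_log_curve(range(1, 13), rb_medians)

def predict_ppg(rank):
    """Predicted median PPG at a positional rank under the log model."""
    return a + b * math.log(rank)
```

Because ln(rank) changes fastest between the first few ranks, this shape lets the top of the draft drop off more steeply than the later rounds, which is the behavior a straight line cannot capture.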
 
'I don't like using excel, but it's what I can use. What I'd really like is a site that I can get comma separated stats from, including ADP, so that I can import it into Python and do some serious programming with it.'
You can get ADP data from fantasy football calculator in XML. Automated conversion to csv should be trivial compared to serious programming. Thanks for the thread.
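The conversion could be done with Python's standard library along these lines. The XML layout below (element names `player`, `name`, `pos`, `adp`) is purely a guess for illustration, and the sample values are placeholders; the real feed's structure would need to be checked and the field names adjusted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout with placeholder values; the real feed may differ.
SAMPLE_XML = """<players>
  <player><name>Player A</name><pos>RB</pos><adp>1.3</adp></player>
  <player><name>Player B</name><pos>QB</pos><adp>14.8</adp></player>
</players>"""

def xml_to_csv(xml_text, fields=("name", "pos", "adp")):
    """Flatten each <player> element into one CSV row with the given fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for player in root.iter("player"):
        # Missing child elements become empty cells rather than errors.
        writer.writerow([player.findtext(f, default="") for f in fields])
    return out.getvalue()

csv_text = xml_to_csv(SAMPLE_XML)
```

The resulting text can be written to a file and opened in Excel, or read straight back with `csv.reader` for further processing.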
 
'You definitely shouldn't use a linear curve to fit the data. The model would break down in exactly the place where you need it to work, i.e., the top of the draft. Presumably a logarithmic curve would work better. Also, while I understand what you're doing, it's important to recognize that there are some legitimate concerns about smoothing the data. If, for example, lower ranked RBs are more likely to become RB1s than lower ranked WRs or QBs, then that would be a reason to go against RB-RB despite the data presented.'
Better maybe, but I don't think any curve would fit well since the data is all over the place. Having more than 5 years might improve this. There are also ways to massage the data that might help, for instance by scaling each year's data to that year's average point production by position. For something like this I'm not sure how to get away from doing some sort of smoothing. The expected fantasy points as a function of ADP has to be monotonically decreasing (assuming the ADP is 'correct') for this to make any sense, but unless you enforce it (by fitting a curve or some other method), it won't be. I wouldn't use this system to make specific player choices, but to test an overall strategy, the results are at least interesting. I like the idea of utilizing ADP in some manner, because it takes advantage of the 'wisdom of crowds', which in something like fantasy football is largely a compilation of experts' opinions. You then have to attach a number of expected fantasy points to each ADP spot, and this was one attempt to do so. As an aside, I was thinking about a method to gauge the effect of a particular fantasy expert on the community as a whole. Get ADP as a function of time, get some expert's rankings as a function of time, and see how correlated they are (there's going to be some delay in the ADP, so this can be taken into account by calculating an average lag time or something).
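One "other method" for enforcing a decreasing curve without choosing a functional form is isotonic regression via the pool-adjacent-violators algorithm. To be clear, this is a technique swapped in for illustration, not something used in the analysis above, and the function name is hypothetical.

```python
def monotone_decreasing(values):
    """Pool adjacent violators: the least-squares closest non-increasing
    sequence to `values` (equal weights assumed)."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge whenever a later block's mean exceeds the block before it.
        while (len(blocks) > 1 and
               blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    smoothed = []
    for s, c in blocks:
        smoothed.extend([s / c] * c)
    return smoothed

# QB medians for ADP slots 1-5 from the table earlier in the thread.
qb_smoothed = monotone_decreasing([20.9, 20.3, 21.5, 23.2, 18.2])
```

On those five QB medians, the QB4 bump gets averaged together with the three slots ahead of it, so QB1 through QB4 end up with the same expected PPG and QB5 stays lower. That removes the "draft the 4th-ranked guy first" artifact while changing the data as little as possible.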
 
