gheemony said:
Would like to hear more about (1) your general philosophy on projections and (2) details of your approach. From your comments, you appear to take a very different approach than most.
I'm planning to write an article for FBG on this.
Very briefly . . .
The first thing I do for each team is project total number of offensive plays (not including sacks), and run-pass ratio. My first cut at this uses the previous year's stats fed into formulas derived using regression analysis over the last 10 seasons' worth of data for each team. (Actually, I don't remember if I used 10 years or 15 or some other number . . . I did the regression analysis a few years ago.)
Then I tweak those numbers based on any coaching or personnel changes, or sometimes just gut feel.
Once I've got projected total plays and run-pass ratios for each team, that gives me the total number of pass attempts (and targets) and rush attempts for each team.
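The regression step described above can be sketched in a few lines. This is a minimal illustration, not the author's actual model: the historical data points and the resulting coefficients are made-up placeholders, and the real version presumably uses more input variables than just last year's play count.

```python
# Sketch of the first step: fit a simple linear regression that predicts a
# team's total offensive plays from its previous-year plays, using historical
# (last year, this year) pairs. All numbers here are illustrative.

def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y, mx, my in
            zip(xs, ys, [mean_x] * n, [mean_y] * n)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historical team-seasons: (previous year's plays, this year's plays).
history = [(980, 965), (940, 955), (1010, 990), (960, 970), (930, 950)]
a, b = fit_simple_regression([h[0] for h in history], [h[1] for h in history])

# First-cut projection for a team that ran 969 plays last year; this is the
# number that then gets tweaked by hand for coaching/personnel changes.
projected_plays = a + b * 969
```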
The next thing I do is divide the pass attempts, targets, and rush attempts among various positions. For now, I always give QBs 100% of the pass attempts, and 0% of the targets, and I give TEs 0% of the rush attempts. Rushes are distributed to QBs, RBs, FBs, and WRs, and targets are distributed to RBs, FBs, WRs, and TEs. I start with the percentages for each team from the previous year, and then again adjust them based on any coaching or personnel changes.
Then, within each position, I distribute the pass attempts, targets, and rush attempts to individual players.
My default for QBs when the starter is set is to give 15/16 of the QB pass attempts and QB rush attempts to the starter, and 1/16 to the backup. Where the starter's job security isn't great (or where there hasn't been a starter named yet), those numbers will change. (They'll also change where one QB is more of a runner than another. Like I have Vince Young with 15/16 of the pass attempts, but 15.5/16 of the rush attempts, while Kerry Collins has 1/16 and 0.5/16, respectively.)
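In code, the QB split above is just fractions of the team's QB totals. The team totals here are hypothetical; only the 15/16, 1/16, 15.5/16, and 0.5/16 shares come from the post.

```python
# QB splits from the Titans example. Team QB totals are made-up placeholders.
team_qb_pass_att = 480
team_qb_rush_att = 80

young_pass   = team_qb_pass_att * 15 / 16    # Vince Young's pass attempts
collins_pass = team_qb_pass_att * 1 / 16     # Kerry Collins's pass attempts
young_rush   = team_qb_rush_att * 15.5 / 16  # Young gets extra rush share
collins_rush = team_qb_rush_att * 0.5 / 16   # because he's more of a runner
```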
This is the part -- distributing passes, targets, and rushes to individual players within their positions -- where I don't rely much on data from previous years. In some cases, where little has changed among WRs, for example, I'll keep the ratios of targets pretty close to what they were last year. But in most cases, things change from year to year as younger players start to take on more of a role in the offense, or guys return from injury, or whatever. So this part is largely subjective, based on media reports regarding how coaches plan to use their players this year and my overall feel for each team situation based on everything I've seen.
So to take Tomlinson as an example -- I've got the Chargers down for 969 total offensive plays (not including sacks); I've got 48% of those plays being runs; I've got 87.2% of those runs going to RBs (note: fullbacks are separate from RBs); and I've got 75% of the RB runs going to Tomlinson. So Tomlinson gets 969 * .48 * .872 * .75 = 304 rushing attempts.
By the same token, 52% of the Chargers' plays will be passes, and I've got 23.1% of the targets going to RBs, and 65% of those going to Tomlinson. So Tomlinson gets 969 * .52 * .231 * .65 = 76 targets.
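The two multiplication chains above can be written out directly; all the numbers below are from the Chargers example in the post.

```python
# Tomlinson's projected rushes and targets as the product of team-level,
# position-level, and player-level shares.
team_plays = 969
run_ratio = 0.48

rb_share_of_runs = 0.872     # RBs' share of team rushes (FBs counted separately)
lt_share_of_rb_runs = 0.75   # Tomlinson's share of RB rushes
rushes = team_plays * run_ratio * rb_share_of_runs * lt_share_of_rb_runs

rb_share_of_targets = 0.231  # RBs' share of team targets
lt_share_of_rb_targets = 0.65
targets = team_plays * (1 - run_ratio) * rb_share_of_targets * lt_share_of_rb_targets

print(round(rushes), round(targets))  # 304 76
```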
Once I've got rush attempts and targets for each player, I multiply them by my individual player projections for yards per rush, touchdowns per rush, receptions per target, yards per reception, touchdowns per reception, etc.
When I do individual projections for those things (as well as completion percentage, yards per attempt, touchdowns per attempt, and interceptions per attempt), I rely heavily on Bayesian inference analysis. This is the part that will take up the bulk of my article. To use yards per carry as an example, I've divided the universe of running backs into groups based on long-term YPC. For example, 0.4% of NFL RBs average more than 5.05 yards per carry over 350+ carries, 0.8% average between 4.95 and 5.05, 1.6% average between 4.85 and 4.95, 2.4% average between 4.75 and 4.85 . . . yadda yadda . . . 8.5% average between 4.35 and 4.45, 10.5% average between 4.25 and 4.35, 13% average between 4.15 and 4.25, 11.3% average between 4.05 and 4.15, 9.7% average between 3.95 and 4.05 . . . yadda yadda . . . 0.8% average between 3.25 and 3.35, and 0.4% average fewer than 3.25 yards per carry.
So I take that distribution as the prior probabilities for a given running back before he has his first carry. (In the future, I may use different priors based on draft position.)
Once he has, say, 39 carries for 130 yards (3.33 YPC), I can recalculate the percentage chance that he's in each of those groups (between 4.75 and 4.85, between 4.65 and 4.75, between 4.55 and 4.65, and so on). His observed YPC is 3.33. So the chance that he's truly a 4.45-4.55 YPC RB can be calculated based on how many standard deviations 3.33 is from 4.5 over his 39 carries. When we add up the probabilities that he's in each individual group, we can get his expected true YPC. (As it turns out, 39 carries is too small a sample size to be all that significant. After just 39 carries at a 3.33 average, his expected true YPC is at about 4.03 -- just a bit below the 4.16 NFL average for RBs. But once he has 390 carries at a 3.33 average, his expected true YPC drops to 3.61.)
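The update described above can be sketched as a discrete Bayesian posterior. This is a toy version: the prior bins below are a coarse stand-in (the post's prior has many more bins), and the per-carry standard deviation is an assumed number, since the post doesn't give its spread parameter. With those caveats, the mechanics are the same: weight each bin's prior probability by a normal likelihood of the observed average, then take the weighted mean.

```python
import math

# Toy Bayesian update for "true YPC". Prior bins and PER_CARRY_SD are
# illustrative assumptions, not the author's actual numbers.
PER_CARRY_SD = 6.0  # assumed SD of a single carry's yardage

# (bin midpoint true-YPC, prior probability) -- coarse stand-in prior.
prior = [(3.3, 0.02), (3.6, 0.08), (3.9, 0.20), (4.2, 0.40),
         (4.5, 0.20), (4.8, 0.08), (5.1, 0.02)]

def expected_true_ypc(observed_ypc, carries):
    """Posterior mean of true YPC given the observed average over `carries`."""
    se = PER_CARRY_SD / math.sqrt(carries)  # standard error of the average
    # Normal likelihood of the observed average under each bin's true YPC.
    weights = [p * math.exp(-0.5 * ((observed_ypc - mu) / se) ** 2)
               for mu, p in prior]
    total = sum(weights)
    return sum(mu * w for (mu, _), w in zip(prior, weights)) / total

# Same 3.33 average, but ten times the carries: the estimate starts near the
# prior mean and is pulled much further toward the observed 3.33.
print(expected_true_ypc(3.33, 39))
print(expected_true_ypc(3.33, 390))
```

This reproduces the sample-size behavior the post highlights: 39 carries barely moves the estimate off the prior, while 390 carries at the same average drags it well down.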
I go through a similar analysis for each player for completions per pass attempt, yards per pass attempt, touchdowns per pass attempt, interceptions per pass attempt, touchdowns per rush attempt, receptions per target, yards per reception, and touchdowns per reception.
As input, I will often use a player's career stats. But in the case of older players, I will tend to use just the last few years' worth of stats (to avoid using his younger, faster days). For players who've switched teams, I will sometimes not use stats from old teams if I don't think they are representative of his current situation. (I've excluded Randy Moss's Raider years, for example.) And so on. So there's still a subjective component to it.

But after playing around with a number of different ways of doing individual projections for stuff like yards per carry, I really like the Bayesian method I've got set up. I've found it to give very realistic projections, and (unlike the regression analysis I'd played around with before) I love the way it takes into account sample size. Peyton Manning, for example, has had sky-high stats that would get regressed back to the mean too much using any regression analysis that tries to apply to both experienced and inexperienced QBs. But since he's been so good for so long, Bayesian inference analysis doesn't regress him back that much -- he really is that good, as he's demonstrated consistently.
So anyway . . . after I have all the individual player projections done, I add them all up to compare total receiving stats to total passing stats. Where there's a divergence, I have them meet in the middle. So if a team's QBs are projected to complete 61% of their passes, while the receivers collectively are projected to catch 59% of their targets, I'll split the difference to make completions equal receptions. Then do the same with yards and touchdowns.
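The "meet in the middle" reconciliation can be written as a pair of scale factors. The attempt and target counts below are hypothetical; only the 61%/59% example rates come from the post.

```python
# Force team completions to equal team receptions by scaling both sides
# toward their midpoint. The 550s are made-up team totals for illustration.
qb_completions = 0.61 * 550  # QBs projected to complete 61% of 550 attempts
wr_receptions = 0.59 * 550   # receivers projected to catch 59% of 550 targets

midpoint = (qb_completions + wr_receptions) / 2
qb_scale = midpoint / qb_completions  # multiply every QB's completions by this
wr_scale = midpoint / wr_receptions   # multiply every receiver's receptions by this
# Repeat the same scaling for yards and touchdowns so those totals match too.
```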
Then I add up all stats for all teams and make sure they fit in with league-wide historical norms, and make any adjustments I need to there.
Then I post a thread in the shark pool asking for corrections.
To me your projections seem to be like running an infinite number of possible projections and taking an average of those projections (some of those projections could be an ACL tear in the preseason, others could be record-breaking seasons). On the other hand, nearly all others are projecting based on what they think will happen IF the player stays healthy.
That would be a great way to do projections, and I'm aiming for the same results as that method, but I don't have any simulations set up. When distributing rushes and targets to players, I do take into account that there's less than a 100% chance that the starter will be healthy all season.