From Newsweek
http://blog.newsweek.com/blogs/stumper/arc...is-pitches.aspx
Just a snip from Wikipedia: http://en.wikipedia.org/wiki/PECOTA
On May 6, expectations were high for Hillary Clinton. After all, the latest polls suggested the former First Lady had built up a 5-point cushion in Indiana and slashed Barack Obama's 20-point lead in North Carolina to 8. But over at FiveThirtyEight.com, an anonymous blogger (nom d'écran: "Poblano") wasn't convinced. Relying on demographic data from previous primaries and ignoring the usual mishmash of polls, the mysterious upstart projected that Clinton would win Indiana by 2 percent and lose North Carolina by 17, a far less favorable outcome. When the results finally rolled in (1 in Indiana, 15 in North Carolina), Poblano had outperformed every established pollster. Clinton never recovered, but with the National Journal, the Guardian and the New York Post suddenly dissecting or demanding the secrets of his success, Poblano became an Internet sensation. "It was kind of amazing," he says.
It only gets better. For the man behind the blog, outpredicting the experts wasn't anything new—even if outpredicting political experts was. On May 30, Poblano finally revealed his offline name: Nate Silver. Doesn't ring a bell? Chances are you're not a baseball geek. Silver, 30, is already celebrated among ball fans for inventing something called PECOTA. Developed while the University of Chicago econ alum slogged through a post-collegiate consulting gig—"I'm used to not sleeping," he tells NEWSWEEK—PECOTA is now recognized as the most accurate system for forecasting how athletes and teams will perform in the future (down to the number of singles). In 2007, Silver's algorithm enraged at least half of Chicago when it said the White Sox—2005 champs—would post a 72–90 record. Turned out PECOTA was exactly right. For laypeople, the leap from the national pastime to national politics might seem like a stretch. But not for Silver (who posted his first political item on Daily Kos in October). "Baseball and politics are data-driven," he's written. "But a lot of the time, that data might be used badly. In baseball, that may mean looking at a statistic like batting average when things like on-base percentage and slugging percentage are far more correlated with winning ballgames. In politics, that might mean cherry-picking a certain polling result." In other words, different sport—same skill set.
From the start, Silver took pride in myth-busting the MSM, which has tended to reduce 2008's complex calculus—delegate distribution, demographic coalitions—into not-quite-true narratives. Obama has a problem with working-class whites? Actually, he has a problem with Appalachian working-class whites—and not their cousins in Oregon and Wisconsin. And so on. The response was ecstatic, and FiveThirtyEight's daily traffic increased 5,000 percent between March and June. But the main attraction was always Silver's primary predictions. Taking a page from PECOTA—a comprehensive historical database, it projects future performance by matching current players to comparable predecessors—Poblano predicted the results in, say, Pittsburgh by measuring how Clinton and Obama did in demographically similar congressional districts earlier on (once set, their coalitions were remarkably stable). Silver's score wasn't perfect—he underestimated Clinton in Kentucky and South Dakota. But ultimately, he came within 20 delegates of the final split on Super Tuesday (out of nearly 1,700) and 2.5 percent, on average, in the other six post-March primaries. "Nate's work is innovative," says Mark Blumenthal of Pollster.com.
Has anyone ever tried to take this PECOTA system to FF?
PECOTA relies on fitting a given player's past performance statistics to the performance of "comparable" Major League ballplayers by means of similarity scores. As described in the Baseball Prospectus website's glossary:[5]
PECOTA compares each player against a database of roughly 20,000 major league batter seasons since World War II. In addition, it also draws upon a database of roughly 15,000 translated minor league seasons (1997-2006) for players that spent most of their previous season in the minor leagues. . . . PECOTA considers four broad categories of attributes in determining a hitter's comparability:
1. Production metrics – such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.
2. Usage metrics, including career length and plate appearances or innings pitched.
3. Phenotypic attributes, including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).
4. Fielding position (for hitters) or starting/relief role (for pitchers).
. . . In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
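To make the "cheat" step concrete, here is a minimal sketch in Python of widening a match tolerance until enough comparables turn up. This is not PECOTA's actual code: the function name, the tolerance values, the step size, and the use of a single number per player are all assumptions made up for illustration.

```python
def find_comparables(target, pool, distance, start_tol=1.0, step=1.0,
                     min_sample=3, max_tol=10.0):
    """Return (matches, tolerance_used).

    Hypothetical helper: keeps loosening the tolerance until at least
    min_sample players fall within it, or max_tol is reached.
    """
    tol = start_tol
    while True:
        # Keep every player whose distance from the target is within tolerance.
        matches = [p for p in pool if distance(target, p) <= tol]
        if len(matches) >= min_sample or tol >= max_tol:
            return matches, tol
        tol += step  # not enough comparables yet, so accept less similar players


# Toy example: each "player" is just one made-up number.
def abs_diff(a, b):
    return abs(a - b)


comps, tol_used = find_comparables(10, [10, 11, 14, 20], abs_diff)
```

In this toy run the starting tolerance only finds two matches, so the loop keeps loosening it until a third player qualifies. For FF you would swap in a real distance over whatever stats you care about.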
PECOTA uses nearest neighbor analysis to match the individual player with a set of other players who are most similar to him. Although drawing on the underlying concept of Bill James' similarity scores, PECOTA calculates these scores in a distinct way that leads to a very different set of "comparables" than James' method.[6] Furthermore, Silver describes the following distinct feature:
"The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher’s performance. Thus, we might look at what a pitcher did from ages 35-37, and compare that against the most similar age 35-37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at baseball-reference.com or in other places, which attempt to evaluate the totality of a player’s career up to a given age."[7]
Once a set of "comparables" is determined for each player, his future performance forecast is based on the historical performance of his "comparables." For example, a 26-year-old's forecast performance in the coming season will be based on how the most comparable Major League 26-year-olds performed in their subsequent season.
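For anyone wanting to try this for FF, the comparables idea can be sketched as a plain nearest-neighbor forecast. The sketch below is only an illustration under stated assumptions: the feature tuples, the Euclidean distance, the choice of k, and the 1 / (1 + distance) weighting are all hypothetical and are not PECOTA's actual similarity formula.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length stat tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def forecast(target, history, k=3):
    """Forecast next-season output from the k most similar past players.

    history is a list of (features, next_season_value) pairs, where
    features describe a past player at the same age as the target.
    """
    # Rank past players by similarity to the target and keep the closest k.
    nearest = sorted(history, key=lambda rec: distance(target, rec[0]))[:k]
    # Closer comparables count for more (hypothetical weighting scheme).
    weights = [1.0 / (1.0 + distance(target, feats)) for feats, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Made-up data: two identical comparables and one very different player.
past = [((0.0, 0.0), 10.0), ((0.0, 0.0), 20.0), ((100.0, 100.0), 1000.0)]
projection = forecast((0.0, 0.0), past, k=2)
```

With real stats you would want to put the features on a common scale first (for example z-scores), otherwise whichever stat has the biggest numbers dominates the distance.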