Fantasy Football - Footballguys Forums


332-95

Good point. I did not even touch on the non-response rate or the other errors that can occur outside of random variation. My take on the 2016 polls is that 1) most were leaning or pretty clearly pointing towards Hillary but 2) they were far from slam dunks as those confidence intervals included scenarios in which Trump comes out the winner, and 3) most apropos to this conversation, I believe we hit a new level where subjects either did not tell the truth that they were voting for Trump or that they waffled at high rates so when it came to actually voting, they swung to Trump. Plus you've got the Comey debacle which hit in October which also threw a wrench into predictions.
I personally saw this a good bit. I know people who voted for Trump holding their nose hoping for a conservative Supreme Court Judge appointment. We can argue they did the wrong thing. I'm just telling you what they did. And as we've seen here, it's exceedingly unpopular in some circles to be a Trump voter. Even in places like Tennessee where he won. I fully understand why they'd not be truthful on a poll. Denying Science and all that. 

 
About polls and surveys:

1) Online polls and/or any survey in which people may choose to respond by their own choice are bunk and should be either tossed out altogether or consumed merely for their entertainment value. Think: a Presidential poll on November 7th, 2016 in which online visitors to CNN's website could click who they're voting for -- versus -- the same poll at the Fox website. Might be fun to look at but I wouldn't make any predictions based on that data.

2) One major goal of Statistics in general and polling in particular is to try to capture a single value (often a percent) about a very large population at one snapshot of time. For instance, to take this out of the realm of politics: Among Americans over 18 (a population of hundreds of millions of people), what percent of them would favor abolishing the penny? I envision this percentage as an unknowable number, the true percentage among all of those folks. If you believe in God, you'd probably say that He'd know that percentage even though it is constantly changing as people leave the population by dying or enter the population by turning 18 or by immigrating here (that's another thread).

That magical, elusive percentage is what a statistician would like to know and it is called a "parameter."  If every single member of that population could be asked, then we should be able to get a good handle on that parameter. But that's a very time consuming and costly proposition. So much so that we only do this kind of statistics rarely and it is called a census. For the sake of illustration, let's suppose that we know the true parameter (even though statisticians almost never do) and that the true percentage of our population who want to abolish the penny is 44%.

Instead of attempting a census on a very large population, statisticians realized that one could get a reasonable approximation for this parameter by taking a random sample of the population and using that percentage as the best approximation. Randomness is key. If you sample people who are hanging around the mall or a church or an NFL game, you might get skewed results if the members of that sample are not representative of the whole population. Heaven forbid you sample attendees at a coin collecting convention. So the statistician sets out to conduct a random sample.

3) A key fact about sampling is that, even for a population in the hundreds of millions, one only needs to sample a relatively small fraction of them to be able to get a decent approximation of the parameter. Think of it like ocean water. Just because the ocean is enormous doesn't mean that we'd need a lot of water to take a reasonably representative sample. It just needs to be mixed well and we need to choose a sample at random, not somewhere convenient, like right off a pier.

Turns out that a sample of only about 1000 subjects is enough for most situations so if you look closely at most polls, they will say something like "1023 people were surveyed." This is rather amazing: a well-conducted random sample of only 1000 people will give a reasonably accurate estimate for the true percentage of all Americans who favor or oppose a proposition. Such a sample yields a "margin of error" of roughly +/- 3%.
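
For anyone who wants to check the +/- 3% figure, here is a minimal Python sketch. It assumes a simple random sample and the usual normal approximation, and it uses the worst case of a 50/50 split; the function name `margin_of_error` is just an illustrative choice, not anything from a polling library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation.
    p=0.5 is the worst case (it gives the largest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Compare a few sample sizes; results are printed in percentage points.
for n in (100, 1000, 10000):
    print(n, round(100 * margin_of_error(n), 1))
```

For n = 1000 this comes out near 3.1 percentage points. Note that the population size never appears in the formula, which is the ocean-water point in numbers: what matters is how many people are sampled, not how many exist.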

4) Conducting a random sample of 1000 Americans over the age of 18, though, is a royal PIA. You cannot very well put all those names in a hat and pull out 1000 of them. I won't get into the gory details but good samples tend to break this up into stages and do a stratified random sample. Suffice to say, it is easier to cut corners and make a specious sample than it is to do it well. This is why I trust the big pollsters like Gallup, Roper, Quinnipiac, etc. because they have the funds and expertise to conduct this random sample. Now, as you must suspect, it is possible that if we only sample 1000 people, we could get extraordinarily unlucky and just happen to select all penny-lovers in our sample even though, in truth, 44% of our population want to abolish it. This can happen but you can also hit the Powerball on three consecutive weeks. Sure it can happen but it's very, very, very unlikely. The techniques of statistics allow us to quantify just how likely it is that our random sample will be very far off from what one would expect after doing a random sample. The percentage of people in our sample who want to abolish is called a "statistic" and let's for the sake of argument assume that it came out as 40%.
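
To put a rough number on the "all penny-lovers" scenario, here is a small Python sketch. It assumes each of the 1000 people is drawn independently and that 56% of the population (100% minus the 44% from the running example) want to keep the penny; the variable names are only for illustration.

```python
import math

p_keep = 0.56   # assumed share who do NOT want to abolish the penny
n = 1000        # sample size from the example

# Probability that all n sampled people are penny-lovers is p_keep ** n.
# Work in base-10 logs so the tiny number stays readable.
log10_prob = n * math.log10(p_keep)
print(f"P(all {n} are penny-lovers) is about 10^{log10_prob:.0f}")
```

Under those assumptions the exponent is in the neighborhood of -252, which is the sense in which this kind of bad luck is possible in principle but can be ignored in practice.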

5) That statistic (the 40%) would vary from sample to sample. If we redid the sample of 1000 again, we might get 45%. And again and get 41%. But it varies according to a pattern which is very well understood. Imagine if you sampled over and over again (like millions of times), those percentages would dance around the true parameter percentage of 44% with some hitting right on the money and some being several points away (maybe as far away as 39% or 49%, and it is very, very unlikely to land much further from 44% than that unless our sample was tainted). Graphing all of those different percentages would reveal a bell curve (a normal distribution) with the peak at 44% and with it trailing off into the rare tails down towards 39% on the left and 49% on the right.
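
The "sample over and over" thought experiment is easy to try on a computer. The following is a minimal simulation sketch using only Python's standard library. It assumes the true parameter is 44%, that every sample is a perfect random sample of 1000, and it repeats the sample 2000 times rather than millions to keep it quick; `sample_percent` is a made-up helper name.

```python
import random
import statistics

def sample_percent(rng, n=1000, p=0.44):
    """Percent of 'abolish' answers in one simulated random sample."""
    hits = sum(rng.random() < p for _ in range(n))
    return 100 * hits / n

rng = random.Random(2016)  # fixed seed so reruns give the same numbers
percents = [sample_percent(rng) for _ in range(2000)]

print("mean of sample percents:", round(statistics.mean(percents), 2))
print("standard deviation:", round(statistics.stdev(percents), 2))
print("smallest and largest:", min(percents), max(percents))
```

Theory says the standard deviation of these percentages should be about 1.6 points (the square root of 0.44 x 0.56 / 1000, expressed as a percent), so nearly all of the simulated results should land within about 5 points of 44%. A histogram of `percents` would show the bell shape.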

6) Here is the bummer: we usually only have the time and energy to do one sample. So let's assume we got 40% for our sample statistic. Remember that the parameter was 44% but we need to pretend like we didn't know this because statisticians never know this "true" number.  So we really need to rely on that 40% as our best estimate. If someone put a gun to your head and said "Predict the true parameter" you should guess 40% because that's what the sample said. But we would not have much confidence in this result because of the sampling variability mentioned above. I'd feel much better if I could say "I think the true parameter is pretty close to 40%". In fact, if I were to give that +/- 3% wiggle room, I would report that I'm pretty confident that the parameter is somewhere between 37% and 43%. That range gives us 95% confidence that we've captured the parameter in that interval.
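
As a check on the 37% to 43% range, here is a short Python sketch of the standard normal-approximation interval for a proportion. It assumes a simple random sample of exactly 1000 and a 95% confidence level (z = 1.96); the function name `confidence_interval` is illustrative only.

```python
import math

def confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a population proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Sample statistic from the example: 40% of 1000 respondents.
low, high = confidence_interval(0.40, 1000)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

With a sample statistic of 40% the interval works out to roughly 37% to 43%, matching the +/- 3% rule of thumb.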

7) Whoops, the parameter is actually not in that interval. We got unlucky, and that happens about 5% of the time. We do a perfectly random sample, we get an estimation and give ourselves the 3% wiggle room and still we whiffed. It happens about 5% of the time. But you never know when it is going to happen--we do not know what the parameter is, remember. So we claim it is in that interval but we cannot be certain. This, by the way, is a central difference between mathematics and statistics. Mathematicians are certain (they prove things) while Statisticians wrestle with probabilities and can tell you when something is likely to be true or false.
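
The "about 5% of the time" claim can also be checked by simulation. This is a hedged sketch, not a proof: it assumes a true parameter of 44%, perfect random samples of 1000, and the normal-approximation interval, and it counts how often the interval captures the truth over 2000 simulated polls. The helper name `interval_captures` is made up for the example.

```python
import math
import random

def interval_captures(rng, n=1000, p=0.44, z=1.96):
    """Run one simulated poll; report whether its 95% interval contains p."""
    hits = sum(rng.random() < p for _ in range(n))
    p_hat = hits / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p <= p_hat + margin

rng = random.Random(44)  # fixed seed so reruns give the same numbers
trials = 2000
captured = sum(interval_captures(rng) for _ in range(trials))
coverage = captured / trials
print(f"{100 * coverage:.1f}% of intervals captured the true 44%")
```

The share of intervals that capture the parameter should come out close to 95%, which means roughly 1 poll in 20 misses even when everything was done right.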

The conclusion is: look for a random sample of at least 1000, go ahead and do your +/- margin of error, but do not assume that the "true percentage" lies in that range. We just don't know in any given sample, although our confidence grows with more samples or with larger ones.
There's also free sampling at Costco and Sam's on the weekends. 

Seriously, thank you for taking the time to write this up. You must be a great teacher.  I took Calc in college about 30 years ago and was all "dude, I'm so high. This is crazy". 

 
I personally saw this a good bit. I know people who voted for Trump holding their nose hoping for a conservative Supreme Court Judge appointment. 
Love this.  Ignore everything else you believe in because you're so focused on that one thing.

"But God has a history of using imperfect people to do his bidding."

And with that you can justify supporting anything. 

And they did.

 
You’re welcome. So you’re saying you want another long post, this time on Calculus?

eta: I smell a new sub forum.

 
Non response bias or margin of error aside, the 2016 polls had Clinton winning. I do think there is something to be said for people not admitting to voting for Trump. It's a good lesson for people that you shouldn't be concerned with the polls and make sure you go vote for the candidate you want to win.

 
And yet they were still painfully accurate as far as pop vote total.  Again, polls were pretty accurate nationally.  The state by state (to help predict electoral vote) were a little less so.  And the analysis was really off leading to bad projections and probability.

 
Completely agree.

 
