Questions on the polling data:
- Is there a secret Trump vote that's even more pronounced than currently thought? I've had conversations with some people who believe this, but anecdotes like that aren't enough to support claims of any large polling error.
- Most polls show Democrats being sampled at higher rates than Republicans. For example, I think the NBC/WSJ poll showed 45% Democrats and 36% Republicans. Why are these samples tilted to this degree? Does this suggest that pollsters believe there are more Democrats than Republicans, and sample more of them as a result? If not, what is the reasoning for this?
1.) The strongest form of the "shy Trump voter" theory -- that Trump voters are afraid to express their intentions to pollsters for fear of being judged -- seems to be wrong. If it were true, we'd expect Trump to perform better in automated/online polls in which respondents don't have to talk to a live person, but we don't actually see much of a gap between live-caller and non-live-caller polls.
That's not to say the polls can't be underestimating Trump support, though. If they are, the main reason is probably non-response bias. White people with low levels of social trust and no college degree are much less likely to answer the phone when an unknown number is calling, and that's a very pro-Trump demographic. Undersampling these people was the main source of polling error in 2016. To correct this, lots of pollsters now weight their samples by educational attainment -- this fixes some (but not all) of the problem. And even if white/non-college voters are undersampled again this year, it probably won't result in quite as much error, because Biden is doing much better with that demographic relative to Clinton.
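To make the weighting idea concrete, here's a minimal sketch of cell weighting by educational attainment. All of the population shares, sample shares, and support rates below are invented for illustration; the point is just the mechanics: each respondent's weight is their group's population share divided by its share of the sample.

```python
# Hypothetical illustration of weighting a poll sample by educational attainment.
# All shares and support rates below are made-up numbers, not real polling data.

population_share = {"college": 0.35, "non_college": 0.65}  # assumed benchmark (e.g. census)
sample_share     = {"college": 0.50, "non_college": 0.50}  # sample over-represents college grads

# Weight for each group = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical candidate support by group.
support = {"college": 0.60, "non_college": 0.40}

unweighted = sum(sample_share[g] * support[g] for g in support)
weighted   = sum(sample_share[g] * weights[g] * support[g] for g in support)
print(round(unweighted, 3), round(weighted, 3))  # weighting pulls the estimate down
```

In this toy example, up-weighting the undersampled non-college group shifts the topline estimate toward that group's preference, which is exactly the correction the education weighting is meant to provide.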
2.) Based on the General Social Survey and the American National Election Studies, two gold-standard nonpartisan surveys, there are in fact more people who identify as Democrats -- and there have been for decades. The most recent figures were D+8 (GSS) and D+7 (ANES). Different pollsters have different methods of handling partisanship. For example, when polling states that have party-based registration, NYT/Siena uses a quota. In their recently released poll of AZ, they purposefully collected a sample that was R+4 in registration.
You can't do that in national polls, though, since you don't actually register by party in 19 states. You can ask for party ID, as they did in that NBC/WSJ poll, but most pollsters don't weight by it because it's fairly volatile (and is often just a lagging indicator of voting intent). National pollsters will typically weight by more reliable variables like age, race, sex, etc., and sometimes that results in a sample with a strange partisan skew. I wouldn't worry about it much, though, because the national polls are some of the best we have. Even in 2016, the national polls were very accurate despite the state polls having some issues. Usually when folks begin to pick apart the partisan composition of a poll, it means they're trying to discredit a bad result for their preferred candidate.
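For the curious, the standard way to weight a national sample to several demographic targets at once is "raking" (iterative proportional fitting): you repeatedly adjust weights so each variable's margins match its population targets. Here's a minimal sketch; the respondents and target shares are entirely made up for illustration.

```python
# Hypothetical sketch of raking (iterative proportional fitting).
# Respondents and target margins are invented for illustration only.

respondents = [
    {"age": "18-44", "sex": "F"}, {"age": "18-44", "sex": "F"},
    {"age": "18-44", "sex": "M"},
    {"age": "45+",   "sex": "F"},
    {"age": "45+",   "sex": "M"}, {"age": "45+",   "sex": "M"},
]
targets = {
    "age": {"18-44": 0.45, "45+": 0.55},  # assumed population margins
    "sex": {"F": 0.51, "M": 0.49},
}

weights = [1.0] * len(respondents)
for _ in range(50):  # iterate until the weighted margins converge
    for var, margins in targets.items():
        total = sum(weights)
        for category, target_share in margins.items():
            idx = [i for i, r in enumerate(respondents) if r[var] == category]
            current_share = sum(weights[i] for i in idx) / total
            factor = target_share / current_share
            for i in idx:
                weights[i] *= factor

# After raking, each variable's weighted margin matches its target --
# but note that nothing here constrains partisanship directly.
total = sum(weights)
for var, margins in targets.items():
    for category, target_share in margins.items():
        got = sum(w for w, r in zip(weights, respondents) if r[var] == category) / total
        print(var, category, round(got, 3))
```

Notice that party ID never appears: the weights are pinned to demographics, so the partisan composition simply falls out of whoever answered within each demographic cell, which is why a demographically weighted sample can still show an odd-looking partisan split.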