Wednesday, October 23, 2024

The polls don't tell us what voters think. They tell us what data scientists suspect voters might think, based on data scientists' unvalidated assumptions.

From Here’s What My Gut Says About the Election. But Don’t Trust Anyone’s Gut, Even Mine. by Nate Silver.  He warns that the polling is so close that polls cannot tell us who will win, because the election is 50:50.  If the electorate really is 50:50, then more polling, and more accurate polling, will only increase our confidence that it is indeed 50:50.  It won't name a winner.
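Silver's point can be sketched with a quick simulation, assuming a true 50:50 electorate and simple random sampling (sample sizes below are illustrative): bigger samples shrink the margin of error, but the confidence interval keeps straddling 50%, so the poll never "calls" a winner.

```python
import math
import random

random.seed(0)

def poll(true_p, n):
    """Simulate one poll of n random respondents; return the observed share
    for candidate A and the 95% margin of error (normal approximation)."""
    share = sum(random.random() < true_p for _ in range(n)) / n
    moe = 1.96 * math.sqrt(share * (1 - share) / n)
    return share, moe

# With a true 50:50 electorate, more data tightens the interval around 0.500
# rather than pushing the estimate toward either candidate.
for n in (500, 5_000, 50_000):
    share, moe = poll(0.5, n)
    print(f"n={n:>6}: estimate = {share:.3f} +/- {moe:.3f}")
```

Even a 50,000-respondent poll just tells you, with great precision, that the race is a coin flip.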

A fair point.

But he makes a different point which I think is important.

Instead, the likely problem is what pollsters call nonresponse bias. It’s not that Trump voters are lying to pollsters; it’s that in 2016 and 2020, pollsters weren’t reaching enough of them.

Nonresponse bias can be a hard problem to solve. Response rates to even the best telephone polls are in the single digits — in some sense, the people who choose to respond to polls are unusual. Trump supporters often have lower civic engagement and social trust, so they can be less inclined to complete a survey from a news organization. Pollsters are attempting to correct for this problem with increasingly aggressive data-massaging techniques, like weighting by educational attainment (college-educated voters are more likely to respond to surveys) or even by how people say they voted in the past. There’s no guarantee any of this will work.

For polling to work, you need 1) a sufficiently large responding population that 2) is RANDOMLY selected.  The first requirement is rarely met and the second is never met.  You cannot compel people to respond to a poll, so straight out of the gate you are dealing with self-selection: are the people who choose to participate in a poll fully representative of those who choose not to?  Virtually never.  

Further, polls, for a variety of reasons, have historically oversampled Democrats.  Pollsters have compensated by weighting up the undersampled portion of Republican respondents and extrapolating from them, which dramatically exacerbates the error rate, especially when the overall sample size is insufficient in the first place.  
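One standard way to quantify the cost of this reweighting is Kish's effective sample size, n_eff = (Σw)² / Σw², which measures how many truly random respondents a weighted sample is worth. The 700/300 split below is a hypothetical illustration, not real polling data.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical poll of 1,000 respondents: 700 from the oversampled party,
# 300 from the undersampled one, reweighted to a 50/50 population.
weights = [50 / 70] * 700 + [50 / 30] * 300

print(f"nominal n = {len(weights)}, effective n = {effective_sample_size(weights):.0f}")
```

In this sketch the reweighted 1,000-person poll carries the statistical weight of only 840 random respondents, and the penalty grows the more lopsided the raw sample is.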

Over the past few election cycles, polls have become more and more expensive, respondents have become fewer and fewer (response rates as low as 0.4%), and sampling has become less and less random.  As much effort is now spent manipulating the data produced by the polling as is spent conducting the polling itself.  

The polls are no longer polls.  They do not tell us what a random selection of voters think.  They tell us what data scientists suspect voters might think, based on data scientists' unvalidated assumptions.  The link to reality has been completely broken.  It is all Garbage In, Garbage Out.  
