Problems with Polls

Election 2016 Problems with Current Election Analysis: Polls, What are They Good For?

Before looking closer at polls and polling, we should review some of their basics to highlight how something that is so often conducted by academics could go so wrong.

Polls provide very little transparency in how they actually determine the results. Most just release the end results, like “Candidate A 49% and Candidate B 51%”. Some release the statistics and questions used. In fact, some offer so much information that it makes your head swim. But lots of data does not equal transparency! And, without transparency, you are really not certain how they came up with their numbers.

It simply seems like polls are using a black box between data collection and output. The process looks something like the following: data collection, then a black box, then the published results.

Pollsters, of course, do not draw attention to the black box element. Some provide the actual questions asked in the polls, which is great to see. But the data is then thrown into a mysterious system that reweights, or even throws out, elements of the input.

Here are typical statements concerning methodology, taken from Rasmussen Reports:

“After the surveys are completed, the raw data is processed through a weighting program to ensure that the sample reflects the overall population in terms of age, race, gender, political party, and other factors.”

“For political surveys, census bureau data provides a starting point and a series of screening questions are used to determine likely voters. The questions involve voting history, interest in the current campaign, and likely voting intentions.”

“Rasmussen Reports determines its partisan weighting targets through a dynamic weighting system that takes into account the state’s voting history, national trends, and recent polling in a particular state or geographic area.”

I do not mean to single out Rasmussen, as they seem to be one of the best. But this stage of changing the data appears very much like a dark art. They mention the Census Bureau, and it sounds fine in general. But by the time you finish reading your 50th poll and realize that none of them showed you the original data, and none of them explained their ‘weighting process’ or how they transformed the original data into the final data, you conclude that it is a black box, or that some individuals are making judgment calls and massaging the data in ways they do not want known, or both. This is fairly scary.
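To make concrete what a "weighting program" of the kind quoted above might be doing, here is a minimal Python sketch of simple post-stratification weighting. It is not Rasmussen's method or any pollster's actual system, which is exactly what is not disclosed. The respondents, the groups "X" and "Y", the candidates "A" and "B", the target shares, and the function names are all made up for illustration.

```python
from collections import Counter

def raw_shares(respondents):
    """Unweighted share of each choice among (group, choice) respondents."""
    counts = Counter(choice for _, choice in respondents)
    n = len(respondents)
    return {choice: k / n for choice, k in counts.items()}

def poststratify(respondents, targets):
    """Reweight respondents so each group matches its target population share.

    respondents: list of (group, choice) pairs
    targets: dict mapping group -> assumed share of the population
    """
    n = len(respondents)
    sample_counts = Counter(group for group, _ in respondents)
    # Each respondent's weight = target share of the group / sample share of the group.
    weights = {g: targets[g] / (sample_counts[g] / n) for g in sample_counts}
    totals = Counter()
    for group, choice in respondents:
        totals[choice] += weights[group]
    total_weight = sum(totals.values())
    return {choice: w / total_weight for choice, w in totals.items()}

# Hypothetical raw sample: 70 respondents from group X, 30 from group Y.
SAMPLE = (
    [("X", "A")] * 42 + [("X", "B")] * 28
    + [("Y", "A")] * 9 + [("Y", "B")] * 21
)
# Hypothetical weighting target: the pollster assumes the groups are equal in size.
TARGETS = {"X": 0.5, "Y": 0.5}

raw = raw_shares(SAMPLE)
weighted = poststratify(SAMPLE, TARGETS)
print(f"raw:      A {raw['A']:.0%}, B {raw['B']:.0%}")
print(f"weighted: A {weighted['A']:.0%}, B {weighted['B']:.0%}")
```

In this invented sample the raw answers favor A, 51% to 49%, while the weighted result favors B, 55% to 45%. Nothing about the respondents changed. Only the weighting target did, which is why it matters whether the targets and the raw data are published.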

I might not say this if I had not spent a career in finance analyzing investments and investment systems. From the outside, investment analysis and trading systems look impressive, as do their creators. But once you have seen them from the inside, you realize that much of the actual analysis is done by analysts making judgment calls, sometimes good and sometimes not so good. Then there are black boxes, where data goes in and other data comes out, and only a few people really know how the box transforms that data. Black boxes are notorious for working well in historical cases and failing right when you need them most. By the way, the most common excuse for a black box failing is that the conditions were so different from the historical conditions on which it was tested, meaning “it worked great during the bull market, we never tested it on a bear market, oops!”

Assume that this election cycle really is as different as most believe – that this election will essentially re-write many of the political rules. Why would you expect a black box approach to work in this 2016 election environment when things are changing so dramatically? Well, you would not expect it to work.

I am afraid that many polls are inherently off as they are based on black boxes tested in other environments and ‘approved’ for use as long as those conditions do not vary that much. Further, I believe that post-election we will repeatedly hear this as the excuse for how polls could have been so off – ‘it wasn’t our fault, the conditions changed too dramatically’. (But everyone analyzing the election knew the conditions were changing dramatically, so this will be but an excuse of convenience.)

Here are some observations and suggestions:

For the polls that rely on experienced analysts to make the calls, you need to be more transparent. It is important to let people know what adjustments you are making and why, and who these analysts are.
For the polls that rely on black boxes, you should really tell the public what your main assumptions about the environment are, so that we know whether or not to trust your end results (are you keeping demographic-based turnout ratios the same as in the last election?).
Raw data should be provided separately from, or in conjunction with, the poll’s final results.

Finally, getting to the question that many Republicans are likely thinking – does this mean that the polls are ‘rigged’ or manipulated? No.

The polls are likely not being rigged in favor of a candidate on purpose. However, if polls depend heavily on things like relative turnout in the last election to forecast 2016 results, that will by default give Clinton an advantage. So there would appear to be a natural skew in favor of Clinton (simply due to the backward-looking nature of polls that rely on 2008 and 2012 data), but without more transparency it is difficult to pinpoint its exact degree.
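The turnout point can be illustrated with a short Python sketch. It holds each group's preferences fixed and changes only the assumed makeup of the electorate. Every number here is invented for illustration and does not come from any real poll or election. The groups "X" and "Y", candidate "A", and the function name `topline` are hypothetical.

```python
def topline(support_by_group, turnout_mix):
    """Overall share for candidate A under an assumed electorate makeup.

    support_by_group: dict mapping group -> share of that group backing A
    turnout_mix: dict mapping group -> assumed share of the electorate
    """
    total = sum(turnout_mix.values())  # normalize in case the mix does not sum to 1
    return sum(support_by_group[g] * turnout_mix[g] / total for g in turnout_mix)

# Hypothetical group preferences, identical in both scenarios.
SUPPORT_FOR_A = {"X": 0.60, "Y": 0.30}

# Scenario 1: assume the electorate looks like the previous election (made-up mix).
LAST_ELECTION_MIX = {"X": 0.50, "Y": 0.50}
# Scenario 2: assume group X turns out in much larger numbers this time (made-up mix).
CHANGED_MIX = {"X": 0.70, "Y": 0.30}

same_as_before = topline(SUPPORT_FOR_A, LAST_ELECTION_MIX)
changed = topline(SUPPORT_FOR_A, CHANGED_MIX)
print(f"A under last election's turnout mix: {same_as_before:.0%}")
print(f"A under the changed turnout mix:     {changed:.0%}")
```

With these invented figures, candidate A sits at 45% under the old turnout mix and 51% under the changed one, even though no respondent changed an answer. A poll that does not disclose which mix it assumed gives the reader no way to judge which of the two figures is being reported.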

Does this mean that people will continue to doubt the validity of polls until they start to become more transparent? Yes.

Without more transparency, there is very little reason for the public to keep placing a high level of trust in polling companies. But what else could you expect in a low-transparency sector?