Election 2016 Polling / Is Clinton a shoo-in? Clinton Landslide?
With about three weeks to go to the general election (as of the time of writing, not of publication), most have already declared Clinton the winner. The average of FiveThirtyEight, PredictWise, Daily Kos, Princeton Election Consortium, New York Times, and Huffington Post puts the probability of a Clinton win at 92%.
This is just about as high as you can go. There is always a chance, even in the most lopsided election, of an extraordinary event occurring in the last month of a hotly contested race. What that thing might be is unknowable, but certainly odds-makers will leave some cushion for such an event. In other words, barring an extraordinary event, odds-makers are pretty much saying Trump does not have a chance. Status quo means Clinton will win with many calling for a landslide.
Certainly average poll figures are screaming for a Clinton victory. And, admittedly, looking at most traditional metrics, it would be difficult to argue against such a weight of evidence. However, there are simply too many inconsistencies and odd data points being ignored for a rational analyst to be so complacent. It is my very unfortunate duty to point out where and how so many recognized political pundits and forecasting wizards are mistaken. For a more complete analysis of some of these ideas, you will (unfortunately) have to sift through other posts or contact me directly, as there is not enough space to cover these concepts in one post.
There are many ways that analysts can err. One of the easiest is to fail to consider that the underlying environment has changed and that such a change will directly impact forecasts. This is squarely where I believe we are at present. As in finance, where people are paid significant amounts of money to make ‘sophisticated’ forecasts that look extremely similar to last quarter’s earnings, political analysts have relied far too heavily on recent election results, and this is throwing off their entire process. It has influenced everything from expected turnout levels to which polls are considered most reliable.
This is a very normal error that analysts make. What is shocking is that more people have not noticed it – or at least have not pointed out that such a mistake could exist. Recall that at a 92% probability for Clinton, analysts and pundits are effectively stating that they disregard any and all questions. It has gone so far that, frankly, pointing out potential errors leaves you open to being called a conspiracy theorist or ‘in-the-tank’ for Trump. This is extremely unfortunate given that analysts should be more neutral and rational.
The 2016 election is considerably different from previous elections. This statement is frankly fairly well accepted, so it seems very odd that few if any adjustments have been made in analysis. Modifying analysis techniques when underlying conditions change is very normal. Hedge funds use different trading techniques during bull and bear markets. Retail outlets modify marketing campaigns during different economic conditions. Odds makers change probabilities for sports teams competing on their home field. Basically, when underlying conditions or environment settings change, so should the analytical process – sometimes to a minor extent, and sometimes to the point of requiring a new toolbox.
Again, these concepts cannot be covered here, but are covered in other posts. Summarizing, the electoral environment has radically changed from the last two elections and polls / analytical techniques have not changed sufficiently to incorporate such shifts – this leaves the current polls wide open to inaccurate calls.
As for how the environment has changed, a summary is provided in the following paragraph, but feel free to skip it, as much of this is common knowledge to those interested in politics or even the news.
First, the electorate is unusually interested in the election. By almost every metric, the interest levels of the voting populace are at record highs, implying potentially record-breaking turnout. Second, the level of interest varies significantly by demographic group, implying a rather substantial shift in turnout by demographic. Third, many pro-Democratic demographics hit or came very close to record-high turnout and record-high percent-voting-Democrat in 2008 or 2012. Some of these figures were so extreme that a repeat seems unlikely (reversion to the mean). Fourth, the ‘whites with no college degrees’ demographic group has shown considerable and surprising interest in this election but, due to its declining turnout over the last two elections, has been disregarded by most. A significant swing in this demographic would likely take analysts by surprise, even given fairly large pro-Trump rallies that appear to be heavily populated by this demographic. Fifth, the electorate is mad and not satisfied with either candidate. This implies that the potential swing in votes is not likely to be forecastable with such precision (certainly not enough to declare a 92% probability of victory). Sixth, the media has been in a feud with Trump since before the primaries. Similar character attacks against a candidate from the media have not been seen in modern US history. Seventh, there is strong evidence that poll respondents have been changing their answers due to a strong Republican / Trump shaming effect. The size of such a distortion appears to vary depending on the demographic group and geographic location – for analysts to ignore its existence appears fairly illogical. Eighth, average poll data show Clinton way up, but polls using an IVR / robocall data collection format show the race as being extremely close. Analysts fairly universally ignore such IVR polls as outliers, or they are content to include them in averages knowing that, because so few polls use this format, their results will be diluted to the point of being ignored. Ninth, social media points to a Trump advantage. Though in past elections such an advantage has called the winner (Obama dominated both McCain and Romney by such metrics), in 2016 analysts are mostly ignoring these metrics. Tenth, on-line activity points to a Trump advantage. As with social media, on-line activity correctly called the last two elections, but is being ignored by most analysts. Eleventh, the level of undecided voters is unusually high for this point in the election cycle. Some of the largest polls, such as the Google Consumer Survey, put undecideds at an amazing 20%, or approximately double the level seen before the 2012 election, making this a major wild card capable of throwing off almost any poll.
This partial list of significant differences in the 2016 election environment highlights the fact that tools that worked well in previous elections might need to be modified for this election – or that they should be buttressed with additional analytical tools or data. However, as it currently stands, most well-known forecasters and pundits have placed all their bets on a fairly narrow set of tools and data – namely, polls conducted by live interviewers. In a scenario where such data collection is not as foolproof as assumed, all of the major forecasts are significantly off. This is not to say that some are off by a little and we can get into a mathematical argument over minutiae. This is to say that the entire industry will get egg on its face within a month’s time.
As almost all of the current focus for forecasters is on polls, let’s take a look at how polls could be off. The biggest problem in getting people to see any weakness in polls is that they have a long track record, which in most cases is great. In almost all scenarios, political forecasters should focus on polls conducted using traditional methods. This is the standard, and almost anyone looking to fit within an orthodox system will follow a standard and tested approach.
As explained earlier, however, the 2016 election has broken the mold in such a way that it makes traditional approaches problematic. The most glaring problem is that almost all of the main polls are conducted using a live interviewer. In most elections (or in ‘normal’ elections), there would not be an issue with a live interviewer. However, when one of the candidates has been successfully branded in an exceptionally negative way and/or when the election is extremely emotional on both sides, live interviewer polls likely do not accurately reflect underlying support.
As stated elsewhere, this single post is not long enough to go into full explanations. But this effect is actually quite common in socially uncomfortable settings. Respondents to polls and surveys have been repeatedly shown to change their answers on a variety of topics, such as sexual practices, drug use, personal finances, and exercise habits, when a more socially acceptable answer is easily available though not entirely accurate. So a respondent might understate the number of cigarettes he smokes per week or overstate the amount of money she saves, in the belief that changing those answers will make them appear more socially desirable.
Other elections have not produced biases to the same degree because candidates were not branded to such a negative extent. In 2016, however, the media and opinion leaders have magnified and repeated claims that Trump is a sexist, racist, misogynist, bigot, fascist, sexual predator, and a cocaine addict and is untrustworthy, unhinged, greedy, and narcissistic. Furthermore, similar accusations have been leveled at anyone who supports Trump. So, by extension, if you support Trump you could very well be an unhinged racist sexual predator narcissist. In contrast, Clinton has many negatives to her name, but her supporters have been branded with little that resembles the negative image Trump supporters potentially have to carry.
In other words, there is very little negative social impression by saying in public that you support Clinton. Saying in public that you support Trump carries with it potentially negative branding to that individual. This is highly unusual in the US where political affiliation has not historically impacted society to such an extent.
In the current environment, unlike most previous US election cycles, people feel social pressure to support or not support certain candidates to a much greater extent than any time in living memory. This is the type of environment where people will change their responses as they feel social pressure, real or implied, to respond in a certain socially acceptable way. There is little to no difference between the electorate feeling social pressure in this charged 2016 election environment and an individual feeling social pressure to claim to smoke fewer cigarettes.
This gap in responses can be detected in a variety of ways. For the 2016 election, comparing polls by data collection method is a fairly easy and quick way to determine if there is a significant bias. More specifically, polls using live interviewers and those using IVR or robocalls should produce more or less similar results. If the gap between the results of these two polling methods is too large, then bias is likely.
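The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used to build the charts in this post: the function names `median_margin_by_method` and `mode_gap` are hypothetical, and the input values are placeholders chosen for readability rather than real poll results. Each poll is represented as a (collection method, Clinton-minus-Trump margin in percentage points) pair.

```python
from collections import defaultdict
from statistics import median


def median_margin_by_method(polls):
    """Group (method, margin) pairs by method and return each group's median margin.

    margin is the Clinton-minus-Trump lead in percentage points,
    so a negative value means Trump is ahead.
    """
    grouped = defaultdict(list)
    for method, margin in polls:
        grouped[method].append(margin)
    return {method: median(margins) for method, margins in grouped.items()}


def mode_gap(medians, live_key="live", ivr_key="ivr"):
    """Return the live-interviewer median minus the IVR median."""
    return medians[live_key] - medians[ivr_key]


# Placeholder values for illustration only; these are NOT real poll results.
example_polls = [
    ("live", 7.0), ("live", 9.0), ("live", 8.0),
    ("ivr", -2.0), ("ivr", 0.0), ("ivr", -1.0),
]

medians = median_margin_by_method(example_polls)
gap = mode_gap(medians)
print(medians, gap)
```

Medians are used instead of means so that a single extreme poll in a small cohort (there are few IVR polls) does not dominate the result. How large a gap counts as "too large" is a judgment call that this sketch does not attempt to settle.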
The following chart shows poll medians for the first two weeks of October. The fact that there is such an enormous gap should be a red flag for analysts.
Chart 1: Clinton Margin of Popular Vote Victory, Comparing Poll Cohorts using Various Methods of Data Collection, October 1, 2016 to October 14, 2016
Source: Realclearpolitics
The live polls depict a hopeless situation for Trump. Taking the median of the set, the live poll results show a Clinton lead of approximately 8 percentage points. This close to the election, such a lead would certainly make Clinton a shoo-in, assuming that this data is accurate.
The conflicting piece of evidence is that polls over the same time period from IVR / robocalls show that Trump is up. Not that Clinton will win by less of a margin – but that Trump actually is in the lead.
This is an astounding finding and one that is frankly being ignored. One explanation is that these polls are outliers so you can safely throw them out. Another is that Rasmussen and Gravis, two of the primary IVR agencies, are viewed as conservative leaning much the same way that some of the major media outlets are viewed as liberal leaning. Therefore, a Trump advantage is implicit in their results (this is the argument stated by detractors). Most analysts appear to view these as simple outliers that are potentially biased anyway and therefore not worth further analysis.
Another explanation is the one that appears the most likely but has been disregarded by almost every analyst and pundit – that in a highly charged emotional election, people feel more comfortable stating their true voting intention to a robocall than to a live interviewer. Additionally, the intensity of the social pressure has resulted in the difference being extremely large – the previous chart showing it to amount to approximately 9 percentage points. Such a level of bias might appear excessive, especially with the weight of live polls showing Clinton pulling away from Trump. Additionally, Social Desirability Bias has been shown elsewhere at much lower levels in the 2016 election. A difference of 9 percentage points or greater has been shown to exist in certain demographic groups (such as among African-American voters being pro-Clinton in live polls) and for other topics (such as Obama’s Approval Rating) – but it has not been shown to exist at such high levels for the general electorate in the US presidential race in the last six months.
The timing of the increase in the bias seems to coincide with the latest round of accusations of Trump as a sexual predator. Multiple accusations of Trump forcibly kissing or inappropriately touching females have surfaced. However, these accusations do not appear provable or disprovable in the weeks up to election-day. In other words, this topic will not be resolved before voting and has therefore become an extremely toxic form of branding.
On election-day, many will refer to Trump as a sexual predator, but such statements will be based on accusations. In the minds of many people these accusations are fact, just as those who accuse Clinton of being a criminal believe that to be fact. In reality, however, to date Trump has not been proven to be a sexual predator and Clinton has not served any jail time as a criminal. Sorry to both sides, but until proven, these political brands are ‘truthy’ but not validated, in one of the most successful mud-slinging contests in US election history.
Returning to the analysis, the bias appears to have jumped to unusual levels as these accusations of sexually predatory behavior have appeared. The difference between live interview poll margins and IVR poll margins represents the size of the bias. An amazing 9 percentage points would be enough of a bias to swing almost any close election. Interestingly, comparing the polls to 2008 and 2012 at equivalent points shows that similarities exist.
Chart 2: Democrat Margin of Popular Vote Victory, Comparing Poll Cohorts using Various Methods of Data Collection in 2016 to US Presidential Elections of 2008 and 2012, data from October 1 to October 14 of the election year
Source: Realclearpolitics
The 2016 election, when using live interview polls, appears strikingly similar to the 2008 election. This is the perfect narrative for the Democrats who see the election as a similar referendum on a changing society in terms of Identity Politics. In 2008, the first non-white was elected and in 2016 the first non-male would, in this scenario, be elected – by similar margins. In a Democrat’s mindset, this data fits perfectly.
In contrast, the IVR or robocall data supports the Republican, or perhaps better stated ‘Pro-Trump’, narrative in that it shows not only that he is ahead but that a large bias exists in the live interviewer polls. This, with very little mental exertion, supports the concept that the media has created a negative environment around Trump which has become so strong that many refuse to publicly support him due to assumed social stigma. In a Republican’s mindset, this data fits perfectly.
In this sense, the 2016 election is actually incredibly important and will leave a significant mark on the country, as the winning party will be able to further its talking points and perception of reality for the next four to eight years. As the election has been so contentious and (for the US) violent, it is doubtful the winning side will be conciliatory; it will more likely use the election victory as confirmation of its own ideas of how the world works. This goes well beyond the policy preferences of previous elections and into concepts of what the US is as a country and how it should work.
Getting back to the question of whether this election is already decided in favor of Clinton, it appears that much depends on your assumption regarding Social Desirability Bias. If you disregard such bias in polls, then an average of all polls appears to be a rational approach, and it points to a resounding Clinton victory. If you assume a bias in society, which seems likely in an extremely emotional election with a candidate branded to such a negative extent, then live interviewer polls should be seen as naturally skewed and replaced by IVR / robocall polls, which would point to an incredibly tight race being determined in the last few weeks before election-day.
My personal opinion is that this election is not decided to the degree that people assume. Placing a Clinton win at 92% at this stage, given conflicting interpretations of data and a radically changing underlying environment, simply seems like poor forecasting. This is not to say that Clinton will not win (she very well might), but placing such a high probability on it distorts reality and does a disservice to all forecasters, mathematicians, and analysts.
Independent of who wins this election, it will certainly be an enormous learning experience for all analysts and forecasters, and will be the topic of endless academic studies for years to come. And, if Clinton does not win with at least a healthy margin of victory, the slew of analysts calling for a landslide will have a lot of explaining to do.