Election 2016 Hidden Trump Supporters might be hiding in plain sight
From the Democrat side, anyone who questions polls is a bit off (except when an ‘outlier poll’ shows Trump up, in which case it is ok to disregard it, so it goes both ways). Even on Fox News, which is traditionally very conservative, many eyes have rolled at questions concerning the validity of polls. After all, it would take a considerable effort to fix so many polls. So, on the Democrat side you have an attitude of disbelief that conservatives are not listening to polls, which represent science. It is very similar to the argument used in the ongoing Global Warming debate: that Republicans are ‘anti-science’.
From the Republican side, many of the poll numbers showing Clinton up by double digits do not make sense. They see the mega rallies for Trump and read news about the Republican primary vote hitting records, so something seems off. Their answer is the hidden Trump supporter (also covered here under the term Social Desirability Bias), who they say exists but does not want to make a voting inclination public. According to this argument, come election day, these voters will come out of the woodwork and vote Trump. Further, they believe that within poll calculations there could be some sort of data manipulation or misinterpretation that distorts the data.
My inclination is to look at the actual numbers. This has led me to focus on two points: (1) Social Desirability Bias, and (2) researching actual poll data to see if ‘hidden Trump supporters’ exist and if so why hasn’t their presence been felt in the final poll results.
Comparing the poll results by method of data collection shows a considerable bias against Trump when using a live interviewer. This phenomenon has been covered in other posts as Social Desirability Bias. A similar effect occurred in Brexit as more anonymous polls had Remain and Leave as neck-and-neck whereas live interviewer polls had Remain up by around 10 percentage points just before the vote.
Social Desirability Bias aside, a point of frustration in analyzing polls further in order to pinpoint where issues occur surrounding the ‘hidden Trump supporter’ is the lack of data transparency. Many polls just show the headline figures. Other polls provide anywhere from around 5 to 20 pages of data, which is great but knowing there is considerably more data available that is not shown leaves a large hole in any analysis.
A recent GWU Battleground poll provides 461 pages. That’s right. It is a crazy amount of data. This data transparency significantly helps in determining, but does not fully resolve, how polls can vary so much and will be the main topic of this post.
Before getting into some numbers, my basic conclusion is that pollsters are not using the data they have available to determine ‘likely voters’ but are relying on their own judgment, which is likely based on historical norms, or on an approved ‘likely voter’ algorithm that is heavily weighted towards historical norms. In these cases, they likely can see the ‘hidden Trump supporter’ but end up throwing them out of the sample or weighting their responses at a lower level so that the demographic group weights reflect historical election data.
A partial result of all of this is that a surge in support for Trump from demographic groups that normally do not turn out in very large numbers is being disregarded or diluted. A Republican talking point of late has been that ‘undersampling’ of Republicans in polls is throwing off the results. Pollsters might very well not be ‘undersampling’ but simply disregarding or under-weighting certain demographic groups due to historical norms. This is not necessarily nefarious, but the end result of downplaying support for Trump is the same.
Some pollsters have pre-existing ‘likely voter’ algorithms that produce this effect without having given much thought to the impact. For instance, many polls either directly or indirectly state they use weights derived from previous elections. Others make known some of the filters such as having to have voted in one of the last X elections. Assuming many of these ‘hidden Trump supporters’ do not fit neatly into such categories they will either get rejected from the likely voter sample pool (meaning they will be ignored) or included in the sample but then as a group diluted by a weighting process (that is if they fall in a demographic group that normally has lower turnout, like ‘whites without college degrees’ which is highly supportive of Trump).
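To make the dilution mechanism concrete, here is a minimal Python sketch. Every number and group label in it is hypothetical and chosen only for illustration; none comes from the GWU Battleground poll. The weighting scheme (population share times turnout) is an assumed simplification, not any pollster's actual ‘likely voter’ model.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One demographic group. All values used below are hypothetical."""
    name: str
    pop_share: float       # share of the adult population
    hist_turnout: float    # turnout in a past election
    stated_turnout: float  # share saying "extremely likely" to vote this year
    trump_support: float   # two-way Trump share among the group's voters


def trump_share(groups, turnout_attr):
    """Topline Trump share when each group is weighted by pop_share * turnout."""
    weights = [g.pop_share * getattr(g, turnout_attr) for g in groups]
    total = sum(weights)
    return sum(w * g.trump_support for w, g in zip(weights, groups)) / total


# Hypothetical electorate: two equally sized groups.
groups = [
    Group("low historical turnout, leans Trump", 0.5, 0.5, 0.7, 0.6),
    Group("high historical turnout, leans Clinton", 0.5, 0.7, 0.7, 0.4),
]

historical = trump_share(groups, "hist_turnout")   # weights follow past turnout
stated = trump_share(groups, "stated_turnout")     # weights follow stated intent

print(f"Trump share, historical turnout weights: {historical:.1%}")
print(f"Trump share, stated-intention weights:   {stated:.1%}")
```

With these made-up inputs, weighting on historical turnout gives Trump about 48.3% of the two-way vote, while weighting on stated intention gives 50.0%. The respondents are identical in both cases; only the assumed turnout model moves the topline.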
The very nice thing about this explanation is that it shows that neither the Democrats nor the Republicans are in fact crazy and that both have valid points. In this scenario, which is supported by the GWU Battleground poll data, analysts are (apparently) inadvertently disregarding such ‘hidden Trump supporters’. In other words, these supporters are implied to exist in the polling data but their full impact on the poll results is not being felt as they are (apparently) being disregarded or diluted to a significant degree.
Repeating to be clear, it seems like pollsters know of the ‘hidden Trump supporter’ but are relying too heavily on historic norms so these supporters are not being counted on a one-to-one basis with other demographic groups or are simply not making the cut due to previous low probability voting behavior.
This does not impact Social Desirability Bias analysis. The fact that polls using different data collection methods show such starkly different conclusions shows a bias that would not be impacted by ‘likely voter’ filters or demographic weights, unless of course the polls that happen to use more anonymous data collection also happen to use non-historically based weights.
If you have not picked up on it yet, polls tend to be very transparent on the front and back ends but not in the middle. So, they will provide good transparency on questions asked, number of respondents, method of data collection, etc. They will also provide the results of the study, which include not only the final percentages but also things like margin of error. However, the middle portion, how they determine likely voters and the weights of different groups, is not really transparent. Historically, this has not been that important, as relative interest levels and voter turnout from many of the main demographic groups have shifted somewhat but remained fairly stable.
The 2016 election is different from historical elections in a variety of ways (covered in other posts). One of the main differences is that there appears to be an unusually large shift in voter enthusiasm towards large demographic groups that have historically had lower turnout and away from smaller demographic groups that have historically posted strong turnout. Further, the groups on the rise are heavily supporting Trump and those on decline heavily support Clinton. You can see how, if you do not create the proper ‘likely voter’ filter or demographic weights, the end result of a poll could be significantly off.
This phenomenon appears larger than anything the US has experienced in at least 100 years. However, in 2008 a similar effect occurred, but on a smaller scale. In that election, and to a lesser extent in 2012, the turnout of many minority groups surged whereas the turnout of whites without college degrees declined – producing a very strong pro-Obama turnout. In 2016, the reverse appears to be occurring but the size of the spike seems to be much higher and within a larger demographic group. It is a little surprising that more pollsters and analysts are not discussing this potential impact as 2008 voter turnout shifts were well covered.
In another post, we discussed how in 2008 many of the ‘likely voter’ filter questions turned out to be invalid. In that case, questions regarding historical voting behavior or demographic group membership were less successful in determining actual votes than ‘interest’ in the election and ‘enthusiasm’ for the candidate. In 2008, Obama’s supporters clearly led on these metrics but were mostly under-estimated. A very similar issue appears to exist in 2016, but on a far greater scale. The fact that so many are turning a blind eye to this phenomenon when a similar case was recently studied just seems unconscionable.
Now, getting to the data from the GWU Battleground poll, some highlights:
All of these point towards an advantage for Trump.
Clinton essentially needs women to continue to post higher turnout than men, for whites without college degrees to maintain their historically low turnout, and for African-Americans to continue at high turnout levels. Clinton essentially needs historical voting patterns to continue through the 2016 election. But, the GWU Battleground poll data shows that these trends will likely invert. And, it is not just this poll but many others that point to similar conclusions.
Pollsters, however, can manipulate the data as it goes through the pipeline to conform to historical norms. If they were to weight responses on interest and expected turnout levels, the ‘hidden Trump supporters’ would be revealed. As long as pollsters disregard and dilute such supporters due to historical turnout models or historical voting behavior, these supporters will remain hidden in plain sight.
Before going further, we should note that women and African-Americans have a very strong historical inclination to vote Democrat, so any relative decrease in their turnout would hurt Clinton. Also, ‘whites without college degrees’ are polling extremely strongly for Trump, so any increase in their turnout will directly hurt Clinton as well.
First, let’s look at how women compare to men in terms of likelihood to vote.
Chart 1: ‘Extremely Likely to Vote’ by Race and Sex
Source: GWU Battleground poll
Historically, you can see that women in every major demographic voted at higher rates than their male counterparts. This has been a fairly consistent trend over the last generation. However, when comparing this data to likelihood of voting in 2016, men look more interested in this particular election. In particular, white males who generally vote in favor of Republicans are shown leap-frogging from third place in 2012 to the highest turnout level in 2016e. Hispanic men also jump considerably in 2016e, to be higher than Hispanic women.
This surge in interest from men in 2016 does not appear to be fully incorporated into the weighting models used by polls. Although such findings of higher interest expressed by men are shown in other poll data, it seems that analysts have preferred to use historical turnout weights rather than 2016 interest levels.
Another interesting point is that African-American interest seems to have declined in 2016. This point again is confirmed in other polls. This makes sense as this demographic group turned out in very high numbers for Obama, breaking records, but to-date they have not shown the same kind of enthusiasm for Clinton. But again, such a decline in voter likelihood, though showing up in initial polling, does not appear to be making it through to the final results.
Further comparing the numbers we can see just how much the potential swing can be between elections.
Table 1: Comparing Voter Turnout by Race and Sex, Women Turnout minus Men Turnout by Black, Hispanic, and White Demographic Group, 2012 Actual versus 2016e GWU Battleground Poll “Extremely Likely” to Vote
| | 2012 Actual Turnout | 2016e GWU Turnout |
| --- | --- | --- |
| Black Women – Men Turnout | 9% | 11% |
| Hispanic Women – Men Turnout | 4% | -4% |
| White Women – Men Turnout | 3% | -3% |
Source: GWU Battleground poll
Using the same data from Chart 1, Table 1 shows that in 2016e men could actually post higher voter turnout levels than women for both Whites and Hispanics. This would be a terrible blow to Clinton, who has presumably targeted women as her key demographic.
In terms of finding the ‘hidden Trump supporter’, the fact that men are showing extremely high interest levels in 2016, much more so than in previous elections, is a good starting point. Men historically vote at much higher rates for Republicans and have polled very strongly for Trump. If their aggregate inclination to vote Trump is being diluted due to historical voting patterns, then this could certainly go a long way toward uncovering this hidden supporter.
We can look at the same data by seeing how the actual results in 2012 compare to the 2016e data.
Table 2: Comparison of Voter Turnout and ‘Extremely Likely’ Voters by Race and Sex, Difference between 2016e GWU Battleground and 2012 actual
| | Voter Turnout, Difference between 2016e – 2012a |
| --- | --- |
| Hispanic Men | 15% |
| Hispanic Women | 7% |
| Black Men | -4% |
| White Men | 15% |
| White Women | 9% |
| Black Women | -2% |
Source: GWU Battleground Poll
Looking at the data this way, we can see just how large the spike could be for White and Hispanic males. Interestingly, the increases for White and Hispanic women are also close to each other and roughly proportional to the increases for men, which tends to confirm the data.
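As a quick arithmetic cross-check, the figures in Table 1 and Table 2 can be tested against each other. The sketch below assumes both tables were derived from the same underlying turnout numbers, in which case the change in the women-minus-men gap from 2012 to 2016e should equal the women's change minus the men's change. The variable and function names are my own.

```python
# Table 2: 2016e "extremely likely" minus 2012 actual turnout, in percentage points.
change = {
    ("Hispanic", "Men"): 15, ("Hispanic", "Women"): 7,
    ("Black", "Men"): -4, ("Black", "Women"): -2,
    ("White", "Men"): 15, ("White", "Women"): 9,
}

# Table 1: women-minus-men turnout gap as (2012 actual, 2016e), in percentage points.
gap = {"Black": (9, 11), "Hispanic": (4, -4), "White": (3, -3)}


def gap_shift_from_changes(race):
    """Shift in the women-minus-men gap implied by the Table 2 changes."""
    return change[(race, "Women")] - change[(race, "Men")]


def tables_agree(race):
    """True if the Table 1 gap shift matches the shift implied by Table 2."""
    gap_2012, gap_2016e = gap[race]
    return gap_2016e - gap_2012 == gap_shift_from_changes(race)


for race in gap:
    print(race, gap_shift_from_changes(race), tables_agree(race))
```

All three groups pass the check: the gap shifts by +2 points for Black voters, -8 for Hispanic voters, and -6 for White voters in both tables, so the two presentations are internally consistent.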
Turning to the key swing demographic of ‘whites without college degrees’, it looks like this group that historically has posted low and declining turnout is experiencing a huge uptick in interest in the 2016 election.
Chart 2: Comparing Actual 2012 Turnout and GWU Battleground Poll “Extremely Likely” Voter Percentages for Whites with College Degrees and Whites without College Degrees
Source: Center for Immigration Studies, GWU Battleground Poll
In a normal election year, the demographic group of whites with college degrees posts significantly higher turnout than the whites without college degrees group. In 2016e, this should continue but the margin of difference is expected to shrink substantially. The group whites with college degrees currently polls just slightly above its actual turnout level of 2012. This is about where you might expect it, as the turnout from 2012 for this group was very high at 79%; it would be difficult for it to go much higher. Additionally, a variety of ‘interest’ indicators have pointed to the white demographic following this election more closely than in the past, so you could expect an uptick.
The main reason for the gap closing is that whites without college degrees are showing substantial interest in this election, much more so than in the past. Comparing the actual 2012 turnout and the “extremely likely” responses in the GWU poll shows that whites without college degrees could jump by an astounding 12 percentage points. Assuming this demographic turns out to such a degree, it would break anything seen in the last generation for this group and it would also be a shock to almost all of the turnout models which overwhelmingly appear to reflect historical norms and not 2016 expressions of interest or enthusiasm.
Next, we can look at breakdown of interest by political groups. The following chart shows the main political divisions by sex.
Chart 3: Comparing “Extremely Likely” Voting Responses by Major Party and Sex
Source: GWU Battleground
This breakdown of the data also shows a strong inclination for likely voters to support Trump. We can clearly see that very high percentages of both sexes of GOP voters declare themselves extremely likely to vote this year. What is surprising is just how much lower the Democrat intentions are – for instance, female Democrat voters are a full 9 percentage points lower than female GOP voters. Also, the female Democrat voters post a lower intention to vote than their male counterparts. According to common wisdom, this was supposed to be the year that energized female voters would vote disproportionately for Clinton. However, these figures show that the most energized group is female GOP voters, not pro-Clinton voters.
Additionally, the swing voters or the Independent voters tend to lean Trump from a demographics standpoint. Again, normally female voters lean more Democrat and male voters lean Republican, so the fact that male Independents declare a higher likelihood of voting shows another slight negative for Clinton.
Looking at this data, the weight of evidence is that the interest in the election or the intention to vote in the election clearly resides with Trump. From a pure demographics standpoint, the groups that mostly support him show higher likelihood of voting. Additionally, the partisan divide strongly favors GOP voters. Female voters, a key or the key voting bloc for Clinton, appear to show slightly less interest in the election than male voters, especially on the Democrat and Independent side, which is a major strike against her. African-American voters’ interest has also declined, which hurts Clinton as this group tends to vote lopsidedly in favor of Democrats.
In short, the GWU Battleground data shows that there is a very strong advantage for Trump in terms of the breakdown of those “extremely likely” to vote. You would expect that this would translate into strong final poll results for Trump. However, the GWU poll has Clinton up by 8 percentage points! You might have to reread this entire post to really understand the importance of these last few sentences. After looking at the data a variety of ways, we saw that Trump-supporting demographic groups and partisan groups all have higher likelihood of voting, yet the final poll results forecast that Clinton will essentially run away with the election.
This is where you can uncover many of the ‘hidden Trump supporters’: in the gap between the obviously high interest-in-voting responses of Trump-supportive demographics and the obviously Clinton-supportive final poll results.
Pollsters can see them in the initial data – especially those polls that ask questions regarding interest, intention to vote, enthusiasm or similar. These hidden supporters show up in these results, at least they are assumed to as the raw data is almost always non-transparent to the public. The data strongly suggests through lopsided interest responses that such hidden supporters are in the polls. For instance, assuming that interest levels are very much in favor of Trump, as they have been shown to be in GWU Battleground and other polls, then certainly the hidden supporters would be visible.
These 461 pages of the GWU data shed a lot of light on the situation. But, it should be noted that all of the data provided is already marked “Weighted Table”. In order to quantify the size of the ‘hidden Trump supporter’ cohort the raw data would need to be made available.