Election Analytics Blog

Dominic Skinnion, Harvard College

Final Prediction

November 1, 2020


Background

This election will be one of the most important in modern history.

With only two days until the 2020 Presidential Election, both fears and hopes are heightening across America. Will Donald Trump remain in office, or will Joe Biden be able to defeat an incumbent president? In what has been deemed a “Referendum on President Trump” by the Wall Street Journal and others, this election has been framed around Trump as a president rather than around specific policies. Even with hopes of swaying voters by attacking Donald Trump’s COVID-19 response, furthering of racism, and denial of the effects of climate change, Biden’s main argument is that Donald Trump is simply unfit to be President of the United States. President Trump, on the other hand, has attempted to show voters that he is “Making America Great Again,” and he says that to vote for Biden would be to elect a corrupt, life-long politician of 47 years.

Unprecedented times mean bad news for trusted models.

It would be a mistake to claim that this election is like any other in the history of the United States. We are at the crossroads of multiple crises: COVID-19, a racial reckoning, and the threat of an impending economic depression. While each of these is certain to shape the outcome of the 2020 Presidential Election, such unprecedented events make predicting this election more difficult than ever. With record lows across the board in economic variables, many fundamentals-based models are no longer reliable in 2020. Polling, too, has been called into question after 2016, when the polls pointed to a near-certain win for Clinton. Even forecasters whose economy-based models have performed well in the past are turning away from such variables in 2020, including Abramowitz, whose Time For Change model predicted a Trump victory in 2016. In 2020, Abramowitz opted to predict the election from net approval rather than rely on grossly extrapolated predictions from 2nd Quarter GDP Growth.

Despite the 2016 polls, we can remain (cautiously) confident in 2020 polls.

Because of the uncertainty associated with extrapolating fundamentals, I have taken Nate Silver’s advice: rely on polling close to the election. Silver maintains that polls were not truly wrong in 2016, and he says that polls today would have to be even worse than those in 2016 to give Trump a victory. For this reason, I remain confident in using polls in my model.

Biden is predicted to win the National Popular Vote.

First, I wanted to predict the Republican Two-Party Popular Vote Share. I used the following variables in these models:

I chose these variables because I believe they do the best job of capturing public opinion about President Trump, which is very much in line with the “Trump Referendum” speculation. One variable that I considered including was incumbency, but because the sample size is already small, adding another variable risked overfitting the data. Incumbency also may not play a huge role in this election, due to President Trump’s anti-establishment approach and a campaign message that is not very incumbent-like.

Because there have been relatively few election years for which this data is available, I relied on Leave One Out Cross Validation (LOOCV) to evaluate a few different linear models. The first model predicted the Republican Two-Party Popular Vote Share from just the Republican Weighted Poll Average. The second model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average and the Last Republican Two-Party Popular Vote Share. The third model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average, the Last Republican Two-Party Popular Vote Share, and the White Percent of Population.
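As a rough illustration of how LOOCV compares linear models like these, the sketch below refits an ordinary least squares model once per observation, each time holding that observation out, and reports the root-mean-squared error of the held-out predictions. The function name `loocv_rmse` and the data are hypothetical: the numbers are synthetic, generated only to make the example run, and are not the election data used in this post.

```python
import numpy as np

def loocv_rmse(X, y):
    """Leave-one-out RMSE for an OLS fit with an intercept."""
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column; X may be one predictor (1-D) or several (2-D).
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # every row except the held-out one
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append(y[i] - X[i] @ beta)       # error on the held-out row
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic stand-in data (NOT real election results).
rng = np.random.default_rng(0)
poll = rng.uniform(0.40, 0.60, size=12)   # stand-in for the weighted poll average
last = rng.uniform(0.40, 0.60, size=12)   # stand-in for the last two-party share
share = 0.05 + 0.1 * last + 0.8 * poll + rng.normal(0, 0.02, size=12)

rmse_poll_only = loocv_rmse(poll, share)
rmse_poll_and_last = loocv_rmse(np.column_stack([poll, last]), share)
print(f"poll only:   {rmse_poll_only:.4f}")
print(f"poll + last: {rmse_poll_and_last:.4f}")
```

The model with the lower out-of-sample RMSE generalizes better to the held-out years, which is one of the criteria used to pick among the candidate models.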

The results of the models are summarized below:

R Natl Models Table

We can see that the second model has the best in-sample fit, with an R-squared and adjusted R-squared higher than the other models. In addition, the second model had an out-of-sample (OOS) Root-Mean-Squared-Error (RMSE) of 0.0246, meaning that data points not included in the training set were typically about 2.5 percentage points off from the model’s predicted value, which is not bad. Also, while the third model had a smaller OOS RMSE, its predictors were not statistically significant (as shown in the table above). For these reasons, I opted to use the second model for my National Popular Vote Prediction.

Because of the relatively small OOS RMSE (it was only about 0.5 percentage points higher than the in-sample standard error), I retrained the model on the entire dataset, rather than leaving one out. The final National Two-Party Popular Vote Model was:

Rep. Two-Party PV = 0.072 + 0.122 * Last Rep. Two-Party PV + 0.736 * Rep. Weighted Poll Avg.

These coefficients mean that for every 1 percentage point increase in Last Republican Two-Party PV, the model predicts that the Republican Two-Party PV this election will increase by 0.122 percentage points, and for every 1 percentage point increase in Republican Weighted Poll Average, the model predicts that the Republican Two-Party PV this election will increase by 0.736 percentage points.
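To make the coefficient interpretation concrete, here is a minimal sketch that evaluates the fitted national equation. The function name and the input values are hypothetical, chosen only for illustration; they are not the 2020 inputs.

```python
def predict_national_rep_share(last_rep_share, rep_poll_avg):
    """Republican two-party share from the fitted national model (proportions in, proportion out)."""
    return 0.072 + 0.122 * last_rep_share + 0.736 * rep_poll_avg

# Hypothetical inputs, for illustration only.
base = predict_national_rep_share(last_rep_share=0.50, rep_poll_avg=0.47)

# Raising the poll average by one percentage point (0.01) moves the
# prediction by 0.736 * 0.01 = 0.00736, i.e. 0.736 percentage points.
bumped = predict_national_rep_share(last_rep_share=0.50, rep_poll_avg=0.48)

print(round(base, 5))           # 0.47892
print(round(bumped - base, 5))  # 0.00736
```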

Using the data from 2020:

Our model’s prediction: Trump receives a Two-Party Vote Share of 0.464, with a 95% Confidence Interval of (0.445 – 0.482). Biden receives a Two-Party Vote Share of 0.536, with a 95% Confidence Interval of (0.518 – 0.555).

This interval is shown below as a distribution of simulated election outcomes from the Republican National PV Model:

R Natl Model Simulation

We can see that there is almost no chance that President Trump wins the Popular Vote. However, this does not mean that he cannot win the Electoral Vote: in 2016 he lost the Popular Vote but still won the Electoral College. Because of this (very real) possibility, we turn to state-level predictions.
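One way to sanity-check that near-zero probability is to simulate from the stated point estimate and interval. The sketch below is an assumption on my part rather than the exact simulation behind the figure: it treats the prediction as normally distributed and backs the standard deviation out of the 95% Confidence Interval reported above.

```python
import numpy as np

point = 0.464                    # predicted Trump two-party share (from the post)
ci_low, ci_high = 0.445, 0.482   # reported 95% Confidence Interval

# Assumption: the interval is roughly point +/- 1.96 standard deviations.
sd = (ci_high - ci_low) / (2 * 1.96)

rng = np.random.default_rng(2020)
draws = rng.normal(point, sd, size=100_000)

# Share of simulated elections in which Trump wins the two-party popular vote.
trump_wins_pv = float(np.mean(draws > 0.5))
print(f"sd = {sd:.4f}, P(Trump wins PV) = {trump_wins_pv:.5f}")
```

Under this normal approximation, a 50% share sits nearly four standard deviations above the point estimate, so only a tiny fraction of the simulated outcomes cross it.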

State Level Predictions

Joe Biden is predicted to win the Electoral Vote.

For these linear models, I again used the same three predictors as described above, but because of the increased sample size (as a result of having 50 state elections each time there is a national election), I decided to include incumbency as another variable. I called this variable R_inc; it had a value of 0 if there was no Republican Incumbent President that year, and 1 if there was. Note that all of these variables are measured at the state level, not the national level.

Again, I used LOOCV in order to select the best model. The first model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average and the Last Republican Two-Party Popular Vote Share. The second model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average, the Last Republican Two-Party Popular Vote Share, and Republican Incumbent. The third model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average, the Last Republican Two-Party Popular Vote Share, Republican Incumbent, and White Percent of Population. The fourth model predicted the Republican Two-Party Popular Vote Share from the Republican Weighted Poll Average, the Last Republican Two-Party Popular Vote Share, and White Percent of Population.

The results of the models are summarized below:

R State Models Table

We can see that the third model has the best in-sample fit, with an R-squared and adjusted R-squared higher than the other models. In addition, the third model had an out-of-sample (OOS) Root-Mean-Squared-Error (RMSE) of 0.0226, meaning that data points not included in the training set were typically about 2.25 percentage points off from the model’s predicted value, which is again not bad. This was the lowest OOS RMSE of the models. It is also interesting to note that Republican Incumbency was not statistically significant in this model. Nonetheless, I decided to keep it because the model that included it performed better both in cross-validation and in-sample. For these reasons, I opted to use the third model for my State Vote Predictions.

I then randomly split the data 75%-25% into training and test sets to fit the final model. This gave us the following model:

Rep. Two-Party PV = -0.063 + 0.157 * Last Rep. Two-Party PV + 0.929 * Rep. Weighted Poll Avg. - 0.003 * Rep. Incumbent + 0.030 * White Percent of Population

for each of the states.

These coefficients mean that for every 1 percentage point increase in Last Republican Two-Party PV, the model predicts a 0.157 percentage point increase in Republican Two-Party PV; for every 1 percentage point increase in Republican Weighted Poll Avg., a 0.929 percentage point increase; and for every 1 percentage point increase in White Percent of Population, a 0.03 percentage point increase. If the Republican candidate is an incumbent, the model predicts a decrease of 0.3 percentage points in Republican Two-Party PV. (This last effect is 100 times its coefficient, unlike the others, because the incumbency indicator moves by a full unit, from 0 to 1, whereas the other variables are all measured as percentages and their effects are quoted per 1 percentage point change.)
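The difference between the continuous predictors and the incumbency indicator is easier to see by evaluating the equation directly. This is a minimal sketch: the function name and the input values are hypothetical, and it assumes all shares, including the White share of the population, are entered as proportions between 0 and 1.

```python
def predict_state_rep_share(last_rep_share, rep_poll_avg, rep_incumbent, white_share):
    """Republican two-party share for one state from the fitted state-level model.

    Assumes shares are proportions in [0, 1]; rep_incumbent is 0 or 1.
    """
    return (-0.063
            + 0.157 * last_rep_share
            + 0.929 * rep_poll_avg
            - 0.003 * rep_incumbent
            + 0.030 * white_share)

# Hypothetical state, for illustration only.
inputs = dict(last_rep_share=0.50, rep_poll_avg=0.48, white_share=0.70)

no_incumbent = predict_state_rep_share(rep_incumbent=0, **inputs)
incumbent = predict_state_rep_share(rep_incumbent=1, **inputs)

# Flipping the 0/1 indicator changes the prediction by the full coefficient:
# -0.003, i.e. a 0.3 percentage point decrease.
print(round(incumbent - no_incumbent, 4))   # -0.003

# A one percentage point (0.01) rise in the poll average changes it by
# 0.929 * 0.01 = 0.00929, i.e. 0.929 percentage points.
polls_up = predict_state_rep_share(0.50, 0.49, 1, 0.70)
print(round(polls_up - incumbent, 5))       # 0.00929
```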

Using this model to predict the 2020 State results, we get the following prediction map:

R State Model Map

Where states are categorized as:

We can see that many states that voted for Trump in 2016, like Wisconsin, Michigan, and Pennsylvania, are now categorized as Lean Biden. The Toss-Up states are now those that usually lean Republican, like Georgia, Texas, and Arizona. Please note that Maine and Nebraska are treated as giving all of their Electoral Votes to the statewide winner, whereas in reality they allocate some of their Electoral Votes by congressional district.

This gives us the following Electoral College Break-Down:

R State Model EV Bar

As we can see, Biden will win if he does indeed win all of the Solid Biden and Lean Biden states.

Our model’s prediction: Trump receives 188 Electoral Votes, with a 95% Confidence Interval of (150 – 204), and Biden receives 350 Electoral Votes, with a 95% Confidence Interval of (334 – 388).

However, I felt it was important to introduce slightly more uncertainty into the model, especially given the COVID-19 pandemic, potentially record voter turnout, and the possibility of contested elections. As such, I manually constructed a 95% Confidence Interval using Margin of Error = 2 * OOS RMSE ≈ 4.5 percentage points, giving us larger intervals. This resulted in the following prediction: Trump receives 188 (114 – 325) Electoral Votes and Biden receives 350 (213 – 424) Electoral Votes (95% Confidence Intervals shown in parentheses).

In both cases, Biden is still predicted to win, but with more uncertainty introduced, the interval now ranges from a narrow loss for him to a landslide victory.
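The wider interval can be sketched in code. This is one plausible reading, not necessarily the exact procedure used here: shift every state's predicted Republican share down or up by the margin of error and re-tally the Electoral Votes at each extreme. The three states and their numbers are hypothetical toy values, and `rep_electoral_votes` is an illustrative helper.

```python
OOS_RMSE = 0.0226              # state model's out-of-sample RMSE (from the post)
MARGIN = 2 * OOS_RMSE          # 0.0452, roughly 4.5 percentage points

def rep_electoral_votes(pred_shares, electoral_votes, shift=0.0):
    """Total Electoral Votes in states where the shifted Republican share exceeds 50%."""
    return sum(ev for state, ev in electoral_votes.items()
               if pred_shares[state] + shift > 0.5)

# Toy example with made-up states (NOT the model's 2020 predictions).
toy_shares = {"A": 0.53, "B": 0.48, "C": 0.40}
toy_ev = {"A": 10, "B": 20, "C": 30}

point = rep_electoral_votes(toy_shares, toy_ev)            # only A is above 50%
low = rep_electoral_votes(toy_shares, toy_ev, -MARGIN)     # A drops below 50%
high = rep_electoral_votes(toy_shares, toy_ev, +MARGIN)    # B rises above 50%

print(point, low, high)   # 10 0 30
```

Shifting all states together in the same direction is a deliberately conservative choice: it assumes state errors are perfectly correlated, which widens the Electoral Vote interval compared with treating each state's error independently.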

Final Thoughts

I would truly be surprised if President Trump won re-election. However, it is not impossible, especially if polls are wrong yet again. If the polls are correct, however, and Biden is truly favored to win back the Midwestern states won by Trump in 2016, Trump will have a hard time winning at all.

But with the possibility of contested elections due to mail-in voting, and the chance that votes may be thrown out entirely, the results of the election may be called into question by either side of the aisle, and many Americans fear that there will not be peace after the election.