On the Predictive Value of Election Models

Election models dominate political coverage, but the discourse around them often misunderstands their function and their predictive value.

As the election draws closer, more and more of our political coverage revolves around election models. This can hardly come as a surprise: everyone loves a good horse race, and this election is certainly shaping up to be one. But it is not a particularly productive way to frame the election. Worse yet, the coverage (and some of the election models themselves) seems to fundamentally misunderstand the function, value, and criticisms of quantitative election modeling, which I hope to address here.

Much of the recent online discourse stems from a Politico piece by political scientist and Stanford professor Justin Grimmer, which summarizes for a lay audience a research paper he co-wrote on the reliability of election forecasts. The paper argues that it is functionally impossible to gauge whether an election model is (a) statistically significantly better than random guessing, or (b) statistically significantly better or worse than any other election model, because there is not a large enough sample of presidential elections to tell whether a particular model performs better due to superior construction or random chance. I previously likened this reasoning to one baseball player batting .200 (replacement-level) and another batting .250 (above-average) being nearly indistinguishable to a viewer, or to the difference between a safe state and a tossup coming down to roughly one voter in twenty: in each case, it takes a fairly large sample to discern a meaningful difference between two observations.
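The sample-size intuition can be sketched numerically. Below is a toy Monte Carlo (all accuracy figures are hypothetical, not drawn from the paper): it asks how many election cycles must pass before a forecaster with a 75% hit rate reliably out-predicts a coin-flipping pundit, and how that changes when each cycle contributes dozens of Congressional calls instead of a single presidential one.

```python
import random

def elections_to_distinguish(p_good=0.75, p_base=0.5, n_races=1,
                             trials=2000, threshold=0.95, seed=0):
    """Toy Monte Carlo: smallest number of election cycles after which the
    better forecaster strictly out-predicts the baseline in at least
    `threshold` of simulated runs. Each cycle contributes `n_races` calls."""
    rng = random.Random(seed)
    for cycles in range(1, 200):
        calls = cycles * n_races
        wins = 0
        for _ in range(trials):
            good = sum(rng.random() < p_good for _ in range(calls))
            base = sum(rng.random() < p_base for _ in range(calls))
            if good > base:
                wins += 1
        if wins / trials >= threshold:
            return cycles
    return None  # not distinguishable within 200 cycles

# One presidential call per cycle: distinguishing the two takes decades
# of elections. Fifty races per cycle: it takes almost no time at all.
slow = elections_to_distinguish(n_races=1)
fast = elections_to_distinguish(n_races=50)
print(slow, fast)
```

Under these made-up numbers, adding even a modest slate of Congressional races per cycle collapses the waiting time from decades of presidential elections to a cycle or two, which is the crux of the disagreement with the paper's presidential-only framing.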

I agree directionally with the paper, but I think it misjudges the scope of the problem; I also do not think the sensationalized "Don't Trust the Election Forecasts" headline of the Politico piece did Grimmer's arguments any favors. While the paper goes to some lengths to grant generous assumptions in the forecasters' favor, it suggests that, say, a forecaster's Congressional election model occurs in too different a setting to be judged alongside their presidential models. Adding House and Senate forecasts would greatly decrease the expected time needed to judge a modeler's validity: not only do Congressional races occur twice as frequently as presidential elections, there are also far more races per cycle to compare against. I disagree that Congressional and presidential races occur in settings too distinct to compare. In many cases those models are built from the same sets of data, interpolated in similar ways, and a Congressional forecast arguably faces more numerous and varied challenges than a presidential model does. The paper also assumes that a pundit makes election calls essentially at random, assigning a hypothetical 50% accuracy, an assumption I take issue with. Qualitative forecasters are notoriously fickle and arguably make difficult predictions with less accuracy; notably, the Crystal Ball's 2022 Senate forecast appeared to rest on an internal decision to forecast 51 Republican seats, leading to an unusual last-minute ratings switch based on the prediction of a separate, Nevada-specific qualitative forecaster. Some qualitative forecasters simply refuse to rate difficult or unusual races at all, which hardly speaks well of their forecasting abilities.

My biggest criticism of the paper, however, lies in how it simulates the differences between election models. Looking solely at the 2020 election, it assigns some comically large times (in the millennia!) for election models to be meaningfully distinguished from one another. Despite being a fairly close election, 2020 featured a fairly static electoral map with a relatively small handful of competitive states. The same is true of the 2024 map, but not of how the 2016 election was perceived throughout the cycle, and certainly not of cycles like 2008. Such an assumption could be justified (with increasing political polarization, fewer states seem to be contested), but it cannot be validated, and it is a poor one to make when there are other ways to judge a forecast's accuracy, including voteshare and margin. Using those measures, or incorporating different models (such as those from previous election cycles, or the 2020 Congressional models, where variance was higher), would likely have reduced the time needed to distinguish forecasts from each other substantially.
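A minimal sketch of why a static map compresses model differences, using made-up win probabilities rather than any real forecast: when most states are safe and two models disagree only in a couple of competitive ones, a probability score over binary outcomes (here the Brier score) barely separates them, which is exactly where voteshare and margin comparisons would add information.

```python
def brier(probs, outcomes):
    """Mean squared error between win probabilities and 0/1 outcomes;
    lower is better."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical 10-state map: 8 safe states where both models agree,
# 2 competitive states (the last two) where they meaningfully disagree.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_a  = [0.95, 0.97, 0.93, 0.96, 0.04, 0.05, 0.03, 0.06, 0.65, 0.45]
model_b  = [0.95, 0.97, 0.93, 0.96, 0.04, 0.05, 0.03, 0.06, 0.55, 0.52]

score_a = brier(model_a, outcomes)
score_b = brier(model_b, outcomes)
print(score_a, score_b)
```

The eight safe states dominate the average, so the two scores land within a couple of hundredths of each other despite a real disagreement in the contested states; a score on predicted margins would not wash that disagreement out.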

With that said, I think the paper makes a fairly convincing argument for lessening the focus on election forecasts, and I certainly don't think most criticism of Grimmer's article, the paper, or election forecasts comes from a statistics-forward perspective. I suspect most political observers saw the fairly sensationalized headline and evaluated the claim based on their priors. Notably, neither the paper nor the article attempts to discuss model methodology or transparency, because those are orthogonal to their analysis of election models, yet much of the criticism of election models stemming from the article addresses these aspects of modeling instead.

Hypercharged by the 'model swap' and the lack of clarity around the post-DNC FiveThirtyEight presidential election model (created by G. Elliott Morris, not Nate Silver as in years past), judgment of election models has increasingly focused on their methodology rather than their ability to model elections, and that gets the value system backwards: there is no value in a well-designed election model that fails at the task it was built to do! While Morris's initial FiveThirtyEight election model was very precisely built and made its academic assumptions and modeling decisions clear, it ultimately failed to capture the electoral environment it attempted to model. While it is impossible to validate the accuracy of election models prior to the election, it seems that the current FiveThirtyEight model, while more opaque than its predecessor, better captures the electoral environment and is better able to adapt to disruptions (say, a poor debate showing).

Ultimately, the best election model is one which makes reasonably justifiable assumptions in order to most accurately forecast election results. It is vitally important that a good model does both: there is no value in a model which makes invalid assumptions and gets lucky, or in one which makes justifiable assumptions (or no assumptions at all!) yet forecasts incorrectly. Notably, Allan Lichtman exemplifies the former: his Thirteen Keys to the White House, while accurate, seem to have functioned mostly on luck and some hackery, retroactively adjusting definitions to justify the 2000 and 2016 divergences between the popular vote and the Electoral College. It is trivially easy to dismiss his methodology: consider the following five questions:

  1. Was the home stadium of the losing team in the election year’s MLB World Series located within the old Confederate States of America?
  2. Was the host of the election year’s Summer Olympics located in the Northern Hemisphere?
  3. Is the percentage of vowels in the Democratic candidate’s full first name greater than 30%?
  4. Is the largest prime factor of the election year greater than 200?
  5. Did the Edmonton Oilers fail to make the NHL Stanley Cup Finals in the election year?

As it turns out, in every presidential election since 1984, if the answer to at least three of these questions is Yes, the Democratic Party wins the presidency. If you are a Democrat, you had best be rooting for the Rangers or Astros and the Braves to win their pennant races! This is, of course, a frivolous method for divining election results, and yet it is not much more frivolous than Lichtman's Keys, which are arguably more subjective to his whims and seem, more or less, to let Lichtman impose his personal opinions rather than provide objective electoral analysis. On the flip side, the latter failure mode is just as inadequate a way to construct an election model: a key aspect of election modeling is making assumptions about the electorate and voting patterns, because fundamentally, election modeling is not a standard machine learning classification task. As Silver articulated in 2020, applying a machine learning method to years of testing data is not a particularly challenging task, but it does little to produce an accurate prediction; in some ways, it is more akin to Lichtman's fitting to particular variables than to a useful algorithm, and it disregards the political experience and knowledge that model construction requires. Surely a more astute political observer would find it improbable that Rep. Marie Gluesenkamp Perez, running one of the Democrats' toughest House re-election campaigns, and Kelly Ayotte, a Republican with a subpar electoral record running in increasingly Democratic New Hampshire, would both win in the same electoral environment more than half of the time.
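The ease of manufacturing such "keys" can be demonstrated directly. The sketch below (a deliberately silly illustration, not a claim about Lichtman's actual procedure) brute-forces random yes/no keys over the ten elections from 1984 through 2020 until it finds a five-key, at-least-three-Yes rule that retrodicts every winner; with only ten outcomes to match, a perfect spurious fit typically turns up within a few thousand random tries.

```python
import random

def find_spurious_keys(outcomes, n_keys=5, rule=3, seed=0,
                       max_tries=1_000_000):
    """Draw random yes/no 'keys' until some set of `n_keys` of them, under a
    'Democrats win when at least `rule` answers are Yes' rule, retrodicts
    every outcome. Returns how many random key-sets were tried."""
    rng = random.Random(seed)
    n = len(outcomes)
    for tries in range(1, max_tries + 1):
        keys = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_keys)]
        preds = [int(sum(k[i] for k in keys) >= rule) for i in range(n)]
        if preds == outcomes:
            return tries
    return None

# Winners 1984-2020, coded 1 for a Democratic win, 0 for a Republican win.
outcomes = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]
result = find_spurious_keys(outcomes)
print(result)
```

Each random key-set matches all ten elections with probability roughly 1 in 1,000, so "perfect" retrodictive systems are cheap to find; what they cannot do is say anything about the next election.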

Ultimately, much of this factors into our decision at VoteHub to forgo building headlining 2024 election models, though there certainly is value to be gleaned from well-constructed models; we encourage readers to view models built by Split Ticket, FiveThirtyEight, and the Silver Bulletin, to name a few. While we do not believe we have many novel insights to share for top-of-the-ticket races, we do believe there is a gap in quality, quantitative coverage of downballot races, which we may try to fill in upcoming cycles as well as this one. Watch this space!