Wednesday, November 7, 2012

100% Accurate Prediction Map, But a Word on Probabilities

Above are the predicted outcome map, calculated the night before the election, and the results map from Politico. This demonstrates that the polls were generally quite accurate: every state in which Obama won the majority of simulations in my model, he won overall. Even FL, which my model colored purple to indicate the closest race (a true toss-up), remains the closest state and has not yet been called. My model gave Obama a 46% win probability in FL, and based on the current count, Obama looks to be slightly ahead in this veritable coin flip.

In my estimation, my model fell short in the area of probabilities. There were 8 states whose odds favored Obama but only by between 3:2 and 3:1 (i.e. a 60% - 75% probability of victory), and Obama won all of them. Treating the states as independent events, even if Obama were favored with a 75% probability in each of those 8 states, the likelihood he would win all 8 is only about 10%. In fact, even if Obama were the 90% favorite in each state, he would only be expected to win all of them 43% of the time.

The key is that these estimates of probability, and those used to calculate Obama's overall win probability in my forecast, assume independence between states. It may be that such an assumption is unwarranted: situations in which many close states are won or lost together may happen more often than would be expected if they were simply a series of independent coin flips. In such cases, there may be some underlying momentum toward a candidate that makes him far more likely to win a state despite a relatively small polling margin. Alternatively, the margin itself could be adjusted according to daily tracking figures, as I believe Nate Silver does at 538. Except for the critical state of OH, where Obama underperformed the latest polling averages by about 1 point, he generally exceeded polling expectations, as in PA, VA, FL, CO, and IA.
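To see why the independence assumption matters, here is a minimal simulation sketch. The state margins and error sizes are hypothetical stand-ins (not my actual forecast inputs), and the shared "national swing" term is just one simple way to model correlated polling error; with it, sweeping all the close states becomes noticeably more likely than the independent calculation suggests.

```python
import random

# Hypothetical per-state polling margins in points (illustrative, not my forecast numbers)
margins = [2.0, 1.5, 3.0, 1.0, 2.5, 1.8, 0.5, 2.2]
STATE_SD = 3.0      # assumed state-level polling error (std dev, points)
NATIONAL_SD = 2.0   # assumed shared national polling error (std dev, points)

def sweep_rate(n_sims=100_000, correlated=True):
    """Return the fraction of simulations in which the candidate wins every state."""
    sweeps = 0
    for _ in range(n_sims):
        # A single national error shifts every state in the same direction
        national_error = random.gauss(0, NATIONAL_SD) if correlated else 0.0
        won_all = all(
            margin + national_error + random.gauss(0, STATE_SD) > 0
            for margin in margins
        )
        sweeps += won_all
    return sweeps / n_sims

print("independent errors:", sweep_rate(correlated=False))
print("correlated errors: ", sweep_rate(correlated=True))
```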

Future versions of my model may incorporate a momentum factor, based perhaps on clear trends from daily tracking polls or some other indicator, that adjusts the slope of the function relating polling margin to win probability in such circumstances. Alternatively, my simulations could simply have a bias built in to make running the table more likely, i.e. in a given simulated election in which a candidate won several close states, winning the others would be made more probable. My problem with this approach is that it is not at all clear to me how much of a bias should be built in. Perhaps studying previous elections can give a clue, but we are dealing with a pretty small n. Of course, it could be that what we saw last night is unique to Obama (e.g. his particular ground game and get-out-the-vote effort) and may not generalize to other elections. In other words, making such adjustments to the model may actually worsen predictions in the future, and this was really just a 10% event by a 10% candidate. Maybe, but I wouldn't bet on it.
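As a sketch of what adjusting that slope might look like, here is one hypothetical form: a logistic curve mapping polling margin to win probability, with a steepness parameter that could be increased when tracking polls show clear momentum. Both the functional form and the parameter values are illustrative assumptions, not the actual formula my model uses.

```python
import math

def win_probability(margin, steepness=0.5):
    """Logistic mapping from polling margin (points) to win probability.

    A larger steepness makes small leads count for more, which is one way a
    momentum adjustment could be expressed. The form and the default value
    here are illustrative assumptions, not my model's actual function.
    """
    return 1.0 / (1.0 + math.exp(-steepness * margin))

# A 2-point lead under the baseline slope vs. a momentum-adjusted slope
print(win_probability(2.0))                 # ~0.73
print(win_probability(2.0, steepness=0.9))  # ~0.86
```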

I guess I have four years to think about other ways to improve things. As I have mentioned before, the idea of using a simple model that works is far more appealing than using one with a great deal of complexity, even if the complexity makes the model appear more realistic. In the 2008 election, my model was so simple and so accurate that I felt additional complexity was not warranted. This election was far closer, however, and it may be that in close elections such added complexity proves justifiable.