Now that we have had a week to digest the election results, I would like to do a little post-election analysis. My model predicted Obama would win with a probability > 99.9%. While I don't know how to verify this number, I can compare my electoral vote estimate with the actual result. Not all of the results were immediately available. The close contests included IN, NC, and 1 NE district (all went to Obama) and MO (which will probably go to McCain). This brings Obama's final electoral vote tally to 365.

My first question is: assuming that the actual outcome came from a distribution of Obama electoral vote counts identical to my model's (the null hypothesis), what is the probability of seeing an actual Obama electoral vote count at least as extreme as 365?
To answer this we do a simple z-test using the mean (352.1) and standard deviation (24.39) of my simulated distribution: Z = (365 - 352.1) / 24.39 = 0.5289. This translates into a one-tailed probability of p = 0.30. A typical significance threshold would be alpha = 0.05. The p-value obtained here is well above that, so we do not have enough evidence to reject the null hypothesis that the actual outcome came from my simulated distribution.
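For anyone who wants to check the arithmetic, here is a minimal Python sketch of the z-test above. It assumes that 352.1 and 24.39 are the mean and standard deviation of the simulated electoral vote distribution and that a normal approximation is reasonable. The function names are my own, made up for illustration.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def z_test(observed, mean, sd):
    """Return (z, one-tailed p, two-tailed p) for an observed value.

    Assumes the reference distribution is approximately normal
    with the given mean and standard deviation.
    """
    z = (observed - mean) / sd
    p_one_tailed = normal_upper_tail(abs(z))
    p_two_tailed = 2 * p_one_tailed
    return z, p_one_tailed, p_two_tailed


# Values taken from the text: actual tally, model mean, model standard deviation.
z, p_one, p_two = z_test(observed=365, mean=352.1, sd=24.39)
print(f"Z = {z:.4f}")
print(f"one-tailed p = {p_one:.2f}")
print(f"two-tailed p = {p_two:.2f}")
```

The one-tailed value is the p = 0.30 quoted here. The two-tailed value, which counts misses in either direction, comes out at roughly twice that, so the conclusion is the same either way.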
Perhaps a clearer way to see this is by simply marking the bin in my distribution which contains the actual outcome (Obama 365).
It's clear from this that the final outcome is right in the fat part of my bell curve.
It seems, then, that these results suggest one of the following:
(1) The polls were quite accurate in predicting the outcome and (2) The model adequately translates accurate polling data into a valid prediction of cumulative electoral vote outcome.
OR
(1) The polls were poor or modest predictors of outcome BUT (2) The simulation model is robust to somewhat inaccurate polls.
OR
(1) The polls were so accurate that they made up for (2) a model that otherwise would have been a poor or moderate predictor of outcome.
All of these can be examined to some extent.
pollster.com, one of my two polling data sources, almost immediately published a graph showing the relationship between their regression predictions and the actual outcomes.

States above the line are those in which Obama over-performed the pollster trend margin. States below the line are those in which Obama underperformed it. I haven't yet examined these data extensively (nor have I done a comparable comparison for the realclearpolitics data, the other polling source in my model). At a glance, however, it seems that while the polls look like pretty good predictors, Obama over-performed in more of the deep blue states (e.g. VT, HI, RI, MA) and underperformed in deep red states (e.g. OK, UT, AK, AR). The states designated as toss-ups (those that hover near 0,0) seemed to fit the unity line quite well.
Off the top of my head, it makes sense that the pollster trends would be most accurate for toss-ups, because these states tend to be more extensively polled, especially near the time of the election. The tendency for strong red states to become more red and strong blue states to become more blue might reflect the superior ground game of the dominant party in those states, leading to higher turnout for that party. This explanation is somewhat unsatisfying to me, though, as likely voter models (which many pollsters use) are generally designed to account for turnout discrepancies. Also, if the results had been in the opposite direction, I would have said that this too made sense, as I would expect greater complacency among Democrats in strong blue states and Republicans in strong red ones. I guess that's a problem with any post-hoc explanation.
I intend to examine some of these issues more fully later. One last point, though. I had mentioned that the site fivethirtyeight.com seemed to have the best election model out there. The sophistication of Nate Silver's model has actually won him a great deal of publicity lately, including interviews on cable news channels and a recent NY Times article. His final prediction had Obama winning 98.9% of the time with an average of 348.6 electoral votes. As I mentioned before, I know of no way to evaluate the accuracy of a win probability. His electoral vote prediction of 348.6 was similar to that of my simple bare-bones model (slightly less accurate, even, although the difference is not statistically significant). I wonder whether my model was as accurate as his because of the particular electoral landscape in this election, or whether the added sophistication in his model (demographic information, weights associated with each pollster, etc.) simply reaches a point of diminishing returns and does not affect the outcome that much.
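To put numbers on that comparison, here is a small Python sketch that computes how far each model's mean electoral vote prediction landed from the actual tally of 365. It is just arithmetic on the figures quoted in this post. The variable names are my own, and it says nothing about statistical significance, which would also require the spread of each model's distribution.

```python
ACTUAL_EV = 365  # Obama's final electoral vote tally, as given in this post

# Mean electoral vote predictions quoted in this post.
predictions = {
    "my model": 352.1,
    "fivethirtyeight.com": 348.6,
}

# Absolute error of each mean prediction, in electoral votes.
errors = {name: abs(ACTUAL_EV - ev) for name, ev in predictions.items()}

for name, err in errors.items():
    print(f"{name}: off by {err:.1f} electoral votes")

# Difference between the two misses, in electoral votes.
gap = errors["fivethirtyeight.com"] - errors["my model"]
print(f"difference between the two misses: {gap:.1f} electoral votes")
```

Both predictions undershot the actual result, and the two misses differ by only a few electoral votes, which is small next to the 24.39 standard deviation of my simulated distribution.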
Whatever the answer, this whole experience has reinforced my novice interest in playing with statistics and modelling and I intend to continue doing this sort of thing in the future.













