Tuesday, November 11, 2008

Election Night Post-Mortem (I)



Now that we have had a week to digest the election results, I would like to do a little post-election analysis. My model predicted Obama would win with a probability > 99.9%. While I don't know how to verify this number, I can compare my electoral vote estimates with the actual result.

Not all of the results were immediately available.  Several close contests included IN, NC, and 1 NE district (all went to Obama) and MO (probably will go to McCain).  This brings Obama's final electoral vote tally to 365.

My first question is: Assuming that the actual outcome came from a distribution of Obama electoral vote counts identical to my model's (the null hypothesis), what is the probability of an actual Obama electoral vote count as extreme as 365?

To answer this we do a simple z-test: Z = (365 - 352.1) / 24.39 = 0.5289. This translates into a one-tailed probability of p = 0.30. A typical significance threshold would be alpha = 0.05. The p-value obtained here is well above that, so we do not have enough evidence to reject the null hypothesis that the actual outcome came from my simulated distribution.
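For anyone who wants to check the arithmetic, here is a minimal sketch of that z-test in Python. It assumes the simulated electoral vote distribution is close enough to normal for a z-test to be reasonable. The helper name `normal_cdf` is mine, not part of the original model.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

actual_ev = 365      # Obama's final electoral vote tally
model_mean = 352.1   # mean Obama EV count from the final simulation
model_sd = 24.39     # standard deviation from the final simulation

z = (actual_ev - model_mean) / model_sd
p_one_tailed = 1.0 - normal_cdf(z)   # chance of a result at least this high
p_two_tailed = 2.0 * p_one_tailed    # chance of a result this far off in either direction

print(f"z = {z:.4f}")
print(f"one-tailed p = {p_one_tailed:.2f}")
print(f"two-tailed p = {p_two_tailed:.2f}")
```

Either way the p-value is far above 0.05, so the choice between a one-tailed and a two-tailed test does not change the conclusion.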

Perhaps a clearer way to see this is by simply marking the bin in my distribution which contains the actual outcome (Obama 365).  
It's clear from this that the final outcome is right in the fat part of my bell curve.

It seems, then, that these results suggest:
(1) The polls were quite accurate in predicting the outcome and (2) The model adequately translates accurate polling data into a valid prediction of cumulative electoral vote outcome.
OR
(1) The polls were poor or modest predictors of outcome BUT (2) The simulation model is robust to somewhat inaccurate polls.
OR
(1) The polls were so accurate that they made up for (2) a model that otherwise would have been a poor or moderate predictor of outcome.

All of these can be examined to some extent.  
States above the line are those in which Obama overperformed the pollster trend margin. States below the line are those in which Obama underperformed the pollster margin. I haven't yet examined these data extensively (nor have I done a comparable comparison for the realclearpolitics data, the other polling source in my model). At a glance, however, it seems that while the polls look like pretty good predictors, Obama overperformed in more of the deep blue states (e.g. VT, HI, RI, MA) and underperformed in deep red states (e.g. OK, UT, AK, AR). Those designated as toss-ups seemed to fit the unity line quite well (states that hover near 0,0).

Off the top of my head, it makes sense that the pollster trends would be the most accurate for toss-ups because these states tend to be more extensively polled, especially near the time of the election. The tendency for strong red states to become more red and strong blue states to become more blue might reflect the superior ground game of the dominant party in those states, leading to higher voter turnout for their party. This explanation is somewhat unsatisfying to me though, as likely voter models (which many pollsters use) are generally designed to account for turnout discrepancies. Also, if the results had been in the opposite direction I would have said that this too would make sense, as I would expect greater complacency among Democrats in strong blue states and Republicans in strong red ones. I guess that's a problem with any post-hoc explanation.

I intend to examine some of these issues more fully later. One last point though. I had mentioned that the site fivethirtyeight.com seemed to have the best election model out there. The sophistication of Nate Silver's model has actually won him a great deal of publicity lately, including interviews on cable news channels and a recent NY Times article. His final prediction had Obama winning 98.9% of the time with an average of 348.6 electoral votes. As I mentioned before, I know of no way to evaluate the accuracy of a win probability. His electoral vote prediction of 348.6 was similar to that of my simple bare-bones model (slightly less accurate, even, although not by a statistically significant margin). I wonder whether my model was as accurate as his because of the particular electoral landscape in this election, or whether the added sophistication in his model (demographic information, weights associated with each pollster, etc.) simply reaches a point of diminishing returns and does not affect the outcome that much.

Whatever the answer, this whole experience has reinforced my novice interest in playing with statistics and modelling and I intend to continue doing this sort of thing in the future.

Monday, November 3, 2008

2008 Presidential Election Simulation (IV): Election Day Eve

With the polls showing Obama with a commanding lead, including leads in the swing states (all former red states), tonight's simulation is more of an academic exercise. My model, which relates polling margins from realclearpolitics.com and pollster.com to win probabilities, has Obama winning 9,997 times out of 10,000. The mean Obama EV count is now at 352.1, with a standard deviation of 24.39. Therefore, if these polling data and this model relating polling margin to win probability are accurate, the 95% confidence interval for Obama EVs (the mean plus or minus two standard deviations) is (303.3, 400.9). Don't forget to vote!
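The interval quoted above matches the mean plus or minus two standard deviations. Here is a short sketch of that arithmetic, with the slightly narrower 1.96-SD version alongside for comparison. Both assume the simulated EV counts are roughly normal. The function name `interval` is just for illustration.

```python
mean_ev = 352.1   # mean Obama EV count over 10,000 simulated elections
sd_ev = 24.39     # standard deviation of the simulated EV counts

def interval(mean, sd, k):
    """Return (low, high) for mean plus or minus k standard deviations."""
    return mean - k * sd, mean + k * sd

low_2sd, high_2sd = interval(mean_ev, sd_ev, 2.0)     # rule-of-thumb 95% range
low_196, high_196 = interval(mean_ev, sd_ev, 1.96)    # normal-theory 95% range

print(f"mean +/- 2 SD:    ({low_2sd:.1f}, {high_2sd:.1f})")
print(f"mean +/- 1.96 SD: ({low_196:.1f}, {high_196:.1f})")
```

An alternative that makes no normality assumption would be to take the 2.5th and 97.5th percentiles of the 10,000 simulated totals directly.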

Monday, October 27, 2008

2008 Presidential Election Simulation (III): One Week To Go

With one week left, Barack Obama seems to be holding on to his lead. For this simulation I decided to average the state-by-state win probabilities based on realclearpolitics.com and pollster.com polling aggregates.

The results: Obama wins greater than 99.9% of the time. I have him averaging 345 electoral votes with a 95% confidence interval of (298, 392).

All data to the right of the vertical dotted line represent Obama simulation "election" wins. Those few results to the left of the dotted line are wins for McCain. As with my previous simulations, these data represent the results of 10,000 simulated elections.

The chart below represents the current status of the race based on the win probabilities computed from my model. The states are ordered by probability from most likely Obama wins to most likely McCain wins. This demonstrates the challenge McCain faces. The states colored in green are places where Obama currently leads but which are the likeliest of this group for McCain to pick off. In order to get to 270, McCain must either win all of those green states (NM, NH, CO, OH, NV, FL, NC, and MO) or some other combination involving even more strongly blue states, such as PA (where, despite this week's broadcast of This American Life, McCain is at the wrong end of the normal curve, winning just 5% of the time).



These data are a snapshot of what would happen if the election were held today. Who knows what events will occur to change the dynamics of the race in the next week? I am skeptical that McCain can do much of anything at this point to win. Only a low-probability crisis or surprise about Obama would likely change the outcome. I will crunch the numbers on election day and examine how close my electoral vote projections are to the election results.

Friday, October 24, 2008

2008 Presidential Election Status (Compared with 2004)

In the preceding post, I discussed the commanding lead that Obama has sustained and alluded to a changing electoral landscape in which (at least this time) the outcome will not simply hang on the results in FL, OH, and PA.

To provide context for where things were immediately before the 2004 election, I decided to dig up state polling data from Real Clear Politics.  This page gives the RCP Average polling margin for 18 states (those states for which the race was somewhat tighter) in the 2004 election.  Those states were FL, OH, PA, WI, IA, MN, MI, MO, NM, NV, CO, NH, ME, WV, OR, NJ, AR, and HI.

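Below is a histogram of the polling error for each state (the RCP Average margin compared with the actual result). Negative values indicate states where Bush polled ahead of the actual result, and positive values indicate states where Kerry polled ahead of the actual result.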

From this we can see that the RCP Average was generally pretty accurate, with the notable exception of HI (far left of the histogram) which RCP predicted for Bush by 0.9 points but Kerry won by 9.

Since not all states have equal value, I decided to weight these errors by electoral votes to give a theoretical RCP error impact (simply, the percent error in the margin * state electoral votes).


Again, we see no major bias in the data: the average theoretical error impact is near zero (0.07) and the cumulative error impact is 1.32. This indicates that the polling theoretically overestimated Kerry's electoral vote total by 1.32 votes. This positive value is almost entirely due to FL, which was polling at Bush +0.6 but was won by Bush with a margin of +5. Of course this error did not affect the outcome in FL, so it is important not to misinterpret these theoretical error impact scores as reflecting actual differences in electoral votes. In fact, there were only two states for which the polling data chose the wrong winner: HI (4 EV) and WI (10 EV). Kerry won both, and both were predicted for Bush. While polling for HI was way off (see above), the RCP Average for WI was Bush by 0.9 while Kerry won it by 0.4, so although the predicted winner was wrong, the RCP Average was still fairly accurate. Together, the actual impact of polling error was therefore Kerry +14 EVs.
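To make the error impact calculation concrete, here is a rough sketch using only the three states whose numbers are quoted in this post. The sign convention (Kerry minus Bush, polled minus actual) is my reading of the description above, and Florida's 27 electoral votes in 2004 is not stated in the post. The other 15 states are left out rather than guessed.

```python
# Each row: (state, electoral votes, polled margin, actual margin).
# Margins are Kerry minus Bush in percentage points, so "Bush +0.6" is -0.6.
rows = [
    ("FL", 27, -0.6, -5.0),  # 27 EVs is not stated in the post (assumption)
    ("HI", 4, -0.9, 9.0),
    ("WI", 10, -0.9, 0.4),
]

impacts = {}       # theoretical error impact per state, in electoral votes
miscalled_ev = 0   # electoral votes in states where the polls picked the wrong winner

for state, ev, polled, actual in rows:
    error = polled - actual              # positive: Kerry polled ahead of his result
    impacts[state] = error / 100.0 * ev  # percent error times electoral votes
    if (polled > 0) != (actual > 0):     # poll leader and actual winner differ
        miscalled_ev += ev
    print(f"{state}: error {error:+.1f} pts, impact {impacts[state]:+.3f} EV")

print(f"EVs in miscalled states: {miscalled_ev}")
```

Florida alone contributes about 1.19 of the 1.32 cumulative impact, which is why the total is described as almost entirely due to FL.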

So what did the polls look like in 2004?  Below is a histogram of the state margins (Kerry - Bush) based on the RCP Averages for 18 competitive states just prior to the election:

The average margin in 2004 for these states was slightly in favor of Bush, by 0.3889 percent. Kerry led in the polls of 7 states while Bush was ahead in 11. Again, because these states have different EV values, the predicted electoral vote count among these 18 states was: Bush 109, Kerry 78. The actual outcome for these states was: Bush 95, Kerry 92, a large enough margin for Bush to still win the election.

Below is a histogram of the same 18 states indicating the current RCP Average margin for the 2008 election. (Note: states like VA and NC, which may flip to the Democrats this year, were not among those included in the 2004 list of 18 and are therefore not included here either.)

One obvious difference between this histogram and the one above is a rightward shift in polling margin, to an average Obama lead of more than 9%. Based on state electoral vote values, this predicts that Obama wins all but two of the 18 states, receiving an amazing 176 electoral votes to McCain's 11.

In summary, the 2004 polling data indicate that RCP Averages are pretty close to the actual results.
If the current pattern, or anything close to it, holds up for the 2008 election, Obama will win far more than the 270 EVs he needs, with a high probability of a landslide.

Thursday, October 16, 2008

Obama doesn't need Ohio or Florida (although he may still get one or both of them)

In the past two weeks, the numbers have become far more favorable for Obama across the board. This includes leads in Florida and Ohio as well as previously red states like Virginia, North Carolina, and others.

The most recent polls seem to have the race tightening somewhat in Ohio in particular and to some extent in Florida (graphs created by pollster.com, using trend lines that are maximally sensitive to local changes as well as noise).




Nevertheless, Obama still wins if he is able to hang on to New Hampshire (Obama +7.3 to +10.4), New Mexico (Obama +7.5 to +8.4), Virginia (Obama +7.7 to +8.1), and Colorado (Obama +5.8 to +6.2), ignoring all other toss-up states.





Of these, the margin in New Hampshire may be tightening somewhat, although Obama still has a healthy lead.



Additional toss-ups which lean in Obama's direction (when smoothing is maximally sensitive to local changes) include Nevada, North Carolina, Missouri, and still of course Ohio, and Florida. In recent days, North Dakota and West Virginia have also become toss-ups, and Obama still threatens to pick up Indiana. With the four I mentioned and graphed above (NM, NH, VA, and CO), however, all these other states, which include the ostensibly critical Florida and Ohio simply become gravy for Obama.

Here's a wonderful graph from Charles Franklin that puts this into perspective. Read his full article here.
So the story line of the last two elections (namely, that it all comes down to who can win 2 out of Ohio, Pennsylvania, and Florida) has changed. There are many ways Obama can win whether or not he gets Ohio or Florida, and with a double-digit lead in Pennsylvania, it is unlikely to be a deciding factor (any more than, say, New York or Maryland).

Monday, October 6, 2008

Presidential Tape Measure


The New York Times recently published a chart with all major party presidential candidate heights and weights since the 1896 election. They explicitly ask:

Does candidate height and weight play a role in electoral success?
I decided to run a linear regression using height and weight to examine the role of each variable. It seems that while neither significantly contributes (p > 0.05), there is a trend for weight (p = 0.10). Adding height to the model does not improve it, which makes some sense given the highly significant correlation between height and weight.

In this group, winners were on average nearly 12 lbs heavier than losers, although again, this is not enough to reject the possibility that the difference is due to chance.

One last note: I looked to see whether heights and weights have changed much over time. It seems that there is absolutely no change in average presidential candidate weight since 1896. However, presidential candidates have gotten significantly taller (statistically speaking) through the 20th century. From 1896 to 1960 (the beginning of the TV era), presidents were about 5'10.6". Since then, they average 6'0.5". Since they have gotten taller without getting heavier, I suppose that these data suggest that modern presidents have been more physically fit than early-to-mid 20th century ones.

Saturday, October 4, 2008

2008 Presidential Election Simulation (II)

I last posted election simulation results a few weeks ago. At that point Democrats were holding their heads in their hands as McCain coasted through a post-convention bump and the dominant news stories were about the "lipstick on a pig" comment and Obama's supposed kindergarten sex ed program. Since that time we have had a financial crisis, McCain "suspended" his campaign, we have had presidential and vice presidential debates, and today a Wall Street/economy bailout plan finally passed in the House, after it was initially rejected. In all of this time there has been a great deal of movement in poll numbers, generally in a favorable direction for Obama.

Here is another presidential election simulation based on current polling data, predicting what would happen if the election were held today.  Keep in mind that as polling numbers change in the next month, so too will these probability values.

When I previously explained my methodology, I may have glossed over the details a little bit.  Here is hopefully a clearer explanation of my procedure.

Taking the current polling margins, I compute a win probability for each state based on a normal cumulative distribution function approximating Charles Franklin's. His curve can be found here. Below is my approximation of his curve, using the parameters mu=-1, sigma=6.8.
I then use the current state-level polling data to determine win probabilities for each state.  I create two separate models, one based on the Real Clear Politics averages and the other based on the latest regression analysis margins from Pollster.com.
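As a rough illustration of this step, the sketch below converts a polling margin into a win probability with a normal CDF using mu=-1 and sigma=6.8. How the margin is signed (here, the candidate's lead in percentage points) and how mu enters the formula are my assumptions about the setup. The function name `win_probability` is only for illustration.

```python
from math import erf, sqrt

MU = -1.0     # location parameter of the fitted curve (from the post)
SIGMA = 6.8   # scale parameter of the fitted curve (from the post)

def win_probability(margin, mu=MU, sigma=SIGMA):
    """Probability of winning a state given a polling lead in percentage points.

    Assumes P(win) = Phi((margin - mu) / sigma), where Phi is the standard
    normal CDF. This is one plausible reading of the description, not a
    confirmed formula.
    """
    z = (margin - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example margins chosen only to show the shape of the curve.
for margin in (-10, -5, 0, 5, 10):
    print(f"margin {margin:+d}: P(win) = {win_probability(margin):.3f}")
```

Under this reading, a lead of about 10 points corresponds to a win probability of roughly 95%, and a candidate trailing by 1 point is at exactly 50%.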

I then simulate 10,000 elections based on these probabilities.  The percentage of the time Obama gets more than 269 electoral votes is taken to be his overall win probability if the election were held today.
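The simulation step can be sketched as follows. The state list is entirely made up (a handful of hypothetical blocs that add up to 538 electoral votes), and the sketch assumes each state is decided independently of the others. The function name `simulate_elections` is only for illustration.

```python
import random

def simulate_elections(states, n_sims=10_000, seed=0):
    """Simulate elections from per-state win probabilities.

    states: list of (label, electoral_votes, win_probability) tuples.
    Each state is drawn independently (an assumption of this sketch).
    Returns (overall win rate, mean electoral votes, list of simulated totals).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(n_sims):
        ev = sum(votes for _, votes, p in states if rng.random() < p)
        totals.append(ev)
    wins = sum(1 for ev in totals if ev > 269)  # more than 269 EVs is a win
    return wins / n_sims, sum(totals) / n_sims, totals

# Hypothetical inputs for illustration only, not real 2008 polling data.
toy_states = [
    ("solid Obama bloc", 200, 0.99),
    ("solid McCain bloc", 150, 0.01),
    ("lean Obama A", 60, 0.80),
    ("lean Obama B", 40, 0.70),
    ("toss-up C", 50, 0.50),
    ("lean McCain D", 38, 0.30),
]

win_rate, mean_ev, totals = simulate_elections(toy_states)
print(f"win rate: {win_rate:.3f}, mean EV: {mean_ev:.1f}")
```

In a real run the list would hold one entry per state (plus DC), with each probability computed from that state's polling margin.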

Here are the results:
Based on the margins of polling averages from Real Clear Politics, Obama wins an astonishing 98% of the time (average 328 electoral votes).  Note that results to the left of the dotted line are McCain wins while those to the right of the dotted line are Obama wins.
Based on the margins of pollster.com regression lines, Obama wins 93% of the time (average 312 electoral votes).


So at this point the race is Obama's to lose.  Nevertheless, after seeing such dramatic movement in the polls in the last three weeks we might expect that these numbers could shift back.  In fact, I expect Obama's lead to decrease in the next month if for no other reason than the fact that races tend to get tighter as elections get nearer.