This is an archive of the old Stones Cry Out site.
February 09, 2005
Examining the "Unexplained Exit Poll Discrepancy"
A day after Edison/Mitofsky released their much-anticipated report on the 2004 presidential election exit polls, the University of Pennsylvania issued a press release announcing that an "expert" on the presidential election exit poll errors had access to a satellite link and was available for interviews.
UPenn is home to the Annenberg Center, a widely respected public-policy institution that regularly conducts public opinion surveys. Given that Dr. Freeman is a non-tenured visiting professor at UPenn and not affiliated with the Annenberg Center, I wonder whether Dr. Jamieson, Director of the Center, knew that she had an expert on exit polls on campus?
Please forgive my snarky (and definitely rhetorical) question, but rather than prove his expertise on exit polls, Dr. Freeman exposes himself as a dilettante in the latest version of his research report, The Unexplained Exit Poll Discrepancy.
Freeman's analysis hinges on the "Critical Battleground States" of Ohio, Florida, and Pennsylvania. He analyzes the discrepancy between data obtained from saved screenshots of the CNN exit poll website, taken shortly after midnight on election night, and the reported election tallies. Dr. Freeman's paper concluded:
In this report, I have: (1) documented that, in general, exit poll data are sound, (2) demonstrated that it is exceedingly unlikely that the deviations between the exit poll predictions and vote tallies in the three critical states could have occurred strictly by chance or random error, and (3) explained why explanations for the discrepancy thus far provided are inadequate.

With this paper, I will demonstrate that Freeman did not accomplish points (1) and (2). With the release of the Edison/Mitofsky report, point (3) is moot.
Did Dr. Freeman Demonstrate That, “In General, Exit Poll Data Are Sound”?
Either Dr. Freeman is a poor researcher, which I do not believe for a moment given his credentials, or he weakened his argument by suppressing evidence. Freeman's argument that exit polls are generally sound rests largely on statistics from German, BYU, Mexican, and former Soviet Bloc exit polls.
Dr. Freeman selects and analyzes data from several national German exit polls that show that the estimates were highly representative of the election tally. His data were compiled by several individuals, but his analysis does not appear to be based on any published research on German exit polls and fails to include information on the methods used in the cited polls.
First, a note on European exit polls in general. Mystery Pollster (MP) Mark Blumenthal uncovered this opinion prepared by the ACE Project, which is funded by the UN and the US Agency for International Development. As excerpted by MP, the opinion states,
[Exit poll] reliability can be questionable. One might think that there is no reason why voters in stable democracies should conceal or lie about how they have voted, especially because nobody is under any obligation to answer in an exit poll. But in practice they often do. The majority of exit polls carried out in European countries over the past years have been failures (Emphasis added).

In his post, MP indicates that he warned Dr. Freeman during a telephone conversation to check the methodology used in the German exit polls before comparing them to the NEP exit polls. MP contacted Dr. Dieter Roth of FG Wahlen, the organization that generated the data used by Freeman in his analysis. Dr. Roth provided some information about methods, which, per a review of MP's post, makes it easier to understand why the errors would be smaller in Germany's exit polls. Dr. Roth made the following statement to MP:
I know that Warren Mitofsky's job is much harder than ours, because of the electoral system and the more complicated structure in the states.

Dr. Freeman's failure to include basic information regarding methodology, which is crucial when comparing different polls, demonstrates either a desire to suppress evidence or an ignorance of survey research methods.
Dr. Freeman wants everyone to know that the BYU exit poll came within 0.03% of predicting the Bush percentage and 0.1% of the Kerry percentage. To him, this is one more piece of evidence that exit polls are "generally" accurate. But once again, Freeman doesn't compare methodologies.
MP also dealt with the BYU exit poll methodology in his post directed at Dr. Freeman. When compared to the NEP methods, the BYU methods are far superior - hence one reason the BYU exit poll would be more accurate than the NEP exit polls. Is it not curious, though, that Freeman chose the BYU exit poll of Utah as evidence of exit poll accuracy, but did not look at the NEP exit poll of Utah? The NEP exit poll of Utah showed a 3% Kerry bias.
Memo to Dr. Freeman: It's not the best idea to use an exit poll (BYU) that nailed the election result in Utah this year as evidence of the accuracy of exit polls in general, when the NEP exit poll that you are trying to prove as being accurate in OH, PA, and FL showed large (3%) Democratic bias in the same state and in the same year.
Mexican and Former Soviet Bloc Exits
Dr. Freeman does not present data regarding the accuracy of any Mexican or former Soviet Bloc exit poll; he mentions only that they were used and that they added legitimacy to the process.1
How about an Apples to Apples Comparison?
Why did Freeman not engage the literature on US presidential exit polls? Dr. Freeman's bibliography covered all the major exit poll literature except one chapter by Warren Mitofsky and Murray Edelman written in 1995. In that chapter on the 1992 VRS exit polls, the authors wrote,
The difference between the final margin and the VRS estimate was 1.6 percentage points. VRS consistently overstated Clinton's lead all evening...Overstating the Democratic candidate was a problem that existed in the last two presidential elections.2

Certainly this year's US presidential exit polls showed greater bias than in other years, but that is not the case that Dr. Freeman built. He chose to highlight data from exit polls whose methods differ sharply from those typically used for media-funded US presidential exit polls, while ignoring literature on those same media-funded polls that documented chronic Democratic bias.
Did Dr. Freeman Demonstrate Statistically Significant Discrepancies in OH, PA, and FL?
To conclude the section of his paper entitled, "Statistical Analysis of the Three Critical Battleground States: Ruling out Chance or Random Error," Freeman wrote:
Assuming independent state polls with no systematic bias, the odds against any two of these statistical anomalies occurring together are more than 5,000:1...The odds against all three occurring together are 662,000-to-one. As much as we can say in social science that something is impossible, it is impossible that the discrepancies between predicted and actual vote counts in the three critical battleground states of the 2004 election could have been due to chance or random error.

Let me make myself clear: the following analysis is NOT to be construed as an attempt to prove that chance alone can explain the exit poll discrepancy in OH, PA, and FL. I intend only to demonstrate that Dr. Freeman knows little about statistics, let alone exit polls.
Freeman's Data and Methods
Dr. Freeman's null hypothesis states that, assuming independent state polls with no systematic bias, Kerry's predicted proportion should not significantly exceed his tallied proportion.
To test his null hypothesis, Dr. Freeman compared data extrapolated from figures posted on CNN's website shortly after midnight on election night to election tally data.3 The CNN data were presented in tabular format and reported the predicted proportions for Bush, Kerry, and other candidates by gender. From the male/female split, which was reported as a whole number, Freeman extrapolated a value precise to a tenth of a percentage point. Although Freeman reported both Bush's and Kerry's "predicted" (exit poll) proportions of the vote, his statistical analysis is based solely on Kerry's proportion; therefore, I have reproduced only Kerry's data in Exhibit 1.
Freeman correctly recognizes that exit polls are not simple random samples, but cluster samples, and therefore have higher standard errors than typical phone surveys of similar sample size.4 The difference between these standard errors is referred to as the design effect.
Dr. Freeman chose to apply a 30% adjustment to each state to account for this design effect and relied on a single citation for the adjustment. Merkle and Edelman (2000) calculated a 1.7 design effect for the 1996 Presidential Elections, the square root of which is 1.3, leading the authors to state that the 1996 exit polls showed "a 30% increase in the sampling error computed under the assumption of simple random sampling."5
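As a minimal sketch of that adjustment: the sample size and predicted proportion below are the OH figures cited later in this post, and the 1.7 design effect is from Merkle and Edelman (2000); the arithmetic itself is mine, for illustration only.

```python
import math

n = 1963      # OH exit poll sample size, per this post
p = 0.521     # Kerry's predicted OH proportion, per this post
deff = 1.7    # 1996 design effect, per Merkle & Edelman (2000)

se_srs = math.sqrt(p * (1 - p) / n)  # standard error under simple random sampling
desr = math.sqrt(deff)               # design effect square root, ~1.3
se_cluster = desr * se_srs           # standard error adjusted for clustering

print(round(desr, 2))       # the "30% increase" factor
print(round(se_cluster, 4))
```

The square root appears because the design effect is conventionally defined on the variance scale, while the "30% increase" applies to the standard error.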
Freeman then performed a single-tailed test for comparing the results derived from a single sample to a mean of samples (or established standard) and determined that Kerry's proportion significantly exceeded the election result in all three battleground states at the 95% confidence level. In the statistics world, if a finding is "significant" (p-value < .05), then one can reject the null hypothesis. If the result is "not significant" (p-value > .05), then statisticians do not reject the null hypothesis; in fact, a non-significant finding is just that - non-significant. Freeman rejected the null hypothesis by stating that the observed discrepancies were "impossible" to have been due to chance or random error (i.e., the discrepancies in each state were significant).
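Freeman's procedure amounts to a z-test against an established standard. In the sketch below, only the OH sample size and predicted proportion come from this post; the tallied proportion is a hypothetical placeholder, not Freeman's actual figure.

```python
import math

def one_tail_p(z):
    # upper-tail p-value from the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2))

n = 1963           # OH sample size, per this post
predicted = 0.521  # Kerry's predicted OH proportion, per this post
tallied = 0.494    # hypothetical tallied proportion (illustration only)

se = 1.3 * math.sqrt(tallied * (1 - tallied) / n)  # Freeman's 1.3 DESR
z = (predicted - tallied) / se
print(z, one_tail_p(z) < 0.05)  # is the discrepancy "significant" one-tailed?
```

With these placeholder numbers the one-tail test declares significance; the sections below show how sensitive that verdict is to the DESR and to the choice of one tail versus two.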
Dr. Freeman's "statistical" analysis fails on three main points: it 1) violates the "rule of significant digits"; 2) improperly calculates the design effect; and 3) employs a single-tail test when the assumptions require a two-tail test.6 Taken together, these issues dramatically affect Freeman's analysis and conclusions.
Rule of Significant Digits
Freeman determined that the exit poll predicted Kerry would win 49.7% of the vote in FL, 52.1% in OH, and 54.1% in PA. He "divined" these figures by building partials from an extrapolation of the CNN data by gender. The CNN data were presented as whole-number proportions. The rule of significant digits states,
In a calculation involving multiplication, division, trigonometric functions, etc., the number of significant digits in an answer should equal the least number of significant digits in any one of the numbers being multiplied, divided etc.

That means that since the CNN data reported only whole numbers, Dr. Freeman cannot justify his predicted Kerry proportion without considering the error bounds of the data (i.e., the data that he used are "fuzzy"). Given the data available to Dr. Freeman, it was impossible to know, for example, whether Kerry's predicted proportion in FL was 49.5% or 50.4%. Exhibit 2 shows the range of possible values for Kerry's predicted proportion given the error bounds of the data when a significant digit (the tenth) is considered.7
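To see how much slack whole-number inputs leave, one can scan the corners of the rounding box. Every figure below is hypothetical and serves only to illustrate the arithmetic; these are not the actual CNN numbers.

```python
from itertools import product

m_share = 0.46                  # hypothetical reported male share of respondents
kerry_m, kerry_f = 0.47, 0.52   # hypothetical Kerry proportion by gender
half = 0.005                    # each rounded whole number may be off by half a point

# Combined proportion p = m*p_m + (1-m)*p_f. Because p is bilinear in the
# inputs, its extremes over the rounding box occur at the corners.
vals = [(m_share + dm) * (kerry_m + dkm) + (1 - m_share - dm) * (kerry_f + dkf)
        for dm, dkm, dkf in product((-half, half), repeat=3)]
print(min(vals), max(vals))
```

Even in this toy case the recoverable range spans more than a full percentage point, which is exactly why a figure quoted to a tenth of a point cannot be justified from whole-number inputs.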
What Design Effect? Why?
For Ohio, Dr. Freeman calculated the standard error of a poll with a sample size of 1,963 and multiplied that value by 1.3 to obtain an adjusted standard error.
Why 1.3 again?
As mentioned earlier in this post, the 1.3 factor was applied because Merkle and Edelman (2000)8 determined it to be the ratio between the standard error of the 1996 presidential election exit polls and the standard error of a poll based on a simple random sample of the same size. This factor is also known as the design effect square root (DESR).
Warren Mitofsky explained to me in an e-mail that the factor calculated for the 2004 exit polls ranged from 1.5 to 1.8 depending on the average number of samples per precinct.
The Merkle/Edelman paper is not what we computed this year...both Merkle and Edelman participated in this latest calculation.9

Dan Merkle of ABC News wrote the following regarding the use of this factor for analysis of the 2004 presidential election exit polls,
What was in the Merkle and Edelman chapter is only a general estimate based on work at VNS in the early 1990s.

Complicating the computation of the DESR is the fact that there are likely two different factors at work: one for the intercept interviews and one for the telephone interviews. Dan Merkle wrote,
The design effect will vary state by state based on the number of interviews per precinct and how clustered the variable is. More work was done on this in 2004 by Edison/Mitofsky. Edelman and I did participate in this. I would suggest using the design effects calculated by Edison/Mitofsky for their 2004 polls.10
[Mitofsky's DESR's] only applies [sic] to the intercept interviews. [T]here may be a separate (smaller) design effect for the telephone survey component.11

I checked with Jennifer Agiesta of Edison Media Research whether there was a smaller DESR associated with the telephone survey component than the ones conveyed by Mitofsky. Ms. Agiesta replied,
According to Warren, we did a new study since the one that Dan Merkle and Murray Edelman did some years ago and the design effects Warren reported to you were the latest ones computed. The whole advisory council, including Dan Merkle and Murray Edelman, participated in it and agreed that the information on design effects that Warren sent you is correct.12

Although I'm not certain that Ms. Agiesta understood my question and I have a follow-up question pending with her, it should be clear that use of the 1.3 factor is not appropriate; the DESR varies by state (with the ratio of interviews per precinct) and is at least 1.5, and could be as high as 1.8 in FL.
I shared all of this information with Dr. Freeman before he published the latest version of his paper, so the omissions and errors are not due to ignorance.
How Many Tails?
Dr. Freeman's null-hypothesis stated that, assuming independent state polls with no systematic bias, Kerry's predicted proportion should not significantly exceed his tallied proportion. Exhibit 3 is a reproduction of Dr. Freeman's Figure 1.2, which depicts a normal distribution for OH given his calculated standard deviation for all three states.13
Normal Distribution of Kerry's Tallied Distribution with Kerry's Predicted Proportion
The normal distribution depicts the range of possible proportions that could occur if the exit poll were conducted 100 times. Kerry's "tallied percentage of the vote" is the established standard, or, in this case, the mean of samples. The 95% Confidence Interval shows the range of proportions that would result for 95 of 100 exit polls and is commonly referred to as the "margin of error." With this figure, Freeman is attempting to show that the OH exit poll is outside the margin of error and therefore the discrepancy is "significant" at the 95% Confidence Level.
Notice though that there are two "tails" outside the 95% Confidence Interval: left and right. The right tail consists of the 2.5 exit poll samples of 100 that could be expected to significantly exceed the tallied percentage, whereas the left tail represents the 2.5 exit poll samples of 100 that could be expected to be significantly lower than the tallied percentage.
Unless Freeman sets aside his assumption of no bias, he must include the probability of a significant finding at BOTH ends of the normal distribution. By insisting on a single-tail test, he is hinting that either the exit poll is biased or the tally is wrong. This insinuation is inappropriate prior to, or in the process of, testing a null hypothesis that assumes no bias. Dr. Freeman's failure to apply a two-tail test means that his p-values are half what they should be.
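The mechanical consequence is easy to check: for the same z-score, the two-tail p-value is exactly double the one-tail value, which can move a result across the 0.05 line. The z-score below is hypothetical, chosen only to illustrate the flip.

```python
import math

def p_one_tail(z):
    # upper-tail p-value from the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_two_tail(z):
    # both tails: exactly double the one-tail value
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.8  # hypothetical z-score for illustration
print(p_one_tail(z) < 0.05)  # significant if only one tail is counted
print(p_two_tail(z) < 0.05)  # not significant once both tails are counted
```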
Implications for Dr. Freeman's Analysis and Conclusions
Using Freeman's data, I calculated Z-scores and p-values under three scenarios, each of which considered the lower and upper bounds of the data when using a significant digit (the tenth). The first scenario assumed that Freeman's use of the 1.3 DESR and a single-tail test is appropriate. The second scenario kept the 1.3 DESR but applied a two-tail test. The final scenario used a conservative estimate of the DESR as supplied by Mitofsky (1.5 for OH and PA, and 1.6 for FL, which could easily be 1.8). The results of these tests are presented in Exhibit 4.
If you recall, a p-value of <0.05 represents a significant finding. If the p-value is >0.05, then statistically nothing can be said about the discrepancy - it is not significant. As shown in the table, when the lower bound of the data is considered using Freeman's assumptions of a 1.3 DESR and a single-tail test, the discrepancy is significant in all three states. However, if the more appropriate two-tail test is applied with a 1.3 DESR, the lower bound for each state is no longer significant. The same results occur when the two-tail test is combined with the conservative estimate of the design effect conveyed by Warren Mitofsky. Note, though, that all three proportions remain significant at the upper bound under these last scenarios as well.
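The scenario mechanics can be sketched as follows. Only the OH sample size and predicted proportion come from this post; the tallied proportion is again a hypothetical placeholder. The point is how the verdict flips as the DESR rises and the test gains a second tail.

```python
import math

def significant(predicted, tallied, n, desr, tails):
    # z-test of an exit poll proportion against the tallied proportion,
    # with a clustering adjustment (desr) and a one- or two-tail p-value
    se = desr * math.sqrt(tallied * (1 - tallied) / n)
    z = abs(predicted - tallied) / se
    p = 0.5 * math.erfc(z / math.sqrt(2)) * tails
    return p < 0.05

n, predicted, tallied = 1963, 0.521, 0.494  # tallied is hypothetical

print(significant(predicted, tallied, n, desr=1.3, tails=1))  # Freeman's setup
print(significant(predicted, tallied, n, desr=1.3, tails=2))  # two-tail test
print(significant(predicted, tallied, n, desr=1.5, tails=2))  # Mitofsky's DESR
```

With these placeholder inputs the discrepancy is significant only under Freeman's one-tail, 1.3-DESR assumptions, mirroring the pattern described for the lower-bound rows of Exhibit 4.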
What does this mean?
Given the data analyzed by Dr. Freeman, it is impossible to determine whether the discrepancies in OH, FL, and PA were significant.
Almost every point presented in this post was made available to Dr. Freeman weeks before he published his final paper. In fact, I went over some of the points with him via telephone. Other analyses of the data were sent to him via e-mail.
Had Dr. Freeman considered the error bounds associated with his data and applied an appropriate statistical test, he would have realized that, statistically, his point was weak; the data are too fuzzy for rigorous statistical analysis. Rhetorically, however, he had a solid point: the exit polls were off, and off by more than they have ever been in the history of US presidential exit polling. We don't need a PhD to tell us that. By the time Dr. Freeman published his paper, even Mitofsky had acknowledged that something went wrong with the exit polls and that chance error could not account for 100 percent of the discrepancy.
Dr. Freeman, recently ordained by UPenn as an "expert" on the US presidential exit polls, believes that Bush stole the election, and his analysis of the exit poll data is supposed to be highly suggestive of foul play absent any reasonable and falsifiable explanation from Edison/Mitofsky. In this press release, Dr. Freeman declared,
"Although the authors of the report state that, 'the differences between the exit poll estimates and the actual vote [are] most likely due to Kerry voters participating in the exit polls at a higher rate than Bush voters,' they provide little data or theory to support this thesis," said Freeman.

After carefully reviewing the Edison/Mitofsky report, I must admit that there are some claims that need further empirical substantiation.14 However, Dr. Freeman didn't simply make his point about the shortcomings of the Edison/Mitofsky report; instead, he went on to say that the report bolstered his research:
"Rather, the report only confirms the exit poll official count discrepancy that I documented in my Nov. 12 paper, corroborates the data I collected, and rules out most types of polling error."

That statement of Dr. Freeman's provoked me to write this post earlier than I had planned: while the Edison/Mitofsky report certainly ruled out most types of polling error, it did not confirm or corroborate anything presented in his paper.
One final thing: Freeman wrote a book based on his research that is due out in a couple of months. (What’s the universal symbol for a simultaneous sigh and rolling of the eyes?)
1My review of the literature on exit polls turned up one review of the 1998 Ukrainian Parliament exit polls (see Kucheriv, Ilko, Elehie Skoczylas, and Steven Wagner. 2000. Ukraine 1998: Parliamentary Election Exit Poll. Kennan Institute Occasional Paper #275. Washington, DC: Woodrow Wilson International Center for Scholars). This review did not present the poll's confidence interval or a quantitative comparison of the poll and the election result, but did note that the exit poll "accurately predicted the vote" (p. 3). See the following for a comparative analysis of the recent Ukrainian exit polls and the NEP 2004 exit polls, which found that the US presidential exit polls were designed better than the 2004 Ukrainian exit polls.
2Mitofsky, Warren J. and Murray Edelman. 1995. “A Review of the 1992 VRS Exit Polls.” In Presidential Polls and the News Media. Eds. Lavrakas/Traugott/Miller. Boulder, CO: Westview Press. (pp. 81-100)
3I could not locate the source of Freeman's election tally data.
4For a more thorough discussion of standard errors associated with cluster samples, refer to: 1) Frankel, Martin. 1983. Sampling Theory. Handbook of Survey Research. Eds. P. Rossi, J. Wright., and A. Anderson. Orlando, FL: Academic Press. (pp. 47-62); 2) Kalton, Graham. 1983. Introduction to Survey Sampling. Beverly Hills, CA: Sage. (pp. 28-47); 3) Kish, L. 1965. Survey Sampling. New York: Wiley; 4) Mendenhall, William, Lymann Ott, and Richard Scheaffer. 1971. Elementary Survey Sampling. Belmont, CA: Duxbury Press. (pp. 121-141, 171-183); 5) Sudman, Seymour. 1976. Applied Sampling. New York: Academic Press. (pp. 69-84, 131-170); and 6) Williams, Bill. A Sampler on Sampling. New York: Wiley. (pp. 144-161, 239-241).
5See page 72 of Merkle, Daniel M. and Murray Edelman (2000). "A Review of the 1996 Voter News Service Exit Polls from a Total Survey Error Perspective." In Election Polls, the News Media and Democracy, ed. P.J. Lavrakas, M.W. Traugott, pp. 68-92. New York: Chatham House.
6Freeman omitted data from other states, which indicate that the magnitude of the discrepancy in Democratic stronghold states like NY, VT, and RI was substantially larger than that observed in FL, OH, and PA. I consider this omission to be another example of suppressed evidence, but did not address this omission further because it is not necessary to demonstrate my point.
7It would be preferable to include at least two more significant digits for more precise analysis, but to remain consistent with Freeman’s extrapolations, I stick here with the 10th.
8See note 5 above for citation.
9Mitofsky, Warren J. 2004. Electronic communication to Rick Brady, December 7.
10Merkle, Dan. 2004. Electronic communication to Rick Brady, December 15.
11Merkle, Dan. 2004a. Electronic communication to Rick Brady, December 17.
12Agiesta, Jennifer. 2004. Electronic communication to Rick Brady, December 23.
13This normal distribution for OH is constructed with the incorrect 1.3 DESR.
14The primary issue with the Edison/Mitofsky report is their analysis of Within Precinct Error (WPE) by voting method. As noted in the comments to this Mystery Pollster post, I conveyed my concerns to Edison/Mitofsky. My next post on exit polls will be an overview of the Edison/Mitofsky report.
Posted by Rick at February 9, 2005 03:18 AM