August 14, 2008
By Charles Franklin
[This is Part IV of the recent discussion between Mark Blumenthal and Charles Franklin called "How We Choose Polls to Plot." For previous posts in the discussion see parts I, II, and III].
"What happens if you leave out 'x'?" is probably the single most asked question at Pollster.com. Everyone has their favorite pollster to hate, and wonders if only that one were removed would the results be closer to the truth. It is a really good question because it goes to the heart of the robustness of our trend estimates and the role of one (or a couple) of pollsters in shaping the conventional wisdom of what "the polls show". The former issue is statistical, the later goes to how shared understandings are constructed. If our estimators are highly sensitive to any one pollster then we have a statistical problem. If one pollster unduly influences shared perceptions, then we better hope they are "right".
Today's question from Mark (and many readers) is what role the tracking polls play in our estimates. This is an issue Mark and I debated quite a lot during the winter when Gallup and Rasmussen began their daily tracking polls. Because they produce so many numbers, including all their data runs the risk that these two dominate our trend estimate to an unacceptable degree. But do they exert that much influence-- there is the question.
And just to be contrarian, take note of the opposite problem: data are valuable. You should never want to ignore information. In that sense excluding data from prolific sources is a mistake unless the data are biased in some uncorrectable way.
The first decision we reached in January was that we would only include each INDEPENDENT sample from tracking polls. This was an easy call. Rolling samples are great for daily updates but Thursday's poll isn't independent of Wednesday's because they both contain Tuesday's and Wednesday's results, if it is a three-day track. In that sense, there isn't as much new information as it seems. So we take only the independent results: Mon-Tues-Wed, Thur-Fri-Sat, Sun-Mon-Tues and so on for a three day tracker. This means we are only including independent data collections, and cuts down on the number of entries in our data that come from any single tracking poll.
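The selection rule above can be sketched in a few lines. This is only an illustration, assuming each daily release is stored with the last day of its field period; the function name `independent_releases` and the margins in the example are made up, not Pollster's actual code or data.

```python
from datetime import date, timedelta

def independent_releases(releases, window_days=3):
    """Keep only tracking-poll releases whose field periods do not overlap.

    `releases` is a list of (end_date, margin) tuples, one per daily release.
    Each release is assumed to cover the `window_days` days ending on end_date.
    """
    kept = []
    last_end = None
    for end_date, margin in sorted(releases):
        # First field day of this release.
        start_date = end_date - timedelta(days=window_days - 1)
        # Independent only if it starts after the last kept release ended.
        if last_end is None or start_date > last_end:
            kept.append((end_date, margin))
            last_end = end_date
    return kept

# Hypothetical daily releases of a three-day tracker (margins are invented).
first = date(2008, 8, 6)  # a Wednesday, so this release covers Mon-Tue-Wed
daily = [(first + timedelta(days=i), 3.0 + 0.1 * i) for i in range(9)]
picked = independent_releases(daily)
print([d.isoformat() for d, _ in picked])
```

Nine daily releases collapse to three: the ones ending Wednesday, Saturday and the following Tuesday, which matches the Mon-Tues-Wed, Thur-Fri-Sat, Sun-Mon-Tues pattern described above.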
Despite this, we get a lot of data in the national track from two primary sources: Rasmussen accounts for 63 of 286 data points in our national trend data. Gallup's tracker provides 41 more. (We keep Gallup's USAToday polls separate from the tracker.) And a third source, The Economist/YouGov's internet poll, accounts for 24 data points. (Full Disclosure: YouGov/Polimetrix owns Pollster.com and supports our work here.) The next most common pollster is Zogby with only 12. So let's take a look at the influence of these top-three pollsters in terms of data. Together they account for 128 of 286 data points, or 45% of our national data.
Let's begin with recognizing that every data point MUST have some influence on our trend estimator. If it didn't then the trend would not be responding to the data! So in that simple sense, the Rasmussen, Gallup and YouGov data must play some role in determining the value of our trend estimate. That really isn't the issue that concerns people. The question is whether these three pollsters DISTORT the trends we would otherwise estimate from all other sources. It would be fine if Rasmussen or Gallup or YouGov had a huge influence on our estimate so long as their trends were exactly in line with everyone else's trends. The concern arises when there is the possibility that one of these is both influential AND out of line with the rest of the world.
We need to look at three things: the overall trend with all pollsters included, the trend only for a single pollster, and finally the trend we'd estimate if we excluded this pollster. If a pollster is different from others, that's a concern. But if they don't substantially change the trend estimate, then we aren't that worried. But if they are different AND shift the trend, then we have to worry.
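Here is a rough sketch of that three-way comparison. The stand-in estimator is just the mean of the most recent polls, which is an assumption made to keep the example short and is not Pollster's actual trend estimator. The field names (`pollster`, `day`, `margin`) and the data are hypothetical.

```python
from statistics import mean

def current_estimate(polls, last_n=5):
    """Stand-in trend estimate: mean margin of the `last_n` most recent polls."""
    recent = sorted(polls, key=lambda p: p["day"])[-last_n:]
    return mean(p["margin"] for p in recent)

def leave_one_out(polls, last_n=5):
    """For each pollster return (own estimate, estimate without them, shift)."""
    overall = current_estimate(polls, last_n)
    report = {}
    for name in sorted({p["pollster"] for p in polls}):
        own = [p for p in polls if p["pollster"] == name]
        rest = [p for p in polls if p["pollster"] != name]
        without = current_estimate(rest, last_n)
        # shift > 0 means dropping this pollster raises the estimate.
        report[name] = (current_estimate(own, last_n), without, without - overall)
    return overall, report

# Invented data: pollster "B" runs consistently low.
polls = [
    {"pollster": "A", "day": 1, "margin": 4.0},
    {"pollster": "B", "day": 2, "margin": 2.0},
    {"pollster": "A", "day": 3, "margin": 5.0},
    {"pollster": "C", "day": 4, "margin": 3.0},
    {"pollster": "B", "day": 5, "margin": 1.0},
]
overall, report = leave_one_out(polls)
for name, (own, without, shift) in report.items():
    print(f"{name}: own {own:.1f}, without {without:.1f}, shift {shift:+.1f}")
```

A pollster is a worry only when both numbers are large: its own estimate sits far from the overall one, and the shift from dropping it is not close to zero.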
So let's look at the data. The chart below plots the overall trend (the blue line), the trend for each of the three most prolific pollsters (solid red), and the trend estimate if we exclude that pollster (dashed red line). A fourth plot shows what happens if we exclude all three prolific pollsters and rely only on the 28 different pollsters who've done 12 or fewer polls each (dashed blue).

Over all our polls, we estimate an Obama advantage over McCain of 3.4 points (as of early morning on 8/14). If we exclude Gallup, the trend estimate is 3.2. If we exclude Rasmussen, the estimate is 4.5. If we exclude YouGov the estimate is 3.3. And if we omit all three (and 45% of our data) the trend estimate is 5.1. So it DOES matter which of these we include. By as little as 0.1 points or as much as 1.7 points.
The most striking thing to me about these figures is that all three tracking polls trend a bit below the overall trend, which is why omitting them all produces the biggest change in the current trend estimate. Gallup is only a bit below trend, YouGov a bit more in May but less recently. Rasmussen stands out as the most consistently below trend, with convergence only in June for a while.
At first glance, the worst thing about Rasmussen is that his trend seems much more sharply downward since late June than either other frequent pollster (both Gallup and YouGov see flat or rising Obama margins in that time.) The dashed line without Rasmussen looks flat or possibly rising slightly, while including Rasmussen with all others produces a modest downward slope recently. So is Rasmussen determining our current trend's tendency to be moving down? This is especially relevant given the upward moves by Gallup and YouGov.
The bottom right panel of the figure offers some reassurance. While Rasmussen does look different from Gallup or YouGov, when we take all three tracking polls out, the dashed blue line in the bottom right figure trends slightly down, approximately in parallel with the overall trend estimate using all the polls. To be sure, omitting the tracking polls does produce a higher current trend estimate: 5.1 vs 3.4 for all polls. Clearly the tracking polls are showing a lower margin and that is reflected here. But from my point of view, the happy news is that the trend with or without the three trackers moves in pretty much the same way over the year. Granted some minor differences, both curves move up and down at about the same time and the gap between the solid and dashed blue lines is roughly equal over time. This suggests that the effects of the three trackers may be to lower the estimated Obama margin over McCain, but they don't distort the dynamics of the race. When trends are up or trends are down, they are reflected in both the with and without tracker estimates.
It is reassuring that both the Gallup and YouGov trackers have very little influence on the overall trend estimate. Including or excluding either of these polls has very little effect on the trend estimate.
A final point is what this says about the validity of the polls. If Gallup and YouGov are flat or slightly up, and Rasmussen is sharply down, how are we to know which is "right"? The data here say a bit of both are right. Gallup and YouGov do somewhat better jobs tracking the overall trend than does Rasmussen. But the recent decline in Obama support, even though modest, is not captured by Gallup or YouGov. Rasmussen clearly overstates the decline (compared to other polling) but the consensus of the 158 polls NOT from these three sources is that there has been a little downturn in Obama's lead since late June.
It is easy to exaggerate how large these differences are, especially in light of the intrinsically hard problem of knowing what "the truth" is at any moment. The chart below compares the trend estimates we would get from dropping each of the 31 different pollsters in our data. Two things stand out. Dropping any single pollster has very little effect on the trend estimate, with one exception. Omitting Rasmussen, who is both the most prolific pollster and the one with considerably more variation than others, does make a noticeable difference in the trend estimate. But the reassuring element of this graph is that even the line omitting Rasmussen still falls within the 95% confidence interval around our overall trend estimate. While there was a time in March when the "without Rasmussen" line moved just outside the 95% confidence interval, this is the exception rather than the rule. Most of the time, including now, the trend without Rasmussen is NOT significantly different from the trend over all pollsters (or the trend omitting any individual pollster.)

So what do we conclude from this exercise? I'd say that any individual pollster can have important effects on our trend estimate under the right circumstances. Concentrating a lot of unusual polls in a short time span can shift our estimates. But I am encouraged that while there are important differences in Gallup, Rasmussen and YouGov trends, none of them seems to outright dominate our trend estimates. Even Rasmussen's effects look less important when we see what all the non-tracking polls are showing. We might worry about what the right level of support is, but the shape of the trends looks pretty robust no matter who is included or excluded. While there are differences of as much as 1.7 points in the estimated margin, it is worth taking a deep breath and appreciating the margin of error in these and all other estimates of candidate support right now. The current confidence interval covers a range from +1.1 to +5.2. That 4.1 point range looks pretty large compared to a 1.7 point difference among estimators. Meanwhile, individual polls range over a MUCH wider spread, over at least 10 points and often more. The trend estimate manages to narrow that range of uncertainty by more than 50%. A good achievement. But not one that is precise to tenths of a percentage point, nor one that is immune to some effects of individual pollsters.
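To put the numbers side by side, here is a small check using only the figures quoted in this post: the overall estimate, the four leave-out estimates, and the confidence interval. The variable names are mine; nothing here re-estimates a trend.

```python
# Figures quoted in the post (Obama minus McCain, in points).
overall = 3.4
ci_low, ci_high = 1.1, 5.2
without = {"Gallup": 3.2, "Rasmussen": 4.5, "YouGov": 3.3, "all three": 5.1}

for name, estimate in without.items():
    shift = estimate - overall
    inside = ci_low <= estimate <= ci_high
    print(f"without {name}: {estimate:.1f}, shift {shift:+.1f}, inside interval: {inside}")

# Compare the biggest shift with the width of the interval.
largest_shift = max(abs(e - overall) for e in without.values())
interval_width = ci_high - ci_low
print(f"largest shift {largest_shift:.1f} vs interval width {interval_width:.1f}")
```

Every leave-out estimate, including the one that drops all three trackers, stays inside the interval, and the largest shift is well under half the interval's width.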
By Charles Franklin on August 14, 2008 3:37 PM
August 12, 2008
By Charles Franklin
Mark started this conversation with "Why we choose polls to plot: Part I" asking how we decide to handle likely voter vs registered voter vs adult samples in our horse race estimates. This was especially driven home by the Washington Post/ABC poll reporting quite different results for A, RV and LV subsamples but it is a good problem in general. So let's review the bidding.
The first rule for Pollster is that we don't cherry pick. We make every effort to include every poll, even if it sometimes hurts. So even when we see a poll way out of line with other polls and what we "know" has to be true, we keep that poll in our data and in our trend estimates. There are two reasons. First, once you start cherry picking you never know when to stop. Second, we designed our trend estimator to be pretty resistant to the effect of any one poll (though when there are few polls this can't always be true.) That rule has served us pretty well. Whatever else may be wrong with Pollster, we are never guilty of including just the polls (or pollsters) we like.
But what do we do when one poll gives more than one answer? The ABC/WP poll is a great example, with results for all three subgroups: adults, registered voters and likely voters. Which to use? And what to do that remains consistent with our prime directive: never cherry pick?
Part of the answer is to have a rule for inclusion and stick to it stubbornly. (I hear Mark sighing that you can do too much of this stubborn thing.) But again the ABC/WP example is a good one. Their RV result was more in line with other recent polls while their LV result showed the race a good deal closer. If we didn't have a firm, fixed, rule we'd be sorely tempted to take the result that was "right" because it agreed with other data. This would build in a bias in our data that would underestimate the actual variation in polling because we'd systematically pick results closer to other polls. Even worse would be picking the number that was "right" because it agreed with our personal political preferences. But that problem doesn't arise so long as we have a fixed rule for what populations to include in cases of multiple results. Which is what we have.
That rule for election horse races is "take the sample that is most likely to vote" as determined by the pollster that conducted the survey. If the pollster was content to just survey adults, then so be it. That was their call. If they were content with registered voters, again use that. But if they offer more than one result, use the one that is intended to best represent the electorate. That is likely voters, when available.
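Written as code, the rule is a fixed priority order with no judgment calls. This is a minimal sketch; the labels "LV", "RV" and "A" follow the abbreviations used in this post, while the function name and the example margins are hypothetical.

```python
# Populations ordered from most to least likely to vote.
PRIORITY = ("LV", "RV", "A")

def pick_result(results):
    """Pick one result per poll: the reported population most likely to vote.

    `results` maps a population label ("LV", "RV" or "A") to that
    subsample's margin as published by the pollster.
    """
    for population in PRIORITY:
        if population in results:
            return population, results[population]
    raise ValueError("poll reports none of the expected populations")

# Invented margins for a poll that reports all three populations.
print(pick_result({"A": 6.0, "RV": 5.0, "LV": 1.0}))
# A poll that surveyed only adults is used as is.
print(pick_result({"A": 4.0}))
```

The point of the fixed order is that the choice never depends on the margins themselves, so there is no room to pick the number that looks "right".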
We know there are a variety of problems with likely voter screens, evidence that who is a likely voter can change over the campaign and the problem of new voters. But the pollster "solves" these problems to the best of their professional judgement when they design the sample and when they calculate results. If a pollster doesn't "believe" their LV results, then it is a strange professional judgement to report them anyway. If they think that RV results "better" represent the electorate than their LV results, they need to reconsider why they are defining LV as they do. Our decision rule says "trust the pollster" to make the best call their professional skills can make. It might not be the one we would make, but that's why the pollster is getting the big bucks. And our rule puts responsibility squarely on the pollster's shoulders as well, which is where it should be. (By the way, calling the pollster and asking which result they think is best is both impractical for every poll, AND suffers the same problems we would introduce if we chose which results to use.)
But still, doesn't this ignore data? Yes it does. Back in the old days, I included multiple results from any poll that reported more than one vote estimate. If a pollster gave adult, RV and LV results, then that poll appeared three times in the data, once for each population. But as I worked with these data, I decided that was a mistake. First, it was confusing because there would be multiple results for a poll-- three dots instead of one in the graph. That also would give more influence to pollsters who reported for more than one population compared to those pollsters who only reported LV or RV. Finally, not that many polls report more than one number. Yes sometimes some pollsters do, but the vast majority decide what population to represent and then report that result. End of story. So by trying to include multiple populations from a single poll, we were letting a small minority of cases create considerable confusion with little gain.
The one gain that IS possible, is to be able to compare within a single survey what the effect of likelihood of vote is. The ABC/WP poll is a very positive example of this. By giving us all three results, they let us see what the effect of their turnout model is on the vote estimate. Those who only report LV results hide from us what the consequences might be of making the LV screen a bit looser or a bit tighter. So despite our decision rule, I applaud the Post/ABC folks for providing more data. That can never be bad. But so few pollsters do it that we can't exploit such comparisons in our trend data. There just aren't enough cases.
What would be ideal is to compare adult, RV and LV subsamples by every pollster, then gauge the effect of each group on the vote. But since few do this, we end up having to compare LV samples by one pollster with RV samples by another and adult samples by others. That gets us some idea of the effect of sample selection, but it also confuses the differences between survey organizations with differences in the likely voter screens. Still, it is the best we can do with the data we have.
So let's take a look at what difference the sample makes. The chart below shows the trend estimate using all the polls, LV, RV and adult samples separately. We currently have 109 LV samples, 136 RV and 37 adult. There are some visible differences. The RV (blue) trend is generally more favorable to Obama than is the LV (red) trend, though they mostly agreed in June-July. But the differences are not large. All three sub-population trend estimates fall within the 68% confidence interval around the overall trend estimate (gray line.) There is good reason to think that likely voters are usually a bit more Republican than are registered or adult samples. The data are consistent with that, amounting to differences that are large enough to notice, if not to statistically distinguish with confidence. Perhaps more useful is to notice the scatter of points and how blue and red points intermingle. While there are some differences on average, the spread of both RV and LV samples (and adult) is pretty large. The differences in samples make detectable differences, but the points do not belong to different regions of the plot. They largely overlap and we shouldn't exaggerate their differences.


There is a valid empirical question still open. Do LV samples more accurately predict election outcomes than do RV samples? And when in the election cycle does that benefit kick in, if ever? That is a good question that research might answer. The answer might lead me to change my decision rule for which results to include. But if RV should outperform LV samples, then the polling community has a lot of explaining to do about why they use LV samples at all. Until LV samples are proven worse than RV (or adult) then I'll stick to the fixed, firm, stubbornly clung to, rule we have. And if we should ever change, I'll want to stick stubbornly to that one. The worst thing we could do is to have to make up our minds every day about which results to include and which not based on which results we "like."
[Update: In Part III of this thread, Mark Blumenthal answers some of the comments below and poses a new question].
By Charles Franklin on August 12, 2008 10:16 AM
August 11, 2008
By Charles Franklin

It's all about who votes. Those that do win. Those that don't lose. The chronic losers in American politics are the young who famously turn out at low rates election after election.
This year, those young people are of great interest. Allegedly they will be mobilized in huge numbers, and allegedly they will vote strongly for Barack Obama. The latest available Gallup weekly estimate (July 28-Aug 3) shows Obama leading 56%-35% among 18-29 year olds, while McCain leads 46%-37% among those 65 and older.
But will the young vote? And how much difference does it make when they don't?
The chart above shows the turnout rate by age for 2000 and 2004, based on the Census Bureau's "Current Population Survey (CPS)", the largest and best source of detailed data on turnout. The most striking result is just how low turnout is among those under 30 compared to older voters. No age group 18-29 managed to reach 45% turnout in 2000, and only two made it in 2004. Not one single age group over 30 fell so low in either year. Despite a little noise for each group, the pattern is a strong rise in participation rates with every year of age at least until the late 60s, after which there is some decline. Yet even among those 85 and over the turnout rate remains above 55%, more than 10 points higher than among their 20-something grandchildren and great-grandchildren.
The second striking feature of the chart is that the young can be mobilized a bit, under the right circumstances. Turnout among those under 30 rose significantly in 2004 compared to 2000. While turnout went up among all age groups, the relative gain was clearly greater among those under 30. While mobilizing the young is difficult, these data show that it is possible to get significant gains, at least relative to past turnout.
Even so, the "highly mobilized" 20-somethings of 2004 still fell behind the turnout of their 30-something older siblings. A supposed Obama-surge among the young may still not catch up with those even a bit older.
The irony is that the young are a large share of the population, but not of the electorate. The chart below shows the population by age in 2004 (it shifts a little by 2008 but not enough to change the story.)

The "boomers" in their 40s and 50s remain the largest group, but for our purposes there are two important points. Those under 30 make up a substantial share of the population, while those 60 and over represent a substantially smaller share at each age.
In 2004 those 18-29 were 21.8% of the population, while those 58-69 were just 13.2%. Add in the 11.5% 70 and up, and you get just 24.7% of "geezers" over 58 vs. 21.8% of "kids". But the sly old geezers know a thing or two about voting. Shift from share of the population to share of the electorate and the advantage shifts to the old: 18-29 year olds were just 16% of the electorate in 2004, while those 58-69 were an almost equal 15.9%. Add in the 70+ group at 13.4% and the geezers win hands down: 29.3% of voters vs 16% for the young. That difference is the power of high turnout. It goes a long way to explaining why Social Security is the third rail of American politics.
High turnout buys "over-representation". Divide share of voters by share of the population and you get proportionate representation. A ratio of 1.0 means a group votes proportionate to its size. Values over 1 are overrepresented groups. In 2004, for example, 55 year olds were represented 20% more than their population would suggest, with a 1.2 score. The youngest voters, 18 year olds, had an abysmal representation rate of 0.49 in 2000, less than half their share of the population.
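The ratio is easy to reproduce from the 2004 shares quoted above. This sketch uses only those published percentages for the broad age groups; the single-year-of-age figures behind the chart are not reproduced here, and the variable names are mine.

```python
# 2004 shares quoted in the post, in percent.
population_share = {"18-29": 21.8, "58-69": 13.2, "70+": 11.5}
electorate_share = {"18-29": 16.0, "58-69": 15.9, "70+": 13.4}

def representation(group):
    """Share of voters divided by share of the population (1.0 = proportionate)."""
    return electorate_share[group] / population_share[group]

for group in population_share:
    print(f"{group}: {representation(group):.2f}")

# Combine the two older groups into the "geezers" aged 58 and up.
older_pop = population_share["58-69"] + population_share["70+"]
older_vote = electorate_share["58-69"] + electorate_share["70+"]
print(f"58+: {older_vote / older_pop:.2f}")
```

On these figures the 18-29 group comes out at about 0.73 and those 58 and over at about 1.19, so the young group casts roughly three quarters of the votes its size would suggest.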

While turnout rises with age, it is not until we hit 40 or so that we reach "fair" representation (1.0). After that, every age group is over-represented in the electorate. Less than 40, and every age group is under-represented. (Two small exceptions-- so sue me.)
So what are the implications? If you gave me a choice of being wildly popular with the young or moderately popular with the old, I'd take the old any day. They are far more reliable in voting, and while their population numbers are small they more than make up for it in over-representation thanks to turnout differences.
There is much conversation about "youth" turnout this year. Perhaps we will indeed see another rise, as we did in 2004. But unless something truly unprecedented occurs, no one can win on the young alone. The gap in turnout is simply too large.
But is age destiny? If there were constant differences in partisan preference by age, then perhaps so. But there aren't. Despite being supposedly "old and set in their ways", those 60 and up shifted their votes more than any other age group between 2000 and 2004. In 2000, the 60+ vote went to Gore by a 4 point margin. In 2004, however, those 60+ went for Bush by 8 points. That net 12 point swing, multiplied by their over-representation, means a lot.

The 20-somethings also shifted, from +2 for Gore to +9 for Kerry. Coupled with their surge in turnout, the younger voters kept Kerry close in 2004 when he was losing in every other age category. But it wasn't enough to win.
The Obama campaign may be right that they can gain votes by mobilizing the young. But the old play a bigger role in elections, and they are not immovable in their vote preferences. Indeed, they make the youngest group seem a bit static by comparison. It is not the candidate's age that will be the key to winning the votes of those 60 and over. Issues and personality will play a large role. Any candidate would be well advised to recognize that the dynamic swings among older voters coupled with their substantial over-representation make them a potent force for electoral change.
Cross-posted at PoliticalArithmetik.com
By Charles Franklin on August 11, 2008 7:02 PM
August 4, 2008
By Charles Franklin

The most common description of polls is that they are snapshots, not predictions. A good way to look at that in the 2008 election is to compare the '08 campaign with the two that came before.
The chart above shows the trend estimates for each of the last three presidential campaigns. I'm plotting the estimated margin between the two candidates, Dem minus Rep, for each year.
With 92 days to go until the 2008 election, Obama holds a 3.3 point advantage over McCain, though that has been eroding over the past six weeks. If we put a confidence interval around today's estimate, we get a race that is just barely leaning Democratic.
But what about the future? The dynamics of the next 92 days are all important for where we stand on November 4. Since we can't foresee those 92 days yet, let's see what happened during the same time in 2000 and 2004. That gives us a better idea how much change we might anticipate in the next three months.
In 2004, Kerry slowly built a 2 point lead by this time, and held a small lead through much of the summer. But then the race took a sharp turn, with Bush making a 6 point run, taking a four point lead with 50 days to go. Kerry gained back 3 points of that in the polling, but less than 2 points of it in the actual vote, losing by a 2.4 point margin.
In 2000, Bush led in most of the early polls, holding a 6 point lead with 107 days to go. Then Gore moved sharply up, erasing Bush's lead and then adding a 3 point lead for Gore with about 56 days left. Bush promptly reversed Gore's gains with a six point move in the GOP's direction, and led by about 3 points over the last three weeks of the campaign. Of course, the 2000 polls were misleading in predicting a Bush win. Gore won the popular vote by 0.6 points.
So far in 2008, Obama has enjoyed a run up of 5.5 points since his low point in late March. That run is on a par with Bush's in 2004 and with Bush's 6 point rebound in 2000, but still a bit less than Gore's 9 point run in 2000.
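The runs quoted above fall straight out of the turning points in the margin. Here is a short sketch using the Dem minus Rep convention of the chart. The turning points are read off the prose in this post (rounded as stated there), not taken from the underlying trend data, and the names are mine.

```python
# Approximate turning points in the trend margin (Dem minus Rep, in points).
turning_points = {
    2000: [-6.0, 3.0, -3.0],   # Bush +6, then Gore +3, then Bush +3 at the end
    2004: [2.0, -4.0, -1.0],   # Kerry +2, then Bush +4, then Kerry regains 3
}
# Popular vote margins as given in the post.
outcome = {2000: 0.6, 2004: -2.4}

def swings(points):
    """Change in margin between consecutive turning points."""
    return [round(b - a, 1) for a, b in zip(points, points[1:])]

for year, points in turning_points.items():
    miss = round(outcome[year] - points[-1], 1)
    print(f"{year}: swings {swings(points)}, outcome minus final trend {miss:+.1f}")
```

Positive swings favor the Democrat and negative ones the Republican, so Gore's 9 point run shows up as +9.0 and the Bush runs of 2000 and 2004 as -6.0.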
Judging from the dynamics we've seen in the past it is quite reasonable to expect the current trend to shift by half-a-dozen points. August and the conventions have been periods of substantial change in both previous elections, so if history repeats itself the next 4 or 5 weeks should be pretty interesting.
The bottom line is neither campaign should be complacent or despondent. There is a lot of time left and recent history shows that both up and down swings of 6-9 points are entirely plausible.
As a P.S. here are the three campaigns with educational confidence intervals around them.

The current 2008 estimate is just barely inside the "lean Dem" range, and will move to toss up if the current trend continues for another couple or three polls.
The 2004 estimate was pretty close to the outcome which was well within the 68% confidence interval around the trend.
The polls in 2000 were troubling for having the wrong popular vote winner, but even there the outcome was inside the 95% confidence interval. With races as close as the last two, it is worth appreciating just how wide those confidence intervals are.
Our efforts to characterize races rely on the best estimates of those confidence intervals, but it is all too easy to focus on who's ahead and not remember how much uncertainty there is. That uncertainty is both about where the current estimate says the race stands today and about how the race may change in coming weeks. The data here show that unless one candidate builds a bigger lead than either has held so far, the uncertainty remains pretty big.
Note: My trend here is slightly different from the Pollster National trend because I'm working off the difference between candidates, not each trend separately, and because I've made 2008 comparable to 2000 and 2004, which means a slightly different amount of smoothing compared to Pollster's standard estimator this year. None of those differences changes the qualitative picture or shifts the magnitude of changes I cite above.
Cross posted at Political Arithmetik.
By Charles Franklin on August 4, 2008 5:49 PM
June 13, 2008
By Charles Franklin

This week my colleague Ken Goldstein and I conducted a Wisconsin statewide survey sponsored by the UW Department of Political Science and WisPolitics.com. So fair warning that I'm a party to this survey rather than an independent observer.
A number of people have commented on the party identification balance in the survey: 38% Dem, 24% Rep, 29% Independent (37% Independent when "no preference/other" are allocated to independent. When this group is asked how they "lean", very few insist on some other party, so this allocation makes sense.) See Alan Reifman's blog on weighting and party id for a good example and discussion of broader issues of weighting to party id.
I want to point out two things here and put our data in the context of other polls in Wisconsin.
The chart above shows party identification trends since 2000 using data from three sources that have done frequent polling in the state. What we see is a relatively stable Dem/Rep parity from 2000-2004, with Dem ID falling a bit around 2004 while Reps moved up slightly.
Starting in 2005, however, there is an initially slow but then sharper shift in partisanship. Republican ID declines from about 30% to about 24% today, while Dem ID rises from about 30% to nearly 40%. After an initial surge of independents, that group has recently fallen off a bit. (You have to squint a bit to see WPRI and Badger after 2005, but they are close to the trend lines during this period, so the changes are not just a matter of house effects or phone vs IVR methods. WPRI, for example, has Rep ID moving from 33% in 2004 to 28%, 26% and 25% in 2005-2007. Their Dem ID rises from 30% to 33% to 34%, then falls to 29% over the same period. The final 29% is a large discrepancy from the trend, of course.)
We did not weight our survey to party identification, and these trends help explain why we have reservations about doing that. While relatively stable, party id does move over time, and by a fair bit, as you can see here. But that said, our unweighted results turn out to be quite close to the estimated trends in partisan categories in any case.
The second point is to compare these trends with those in exit poll measures of party id. In 2000, the VNS Exit poll put Wisconsin party id at 37% Dem, 32% Rep and 31% Ind. This shifted in 2004 to 35% Dem, 38% Rep and 27% Ind. But in 2006 the exit polls found that the balance was 38% Dem, 34% Rep and 27% Ind. Those values all show a smaller share of independents at the polls on election day compared to the polling trend, but that is to be expected given differences in turnout between partisans and independents. The size of the party ID groups grows as a result, but the balance between them is in line with what we see in the trends in the polls, though certainly not an exact match. The polls, after all, are of either adults or likely voters, while the exits are by definition a measure of who actually showed up on election day.
For 2006, the Dem exit percent and the Dem trend estimate are a close match. Republicans gain in the exits, by about 6 points over the 2006 trend estimate. If that holds for 2008, we might expect an electorate more like 38% Dem and 30% Rep. Of course both parties will have very active "ground games" and GOTV efforts to try to change those numbers.
While I'm certainly happy that our party id balance is so close to the trend in all the other polling, the more important point is that party id in Wisconsin has shifted quite a bit over the past four years. The coming campaign may alter that, possibly bringing disappointed former Republicans back home, for example. Likewise a Republican advantage in turnout could bring the exit polls back to closer balance. But as the data show, today the GOP is at the worst disadvantage the state has seen in over eight years.
Let me conclude with a bit of description of the polls used here.
Wisconsin Policy Research Institute ("WPRI") has done some of the longest running polls in the state, usually two a year. Their data here is taken from their annual estimates, which I assume pool the two surveys though they don't say so explicitly. WPRI describes itself as "Wisconsin's Free Market Think Tank".
The "Badger Poll" is conducted by the UW Survey Center. They did more extensive polling in 2002-04 but now do about two polls a year.
SurveyUSA is a well known national pollster that uses "Interactive Voice Response" (IVR) automated interviews. SurveyUSA has done monthly polling in the state since 2005, providing some of the best data on state trends in approval of elected officials and, as a byproduct, an excellent data series on party ID.
Finally, there is our new Department of Political Science/WisPolitics poll. Ours uses a commercial call center, not the UW Survey Center or undergrads in a class calling for a grade. WPRI, Badger and our poll all use live interviewers, SurveyUSA uses IVR. Most of these surveys are in the 500-600 respondent range.
Cross posted at Political Arithmetik.
By Charles Franklin on June 13, 2008 3:08 PM
May 21, 2008
By Charles Franklin

Marriage for gay and lesbian couples has been a hot button issue, most especially so in the 2004 election cycle when 11 states considered and passed referendums banning (in various ways) same-sex marriages. In 2006 an additional 8 states voted on marriage ballot measures, with only Arizona defeating the proposal. In all, 41 states have statutes defining marriage as "between one man and one woman", and 27 states have put that definition into their constitutions. Only five states currently have no law banning same-sex unions (MA, NJ, NM, NY, RI). In 2008, Florida will have a "defense of marriage" amendment (DOMA) on the ballot, while California is awaiting certification of a ballot proposal and Arizona may reconsider its 2006 initiative (currently awaiting state Senate approval). (An excellent summary of the status of same-sex marriage in the states is available here.)
Despite this overwhelming majority among other states, the California Supreme Court last week ruled that the state cannot constitutionally withhold the right to marriage from same-sex couples. (Text of the ruling is here. The LA Times initial report on the decision is here.) Supporters of gay marriage hailed the decision as a breakthrough for fundamental rights, in line with the same California Court's decision in 1948 striking down laws banning inter-racial marriage. Opponents of gay marriage argued that the ruling puts the issue squarely back on the table for 2008 and confirms their argument that only constitutional amendments can prevent courts from overturning popular opinion on this issue. In 2000 California passed, by a 61%-39% majority, Proposition 22 affirming that "only marriage between a man and a woman is valid and recognized in California."
California has one of the strongest domestic partnership laws in the nation, so the Court's decision has the effect of ruling that by withholding the designation "marriage", such domestic partnership laws still fall short of the equal treatment required by the state constitution.
The California decision follows the Massachusetts Supreme Court's ruling of November 18, 2003 which ultimately made Massachusetts the first, and so far only, state to legalize same-sex marriage. (Rhode Island law recognizes same-sex marriages from other states.) Subsequently, the state Supreme Courts of New York, New Jersey and Washington have each declined to find a constitutional right to same sex marriage. Four states have civil union laws providing full state-level spousal rights (CT, NJ, NH and VT) while six have domestic partnership laws that provide varying degrees of spousal rights (DC, HI, ME, OR, WA plus the California law at issue in this decision).
In light of the California decision, let's take a look at public opinion on same-sex marriage and how opinion has responded to past events.
A typical question asks "Do you strongly favor, favor, oppose, or strongly oppose allowing gays and lesbians to marry legally?" (This is the form used by the Pew Research Center polls. There is considerable variation in question wording, but most polling has used a similar dichotomy between favoring gay marriage or opposing it. I've collapsed "degrees" of support or opposition into a dichotomous measure for all polls.) The earliest use of such a question I could find dates back to September 1985, but it was not until 1992 that the question began to be asked regularly. There was a flurry of interest in the question following the Massachusetts ruling and during the 2004 election campaign.
If we rely on that first poll alone, in 1985 82% of the public opposed same sex marriage, while only 11% supported it. By the early 1990s, when the data become richer, opposition was at about 65% while support stood at about 28%. Congress passed, and President Clinton signed, the federal "Defense of Marriage Act" in September 1996, but public opinion trends seem not to have noticed at all, neither rising nor falling around that time. By the week of the California ruling, May 15, 2008, opposition had declined to about 55% while support had grown to 40%. The net effect of some 16 years of public debate was a 10 point decline in opposition and a 12 point rise in support.
But that trend was not uniform. The Massachusetts ruling, and the 2004 election campaign, coincided with a sharp, if relatively short term, disruption of the previous slow but steady decade long shift of opinion. The Massachusetts Court decision placed the issue squarely on the public radar, and the 11 state ballot proposals in the 2004 election created the setting for public debate and political exploitation of the issue.
During the year from November 2003 to November 2004, opposition to same-sex marriage rose by five points, from 55% to just over 60%. Meanwhile support fell by about eight points, from 38% to 30%, then rebounded by a point or so by election day. (These shifts slightly predate the Massachusetts decision, probably reflecting the increased visibility of the issue prior to the Court's ruling.) The impact of these shifts and of the 11 referendums that were passed on the presidential election remains debatable. Initial punditry credited the referenda with helping defeat John Kerry, especially in Ohio. More careful subsequent analysis doubts much of an effect, however.
These sharp shifts in trend reversed direction immediately following the 2004 election, but took more than two years to return to pre-2004 levels. Support returned to 2003 levels in mid-2007 while opposition has only now, in May 2008, declined back to where it stood in mid-2003. Despite this slow recovery from the 2004 "shock", the 2005-08 trend lines make it clear that public opinion returned to its previous trajectory of slowly rising support and declining opposition in the aftermath of 2004. It is also interesting that the 2006 elections, with 8 states voting on referenda, made no discernible difference to the post-2004 trend. In part this may reflect the more limited number of states, but it also reflects some decline in the saliency of the marriage issue.
The California ruling, and the likely campaign over a proposition there to modify the state constitution this fall, will test whether increasing the salience of the issue will result in a replay of the 2003-04 dynamics, with opponents stimulated and supporters in retreat, or if the 2006 experience means that the issue is no longer the motivator it was in 2004. The 2003-04 data clearly show the potential for sharp changes when the marriage issue becomes extremely salient. That the fight will take place in the most populous state in the Union also guarantees national exposure. However, the fact that most states have already settled this issue through law or amendment, and that only three states (so far) are on track to have proposals on the ballot, means that the issue is more localized than it was in 2004.
Opinion now is not much different from where it was in mid-2003, so a similar reaction is possible but there may be an element of "been there, done that" as well. The novelty of the issue is surely much lower now than it was five years ago, though the record of referenda passing in 7 of 8 states in 2006 certainly demonstrates that opposition to same-sex marriage remained strong even in a very pro-Democratic election year. (Wisconsin, for example, reelected a Democratic governor and flipped a House seat to the Democrats but also modified its constitution to ban same sex marriage or anything substantially equivalent to marriage.)
The big question is whether the marriage issue has any carry over to the presidential vote in 2008. Democratic politicians, including Senators Clinton and Obama, have tried to insulate themselves by opposing gay marriage. Instead, they support civil union or domestic partner legislation. Senator McCain opposes same sex marriage and opposes legal recognition of same sex partnerships, but also opposes a federal constitutional amendment. This line of debate, with both parties opposing marriage, but with Democrats willing to support some legal recognition short of marriage, reflects another way of framing the question, one that is significantly more favorable for limited rights for gays and lesbians.

(Note: This chart is scaled the same as the previous chart so the dynamics and time frame are directly comparable. The large white space prior to 2000 reflects the politically relevant point that in that time period the "civil union" option was not prominent enough to be included in polling questions.)
Beginning in 2004 (with one early exception in 2000), polling organizations began asking a question with three alternatives. The CBS News question wording is representative:
Which comes closest to your view? Gay couples should be allowed to legally marry, or gay couples should be allowed to form civil unions but not legally marry, or there should be no legal recognition of a gay couple's relationship?
When the "civil unions" option is added, opposition to gay rights drops significantly from about 55% to 40%. Likewise, support for gay marriage drops from 40% to 29%. The "comfortable" middle ground is then some 26% who are willing to support civil unions so long as they fall short of "marriage".
This "half a loaf" approach is acceptable to only some in the gay rights community, but it is precisely the politically acceptable position that Democratic politicians think can move them from the losing side of public opinion to the winning side. If we add supporters of marriage to supporters of civil unions, we get the chart below.

This is now a near mirror image of the balance of opinion in the first chart. Now about 53% support either civil unions or marriage, and a minority of 40% oppose any legal rights for gay and lesbian couples. By assuming supporters of marriage will not punish them for the expedient support of only civil unions, Clinton and Obama (and many other Democrats) have tried to turn a losing position into a winning one.
The remaining uncertainty is whether opponents of any legal recognition are more intense than the supporters of civil unions. If so, then opposition groups may still win the battle between intense minority and lukewarm majority. On ballot propositions, the record is strongly in favor of the opponents of marriage and in some cases of civil unions as well.
The Clinton-Obama position will certainly not win over opponents of any form of legal recognition for gays, but then they probably wouldn't win many such voters in any case (an exception is African-Americans, many of whom are quite opposed to marriage or civil unions). Whether their position provides them popular support in response to attack ads on this issue remains to be seen.
By Charles Franklin on May 21, 2008 1:54 PM
May 6, 2008
By Charles Franklin

Both standard and sensitive estimators are agreed in North Carolina. In Indiana there is a little bit of room between them, but not enough to affect conclusions about the probable outcome (if the polls are right!)
The gyrations the Indiana sensitive estimator for Clinton goes through, thanks to variability in polls and relatively few polls, are a good warning that the sensitive estimator may just be a bit too ready to chase after noise.

Cross-posted at Political Arithmetik.
By Charles Franklin on May 6, 2008 3:48 PM
By Charles Franklin

With the last of the preelection polls in, we can now do our "apples to apples" comparison. Follow each pollster in the charts to see who's high, who's low and who has jumped around.
Note this is for the Obama minus Clinton MARGIN (which makes it easier to plot all the polls in one, still jumbled, chart.)
And check back tonight as the votes roll in to see who nailed it and who missed. In North Carolina all agree on the winner, only the margin is in dispute. But Indiana has a little disagreement on who is ahead. Fun!

Cross posted at Political Arithmetik.
By Charles Franklin on May 6, 2008 3:26 PM
May 5, 2008
By Charles Franklin

One of the things we think about a lot at Pollster.com is the quality of polling. Mark Blumenthal's post on the North Carolina poll demographics here is a great example of how much variability we see among polls, all trying to hit the same target population.
This issue is also raised by those who would like to exclude some polls from our trend estimates. If one "bad apple" spoils the barrel, then this is a serious issue for our efforts to estimate the state of the races here.
We've stuck to our principle that we include all available polls without cherry picking (to shift the fruit metaphor!) but we don't do that out of blind faith. Rather we do it because the empirical evidence shows that the effects of single pollsters are generally small, certainly compared to the other sources of uncertainty about the state of the race.
Here I take a look at this issue for North Carolina and Indiana.
There are four elements that affect how much a pollster influences our trend estimate.
First, the pollster's results must be "different" from the trend we'd estimate without them. If a pollster happened to hit our trend dead on every time, their influence would reinforce our trend estimate, but not change it. So for a poll to affect the trend, it needs to be different from what we'd otherwise estimate.
Second, the pollster needs to produce results that are systematically different from the trend. If a pollster bounces around the trend, some high and some low, then the net effect is small, even if individual polls are rather far off the trend.
Since the trend is determined across all pollsters, these first two points are another way of saying that the pollster must differ from what other pollsters are getting.
Third, volume matters. In some states, a single pollster accounts for a substantial proportion of all polling, while other pollsters contribute only a single poll. The former obviously have more potential influence than the latter. But high volume of polls doesn't matter if they are consistently close to (and scattered around) the trend estimate based on other polling. The problem comes when the prolific pollster is also rather different from others, and especially if there are few other pollsters active in the state.
Fourth, polls late in the game can have more leverage on the "current" trend estimate. So a pollster that does several polls but only in the last week before election day can have more influence on the current estimate than they would if those polls were spread over the entire pre-election period. Again, such an effect is only visible if the late polls are different from other polling.
Having an effect on the trend could be a very good thing if the pollster is right while others are wrong. The problem is how do you know a priori which pollster will be right THIS TIME. Experience this year demonstrates that a good day can be followed by a bad day, or both on the same day.
It is also important to put these effects in perspective across all polls we see in a race. The individual polls are highly variable. Our data often show polls falling within plus or minus 5, 6 or even 7 points of our estimated trend for an individual candidate, and double that for the margin between two candidates. There is a lot of noise out there, and the whole point of our trend estimator is to extract the signal from the noise. Our estimator (especially the "standard" estimator I'm using here, as opposed to the "sensitive" estimator we also check) is designed to resist polls that are "way off" (i.e. outliers) but at the same time be able to follow the common trend across polls. (I'm not going to go into the details of our local regression estimator here, which is not a simple rolling average. Let's hold that for another day. The FAQ on this is coming.)
So let's take a look at the North Carolina plot way up there at the top of this post. The horizontal axis is scaled to show the range of poll results we've seen in the state since April 1. This provides perspective on how much variation you see from poll to poll in the raw results.
The red "whiskers" at the bottom of the plot are the individual polls taken over this time. There is a bit more than a 25 point range in the Obama-Clinton margin during this period. Since the trends in the state have been relatively flat, only a little of this variation is due to "real change".
Our trend estimate based on all polls is the vertical blue line, which as of Monday afternoon is +8.6 points in Obama's favor.
How much do individual pollsters matter for this estimate? PPP has done the most polling in the state. If we take them out, the trend estimate drops to 7.0, a shift of 1.6 points on the difference (or an average of .8 points for each candidate, moving in opposite directions of course).
At the opposite extreme, removing Insider Advantage from our estimator produces a 10.7 point Obama lead, a shift of 2.1 points on the difference, or 1.05 points per candidate.
For most other pollsters, the effect is far smaller, even for relatively frequent pollsters such as SurveyUSA and ARG.
So the maximum effect of removing a single pollster is a shift between a 7.0 and a 10.7 point Obama lead. A shift of 3.7 points on the difference can matter in a close race, but that difference is relatively small compared to the variation we see in individual polls. Indeed, the four polls completed 5/4 show a range of +3 to +10 for the Obama margin. (They average a +7.25, compared to our trend estimate of +8.6.)
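For readers who want to try this kind of "leave one pollster out" check themselves, here is a minimal sketch. It is not our estimator: a plain average stands in for the local regression trend, and the pollster names and margins in `polls` are invented purely for illustration. The last few lines just redo the North Carolina arithmetic quoted above.

```python
from statistics import mean

# Hypothetical (pollster, Obama-minus-Clinton margin) pairs, invented for illustration.
polls = [
    ("A", 12), ("A", 10), ("A", 11),
    ("B", 3), ("B", 5),
    ("C", 8), ("D", 9), ("E", 7),
]

def trend(margins):
    # Stand-in for the trend estimator: a plain mean, NOT local regression.
    return mean(margins)

def leave_one_pollster_out(polls):
    """Return the all-polls estimate and the shift caused by dropping each pollster."""
    full = trend([m for _, m in polls])
    effects = {}
    for name in sorted({p for p, _ in polls}):
        rest = [m for p, m in polls if p != name]
        effects[name] = trend(rest) - full
    return full, effects

full, effects = leave_one_pollster_out(polls)
for name, shift in effects.items():
    print(f"drop {name}: estimate moves by {shift:+.2f}")

# North Carolina figures quoted in the post.
all_polls, without_ppp, without_ia = 8.6, 7.0, 10.7
shift_ppp = round(all_polls - without_ppp, 1)     # shift on the margin
shift_ia = round(without_ia - all_polls, 1)
max_spread = round(without_ia - without_ppp, 1)   # largest gap between exclusions
```

In the toy data the prolific pollster "A" sits consistently above everyone else, so dropping it moves the estimate the most. That is the combination of volume and systematic difference described in the four points above.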
There is less polling in Indiana, so we might expect more influence since there are fewer polls to stabilize the trend estimator.

Here the current estimate using all polls is -6.2, a lead for Clinton. The range of results we get from excluding pollsters is from -4.1 (excluding SurveyUSA) to -8.7 (excluding Zogby). That is a bit larger than North Carolina, as expected. But put this in the perspective of the range of raw poll results for Indiana, which is from -16 to +5 in polls taken since April 1. The six latest polls as of Monday, all ending on 5/4, range from -12 to +2.
To sum up: which polls we include affects our results. That both has to be and should be. We WANT the data to matter, and of course it does. What we don't want is for individual polls to make such large differences for our results that inclusion or exclusion decisions become critical. The results we see here show that we SHOULD be somewhat uncertain as to the trend, as it depends upon which individual pollsters are included. What is somewhat different in our approach at Pollster.com is we want to emphasize this uncertainty and put it in perspective, rather than produce a single number and treat that as if it were "certain". That is why we always show the individual polls spread around our trend estimate in the charts. All estimates have uncertainty. We need to understand both the value of the estimate and the uncertainty inherent in it. Pollster effects are part of that story.
However, what is crucial is that these effects on the trend estimate are small compared to the range of variability we see across individual polls. The goal of our trend estimator is to produce a better estimate than what any single poll (or pollster) can provide. By that standard pollster effects on the trend are modest compared to the variability across individual polls.
Evaluating the accuracy of the polls is a different topic, one we'll revisit again on Wednesday.
Cross-posted at Political Arithmetik.
By Charles Franklin on May 5, 2008 6:17 PM
By Charles Franklin

As we close in on tomorrow's primaries in North Carolina and Indiana, the "standard" and "sensitive" trend estimates have largely converged.
In North Carolina the standard estimator puts Obama at 50.1% and Clinton at 41.5%. The sensitive estimator has it Obama 49.5% and Clinton 42.2%. Or, a margin in the standard trend of +8.6 for Obama vs +7.3 in the sensitive estimate.

In Indiana, the standard estimator puts Clinton up 49.5% to 43.3% for Obama. Switching to the sensitive estimator makes it Clinton 51.2% to Obama's 43.5%. Or a Clinton advantage of 6.2% for the standard estimator versus 7.7% for the sensitive one.
Either way the polls are seeing a split decision tomorrow. Anything else will be a very interesting surprise.
Cross-posted at Political Arithmetik.
By Charles Franklin on May 5, 2008 4:11 PM
April 23, 2008
By Charles Franklin

Pennsylvania was a pretty good night for most pollsters, certainly compared to some earlier primaries this year. A few made it into the "five-ring" of the target, while almost all were within the "ten-ring". Only two polls, one rather old, got the winner wrong.
Polls finished on or after April 14 are included in the analysis here.
These errors are based on the vote counts at the Pennsylvania Secretary of State web site as of Wednesday afternoon, with 99.44% of precincts reporting and a Clinton vote of 1,237,696 to Obama's 1,029,672, which rounded to 1 decimal point is 54.6% to 45.4%.
There are a number of different ways to compute accuracy for individual pollsters. SurveyUSA has an excellent assessment and explanation of these as well as measures for all pollsters in all primaries this year. (Their Pollster Report Card is currently masked, awaiting a 100% count from Pennsylvania, so I can't link to it right now. I don't expect the remaining precincts to change the 1 decimal point accuracy here, though I will check and update if necessary.)
The measure of accuracy I use here is being close to the "bullseye" of the target above. I think that is what most people would intuitively think of as accuracy-- getting both candidates right. A "perfect" poll would be exactly on the crosshairs in the middle of the target, which corresponds to getting both candidates' votes exactly right.
Because polls almost always include "undecided" voters, their results tend to be in the lower left quadrant, underestimating the final vote for each candidate. (And in a two candidate race, it is impossible to be in the upper right quadrant, but not so in multi-candidate primaries earlier in the year.)
To summarize a pollster's accuracy, I calculate the distance from their poll to the crosshairs of the bullseye. (The distance is the square root of the sum of squared errors for each candidate, if you recall your math about triangles and the hypotenuse.) This "Total Error" is plotted by pollster below. Smaller errors are to the left.

Quinnipiac gets the bragging rights by this measure, with their 51%-44% from polling completed 4/18-20/08. They are followed by Suffolk, ARG and SurveyUSA.
The dots become darker the closer to election day the poll was taken. In this plot, the more recent polls are usually more accurate than are older polls. This is especially clear in the Zogby/Newsmax polling.
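As a quick worked example of the "Total Error" calculation, here is a short sketch using only numbers quoted in this post: the vote counts from the Secretary of State site and Quinnipiac's 51%-44%. The function and variable names are my own labels for illustration.

```python
import math

# Vote counts quoted above (99.44% of precincts reporting).
clinton_votes, obama_votes = 1_237_696, 1_029_672
two_way_total = clinton_votes + obama_votes

# Two-candidate vote shares, rounded to one decimal point as in the post.
clinton_pct = round(100 * clinton_votes / two_way_total, 1)
obama_pct = round(100 * obama_votes / two_way_total, 1)

def total_error(poll_clinton, poll_obama, actual_clinton, actual_obama):
    """Distance from a poll to the bullseye: sqrt of the sum of squared errors."""
    return math.hypot(poll_clinton - actual_clinton, poll_obama - actual_obama)

# Quinnipiac's final poll, 51%-44%.
quinnipiac_error = total_error(51, 44, clinton_pct, obama_pct)
print(f"Vote: {clinton_pct}% to {obama_pct}%; Quinnipiac total error: {quinnipiac_error:.2f}")
```

Both of Quinnipiac's errors are negative (the poll is below the vote for each candidate), which is the lower left quadrant described above.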
A reasonable complaint about this measure is that if a poll finds more "undecided" voters, it will tend to be further away from the bullseye and so this measure penalizes pollsters who are more sensitive to potential uncertainty among voters, while possibly rewarding those who push respondents harder for an answer. Deciding how hard to push for a preference is part of the "art" of polling and reasonable pollsters may differ on how hard to push.
An alternative measure focuses on the "margin" between the candidates in the poll compared to the vote. By this approach a poll with a 10 point margin is "right on" if the vote margin is 10 points. But this is true for a poll that has 55-45 as well as for one that has it 45-35 or even 25-15. Despite this drawback, the margin measure doesn't penalize for undecided rate and so it has fans. By that measure the pollsters line up as below.

Here two pollsters can each claim victory. Suffolk and Zogby/Newsmax each had a 10 point margin in their final polls, just a bit over the 9.2 point margin in the vote count. The Insider Advantage final poll has a larger error, with a 7 point margin in their 4/21 poll, while the 10 point margin was for their 4/20 poll. Likewise, Rasmussen's 9 point margin came from a very old poll taken 4/14 (50%-41%) while their final poll taken 4/20 had a 5 point margin (49%-44%).
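To make the contrast between the two measures concrete, here is a small sketch. The Rasmussen shares and the 54.6%-45.4% vote are the figures quoted in this post; the three polls in `same_margin_polls` are the hypothetical 55-45, 45-35 and 25-15 cases mentioned above, not real polls.

```python
import math

ACTUAL_CLINTON, ACTUAL_OBAMA = 54.6, 45.4  # vote shares quoted in the post

def margin_error(poll_clinton, poll_obama):
    """Poll margin minus vote margin (positive means the poll margin was too big)."""
    return (poll_clinton - poll_obama) - (ACTUAL_CLINTON - ACTUAL_OBAMA)

def total_error(poll_clinton, poll_obama):
    """Distance to the bullseye, for comparison with the margin measure."""
    return math.hypot(poll_clinton - ACTUAL_CLINTON, poll_obama - ACTUAL_OBAMA)

# Rasmussen's two polls quoted above.
rasmussen_old = round(margin_error(50, 41), 1)    # 4/14 poll
rasmussen_final = round(margin_error(49, 44), 1)  # 4/20 poll

# Hypothetical polls that all show a 10 point margin.
same_margin_polls = [(55, 45), (45, 35), (25, 15)]
for c, o in same_margin_polls:
    print(f"{c}-{o}: margin error {margin_error(c, o):+.1f}, total error {total_error(c, o):.1f}")
```

All three hypothetical polls get the same margin error, while their total errors are wildly different. That is the drawback of the margin measure, and also the reason it does not penalize a high undecided rate.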
Cross-posted at PoliticalArithmetik.com.
By Charles Franklin on April 23, 2008 5:22 PM
April 22, 2008
By Charles Franklin

Clinton has increased her lead in the trend estimates over the course of the last polls to 6.6 points using the standard estimator, and to 8.4 points using the sensitive estimator. Last minute polls have given her bigger margins.
Now the key question is whether undecideds push her over a 10 point win, or whether increases in turnout by new "unlikely" voters raise Obama's total.

Still a good bit of variation and some pollsters see a strong trend, others not so much.

Pollster variation doesn't make a lot of difference in our trend estimates.
But remember, since the polls don't allocate undecided, both they and the trend estimates are leaving some 8 percent of voters on the table. They will go somewhere, and if they break disproportionately for Clinton you have a "huge win", while if they go overwhelmingly for Obama you have a nail biter or a dramatic come-from-behind win. In previous primaries, the "winner" has usually enjoyed a significant increase in support beyond what the last polls showed.
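The arithmetic behind those scenarios is simple enough to sketch. This assumes only the 6.6 point standard-estimator lead and roughly 8 percent undecided mentioned above, and it assumes every undecided voter ends up with one of the two candidates. The function names are mine, for illustration.

```python
def final_margin(lead, undecided, clinton_share):
    """Clinton margin after undecideds split, with clinton_share going to Clinton."""
    return lead + undecided * (2 * clinton_share - 1)

def share_needed(lead, undecided, target_margin):
    """Share of undecideds Clinton needs to reach target_margin."""
    return 0.5 * (1 + (target_margin - lead) / undecided)

LEAD, UNDECIDED = 6.6, 8.0  # figures from the trend estimate above

even_split = final_margin(LEAD, UNDECIDED, 0.5)
all_clinton = final_margin(LEAD, UNDECIDED, 1.0)
all_obama = final_margin(LEAD, UNDECIDED, 0.0)
needed_for_10 = share_needed(LEAD, UNDECIDED, 10.0)

print(f"even split: {even_split:+.1f}, all Clinton: {all_clinton:+.1f}, all Obama: {all_obama:+.1f}")
print(f"share of undecideds Clinton needs for a 10 point win: {needed_for_10:.0%}")
```

Under these assumptions, an even split leaves the margin where it is, and Clinton needs a bit over 70% of the undecided to reach a 10 point win.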
Cross-posted at PoliticalArithmetik.com.
By Charles Franklin on April 22, 2008 8:25 AM
April 21, 2008
By Charles Franklin
Senator Clinton currently holds a 6 point lead over Senator Obama in Pennsylvania, based on our Pollster Trend Estimate, 49%-43%. But that leaves about 8 percent undecided. What they do will determine whether Clinton's vote expands her lead compared to the polls, or whether the undecided narrow, or possibly reverse, the lead.
My partner at Pollster, Mark Blumenthal, has looked at this using aggregate polling data here and in his NationalJournal.com column here.
In this post I take a look at the individual level, though using data that are three weeks old, so use caution in extrapolating to tomorrow's electorate.
Using data from the Time/SRBI poll of Pennsylvania, conducted 4/2-6/08, I estimate a model of support for Obama compared to Clinton. I use "the usual suspects" as variables predicting vote: partisanship, gender, race, Hispanic ethnicity, region of the state, age, education, religion and income. The data at that time found an eight point Clinton lead, a bit higher than today's trend estimate.
Using the coefficients for "decided" voters, I can estimate the probable vote of the undecided 11% of voters in the poll. This gives us a look at how they would be expected to behave IF they behave like those who have already picked a candidate. (Note the "if" here. As with all models, this assumes stable influence of the variables among the undecided as among the decided.)
The plot above shows the distribution of estimated probability of voting for Obama. Values close to zero are very likely to support Clinton, while values close to 1 are very likely Obama supporters. Those close to .5 are flipping a coin. The shape of the distribution gives a sense of where voters "lump up" in their estimated preferences.
The black line plots the distribution among those who reported a vote preference. The red line plots the distribution of estimated support among those who said they were undecided in early April.
The key point is that the undecided resemble the decided, with a small shift to the left, suggesting they were as a group somewhat more likely to support Clinton. In these data, the primary difference between undecided and decided voters was age, with older voters more likely to say they hadn't decided. As we've seen in virtually every exit poll, older voters are more likely to support Clinton, so the result we find here, that the undecided lean a bit more towards Clinton, is consistent with this result.
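For those curious about the mechanics, here is a toy sketch of the approach: fit a model on decided voters, then apply its coefficients to the undecided. Everything in it is an assumption for illustration. The respondents are invented, the only predictor is an "older voter" indicator, and the logistic regression is fit by simple gradient ascent. The actual analysis uses the Time/SRBI respondents and the full set of predictors listed above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, lr=0.5, steps=5000):
    """Fit intercept and one slope by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Invented decided voters: older = 1 for older voters, obama = 1 for an Obama preference.
decided_older = [0] * 10 + [1] * 10
decided_obama = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

# Invented undecided voters, skewing older than the decided group.
undecided_older = [0] * 4 + [1] * 6

b0, b1 = fit_logit(decided_older, decided_obama)

def prob_obama(older):
    # Apply the decided-voter coefficients to any respondent.
    return sigmoid(b0 + b1 * older)

mean_decided = sum(prob_obama(x) for x in decided_older) / len(decided_older)
mean_undecided = sum(prob_obama(x) for x in undecided_older) / len(undecided_older)
print(f"mean P(Obama): decided {mean_decided:.2f}, undecided {mean_undecided:.2f}")
```

Because the invented undecided group is older, its estimated probabilities sit a little to the left of the decided group, which is the same qualitative pattern as in the plot. The size of the shift here means nothing, since the data are made up.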
Now again for the caveats. These data are three weeks old. The model requires the assumption that undecided voters ultimately behave like those who decided. Different variables as predictors can make a difference. And so on.
The goal here is NOT, NOT, NOT a prediction of tomorrow's vote. Much may have changed since the first week of April.
The point is to illustrate what we can learn about undecided voters beyond the simple fact they say "undecided". In this case, the data suggest they are not wildly different from those who decided, but their older age makes it more likely they ultimately lean more to Clinton.
The Time/SRBI data are archived at the Roper Center for Public Opinion Research. I am solely responsible for the analysis here.
Cross-posted at PoliticalArithmetik.com.
By Charles Franklin on April 21, 2008 3:46 PM
By Charles Franklin

Judge for yourself.
Cross posted at PoliticalArithmetik.com. See also previous pollster comparison post for Pennsylvania.
By Charles Franklin on April 21, 2008 1:52 PM
By Charles Franklin

The Pennsylvania race has turned slightly toward Clinton over the weekend, with her lead now at an even 6 points in our standard trend estimate. If you believe in taking more chances with random noise, the sensitive estimator has a 6.4 point Clinton lead.
In the rush of new polling over the weekend, it is also good to check how much any of them may be affecting our estimates.

Dropping any single pollster makes only a bit of difference to our estimates. The Clinton trend ranges from 48.5% to 49.6%, while Obama ranges from 42.6% to 43.5%. So dropping your least favorite pollster can, at most, account for the difference between a 5 point race and a 7 point one.
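The "5 point race versus 7 point race" bound comes straight from the two ranges, as this small sketch shows. Note the assumption built into it: it pairs the lowest Clinton estimate with the highest Obama estimate (and vice versa), which would only happen if the same exclusion produced both extremes. That is why this is an "at most" figure.

```python
# Ranges of the trend estimates when any single pollster is dropped (from the post).
clinton_low, clinton_high = 48.5, 49.6
obama_low, obama_high = 42.6, 43.5

# Extreme margins, assuming the extremes for both candidates coincide.
smallest_margin = round(clinton_low - obama_high, 1)
largest_margin = round(clinton_high - obama_low, 1)

print(f"Clinton lead between {smallest_margin} and {largest_margin} points")
```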
And note that we still have about 9 percent undecided. I wonder what they will do?
Cross posted at PoliticalArithmetik.com. See also previous "sensitivity" update for Pennsylvania.
By Charles Franklin on April 21, 2008 1:15 PM
April 19, 2008
By Charles Franklin

A new Newsweek poll gives Barack Obama a 54%-35% lead over Hillary Clinton among Democratic voters (story here, detailed results here, and thanks to Newsweek and their pollster, Princeton Survey Research Associates International, for a full and complete disclosure of the details of their survey. A model others should be encouraged to follow.)
The Newsweek poll raised a few eyebrows for its 19 point Obama lead, considerably more than other recent polls, and beyond the 10.4 point Obama lead in our trend estimator. However, a closer look at recent data shows that Newsweek is not far from other recent data. Newsweek is the 6th poll in April with Obama at or above 50%, while five April polls put him below 50%. With Clinton, Newsweek is the 4th April poll putting her at or below 40%, while eight polls have her above 40%. So Newsweek shows a larger Obama lead than others, but it is not as far out of line as may first appear. (Note in the counts of polls above, we only count independent samples of the Gallup daily tracker, so we don't count each of their daily results as a new poll.)
As you can see from the plots below, we've not seen many recent outliers in the national Democratic nomination polling, and the new Newsweek is well within the 95% confidence interval.

All that said, our trend estimate for the race puts Obama at 50.2% and Clinton at 39.8%, a significant gain for Obama during the month of April. Since late March, Clinton has suffered a somewhat greater downward slope while Obama's gains have been a bit more shallow, implying a slight increase in the share of undecided voters.
The Newsweek poll also has some interesting internal results. As with virtually all this year's polling, Obama has a substantial lead among Independents who will vote in the Democratic primary or who lean Democratic: 61% to 28% for Clinton. What is key to Obama's strength in the Newsweek poll is that he ALSO leads among self-declared Democrats, 51% to 38%, a group Clinton has won in most contests. If real (and I want to see more data before I accept this change) then Obama may be winning the consensus among party rank and file that will be key to persuading Superdelegates to move strongly in his direction. So long as he trails among the strongest party identifiers, that case is less persuasive. Pennsylvania provides a new test of this possible change in support. (Obama continues to trail in our Pennsylvania estimates, so it is unlikely he has so far persuaded a majority of Democratic identifiers there, though stay tuned for Tuesday's exit polls.)
The other important shift in this national Newsweek poll is that Obama leads among men 57%-31% but also among women 52%-38%. Again, this would represent an important gain among women.
The age gradient in Obama support has been interesting all year. In the Newsweek poll, he wins 18-39 year olds by 62%-28%, as usual, but also wins 40-59 year olds by 54%-36%. In past exit polls, his "break even point" has varied among age groups from as low as 40 (i.e. losing all groups over 40 years old) to as high as 59 (only losing those over 60 years old). More astonishing here is that he gains a plurality of those over 60, 47%-41%, which if true would be his best performance among older voters all year.
The area of the Newsweek poll where Obama still suffers is among working class or poor whites, where he trails badly, 35%-54%. In contrast he leads 52%-35% among upper and middle class whites. That class divide remains a critical issue for his campaign.
A caution here as well. In any poll with such high overall support, the support almost has to reach across many subgroups (not quite as a mathematical certainty, but as a strong empirical regularity). So we should be careful not to accept the depth of Obama's support among Democrats, women and those over 40 years old until we have more evidence from additional polling. In the exit polls this year, where we saw big Obama wins (VA, MD, WI) we also saw him making strong inroads among these groups. But with the margin he achieved in these states, it would have been hard NOT to have done well across groups. Be careful of the cause and effect attributions here. It is a challengin