Saturday, October 25, 2008

How to read, understand, and even trust polls

The other night, a friend asked me who I thought was going to win the election. When I started talking about polls, a second friend said, “Uhh, dude, can you really trust those? Didn’t they show Kerry up at this point?” My answer was, “Yes, you can trust them – if you know how to read them correctly.”

Indeed, polls can be trusted – even if that sounds screwy when the CBS/New York Times poll shows Obama up 13 points just two days after AP/GfK had him up by one. You just have to know how to read them. It’s not enough to glance at a headline announcing “Obama’s lead widens to six points.” Most reporters are quite irresponsible about polls, reporting not what’s scientifically sound but what’s interesting.

To that end, here are a few basic guidelines I’ve drawn up to help readers understand polls and figure out which ones they can trust. This is in no way a comprehensive guide; for a real expert, I suggest Nate Silver’s FiveThirtyEight.com. Silver also provides a handy guide to pollsters, explaining which ones follow the rules. This post, on the other hand, is a basic guide written off the top of my head for friends and regular readers. I’m just an amateur, or as this blog’s description says, “a young man still learning who he is and where he may be headed.” My credentials are simply an A- in a public opinion class, some campaign experience, an obsession with the news, and a decent legal mind.

Here, then, are nine things to check when deciding whether you can trust a poll. The first four are a little longer and more technical, but no worries – everything is explained in simple language that even a potty-trained labradoodle could understand, because quite frankly, that’s the only language *I* can understand.

  • The sample – the people actually polled – MUST be chosen randomly. Anything else is skewed and might not actually capture the public’s opinion. You can’t trust Internet polls like those from Harris or Zogby Interactive; their samples are self-selected, reflecting highly motivated voters rather than average ones. The best way to get a random sample is to make sure the poll uses RDD – random digit dialing.

  • The next question to ask is: is this a poll of Registered Voters (RV) or of Likely Voters (LV)? While LV models are historically more accurate, I actually trust RV models for this election. LV polls ask RVs whether they plan to vote in the upcoming election and whether they’ve voted in the past. The problem with this model is that it automatically weeds out newly registered voters and those too young to have voted in the last election – and this year, for the first time, we saw phenomenal turnout from exactly those voters during the primaries. In my opinion, the best models are those used by NBC News/Wall Street Journal and the “expanded” (NOT “traditional”) Gallup Tracking.

  • Similar to the LV/RV question is how the sample is weighted. If 30% of the electorate is Republican and 35% Democratic (leaving 35% independent), but the sample comes back 35% Republican and 35% Democratic, the pollster will use a formula to devalue the Republicans’ answers ever so slightly and increase the “weight” of the underrepresented independents’ answers (except for Zogby, who is fast losing credibility). This approach is used for every major demographic – race, geography, etc. Keep a close eye on weighting, because this is where polls can go horribly wrong. The reason this week’s AP poll shows the presidential race a virtual tie, when every other major poll has the lead around 7-8 points, is that a full 45% of respondents called themselves born-again Christians – roughly double the size of the actual evangelical population. This is also why so many primary polls were wrong: youth voters showed up in record numbers, completely smashing the turnout models on which most weighting is based. (A toy example of the arithmetic appears after this list.)

  • The most common question about polls is, “How can a poll of 1,000 random people accurately reflect 300 million Americans? I’m never called!” But oddly enough, it works. Exactly why is a little beyond me – it involves statistics and formulas, and that’s when my eyes glaze over – but history does show that samples of at least 500 people are pretty darn accurate. This is also where the margin of error comes in. The MoE tells you how precise the poll is likely to be, assuming the sample was representative and the weighting was done correctly. The larger the sample, the lower the margin of error, although the relationship is curved – a poll of 1,000 people is much better than a poll of 300, but not that different from a poll of 2,000. (There’s a quick sketch of the math after this list.)

  • On to the less technical stuff. I only trust polls with live questioners, none of that automated junk. Speaking to an actual person tends to produce better results than talking to a cold, distant machine, so I tend to dislike SurveyUSA and Rasmussen (although both had a good performance record during the primaries).

  • When was the poll conducted? I don’t like polls conducted over a five-day period in the heat of an election, particularly tracking polls. That’s too long a timeframe; a lot can happen. Give me just two or three days, please, and try not to split the polling days around a weekend, like Thursday, Sunday, and Monday.

  • It’s really important to look at question wording. Obviously “Do you support socialized medicine?” will draw very different answers than “Do you support single-payer health insurance?” This also matters, oddly enough, with “don’t know,” “undecided,” “not sure,” and/or “haven’t heard enough” – whether respondents are offered those outs, and which ones, can noticeably shift the numbers.

  • Studies show that the first answer listed on a test or ballot tends to get a slight bump. A good pollster will always rotate the answers – half of respondents hear Obama’s name before McCain’s, and the other half hear McCain’s first. (A tiny illustration appears after this list.)

  • Finally, it’s important to know who is conducting or sponsoring the poll. Universities, major non-commercial polling organizations with reputations to protect (like Gallup and Pew), and prominent news organizations are generally the most trustworthy. Political parties and marketing corporations, however, clearly have agendas, and should be taken not with a grain but with a full rim of salt.
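
To make the weighting point concrete, here’s a toy sketch in Python – my own illustration with made-up numbers, not any actual pollster’s formula:

```python
# A toy example of demographic weighting - not any pollster's real method.
# Suppose the true electorate is 30% Republican, 35% Democratic, 35% independent,
# but the people actually reached were 35% / 35% / 30%.
electorate = {"R": 0.30, "D": 0.35, "I": 0.35}  # assumed true shares
sample     = {"R": 0.35, "D": 0.35, "I": 0.30}  # shares actually interviewed

# Each respondent's answers get multiplied by (true share / sample share), so
# overrepresented groups count a bit less and underrepresented ones a bit more.
weights = {party: electorate[party] / sample[party] for party in electorate}
print(weights)  # R = 0.857..., D = 1.0, I = 1.166...
```

The AP example above is what happens when the target shares themselves are off: the formula faithfully amplifies a bad assumption.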
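
And for anyone who actually wants the statistics my eyes glaze over at: the margin of error pollsters report comes from (roughly) the textbook formula sketched below. Treat it as a back-of-the-envelope approximation, not any pollster’s exact method.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of n people.
    p = 0.5 is the worst case; z = 1.96 is the standard 95% confidence multiplier."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 500, 1000, 2000):
    print(n, f"{100 * margin_of_error(n):.1f}%")
# 300 -> 5.7%, 500 -> 4.4%, 1000 -> 3.1%, 2000 -> 2.2%
```

Notice the square root in there – that’s the “curve”: you have to quadruple the sample just to halve the margin of error.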
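
And the promised illustration of answer rotation – it’s nothing fancier than a shuffle:

```python
import random

# Each respondent hears the names in a random order, so neither candidate
# systematically benefits from being read first.
def question_order(candidates=("Obama", "McCain")):
    order = list(candidates)
    random.shuffle(order)
    return order
```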

I used to think that excluding cell phones would leave the youth vote underrepresented, but some polls have actually included cell phones this year, and it doesn’t seem to make a difference.

Remember also that even the best polls can only tell you how the electorate feels at the time the poll is taken; none actually predicts the future. A lot can change in a couple weeks’ time – scandals, effective ads, debates, voter turnout, etc. Don’t hold those later developments against a pollster’s credibility. Equally important, don’t ever measure a campaign by just one poll – always look at the aggregate, or take a “poll of polls” (a bare-bones version is sketched below).
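
If you’re wondering what a “poll of polls” looks like in practice, the simplest version is just a plain average, as in this made-up sketch (roughly the spirit of RCP’s average; 538’s model does far more weighting and adjusting than this):

```python
# A bare-bones "poll of polls": average each candidate's share across
# several polls. The numbers below are invented for illustration.
polls = [
    {"Obama": 52, "McCain": 41},
    {"Obama": 49, "McCain": 45},
    {"Obama": 51, "McCain": 44},
]

average = {name: sum(p[name] for p in polls) / len(polls) for name in polls[0]}
print(average)  # Obama = 50.67, McCain = 43.33
```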

FiveThirtyEight is a great place for poll aggregation and analysis, and RealClearPolitics is an excellent clearinghouse for polls and commentary that I check each morning, although I’m a little circumspect about their analysis. 538’s polls of polls are probably a little better than RCP’s.

And just like that, I’ve saved you $10,000 in tuition fees. ($8,000 if you were hoping for formulas and equations, but like I said: potty-trained labradoodle.)
