The Day The Data Died

All our oracles failed us in 2016. Now we’re left sorting through the mess.

We didn’t see this year coming, but we heard it from all sides. In Signal & Noise 2016, you’ll find the way we made sense out of all of that sound.

The closer we got to the election, the more it felt like the whole country was being held hostage by a cell of political meteorologists. Cable chyrons turned into a cascade of probabilities, and obsessively checking FiveThirtyEight became a contagious character tic. It didn’t matter that the forecast hadn’t changed for months — people just wanted to make sure that there was still a 99 percent chance of benign sunshine and not the weather patterns of a Tom Cruise action-movie climax. If a source was less certain that November 9 would be a perfect day, there was always another place willing to predict something more pleasant. The word “poll” itself was repeated so many times that it felt like penance, the equivalent of reciting 10 Hail Marys over an abacus for the sin of forcing Americans to listen to news about the election for two years.

The term lost its mooring from any meaning after the thousandth time you heard it, right about when you realized that the word “poll” looks like a pair of opera glasses staring at a large wall. That image seems somewhat fitting, although opera glasses might be too fancy for a night of listening to yet another survey saying that Hillary Clinton is going to be the next president of the United States. Maybe the lid locks from A Clockwork Orange or Ralphie’s glasses after he almost shoots his eye out in A Christmas Story would be more appropriate.

We’re getting off topic, but it doesn’t even matter. The forecast didn’t change. The political websites thought Hillary Clinton would win. The prediction markets thought Hillary Clinton would win. Hillary Clinton thought Hillary Clinton would win. Donald Trump’s campaign thought Hillary Clinton would win. The evidence was overwhelming: November 9 would be a sunny, 70-degree day.

Instead it was practically a hurricane. And thanks to hindsight, the most useful tool humans have at their disposal for pointing out the obvious only after said revelation is useless, all the reasons we should have been a bit more skeptical became clear. Of course it wasn’t a given that the Obama coalition would vote for Clinton. Maybe spending most of your money on hats and digital outreach wasn’t the worst campaign strategy. Perhaps it was unwise to discount the fact that Americans almost always vote for the opposite party after eight years of one party in the White House.

Maybe the biggest thing we should have known was to not trust the polls or data so much in the first place, although it’s hard to remember that when the numbers seemed so sure of themselves after a few election cycles of making the right predictions. But we’ve been here before.

It’s not that the polls or data were entirely wrong. The ballots are still being counted, but Hillary Clinton’s lead in the popular vote has surged to nearly 2 percentage points. Right before the election, RealClearPolitics’ polling average had Clinton up 3.2 percentage points, a miss of a little more than 1 point, which means that pollsters were better at predicting the popular vote in 2016 than they were four years ago. It also seems less than smart to make any broad proclamations against anything this year, as the final electoral result was upended by a handful of voters in a few states. But many other assumptions made by pollsters and data scientists about 2016 did turn out to be incorrect, rendering the entire narrative of this year’s election false.

And worse, we probably should have known all along that this could happen.

About a year ago, Jill Lepore authored a New Yorker piece that laid out the many ways polling has gotten harder while simultaneously taking up even more real estate during the election season. She wrote that “from the late 1990s to 2012, 1,200 polling organizations conducted nearly 37,000 polls by making more than 3 billion phone calls.” These eager statistical scavengers are chasing after a smaller and smaller sample of willing respondents. Most surveys have a response rate in the single digits. In order to make the final result look more like the population writ large, the pollsters have to go all impressionistic on the data, changing the weight of certain answers until the raw numbers give off the sense of looking like America. This necessary fudging becomes a lot harder when no one is sure what the population is supposed to look like, which happens to be the case because the electorate changes with every election and depends on a bunch of fuzzy variables like enthusiasm or access. And most polling data didn’t seem to prepare for an influx of white, non-college-educated voters, assuming that the entire population of Trump supporters was perhaps limited to those already profiled in national media outlets. The data also didn’t account for the fact that Clinton perhaps wouldn’t be able to draw record numbers of black voters like Obama did.
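For the numerically inclined, the reweighting described above can be sketched in a few lines of Python. This is only an illustration of the general idea (each group's weight is its assumed share of the electorate divided by its share of the sample), not any particular pollster's method. The function name, the group labels, and every number here are made up for the example.

```python
from collections import Counter

def poststratify(responses, population_shares):
    """Return (raw, weighted) support for a candidate.

    responses: list of (group, supports_candidate) pairs.
    population_shares: assumed share of the electorate per group (hypothetical).
    """
    n = len(responses)
    counts = Counter(group for group, _ in responses)
    # Weight = assumed electorate share / share of the sample actually reached.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    raw = sum(1 for _, supports in responses if supports) / n
    total_weight = sum(weights[g] for g, _ in responses)
    weighted = sum(weights[g] for g, supports in responses if supports) / total_weight
    return raw, weighted

# Hypothetical sample: 8 college-educated respondents (6 back the candidate)
# and 2 non-college respondents (neither does).
sample = ([("college", True)] * 6 + [("college", False)] * 2
          + [("non_college", False)] * 2)

# Same answers, two different guesses about who shows up on Election Day.
raw, guess_a = poststratify(sample, {"college": 0.4, "non_college": 0.6})
_, guess_b = poststratify(sample, {"college": 0.6, "non_college": 0.4})

print(f"raw: {raw:.1%}")          # about 60%
print(f"guess A: {guess_a:.1%}")  # about 30%
print(f"guess B: {guess_b:.1%}")  # about 45%
```

The answers never change between the two runs. Only the assumption about what the electorate looks like does, and the headline number moves by 15 points.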

Those low response rates can muck up surveys in another way. The people most likely to own landlines and pick up the phone when pollsters call also happen to be the type of people most likely to head to the polls. Other less-likely voters can be harder to reach, either because they don’t own a landline (pollsters can’t autodial cell phones) or because they don’t trust polls and refuse to take part in one. The voters who surprised everyone by voting for Trump or a third party seem like the type unlikely to pick up the phone when pollsters call. You end up relying on a small sample of opinions to understand entire demographics, which is how one young black Trump supporter ended up pushing up his candidate’s standing in the USC Dornsife/Los Angeles Times poll.
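To see how a single respondent can move a topline number, here is a rough Python sketch. The sample size, the level of support, and the weight of 30 are all hypothetical values chosen for illustration. They are not the actual figures from the USC Dornsife/Los Angeles Times poll.

```python
def share_with_lone_respondent(lone_weight, others=999, others_supporting=450):
    """Weighted support when one supporter stands in for a whole demographic.

    Every other respondent counts once. The lone respondent counts
    `lone_weight` times because almost nobody else in his group answered.
    All default values are hypothetical.
    """
    supporting = others_supporting + lone_weight
    total = others + lone_weight
    return supporting / total

ordinary = share_with_lone_respondent(1)   # counted like everyone else
heavy = share_with_lone_respondent(30)     # counted 30 times over

print(f"weight 1:  {ordinary:.1%}")
print(f"weight 30: {heavy:.1%}")
print(f"shift: {(heavy - ordinary) * 100:.1f} points")
```

In this made-up case, one person in a thousand moves the result by roughly a point and a half, which is about the size of the gap many polls were trying to measure.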

The problem is even worse at the state level, where funding for polls can be scarce. Some states are lucky to get a single survey during the entire election season. Others are stuck with only one firm for all their predictions, leaving no way to compare the results. The void is often filled by cheaper horse-race polls that become news but do little to explain why voters are leaning a certain way. The methodology behind these state-level surveys can be mysterious, or it can be clear that the pollsters are merely trying to get attention, even at the expense of good math. Although the national pollsters were surprisingly good at figuring out the popular vote, those beleaguered state surveyors suffered from the lack of love and money. In Michigan, Pennsylvania, and Wisconsin, the polls all had Clinton up in the weeks before Election Day. Trump won all three.

These worries aren’t new, however, and neither are the assumptions that drive their relevance. It’s always been hard to figure out who exactly a likely voter is supposed to be, and state polling firms have been desperate for cash for years. But things are getting worse. A decade ago, academics were worried about a 20 percent response rate, not the single-digit one that awaited them in 2016, and they had yet to start freaking out about what a hungry and cash-strapped news landscape addicted to horse-race polls would mean for our country’s ability to understand an election.

A newer problem threatens to make anxieties about accuracy secondary anyway. Just as news has slowly transmogrified into confirmation-bias comfort food, there to let people read only information that tells them what they already wanted to know, so have raw numbers become a malleable medium that can be harvested to make any possible point. And because there are more poll numbers and more news outlets analyzing said numbers than ever, it’s easy to turn data points into fairy tales.

Donald Trump, the Hans Christian Andersen of his own election cycle, was the exemplar of this field of self-delusion. Since he had no government experience, poll numbers were his chief character reference early in the primary campaign. He mentioned them during every debate, during every rally, in the middle of every TV hit. “I mean, right now I love polls,” he told ABC News in 2015, “because I’m winning everything.” The fact that early primary polls are often unreliable didn’t matter to him, and that point became irrelevant after their findings were proven correct. He cited pish-posh post-debate push polls with abandon at a time when the more accurate polls turned against him. After a while, he stopped citing them completely, telling one crowd in Colorado, “When we do badly, I don’t talk about the polls. When we’re doing well, I talk about the polls.” Many of his supporters used the same misguided logic to conclude that he would be victorious. It seems unlikely that fears of rigged polls and love for Halloween mask sales as a prediction tool are about to go away now that his supporters’ view appears entirely justified.

Clinton supporters were similarly guilty of using data as therapy. One political-science professor I spoke to said that one of his students stopped looking at FiveThirtyEight after the forecast started saying Clinton had only a 65 percent chance of winning, instead preferring to look at Sam Wang’s forecast, which still had her chances at 99 percent. Near the end of the election, Wired published a story titled, “I Just Want Nate Silver to Tell Me It’s All Going to Be Fine.” Now that people feel betrayed by these predictions that were only ever probabilities, it’s clear that it would have been more useful to remember that Clinton still had a possibility of losing, instead of hiding all the evidence.

There were signs in the days before the election that things could be much more up in the air than people presumed. As Silver noted in a post defending his forecast, there were far more undecideds left to choose sides than there were in 2012. The race was too close to call, which is what some political scientists predicted last year might happen.

Embracing uncertainty, or at least acknowledging the assumptions that power polling and forecasting, still wouldn’t change the role that these predictions play in presidential elections — a job that sends supporters to the numbers like they’re ironclad Westworld theories. Even as changes in the polling industry make it wise to treat them like a side dish, many media outlets have become more reliant on them to create continual “news” for reporters to cover while simultaneously offering a sacrifice at the altar of the horse race. Unfortunately for consumers, sometimes there isn’t an easy answer for the vagaries of polling data. And in an election like this one, where all the data seemed to show that the final tally could be close, it is perhaps wisest to look at the polls or forecasts and say, “I don’t know what this means,” or “nothing has changed,” or “this means nothing,” or “why are you looking at me, I have no idea who is going to win.” These answers, however, do not take up much airtime, and such sexy analysis is unlikely to win you many fans.

Realizing that polls and data may not have all the answers — especially if the assumptions you make to use the numbers are completely off base — might be a useful lesson for campaigns to learn too. It is perhaps even more essential after multiple campaign cycles of fawning profiles about the data gurus who happened to be in the office during both of Barack Obama’s victories. Everyone loves a data oracle that was proven right in retrospect — until they get unlucky and are banished to the dustiest circle of crosstabs hell. The Clinton campaign was planning to announce after its victory that an algorithm named Ada guided all of its campaigning decisions. Her analytics director was one of the highest-paid staffers, and her campaign manager told Politico that they “relied almost entirely” on the advice given by the algorithm. The algorithm’s name was borrowed from the Countess of Lovelace, credited with being the first programmer. Her father was Lord Byron, and her mother forced the girl into science and math in order to subdue any genetic inclinations she might have for poetry. Her studious path left a woman seen as being “not devoid of imagination, but … chiefly exercised in connection with her mechanical ingenuity.” Clinton could probably relate.

In the end, however, complete faith in the data was misguided, as the assumptions used to make the algorithm correct were wrong. Clinton never visited Wisconsin during the general election, and an overpowering ground game can’t save you if deployed to the wrong places. Trump once called data “overrated,” but his campaign ended up spending millions on it in an effort called Project Alamo. But he didn’t win because of the data, any more than President Obama did. As one data scientist told Wired, “Statistics and data science gets more credit than it deserves when it’s correct, and more blame than it deserves when it’s incorrect.” And while reporters tend to obsess over the new and nifty, the hottest trend isn’t always responsible for a candidate’s victory.

Just because a prediction or tactic worked once doesn’t mean it’s foolproof. Even those who make good predictions most of the time get it wrong eventually — hi, Nate(s)! — even though the faith in said prediction maker will only grow stronger the more they are right, making their inevitable failure all the more painful.

Polls have been wrong many times before now — as Lepore notes in her story, Edward R. Murrow responded to the surprise 1952 election results by concluding that “the people surprised the pollsters, the prophets, and many politicians. They are mysterious and their motives are not to be measured by mechanical means.” They’ll be wrong again, especially in close elections like this one — and polling firms will try to make adjustments. Meanwhile, Americans will become more distrustful of the polls and the media, refusing to believe sources that interpret the data correctly, instead preferring those who use numerical toe jam to keep things interesting, or at least ideological. And if we’re stuck with polls forever, we could at least acknowledge the fact that they might be wrong, or that the data can’t always tell you who’s going to win, or that the only correct answer to a lump of polling data is sometimes “Who the heck knows?”

Ultimately, 2016 was about misunderstanding everything, even hard numbers. Judging from the past few weeks, however (there are more than 150 million Google results for “why Donald Trump won”), it seems unlikely that many lessons have been learned about not explaining or predicting things with 100 percent certainty. Unfortunately, one of the tools that would be useful in trying to figure out what actually happened in 2016 would be massive amounts of state-level polling data about the thoughts of voters in states where Trump excelled. But what we have instead is a bunch of horse-race numbers that may have seemed interesting two months ago but are now as useful as an expired bag of bedazzled Miracle-Gro. Those poll numbers might have been nice then, but they aren’t going to help you understand the reality unfolding now. And unlike less-than-comforting polls or forecasts, the world doesn’t lose its relevance the day after it shows up. At some point we need to learn some lessons instead of just steamrolling forward. Otherwise things are going to keep getting worse while we keep making assumptions that delude us into thinking the world will always go according to plan, missing what’s happening right in front of our eyes.
