Did Big Data Fail Us in the Presidential Election?

At the Demystifying AI conference, I gave (what was supposed to be) a lightning talk on Big Data and the Presidential Election. The awesomely engaged audience kept it going well over time, and I came out of it with great insights and thoughts to put down in a blog post.


What appealed to the audience was my 'post-mortem' approach, a name that I find deliciously macabre. The goal of the approach is to look at a project after the fact and analyze its successes and failures at every step of the way. In this case, I looked at the presidential election, and called the 'project' of predicting the outcome a 'failure' because of our astounding inability to predict that Donald Trump would win more Electoral College votes than Hillary Clinton. Let's unpack the discussion, slide by slide.

SLIDE 1: Big Data Failed Us This Election

The premise of the talk. What we know is that not a single major poll was able to accurately predict the outcome of this election. In fact, we failed SPECTACULARLY. 

It is worth understanding that we have standalone polls, run by groups like Gallup that conduct their own polling, and we have metapolls, which are amalgamations of other polls, such as RealClearPolitics. This is an important distinction, because the former are responsible for executing their own surveys and locating their own samples, while the latter are aggregates that rely on an 'ensemble' of polls.

Also worth noting is that this election was supposed to be the launch of Votecastr, which promised real-time, accurate insights into the election outcome. It, too, failed to accurately predict the outcome.

As a result, there has been a lot of post-election big data backlash, some of the more dramatic headlines being the inspiration for this talk's title. Let's unpack why. 

SLIDE 2: Our Understanding of Big Data Failed Us This Election

Even within the polling community, the predictions were inconsistent. The New York Times Upshot model, for example, gave Clinton a ~85% chance of winning, while FiveThirtyEight gave her a ~72% chance. That's a pretty big difference.

Let's look deeper. While the polls all agreed that she would win, they disagreed on the methodology to predict how. Most publicly, Nate Silver got into a heated battle with Huffington Post on his use of trend adjustment, which HuffPo called "changing the results of polls to fit what he thinks the polls are, rather than simply entering the poll numbers into his model and crunching them." 


Rather than taking a simple average -- like RealClearPolitics does -- Silver’s model weights polls by his team’s assessment of their quality, and also performs several “adjustments” to account for things like the partisanship of a pollster or the trend lines across different polls. Still other models take historical trends and demographic shifts into account. There is no clear consensus on a 'best' model.
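
To make the difference between those two approaches concrete, here is a minimal Python sketch. The poll leads and quality weights are invented for illustration, and the function names are my own. This is not Silver's or RealClearPolitics's actual method, just the bare idea of a plain average versus a quality-weighted one.

```python
# Hypothetical polls: (Clinton lead in points, quality weight from 0 to 1).
# Every number here is made up for illustration only.
POLLS = [(4.0, 1.0), (2.0, 0.5), (-1.0, 0.5)]


def simple_average(polls):
    """Average the leads, treating every poll equally."""
    return sum(lead for lead, _ in polls) / len(polls)


def weighted_average(polls):
    """Average the leads, letting higher-weighted polls count for more."""
    total_weight = sum(weight for _, weight in polls)
    return sum(lead * weight for lead, weight in polls) / total_weight


if __name__ == "__main__":
    print(f"simple:   {simple_average(POLLS):+.2f}")
    print(f"weighted: {weighted_average(POLLS):+.2f}")
```

Same polls, two different headline numbers. That is why the choice of aggregation method is worth arguing about.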

SLIDE 3: Our Explanation of Big Data Failed Us This Election

Talking data to media outlets is a dangerous game of telephone. In my opinion, it is the data scientist's responsibility to be as clear and as unambiguous as possible about the true meaning of their model. What is the error margin? What is the degree of confidence? What does a "75% chance" mean? (Hint: it doesn't mean that there is a guarantee of winning.)
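
One way to make a "75% chance" tangible is to simulate it. The sketch below is a toy, assuming each simulated election is an independent coin flip weighted by the stated probability; the function name and the trial count are my own choices, not anything a forecaster actually publishes.

```python
import random


def upset_rate(win_probability, trials=100_000, seed=0):
    """Fraction of simulated elections that the favorite loses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    losses = sum(1 for _ in range(trials) if rng.random() >= win_probability)
    return losses / trials


if __name__ == "__main__":
    # A "75% chance" favorite should lose roughly one time in four.
    print(f"favorite loses in {upset_rate(0.75):.1%} of simulated elections")
```

A favorite who loses about a quarter of the time is not a sure thing, and a forecast like that is not "wrong" just because the underdog wins once.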

Of course, this is sometimes at odds with current trends of clickbait journalism. "Clinton win likely with a 62 to 89 percent probability" is not as eye-catching or click-inducing as "Clinton 90% likely to win." What to some may be semantics is to us the meat of the discussion.  

As scientists, we got caught up with selling precision. Polls are notoriously flawed, and predictive models that result from polls have a wide margin of error. At best, we over-reported how good our models were; at worst, people used that margin of error to their advantage.
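
For a sense of scale, here is a short sketch of the textbook sampling margin of error for a single poll. It assumes a simple random sample and a 95% confidence level (z = 1.96), and the sample size of 1,000 is just a typical-looking example, not a figure from any specific poll. It captures sampling error only, so real-world uncertainty is wider still.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate sampling margin for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)


if __name__ == "__main__":
    moe = margin_of_error(0.5, 1000)
    print(f"+/- {moe * 100:.1f} points")
```

With margins of roughly three points on each candidate's share, a two-point lead in a single poll is well within the noise.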

SLIDE 4: Our Understanding Of How We Collect Big Data Failed Us This Election

Polling, as I mentioned above, is notoriously flawed. As a master's student (about a decade ago!) I sat in on many discussions and panels about declining response rates, sample biases, the rise of do-not-call lists and how to get people to tell the truth in polls. The share of households that agreed to participate in a telephone survey by the Pew Research Center dropped to 14 percent by 2012 from 43 percent in 1997. This was before the contention and mistrust sown by the current social and political climate.

Long story short, these problems have (some) methodological workarounds, but are far from solved. In fact, some of them are worse. Depending on where you live, there may be a strong incentive to lie about your vote to align with your region's preference.

In other words - GIGO, or garbage in, garbage out. If our data going into our models was flawed, our analyses coming out are not trustworthy. 
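
Here is a small sketch of how that garbage gets in. It assumes a race that is truly tied and two camps that pick up the phone at different rates. The response rates are hypothetical (I have no data on how response differed by candidate), and the function name is mine.

```python
def observed_share(true_share, response_rate_a, response_rate_b):
    """Share of respondents backing candidate A when the camps respond at different rates."""
    respondents_a = true_share * response_rate_a
    respondents_b = (1 - true_share) * response_rate_b
    return respondents_a / (respondents_a + respondents_b)


if __name__ == "__main__":
    # Truly tied race; A's supporters respond 14% of the time, B's only 10%.
    share = observed_share(0.5, 0.14, 0.10)
    print(f"poll shows candidate A at {share:.1%}")
```

No amount of clever modeling downstream fixes this unless you know the response rates differ and correct for it.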

SLIDE 5: We Failed Big Data This Election

There was a great post-election quote by Erik Brynjolfsson along the lines of "if you understand how models work, you weren't surprised by this election" - apologies that I can't find the source. He was referring to understanding that a probability isn't a certainty, but it globally applies to this election as a prediction project. 

Ultimately, if we understand this as a data science project, we failed on all counts: 

- we failed to bring in good data that we had faith in
- we failed to build a model that was accurate and delivered good results
- we failed to validate our model
- we failed to communicate our results properly to our audience

What is our takeaway? Humility and introspection. We are only as good as the models we build and the quality of work we produce. 

The Case for Humanity in Data Science

This is a post I've been working on for some time, and it was sparked by a lot of different undercurrents in data science. The first is the "will algorithms replace us?" question. The second is our current talk of 'racist' algorithms.

First, let's discuss how amazing this data science thing is. Data science is undoubtedly having a significant impact in all aspects of our lives and will continue to. At least, I hope so...I'm a data scientist. In order to continue this progress, we have to have a degree of trust in the system. We have to share our data, provide personal information, and have faith that the people and the artificial intelligence behind our constant technological advances will protect us. 

When I put it that way, being a data scientist sounds more like being a superhero. To paraphrase Uncle Ben, with big data comes big responsibility. 

When I frame data science this way, it's easy to see how I feel about the "will algorithms replace us" question. Short answer - no. Long answer - for a job to be fully machine-replaceable, it has to fit the following criterion: it cannot make decisions, free from human intervention, that could have negative repercussions on a person. In other words, the decisions it makes cannot have the potential to negatively affect a human being, even via 'butterfly effect.'

While that sounds easy enough, when we give more consideration to this stipulation, we are hard pressed to find cases in which this is true. One of the most well-publicized counterexamples was reported by ProPublica: an algorithm predicted black criminals to be more likely to re-offend. In their language: "There’s software used across the country to predict future criminals. And it’s biased against blacks."

Similarly, here are headlines and quotes from other articles:
"We’ve Hit Peak Human and an Algorithm Wants Your Job. Now What?" - Wall Street Journal
"Can Computers Be Racist? The Human-Like Bias Of Algorithms" - NPR
"It's no surprise that inequality in the U.S. is on the rise. But what you might not know is that math is partly to blame." - CNN Money

Political philosophy me (yes, that was my subfield) cringes at the language. It's a very subtle shift of responsibility called moral outsourcing. Moral outsourcing, the subject of my talk this Thursday at the Women Catalyst group, is the shifting of moral decision-making to another entity (in this case, algorithms).

The humanizing language in the sentences above has the convenient outcome of shifting blame from humans. Algorithms and artificial intelligence are only as unbiased as the humans behind them. Data scientists have internalized the mantra borrowed from our engineering halves - GIGO (garbage in, garbage out). But the comfortable and convenient thing about code is that we know when it doesn't work - we error out, and the thing we've programmed doesn't happen.

Algorithms also suffer from GIGO, but the 'garbage out' part is significantly more difficult to discern, and sometimes cannot be understood until post-deployment if we don't know what to look for.

This leads me to the second part of this post. The data science community is wonderful in its desire to do good. In fact, most of these 'biased' algorithms are the product of well-intentioned data science teams.

When I started Metis for Good, our internal pro-bono group, our first project was with Invisible Institute, the group behind the Citizens Police Data Project. Behind CPDP is public domain complaint data against police officers - complete with badge numbers and names.

One of our first considerations as a group was to think through how to use our data in modeling. We decided immediately not to build any sort of classification or predictive model on officer-level data. In data science-speak, we're not willing to have any model error. Let's say we had a model to 'predict' violent officers. I'm not willing to have our model misclassify a single officer. Instead, a team member suggested we move a level up in our unit of analysis - rather than officer-level, we use precinct-level predictions. 
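
To put numbers on why "any model error" is a dealbreaker at the individual level, here is a back-of-the-envelope sketch. The roster size and the 1% error rate are hypothetical, not CPDP figures, and it assumes each prediction errs independently.

```python
def expected_errors(n_individuals, error_rate):
    """Expected number of people the model gets wrong."""
    return n_individuals * error_rate


def chance_of_any_error(n_individuals, error_rate):
    """Probability that at least one person is misclassified (independent errors)."""
    return 1 - (1 - error_rate) ** n_individuals


if __name__ == "__main__":
    # Hypothetical: 1,000 individuals scored by a model that is 99% accurate.
    print(f"expected misclassifications: {expected_errors(1000, 0.01):.0f}")
    print(f"chance of at least one: {chance_of_any_error(1000, 0.01):.3%}")
```

Even a model that sounds excellent on paper all but guarantees that some named individual gets mislabeled, which is the error we were not willing to accept.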

We're hosting a hackathon this Saturday. We'll be building on the great work Invisible Institute has already done, and contributing some of our own. I'm proud of the strong ethical consideration that goes into constructing and developing our data science at Metis. All data scientists know that bias is nearly unavoidable. What we cannot avoid is our moral responsibility to implement our models ethically.



Grand internet debut!

I'm adding to the already well-developed discussion on women and the work-life balance. TPM is one of my favorite political methods blogs to follow, so I'm pleased to be making my grand internet debut with a splash. 

Diversity and Political Methodology: A Graduate Student’s Perspective

The evolution of our academic and work institutions to reflect the changing role of women (and women as mothers) is fascinating. My thoughts are currently on the role of perception. One point I wish I could have elaborated on was the fact that the majority opinion, frankly, doesn't matter in these situations. 

I think that's a hard pill for any of us to swallow. We like to think that (a) we are nice people who do not discriminate, and that (b) our opinions matter. To elaborate - when a minority (and I use minority because this applies to racial minorities as well) is "inadvertently" marginalized, for example, as the only person of that gender or color in a room, the sentiment is that unless someone is being overtly discriminatory, the onus is on the minority individual to "get over it." In more extreme cases, you can be accused of being hyper-sensitive - that's a favorite rebuttal for women.

Here's a thought, people in power (which, yes, can sometimes be me) - maybe what YOU think of the minority in the situation isn't what matters. Maybe it's what the minority feels that matters more. Maybe it's the role of institutions and individuals in power to recognize that and reform so that maybe next time, that email from Mei Chen or Deepak Patel actually is answered?