The Case for Humanity in Data Science

This is a post I've been working on for some time, sparked by a few different undercurrents in data science. The first is the "will algorithms replace us?" question. The second is the current talk of 'racist' algorithms.

First, let's discuss how amazing this data science thing is. Data science is undoubtedly having a significant impact on all aspects of our lives, and it will continue to. At least, I hope so...I'm a data scientist. In order to continue this progress, we have to have a degree of trust in the system. We have to share our data, provide personal information, and have faith that the people and the artificial intelligence behind our constant technological advances will protect us.

When I put it that way, being a data scientist sounds more like being a superhero. To paraphrase Uncle Ben, with big data comes big responsibility. 

When I frame data science this way, it's easy to see how I feel about the "will algorithms replace us" question. Short answer - no. Long answer - for a job to be fully machine-replaceable, it has to fit the following criterion: it cannot make decisions, beyond the reach of human intervention, that could have negative repercussions for a person. In other words, the decisions it makes cannot have the potential to negatively affect a human being, even via a 'butterfly effect.'

While that sounds easy enough, once we give this stipulation more consideration, we are hard pressed to find cases in which it holds. One of the most well-publicized cases was reported by ProPublica, where a recidivism algorithm predicted black defendants to be more likely to re-offend. In their language: "There’s software used across the country to predict future criminals. And it’s biased against blacks."

Similar quotes from other articles:
"We’ve Hit Peak Human and an Algorithm Wants Your Job. Now What?" - Wall Street Journal
"Can Computers Be Racist? The Human-Like Bias Of Algorithms" - NPR
"It's no surprise that inequality in the U.S. is on the rise. But what you might not know is that math is partly to blame." - CNN Money

Political philosophy me (yes, that was my subfield) cringes at the language. It's a very subtle shift of responsibility called moral outsourcing. The subject of my talk this Thursday at the Women Catalyst group, moral outsourcing is the shifting of moral decision-making to another entity (in this case, algorithms).

The humanizing language in the quotes above has the convenient outcome of shifting blame away from humans. Algorithms and artificial intelligence are only as unbiased as the humans behind them. Data scientists have internalized the mantra borrowed from our engineering halves - GIGO (garbage in, garbage out). But the comfortable and convenient thing about code is that we know when it doesn't work: it errors out, and the thing we've programmed doesn't happen.

Algorithms also suffer from GIGO, but the 'garbage out' part is significantly more difficult to discern, and sometimes cannot be detected until after deployment if we don't know what to look for.
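To make that concrete, here is a minimal sketch (not from any real project - the data is synthetic and the setup is mine) of how a model can train cleanly, throw no errors, and score well, while its bias only shows up if we explicitly check predictions by group:

```python
# Illustrative sketch: biased labels go in, the model fits without a single
# error, and the "garbage out" is invisible unless we look for it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# A protected attribute (0/1) and a legitimate feature.
group = rng.integers(0, 2, size=n)
skill = rng.normal(size=n)

# Biased historical labels: the outcome depends on group, not just skill.
y = (skill + 0.8 * group + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Train on both features -- fits cleanly, no exception, decent accuracy.
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, y)
print("accuracy:", model.score(X, y))

# The bias only surfaces when we check predicted positive rates by group.
preds = model.predict(X)
for g in (0, 1):
    print(f"group {g} positive rate: {preds[group == g].mean():.2f}")
```

Nothing in that pipeline fails, which is exactly the point: the audit for 'garbage out' has to be something we choose to run.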

This leads me to the second part of this post. The data science community is wonderful in its desire to do good. In fact, most of these 'biased' algorithms are the product of well-intentioned data science teams.

When I started Metis for Good, our internal pro-bono group, our first project was with the Invisible Institute, the group behind the Citizens Police Data Project (CPDP). Behind CPDP is public-domain complaint data against police officers - complete with badge numbers and names.

One of our first considerations as a group was to think through how to use our data in modeling. We decided immediately not to build any sort of classification or predictive model on officer-level data. In data science-speak, we're not willing to tolerate any model error. Say we had a model to 'predict' violent officers: I'm not willing to have that model misclassify a single officer. Instead, a team member suggested we move a level up in our unit of analysis - rather than officer-level, we use precinct-level predictions.
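In practice, shifting the unit of analysis is mostly an aggregation step before any modeling happens. Here is a minimal sketch of what that looks like - the column names (officer_id, precinct, complaint_count) are hypothetical and are not the actual CPDP schema:

```python
# Sketch of moving from officer-level rows to precinct-level rows, so that
# no prediction is ever attached to an individual officer.
import pandas as pd

complaints = pd.DataFrame({
    "officer_id": [101, 102, 103, 104],
    "precinct":   ["7th", "7th", "12th", "12th"],
    "complaint_count": [3, 0, 5, 2],
})

# Aggregate before modeling; only precinct-level features leave this step.
precinct_level = (
    complaints
    .groupby("precinct", as_index=False)
    .agg(total_complaints=("complaint_count", "sum"),
         officers=("officer_id", "nunique"))
)
print(precinct_level)
```

The design choice is deliberate: by throwing away officer identity before the model ever sees the data, a misclassification can no longer land on a single named person.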

We're hosting a hackathon this Saturday. We'll be building on the great work the Invisible Institute has already done, and contributing some of our own. I'm proud of the strong ethical consideration that goes into constructing and developing our data science at Metis. All data scientists know that bias is nearly unavoidable. What is not optional is our moral responsibility to implement our models ethically.