Core Values for Data Science

Donald Trump’s surprise victory in the 2016 election forced me to reflect on the role of data in election prognostication and, more specifically, on my own experience helping organizations use data to make predictions and decisions. In doing so, I recognized two values critical to analytical success. They were missing from many of the election forecasts, and they have also been absent from the media hype around big data and advanced analytics. Data scientists, and the decision makers who rely on their work, need to revisit humility and empathy as core values as they increasingly turn to data to understand, predict, and act.

Humility

When we try to answer a question using data, we make a series of decisions about collecting, analyzing, and interpreting that data, and those decisions have an enormous effect on what the data tells us. Many of them are judgment calls about methodology, tool set, weighting, assumptions, and cost/quality tradeoffs.

For six years, I led analytics projects that measured the opinions and attitudes of a diverse range of people. Often, when I shared the findings with clients or friends, they’d be shocked at the results. “There’s no way that can be the case – nobody I know thinks that way,” they’d exclaim, and I’d answer back with a question:

“How many people do you know without a college degree?”

Like me, most of my friends and clients are professionals. Nearly all of them have a college degree, and most have some sort of advanced degree. What’s also fascinating is that they know few, if any, people who don’t have a college degree.

Yet fewer than a third of Americans have a bachelor’s degree or higher, and only one-third of those (1 in 9 Americans) have an advanced degree.

This is usually an eye-opening statistic. I’ve found others with the same effect, often drawn from nothing more than basic demographic data (for instance, people are notoriously poor at estimating the size of small populations).

I bring this up because many of the judgments we make in our analyses are informed by our own experiences, and our attempts to reconcile the limited data we have available with our previous understanding of the world.

Successful data scientists need the humility to recognize that our own lived experience is not representative of others’, and that the decisions we make when collecting, analyzing, and interpreting data will have an enormous effect on our ability to understand those experiences.
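The education gap described above is exactly the kind of place where a weighting decision changes the answer. As a minimal sketch, assuming an entirely hypothetical survey whose sample, like a professional’s personal network, is 90% college graduates, the Python below compares a raw average with a simple post-stratified estimate that reweights each group to an assumed population share. The 0.30 share for bachelor’s-or-higher is a round number chosen to match “fewer than a third”; the group labels, counts, and agreement rates are invented for illustration and are not drawn from any real poll.

```python
"""Illustrative post-stratification sketch. All numbers are hypothetical."""

from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    name: str
    sample_count: int        # respondents from this group in our (skewed) sample
    share_agreeing: float    # fraction of those respondents agreeing with a statement
    population_share: float  # ASSUMED share of this group in the target population


def unweighted_estimate(groups: List[Group]) -> float:
    """Share agreeing when every respondent counts equally."""
    total = sum(g.sample_count for g in groups)
    agreeing = sum(g.sample_count * g.share_agreeing for g in groups)
    return agreeing / total


def poststratified_estimate(groups: List[Group]) -> float:
    """Share agreeing after reweighting each group to its population share."""
    if abs(sum(g.population_share for g in groups) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(g.population_share * g.share_agreeing for g in groups)


if __name__ == "__main__":
    # Hypothetical sample: 900 degree holders, 100 without a bachelor's degree.
    groups = [
        Group("bachelor's or higher", 900, 0.20, 0.30),
        Group("no bachelor's degree", 100, 0.60, 0.70),
    ]
    print(f"unweighted:      {unweighted_estimate(groups):.2f}")
    print(f"post-stratified: {poststratified_estimate(groups):.2f}")
```

With these made-up numbers, the raw sample suggests 24% agreement while the reweighted estimate is 48%: the same responses yield a different answer because of a single analytical choice. Real surveys usually weight on several variables at once (for example, by raking), so treat this as an illustration of how sensitive results are to such choices rather than a recommended method.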

Empathy

Quantitative data gives you a large volume of information, but that information can describe only something very narrow. Consider this anecdote from a former client:

A customer of a large retail bank received a call asking him to answer questions about his service. He gave every driver question (wait time, problem resolution, helpfulness, and so on) a “5,” meaning extremely satisfied. But he rated his overall satisfaction, and many of the more emotional drivers, “extremely dissatisfied.” The branch manager insisted it was a mistake: how could every indicator be perfect and the overall score so low? So a consultant called the customer and asked. The customer explained that while his transaction was fine, the branch manager had been rude to his dog, his dog was important to him, and this made him feel less valued by the bank. He even described the branch manager specifically.

Not only did the quantitative data fail to tell the whole story, but the branch manager couldn’t step out of his own experience enough to understand that something else might be driving the customer’s reactions. A funnier example comes from the TV show Silicon Valley, in which the founder of a tech startup puts his product in front of customers for the first time and watches through the focus-group glass as they grow enraged by a user interface he thought was perfect.

One argument in favor of empathy: while nearly all of the pre-election quantitative forecasts predicted a Hillary Clinton win, qualitative reporting was revealing the many ways in which voters interpreted events differently than the quants did.

If humility in the face of data involves the recognition that our own experiences are not representative and that our judgments in designing our analyses have consequences, then empathy is our ability to understand the lived experiences of others and how they inform what we find in our data, even if it differs from our own experiences.

Context: Putting These Values Into Play

Underlying the drive for empathy and humility is a need for greater context around the data that we’re collecting and analyzing.

One of my clients told me a story from early in his career, when he was working for a casino and helping it improve gaming revenue. He crunched all the numbers, ran focus groups, and talked to guests, and all of the data he had available pointed to one obvious conclusion: the casino should put slot machines next to the gaming tables. Excited and confident in his analysis, he ran to tell his boss, who sent him down to the casino floor. There he realized that the slot machines were already next to the tables.

Context is our ability to understand the stories behind the data: that people are more than data points, and that any summary statistic represents thousands of individual stories. It also reveals the real-world constraints, opportunities, and tradeoffs behind any data-driven decision.

In the excitement around big data and advanced quantitative tools, many analysts neglect the capabilities that would give them the context they need to be humble and empathetic in the face of their data. They skip secondary research, qualitative insights, or even something as simple as personally experiencing the thing they plan to study. In doing so, they make a series of minor analytical mistakes that ladder up into larger errors and, ultimately, poor decisions for their stakeholders.