The importance of domain expertise in data practice

There’s a short yet wonderful story that perfectly encapsulates how many of today’s businesses use data. It’s a simple parable that all of us can learn from, regardless of our background, level of experience, or field of practice. In fact, if you’re already working with data, you might have a similar story to share. It goes like this:

A data scientist holds up a chart.
Everyone believes him.
End of story.

In today’s data-supercharged world, data is the law and the data practitioner is taken as the de facto expert. Never mind that Ben was hired only last week; he has an MA in Statistics and a PhD in Machine Learning, so he must have all the answers, right?

(To clarify, said Ben is a hypothetical person. If you happen to know a Ben or are one, we apologize in advance. If you happen to be a woman, please don’t take our usage of a traditionally male name as a vote in favor of the patriarchy. This is purely for emphasis. We support all women, especially women in data. With everything cleared up, let’s get back to the matter at hand…)

Of course people will listen to the data guy. Numbers are compelling, especially when presented in chart form. Who are we mortals to question an interactive, multi-colored bubble chart? What power does one man hold over a regression line with an R-squared well over 0.90?

In the modern boardroom, data is gospel truth. Everything else is mere conjecture.

We seldom stop to consider whether the data is flawed, or if the data guy understands the subject matter enough to draw insights or conclusions. Maybe the regression model is accurate, but what if it uses the wrong variables, or maps out the wrong features? What if the chart displays absolute figures in places where a logarithmic scale is more appropriate? What if the time series shows periods that are either too long or too short? What if the final analysis is inconsequential to the use case at hand?
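A regression line can fit beautifully and still mean nothing. In the hypothetical sketch below, two invented series that share nothing except an upward drift over time produce an R-squared above 0.9. The numbers and the `r_squared` helper are our own, for illustration only; it takes someone who knows the domain to say whether a fit like this reflects a real relationship.

```python
# Two hypothetical monthly series (invented numbers, not real data).
# Neither is computed from the other; the only thing they share is an upward drift.
months = range(12)
series_a = [100 + 5 * t + (t % 3) for t in months]
series_b = [20 + 2 * t + (3 * t) % 4 for t in months]


def r_squared(x, y):
    """R^2 of a simple least-squares line of y on x (equal to Pearson r squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


print(round(r_squared(series_a, series_b), 3))  # 0.973
```

The statistic only says how tightly the points hug a line; it says nothing about whether the variables belong in the model at all.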

When all is said and done, data can be just as flawed as the people who work with it.

Consider the infamous case of NASA’s $125 million Mars Climate Orbiter. A simple conversion mishap, the failure to convert pound-force seconds (lbf·s) to newton-seconds (N·s), sent the spacecraft to within 37 miles of the Martian surface, dangerously below the 53-mile minimum. What followed was an epic fail of astronomical (no pun intended) proportions: Mars’ atmospheric friction burned the poor thing to a crisp before hurling its ashes deep into a cratery abyss.

Eyewitness reports allege the fire started with a contentious bar chart.

Crash and burn—or, rather, burn then crash.

Mind you, this blunder happened with Lockheed Martin’s and NASA’s top brass, arguably the best domain experts in their respective fields, on the job. If even they can make mistakes like this, what makes us think we regular folk are exempt?
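The snippet below is not NASA’s or Lockheed Martin’s code. It is a minimal hypothetical Python sketch of how the Mars Climate Orbiter kind of mix-up plays out: a number produced in lbf·s but read as N·s understates the real impulse by a factor of about 4.45. The helper name `to_newton_seconds`, the unit strings, and the value 10.0 are our own inventions; the point is that a conversion step that insists on an explicit unit refuses to guess.

```python
# 1 lbf = 0.45359237 kg * 9.80665 m/s^2, exactly 4.4482216152605 N (by definition).
LBF_TO_N = 0.45359237 * 9.80665


def to_newton_seconds(value, unit):
    """Convert an impulse to N·s; refuse any unit we do not recognise."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_TO_N
    raise ValueError(f"unknown impulse unit: {unit!r}")


reported = 10.0  # hypothetical impulse from one team's software, in lbf·s
wrong = reported  # silently treated as if it were already N·s
right = to_newton_seconds(reported, "lbf*s")  # explicit conversion

print(f"understated by a factor of {right / wrong:.2f}")  # understated by a factor of 4.45
```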

The next example is more down-to-earth… literally.

Applying domain knowledge could be as simple as choosing between a FIFO (first-in-first-out) and a LIFO (last-in-first-out) approach, as discussed in an episode of the SuperDataScience podcast.

To explain FIFO and LIFO briefly: If element A arrives first, B second, and C last, FIFO dictates that they leave in that same order. LIFO is the complete opposite, wherein the last element, in this case C, leaves first, followed by B then A.
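In code, FIFO is simply a queue and LIFO a stack. A minimal Python sketch using the standard library’s `collections.deque` for the queue and a plain list for the stack, with A, B, and C arriving in that order:

```python
from collections import deque

arrivals = ["A", "B", "C"]  # A arrives first, C arrives last

# FIFO (queue): add on the right, remove from the left.
fifo = deque()
for item in arrivals:
    fifo.append(item)
fifo_order = [fifo.popleft() for _ in range(len(fifo))]

# LIFO (stack): add on the right, remove from the right.
lifo = []
for item in arrivals:
    lifo.append(item)
lifo_order = [lifo.pop() for _ in range(len(lifo))]

print(fifo_order)  # ['A', 'B', 'C']
print(lifo_order)  # ['C', 'B', 'A']
```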

As you might expect, the “right” choice varies greatly among industries.

For example, a business dealing in perishable goods like vegetables or fresh meat might prefer a FIFO approach, wherein an earlier element, say Monday’s shipment, is sent out before Tuesday’s or Wednesday’s. Conversely, a steel manufacturer may opt for convenience and use a LIFO approach wherein the steel bars at the top of the pile (i.e., the last ones in) get shipped out first. Caveat: we are experts in neither the perishable goods nor the steel industry, so this is, again, purely for illustration purposes.
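To see why the policy matters for perishables, here is a toy simulation, again purely illustrative. It assumes one batch arrives each day and, from day 2 onward, one batch ships out each day; the function name `simulate` and all the numbers are hypothetical, not drawn from any real business. It reports how old the oldest batch still in the warehouse is after a week.

```python
from collections import deque


def simulate(policy, days=7, start_shipping=2):
    """Return the age, in days, of the oldest batch still in stock at the end.

    Hypothetical setup: one batch arrives each day (labelled by its arrival
    day), and from day `start_shipping` onward one batch ships out per day.
    """
    stock = deque()
    for day in range(days):
        stock.append(day)  # today's batch arrives
        if day >= start_shipping:
            if policy == "FIFO":
                stock.popleft()  # oldest batch leaves first
            elif policy == "LIFO":
                stock.pop()  # newest batch leaves first
            else:
                raise ValueError(f"unknown policy: {policy}")
    last_day = days - 1
    return last_day - min(stock)  # age of the oldest remaining batch


print(simulate("FIFO"))  # 1
print(simulate("LIFO"))  # 6
```

Under LIFO, the first two batches never leave the pile, which is harmless for steel bars but a problem for fresh meat. The code is identical in both cases; only knowledge of the goods tells you which number to worry about.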

Yes, data skills can be applied to nearly every domain. However, we cannot discount the fact that data practitioners need domain expertise in order to truly be effective (or at least to avoid $125 million blunders). Data in retail can differ from data in healthcare or economics or agriculture or any other industry.

This is no different from other jobs. Just as we demand industry experience from managers and subspecialties from doctors and engineers, we need to push for domain expertise in data practice.

What does this entail?

For the data practitioner, this means building years of experience and knowledge in a specific domain. Go deep rather than broad.

For the company looking to fill a data position, this means hiring a data practitioner with an industry background, or grooming one from the existing workforce (the latter is an option we highly encourage).

For schools and institutions offering data courses, this means creating industry-specific courses and tracks, or encouraging students to pursue a minor in a field of interest.

Parting thoughts

Data does not exist in a vacuum and neither do data experts. To make data impactful, we need to encourage data practitioners to look beyond the spreadsheet and out into the real world.

Doing so might just save all of us from another epic crash and burn.
