Are Data Scientists Real Scientists?



Data scientist is The Sexiest Job of the 21st Century, said Harvard Business Review in 2012.

Google Search Trends for Data Scientist vs. Data Engineer
Borrowed from the Analytics Association of the Philippines (AAP)

Seven years later, and we haven’t quite worked it out: Are data scientists real scientists? What do they really do? Is the work really as sexy as it sounds?

Much like Fatal Attraction, we begin our data science journey with doe-eyed infatuation and all kinds of butterflies—until we realize that data science is a psycho bitch who kidnaps our kid, boils our pet bunny (fluffy fur and all), and hacks us bloody with a blunt kitchen knife; we try to drown her in a tub but she…Just. Won’t. Die.

But I’m getting way ahead of myself.

Basically, the point I’m trying to make is that everyone wants to be a data scientist (or hire one) without fully understanding what data science is and what we really want out of it.

I think Dan Ariely, a psychology and behavioral economics professor at Duke University, hits the nail on the head:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Part of the reason could be that data science is still in its confused adolescence. The job title as we know it has only been around since about 2008, and the field still has a ways to go in terms of defining and organizing itself.

It certainly doesn’t help that the majority of data projects fail (see: 85-Percent of Big Data Projects Fail) and that companies don’t know how to utilize data science talent (see: Why Data Scientists Are Leaving Their Jobs).

There are two issues I want to address: first, that data science has more to do with algorithms and statistical techniques than formal scientific work; and second, that most data scientist jobs are less sexy than people imagine.

Science, schmience

Neil deGrasse Tyson and Bill Nye Do You Even Science Meme
Do you even??

The dividing line between data science and real science is research methods. If you design experiments, and propose and test formal hypotheses, your work is closer to a scientific role. Even generalizing conclusions from empirical data using algorithms can qualify as science when it is used to augment research.

However, let’s not confuse that with people who draw pretty charts and run Python or R scripts for a living, without any research involved.

This isn’t to be exclusionary or pedantic about what a “real scientist” is, but the abuse of the term “data scientist” for recruitment, staffing, or marketing purposes does place a stain on the practice.

Let’s get real and call a spade a spade.

The majority of a data scientist’s working hours are spent on arguably the least scientific parts of the job: cleaning and organizing data (60%) and collecting data sets (19%).

Incidentally, these are also the least enjoyable parts of the job.

What data scientists spend the most time doing infographic
Grab a magnifying glass to see the “science” part
Borrowed from the Analytics Association of the Philippines (AAP)
What is the least enjoyable part of data science infographic
Real data science isn’t as sexy as it sounds
Borrowed from the Analytics Association of the Philippines (AAP)

The sexy “science” stuff comes much, much later.

More marriage than sex

The idea of data science is sexy—especially when we hear stories about the Ubers, Netflixes, and Amazons of the world. But real data science work is a lot less so.

When data scientists take on jobs and companies hire data scientists, the expectation is to produce mind-blowing insight from vast amounts of data (read: Big Data). Truth be told, the road to data-driven disruption is not as straightforward as many think.

It takes a lot of work to get data that works.

Before a data scientist can even arrive at said mind-blowing insight, they need to make it through the mundane: hours and hours of ensuring the data sets are available and prepped for analysis, scripts are thoroughly debugged, and libraries are compatible—not to mention the professional Googling and GitHubbing involved before we can find the right algorithms.

If you recall the charts above, data scientists are actually spending most of their time on the least enjoyable parts of the job (see comparison below).

Tasks data scientists spend the most time on vs Least enjoyable tasks
Oh, so it’s just like a real job…*sad*

Anyone looking to pursue data science or employ a data scientist ought to consider the work, especially the mundane, mind-numbing aspects of it, before leaping. If you’re not crazy about data, data science will drive you crazy.

The future of data science

This isn’t to discourage aspiring “data scientists”—quite the contrary. As the global ocean of data expands, we need more data experts with the skills and knowledge to navigate it.

That said, we still have a ways to go in terms of defining the whos, hows, and whats of data work. In the process of figuring it all out, we should also avoid sugar-coating data job titles lest we get disappointed (this goes for both employers and potential employees).

A decade from now, we may well witness the extinction of the data scientist. Data jobs will only get increasingly specific, and catchall data scientists (the ones who juggle 8 different programming languages and 3 different job functions) may no longer be enough to meet the narrower and deeper demands of future data work. This is the same reason we rarely see job openings for “computer expert” or “business manager” anymore.

As data continues to stretch the limits of our imagination, we can’t possibly expect a handful of people to handle all of it.

Not even real scientists do that.


If you’re less concerned about job titles and more concerned about real, applicable skills, why not book a Business Analytics Masterclass? It’s not exactly science, but it works!