We’ve already discussed the hype surrounding data science (Are Data Scientists Real Scientists?) and dissected what makes a good data scientist (How To Tell If You’re A Data Scientist Worth Hiring).
Today, we’ll talk about a decision modern businesses need to make: hiring data scientists vs. data engineers.
While data science is clearly the sexier job, data engineering is more in demand, with five times as many job openings, according to Glassdoor. This may come as a surprise, especially since much of the fanfare surrounding data has to do with data scientists.
This article will compare and contrast data scientists and data engineers and, hopefully, provide some insight into which one your company needs more.
Hot take: many businesses want data scientists when they clearly need data engineers.
Data Science vs. Data Engineering
Data science is the discipline of cleaning, organizing, and modeling (big) data. It makes use of advanced mathematical, statistical, and algorithmic techniques to extract insight and perform analysis on different data types including numbers, text, images, audio, and video.
In the last decade, we’ve seen data science skyrocket in popularity, as commercial use-cases for predictive analytics, machine learning, and statistical modeling have become more mainstream.
Data engineering, also called data architecture, is the practice of generating, storing, processing, and curating data. It involves a variety of programming languages to design databases, build data pipelines, manage data warehouses, and optimize data processing.
Like data science, data engineering has risen in popularity in recent years as businesses have begun to handle more data in terms of volume, velocity, and variety.
Data Scientist vs. Data Engineer: Education
Data scientists often come from mathematical and quantitative backgrounds, while data engineers tend to come from programming backgrounds.
Popular courses among data scientists include statistics, mathematics, economics, and physics, although we’ve seen data scientists come from other courses like computer science and IT. However, it’s important to note that many data scientist jobs require post-graduate education in advanced mathematics, statistical modeling, machine learning, or other similar programs.
We may see these requirements change in the near future as colleges and universities continue to open data science-specific degrees for undergraduates.
Data engineers, on the other hand, usually hold degrees in computer science, computer engineering, or IT. Moreover, most data engineering jobs require proficiency in multiple programming languages and data technologies such as Java, C#, Python, SQL, SAP, and NoSQL databases.
Companies hiring data engineers may also ask for professional certifications in Hadoop, Apache Spark, MapReduce, Hive, and other similar systems.
Since data science and data engineering are both relatively new (well, they really aren’t but data-specific education is), many of today’s data scientists and data engineers are products of MOOCs and other short certificate programs.
Data Scientist vs. Data Engineer: Skills
In general, data scientists have stronger analytical skills, and data engineers have stronger programming skills.
The former use mathematics, statistics, advanced analytics, machine learning, and artificial intelligence to formulate hypotheses, run tests, analyze data, and translate and/or visualize the results.
The latter employ advanced programming, database architecture, and distributed systems to deliver data to data scientists, data analysts, and business users.
Data Scientist vs. Data Engineer: Job Function
A data scientist’s main job is to analyze and model data in order to make decisions, and a data engineer’s is to source, clean, manage, and deliver that data for analysis and modeling.
Quick aside: At CirroLytix, there’s something we call the Data Value Chain. The Data Value Chain is a framework that outlines the process by which raw data is transformed into data-driven decisions. It is not unlike other value chains, where a raw material (say, a cow) undergoes a series of steps that transform it into a finished product (say, a world-class dish).
The data engineers come in during the earlier stages of the data value chain. They’re in charge of turning raw data, which may contain human, machine, or instrument errors, into neatly arranged data that can be digested and understood by a data storage system.
The responsibility of a data engineer is to design and implement solutions that improve data reliability, quality, and efficiency. In other words, they are in charge of getting useable data to data scientists, data analysts, and business users.
In practice, this includes retrieving data from various sources, merging multiple tables, deduplicating data entries, and storing data in the proper format (e.g. transforming images into numerical pixel codes or separating full names into first, middle, and last names).
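To make those tasks a bit more concrete, here is a minimal sketch in plain Python of three of them: deduplicating entries, splitting full names, and merging two tables. The sample records, field names, and helper functions are all made up for illustration, and a real pipeline would typically rely on a database or a data processing framework rather than hand-written loops.

```python
# Hypothetical sample data: two small "tables" pulled from different sources.
customers = [
    {"id": 1, "full_name": "Maria Clara Santos"},
    {"id": 2, "full_name": "Juan Cruz"},
    {"id": 2, "full_name": "Juan Cruz"},  # duplicate entry
]
orders = [
    {"customer_id": 1, "total": 120.0},
    {"customer_id": 2, "total": 75.5},
]


def deduplicate(rows, key):
    """Keep only the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def split_name(full_name):
    """Naive split: first token, last token, and everything in between.

    Real names are messier than this (suffixes, multi-word surnames),
    so treat it as an illustration only.
    """
    parts = full_name.split()
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return {"first_name": first, "middle_name": middle, "last_name": last}


def merge(customer_rows, order_rows):
    """Inner join of orders to customers on customer_id == id."""
    by_id = {c["id"]: c for c in customer_rows}
    merged = []
    for order in order_rows:
        customer = by_id.get(order["customer_id"])
        if customer is not None:
            merged.append({
                "id": customer["id"],
                **split_name(customer["full_name"]),
                "total": order["total"],
            })
    return merged


clean_rows = merge(deduplicate(customers, "id"), orders)
for row in clean_rows:
    print(row)
```

The output is one tidy row per order, with the name already separated into its parts, which is the kind of table a data scientist or analyst can work with directly.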
Data scientists come in much later, after the data has passed initial cleaning and manipulation. While they might do some data cleaning and manipulation of their own, they are more concerned with exploring data to extract insights and model predictions using statistical techniques and machine learning algorithms.
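As a rough illustration of what "modeling predictions" can mean at its simplest, the sketch below fits a straight line to a handful of points using ordinary least squares and uses it to predict a new value. The ad-spend and sales figures are invented for the example, and the function names are our own. Actual data science work usually involves far more data, richer models, and proper validation.

```python
# Hypothetical, already-cleaned data: monthly ad spend vs. sales (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


slope, intercept = fit_line(ad_spend, sales)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted sales at ad spend 6: {predict(6.0, slope, intercept):.2f}")
```

Note that the model is only as good as the data fed into it, which is exactly why the earlier stages of the data value chain matter.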
Additionally, a data scientist may be asked to visualize data and present findings to various stakeholders for consideration and decision-making, a task that requires business domain expertise and knowledge of data visualization and data storytelling techniques.
As you might’ve guessed, data engineers and data scientists are both key to a cohesive data value chain, and their jobs sometimes overlap.
This infographic from DataCamp sums it up well (data engineer is on the left and data scientist, on the right):
Data Scientist vs. Data Engineer: Salary
Part of what makes data science so attractive is the pay. People (especially smart people) tend to flock toward high-powered, high-paying jobs.
However, when you look at the data, data engineers get the better end of the salary stick. Here’s a salary comparison from major listing sites:
- Data Engineer: $63K – $131K
- Data Scientist: $79K – $120K
- Data Engineer: $172K
- Data Scientist: $80K – $130K
- Data Engineer: $43K – $364K
- Data Scientist: $34K – $341K
Data Scientist vs. Data Engineer: One or both?
A data scientist will give you the insights and analysis you need to make sound, data-driven decisions, but without a data engineer to deliver and manage the data, a data scientist might not even have any data to work with in the first place.
Our view is that most mature businesses need a healthy mix of both in order to successfully manage a company’s data value chain. However, we suggest hiring data engineers first to build your data infrastructure and data scientists later on, once there is already data to model and analyze.
The data scientist vs. data engineer debate really depends on your company’s needs. If your immediate need is better data quality and delivery (which is the main data problem businesses face), then hiring a data engineer might make more sense. However, if your data warehouses and data pipelines work and you’re ready for advanced analytics, it may be the right time to invest in a data scientist.
That said, for most businesses, a data scientist is a luxury, not a necessity. Unless you’re doing advanced predictive modeling and machine learning, a couple of data engineers and data analysts may do the trick.
Data science gets all the love, but watch out: data engineering may just be the new sexiest job in town.
Find data science and data engineering challenging? Contact us here and let us know how we can help.