Data Scientist vs. Data Engineer: Who should you hire?

We’ve already discussed the hype surrounding data science (Are Data Scientists Real Scientists?) and dissected what makes a good data scientist (How To Tell If You’re A Data Scientist Worth Hiring).

Today, we’ll talk about a decision modern businesses need to make: hiring data scientists vs. data engineers.

While data science is clearly the sexier job, data engineering is more in demand, with five times as many job openings, according to Glassdoor. This may come as a surprise, especially since much of the fanfare surrounding data has to do with data scientists.

Ooh, big data!

This article will compare and contrast data scientists and data engineers and, hopefully, provide some insight into which one your company needs more.

Hot take: many businesses want data scientists when they clearly need data engineers.

Data Science vs. Data Engineering

Data science is the discipline of cleaning, organizing, and modeling (big) data. It makes use of advanced mathematical, statistical, and algorithmic techniques to extract insight and perform analysis on different data types including numbers, text, images, audio, and video.

In the last decade, we’ve seen data science skyrocket in popularity, as commercial use-cases for predictive analytics, machine learning, and statistical modeling have become more mainstream.

Data engineering, also called data architecture, is the practice of generating, storing, processing, and curating data. It involves a variety of programming languages to design databases, build data pipelines, manage data warehouses, and optimize data processing.

As with data science, data engineering has risen in popularity in recent years as businesses have begun to handle more data in terms of volume, velocity, and variety.

Data Scientist vs. Data Engineer: Education

Data scientists often come from mathematical and quantitative backgrounds and data engineers, from programming backgrounds.

Popular courses among data scientists include statistics, mathematics, economics, and physics, although we’ve seen data scientists come from other courses like computer science and IT. However, it’s important to note that many data scientist jobs require post-graduate education in advanced mathematics, statistical modeling, machine learning, or other similar programs.

We may see these requirements change in the near future as colleges and universities continue to open undergraduate degrees specific to data science.

Data engineers, on the other hand, usually hold degrees in computer science, computer engineering, or IT. Moreover, most data engineering jobs require proficiency in multiple programming languages and data technologies, such as Java, C#, Python, SQL, NoSQL databases, and SAP.

Companies hiring data engineers may also ask for professional certifications in Hadoop, Apache Spark, MapReduce, Hive, and other similar systems.

Data scientist vs data engineer tools and software
Image from DataCamp

Since data science and data engineering are both relatively new (well, they really aren’t, but data-specific education is), many of today’s data scientists and data engineers are products of MOOCs and other short certificate programs.

Data Scientist vs. Data Engineer: Skills

In general, data scientists have stronger analytical skills, and data engineers have stronger programming skills.

The former use mathematics, statistics, advanced analytics, machine learning, and artificial intelligence to formulate hypotheses, run tests, analyze data, and translate and/or visualize the results.

The latter employ advanced programming, database architecture, and distributed systems to deliver data to data scientists, data analysts, and business users.

Data Scientist vs. Data Engineer: Job Function

A data scientist’s main job is to analyze and model data in order to make decisions, and a data engineer’s is to source, clean, manage, and deliver that data for analysis and modeling.

Quick aside: At CirroLytix, there’s something we call the Data Value Chain. The Data Value Chain is a framework that outlines the process by which raw data is transformed into data-driven decisions, not unlike other value chains, where a raw material (say, a cow) undergoes a series of steps that transform it into a finished product (e.g., a world-class dish).

The Data Value Chain by CirroLytix

Data engineers come in during the earlier stages of the data value chain. They’re in charge of turning raw data, which may contain human, machine, or instrument errors, into neatly arranged data that a data storage system can digest and understand.

The responsibility of a data engineer is to design and implement solutions that improve data reliability, quality, and efficiency. In other words, they are in charge of getting usable data to data scientists, data analysts, and business users.

In practice, this includes retrieving data from various sources, merging multiple tables, deduplicating data entries, and storing data in the proper format (e.g., transforming images into numerical pixel values or separating full names into first, middle, and last names).
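To make those operations concrete, here’s a minimal Python sketch, assuming two hypothetical sources (a CRM export and a web-analytics table) that share a customer ID. It merges the tables, skips duplicate entries, and splits full names into first, middle, and last names. The field names and sample records are made up for illustration; a real pipeline would more likely lean on SQL or a dataframe library.

```python
def split_full_name(full_name: str) -> dict[str, str]:
    """Naively split 'First Middle Last' on whitespace."""
    parts = full_name.split()
    if len(parts) == 1:
        return {"first": parts[0], "middle": "", "last": ""}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}


def clean_and_merge(crm_rows: list[dict], web_rows: list[dict]) -> list[dict]:
    """Left-join CRM rows to web rows on 'id', dropping duplicate CRM entries."""
    web_by_id = {row["id"]: row for row in web_rows}
    seen = set()
    merged = []
    for row in crm_rows:
        if row["id"] in seen:  # deduplicate: keep the first entry per ID
            continue
        seen.add(row["id"])
        record = {"id": row["id"], **split_full_name(row["full_name"])}
        # Keep the CRM row even when there is no matching web activity.
        record["visits"] = web_by_id.get(row["id"], {}).get("visits", 0)
        merged.append(record)
    return merged


# Hypothetical sample data
crm = [
    {"id": 1, "full_name": "Ada Byron Lovelace"},
    {"id": 2, "full_name": "Grace Hopper"},
    {"id": 1, "full_name": "Ada Byron Lovelace"},  # duplicate entry
]
web = [{"id": 1, "visits": 12}]

merged_records = clean_and_merge(crm, web)
for rec in merged_records:
    print(rec)
```

Note that naive whitespace splitting would mislabel a two-word surname like “Dela Cruz”, which is exactly the kind of edge case data engineers spend their days on.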

Data scientists come in much later, after the data has passed initial cleaning and manipulation. While they might do some data cleaning and manipulation of their own, they are more concerned with exploring data to extract insights and model predictions using statistical techniques and machine learning algorithms.

Additionally, a data scientist may be asked to visualize data and present findings to various stakeholders for consideration and decision-making, a task that requires business domain expertise and knowledge of data visualization and data storytelling techniques.

As you might’ve guessed, data engineers and data scientists are both key to a cohesive data value chain, and their jobs sometimes overlap.

This infographic from DataCamp sums it up well (data engineer on the left and data scientist on the right):

Job responsibilities: data scientist vs. data engineer
Image from DataCamp

Data Scientist vs. Data Engineer: Salary

Part of what makes data science so attractive is the pay. Obviously, people (especially smart people) tend to flock toward high-powered, high-paying jobs.

However, when you look at the data, data engineers come out ahead on salary. Here’s a salary comparison from major listing sites:


  • Data Engineer: $63K – $131K
  • Data Scientist: $79K – $120K


  • Data Engineer: $172K
  • Data Scientist: $80K – $130K


  • Data Engineer: $43K – $364K
  • Data Scientist: $34K – $341K

Data Scientist vs. Data Engineer: One or both?

A data scientist will give you the insights and analysis you need to make sound, data-driven decisions, but without a data engineer to deliver and manage the data, a data scientist might not even have any data to work with in the first place.

Our view is that most mature businesses need a healthy mix of both to successfully manage their data value chain. However, we suggest hiring data engineers first to build your data infrastructure, and data scientists later on, once there is data to model and analyze.


The data scientist vs. data engineer debate really depends on your company’s needs. If your immediate need is better data quality and delivery (which is the main data problem businesses face), then hiring a data engineer might make more sense. However, if your data warehouses and data pipelines work and you’re ready for advanced analytics, it may be the right time to invest in a data scientist.

That said, for most businesses, a data scientist is a luxury, not a necessity. Unless you’re doing advanced predictive modeling and machine learning, a couple of data engineers and data analysts may do the trick.

Data science gets all the love, but, watch out, because data engineering may just be the new sexiest job in town.

Find data science and data engineering challenging? Contact us here and let us know how we can help.

Are Data Scientists Real Scientists?



Data scientist is The Sexiest Job of the 21st Century, said Harvard Business Review in 2012.

Google Search Trends for Data Scientist vs. Data Engineer
Borrowed from the Analytics Association of the Philippines (AAP)

Seven years later, we still haven’t quite worked it out: Are data scientists real scientists? What do they really do? Is the work really as sexy as it sounds?

Much like Fatal Attraction, we begin our data science journey with doe-eyed infatuation and all kinds of butterflies—until we realize that data science is a psycho bitch who kidnaps our kid, boils our pet bunny (fluffy fur and all), and hacks us bloody with a blunt kitchen knife; we try to drown her in a tub but she…Just. Won’t. Die.

But I’m getting way ahead of myself.

Basically, the point I’m trying to make is that everyone wants to be a data scientist (or hire one) without fully understanding what data science is and what we really want out of it.

I think Dan Ariely, a psychology and behavioral economics professor at Duke University, hits the nail on the head:

“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

Part of the reason could be that data science is still in its confused adolescence. The data scientist title has only been in common use since around 2008, and the field has a ways to go in terms of defining and organizing itself.

It certainly doesn’t help that the majority of data projects fail (see: 85-Percent of Big Data Projects Fail) and companies don’t know how to utilize data science talent (see: Why Data Scientists Are Leaving Their Jobs).

There are two issues I want to address: first, that data science has more to do with algorithms and statistical techniques than formal scientific work; and second, that most data scientist jobs are less sexy than people imagine.

Science, schmience

Neil deGrasse Tyson and Bill Nye “Do You Even Science” meme
Do you even??

The dividing line between data science and real science is research methods: if you design experiments and propose and test formal hypotheses, your work is closer to a scientific role. Even generalizing conclusions from empirical data using algorithms can qualify as science when used to augment research.

However, let’s not confuse that with people who draw pretty charts and run Python or R scripts for a living, without any research involved.

This isn’t to be exclusionary or pedantic about what a “real scientist” is, but the abuse of the term “data scientist” for recruitment, staffing, or marketing purposes does place a stain on the practice.

Let’s get real and call a spade a spade.

The majority of a data scientist’s working hours are spent on arguably the least scientific parts of the job: cleaning and organizing data (60%) and collecting data sets (19%).

Incidentally, these are also the least enjoyable parts of the job.

What data scientists spend the most time doing infographic
Grab a magnifying glass to see the “science” part
Borrowed from the Analytics Association of the Philippines (AAP)
What is the least enjoyable part of data science infographic
Real data science isn’t as sexy as it sounds
Borrowed from the Analytics Association of the Philippines (AAP)

The sexy “science” stuff comes much, much later.

More marriage than sex

The idea of data science is sexy—especially when we hear stories about the Ubers, Netflixes, and Amazons of the world. But real data science work is a lot less so.

When data scientists take on jobs and companies hire data scientists, the expectation is to produce mind-blowing insight from vast amounts of data (read: Big Data). Truth be told, the road to data-driven disruption is not as straightforward as many think.

It takes a lot of work to get data that works.

Before a data scientist can even arrive at said mind-blowing insight, they need to make it through the mundane: hours and hours of ensuring the data sets are available and prepped for analysis, scripts are thoroughly debugged, and libraries are compatible—not to mention the professional Googling and GitHubbing involved before we can find the right algorithms.

If you recall the charts above, data scientists are actually spending most of their time on the least enjoyable parts of the job (see comparison below).

Tasks data scientists spend the most time on vs Least enjoyable tasks
Oh, so it’s just like a real job…*sad*

Anyone looking to pursue data science or employ a data scientist ought to consider the work, especially its mundane, mind-numbing aspects, before leaping. If you’re not crazy about data, data science will drive you crazy.

The Future of Data Science

This isn’t to discourage aspiring “data scientists”—quite the contrary. As the global ocean of data expands, we need more data experts with the skills and knowledge to navigate it.

That said, we still have a ways to go in terms of defining the whos, hows, and whats of data work. In the process of figuring it all out, we should also avoid sugar-coating data job titles lest we get disappointed (this goes for both employers and potential employees).

A decade from now, we may well witness the extinction of the data scientist. Data jobs will only get increasingly specific, and catchall data scientists (the ones who juggle 8 different programming languages and 3 different job functions) may no longer be enough to meet the narrower and deeper demands of future data work. This is the same reason we rarely see job openings for “computer expert” or “business manager” anymore.

As data continues to stretch the limits of our imagination, we can’t possibly expect a handful of people to handle all of it.

Not even real scientists do that.


If you’re less concerned about job titles and more concerned about real, applicable skills, why not book a Business Analytics Masterclass? It’s not exactly science, but it works!

Predictive Lead Scoring 101

A Marketer’s Guide by CirroLytix

Not all customers are created equal. Some are practically begging to be sold to, while others won’t give you the time of day even if their lives depended on it.

Any seasoned marketer will agree.

In this regard, it’s absolutely important to prioritize—to focus fire on leads likely to enjoy your product and save yourself the trouble of convincing everyone else. Marketers call this lead scoring.

Lead scoring is the practice of ranking leads according to their likelihood to perform a favorable (or unfavorable) action, usually purchasing a product.

Marketers use it for setting priorities (read: budgeting).

Whatever the size of your budget, it’d be borderline criminal to waste any amount on low-priority leads when there are better alternatives available.

This article will walk you through the ins and outs of lead scoring and provide a clear instructional reference for practicing it yourself.

Traditional vs Predictive Lead Scoring

There are two common approaches to lead scoring, each with its own strengths and weaknesses.

Traditional lead scoring is a manual effort done by marketers to sort their leads into two categories: those who qualify, and those who don’t. Qualified leads are those worth chasing—they get the attention, the slice of the budget, and with luck, the place among your closed sales.

They do this by identifying key factors that correlate with an inclination (or disinclination) to convert, and deriving scores based on the weights of those factors.

Traditional Lead Scoring Sample Table
Borrowed from Salesforce

Example: BiroLytix

BiroLytix is a training service dedicated to helping stand-up comedians hone their craft.

They have a compilation of leads from various sources, along with fine-grained data on each lead, such as sex, age, professional background, and past training experience.

As per their lead scoring model, they assign people between the ages of 20 and 40 an extra +15 points, they give people in creative industries +10 points, and deduct 500 points from anyone working in data analytics.

Lead scoring is a matter of identifying your most and least promising leads, so it stands to reason that the more important a factor, the higher the bonus or penalty that comes with it.
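Here’s how BiroLytix’s rules might look as code: a rough Python sketch, assuming “between the ages of 20 and 40” is inclusive and using a hypothetical list of creative industries. The `Lead` fields and the sample leads are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of "creative industries"; BiroLytix's real list may differ.
CREATIVE_INDUSTRIES = {"advertising", "film", "music", "theater"}


@dataclass
class Lead:
    name: str
    age: int
    industry: str


def score_lead(lead: Lead) -> int:
    """Apply the BiroLytix rule weights to a single lead."""
    score = 0
    if 20 <= lead.age <= 40:  # assumes the age range is inclusive
        score += 15
    if lead.industry in CREATIVE_INDUSTRIES:
        score += 10
    if lead.industry == "data analytics":
        score -= 500
    return score


leads = [
    Lead("Aya", 28, "theater"),
    Lead("Bong", 45, "film"),
    Lead("Cris", 33, "data analytics"),
]
ranked = sorted(leads, key=score_lead, reverse=True)
for lead in ranked:
    print(lead.name, score_lead(lead))
# Aya 25
# Bong 10
# Cris -485
```

The ranking, not the raw number, is what matters: it tells the team who to call first.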

Predictive lead scoring is functionally the same as traditional lead scoring, with one major difference: while it’d be a marketer’s job to calculate scores in the traditional method, predictive scoring relies on algorithms to do the heavy lifting.

Good predictive lead scoring algorithms make use of statistical techniques like logistic regression and multivariate regression to rank leads. In plain English, these models estimate likelihood by assigning each factor a weight according to its importance. Some advanced lead scoring algorithms use machine learning to “learn” and improve as more leads pour in.
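To illustrate the idea, here’s a from-scratch logistic regression in plain Python, trained by gradient descent on a tiny, made-up set of past leads. The two features (pages visited and whether the lead opened an email) and the learning settings are assumptions for illustration; in practice you’d reach for a statistics or machine learning library and far more data.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that a lead with features x converts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for this lead
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical history: [pages visited, opened email (1/0)] -> converted (1/0)
X = [[1, 0], [2, 0], [3, 1], [5, 1], [6, 1], [1, 1], [4, 0], [7, 1]]
y = [0, 0, 1, 1, 1, 0, 0, 1]
w, b = train_logistic(X, y)

new_leads = {"lead_hot": [6, 1], "lead_cold": [1, 0]}
scores = {name: predict_proba(w, b, x) for name, x in new_leads.items()}
for name, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {p:.2f}")
```

The learned weights play the role of the marketer’s hand-picked points, and sorting leads by predicted probability gives you the ranked list.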

Predictive Lead Scoring on MS Excel

Building a Leads Database

You can’t perform lead scoring without data. How much data depends on your appetite for precision: on one hand, total precision is near impossible; on the other hand, a faulty model can confuse good leads and bad leads.

We suggest taking as deep a dive as possible into your data sources and gathering as much useful information as you can without getting creepy.

Start by doing an inventory of existing customer databases—your website, CRM, social media pages, etc. This will allow you to do some preliminary data analysis and clue you in on how much more data you need to collect.

This task is easier than it sounds in most cases. A simple Contact Us form, for example, can be the foundation for a customer database. That said, we suggest you squeeze content gates for all they’re worth by asking for specifics like age, sex, birthday, and occupation.

Once you are able to correlate that personal data with behavioral data such as on-page behavior, social media activity, and previous purchases, you’ll be well on your way to building your own lead scoring machine.

It bears mentioning at this point that lead scoring isn’t a solution for everyone. Naturally, you’d have to be managing a large number of leads for it to make sense. Moreover, predictive lead scoring is only possible with a data set large enough to feed into an algorithm. If you’ve got data but need to process it with intuition and observation, stick to the traditional method.

How Lead Scoring Ends in Sales

Lead scoring would be pointless if it didn’t contribute to greater ROI on marketing efforts. Needless to say, it does—by over 70% according to a report by Marketing Sherpa.

There are two likely reasons for this: faster response times to urgent leads, and faster trend-spotting.

Urgent leads who are closer to the purchase stage have done their homework on your solution, which means they’ve probably considered your competitors as well. Time is of the essence, so knowing which of your leads are worth your time can go a long way.

With big, shiny numbers, lead scoring points you to late-stage customers who are ripe for the attention, and who are at the biggest risk of falling into the clutches of a rival business.

Conversion Funnel by Moz
Conversion Funnel
Image borrowed from Moz

Lead scoring also makes trend-spotting easier, because the process of refining a winning formula will, by itself, lead you to great insight.

As mentioned, lead scoring depends on your ability to spot factors that influence the likelihood of closing a lead. Getting into the habit of identifying signs of a closer and weighing them against each other is invaluable practice in mindfulness when it comes to marketing.

In practice, lead scoring makes life significantly easier for any marketing team. Targeted CTAs, emails, and alerts to schedule calls can be set to fire for your most promising leads. All it takes is setting a score threshold and programming your automation software accordingly.
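A minimal sketch of the threshold idea in Python, assuming scores on a 0 to 100 scale, a hypothetical cutoff of 70, and two made-up actions. Your automation software will have its own way of expressing this rule.

```python
HOT_THRESHOLD = 70  # hypothetical cutoff on a 0-100 score scale


def route_leads(scores: dict[str, int], threshold: int = HOT_THRESHOLD) -> dict[str, list[str]]:
    """Assign each lead an action, highest scores first."""
    routes = {"schedule_call": [], "nurture_email": []}
    for lead, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        action = "schedule_call" if score >= threshold else "nurture_email"
        routes[action].append(lead)
    return routes


routes = route_leads({"lead_a": 85, "lead_b": 40, "lead_c": 72})
print(routes)
# {'schedule_call': ['lead_a', 'lead_c'], 'nurture_email': ['lead_b']}
```

Raising or lowering the threshold is how you trade sales-team workload against the risk of missing a ripe lead.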

Neil Patel has an article covering a handful of automation techniques that align perfectly with a lead scoring strategy, and they’re worth looking into for inspiration as you write your own scoring playbook.

The cardinal rule for lead scoring is to follow the data. It can and will be tempting to abandon the method in favor of good old-fashioned intuition, but it pays to remember that numbers never lie. It may take some trial and error, but persistence pays off when you’re closing leads like never before.

Lead scoring is a powerful marketing technique that helps you turn a lead list into a conversion machine. It’s one of many tools worth learning, and we’ll be covering plenty more in our upcoming Business Analytics Masterclass series. If you like what you see so far, you’ll love what we keep in the back.

Enroll in a class to reserve your spot.

5 Cybersecurity Tips For The Average Human

Not everyone can outsmart a hacker, but most of us can certainly out-dumb the average guy. It’s statistics, duh.

Avoiding a hacker is like The Running of the Bulls—keeping safe is, for the most part, not being the slowest schmuck in the game. If there are a handful of less secure people around you, chances are the hacker will go for them instead of you.

The Running of the Bulls in Pamplona, Spain

On a more serious note, cybersecurity is among today’s most pressing concerns, especially as everything—including our most sensitive information (we’re looking at you, Facebook)—moves to the digital realm.

The FBI recently reported that ransomware attacks number 4,000 each day, and a study by cybersecurity firm PandaLabs estimated that over 230,000 new malware samples were produced each day in 2015 (and projected the number to be much larger today).

The situation being what it is, it’s about time most of us developed the skills needed to navigate life online. Here are five handy tips to make sure you don’t get hacked. Think of these as your very own digital rape whistle.

1. Know that everyone is a target

The very first step to being cyber-secure is the awareness that it can happen to anyone. Even the Amish have hackers among their ranks! But I digress.

Truth be told, most people won’t even bother with the necessary precautions unless they’ve already been compromised or someone close to them has. I implore everyone to be just a tad more paranoid, especially as today’s hackers try to penetrate anything and everything—smartphones, webcams, ATMs, oven toasters, nuclear power plants, you name it.

2. Your password is your life

Most men won’t even share their passwords with their wives and girlfriends. It baffles me that many of them don’t exercise the same vigilance against hackers, who, frankly, can do much, much worse. Remember Ashley Madison?

Three things to remember:

  • Don’t use an easy password—especially not “password”
  • Change your passwords regularly—as often as you change sheets
  • Never reuse a password across accounts—keep threats contained
Laptop password is incorrect Steve Carell
Borrowed from Windows Stuff
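A password manager will do most of the heavy lifting here, but this small Python sketch shows what “not easy” and “not reused” look like in practice. It uses Python’s standard `secrets` module, which is meant for security-sensitive randomness; the 16-character default and the 12-character minimum are our own assumptions, not a formal standard.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a random password drawn with a cryptographically secure RNG."""
    if length < 12:  # assumed minimum; longer is better
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One distinct password per account keeps a single leak contained.
accounts = ["email", "bank", "social"]
vault = {account: generate_password() for account in accounts}
```

Generating a fresh password for each account (and again whenever you rotate it) covers both the first and third rules at once.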

3. Watch your Wi-Fi

Treat public Wi-Fi the way you would public toilets: only go if you absolutely have to, and use everything you can to stay clean.

Hackers love public Wi-Fi because, one, there’s a lot of information to steal, and, two, public networks have vulnerabilities they can exploit. Once a hacker penetrates a public network, they can intercept the information flowing in and out of it, including browser activity, personal data, and passwords. This is called a man-in-the-middle (MITM) attack.

The next level of an MITM attack is rogue Wi-Fi, also called an evil twin.

Hackers often set up rogue Wi-Fi in public places to steal information. They usually give the access point a name similar to the legitimate hotspot’s (e.g., “Mall Wi-Fi Free”), hence the name evil twin. Once you connect to a hacker’s network, they own you: your apps, emails, passwords, everything. They can even send out all kinds of filth from your device.

If you absolutely need to go online, at least have a VPN installed. A VPN is basically a service that encrypts your data (i.e. turns your data into jumbled up stuff that hackers can’t read) before it even reaches a Wi-Fi network or access point.

4. SSL or bust

If you keep accessing websites marked “Not Secure”, you’re pretty much begging for it. SSL (Secure Sockets Layer, now succeeded by TLS) is a security protocol that encrypts data traveling between your browser and a website’s server, like a secure passageway that keeps hackers from getting to you. Sites that use it have an SSL certificate, which verifies the site’s identity and enables the encrypted connection.

What’s the worst that could happen?

Not much thinking involved here. Most modern browsers will alert you if a website isn’t secure, so what you need to do is really simple: don’t enter any information, especially not sensitive information, in a site without an SSL Certificate.

5. Share at your own risk

As much as we try to protect ourselves, hackers will always find new ways to exploit our data. The most important thing we need to acknowledge is that anything we put online is at risk (of course, to varying degrees).

Always be aware that your data can be stolen, and keep an eye out for any kind of harm that might come your way. You may not be an expert, but as long as you’re alert and aware, you’ve already solved 90 percent of the problem.

Remember, you don’t have to be a genius to keep yourself cyber-secure. You just need to be a bit more difficult to hack than the average Internet user.

Data Scientist or Know-It-All?

The importance of domain expertise in data practice.

There’s a short yet wonderful story that perfectly encapsulates how many of today’s businesses use data. It’s a simple parable that all of us can learn from, regardless of our background, level of experience, or field of practice. In fact, if you’re already working with data, you might have a similar story to share. It goes like this:

A data scientist holds up a chart.
Everyone believes him.
End of story.

In today’s data-supercharged world, data is the law and the data practitioner is taken as the de facto expert. Ignore the fact that Ben just got hired last week: he has an MA in Statistics and a PhD in Machine Learning, so he must have all the answers, right?

(To clarify, said Ben is a hypothetical person. If you happen to know a Ben or are one, we apologize in advance. If you happen to be a woman, please don’t take our usage of a traditionally male name as a vote in favor of the patriarchy. This is purely for emphasis. We support all women, especially women in data. With everything cleared up, let’s get back to the matter at hand…)

Of course people will listen to the data guy. Numbers are compelling, especially when presented in chart form. Who are we mortals to question an interactive, multi-colored bubble chart? What power does one man hold over a regression line with an R-squared well over 0.90?

In the modern boardroom, data is gospel truth. Everything else is mere conjecture.

We seldom stop to consider whether the data is flawed, or if the data guy understands the subject matter enough to draw insights or conclusions. Maybe the regression model is accurate, but what if it uses the wrong variables, or maps out the wrong features? What if the chart displays absolute figures in places where a logarithmic scale is more appropriate? What if the time series shows periods that are either too long or too short? What if the final analysis is inconsequential to the use case at hand?

When all is said and done, data can be just as flawed as the people who work with it.

Consider the infamous case of NASA’s $125 million Mars Climate Orbiter. A simple conversion mishap, the failure to convert pound-force seconds (lbf·s) to newton-seconds (N·s), had the spacecraft flying within 37 miles of the Martian surface, dangerously below the 53-mile minimum. What followed was an epic fail of astronomic (no pun intended) proportions: Mars’ atmospheric friction burned the poor thing to a crisp before hurling its ashes deep into a cratery abyss.
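The arithmetic behind the mix-up is simple. Ground software reported thruster impulse in pound-force seconds, the navigation software expected newton-seconds, and one pound-force equals 4.4482216152605 newtons. This quick Python sketch, using a made-up impulse value, shows how far off an unconverted number lands:

```python
# 1 lbf = 0.45359237 kg x 9.80665 m/s^2 = 4.4482216152605 N (exact by definition)
LBF_TO_N = 4.4482216152605


def lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_TO_N


reported_lbf_s = 10.0  # hypothetical impulse value, in lbf·s
assumed_n_s = reported_lbf_s  # the bug: reading the number as if it were N·s
actual_n_s = lbf_s_to_n_s(reported_lbf_s)

print(f"assumed: {assumed_n_s:.2f} N·s, actual: {actual_n_s:.2f} N·s")
print(f"off by a factor of {actual_n_s / assumed_n_s:.2f}")
# assumed: 10.00 N·s, actual: 44.48 N·s
# off by a factor of 4.45
```

Every small thruster firing was understated by the same factor of about 4.45, and the errors piled up over the months-long cruise to Mars.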

Eyewitness reports allege the fire started with a contentious bar chart

Crash and burn—or, rather, burn then crash.

Mind you, this blunder happened with Lockheed Martin’s and NASA’s top brass, arguably the best domain experts in their respective fields, on the job. If even they can make mistakes like this, what makes us think we regular folk are exempt?

The next example is more down-to-earth… literally.

Applying domain knowledge could be as simple as choosing between a FIFO (first-in-first-out) and LIFO (last-in-first-out) approach, as detailed in this SuperDataScience podcast.

To explain FIFO and LIFO briefly: If element A arrives first, B second, and C last, FIFO dictates that they leave in that same order. LIFO is the complete opposite, wherein the last element, in this case C, leaves first, followed by B then A.
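In code, the difference comes down to which end of the line you take from. Here’s a quick Python illustration using the same A, B, C arrivals: `collections.deque` pops from the front for FIFO, and a plain list pops from the back for LIFO.

```python
from collections import deque

arrivals = ["A", "B", "C"]  # A arrives first, C last

# FIFO: take from the front, so the earliest arrival leaves first.
fifo = deque(arrivals)
fifo_order = [fifo.popleft() for _ in range(len(arrivals))]

# LIFO: take from the back, so the latest arrival leaves first.
lifo = list(arrivals)
lifo_order = [lifo.pop() for _ in range(len(arrivals))]

print(fifo_order)  # ['A', 'B', 'C']
print(lifo_order)  # ['C', 'B', 'A']
```

The code is identical for vegetables and steel bars; only the domain tells you which end of the line to use.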

As you might already predict, the “right” choice varies greatly among industries.

For example, a business dealing in perishable goods like vegetables or fresh meat might prefer a FIFO approach, wherein an earlier element, say Monday’s shipment, is sent out before Tuesday’s or Wednesday’s. Conversely, a steel manufacturer may opt for convenience and use a LIFO approach wherein the steel bars at the top of the pile (i.e. the last ones in) get shipped out first. Caveat: we are experts in neither the perishable goods nor steel industries, so this is, again, purely for illustration purposes.

Yes, data skills can be applied to nearly every domain. However, we cannot discount the fact that data practitioners need domain expertise in order to truly be effective (or at least to avoid $125 million blunders). Data in retail can differ from data in healthcare or economics or agriculture or any other industry.

This is no different from other jobs. In the same manner we demand industry experience from management professionals and sub-specializations from doctors and engineers, we need to push for domain expertise and domain knowledge in the data practice.

What does this entail?

For the data practitioner, this means building years of experience and knowledge in a specific domain. Go deep rather than broad.

For the company looking to fill a data position, this means hiring a data practitioner with an industry background, or grooming one from the existing workforce (the second is an option we highly encourage).

For schools and institutions offering data courses, this means creating industry-specific courses and tracks, or encouraging students to pursue a minor in a field of interest.

Parting thoughts

Data does not exist in a vacuum and neither do data experts. To make data impactful, we need to encourage data practitioners to look beyond the spreadsheet and out into the real world.

Doing so might just save all of us from another epic crash and burn.

Customer Segmentation 101

Customer Segmentation 101: A Marketer’s Guide by CirroLytix

The secret to effective marketing is to know your audience inside and out. When you know what your market base wants, needs, and fears, capturing them through targeted marketing is simple.

This guide will teach the fundamentals of customer segmentation: how to sort your prospects into categories that make it possible to run targeted marketing campaigns based on shared similarities like age, purchase history, or even price sensitivity.

What is Customer Segmentation?

Before we take a deep dive into the how-tos, we’ve got to get on the same page regarding the what: specifically, what is customer segmentation?

As we’ve mentioned in a previous article, customer segmentation is the process of sorting leads and existing customers into useful categories.

It’s a subset of the wider process of segmentation. You can segment a lead list, and you can even segment your whole market base.

For the purposes of this article, we’ll talk about segmenting customers: the set of people who’ve already made a decision to spend on your business.

We’ll be discussing lead and market segmentation during our Business Analytics Masterclass series later this year. If you’re interested in our full playbook, you can download our primer or sign up here.

Finally, customer segmentation works for businesses of all kinds. It doesn’t matter if you offer products or services, and it works just as well for B2B marketing as it does for B2C.

Data Sources

Customer segmentation is an exercise in data analytics, meaning you’ll need to peek into the numbers behind your sales activity and customer acquisition. For basic segmentation, you won’t need a sprawling data infrastructure; basic information like age, gender, or industry should do the trick.

Customer segmentation presumes that a business has existing pools of data to draw from.

If you’re a business operating in the 21st century, you’re likely already sitting on some data about your past customers and the leads you’re currently chasing.

For example, if you’re into B2C, then you probably have data on customers including their age, sex, location, place or industry of work, and a list of their past purchases. In this case, a point of sale (POS) system or customer-relationship management (CRM) software is always a good place to start.

If you’re a B2B firm, then you probably store data on your customers’ locations, industries, purchasing power, and marketing cycles (i.e. the amount of time from capturing a lead to conversion). So long as this data is digital, you can already do basic segmentation.

Naturally, more data lends itself to better insight, but don’t let your current situation dissuade you from exploring analytics. Learn enough, and you’ll be able to make a stellar case for why your outfit should invest in data down the line.


The end goal of segmentation is a reliable set of marketing personas: profiles of imaginary people who perfectly represent the different segments you’ll be targeting.

Personas aggregate the characteristics of your segments that matter most when it comes to actual marketing. These are things like what they need out of your product or service, what marketing tactics get their attention, and what habits they’ve expressed in their purchases.

Now, there’s a tendency among marketers to base the personas they use on gut feel and surface memory. This is a good start, but far from optimal.

Data-driven marketing calls for a more precise method of generating personas. At the very least, you should have some idea of which sex or age group contributes the most to your market share, and a sense of which geographic factors work in your favor (e.g., “Your best customers come from X” or “Your strongest location is Y”).
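To illustrate the “which group contributes the most” check, here is a minimal Python sketch. It totals revenue by a single field and reports each group’s share. The order records and the field names (`sex`, `age_group`, `value`) are hypothetical stand-ins for whatever your POS or CRM export actually contains.

```python
from collections import defaultdict

# Hypothetical order records; in practice these would come from a POS or CRM export.
orders = [
    {"customer": "c1", "sex": "M", "age_group": "18-24", "value": 4000.0},
    {"customer": "c2", "sex": "F", "age_group": "25-34", "value": 2000.0},
    {"customer": "c3", "sex": "M", "age_group": "18-24", "value": 8000.0},
    {"customer": "c4", "sex": "F", "age_group": "35-44", "value": 2500.0},
    {"customer": "c5", "sex": "F", "age_group": "25-34", "value": 1500.0},
]

def revenue_share(records, field):
    """Return (group, share of total revenue) pairs, largest share first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record["value"]
    grand_total = sum(totals.values())
    shares = [(group, amount / grand_total) for group, amount in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)

for field in ("sex", "age_group"):
    print(field, revenue_share(orders, field))
```

The top entry for each field is a data-backed starting point for a persona, in place of a guess from memory.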

You can start your segmentation efforts by assessing the personas you already use, or start with your goals and strategies, building personas as you perform segmentation.


CirroKnittix is a business that sells custom sweaters. At the broadest level, their market is, “Anyone who would buy a custom sweater.” Today, they want to take a data-driven approach and make smarter marketing decisions.

In their 4 years of doing business, they’ve noticed that their three most frequent types of customers are:

  1. People who frequently travel to countries with cooler weather,
  2. People who work office jobs with a casual dress code policy, and
  3. People who put in orders for novelty sweaters to be given out as gifts.

CirroKnittix examines their data and discovers the following personas:

Persona A, a.k.a. The Traveler, is a young man with a fortune in frequent flier miles. He shops at, or right before, peak travel season (May-June) and prefers to shop online. He spends an average of PHP 4,000.00 per order, and is best attracted using print ads.

Persona B, a.k.a. The Employee, is a young to middle-aged woman who believes in expressing herself through her fashion choices. She shops at various points throughout the year, and prefers to shop in-store. She spends an average of PHP 2,000.00 per order, and is best attracted using Instagram ads.

Persona C, a.k.a. The Joker, is a young man who enjoys giving gifts that leave an impression. He shops at various points throughout the year (mostly during the holidays) and prefers to shop online. He spends an average of PHP 8,000.00 per order, and is best attracted using Facebook ads.

Segmentation Characteristics

There are four types of characteristics used in profiling customers:

  1. Demographic – Who your customers are
  2. Psychographic – How your customers think
  3. Behavioral – What your customers do (and their habits)
  4. Environmental – Where your customers are
Segmentation characteristics

These characteristics vary depending on whether you’re selling to a B2B or B2C market.

At a glance, you can see how these characteristics can be useful in marketing. They give you an idea of your clientele’s needs, habits, purchasing power, and preferences.

When segmenting your customers, these characteristics will form the variables that you’ll be examining.

In the next few paragraphs, we’ll discuss two popular segmentation methods. Both of these are great starting points for marketers looking to use data for better targeting.

RFM Segmentation

One of the most straightforward ways to segment customers is to study their purchasing patterns. RFM segmentation looks at purchases according to their Recency, Frequency, and Monetary value.

Customers who purchase more recently, more frequently, and with larger monetary value are usually more valuable, while those who do the opposite are usually less valuable to a brand. There are also dozens of other types in between.

Let’s use the CirroKnittix example again.

If we were to rank Personas A, B, and C according to RFM segmentation, it would look like this for the month of May, 3 being the highest score and 1 being the lowest:


RFM scores for Personas A, B, and C (Recency score, Frequency score, Monetary value score)

However, the table above is just the tip of the iceberg. Once we start going deeper into the data, more detailed segments should emerge. Below are actual RFM segments we created for a retail company. These are based on POS data and include some recommendations for each segment.

| Customer Segment | Activity | Actionable Tip |
|---|---|---|
| Champions | Bought recently, buy often and spend the most! | Reward them. Can be early adopters for new products. Will promote your brand. |
| Loyal Customers | Spend good money with us often. Responsive to promotions. | Upsell higher value products. Ask for reviews. Engage them. |
| Potential Loyalist | Recent customers, but spent a good amount and bought more than once. | Offer membership / loyalty program, recommend other products. |
| Recent Customers | Bought most recently, but not often. | Provide on-boarding support, give them early success, start building relationship. |
| Promising | Recent shoppers, but haven’t spent much. | Create brand awareness, offer free trials |
| Customers Needing Attention | Above average recency, frequency and monetary values. May not have bought very recently though. | Make limited time offers, Recommend based on past purchases. Reactivate them. |
| About To Sleep | Below average recency, frequency and monetary values. Will lose them if not reactivated. | Share valuable resources, recommend popular products / renewals at discount, reconnect with them. |
| At Risk | Spent big money and purchased often. But long time ago. Need to bring them back! | Send personalized emails to reconnect, offer renewals, provide helpful resources. |
| Can’t Lose Them | Made biggest purchases, and often. But haven’t returned for a long time. | Win them back via renewals or newer products, don’t lose them to competition, talk to them. |
| Hibernating | Last purchase was long back, low spenders and low number of orders. | Offer other relevant products and special discounts. Recreate brand value. |
| Lost | Lowest recency, frequency and monetary scores. | Revive interest with reach out campaign, ignore otherwise. |

RFM segmentation will allow you to prioritize your marketing activity. It should help you identify your most valuable (and least valuable) segments using data so that you can budget your efforts and funding accordingly.
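Here is a minimal Python sketch of RFM scoring. It assumes a flat list of (customer, date, amount) transactions. For each customer it computes raw recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend). It then converts each measure into a 1-to-3 score by rank, with 3 as the best, matching the scale used in the CirroKnittix example. The transactions, customer IDs, and the rank-based tercile rule are illustrative assumptions. Many teams use 1-to-5 quintiles instead, and ties here are broken arbitrarily.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount in PHP).
transactions = [
    ("c1", date(2019, 5, 28), 4000.0),
    ("c1", date(2019, 5, 2), 4200.0),
    ("c2", date(2019, 3, 15), 2000.0),
    ("c2", date(2019, 4, 20), 1800.0),
    ("c2", date(2019, 5, 10), 2100.0),
    ("c3", date(2018, 12, 20), 8000.0),
]

def rfm_table(txns, as_of):
    """Raw (recency in days, frequency, monetary total) per customer."""
    stats = {}
    for customer, day, amount in txns:
        last, count, total = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), count + 1, total + amount)
    return {c: ((as_of - last).days, n, total) for c, (last, n, total) in stats.items()}

def rank_scores(pairs, higher_is_better, bins=3):
    """Map (customer, value) pairs to scores from 1 (worst) to `bins` (best) by rank."""
    worst_first = sorted(pairs, key=lambda p: p[1], reverse=not higher_is_better)
    n = len(worst_first)
    return {c: 1 + rank * bins // n for rank, (c, _) in enumerate(worst_first)}

def rfm_scores(txns, as_of):
    raw = rfm_table(txns, as_of)
    r = rank_scores([(c, v[0]) for c, v in raw.items()], higher_is_better=False)  # fewer days is better
    f = rank_scores([(c, v[1]) for c, v in raw.items()], higher_is_better=True)
    m = rank_scores([(c, v[2]) for c, v in raw.items()], higher_is_better=True)
    return {c: (r[c], f[c], m[c]) for c in raw}

scores = rfm_scores(transactions, as_of=date(2019, 5, 31))
for customer, (r, f, m) in scores.items():
    print(customer, "R", r, "F", f, "M", m)
```

Named segments such as those in the table are then assigned from combinations of these scores. The exact cut-offs are a business choice, and this sketch does not encode them.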

Correlation Clustering

While RFM segmentation analysis is a great place to start, it falls short on two counts: (1) it only looks at three factors, and (2) it’s only useful for segmenting customers that are already engaged with your brand (i.e. those with existing transaction records).

What if you want to predict purchase size based on gender, age, and location, or find out which characteristics your most valuable customers share?

Then you’ll need to cross-examine, cross-tabulate, and analyze more factors.

Correlation clustering groups data points according to their many similarities—that is, by examining all existing factors and figuring out which data points are most alike.

This is done by identifying centroids (think of these as the reference points for each cluster) and grouping data points according to their closest centroids. In centroid-based methods such as k-means, each centroid sits at the average position of the points in its cluster. The algorithm assigns every point to its nearest centroid, moves each centroid to the mean of its assigned points, and repeats until the groupings stop changing. We illustrate this in the charts below.

Data points and centroids

Data points clustered according to nearest neighbors

Although the charts appear flat, clustering in practice can involve many dimensions, often far more than the three we’re able to visualize. It really depends on how many fields you want to analyze.
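The centroid-and-nearest-point idea can be sketched in plain Python using k-means, the best-known centroid-based clustering algorithm. This is a swapped-in, concrete technique. Strictly speaking, “correlation clustering” names a different, graph-based method, so treat this sketch as an illustration of the centroid approach rather than the article’s exact procedure. The customer features and the starting centroids are hypothetical. `math.dist` works for any number of dimensions, so adding fields simply means adding coordinates.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points, until assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical customers: (age, average order value in thousands of PHP, orders per year).
customers = [(22, 4.0, 6), (24, 4.5, 5), (23, 3.8, 7),
             (35, 2.0, 3), (38, 2.2, 2), (40, 1.8, 3)]
labels, centers = kmeans(customers, initial_centroids=[customers[0], customers[3]])
print(labels)
print(centers)
```

On real data, scale each field first (for example, to z-scores). Otherwise a field measured in large units, like age here, dominates the distance.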

Correlation clustering can tell you which factors tend to go together—and the results may surprise you. Remember beer and diapers?

A Quick Example (Segmentation Using Pivot Tables)

This is a simple customer segmentation exercise using the Pivot Table function in MS Excel. For this exercise, we’ll use the table below:

Our objective is to find the Average Spend for each of the following segments:

  • Age
  • Gender
  • Age & Gender

Check the video below for a step-by-step guide.

Ready to do some analysis of your own? Download the Excel file here and let us know what you find in the comments below.
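If you prefer code to Excel, the same three pivots (average spend by age, by gender, and by age and gender) can be sketched in plain Python. The rows below are hypothetical placeholders, not the article’s downloadable dataset. Swap in your own columns.

```python
from collections import defaultdict

# Hypothetical rows: (age_group, gender, spend in PHP).
rows = [
    ("18-24", "F", 1200.0),
    ("18-24", "M", 800.0),
    ("25-34", "F", 1500.0),
    ("25-34", "M", 1100.0),
    ("25-34", "F", 1700.0),
    ("35-44", "M", 900.0),
]

def average_spend(data, key):
    """Group rows by key(row) and return the mean spend per group, similar to
    a pivot table with the average of Spend in the Values area."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in data:
        group = key(row)
        sums[group] += row[2]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

by_age = average_spend(rows, key=lambda r: r[0])
by_gender = average_spend(rows, key=lambda r: r[1])
by_age_gender = average_spend(rows, key=lambda r: (r[0], r[1]))
print(by_age, by_gender, by_age_gender, sep="\n")
```

Passing a tuple as the grouping key is the code equivalent of dragging two fields into the Rows area.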


Customer segmentation is one of many tools that data-driven marketers use to monitor, predict, and optimize their work.

Once you’ve discovered and understood key similarities among your customer base through segmentation, it becomes much easier to reach your most valuable audiences with targeted ads and campaigns.

We’ll be exploring more marketing analytics techniques in the weeks leading up to our Business Analytics Masterclass this coming April. Check back regularly, or subscribe to our mailing list for live updates.

Marketing Analytics 2019: 6 Use Cases for Better ROI

Who should read this?

C-Level Executives – Why invest in marketing analytics?
Marketing Managers – How can our analytics team help with marketing?
Analytics Professionals – What marketing-related problems can I solve?
Analytics Job-Seekers – How can I improve my pitch during interviews?

The old saying, “nobody counts the number of ads you run; they just remember the impression you make” is truer now than it has ever been.

Although marketers run more ads today than ever before, many continue to struggle to reach their respective markets. As a result, the average marketer burns through their budget for minimal returns.

The reason?

People hate ads. Specifically, people hate ads that suck—and the same can be said about any kind of marketing activity.

When a field is as saturated as digital marketing, you’re bound to have your digital space polluted with crappy content. There are plenty of people who claim to know what the market wants, but only a handful can walk the talk.

Rather than letting loose generic ads upon an undefined audience (which is a great way to waste cash btw), marketers should aim to understand their customers on a deeper level.

Making the right impression takes intelligent decision making, which in turn depends on having access to the kinds of facts and evidence that only a foothold in data analytics can provide.

This article will outline six use-cases that illustrate how working with data can yield greater returns. If you’re sick of feeding your budget to the black holes of guesswork and intuition, read on.

1. Customer Segmentation

Customer segmentation is the practice of splitting your customer base into groups that share common characteristics that matter when it comes to marketing.

Think of customer segmentation as a magnifying glass that allows marketers to see unique groups within a broader audience. Segments might be based on factors like age, gender, interests, or spending habits—anything that helps you tap into the specific demands of your target market.

In the UK, skincare brand NIVEA Sun was able to divide their market based on demographic and attitudinal characteristics. This allowed the brand to grow its portfolio to over 40 products, increase its market share by an average of 6.4% per year, and expand into new markets such as men’s facial care, aftershave, and deodorants.

Customer segmentation is a marketing analytics use-case that can answer important questions you may have about your market such as:

  • Which customers generate the highest lifetime value for my brand?
  • What features does each customer group look for in my brand?
  • Which customers provide the highest upside?
  • Who should I target for a discount campaign?

Once you can identify groups within your larger audience base, you can easily set up campaigns that speak directly to each of them and create content that generates genuine interest instead of passive disgust.

The data says all our customers hate our products

Looking to learn customer segmentation?

Visit our Customer Segmentation 101 blog for some hands-on exercises—we also attached a downloadable dataset you can work with.

2. Market Basket Analysis

If customer segmentation is about finding people who share similar traits, market basket analysis is about finding patterns in what people buy.

The underlying theory behind market basket analysis is this: if you buy a certain set of items, you are more (or less) likely to buy another set of items.

For example, let’s say people who buy underwear are more likely to buy socks during the same shopping trip. A department store that knows this might choose to place their sock aisle closer to other undergarments, or offer a bundle featuring both products.
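The underwear-and-socks intuition is usually quantified with three measures. Support is how often a pair appears together. Confidence is how often the second item appears, given the first. Lift is confidence divided by the second item’s overall frequency, where a value above 1 means the items go together more often than chance would predict. Below is a brute-force Python sketch over item pairs, using small, hypothetical baskets. Production tools use algorithms such as Apriori or FP-Growth to scale beyond pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"underwear", "socks"},
    {"underwear", "socks", "shirt"},
    {"underwear", "shirt"},
    {"socks"},
    {"shirt", "pants"},
    {"underwear", "socks", "pants"},
]

def pair_rules(baskets, min_support=0.2):
    """Score every item pair (in both directions) by support, confidence, and lift."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to act on
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # compared with P(rhs) alone
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```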

Market basket analysis was first applied in supermarkets and groceries to predict which items were likely to go together in a single cart or basket (hence, “market basket”). Today, it’s applied to all forms of retail, including consumer goods, fashion, and others.

One of the best known applications of market basket analysis is in the fast food industry. Restaurants like McDonald’s and Burger King know that burger buyers are highly likely to order fries and soda on the side, which is why they upsell nearly every burger purchase. In fact, we’re at a point where the upsell has become the default.

This simple bundling practice has made fast food chains billions—and it’s doubly impressive because they make far better margins on the upsell than on the initial purchase!

3. Marketing Mix Modeling

One of the biggest questions marketers face is how to allocate their budget and effort optimally across a slate of different marketing channels.

Marketing mix modeling is a marketing analytics technique that predicts the ROI of each marketing channel, and suggests the best combination of marketing channels given a certain budget.

It helps marketers know how best to spend, and can be used to predict sales, track campaign effectiveness, and create data-driven budget plans.
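At its core, marketing mix modeling regresses sales on spend per channel. The sketch below fits an ordinary least-squares model with two hypothetical channels, `tv` and `social` (weekly spend in thousands of PHP), using only the standard library. The data are synthetic and noise-free, so the fit recovers the coefficients exactly. Real models also handle noise, carryover (adstock), and diminishing-returns curves, and those features are what make budget-split recommendations meaningful.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Each row of X must start with 1.0 for the intercept."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic weekly data: (tv spend, social spend) and sales = 50 + 2*tv + 4*social.
weeks = [(10, 5), (20, 3), (15, 8), (30, 6), (25, 10), (5, 2)]
sales = [90, 102, 112, 134, 140, 68]
X = [[1.0, tv, social] for tv, social in weeks]
intercept, per_tv, per_social = fit_linear(X, sales)
print(f"baseline={intercept:.2f}, per unit tv={per_tv:.2f}, per unit social={per_social:.2f}")
```

Each coefficient reads as the estimated sales lift per additional unit of spend in that channel. That is the raw material for comparing ROI across channels.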

Any marketer who can apply marketing mix modeling will breeze through budget approval, effectively waving goodbye to awkward conversations about wasting company money.

4. Propensity-to-Purchase Modeling

As the name implies, propensity-to-purchase modeling (a.k.a. lead scoring or likelihood-to-buy modeling) is a method used to estimate how likely a customer is to buy a product or perform a predefined action.

These graphs are too upward sloping. They must be true.

Most models under this use-case classify prospects on a spectrum of “highly likely” to “highly unlikely”—the more similar a prospect is to previous buyers, the higher the likelihood of a purchase (or action), and vice versa.

Propensity-to-purchase modeling is especially helpful for prioritizing prospects. In practice, you’d be able to direct your ads to customers who are more likely to convert, while deprioritizing those who are less likely to make a purchase.
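A common way to build a propensity score is logistic regression, which turns a prospect’s features into a probability between 0 and 1. This plain-Python sketch trains one by gradient descent on hypothetical data (site visits in the last 30 days and number of past purchases), then scores two new prospects. The features, labels, learning rate, and epoch count are all illustrative assumptions.

```python
import math

# Hypothetical training data: (site visits in last 30 days, past purchases) -> bought (1) or not (0).
X = [(8, 2), (6, 1), (7, 3), (6, 0), (4, 1),
     (5, 0), (1, 0), (2, 1), (0, 0), (3, 0), (7, 0)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def propensity(w, x):
    """Probability of purchase for features x under weights w (w[0] is the bias)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic(X, y, lr=0.05, epochs=3000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = propensity(w, xi) - yi  # gradient of log-loss w.r.t. the linear score
            grad[0] += error
            for j, xj in enumerate(xi, start=1):
                grad[j] += error * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

weights = train_logistic(X, y)
for prospect in [(9, 2), (1, 0)]:
    print(prospect, round(propensity(weights, prospect), 3))
```

Sorting prospects by this score, from highest to lowest, gives the “highly likely” to “highly unlikely” spectrum described above.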

This can potentially save you tens of thousands in ad expenses, increase your conversion rates, and boost ROI. If you ask us, that’s a pretty good reason to go data-driven.

5. Sentiment Analysis

In our social media-driven world, it pays to know exactly how people feel about your brand: Are they happy with your product? Is your service enjoyable? Are your sales reps leaving a positive impression on your customers?

Sentiment analysis (also called opinion mining) is the practice of extracting the emotions and opinions expressed in text.

It relies on natural language processing (NLP), the field concerned with getting machines to interpret human language, to uncover the meaning behind words and phrases.

Unlike the above use-cases that measure probability and magnitude, sentiment analysis attempts to measure human emotion (e.g. happy, sad, angry, excited, etc.) by looking at unstructured data in the form of social media posts, reviews, blogs, and the like.
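The simplest form of sentiment analysis is lexicon-based: look up each word in a list of scored words and add up the scores. The sketch below uses a tiny, hypothetical lexicon plus a crude negation rule. Production systems rely on much larger lexicons or trained NLP models, so treat this only as an illustration of the idea.

```python
import re

# A tiny, hypothetical lexicon; real tools score thousands of words and phrases.
LEXICON = {"love": 2, "great": 2, "happy": 1, "good": 1,
           "slow": -1, "bad": -1, "rude": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for review in ["I love this sweater, great fit!",
               "Delivery was slow and the staff were rude.",
               "The fabric is not bad"]:
    print(label(sentiment(review)), "|", review)
```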

Popular applications of this use-case include: social listening, intent analysis, and contextual search.

Good marketing is about listening as much as it is about broadcasting ideas. Using sentiment analysis, you can get better and faster feedback so that you can meet your market where it needs you.

6. Time-to-Next-Purchase Modeling

Time-to-next-purchase modeling draws its inspiration from survival analysis, also known as time-to-event analysis. However, instead of tackling traditional problems like mortality, morbidity, and half-life, it is used in marketing analytics to forecast the time until a purchase or conversion.
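A standard survival-analysis tool that carries over directly is the Kaplan-Meier estimator. Here it estimates the probability that a customer has not yet repurchased t days after their previous purchase. It also correctly handles customers who simply haven’t come back yet, which survival analysis calls censored observations. The gaps below are hypothetical. An `observed` value of `False` marks a customer who has gone that many days without repurchasing so far.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t): the probability a customer has NOT yet made
    their next purchase t days after the previous one. Returns [(t, S(t)), ...]."""
    event_times = sorted({d for d, o in zip(durations, observed) if o})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)  # still "waiting" at day t
        events = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

def median_time(curve):
    """First time at which S(t) falls to 0.5 or below (None if it never does)."""
    return next((t for t, s in curve if s <= 0.5), None)

# Hypothetical days between purchases; False = no repurchase yet (censored).
durations = [30, 45, 60, 60, 75, 90, 40, 80]
observed = [True, True, True, True, True, False, False, True]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: {s:.3f} still haven't repurchased")
print("median time to next purchase:", median_time(curve), "days")
```

Simply averaging the observed gaps would ignore the censored customers and bias the estimate downward. Handling them is the main reason to borrow survival methods here.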

This could be the time it takes for a driver to change car tires, for an organic restaurant to restock on fresh produce, or for the average woman to shop for more makeup.

Using time-to-next-purchase modeling, you can anticipate your customers’ needs ahead of time and plan accordingly.

Let’s say you work for an optical shop and know that it takes roughly two months until contact lenses need to be replaced. A month and a half after a customer purchases a set of lenses from your store, you can send them an email reminder to restock and offer them free delivery while you’re at it.
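The contact-lens reminder boils down to simple date arithmetic. This Python sketch schedules the email 75% of the way through the replacement cycle: 45 days into a 60-day cycle, roughly the month and a half in the example. The function names and customer records are hypothetical, and the 60-day figure stands in for “roughly two months.”

```python
from datetime import date, timedelta

def reminder_date(purchase_date, replacement_days=60, lead_fraction=0.75):
    """Date to send a restock reminder, partway through the replacement cycle."""
    return purchase_date + timedelta(days=round(replacement_days * lead_fraction))

def due_for_reminder(purchases, today, **kwargs):
    """Customers whose reminder falls on `today`."""
    return [customer for customer, bought in purchases
            if reminder_date(bought, **kwargs) == today]

# Hypothetical lens purchases: (customer, purchase date).
purchases = [("ana", date(2019, 3, 1)), ("ben", date(2019, 3, 10))]
print(due_for_reminder(purchases, today=date(2019, 4, 15)))
```

In practice, `replacement_days` would come from a model like the one above rather than a fixed constant.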

According to Listrak, replenishment emails have the highest conversion rate (28.59%) of all post-purchase emails. That’s an easy return for very minimal investment.


As we said from the start, getting positive returns from your marketing activity isn’t a matter of running the most ads or having the widest reach; it’s about hitting the right notes to appeal to your target market.

These six use-cases illustrate how data-driven marketing does precisely that. By cross-checking the numbers, you can draw reliable conclusions based on past encounters and experiences.

Intelligent guessing is no longer the intelligent approach to marketing. From here on out, it’s data or bust.


Over the next few months, we’ll be dedicating an article to each of the six marketing analytics use-cases. These will include how-to’s on MS Excel and Google Analytics so that you can begin using data to drive real-life marketing strategies and campaigns.

Like us on Facebook, and you’ll be the first to know when a new article drops…or, if you’re after more than just snippets from our playbook, you can sign up for our upcoming Marketing Analytics Masterclass and leave with entire pages’ worth of strategies and techniques.