Customer Segmentation

1. Introduction

Customer Segmentation is the process of dividing your customers into groups based on common characteristics and behavior. It is one part of a broader field called Customer Personality Analysis, which is a detailed analysis of a company’s ideal customers. It helps a business better understand its customers and makes it easier to modify products according to the specific needs, behaviors, and concerns of different types of customers.

Customer personality analysis helps a business tailor its product to target customers from different customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that particular segment.

In this article, I will talk about Customer Personality Analysis and Customer Segmentation using Machine Learning. It is a classic use case of Unsupervised Learning. Before starting with the use case, let us first talk about Unsupervised Learning.

2. Unsupervised Learning

Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: Supervised Learning, Unsupervised Learning, Semi-supervised Learning, and Reinforcement Learning. In this section, we will be talking about Unsupervised Learning.

In Unsupervised Learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher. One of the most important unsupervised learning tasks is Clustering (which we will be using in our use case as well). Clustering is the task of identifying similar instances and assigning them to clusters, or groups of similar instances. A Clustering Algorithm tries to detect similar groups within a population. For example, if you have a website, you may want to run a clustering algorithm to find groups of similar visitors. Below is an image that shows the use of a Clustering Algorithm.

This is how a clustering algorithm groups similar instances together.

We will be using Clustering for the Customer Segmentation problem. We can cluster our customers based on their purchases and their activity on our website, in our stores, and so on. This helps us understand who our customers are and what they need, so we can adapt our products and marketing campaigns to each segment. For example, if we decide to launch a new product, it would be much better to market it to the cluster that is most likely to buy it instead of wasting money marketing the product to the whole customer base.

The Clustering Algorithm that we will be using for our use case is called the K-Means Algorithm.

3. K-Means Algorithm

The K-Means Algorithm is a simple algorithm capable of clustering datasets with roughly spherical, equidense clusters very quickly and efficiently, often in just a few iterations. By equidense, I mean every cluster is roughly of the same size.

Source: Sklearn Documentation (check References Section)

In K-Means, the “K” stands for the number of centroids or clusters. Note that you have to specify the “K” that the algorithm must find. So, let us now try to get an intuition for how the K-Means algorithm works. In the case of unsupervised learning, we are provided with neither the labels nor the centroids. So, we start by placing the centroids randomly, then label the instances, update the centroids, label the instances, update the centroids, and so on until the centroids stop moving. At that point, the algorithm is said to have converged.
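The label/update loop described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the scikit-learn implementation used later: the function name `kmeans`, the toy points, and the choice to initialize the centroids from randomly picked data points are my own assumptions.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Illustrative K-Means loop: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Label step: assign every instance to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its instances.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        # Converged once the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Six made-up points forming two obvious groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids, labels = kmeans(X, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)  # one centroid near each group
```

Which group ends up as cluster 0 depends on the random start, so only the grouping itself is meaningful, not the label numbers.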

Now the question arises: how do you decide the number of centroids or clusters? In some cases, it can be visually clear, but in other cases, we use something known as the elbow method. You can refer to this article or my Kaggle Notebook for further reference. Now, let’s just skip to the good part, Customer Segmentation. Shall we?

4. Customer Segmentation

The problem statement here is,

Given the database of a Supermarket, perform a detailed analysis of the customers’ behavior and segment them into separate clusters to help the business better understand its customers and make it easier to modify products according to the specific needs, behaviors, and concerns of different types of customers. The dataset for the above problem is attached for your reference.

The first step towards solving a Data Science problem is to understand the data well. This includes drawing insights, handling the null values, taking the necessary preprocessing steps, etc. The dataset looks like this when stored in a DataFrame.

Marketing Data in a Pandas Dataframe

From the initial analysis, we can see that the “Income” column has 24 null values, i.e., ~1% of the total data.

The next step involves imputing these null values. Since the null values are just ~1% of the data, dropping these rows would not cause much information loss and is a valid approach in this use case. Still, we proceed by imputing these values. Income columns generally follow a skewed distribution, and the same can be observed in this use case as well. The following histogram shows the distribution of the income in our dataset.

Frequency Distribution of Income

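Next, we take the log of this column, find the mean of the log of Income, take the exponent of this value, and impute it in place of the null values. The imputed value is therefore the geometric mean of Income, which is less sensitive to the long right tail than the ordinary mean. The following code cell demonstrates the same.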

Null Values Handling
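As a rough sketch of the log-mean imputation described above (the original code cell is not reproduced here), the steps could look like this. The “Income” column name comes from the article; the toy values and the variable names `log_mean` and `fill_value` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with two missing incomes (values are made up).
df = pd.DataFrame({"Income": [20000.0, 35000.0, np.nan, 50000.0, 120000.0, np.nan]})

# Mean of log(Income) over the non-null rows; pandas skips NaN by default.
log_mean = np.log(df["Income"]).mean()

# Back-transform to the original scale, which gives the geometric mean.
fill_value = np.exp(log_mean)

# Impute the geometric mean in place of the null values.
df["Income"] = df["Income"].fillna(fill_value)
```

Because the back-transformed value is a geometric mean, it sits below the arithmetic mean whenever the distribution is right-skewed.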

Next, we create some additional features using the existing features. This step is known as feature engineering. The following new features are created:

  1. Age of Customers
  2. Number of Children
  3. Number of Family members
  4. Whether a customer is a parent or not
  5. Number of Years since customer
  6. Total amount spent
  7. Amount Spent and Income Ratio
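A minimal sketch of how the seven features listed above might be derived with pandas is shown below. All column names (`Year_Birth`, `Kidhome`, `Teenhome`, `Living_With`, `Dt_Customer`, `MntFood`, `MntOther`), the reference date, and the toy rows are assumptions for illustration; the actual dataset may use different names and more spending columns.

```python
import pandas as pd

# Toy rows with hypothetical column names; the real dataset's columns may differ.
df = pd.DataFrame({
    "Year_Birth": [1980, 1992],
    "Kidhome": [1, 0],
    "Teenhome": [1, 0],
    "Living_With": ["Partner", "Alone"],
    "Dt_Customer": pd.to_datetime(["2013-05-01", "2014-01-15"]),
    "Income": [60000.0, 30000.0],
    "MntFood": [300, 50],
    "MntOther": [200, 25],
})
reference_date = pd.Timestamp("2015-01-01")  # assumed "today" for age and tenure

df["Age"] = reference_date.year - df["Year_Birth"]                 # 1. age
df["Children"] = df["Kidhome"] + df["Teenhome"]                     # 2. children
adults = df["Living_With"].map({"Alone": 1, "Partner": 2})
df["Family_Size"] = adults + df["Children"]                         # 3. family members
df["Is_Parent"] = (df["Children"] > 0).astype(int)                  # 4. parent flag
df["Years_Customer"] = (reference_date - df["Dt_Customer"]).dt.days / 365.25  # 5. tenure
df["Total_Spent"] = df[["MntFood", "MntOther"]].sum(axis=1)         # 6. total spent
df["Spent_Income_Ratio"] = df["Total_Spent"] / df["Income"]         # 7. spend/income
```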

The next step in Data Preprocessing is encoding the categorical variables. A Machine Learning model cannot interpret “string” or “text” values, so they must be encoded into numerical values. I am using the pandas get_dummies() function to encode the nominal variables and manually encoding the ordinal variables. You can also use scikit-learn’s OneHotEncoder and OrdinalEncoder classes to perform the nominal and ordinal encoding respectively (LabelEncoder is intended for target labels rather than input features). The following code cell demonstrates the same.

Categorical Variables Encoding
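The two encoding styles mentioned above could be sketched as follows. The column names `Marital_Status` and `Education`, their categories, and the ordering in `education_order` are hypothetical stand-ins, not taken from the dataset.

```python
import pandas as pd

# Hypothetical columns: "Marital_Status" is nominal, "Education" is ordinal.
df = pd.DataFrame({
    "Marital_Status": ["Single", "Married", "Married", "Divorced"],
    "Education": ["Graduate", "Basic", "Postgraduate", "Graduate"],
})

# Ordinal variable: map categories to integers that respect their order.
education_order = {"Basic": 0, "Graduate": 1, "Postgraduate": 2}  # assumed ordering
df["Education"] = df["Education"].map(education_order)

# Nominal variable: one-hot encode, one indicator column per category.
df = pd.get_dummies(df, columns=["Marital_Status"], dtype=int)
print(df.columns.tolist())
```

The manual mapping keeps the order information in a single column, while one-hot encoding avoids implying any order between the nominal categories.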

After encoding, the next step is to scale all the variables. I will be using scikit-learn’s StandardScaler class, which standardizes each feature so that it has a mean of 0 and a standard deviation of 1.

Scaling the data using Standard Scaler
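For reference, a minimal StandardScaler sketch on made-up numbers looks like this; the array `X` is a stand-in for the encoded dataframe.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales.
X = np.array([[20000.0, 1.0], [40000.0, 3.0], [60000.0, 5.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtract the column mean, divide by the column std

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

Scaling matters here because K-Means relies on distances, so an unscaled feature such as income would otherwise dominate the clustering.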

This is how the final scaled and encoded dataframe looks.

Data Ready for Modelling

Though our data is ready for modeling (clustering), we cannot visualize it directly because it contains 24 variables, far more than can be shown on a plot. Hence, we must reduce the dimensions of the data. For this, we use a dimensionality reduction technique called Principal Component Analysis, or PCA. Dimensionality Reduction is extremely useful for Data Visualisation, and it speeds up training as well. Reducing the dimensions to two or three makes it possible to plot a condensed view of a high-dimensional training dataset on a graph and often gain some important insights by visually detecting patterns such as clusters. The following code cells demonstrate the dimensionality reduction.
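A hedged sketch of the reduction step with scikit-learn’s PCA is given below. The random matrix stands in for the 24 scaled features, and three components are assumed here simply because the reduced data is plotted in 3D.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the scaled dataframe: 200 rows, 24 features of random noise.
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(200, 24))

# Project onto the three directions of highest variance.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of the total variance kept by the three components.
explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape, round(explained, 3))
```

It is worth checking the explained variance on the real data, since a low value means the 3D plot hides much of the structure.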

Let us plot the dimensionally reduced data on a 3D plot.

Finding the Number of Clusters

As discussed earlier, we can decide the number of clusters using the elbow method. The following code snippet shows the implementation of the same in Python.
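One common way to implement the elbow method is to fit K-Means for a range of k values and record the inertia (the within-cluster sum of squared distances) for each. The sketch below does that on synthetic blobs from make_blobs, which are an assumption standing in for the PCA-reduced customer data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the reduced customer data.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Fit K-Means for k = 1..8 and record the inertia of each fit.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Plotting inertia against k and looking for the point where the curve stops dropping sharply gives the "elbow".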

From the elbow method plot, we can make out that there should be 4 clusters. There is another method to determine the number of clusters, called silhouette score analysis. The Silhouette Score is defined as the mean silhouette coefficient over all the instances. The Silhouette Coefficient of an instance is given as

    s = (b - a) / max(a, b)

where a is the mean distance to the other instances in the same cluster (i.e., the mean intra-cluster distance) and b is the mean nearest-cluster distance (i.e., the mean distance to the instances of the next closest cluster).

The silhouette coefficient value signifies how similar a point is to its own cluster compared to other clusters.

The silhouette score calculation is implemented as follows:
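A minimal sketch of this calculation, using scikit-learn’s silhouette_score on synthetic blobs rather than the customer data, could look like this. The range of k values and the `best_k` helper variable are my own choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the reduced customer data.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# The silhouette score needs at least two clusters, so start at k = 2.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest mean silhouette coefficient.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```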

The silhouette score is maximum for 3 clusters. We will visualize both 3 and 4 clusters but perform the cluster profiling using k = 3 i.e., 3 clusters. Next, we run the Clustering Algorithm with both k = 3 and 4 and visualize the data with clusters.

K-Means implementation with K=3 and K=4
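Running K-Means for both values of k might be sketched as below. Again, the blobs are a synthetic stand-in for the reduced data, and the dictionary `labels_by_k` is a hypothetical helper for keeping the two labelings side by side.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the PCA-reduced customer data (3 dimensions).
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=42)

# Fit one model per candidate k and keep the cluster label of every row.
labels_by_k = {}
for k in (3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels_by_k[k] = model.fit_predict(X)

# Number of customers that landed in each cluster, per k.
for k, labels in labels_by_k.items():
    print(k, np.bincount(labels))
```

The labels can then be passed as the color argument of a scatter plot to visualize the clusters.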

Plotting the reduced data with clusters.

5. Customer Profiling

The clusters are almost evenly distributed. The following count plot shows the distribution of each cluster.

Cluster Distribution

From the analysis of clusters, the following can be made out:

So, this was the Customer Personality Analysis Project. I hope you liked my detailed explanation. Please refer to my Kaggle Notebook for the entire code and analysis. Please do show your support if you find it helpful; it motivates me to bring you more informative blogs and notebooks. For further information on the topics covered in this project, please refer to the references below.

6. References

