Customer Segmentation

ML/Analysis Techniques

  • Unsupervised clustering (KMeans)
  • PCA
  • Feature Engineering
  • Interactive Data Viz

Tools

  • Python
  • Pandas
  • Plotly
  • scikit-learn


For this project, I set out to segment the customers of a real UK-based online store. The dataset contains roughly 500k orders placed between 1 December 2009 and 9 December 2011. Since each row in the original dataset is an individual transaction, after some EDA I aggregated the data to one row per customer for modeling.

The customer-level features I created were:

  • average order value per customer
  • number of orders per customer
  • customer country
  • most frequently purchased product by customer
  • total spent by customer
  • most active month per customer
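
The features listed above can be built with a single pandas groupby. This is a minimal sketch on toy data, not the project's actual code: the column names (customer_id, invoice, description, quantity, price, country, invoice_date) are assumptions, as is the choice to define "most frequent" and "most active" by the number of transaction lines.

```python
import pandas as pd

# Toy transaction-level data; all column names and values are hypothetical.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "invoice": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "description": ["mug", "lamp", "mug", "clock", "clock", "lamp"],
    "quantity": [2, 1, 3, 1, 2, 4],
    "price": [5.0, 20.0, 5.0, 15.0, 15.0, 20.0],
    "country": ["United Kingdom"] * 3 + ["France"] * 2 + ["United Kingdom"],
    "invoice_date": pd.to_datetime([
        "2010-12-01", "2010-12-01", "2011-01-15",
        "2010-03-05", "2010-03-20", "2011-06-10",
    ]),
})

# Per-line revenue and calendar month, needed for the aggregations below.
transactions["line_total"] = transactions["quantity"] * transactions["price"]
transactions["month"] = transactions["invoice_date"].dt.month


def most_common(series: pd.Series):
    """Return the most frequent value (first in sorted order on ties)."""
    return series.mode().iloc[0]


# Collapse transactions to one row per customer.
customers = transactions.groupby("customer_id").agg(
    n_orders=("invoice", "nunique"),
    total_spent=("line_total", "sum"),
    country=("country", "first"),
    top_product=("description", most_common),
    most_active_month=("month", most_common),
)
customers["avg_order_value"] = customers["total_spent"] / customers["n_orders"]
```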

After creating the customer-level features, I ran PCA to determine how many components were actually needed. The first five of the six components together explained more than 90% of the variance, so I reduced the dataset to those five. I then used the elbow and silhouette methods to choose the number of clusters for KMeans, which came out to four.
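
The PCA and cluster-selection steps can be sketched roughly as follows with scikit-learn. This is an illustration under assumptions, not the project's code: it uses synthetic data from make_blobs in place of the real features, assumes all six features are already numeric (categorical ones such as country would need encoding first), and the 2 to 8 range of candidate cluster counts is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for six numeric customer-level features.
X, _ = make_blobs(n_samples=400, n_features=6, centers=4, random_state=0)

# PCA is scale-sensitive, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the fewest components whose cumulative explained variance reaches 90%.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.90) + 1)
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)

# Fit KMeans for each candidate k and record both selection metrics.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_reduced)
    inertias[k] = model.inertia_  # elbow method: look for the bend in this curve
    silhouettes[k] = silhouette_score(X_reduced, model.labels_)

# Silhouette method: prefer the k with the highest average score.
best_k = max(silhouettes, key=silhouettes.get)
```

The elbow is usually read off a plot of inertias against k, so in practice the two methods are compared by eye rather than combined into a single number.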

I enjoyed using Plotly for the data visualizations in this project because of its simplicity and interactivity. The final customer segments (clusters) can be viewed interactively here, or you can click the static image below to open the interactive version.

Please see my GitHub repository for the project files.