
Customer Segmentation
ML/Analysis Techniques
- Unsupervised clustering (KMeans)
- PCA
- Feature Engineering
- Interactive Data Viz
Libraries/tools
- Python
- Pandas
- Plotly
- scikit-learn
Overview
For this project, I wanted to segment the customers of an actual UK-based online store. The dataset contains roughly 500k orders placed between 01/12/2009 and 09/12/2011 (DD/MM/YYYY). Because each row in the original dataset is an individual transaction, after some EDA I aggregated the data to one row per customer for modeling.
The customer-level features I created were:
- average order value per customer
- number of orders per customer
- customer country
- most frequently purchased product by customer
- total spent by customer
- most active month per customer
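The aggregation from transaction rows to one row per customer can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`customer_id`, `invoice`, `description`, `quantity`, `price`, `invoice_date`, `country`), the toy data, and the helper names are all assumptions, and "most frequent" is taken here to mean the mode over transaction rows.

```python
import pandas as pd

# Tiny made-up transaction table; the column names are assumptions,
# not necessarily those of the real dataset.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "invoice": ["A1", "A1", "A2", "B1", "B1"],
    "description": ["mug", "lamp", "mug", "clock", "clock"],
    "quantity": [2, 1, 3, 1, 2],
    "price": [5.0, 20.0, 5.0, 15.0, 15.0],
    "invoice_date": pd.to_datetime(
        ["2010-01-05", "2010-01-05", "2010-03-10", "2011-06-01", "2011-06-01"]
    ),
    "country": ["United Kingdom"] * 3 + ["France"] * 2,
})


def first_mode(values: pd.Series):
    """Most frequent value; ties resolve to the smallest (mode() is sorted)."""
    return values.mode().iloc[0]


def build_customer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Collapse transaction rows into one row of features per customer."""
    tx = tx.assign(
        line_total=tx["quantity"] * tx["price"],
        month=tx["invoice_date"].dt.month,
    )
    # First total each order, then summarise the orders per customer.
    order_totals = tx.groupby(["customer_id", "invoice"])["line_total"].sum()
    per_order = order_totals.groupby(level="customer_id").agg(
        avg_order_value="mean",
        n_orders="count",
        total_spent="sum",
    )
    # Categorical features: the most frequent value across a customer's rows.
    per_customer = tx.groupby("customer_id").agg(
        country=("country", first_mode),
        top_product=("description", first_mode),
        most_active_month=("month", first_mode),
    )
    return per_order.join(per_customer)


features = build_customer_features(transactions)
print(features)
```

Totalling each order before averaging keeps the average order value from being skewed by how many line items an order contains.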
I enjoyed using the Plotly library for the data visualizations in this project because of its simplicity and interactivity. The final customer segments (clusters) can be viewed interactively here, or you can click the static image below to open the interactive version.
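For readers unfamiliar with the workflow, here is a minimal sketch of how KMeans segments can be projected to two dimensions with PCA and plotted with Plotly. It is not the project's actual pipeline: the synthetic data, the choice of two clusters, the order of the steps (cluster on scaled features, then use PCA only for plotting), and the function names `segment_customers` and `plot_segments` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric customer-level features (not real data):
# 50 low-spend customers followed by 50 high-spend customers.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "avg_order_value": np.concatenate(
        [rng.normal(20, 2, 50), rng.normal(200, 10, 50)]
    ),
    "n_orders": np.concatenate(
        [rng.integers(1, 6, 50), rng.integers(25, 40, 50)]
    ),
})
customers["total_spent"] = customers["avg_order_value"] * customers["n_orders"]


def segment_customers(features: pd.DataFrame, n_clusters: int) -> pd.DataFrame:
    """Cluster scaled features with KMeans and add 2D PCA coordinates."""
    # Scale first so no single feature dominates the distance metric.
    scaled = StandardScaler().fit_transform(features)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=0
    ).fit_predict(scaled)
    # PCA is used here only to get two axes for plotting.
    coords = PCA(n_components=2).fit_transform(scaled)
    return features.assign(
        segment=labels.astype(str),  # strings give discrete colours in Plotly
        pc1=coords[:, 0],
        pc2=coords[:, 1],
    )


def plot_segments(segmented: pd.DataFrame):
    """Interactive scatter of the segments; requires plotly to be installed."""
    import plotly.express as px

    return px.scatter(
        segmented, x="pc1", y="pc2", color="segment",
        hover_data=["avg_order_value", "n_orders", "total_spent"],
    )


segmented = segment_customers(customers, n_clusters=2)  # 2 is arbitrary here
# plot_segments(segmented).show()  # opens the interactive figure
```

In practice the number of clusters would be chosen with a diagnostic such as the elbow method or silhouette scores rather than fixed up front.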

Please see my GitHub repository for the project files.