Customer Churn Prediction

ML/Analysis Techniques

  • Supervised learning
  • Most modern classification algorithms
  • Class balancing methods
  • Feature engineering

Libraries/tools

  • Python
  • Tableau
  • Matplotlib/seaborn
  • scikit-learn

Overview

For this project, I started with a Kaggle dataset containing product, service, and customer information from a telecommunications company. After cleaning the data, I engineered a number of features using deviation and binning techniques. Each deviation feature expresses a value relative to the other members of the category it belongs to, while binning converts a continuous variable into discrete groups, or "bins."
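The two feature types described above can be illustrated with a minimal, dependency-free sketch. The column names (contract, monthly_charges, tenure), the bin edges, and the choice of the group mean as the reference point are all assumptions made for illustration; they are not taken from the project's dataset or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"contract": "monthly", "monthly_charges": 70.0, "tenure": 2},
    {"contract": "monthly", "monthly_charges": 90.0, "tenure": 14},
    {"contract": "yearly", "monthly_charges": 50.0, "tenure": 30},
    {"contract": "yearly", "monthly_charges": 60.0, "tenure": 61},
]


def add_deviation(rows, value_key, group_key, new_key):
    """Store each row's value minus the mean value of its category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    group_means = {group: mean(values) for group, values in groups.items()}
    for row in rows:
        row[new_key] = row[value_key] - group_means[row[group_key]]
    return rows


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    `labels` must have one more entry than `edges`; the last label
    catches everything above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


# Assumed bin edges, in months of tenure.
TENURE_EDGES = [12, 48]
TENURE_LABELS = ["new", "established", "long_term"]

add_deviation(customers, "monthly_charges", "contract", "charge_dev")
for row in customers:
    row["tenure_bin"] = bin_value(row["tenure"], TENURE_EDGES, TENURE_LABELS)

for row in customers:
    print(row["contract"], row["charge_dev"], row["tenure_bin"])
```

Here a monthly-contract customer paying 70.0 gets a deviation of -10.0 because the monthly-contract mean in this toy data is 80.0. With pandas, the same idea is commonly written with a grouped transform and pd.cut.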

After performing exploratory data analysis (EDA) and engineering features to make the churn classes as linearly separable as possible, I used random oversampling to balance the classes. Next, I tuned several ML classification algorithms on the enhanced dataset to predict customer churn: logistic regression, KNN, Naive Bayes, random forest, and XGBoost. After tuning, XGBoost was the best-performing model, with 86.9% accuracy, 91.6% recall, and an 87.8% F1 score. Finally, I created a comprehensive Tableau dashboard containing all of the key insights, which can be seen in the presentation below.
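To make the class-balancing step and the reported metrics concrete, here is a small sketch in plain Python. It is not the project's code: the function names random_oversample and binary_metrics, the seed, and the toy labels are hypothetical, and the project itself may well have used a library routine for both steps.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data: four non-churners (0) and one churner (1).
X = [[1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))  # 4 4

# Toy predictions, unrelated to the results reported above.
scores = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(scores)
```

Recall measures the share of actual churners the model catches, which is why it is often the metric of most interest for churn. As a general practice, oversampling is applied to the training split only, so that duplicated rows do not appear in the evaluation data.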

See my GitHub repository for the project files.