NFL Game Outcome Prediction

ML/Analysis Techniques

  • Supervised learning
  • Logistic regression, SVM, and XGBoost
  • Exponential moving average (EMA) features
  • Time-series cross-validation and grid search

Libraries/tools

  • Python
  • Web scraping
  • Matplotlib/seaborn
  • scikit-learn

Overview

For this personal project, I scraped several years of NFL game data, Vegas odds, and team power rankings from the web to build a dataset unlike any I could find online. After cleaning the data and merging the dataframes into one, I created exponential moving average (EMA) features for key numerical variables such as power rank and team score. My research showed that EMA features are commonly used for this kind of prediction task because they weight recent observations more heavily than older ones.
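As an illustration of the EMA idea, here is a minimal pandas sketch. It is not the project's actual code: the column names (`team`, `week`, `score`), the `span=3` setting, and the helper `add_ema_feature` are all assumptions made for the example. The `shift(1)` step is also an assumption about how leakage would be avoided, so that each game's feature is built only from earlier games.

```python
import pandas as pd

# Toy data with hypothetical column names, not the project's real schema.
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "score": [10, 20, 30, 40, 24, 24, 24, 24],
})


def add_ema_feature(df, column, span=3):
    """Add a per-team EMA of `column` that uses only earlier games."""
    df = df.sort_values(["team", "week"]).copy()
    df[f"{column}_ema"] = df.groupby("team")[column].transform(
        # shift(1) drops the current game so the feature never sees
        # the outcome it will later be used to predict.
        lambda s: s.shift(1).ewm(span=span, adjust=False).mean()
    )
    return df


features = add_ema_feature(games, "score")
print(features)
```

With `span=3` the smoothing factor is 2 / (3 + 1) = 0.5, so each new game counts as much as all earlier games combined. The first game of each team has no history and gets a missing value, which would need to be dropped or filled before modeling.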

Next, I modeled the data with SVM, logistic regression, and XGBoost to predict the binary target: whether the home team wins or loses. To prevent data leakage, I ran cross-validation and a parameter grid search using sklearn's TimeSeriesSplit together with a custom grid search process, adapted from an online resource, that preserves the chronology of the data.
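A chronology-preserving grid search could look roughly like the sketch below. This is not the custom process used in the project, only one plausible shape for it: the function `time_series_grid_search`, the parameter grid, the scaling step, and the synthetic data are all assumptions for illustration. The only pieces taken from the text are the use of sklearn's TimeSeriesSplit and an SVM classifier.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def time_series_grid_search(X, y, param_grid, n_splits=5):
    """Return (best_params, best_score) using forward-chaining CV folds."""
    splitter = TimeSeriesSplit(n_splits=n_splits)
    best_params, best_score = None, -np.inf
    for params in ParameterGrid(param_grid):
        fold_scores = []
        for train_idx, val_idx in splitter.split(X):
            # Each fold trains on earlier rows and validates on later rows.
            # The scaler is fit inside the fold so it never sees later games.
            model = make_pipeline(StandardScaler(), SVC(**params))
            model.fit(X[train_idx], y[train_idx])
            fold_scores.append(model.score(X[val_idx], y[val_idx]))
        mean_score = float(np.mean(fold_scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Synthetic stand-in data; rows are assumed to be sorted by game date.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

best_params, best_score = time_series_grid_search(
    X, y, {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
)
print(best_params, best_score)
```

Unlike shuffled k-fold, every validation fold here comes strictly after its training rows, which mirrors how the model would be used: trained on past games and asked about future ones.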

To my surprise, a tuned SVM model gave the best results on the test data (one full NFL season): 64.8% accuracy, 64.8% precision, 76.1% recall, and a 70% F1 score.
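As a quick consistency check on these figures, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_from` below is hypothetical; only the formula and the two input numbers come from the text.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported test-set precision and recall for the tuned SVM.
f1 = f1_from(0.648, 0.761)
print(f"{f1:.3f}")
```

The result rounds to 0.70, which agrees with the reported F1 score.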

Please see my GitHub repository for the project files.