
NFL Game Outcome Prediction
ML/Analysis Techniques
- Supervised learning
- Logistic regression, SVM, and XGBoost
- Exponential moving average (EMA) features
- Time series cross-validation and grid search
Libraries/tools
- Python
- Web scraping
- Matplotlib/seaborn
- scikit-learn
Overview
For this personal project, I scraped several years of NFL game data, Vegas odds, and team power rankings from the web to build a dataset unlike any I could find online. After cleaning the data and merging the dataframes into one, I created exponential moving average (EMA) features for key numerical variables such as power rank and team score. My research showed that EMA features are commonly used for this kind of prediction task because they weight recent observations more heavily.
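As a minimal sketch of what an EMA feature looks like (not the project's actual code), the function below computes, for each game, the EMA of a team's scores from earlier games only. The function name `lagged_ema`, the smoothing factor `alpha=0.5`, and the sample scores are illustrative assumptions, as is the choice to lag the feature by one game so that a game's own score never feeds its own feature.

```python
def lagged_ema(values, alpha=0.5):
    """Return, for each game, the EMA of all earlier games (None for the first).

    `alpha` is a hypothetical smoothing factor: higher values put more
    weight on the most recent games.
    """
    features = []
    ema = None  # running EMA of the games seen so far
    for value in values:
        features.append(ema)  # the feature only uses games played before this one
        # Standard EMA recursion: new = alpha * latest + (1 - alpha) * previous
        ema = value if ema is None else alpha * value + (1 - alpha) * ema
    return features


# Made-up scores for one team, in chronological order
scores = [20, 30, 10, 40]
ema_features = lagged_ema(scores, alpha=0.5)
print(ema_features)  # [None, 20, 25.0, 17.5]
```

The same recursion (without the one-game lag) is what pandas computes with `Series.ewm(alpha=..., adjust=False).mean()`, which would be the more natural choice on a merged dataframe.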
Next, I modeled the data with SVM, logistic regression, and XGBoost to predict the binary target: whether the home team wins or loses. To prevent data leakage, I ran cross-validation and a parameter grid search using sklearn's TimeSeriesSplit together with a custom grid search procedure, adapted from an online resource, that preserves the chronological order of the data.
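A chronology-preserving grid search of this kind could be sketched as follows. This is an illustration under stated assumptions, not the project's custom procedure: the helper name `time_series_grid_search`, the parameter grid, the scaling step, and the synthetic data are all hypothetical. It relies on scikit-learn's `TimeSeriesSplit`, which always places validation rows after the training rows, so each fold is trained only on earlier games.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def time_series_grid_search(X, y, param_grid, n_splits=5):
    """Pick the SVC parameters with the best mean accuracy over time-ordered folds.

    Rows of X and y are assumed to be sorted chronologically.
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    best_score, best_params = -np.inf, None
    for params in ParameterGrid(param_grid):
        fold_scores = []
        for train_idx, val_idx in tscv.split(X):
            # Fit the scaler and the model on past games only
            model = make_pipeline(StandardScaler(), SVC(**params))
            model.fit(X[train_idx], y[train_idx])
            fold_scores.append(model.score(X[val_idx], y[val_idx]))
        mean_score = float(np.mean(fold_scores))
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_params, best_score


# Synthetic stand-in for the game data (120 "games", 3 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = (X[:, 0] > 0).astype(int)  # 1 = home win, 0 = home loss (made up)

param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
best_params, best_score = time_series_grid_search(X, y, param_grid)
print(best_params, round(best_score, 3))
```

scikit-learn's `GridSearchCV` also accepts `cv=TimeSeriesSplit(...)`, which may be enough when no custom per-fold logic is needed.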
To my surprise, a tuned SVM model performed best on the test data (one full NFL season), with 64.8% accuracy, 64.8% precision, 76.1% recall, and a 70% F1 score.
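As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The variable names below are illustrative.

```python
# Reported test-set metrics for the tuned SVM
precision = 0.648
recall = 0.761

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.3f}")  # 0.700
```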
Please see my GitHub repository for the project files.