Next Steps

We believe that our work demonstrates an effective Bayesian approach to the problem of college basketball prediction. Our model succeeds both intuitively (it confirms the fundamental importance of offensive and defensive efficiency) and empirically, in its ability to predict game outcomes. Still, there are several logical extensions to our work that we or others may wish to explore in the future.

While binary (win/loss) game predictions are interesting, we considered tackling the much more complex problem of predicting victory margins or actual game scores. On the surface, this change appears to be as simple as converting our model from a logistic regression to a linear one. In practice, it is rarely so simple; restating the problem would likely require careful consideration of how to model the new outcome. Predicting margins appears relatively straightforward, but predicting both teams' scores in each game might be more difficult given differences in playing styles. Still, a Bayesian approach to either problem would have a distinct advantage: we could simulate games as we did above and use the simulated results as a rough measure of the model's accuracy.
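To make the margin idea concrete, here is a minimal sketch of a Bayesian linear regression on victory margins, followed by game simulation from the posterior. It is not the model used in this work: it assumes a conjugate Gaussian prior on the weights and a known noise variance so that the posterior has a closed form. The feature names, the prior and noise settings, and the data (which are synthetic) are all hypothetical placeholders.

```python
import numpy as np

def fit_margin_posterior(X, y, prior_var=10.0, noise_var=100.0):
    """Closed-form posterior over linear weights, assuming a zero-mean
    Gaussian prior and Gaussian noise with known variance."""
    n_features = X.shape[1]
    precision = np.eye(n_features) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

def simulate_margins(x_new, mean, cov, noise_var=100.0, n_sims=5000, seed=0):
    """Draw simulated margins for one game from the posterior predictive."""
    rng = np.random.default_rng(seed)
    weights = rng.multivariate_normal(mean, cov, size=n_sims)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=n_sims)
    return weights @ x_new + noise

# Synthetic data for illustration only (not KenPom data).
rng = np.random.default_rng(42)
n_games = 500
X = np.column_stack([
    rng.normal(0, 5, n_games),    # hypothetical offensive efficiency difference
    rng.normal(0, 5, n_games),    # hypothetical defensive efficiency difference
    rng.integers(0, 2, n_games),  # hypothetical home-court indicator
])
true_w = np.array([1.0, -0.8, 3.5])  # made-up "true" weights
y = X @ true_w + rng.normal(0, 10, n_games)  # simulated victory margins

mean, cov = fit_margin_posterior(X, y)

# Simulate one hypothetical matchup and summarize the results.
sims = simulate_margins(np.array([4.0, -2.0, 1.0]), mean, cov)
win_prob = float((sims > 0).mean())
print(f"median margin: {np.median(sims):.1f}, win probability: {win_prob:.2f}")
```

Because the simulated margins form a full distribution, the same draws yield both a point prediction (the median margin) and a win probability (the share of simulations with a positive margin), so a margin model can still be compared against binary outcomes.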

Finally, we settled on a fairly narrow set of model features, considering only features that describe the game's location and the competing teams' aggregate performance statistics. This was in part by design, as our variable selection process indicated that these were viable features. But the KenPom dataset is extremely rich, providing game box scores by player, aggregate and trailing player statistics, lineup information (which players tend to play together), and granular game-by-game statistical breakdowns. Unfortunately, this information is not stored in a clean, accessible format, and it would have required extensive cleaning and wrangling to model properly. With more time, we would have enjoyed tackling this process, as it might reveal very specific strategic insights that could swing the outcomes of close games.