Featuring 351 Division I men’s teams, college basketball is a major source of excitement in American sports. The annual NCAA Tournament regularly attracts more advertising spending than the Super Bowl, and tens of millions of Americans pore over box scores and statistics hoping to win a share of the $12 billion in associated gambling transactions. In that tradition (making predictions, not indulging our vices), we apply Bayesian analysis of basketball team characteristics to predicting game outcomes. We build our model on data from past seasons and evaluate its performance against actual game results from the 2014–15 season and tournament.

Related Work

Naturally, our interest in this predictive task is far from unique. Mainstream news organizations regularly publish probabilistic predictions of game outcomes, especially during March Madness. The topic has academic treatments as well: Lopez and Matthews recently fit a logistic regression model to game outcomes by maximum likelihood (MLE), and it performed quite well at predicting winners in the 2014 tournament. Paired comparison models, which are used to predict binary outcomes such as the Coke/Pepsi Challenge as well as basketball games, can likewise be given a probabilistic interpretation through logistic regression.
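To make the paired comparison idea concrete, the sketch below assumes a Bradley–Terry-style model in which each team has a single latent strength and the probability that team A beats team B is the logistic function of the strength difference. Strengths are fit by maximum likelihood with plain gradient ascent. This is a minimal illustration, not the Lopez and Matthews model; the function names, learning rate, and the three-team game list are hypothetical toy choices.

```python
import math


def win_probability(strength_a, strength_b):
    """P(A beats B) under a logistic paired-comparison model."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))


def fit_strengths(games, teams, lr=0.1, iters=2000):
    """Maximum-likelihood team strengths via gradient ascent.

    games: list of (winner, loser) tuples (hypothetical toy data below).
    Strengths are only identified up to an additive constant; because every
    gradient step adds and subtracts the same amount, the strengths stay
    centered at zero when started at zero.
    """
    s = {t: 0.0 for t in teams}
    for _ in range(iters):
        grad = {t: 0.0 for t in teams}
        for winner, loser in games:
            p = win_probability(s[winner], s[loser])
            # d/ds_winner log p = 1 - p ; d/ds_loser log p = -(1 - p)
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for t in teams:
            s[t] += lr * grad[t]
    return s


# Hypothetical toy season: every team both wins and loses, so the MLE exists.
toy_games = [("A", "B"), ("A", "B"), ("B", "A"),
             ("B", "C"), ("B", "C"), ("C", "B"),
             ("A", "C"), ("C", "A")]
strengths = fit_strengths(toy_games, ["A", "B", "C"])
```

With `strengths` fitted, `win_probability(strengths["A"], strengths["C"])` gives the model's probability that A beats C. Note that an undefeated team would push its MLE strength toward infinity; placing a prior on the strengths, as a Bayesian treatment would, is one way to avoid that.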

From a Bayesian angle, our most direct inspiration for this project was the winning submission of the 2013 UseR Data Analysis Contest, which used a hierarchical Bayesian Poisson model to predict Spanish soccer match scores. Allen Downey implemented a similar model in his book Think Bayes to estimate the Boston Bruins’ chances of winning the Stanley Cup.
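The core of such Poisson models is that each side's score is treated as a Poisson count whose rate depends on team parameters, and outcome probabilities follow from the joint distribution of the two counts. The sketch below assumes a log-linear rate (home advantage plus attack plus opponent defense), a common parameterization in this model family that we have not checked against the contest entry or Think Bayes; all names are hypothetical. It also plugs in point values for the rates, whereas a full Bayesian treatment would average these probabilities over posterior draws of the parameters.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def scoring_rate(home, attack, defense, is_home):
    """Hypothetical log-linear rate: log(lambda) = home*is_home + attack + defense."""
    return math.exp(home * is_home + attack + defense)


def outcome_probabilities(lam_a, lam_b, max_goals=20):
    """P(A wins), P(draw), P(B wins) for independent Poisson scores.

    Scores above max_goals are ignored, which is negligible for
    soccer-scale rates (a few goals per match).
    """
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, lam_a)
        for j in range(max_goals + 1):
            p = p_i * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

For example, `outcome_probabilities(scoring_rate(0.2, 0.3, -0.1, True), scoring_rate(0.2, 0.1, 0.0, False))` would give win, draw, and loss probabilities for a hypothetical home side. Basketball scores are far larger than soccer scores, so `max_goals` would have to be raised accordingly if the same sketch were applied to points.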

Our work draws on the Bayesian approach of the latter works (and on what we have learned in class thus far) and attempts to tackle the prediction problem addressed by the former in a more sophisticated (and Bayesian!) manner.