AM 207 Final Project Website

Simulations

Having drawn several thousand samples from the posterior distribution of our model parameters and assessed their predictive power against a testing set, we were interested in their ability to predict outcomes for the 2015 NCAA Tournament. Such simulations have become a popular fixture of the tournament, drawing multitudes of both hardcore basketball fans and casual watchers seeking an edge in their office betting pools. Nate Silver’s 2011 effort (for a wildly improbable tournament year) is still available at the New York Times. He also broke down his methodology in a blog post at the time; he made use of Ken Pomeroy’s statistics as we did, though his predictions appear to be a more straightforward application of probability theory than our fully Bayesian approach. Silver and his team at FiveThirtyEight continued this tradition for the latest tournament, simulating games and forecasting how far each team will go before elimination.

Methodology

For our effort, we used sampled covariates to simulate each round of the 2015 Tournament. Winners from previous rounds are used to simulate subsequent rounds (i.e., we do not use real-life outcomes to select matchups for any round after the first), allowing us to estimate how far any team could have progressed in the tournament. Our initial simulation method applied each sampled logistic regression model to each matchup and declared the winner to be the team with the higher probability of winning (i.e., the team with probability greater than 0.5). That works decently for evenly matched teams, but it never lets a significant underdog advance, even though underdogs should sometimes win against statistically superior opponents. To fix this, we treated each game as a Bernoulli trial; we used the logistic regression model output (a probability) to parameterize its distribution and drew randomly from it using SciPy.
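The difference between the two decision rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in game_predictions.py: the coefficient vector, the covariate differences, and the function name are all hypothetical.

```python
import numpy as np
from scipy.stats import bernoulli

def win_probability(beta, x_diff):
    """Logistic-regression probability that team A beats team B."""
    return 1.0 / (1.0 + np.exp(-np.dot(beta, x_diff)))

# Hypothetical values: one sampled coefficient vector and the
# difference in covariates between team A and team B.
beta = np.array([0.8, -0.3])
x_diff = np.array([1.5, 0.5])
p = win_probability(beta, x_diff)

n_games = 2000
rng = np.random.default_rng(0)

# Rule 1: deterministic threshold. The favorite wins every single time.
threshold_wins = sum(p > 0.5 for _ in range(n_games))

# Rule 2: each game is a Bernoulli trial with success probability p,
# so the underdog wins roughly (1 - p) of the time.
draws = bernoulli.rvs(p, size=n_games, random_state=rng)
bernoulli_wins = int(draws.sum())

print(f"p = {p:.3f}")
print(f"threshold rule: favorite wins {threshold_wins} of {n_games}")
print(f"Bernoulli rule: favorite wins {bernoulli_wins} of {n_games}")
```

Under the threshold rule the underdog's estimated chance of advancing is exactly zero, whereas the Bernoulli rule reproduces the modeled probability in the long run.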

Code

Tournament simulations are executed with a complex set of recursive functions. /src/model/game_predictions.py contains all simulation code and comments to help the reader decipher its operation. /src/model/tournament_simulations.ipynb contains the actual execution and analysis of tournament outcomes.
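To give a sense of why recursion is a natural fit here, the following is a toy sketch of a recursive bracket simulation. It is an assumption about the general shape of such code, not a copy of what game_predictions.py does: the nested-tuple bracket representation, the ratings, and the names win_prob and simulate_bracket are all hypothetical.

```python
import math
import numpy as np

# Hypothetical strength ratings for a toy four-team field (not real data).
RATINGS = {"A": 2.0, "B": 0.5, "C": 1.0, "D": 0.0}

def win_prob(team_a, team_b):
    """Stand-in for the fitted model: logistic function of the rating gap."""
    return 1.0 / (1.0 + math.exp(-(RATINGS[team_a] - RATINGS[team_b])))

def simulate_bracket(bracket, rng, results):
    """Simulate a bracket given as nested 2-tuples of team names.

    Returns (winner, number_of_rounds_played) and appends a
    (round_index, winner) pair to `results` for every game played.
    """
    if isinstance(bracket, str):
        # A bare team name is a leaf: no game has been played yet.
        return bracket, 0
    left, right = bracket
    # Winners of the two sub-brackets meet in this game, so simulated
    # (not real-life) results determine every later matchup.
    team_a, rounds_a = simulate_bracket(left, rng, results)
    team_b, rounds_b = simulate_bracket(right, rng, results)
    round_index = max(rounds_a, rounds_b)
    # Bernoulli draw: team A wins with probability win_prob(team_a, team_b).
    winner = team_a if rng.random() < win_prob(team_a, team_b) else team_b
    results.append((round_index, winner))
    return winner, round_index + 1

rng = np.random.default_rng(207)
bracket = (("A", "B"), ("C", "D"))
results = []
champion, n_rounds = simulate_bracket(bracket, rng, results)
print(champion, n_rounds, results)
```

A full 64-team field would simply be a deeper nesting of the same structure, with six rounds instead of two.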

Results

Our simulations showed the following teams to be the most likely to advance deep in the tournament: Kentucky, Villanova, Wisconsin, Duke, Virginia, Arizona, and Gonzaga. These results pass a simple test of believability, as those teams were generally regarded as the best in the tournament. And with two major exceptions (discussed further below), these teams did perform well in the tournament.

Overall team success is summarized in the table below. Teams at the top won the most games in the actual tournament; i.e., Duke won the tournament and the bottom 32 teams were eliminated in the first round. Percentages represent the portion of simulations in which each team won in that round. Green percentages show rounds where teams won; red percentages show rounds where they lost. For example, Duke’s 98% in the upper left indicates that they won in the first round in 98% of simulations; the green color indicates that they won that game in real life.

Team	Round of 64	Round of 32	Sweet Sixteen	Elite Eight	Final Four	NCAA Championship
(1) Duke	98%	83%	60%	36%	18%	6.5%
(1) Wisconsin	98%	89%	75%	49%	25%	16%
(1) Kentucky	100%	94%	88%	74%	48%	35%
(7) Michigan St.	67%	19%	10%	3.1%	0.9%	0.2%
(2) Arizona	99%	84%	69%	37%	17%	10%
(2) Gonzaga	97%	77%	54%	31%	16%	5.4%
(4) Louisville	80%	32%	7.1%	2.3%	0.7%	0.2%
(3) Notre Dame	93%	67%	43%	12%	4.0%	1.8%
(4) North Carolina	86%	60%	15%	5.2%	1.4%	0.4%
(8) North Carolina St.	56%	7.1%	2.4%	0.5%	0.1%	<0.1%
(3) Oklahoma	91%	62%	23%	7.7%	2.8%	0.7%
(11) UCLA	35%	10%	2.2%	0.5%	0.1%	<0.1%
(5) Utah	74%	53%	22%	11%	4.2%	1.1%
(5) West Virginia	64%	40%	4.4%	1.4%	0.3%	0.1%
(7) Wichita St.	72%	40%	19%	3.9%	1.1%	0.4%
(6) Xavier	59%	25%	4.9%	1.0%	0.2%	<0.1%
(5) Arkansas	79%	32%	5.4%	1.3%	0.2%	<0.1%
(6) Butler	46%	14%	5.3%	0.8%	0.1%	<0.1%
(8) Cincinnati	56%	3.8%	2.0%	0.6%	0.1%	<0.1%
(11) Dayton	40%	12%	2.3%	0.3%	<0.1%	0%
(4) Georgetown	85%	33%	9.1%	2.8%	0.8%	0.1%
(14) Georgia St.	19%	6.0%	0.7%	0.1%	<0.1%	0%
(7) Iowa	50%	11%	4.7%	1.4%	0.3%	0.1%
(2) Kansas	87%	48%	22%	4.6%	1.2%	0.4%
(4) Maryland	66%	31%	3.0%	0.9%	0.2%	<0.1%
(5) Northern Iowa	89%	62%	19%	8.0%	3.2%	0.8%
(10) Ohio St.	58%	10%	5.0%	1.1%	0.2%	0.1%
(8) Oregon	50%	5.1%	1.8%	0.3%	0.1%	<0.1%
(8) San Diego St.	52%	10%	3.6%	1.0%	0.3%	0.1%
(14) UAB	7.6%	1.3%	0.1%	<0.1%	0%	0%
(1) Villanova	98%	88%	70%	47%	30%	13%
(2) Virginia	96%	75%	56%	29%	16%	6.1%
(14) Albany	9.1%	1.6%	0.1%	0%	0%	0%
(3) Baylor	81%	55%	16%	4.5%	1.1%	0.4%
(15) Belmont	3.8%	0.6%	0.1%	<0.1%	0%	0%
(12) Buffalo	36%	17%	1.2%	0.3%	<0.1%	<0.1%
(16) Coastal Carolina	2.0%	0.3%	0.1%	<0.1%	<0.1%	0%
(10) Davidson	50%	12%	4.5%	1.2%	0.3%	0.1%
(13) Eastern Washington	15%	1.6%	0.2%	<0.1%	<0.1%	0%
(10) Georgia	33%	5.5%	2.0%	0.3%	0.1%	<0.1%
(16) Hampton	0.4%	<0.1%	0%	0%	0%	0%
(13) Harvard	14%	4.3%	0.3%	<0.1%	0%	0%
(10) Indiana	28%	10%	2.9%	0.4%	0.1%	0%
(3) Iowa St.	92%	63%	27%	12%	4.4%	1.0%
(9) LSU	44%	4.8%	1.5%	0.3%	<0.1%	0%
(16) Lafayette	1.7%	0.2%	0.1%	<0.1%	0%	0%
(11) Mississippi	41%	14%	2.4%	0.4%	0.1%	<0.1%
(15) New Mexico St.	13%	2.1%	0.3%	<0.1%	0%	0%
(15) North Dakota St.	3.0%	0.4%	<0.1%	<0.1%	0%	0%
(14) Northeastern	6.8%	1.1%	0.2%	<0.1%	0%	0%
(9) Oklahoma St.	50%	5.2%	1.8%	0.4%	0.1%	<0.1%
(6) Providence	60%	24%	6.0%	1.4%	0.4%	0.1%
(9) Purdue	44%	2.2%	0.9%	0.2%	<0.1%	<0.1%
(16) Robert Morris	2.1%	0.2%	<0.1%	<0.1%	0%	0%
(6) SMU	65%	26%	8.0%	2.6%	0.6%	0.1%
(9) St. John’s	48%	6.8%	2.1%	0.4%	0.1%	<0.1%
(12) Stephen F. Austin	26%	12%	2.8%	0.7%	0.2%	<0.1%
(11) Texas	54%	18%	7.5%	1.2%	0.2%	0.1%
(15) Texas Southern	1.2%	0.1%	<0.1%	<0.1%	0%	0%
(13) UC Irvine	20%	3.3%	0.2%	<0.1%	<0.1%	0%
(7) VCU	42%	5.5%	2.1%	0.4%	0.1%	<0.1%
(13) Valparaiso	34%	11%	0.7%	0.2%	<0.1%	0%
(12) Wofford	21%	3.6%	0.2%	<0.1%	0%	0%
(12) Wyoming	11%	2.4%	0.1%	<0.1%	0%	0%
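The percentages in the table above are shares of simulations in which a team won a given round. A tally of that kind might look like the sketch below. The input format (one list of (round_index, winner) pairs per simulated tournament), the toy data, and the function names are hypothetical, and the display rule in format_pct is only inferred from how the table appears to be formatted.

```python
from collections import Counter

def round_win_rates(simulations):
    """Share of simulations in which each team won each round.

    `simulations` is a list of simulated tournaments, each a list of
    (round_index, winner) pairs. Returns {(team, round_index): share}.
    """
    counts = Counter()
    for games in simulations:
        for round_index, winner in games:
            counts[(winner, round_index)] += 1
    n = len(simulations)
    return {key: count / n for key, count in counts.items()}

def format_pct(share):
    """Format a share in [0, 1] in the style the table appears to use."""
    pct = 100.0 * share
    if pct == 0:
        return "0%"
    if pct < 0.1:
        return "<0.1%"
    if round(pct, 1) < 10:
        return f"{pct:.1f}%"   # one decimal place below 10%
    return f"{pct:.0f}%"       # whole numbers at 10% and above

# Toy data: four simulated four-team tournaments (not real results).
simulations = [
    [(0, "A"), (0, "C"), (1, "A")],
    [(0, "A"), (0, "D"), (1, "D")],
    [(0, "B"), (0, "C"), (1, "C")],
    [(0, "A"), (0, "C"), (1, "A")],
]
rates = round_win_rates(simulations)
print(format_pct(rates[("A", 0)]), format_pct(rates[("A", 1)]))
```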

Discussion

Some results from our tournament simulations stand out:

With the exception of Michigan State, the actual Final Four teams were among the most probable to reach that round according to our model. Our model gave Michigan State very little chance of winning in any round after the first.
We were clearly unable to foresee Virginia’s and Villanova’s early losses in the tournament. In our simulations, they advanced to the Sweet 16 75% and 88% of the time (respectively). The teams that beat them (Michigan State and North Carolina State) had correspondingly low probabilities of victory in our simulations.
Based on pre-tournament data, Michigan State reaching the Final Four is the unlikeliest observed outcome according to our simulations. The numbers show that Tom Izzo (MSU’s coach) tends to outperform in the NCAA Tournament. Adding a “coached by Tom Izzo” dummy variable may have been the single best thing we could have done to improve our model.
The University of Alabama-Birmingham’s win over Iowa State was the biggest single-game upset relative to our simulation predictions.
Lowly Harvard had little chance against North Carolina, let alone potential later tournament matchups. They did, however, manage to make the Final Four 4 times in our simulation (0.2% of the time).

Ultimately, our simulations proved satisfying in some regards and less so in others. The latter cases illustrate the Madness of March: in a big enough tournament field, even the unlikeliest team has a chance to prevail. The 1998 women’s Crimson-Cardinal matchup shows that anything can happen.

Certain tournament rounds are particularly interesting for analysis: the Final Four and championship game. We were interested in how our model would perform when predicting the actual Final Four (Kentucky, Wisconsin, Michigan State, and Duke), as well as Duke’s besting of Wisconsin in the championship game. As only a small number of our simulations from above might actually feature these matchups, we opted to re-simulate both rounds using our entire posterior sample.
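Re-simulating a single fixed matchup across every posterior draw could be done along the lines of the sketch below. It is an assumption-laden illustration rather than our notebook code: the posterior array here is randomly generated as a stand-in, and the covariate differences and the name matchup_win_rate are hypothetical.

```python
import numpy as np

def matchup_win_rate(posterior_betas, x_diff, rng):
    """Share of posterior draws in which team A beats team B.

    posterior_betas: array of shape (n_samples, n_covariates), one
        sampled coefficient vector per row.
    x_diff: covariate differences (team A minus team B).
    """
    logits = posterior_betas @ x_diff
    probs = 1.0 / (1.0 + np.exp(-logits))
    # One Bernoulli trial per posterior draw, each with its own probability.
    outcomes = rng.random(probs.shape) < probs
    return outcomes.mean()

rng = np.random.default_rng(2015)
# Stand-in for real posterior samples: 4000 draws of two coefficients.
posterior_betas = rng.normal(loc=[0.8, -0.3], scale=0.1, size=(4000, 2))
x_diff = np.array([0.5, 0.2])  # hypothetical covariate differences
rate = matchup_win_rate(posterior_betas, x_diff, rng)
print(f"Team A wins in {rate:.1%} of posterior draws")
```

Because every posterior draw contributes one game, the resulting rate reflects both the uncertainty in the coefficients and the randomness of a single game.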

As expected, even when Michigan State had reached the Final Four, its chances of advancing to the championship, let alone winning it, were remote. According to our model, Duke’s real-life victory over Michigan State in the semifinal was not a surprise. However, our model predicted a Kentucky victory over Wisconsin, and that either of those two teams would defeat Duke. The combination of Wisconsin’s victory over Kentucky and Duke’s win in the championship was therefore the unlikeliest of all outcomes that featured a Michigan State loss. When we simulated the actual championship matchup, Wisconsin beat Duke 65% of the time.