Glossary:

  • OBPR: Offensive Bayesian Performance Rating reflects the offensive value a player brings to his team when he is on the court. This rating incorporates a player's individual efficiency stats, and also accounts for the offensive strength of other teammates on the floor with him, along with the defensive strength of the opponent's players on the floor. A higher rating is better.
  • DBPR: Defensive Bayesian Performance Rating reflects the defensive value a player brings to his team when he is on the court. This rating incorporates a player's individual efficiency stats, and also accounts for the defensive strength of other teammates on the floor with him, along with the offensive strength of the opponent's players on the floor. A higher rating is better.
  • BPR: Bayesian Performance Rating is the sum of a player's OBPR and DBPR. This rating is the ultimate measure of a player's overall value to his team when he is on the floor. A higher rating is better.
  • Off Poss: Number of meaningful offensive possessions played
  • Def Poss: Number of meaningful defensive possessions played
  • PER: Player Efficiency Rating, a metric created by John Hollinger, which attempts to capture all aspects of a player's performance into one number. While it has its shortcomings, it's widely used.
  • Team Off Eff: Team offensive efficiency (points scored per 100 possessions) with player on the court. A higher value is better.
  • Team Def Eff: Team defensive efficiency (points allowed by opponent per 100 possessions) with player on the court. A lower value is better.
  • Team Off-Def Eff: Difference between team offensive and defensive efficiency with player on the court. A higher value is better.
  • +/-: Number of points scored for the player's team with him on the court, minus the number of points scored by the opponent with him on the court.


Glossary:

  • OBPR: Offensive Bayesian Performance Rating reflects a team's true offensive efficiency. A higher rating is better.
  • DBPR: Defensive Bayesian Performance Rating reflects a team's true defensive efficiency. A higher rating is better.
  • BPR: Bayesian Performance Rating is the sum of a team's OBPR and DBPR. This rating is the ultimate measure of a team's overall strength. A higher rating is better.
  • True Tempo: A measure of a team's game pace.
  • Off Rank: A team's rank in OBPR.
  • Def Rank: A team's rank in DBPR.
  • Tempo Rank: A team's rank in game pace.


Glossary:

  • OBPR: Offensive Bayesian Performance Rating reflects the offensive value a player brings to his team when he is on the court. This rating incorporates a player's individual efficiency stats, and also accounts for the offensive strength of other teammates on the floor with him, along with the defensive strength of the opponent's players on the floor. A higher rating is better.
  • DBPR: Defensive Bayesian Performance Rating reflects the defensive value a player brings to his team when he is on the court. This rating incorporates a player's individual efficiency stats, and also accounts for the defensive strength of other teammates on the floor with him, along with the offensive strength of the opponent's players on the floor. A higher rating is better.
  • BPR: Bayesian Performance Rating is the sum of a player's OBPR and DBPR. This rating is the ultimate measure of a player's overall value to his team when he is on the floor. A higher rating is better.
  • Off Poss: Number of meaningful offensive possessions played
  • Def Poss: Number of meaningful defensive possessions played
  • PER: Player Efficiency Rating, a metric created by John Hollinger, which attempts to capture all aspects of a player's performance into one number. While it has its shortcomings, it's widely used.
  • Team Off Eff: Team offensive efficiency (points scored per 100 possessions) with player on the court. A higher value is better.
  • Team Def Eff: Team defensive efficiency (points allowed by opponent per 100 possessions) with player on the court. A lower value is better.
  • Team Off-Def Eff: Difference between team offensive and defensive efficiency with player on the court. A higher value is better.
  • +/-: Number of points scored for the player's team with him on the court, minus the number of points scored by the opponent with him on the court.
  • Avg Opp BPR: The average BPR of the opponent's players on the floor at the same time as the player. A higher rating indicates that the player played against tougher opposition.

Glossary:

  • Team Off Eff: Team offensive efficiency (points scored per 100 possessions) with those two players on the court at the same time. A higher value is better.
  • Team Def Eff: Team defensive efficiency (points allowed by opponent per 100 possessions) with those two players on the court at the same time. A lower value is better.
  • Team Off-Def Eff: Difference between team offensive and defensive efficiency with those two players on the court. A higher value is better.
  • Off Poss: Number of offensive possessions with those two players on the court at the same time.
  • Def Poss: Number of defensive possessions with those two players on the court at the same time.
  • Chemistry: A score that reflects how much better than average the team performs when these two players are on the court together, compared to team averages when they are on the court individually.
  • Weighted Chemistry: This is a more reliable metric for teammate chemistry. The Chemistry score is multiplied by the number of possessions shared by the two players, to give more weight to player pairs who were on the floor more.

Glossary:

  • Team Off Eff: Team offensive efficiency (points scored per 100 possessions) with those two players on the court at the same time. A higher value is better.
  • Team Def Eff: Team defensive efficiency (points allowed by opponent per 100 possessions) with those two players on the court at the same time. A lower value is better.
  • Team Off-Def Eff: Difference between team offensive and defensive efficiency with those two players on the court at the same time. A higher value is better.
  • Off Poss: Number of offensive possessions with those two players on the court at the same time.
  • Def Poss: Number of defensive possessions with those two players on the court at the same time.
  • Above / Below Average: A measure of how much better the teammate played when he was on the court with the player, compared to the teammate's average play. This calculates the team's efficiency when these two players were on the court, minus the team's efficiency for all possessions when the teammate was on the court, regardless of who he played with.




Glossary:

  • Weighted Score: The team's score after filtering out possessions when the game was already decided. See “How It Works” for details.
  • Weighted Opp Score: The opponent's score after filtering out possessions when the game was already decided.
  • Off Eff: The team's offensive efficiency (points scored per 100 possessions), adjusted for home court advantage. A higher value is better.
  • Def Eff: The team's defensive efficiency (points allowed by opponent per 100 possessions), adjusted for home court advantage. A lower value is better.
  • Weighted Off Eff: The team's offensive efficiency after filtering out possessions when the game was already decided. This is a more accurate assessment of how the team played on offense than Off Eff.
  • Weighted Def Eff: The team's defensive efficiency after filtering out possessions when the game was already decided. This is a more accurate assessment of how the team played on defense than Def Eff.
  • Off Poss: Number of offensive possessions
  • Def Poss: Number of defensive possessions
  • Weighted Off Poss: Number of offensive possessions after filtering out possessions when the game was already decided.
  • Weighted Def Poss: Number of defensive possessions after filtering out possessions when the game was already decided.

Quick Overview

Welcome to EvanMiya College Basketball Analytics! The main objective of our work is to assess college basketball team and player strength. We have created an advanced statistical metric, Bayesian Performance Rating (BPR), which quantifies how successful a team or player is, using advanced box-score metrics and play-by-play data. This metric is predictive in nature, which means that each rating is fine-tuned to predict performance in future games.

There are several pages of analysis:

  • Team Ratings: We assess the strength of each team by calculating offensive and defensive ratings that reflect the team's offensive and defensive efficiency, while accounting for other factors, such as game pace and opponent strength.
  • Player Ratings: We quantify the value of each player to his team on both offense and defense. A player's ratings incorporate his efficiency statistics, along with his impact on the court for his team, which is assessed by looking at how successful his team was in every possession he played. These ratings account for the strength of all other players on the floor with him during each possession he played.
  • Team Breakdown Tool: This tool provides a more detailed look into the effect that each player had on his team performance.
  • Game Analysis: This page looks at advanced efficiency statistics from every game a team played in the season.

Now for some more detail into how we get these numbers:

Collecting Data

We have box score data available for every game played in each college basketball season, along with play-by-play data, which includes substitutions. The possession by possession data is the main component used to drive our analysis.

Discarding unhelpful possessions

One key step that we take to gain the best predictions from our data is to only look at possessions in a game that “mattered”. Analyzing possessions when the game is already well out of hand isn't as valuable to us as possessions when the winner hasn't been decided yet. Through Luke Benz's R package ncaahoopR, we used the in-game naive win probability (which assumes that teams are equally matched) in order to assess when a game was out of hand. Once a team has a win probability of at least 99%, we start downweighting the possessions until the win probability is greater than 99.99%, at which point we discard all possessions entirely. In the rare situation where the losing team mounts a comeback and the win probability of the winning team sinks below 99%, we start giving each possession full weight again.
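The weighting rule above can be sketched in a few lines. The 99% and 99.99% thresholds come from the description; the linear ramp between them is only an assumption for illustration, since the exact down-weighting curve is not specified here, and the function name `possession_weight` is hypothetical.

```python
def possession_weight(win_prob, start=0.99, cutoff=0.9999):
    """Weight for one possession, given one team's naive win probability.

    The two thresholds come from the text. The linear ramp between them
    is an assumed shape, used here only for illustration.
    """
    # Work with the leading team's probability, whichever team that is.
    lead = max(win_prob, 1.0 - win_prob)
    if lead < start:
        return 1.0  # outcome still in doubt: full weight
    if lead > cutoff:
        return 0.0  # game decided: possession is discarded
    # Assumed linear down-weighting between the two thresholds.
    return (cutoff - lead) / (cutoff - start)


weights = [possession_weight(p) for p in (0.60, 0.995, 0.99995)]
```

Because the weight depends only on the current win probability, a comeback that pulls the leader back under 99% restores full weight automatically.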

From a coach's perspective, every possession matters, even when your team has seemingly won or lost with minutes to spare. However, for predictive purposes, we can't properly assess the strength of a team when both teams aren't putting their normal lineups in or aren't playing as hard as they might if the outcome of the game were still in question.

Team Ratings

The purpose behind the Bayesian Performance Rating (BPR) at a team level is to provide each team a true offensive and a true defensive rating that best explains all of the real game results that we observed from the season. These ratings can be used, along with the ratings of the opposing team, to estimate each team's expected offensive and defensive efficiency (points scored per 100 possessions) in a game. Taking the possession by possession results from each game, and adjusting for home court advantage (more on that in a moment), we run a Bayesian regression to find the offensive (OBPR) and defensive (DBPR) coefficients for each team. These coefficients are designed to have 0 as the national average. Thus, very good teams will have higher positive offensive and defensive ratings. A team's overall BPR is just the sum of its OBPR and DBPR.

For example, from the 2019-20 season, 4th ranked Baylor's calculated OBPR was 30.2, and their DBPR was 35.9. On the other hand, 319th ranked Idaho had an OBPR of -23.0 and a DBPR of -13.5.

Calculating Team Efficiencies

In a neutral court setting, the expected efficiencies for the home and away teams can be calculated using the OBPR and DBPR as follows: \[ E[H_{OffEff}] = (H_O - A_D) / 2 + 100\] \[ E[A_{OffEff}] = (A_O - H_D) / 2 + 100\]

In the above formulas, \(H_O\) and \(H_D\) are the home team's Offensive BPR and Defensive BPR respectively, and \(A_O\) and \(A_D\) are the away team's OBPR and DBPR. To calculate the expected home team offensive efficiency, we take the home team's offensive rating, subtract the away team's defensive rating, then divide by 2 and add 100. (If you are unfamiliar with the notation, \(E[]\) just means “Expected”).

For example, if Michigan State is playing Kansas on a neutral court, and Michigan State has an Offensive BPR of 40 and a Defensive BPR of 30, and Kansas has an offensive rating of 30 and a defensive rating of 50, then we can calculate Michigan State's expected offensive efficiency as

\[ E[\textrm{MSU}_{OffEff}] = (40 - 50) / 2 + 100 = 95\]

and Kansas's expected offensive efficiency is

\[ E[\textrm{Kansas}_{OffEff}] = (30 - 30)/ 2 + 100 = 100\]

If Michigan State and Kansas get 70 offensive possessions each in a game, then the predicted score for Michigan State would be 95 * (70 / 100) = 66.5, and the predicted score for Kansas would be 100 * (70 / 100) = 70. So Kansas would be predicted to win by 3.5 points on a neutral court.
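For readers who prefer code to formulas, here is a minimal Python sketch of the neutral-court calculation above. The function names are hypothetical, and the ratings are the illustrative Michigan State and Kansas numbers from the example, not real team ratings.

```python
def expected_off_eff(off_bpr, opp_def_bpr):
    """Expected points per 100 possessions on a neutral court."""
    return (off_bpr - opp_def_bpr) / 2 + 100


def predicted_score(off_eff, possessions):
    """Convert an efficiency (per 100 possessions) into a point total."""
    return off_eff * possessions / 100


# Illustrative ratings from the example above.
msu_eff = expected_off_eff(off_bpr=40, opp_def_bpr=50)
kansas_eff = expected_off_eff(off_bpr=30, opp_def_bpr=30)

msu_score = predicted_score(msu_eff, possessions=70)
kansas_score = predicted_score(kansas_eff, possessions=70)
margin = kansas_score - msu_score
```

This reproduces the 66.5 to 70 prediction and the 3.5 point margin worked out above.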

Adjusting for home court

On average, teams playing on their home court score about 2.6 points per 100 possessions better on both sides of the ball than if they were playing on a neutral court. So, as a starting point, we can automatically assume that a home team will have a performance boost of about 2.6 points per 100 possessions in both their offensive and defensive efficiencies. In a game with 70 possessions for each team, which is near the national average, this equates to a home court advantage worth about 3.64 points \(((2.6 + 2.6) * 70/100)\).
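The same arithmetic can be written as a small helper, which makes it easy to see how the advantage scales with pace. The name `home_court_points` is hypothetical; the 2.6 figure is the average quoted above.

```python
HOME_EDGE_PER_100 = 2.6  # average boost on each side of the ball (from the text)


def home_court_points(possessions, edge_per_100=HOME_EDGE_PER_100):
    """Points of home court advantage for a game with the given pace."""
    # The home team gains the edge on offense and again on defense.
    return 2 * edge_per_100 * possessions / 100


advantage_at_70 = home_court_points(70)  # about 3.64 points
```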

Some teams perform better at home than others, so we can find team-specific home court advantages using a Bayesian model with a prior mean of 2.6. We sometimes utilize these team-specific home court advantages when computing the team ratings.

True Tempo

When predicting game scores, we also want to adjust for the pace of the game instead of assuming there will be 70 possessions for each team. Similar to our model for calculating expected team efficiencies, we want to calculate the expected number of possessions in a game as follows:

\[ E[\textrm{Possessions}] = (H_T + A_T)/2\]

\(H_T\) is the “True Tempo” of the home team and \(A_T\) is the True Tempo of the away team. We simply take the average of the home team true tempo and the away team true tempo to predict the number of possessions each team will have in the game.

Using BPR for Game Prediction

Let's tie all of these concepts together to predict the score of Dayton vs. Gonzaga in the 2019-2020 season, played on Dayton's home court. Note: our actual game prediction algorithm has a bit more complexity under the hood, but using the method below will get you pretty close to our prediction:

First, we will start by predicting Dayton's offensive efficiency in this game. Dayton has an OBPR of 43.1, and Gonzaga has a DBPR of 21.6. On a neutral court, we would expect Dayton's offensive efficiency (points per 100 possessions) in this game to be

\[ E[\textrm{Dayton}_{OffEff}] = (43.1 - 21.6)/2 + 100 = 110.8\]

The Zags have an OBPR of 56.2 and the Flyers have a DBPR of 19.7, which leads to

\[ E[\textrm{Gonzaga}_{OffEff}] = (56.2 - 19.7)/2 + 100 = 118.3\]

To adjust for Dayton's home court advantage, we add 2.6 points to Dayton's offensive efficiency and subtract 2.6 points from Gonzaga's, which gives us

\[ E[\textrm{Dayton}_{OffEff}] = 110.8 + 2.6 = 113.4\] \[ E[\textrm{Gonzaga}_{OffEff}] = 118.3 - 2.6 = 115.7\]

Now we need to predict how many offensive possessions each team will have. Dayton's True Tempo is 68.1 and Gonzaga's is 75.0. We take the average of these to get our expected possession count:

\[ E[\textrm{Possessions}] = (68.1 + 75.0)/2 = 71.6\]

Now we can finally predict the score by multiplying each team's expected offensive efficiency by the expected number of possessions we just calculated, divided by 100:

\[ E[\textrm{Dayton}_{Score}] = 113.4 * \left(71.6/100\right) = 81.2\] \[ E[\textrm{Gonzaga}_{Score}] = 115.7 * \left(71.6/100\right) = 82.8\]

Gonzaga is predicted to beat Dayton by 1.6 points in a nailbiter.
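The whole walkthrough fits in one short function. This is only a sketch of the simplified method described here, not the actual prediction algorithm, and the dictionary layout and the name `predict_game` are assumptions made for illustration.

```python
def predict_game(home, away, home_edge=2.6):
    """Predict (home_score, away_score) from ratings and True Tempo.

    `home` and `away` are dicts with 'obpr', 'dbpr' and 'tempo' keys
    (a hypothetical layout chosen for this sketch).
    """
    # Neutral-court efficiencies, then the home court adjustment.
    home_eff = (home["obpr"] - away["dbpr"]) / 2 + 100 + home_edge
    away_eff = (away["obpr"] - home["dbpr"]) / 2 + 100 - home_edge
    # Expected possessions: the average of the two True Tempos.
    possessions = (home["tempo"] + away["tempo"]) / 2
    return home_eff * possessions / 100, away_eff * possessions / 100


# 2019-2020 ratings quoted in the walkthrough above.
dayton = {"obpr": 43.1, "dbpr": 19.7, "tempo": 68.1}
gonzaga = {"obpr": 56.2, "dbpr": 21.6, "tempo": 75.0}

dayton_pts, gonzaga_pts = predict_game(dayton, gonzaga)
print(round(dayton_pts, 1), round(gonzaga_pts, 1))
```

Because this version does not round intermediate values, it gives roughly 81.1 and 82.7 rather than 81.2 and 82.8. The predicted margin still comes out at about 1.6 points.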

Player Ratings

In the Bayesian Performance Rating for players, each player has an Offensive BPR and a Defensive BPR, which are added together to make the player's overall BPR. Player BPR has two components: player impact and player efficiency.

The player impact part of BPR attempts to quantify a player's value to his team by looking at how efficiently his team performed on offense and defense for every possession he played. In addition, we want to adjust for the strength of his teammates on the court with him, along with the strength of opposing players for each possession he was on the court. There are some good existing advanced metrics that attempt to do this, such as Adjusted Plus-Minus. This type of metric focuses on the idea that a player's contribution to his team's margin of victory matters most. APM does not use any individual player statistics, but instead utilizes the score outcome of each possession to determine what players are better than others at positively affecting the outcome of the game, in the form of offensive and defensive efficiency. Our player impact ratings are created in a similar fashion, but we make a few adjustments to negate some of the weaknesses of this type of model, which we will explain later on.

Similar to the BPR team ratings, we want to assign a “true” offensive and defensive rating to each player, which indicates his value to his team when he is on the court. If we have five home players and five away players on the floor, and each player has an Offensive BPR and Defensive BPR, then we will define \(H_{1O}\) and \(H_{1D}\) as the OBPR and DBPR for home team player 1, \(H_{2O}\) and \(H_{2D}\) as the ratings for home team player 2, and so on. The same goes for away team players, as \(A_{1O}\) and \(A_{1D}\) are the ratings for away team player 1. For any 10 players on the court for a given possession, we can calculate the expected team efficiencies (points per 100 possessions) with those players on the court as follows:

\[ E[H_{OffEff}] = \frac{(H_{1O} + H_{2O} + H_{3O} + H_{4O} + H_{5O}) - (A_{1D} + A_{2D} + A_{3D} + A_{4D} + A_{5D})}{10} + 100\]

and

\[ E[A_{OffEff}] = \frac{(A_{1O} + A_{2O} + A_{3O} + A_{4O} + A_{5O}) - (H_{1D} + H_{2D} + H_{3D} + H_{4D} + H_{5D})}{10} + 100\]

In this formula, all five offensive and defensive players equally contribute to the expected outcome of a possession, based on each player's OBPR and DBPR. Using this model, we want to find an offensive and defensive rating for each player that can best explain the results from every possession that occurred during the season. Using the possession by possession results from each game, along with our information about who was on the court for each possession, a Bayesian regression finds the offensive and defensive coefficients for each player. Very good players will have higher positive offensive and defensive ratings, with the average D1 player OBPR and DBPR being set at 0.
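As a quick illustration of the lineup formula, the sketch below computes the expected offensive efficiency for one set of ten players. The function name is hypothetical and the ratings are made-up numbers, not real player ratings.

```python
def lineup_expected_off_eff(offense_obprs, defense_dbprs):
    """Expected points per 100 possessions for the five players on offense."""
    if len(offense_obprs) != 5 or len(defense_dbprs) != 5:
        raise ValueError("expected five players on each side")
    # Each of the ten players contributes equally to the expectation.
    return (sum(offense_obprs) - sum(defense_dbprs)) / 10 + 100


# Made-up ratings, for illustration only.
home_obprs = [4.0, 2.5, 1.0, 0.0, -0.5]
away_dbprs = [1.5, 1.0, 0.5, 0.0, -1.0]

home_off_eff = lineup_expected_off_eff(home_obprs, away_dbprs)
```

Swapping the arguments for the away team's OBPRs and the home team's DBPRs gives the away team's expected offensive efficiency.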

The main draw of this type of model is that we not only assess the value of a player to his team, but also account for the strength of the other teammates he shares the court with, along with the strength of the opponent players he faces. If we were to look at a more crude measure of player impact, like plus-minus or basic team efficiency when he is on the floor, it can be helpful, but doesn't answer questions such as “did he play with good teammates or bad teammates?” and “Did he play so well because he only played in garbage time against inferior opponents?”. By using a model that adjusts for the strength of all players on the court, we can more accurately assess the value that a player brings to his team when he is on the court.

There are a few shortcomings to this model the way things currently stand. One issue is that there is a lot of “noise” in this data. Due to the randomness of basketball possessions, it can be difficult to know whether a player rating estimate reflects the truth about that player's ability or is due to random chance. The model can “overfit” the data, leading to conclusions about players that just don't make sense when compared to the eye-test. For example, a deep-bench player who happened to be on the court for a handful of minutes when his team outscored the opponent 20-0 could be given an incredibly high rating because it appears that his appearance was what made the difference for his team. To account for this, we use a Bayesian approach by setting a prior distribution for each player's OBPR and DBPR centered at 0, so that players who don't play much will have ratings near 0, while those who have more substantial playing time can have their ratings move away from 0 as more information about their impact is accrued throughout the season. The informativeness of the prior distribution was decided using cross-validation.
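To see why a prior centered at 0 tames small samples, consider a deliberately simplified one-player version. The real ratings come from a regression fit jointly over all players, so this is only an illustration of the shrinkage effect. It assumes a normal prior, in which case the posterior mean is a weighted average of the raw estimate and the prior mean. The name `shrunk_rating` and the `prior_strength` value are hypothetical.

```python
def shrunk_rating(raw_rating, possessions, prior_mean=0.0, prior_strength=500.0):
    """Posterior mean of a rating under a normal prior (one-player sketch).

    `prior_strength` acts like a number of pseudo-possessions of prior
    evidence. The value 500 is made up for illustration.
    """
    # More possessions means more weight on the observed data.
    weight = possessions / (possessions + prior_strength)
    return weight * raw_rating + (1 - weight) * prior_mean


# A bench player with a huge raw rating in very few possessions...
bench = shrunk_rating(raw_rating=40.0, possessions=20)
# ...versus a starter with a modest raw rating over a full season.
starter = shrunk_rating(raw_rating=8.0, possessions=2000)
```

The bench player's rating is pulled almost all the way back to 0, while the starter keeps most of his raw rating.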

Another issue with the player impact model is that it relies heavily on the assumption that a roster of players will frequently rotate in and out of the game so that we benefit from seeing lots of different lineup combinations, allowing us to distinguish each individual's impact on his team, when compared to his teammates. These player ratings become less reliable when there are pairs of teammates who are almost always on the court together, or rarely ever share possessions together. In situations where player A and B are on the court together 95% of the time, it is difficult to distinguish which teammate is having the larger impact for his team.

This is where the player efficiency portion of BPR comes into play. We want to use both the information about player impact on a play-by-play level, along with individual box score statistics, in the form of player efficiency metrics, to come up with the best predictive ratings possible. A widely acknowledged advanced efficiency metric used in basketball is Player Efficiency Rating (PER), created by John Hollinger. PER uses all of a player's individual statistics in a season in order to come up with a single number that best represents his contribution. Though PER isn't perfect, it is easy to calculate and can give us a good starting point for evaluating a player's statistical worth. Though we don't want to use PER as the final representation of a player's performance, we can still use the metric to help guide our final Bayesian Performance Ratings by creating an informative prior distribution on a player's rating based on his PER. By using data from the past several years, we can see how well PER functions as a predictor for BPR. Then, instead of each player's prior distribution for OBPR and DBPR being centered at 0, we can center it at a value predicted by the PER rating of that player. The graph below shows the relationship between PER and player impact rating for the 2018-2019 season:
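One simple way to turn PER into a prior center is to fit a line through past seasons' (PER, impact rating) pairs and read the prior mean off that line. The sketch below assumes a linear relationship, which may not match the actual fit, and uses made-up data. The names `fit_line` and `prior_mean_from_per` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up (PER, impact rating) pairs from "past seasons", illustration only.
past_per = [10.0, 15.0, 20.0, 25.0, 30.0]
past_impact = [-2.0, 0.0, 2.0, 4.0, 6.0]
slope, intercept = fit_line(past_per, past_impact)


def prior_mean_from_per(per):
    """Center of the prior for a player with the given PER."""
    return intercept + slope * per


example_prior_center = prior_mean_from_per(25.0)
```

A player's rating would then be shrunk toward this PER-based value instead of toward 0.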

[Figure: PER versus player impact rating, 2018-2019 season]

This technique has turned out to be incredibly beneficial at generating player ratings that more accurately represent both the value and skill of each player at the offensive and defensive end. An example of this is 2018-2019 Brandon Clarke, who had a tremendous season for Gonzaga before becoming a first round draft pick. In the player impact ratings, he is ranked 10th best in the country for 2018-2019. However, once we use his high PER rating of 37.3 to inform his prior distribution for offensive and defensive ratings, he finishes 2nd in the country in our final BPR, behind Zion Williamson. Zach Norvell, a fellow teammate of his, sees his ranking drop from 7th to 13th once we incorporate his PER for the year, which was only 21.6.

Using PER to influence our ratings doesn't change the fact that we can still easily detect good performances from players who otherwise may not fill up the stat sheet. A prime example of an underrated “intangibles” guy is 2019-2020 Alabama forward Herbert Jones, who had the highest DBPR and third highest overall BPR that year, despite only having a PER of 15.2. The degree to which he elevated his team's performance when he was on the court was astronomic, compared to Alabama's numbers without him.

Using the Team Breakdown Tool

The Team Breakdown tool is used to gain detailed insights into the performance of a team, broken down player by player. This is especially helpful when trying to explain the offensive and defensive ratings assigned to each player.

Here is the recommended approach for using the Team Breakdown:

  1. Team Overview
    • This tab provides an overview of the key ratings and stats for each player.
      • BPR is the all-encompassing rating that assesses the performance of each player and the value he brings to his team when he is on the floor. This metric accounts for the strength of all other players on the floor at the same time as that player, along with the player's individual efficiency statistics.
      • Team Off-Def Eff is a good starting point for explaining the BPR of a player. This is the difference between team offensive and defensive efficiency with player on the court. In other words, it tells us how many more points per 100 possessions the team scored compared to the opponent when that specific player was playing. However, this value does not account for who else was on the court with that player.
      • Plus-Minus is another simple way to look at player effectiveness, with more weight given to players who played more possessions. However, this metric also does not account for other players on the floor at the same time.
  2. Teammate Chemistry
    • This tab provides a unique look at how effective pairs of players were when they shared the floor together.
      • Team Off-Def Eff tells us how effective the team was with that player pair on the court. By default, the rows in this page are sorted by this metric in order to tell us which player pairs were most effective for the team. Once again, this is simply looking at how well the team played compared to the opponent when those two players were on the court together.
      • Chemistry is a metric that reflects how much better than average the team performs when these two players are on the court together, compared to team averages when they are on the court individually. In other words, when Player A and Player B were on the court together, was the team more efficient than what we would expect if we looked at Player A and Player B's numbers individually? A chemistry score of 0 indicates that the team performed no better or worse than expected when the two players played together.
      • Weighted Chemistry weights the Chemistry score by the number of possessions that were played by the pair of teammates to give a more accurate assessment of the teammate chemistry. The problem with the Chemistry score is that if a pair of teammates only played one possession together and outscored the opponent 2-0, they would have a ridiculously high chemistry score. The weighted chemistry gives more confidence in our assessment of chemistry to player pairs who were on the court a lot together.
    • When evaluating the effectiveness of a particular player, look at where his name appears in the Teammate Chemistry tab ranking when sorted by any of these three metrics. If his name appears in the top half of the ranking more often than the bottom half, this indicates that he was one of the more effective players on his team.
  3. Individual Analysis
    • This tab takes things a step further by looking at one player at a time, seeing how the team performed when a player was paired with each of his teammates. When analyzing Player A, we can use this tab to answer two questions:
    • Which players did Player A play better or worse with when he was with them on the court?
    • Which players played better or worse when they were on the court with Player A?
      • Team Off-Def Eff tells us how effective the team was when Player A played with a particular player. If we sort by this metric, it will tell us which teammates Player A performed better with, and which he performed worse with.
      • Above / Below Average measures how much better the teammate played when he was on the court with the player, compared to the teammate's average play. A rating of 0 indicates that the player played just as well with Player A as he did in general. A higher rating indicates that Player A being on the court made his teammate perform better than normal, and a lower rating indicates the opposite. This is a powerful tool that can assess how a player impacted his teammates. A really good player will often make all of his teammates perform better than they otherwise would. One limitation to this is in the situation when Player A shared almost all of his minutes with another player, which would give an Above / Below Average rating of close to zero, since we can't tell as easily if his teammate's performance was helped or hurt by playing with Player A.
      • Off Poss and Def Poss can be helpful in determining which teammates the player played the most with. If he spent most of his playing time with the bench players instead of the starters, this could explain why his BPR is not as high as other teammates.
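
Several of the metrics in these tabs are simple arithmetic on points and possessions, sketched below. Two assumptions are made: that the “efficiency” used in Above / Below Average is Team Off-Def Eff, and that the possession count used for Weighted Chemistry is passed in directly. All function names and numbers are hypothetical.

```python
def efficiency(points, possessions):
    """Points per 100 possessions."""
    return 100 * points / possessions


def off_def_eff(pts_for, off_poss, pts_against, def_poss):
    """Team Off-Def Eff: offensive minus defensive efficiency."""
    return efficiency(pts_for, off_poss) - efficiency(pts_against, def_poss)


def above_below_average(pair_stats, teammate_stats):
    """Pair's Off-Def Eff minus the teammate's overall Off-Def Eff."""
    return off_def_eff(*pair_stats) - off_def_eff(*teammate_stats)


def weighted_chemistry(chemistry, shared_possessions):
    """Chemistry multiplied by the possessions the pair shared."""
    return chemistry * shared_possessions


# Made-up totals: (points for, off poss, points against, def poss).
pair = (120, 100, 100, 100)      # Player A and teammate together
teammate = (550, 500, 525, 500)  # all of the teammate's possessions

rating = above_below_average(pair, teammate)
```

With these made-up totals the pair is +20 per 100 possessions and the teammate is +5 overall, so the Above / Below Average rating is 15.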

Further research

There are several factors that we are assessing now or plan to assess in the near future:

  • Importance of offensive quality vs defensive quality
    • Should we put more weight on the quality of the offense over the quality of the defense when predicting the outcome of a possession? We are experimenting with using different weights on offense and defense to see if this improves our prediction error. For now, we are treating offensive and defensive ratings equally when computing our player ratings.
  • Tempo, opponent strength, play style, and other covariates
    • Do certain teams or players perform better when games are played at a higher tempo? What about when they play a team that shoots a lot of threes? Does a team always play down to competition weaker than them? These are all covariates that we would like to account for in upcoming updates to our model.

About Me

My name is Evan Miyakawa. I have a master's degree in statistics and am currently working on my dissertation for my doctorate in statistics at Baylor University. I graduated with my bachelor's degree from Taylor University. You can find out more on my LinkedIn page.

Note that this project is not a part of my dissertation work but is something I’ve been putting together in my free time.

Contact

Feel free to email me at evanmiyakawa@gmail.com with any questions. You can also find me on Twitter.