Posted: March 9, 2019
The inspiration for this project was the Deadspin article The Confessions of an NBA Scorekeeper. The article details the story behind a Vancouver Grizzlies scorekeeper who, in 1997, awarded 23 (mostly undeserved) assists to visiting Los Angeles Laker Nick Van Exel. Everyone who plays or watches basketball knows that some statistics - particularly assists - are subjective. An assist in the NBA “is awarded only if, in the judgment of the statistician, the last player’s pass contributed directly to a made basket”. The NBA has even stepped in to correct scorekeeper errors after the fact. Given this subjectivity, and the fact that NBA scorekeepers are hired by the home teams to record statistics for each game, I became interested in quantifying the inconsistencies and potential biases of NBA scorekeepers. This interest was shared by my master's research supervisor at the time, Luke Bornn, and the project was a joint effort.
Our initial exploration focused on assists and blocks, two potentially subjective statistics identified in the Deadspin article. We examined box score data for all regular season games in the 2015-2016 NBA season, scraped from ESPN.com using code available in the scrape_nba_box_scores repository. Then, since assists and blocks are highly dependent on field goals made (FGM) and opponent field goals attempted (FGA) respectively, we examined the assist ratio (AR) and block ratio (BR), which we defined as assists per field goal made (AR = AST / FGM) and blocks per opponent field goal attempted (BR = BLK / opponent FGA).
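As a small worked example of the two ratios, here is a minimal sketch. It assumes AR is assists divided by field goals made and BR is blocks divided by opponent field goals attempted, as the surrounding text describes. The function names and the box score numbers are invented for illustration and are not taken from the scraped data.

```python
def assist_ratio(assists: int, field_goals_made: int) -> float:
    """Share of a team's made field goals that were credited with an assist."""
    if field_goals_made <= 0:
        raise ValueError("field_goals_made must be positive")
    return assists / field_goals_made


def block_ratio(blocks: int, opponent_fga: int) -> float:
    """Share of the opponent's field goal attempts that were recorded as blocks."""
    if opponent_fga <= 0:
        raise ValueError("opponent_fga must be positive")
    return blocks / opponent_fga


# Invented box score line for one team in one game (not real data).
ar = assist_ratio(assists=28, field_goals_made=40)
br = block_ratio(blocks=5, opponent_fga=80)
print(f"AR = {ar:.3f}, BR = {br:.4f}")
```

Dividing by FGM and opponent FGA puts teams with very different shot volumes on the same scale, which is what makes the ratios comparable across scorekeepers.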
We suspected that there were two distinct components of scorekeeper inconsistencies: scorekeeper bias and scorekeeper generosity.
scorekeeper bias - how much more likely a scorekeeper is to award statistics to the home team compared to the away team
scorekeeper generosity - how likely a scorekeeper is to award statistics to either team
Thus a generous scorekeeper would award many assists to both teams while a biased scorekeeper would award more assists to the home team compared to the away team (or vice-versa if they are biased against their home team).
We also suspected that one of the main factors controlling the AR and BR awarded by a scorekeeper would be the teams they observe. For example, since the Golden State Warriors had the highest AR value in 2015-2016, the Warriors' scorekeeper likely tended to award a higher AR to the home team than to the away team. Similarly, some teams may have been better at forcing opponents into low AR or BR values. Additionally, there may have been an overall home advantage common to all teams due to improved play at home.
We estimated a linear model that attempted to quantify the bias and generosity of all 30 NBA scorekeepers, while controlling for the other factors discussed above. The model produced the estimated home and away AR and BR for each scorekeeper displayed in the figures below.
While these models provided some indication of scorekeeper tendencies, the AR model and the BR model had coefficients of determination (R² values) of only 0.279 and 0.228 respectively. Additionally, it was not clear whether the scorekeeper bias effect truly measured scorekeeper bias or whether it measured a team-specific home effect. Thus, there was significant room for improvement in these modelling approaches.
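One way to picture the linear model described above is through the row of indicator variables it could use for each team-game observation. The sketch below is an assumption about the encoding, not the code used in the project: the function name `design_row`, the column order, and the three-team list are all hypothetical. It includes an intercept, a league-wide home indicator, team and opponent effects, a scorekeeper generosity indicator, and a scorekeeper bias indicator that is switched on only for the home team.

```python
from typing import List


def design_row(team: str, opponent: str, is_home: bool, teams: List[str]) -> List[float]:
    """Indicator encoding for one team-game observation (hypothetical layout).

    Columns: intercept, home, team effects, opponent effects,
    scorekeeper generosity, scorekeeper bias.
    """
    # The scorekeeper is employed by whichever team is playing at home.
    scorekeeper = team if is_home else opponent
    home = 1.0 if is_home else 0.0

    row = [1.0, home]                                            # intercept, overall home advantage
    row += [1.0 if t == team else 0.0 for t in teams]            # team effect
    row += [1.0 if t == opponent else 0.0 for t in teams]        # opponent effect
    row += [1.0 if t == scorekeeper else 0.0 for t in teams]     # scorekeeper generosity
    row += [home if t == scorekeeper else 0.0 for t in teams]    # scorekeeper bias (home rows only)
    return row


teams = ["GSW", "LAL", "MEM"]  # toy league for illustration
home_row = design_row("GSW", "LAL", is_home=True, teams=teams)
away_row = design_row("LAL", "GSW", is_home=False, teams=teams)
print(home_row)
print(away_row)
```

In this encoding a scorekeeper's bias column is switched on only in rows where that scorekeeper's own team is the observed team, so the column is active only alongside that team's effect. That is one way to see why a bias effect is hard to separate from a team-specific home effect. A real fit would also need a reference level or a constraint, since the indicator blocks are collinear with the intercept.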
Given that the context surrounding an assist (the pass, time with the ball, defender positioning, etc.) is instrumental to its attribution, we used spatio-temporal data to estimate a contextual assist model that predicts the probability that an individual pass would be recorded as an assist. Specifically, we used SportVU optical player tracking data to estimate a model for 82,493 potential assists from the 2015-2016 NBA season, where we defined a potential assist to be any completed pass from a passer to a shooter who scored a field goal within 7 seconds of receiving the pass. The shooter was permitted to dribble and move after receiving the pass, as long as he maintained possession of the ball until the successful shot (no rebounds, turnovers, or additional passes). Note that while an inbounds pass can be credited as an assist, for simplicity we only examined passes made while the ball was in play.
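The potential assist definition above can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `PassEvent` record and its field names are hypothetical (they are not the SportVU schema), and treating the 7-second boundary as inclusive is a guess.

```python
from dataclasses import dataclass

MAX_SECONDS_TO_SHOT = 7.0  # window stated in the text


@dataclass
class PassEvent:
    """Hypothetical record for one completed pass; field names are illustrative."""
    receive_time: float     # seconds, when the receiver caught the pass
    shot_time: float        # seconds, when the receiver next shot
    shot_made: bool         # True if that shot was a made field goal
    kept_possession: bool   # no rebound, turnover, or extra pass before the shot
    is_inbounds_pass: bool  # inbounds passes were excluded for simplicity


def is_potential_assist(p: PassEvent) -> bool:
    """Apply the potential assist definition to one pass."""
    elapsed = p.shot_time - p.receive_time
    return (
        p.shot_made
        and p.kept_possession
        and not p.is_inbounds_pass
        and 0.0 <= elapsed <= MAX_SECONDS_TO_SHOT  # inclusive bound is an assumption
    )


catch_and_shoot = PassEvent(10.0, 11.2, True, True, False)
late_shot = PassEvent(10.0, 18.5, True, True, False)
print(is_potential_assist(catch_and_shoot), is_potential_assist(late_shot))
```

Dribbles and movement by the shooter do not appear in the filter because the definition allows them; they only matter later, as covariates in the model.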
We used the following contextual covariates to predict the probability of a recorded assist:
We also used the identity and the position (PG, SG, SF, PF, or C) of the passer as additional covariates, and included the covariates from the previous linear model to estimate a logistic regression model to predict the probability that a potential assist would be recorded as an assist.
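To make the logistic regression step concrete, the sketch below shows how such a model turns covariates into a probability: a weighted sum passed through the logistic function. The coefficient values are invented purely for illustration and are not the estimates from our model. The covariate names `dribbles` and `seconds_to_shot` are hypothetical labels for two pieces of context, the shooter's dribbles and the time between the pass and the shot.

```python
import math
from typing import Dict


def assist_probability(coefficients: Dict[str, float], covariates: Dict[str, float]) -> float:
    """Logistic regression prediction: sigmoid of intercept plus weighted covariates."""
    linear = coefficients.get("intercept", 0.0)
    for name, value in covariates.items():
        linear += coefficients[name] * value
    return 1.0 / (1.0 + math.exp(-linear))


# Invented coefficients, chosen only so the example runs (not fitted values).
toy_coefficients = {"intercept": 2.0, "dribbles": -0.3, "seconds_to_shot": -0.5}

quick = assist_probability(toy_coefficients, {"dribbles": 0, "seconds_to_shot": 1.0})
slow = assist_probability(toy_coefficients, {"dribbles": 4, "seconds_to_shot": 5.0})
print(f"catch-and-shoot: {quick:.3f}, after four dribbles: {slow:.3f}")
```

In a full model the passer identity, passer position, and scorekeeper terms would enter the same weighted sum as additional indicator covariates.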
Using our models as classification models for potential assists, the contextual assist logistic regression model had a misclassification rate of only 6.6%, better than an equivalent model with no scorekeeper coefficients (7.0%) and a dramatic improvement over the linear model (34.4%). Thus, we concluded that our contextual assist model was able to accurately capture the factors leading to a recorded assist. Surprisingly, the scorekeeper coefficients from the logistic regression model were highly correlated with those from the linear model (0.892 for the generosity coefficients and 0.597 for the bias coefficients), indicating that our simpler model was able to detect scorekeeper effects. The scorekeeper coefficients were also correlated across seasons. Re-estimating the models on the 2013-2014 and 2014-2015 seasons produced between-season correlation values of at least 0.776 for scorekeeper generosity coefficients and 0.460 for scorekeeper bias coefficients.
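For readers unfamiliar with the metric, a misclassification rate can be computed as in the sketch below. It assumes a predicted probability is turned into a yes/no prediction with a 0.5 cutoff; the cutoff, the function name, and the toy inputs are assumptions for illustration, not details taken from our analysis.

```python
from typing import Sequence


def misclassification_rate(
    probabilities: Sequence[float],
    recorded: Sequence[bool],
    threshold: float = 0.5,  # assumed cutoff for calling a pass an assist
) -> float:
    """Fraction of potential assists whose predicted label disagrees with the scorekeeper."""
    if len(probabilities) != len(recorded):
        raise ValueError("inputs must have the same length")
    if not probabilities:
        raise ValueError("inputs must not be empty")
    errors = sum((p >= threshold) != bool(y) for p, y in zip(probabilities, recorded))
    return errors / len(probabilities)


# Toy predictions and scorekeeper decisions (invented, not project data).
rate = misclassification_rate([0.9, 0.2, 0.6, 0.4], [True, False, True, True])
print(f"misclassification rate: {rate:.2%}")
```

A lower rate means the model's predictions agree more often with what the scorekeepers actually recorded.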
Turning to the contextual coefficients, the results tended to reflect intuition. Increasing the number of dribbles or the time between the pass and the shot both tended to lower the probability that a pass was recorded as an assist (with time having a greater impact). Similar passes were also more likely to be recorded as assists when made by a point guard than when made by a center. Individual players also varied widely in how likely their passes were to be recorded as assists: an average potential assist was 27.41% more likely to be recorded as an assist for the player with the greatest individual coefficient (Nick Collison) than for the player with the lowest (Andre Roberson).
We discovered evidence of inconsistencies in the recording of assists and blocks across NBA scorekeepers. The greater effect was scorekeeper generosity, with scorekeepers having a range of how likely they were to award statistics to either team. However, evidence of scorekeeper bias, both in favour of and against their corresponding home team, was also uncovered. We also produced methods for calculating the scorekeeper effects on player statistics at the individual player level, as well as a method for adjusting statistics to correct for such effects (these details are not presented here but are available in the paper linked below). While assists and blocks will not affect the outcome of any individual game, box score statistics are the baseline measures for player performance. Therefore, understanding the inconsistencies of box score statistics can lead to more accurate player valuations.
The final output of this project was the paper Adjusting for scorekeeper bias in NBA box scores (PDF) published in the journal Data Mining and Knowledge Discovery as part of a special issue on sports analytics.
This project caught the attention of several media outlets at the time, and continues to be referenced in articles examining scorekeeper influence on statistics. A selection of articles referencing the work is included below: