How To: Adjusting NBA Teams' Offensive and Defensive Ratings using Strength of Schedule
Sravan January 01, 2024 [NBA] #strength-of-schedule #team-ratings #tutorial
This tutorial goes through my process of adjusting NBA teams' offensive and defensive ratings for strength of schedule (SoS). Read my blog post on the same topic first, before going any further. It explains the details, including the math needed to understand the code in this tutorial.
The code for the RAPM-style approach is adapted from Ryan Davis' RAPM Tutorial, and I suggest you read that tutorial before continuing.
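If you want to run the code yourself while reading the tutorial, you can find the notebook version of this tutorial on my github: https://github.com/sravanpannala/NBA-Tutorials/blob/main/sos_adjusted_ratings/how_to_adjust_nba_team_ratings_for_sos.ipynb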
First let's import the necessary packages to run this code:
# for processing data
# for numerical operations on arrays
# gives us a progress bar
# for time related stuff
# don't raise warnings when chaining pandas operations
= None
Then we load the team information into two variables. There are 30 teams in the NBA, and each team has a name and a team ID.
- teams_list is a list of all team IDs
- teams_dict is a dictionary mapping the team IDs to the team names
=
=
=
=
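A minimal sketch of building these two variables. The records below are a hypothetical four-team sample (IDs and nicknames taken from the tables later in this tutorial), and the `id` and `nickname` keys are an assumption about how the team records are shaped, not necessarily what the notebook uses.

```python
# Hypothetical sample of team records; the real list has 30 entries.
teams = [
    {"id": 1610612754, "nickname": "Pacers"},
    {"id": 1610612739, "nickname": "Cavaliers"},
    {"id": 1610612749, "nickname": "Bucks"},
    {"id": 1610612752, "nickname": "Knicks"},
]

# List of all team IDs, in the order the records appear.
teams_list = [team["id"] for team in teams]

# Dictionary mapping each team ID to its name.
teams_dict = {team["id"]: team["nickname"] for team in teams}

print(teams_dict[1610612754])
```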
Scraping the Data Required
This section covers the scraping part of the tutorial. You can skip this section and go to the next one if you wish. The data has already been scraped and is available for the 2023-24 season in the data folder.
We will be using the nba_api to get the necessary data. It should be installed already if you followed the instructions in the README.
The team ratings, i.e. offensive, defensive and net ratings, can be found for each game by using the boxscoreadvancedv3 endpoint. This endpoint needs the GameID to get the boxscores for both teams in that game. To get GameIDs for all games played in the 2023-24 season, we will use the leaguegamelog endpoint.
# for 2023-24 season
=
# get the information
=
# output the information as pandas dataframe
=
# get the GameIDs as a list
=
# GameIDs are repeated twice, once for home team and once for away team
# We can use numpy unique to remove the duplicates
=
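The de-duplication step can be sketched on a toy game log. The `GAME_ID` column name and the sample rows are assumptions for illustration; the real game log comes from the leaguegamelog endpoint.

```python
import numpy as np
import pandas as pd

# Toy game log: every game appears twice, once per team.
# The GAME_ID column name is an assumption about the game log layout.
game_log = pd.DataFrame({
    "GAME_ID": ["0022300001", "0022300001", "0022300002", "0022300002"],
    "TEAM_NAME": ["Pacers", "Cavaliers", "Bucks", "Knicks"],
})

# np.unique drops the duplicates and returns the IDs sorted.
game_ids = np.unique(game_log["GAME_ID"].tolist())
print(game_ids)
```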
Now we have a list of game_ids to use in boxscoreadvancedv3 endpoint. We just put the game_ids in a for loop to get the data for each game as a dataframe. We append the generated dataframe for each game to a list of dataframes dfa. Finally we can use pandas.concat to concatenate all the dataframes into a single dataframe for the season.
This process might take a while (10-20 minutes, depending on the number of games played), so grab a coffee or a snack and come back after some time.
There is a small (maybe big) issue if you just run a vanilla for loop. The stats.nba.com endpoint we use to scrape the data times out when it receives too many requests in a short period of time, which results in an error:
HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30)
Any error stops the for loop, and we have to start over. To prevent this, we wrap the call to the endpoint in a try/except block and retry the endpoint for that gameId until it succeeds.
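A hand-rolled version of that retry idea, using only the standard library, might look like the sketch below. `call_with_retry` and `flaky_fetch` are hypothetical names, the attempt cap and wait time are illustrative defaults, and the fake fetch function stands in for the real endpoint call so the example runs offline. Unlike retrying forever, this version gives up after a fixed number of attempts.

```python
import time

def call_with_retry(func, *args, max_attempts=5, wait_seconds=0.6):
    """Call func(*args), retrying after any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception as err:
            print(f"attempt {attempt} failed: {err}")
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            time.sleep(wait_seconds)

# Hypothetical stand-in for the endpoint call: fails twice, then succeeds.
calls = {"count": 0}

def flaky_fetch(game_id):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Read timed out.")
    return {"gameId": game_id}

result = call_with_retry(flaky_fetch, "0022300050", wait_seconds=0.0)
print(result)
```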
I found an elegant solution for this issue while creating this tutorial which is to use the tenacity package.
- We import the necessary modules from tenacity:
  - retry: decorator to enable retries on the function
  - stop_after_attempt: to define the maximum number of attempts. I set it to 5
  - wait_fixed: to wait a fixed amount of time before retrying. I use 0.6 seconds, as recommended by the authors of the nba_api
- We add the retry decorator with the necessary options to the get_boxscores function, which has the try/except block to handle errors
=
=
return
- Now we run the for loop with the decorated get_boxscores function. Finally, we save the scraped data as a csv file in the data folder.
=
=
=
13%|█▎ | 49/363 [00:59<06:26, 1.23s/it]
HTTPSConnectionPool(host='stats.nba.com', port=443): Max retries exceeded with url: /stats/boxscoreadvancedv3?EndPeriod=0&EndRange=0&GameID=0022300050&RangeType=0&StartPeriod=0&StartRange=0 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x00000207782F5090>, 'Connection to stats.nba.com timed out. (connect timeout=30)'))
100%|██████████| 363/363 [05:27<00:00, 1.11it/s]
Loading and Pre-Processing the Data
Now let's load the data. The data has a lot of columns we don't use, so we import only the columns we need by using the usecols option in pandas.read_csv().
=
=
=
=
=
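A sketch of the loading step, using an in-memory csv so it runs without the data folder. The long column names on the csv side are assumptions about what the scraped file contains; the short names match the table in this tutorial.

```python
import io
import pandas as pd

# Stand-in for the scraped csv file; the source column names are assumptions.
csv_text = """gameId,teamId,teamCity,teamName,offensiveRating,defensiveRating,netRating,possessions
22300001,1610612754,Indiana,Pacers,118.6,112.6,6.0,102.0
22300001,1610612739,Cleveland,Cavaliers,112.6,118.6,-6.0,103.0
"""

# Map the columns we want to keep to the short names used in this tutorial.
rename_map = {
    "gameId": "gameId",
    "teamId": "tId",
    "teamName": "team",
    "offensiveRating": "ORtg",
    "defensiveRating": "DRtg",
    "netRating": "NRtg",
    "possessions": "poss",
}

# usecols skips every column that is not listed (here: teamCity).
df = pd.read_csv(io.StringIO(csv_text), usecols=list(rename_map))
df = df.rename(columns=rename_map)
print(df)
```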
| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
|---|---|---|---|---|---|---|---|
| 0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
| 1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
| 2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
| 3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
As the printed table shows, each gameId has two entries, one for each team in the game. Each row has only the information for that team, but what we need is a combined row that also contains the opponent's information.
We will use pandas.groupby to achieve that, grouping on gameId. This creates a groupby object, on which further operations can be run.
=
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000207787B7710>
We then use the nth operation to get the 1st and 2nd rows of each game.
=
=
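The nth step can be sketched on the four rows shown earlier (only a few columns are kept to stay short). Note that the index of the result depends on the pandas version: recent versions keep the original row index, as in the output tables of this tutorial, while older versions index the result by gameId.

```python
import pandas as pd

# The first two games from the table above, with a subset of columns.
df1 = pd.DataFrame({
    "gameId": [22300001, 22300001, 22300002, 22300002],
    "team": ["Pacers", "Cavaliers", "Bucks", "Knicks"],
    "ORtg": [118.6, 112.6, 110.0, 104.0],
})

grouped = df1.groupby("gameId")
first = grouped.nth(0)   # first row of every game
second = grouped.nth(1)  # second row of every game

print(first["team"].tolist(), second["team"].tolist())
```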
| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
|---|---|---|---|---|---|---|---|
| 0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
| 2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
|---|---|---|---|---|---|---|---|
| 1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
| 3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
We can then rename the columns of the 1st dataframe by appending the suffix 1 to all its column names, except the gameId column (which is needed for the merging operation later). For the 2nd dataframe, we similarly append the suffix 2 to the column names.
= +
= +
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 |
|---|---|---|---|---|---|---|---|
| 0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
| 2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
| | gameId | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
|---|---|---|---|---|---|---|---|
| 1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
| 3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
We then merge the two dataframes df1_1 and df1_2 on the column gameId, generating the dataframe we need.
=
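The renaming and merging steps can be sketched together as follows. `add_suffix_except_key` is a hypothetical helper written for this example, and the two small dataframes stand in for df1_1 and df1_2 before renaming.

```python
import pandas as pd

# Stand-ins for the first and second row of each game.
first = pd.DataFrame({
    "gameId": [22300001, 22300002],
    "team": ["Pacers", "Bucks"],
    "ORtg": [118.6, 110.0],
})
second = pd.DataFrame({
    "gameId": [22300001, 22300002],
    "team": ["Cavaliers", "Knicks"],
    "ORtg": [112.6, 104.0],
})

def add_suffix_except_key(df, suffix, key="gameId"):
    """Append suffix to every column name except the merge key."""
    return df.rename(columns={c: c + suffix for c in df.columns if c != key})

# One row per game, with team 1 and team 2 side by side.
merged = pd.merge(
    add_suffix_except_key(first, "1"),
    add_suffix_except_key(second, "2"),
    on="gameId",
)
print(merged)
```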
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
| 1 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
One more step remains. What we have right now is one row for each game, but what we need is two rows for each game, as described in my blog post. To get that dataframe, we repeat the process above with 0 and 1 flipped when performing the nth operation. Finally, we merge the two dataframes df1_3 and df1_6 to get the combined dataframe with two rows for each game.
=
=
= +
= +
=
=
=
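One way to get two rows per game is sketched below. It is not the exact code of this tutorial: `pair_rows` is a hypothetical helper, it selects rows with `cumcount` instead of nth so that it behaves the same across pandas versions, and it stacks the two halves with `pd.concat`, which is an assumption about how the final combination is done.

```python
import pandas as pd

df1 = pd.DataFrame({
    "gameId": [22300001, 22300001, 22300002, 22300002],
    "team": ["Pacers", "Cavaliers", "Bucks", "Knicks"],
    "ORtg": [118.6, 112.6, 110.0, 104.0],
})

def pair_rows(df, i, j):
    """Put row i of each game in the '1' columns and row j in the '2' columns."""
    pos = df.groupby("gameId").cumcount()  # 0 for first row of a game, 1 for second
    left = df[pos == i].rename(columns=lambda c: c if c == "gameId" else c + "1")
    right = df[pos == j].rename(columns=lambda c: c if c == "gameId" else c + "2")
    return left.merge(right, on="gameId")

# Each game once from each team's point of view, then grouped by game.
data = pd.concat([pair_rows(df1, 0, 1), pair_rows(df1, 1, 0)], ignore_index=True)
data = data.sort_values("gameId", kind="stable").reset_index(drop=True)
print(data)
```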
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
| 1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
| 2 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
| 3 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
Processing the Data
To process the data in a format required by the Ridge Regression algorithm RidgeCV, we define the following functions:
maps_teams()
- Makes the matrix rows to be used in ridge regression
- The weights for each team = 1/2
- Equations per game are:
$$\frac{1}{2}\hat{Team}^1_{OFF} + \frac{1}{2}\hat{Team}^2_{DEF} = Team^1_{OFF}$$
$$\frac{1}{2}\hat{Team}^2_{OFF} + \frac{1}{2}\hat{Team}^1_{DEF} = Team^2_{OFF}$$
- The reason for doing this is that for the unadjusted values of a game:
$$Team^1_{OFF} = Team^2_{DEF}$$
- So,
$$Team^1_{OFF} = 0.5\times Team^1_{OFF} + 0.5\times Team^2_{DEF}$$
- Therefore I use a similar structure for estimating adjusted ratings
=
=
=
=
=
return
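A sketch of how one such matrix row could be built from the equations above. The function name `map_teams_sketch` and the column layout (all offense slots first, then all defense slots) are assumptions made for illustration.

```python
import numpy as np

# Four team IDs taken from the tables in this tutorial; the real list has 30.
teams_list = [1610612754, 1610612739, 1610612749, 1610612752]

def map_teams_sketch(row, teams):
    """Build one design-matrix row from (offense team ID, defense team ID)."""
    n = len(teams)
    x = np.zeros(2 * n)
    x[teams.index(row[0])] = 0.5      # offense slot of team 1
    x[n + teams.index(row[1])] = 0.5  # defense slot of team 2
    return x

# Pacers on offense against the Cavaliers' defense.
x_row = map_teams_sketch((1610612754, 1610612739), teams_list)
print(x_row)
```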
convert_to_matrices()
- Converts each row of the data dataframe to x stints.
- Then maps those rows using the map_teams function to get the rows of matrix X
- Gets Y rows. Here Y is ORtg1, i.e. we are trying to predict the offensive rating of the 1st team for every row
# extract only the columns we need
# Convert the columns of team ids into a numpy matrix
=
# Apply our mapping function to the numpy matrix
=
# Convert the column of target values into a numpy matrix
=
# return matrices and possessions series
return ,
lambda_to_alpha()
- In the stats world (R), glmnet() is used for Ridge Regression and uses the parameter $\lambda$. Most NBA stats people use this parameter $\lambda$ when discussing the regularization parameter. But sklearn.linear_model.RidgeCV() has a parameter $\alpha$, which isn't the same.
- So we need to convert $\lambda$ to the $\alpha$ needed for RidgeCV. More details here
return / 2.0
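The conversion can be sketched as below, assuming the same form as in Ryan Davis' RAPM tutorial, i.e. $\alpha = \lambda \times samples / 2$. The parameter names are assumptions; only the division by 2.0 survives in the code above.

```python
def lambda_to_alpha(lambda_value, samples):
    """Convert a glmnet-style lambda to a scikit-learn ridge alpha."""
    return (lambda_value * samples) / 2.0

# Illustrative values: lambda = 0.5 with 200 rows of data.
alpha = lambda_to_alpha(0.5, 200)
print(alpha)
```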
calculate_netrtg()
- Converts lambdas to alphas using the lambda_to_alpha function
- Defines the ridge regression problem using scikit-learn's RidgeCV algorithm
- cv=5 is chosen, i.e. a k-fold cross-validation splitting strategy using k=5
- Intercept is set to true. This value is added later to our estimation results to get the Offensive and Defensive ratings.
- Gets the coefficients and intercept
- Adds the intercept to the coefficients to get adjusted ratings. Uses the adjusted off and def ratings to calculate the adjusted net rating.
- Creates and returns the adjusted ratings dataframe
=
# create a 5 fold CV ridgeCV model. Our target data is not centered at 0, so we want to fit to an intercept.
=
# fit our training data
=
# convert our list of teams into a mx1 matrix
=
# extract our coefficients into the offensive and defensive parts
= .
= .
# concatenate the offensive and defensive values with the team ids into a mx3 matrix
=
# build a dataframe from our matrix
=
=
=
= -
= +
= +
=
=
=
return , ,
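The fitting step can be sketched on synthetic data as follows. Everything here is made up for illustration: the four-team design matrix, the target values and the list of alphas. Only the overall recipe (RidgeCV with cv=5 and an intercept, coefficients split into offensive and defensive halves, intercept added back) follows the steps listed above.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_teams, n_rows = 4, 40

# Synthetic design matrix: 0.5 in the offense slot of one team and
# 0.5 in the defense slot of a different team.
X = np.zeros((n_rows, 2 * n_teams))
for r in range(n_rows):
    off_idx, def_idx = rng.choice(n_teams, size=2, replace=False)
    X[r, off_idx] = 0.5
    X[r, n_teams + def_idx] = 0.5

# Made-up offensive ratings scattered around 114.
y = 114.0 + rng.normal(0.0, 5.0, size=n_rows)

# 5-fold cross-validated ridge; the targets are not centered at 0, so fit an intercept.
model = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5, fit_intercept=True)
model.fit(X, y)

# First half of the coefficients is offense, second half is defense.
coef_off = model.coef_[:n_teams]
coef_def = model.coef_[n_teams:]

# Add the intercept back to get ratings on the usual scale.
a_off = coef_off + model.intercept_
a_def = coef_def + model.intercept_
a_net = a_off - a_def
print("Intercept =", model.intercept_)
```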
Estimating Adjusted Ratings
Next, we run the functions defined above to generate the adjusted ratings
, =
=
, , =
Intercept = 114.2197043446658
The intercept here can be interpreted as the league average offensive/defensive rating. Here are the adjusted ratings.
| | tId | Team | aOFF | aDEF | aNET |
|---|---|---|---|---|---|
| 0 | 1.610613e+09 | Philadelphia 76ers | 121.065207 | 110.772873 | 10.292335 |
| 1 | 1.610613e+09 | Boston Celtics | 118.828331 | 108.764236 | 10.064095 |
| 2 | 1.610613e+09 | Oklahoma City Thunder | 117.702690 | 110.814878 | 6.887812 |
| 3 | 1.610613e+09 | Minnesota Timberwolves | 113.207243 | 106.628440 | 6.578803 |
| 4 | 1.610613e+09 | Denver Nuggets | 118.395144 | 113.090987 | 5.304157 |
| 5 | 1.610613e+09 | LA Clippers | 115.366218 | 111.064890 | 4.301329 |
| 6 | 1.610613e+09 | Orlando Magic | 113.446035 | 109.345141 | 4.100894 |
| 7 | 1.610613e+09 | New York Knicks | 117.214095 | 113.291210 | 3.922885 |
| 8 | 1.610613e+09 | Houston Rockets | 111.781191 | 107.967128 | 3.814063 |
| 9 | 1.610613e+09 | Milwaukee Bucks | 118.657846 | 115.338466 | 3.319381 |
| 10 | 1.610613e+09 | Brooklyn Nets | 116.937071 | 114.575413 | 2.361658 |
| 11 | 1.610613e+09 | Indiana Pacers | 122.514626 | 120.553872 | 1.960754 |
| 12 | 1.610613e+09 | Dallas Mavericks | 118.932015 | 117.355888 | 1.576127 |
| 13 | 1.610613e+09 | New Orleans Pelicans | 114.101714 | 113.092424 | 1.009290 |
| 14 | 1.610613e+09 | Golden State Warriors | 114.940190 | 114.182474 | 0.757716 |
| 15 | 1.610613e+09 | Miami Heat | 114.132399 | 113.518409 | 0.613991 |
| 16 | 1.610613e+09 | Atlanta Hawks | 118.941745 | 118.485097 | 0.456648 |
| 17 | 1.610613e+09 | Phoenix Suns | 116.960966 | 116.528575 | 0.432391 |
| 18 | 1.610613e+09 | Cleveland Cavaliers | 111.023001 | 110.911526 | 0.111475 |
| 19 | 1.610613e+09 | Los Angeles Lakers | 112.332601 | 112.222641 | 0.109959 |
| 20 | 1.610613e+09 | Sacramento Kings | 115.306403 | 115.211154 | 0.095249 |
| 21 | 1.610613e+09 | Toronto Raptors | 112.389538 | 114.521311 | -2.131772 |
| 22 | 1.610613e+09 | Chicago Bulls | 111.271702 | 115.736148 | -4.464446 |
| 23 | 1.610613e+09 | Memphis Grizzlies | 106.322251 | 113.173911 | -6.851659 |
| 24 | 1.610613e+09 | Portland Trail Blazers | 106.938770 | 114.757369 | -7.818600 |
| 25 | 1.610613e+09 | Charlotte Hornets | 112.456339 | 120.395203 | -7.938864 |
| 26 | 1.610613e+09 | Utah Jazz | 110.533878 | 118.914809 | -8.380931 |
| 27 | 1.610613e+09 | Washington Wizards | 111.230457 | 120.661742 | -9.431285 |
| 28 | 1.610613e+09 | San Antonio Spurs | 107.330475 | 117.157667 | -9.827192 |
| 29 | 1.610613e+09 | Detroit Pistons | 106.330988 | 117.557249 | -11.226261 |
Finishing Touches
We're not done yet. Now we need to compare the adjusted ratings with the unadjusted ones. But, we haven't calculated the unadjusted ratings yet. Let's do it now.
For a single game:
$$PTS_{OFF} \times 100 = ORtg^1 \times poss^1$$
$$PTS_{DEF} \times 100 = DRtg^1 \times poss^1$$
Applying these operations on the data dataframe:
= *
= *
We have to use the groupby operation again, now on the tId1 column. After the groupby operation, we chain an agg (aggregate) operation, which applies a function to all rows of the group. The function we chose here is sum, which adds up all the pts and poss for a team.
=
=
The unadjusted team ratings would then be:
$$OFF = 100 \times \frac{PTS_{OFF}^{Total}}{poss^{Total}}$$
$$DEF = 100 \times \frac{PTS_{DEF}^{Total}}{poss^{Total}}$$
= /
=
= /
=
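The unadjusted-rating calculation can be sketched as below. The numbers are made up for one hypothetical team over two games, and the `pts_off` and `pts_def` column names are assumptions; they hold rating times possessions, i.e. points multiplied by 100.

```python
import pandas as pd

# Made-up numbers for one hypothetical team (tId1 = 1) over two games.
data = pd.DataFrame({
    "tId1": [1, 1],
    "ORtg1": [100.0, 120.0],
    "DRtg1": [110.0, 105.0],
    "poss1": [100.0, 50.0],
})

data["pts_off"] = data["ORtg1"] * data["poss1"]  # points scored x 100
data["pts_def"] = data["DRtg1"] * data["poss1"]  # points allowed x 100

# Sum points and possessions per team, then divide to get season ratings.
totals = data.groupby("tId1").agg({"pts_off": "sum", "pts_def": "sum", "poss1": "sum"})
totals["OFF"] = totals["pts_off"] / totals["poss1"]
totals["DEF"] = totals["pts_def"] / totals["poss1"]
totals["NET"] = totals["OFF"] - totals["DEF"]
print(totals[["OFF", "DEF", "NET"]])
```

Because the sums are weighted by possessions, the offensive rating here is about 106.7 rather than the simple average of 110.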
We then merge these ratings to the results_adj dataframe
=
= -
=
=
=
=
=
=
= -
= -
= +
=
=
=
= + 1
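The strength-of-schedule columns can be sketched as below. The sign conventions are inferred from the final table in this tutorial (oSOS = aOFF - OFF, dSOS = DEF - aDEF, SOS = oSOS + dSOS) and should be treated as an assumption. The inputs are Boston's already-rounded values from that table, so the results can differ from the table in the last digit.

```python
import pandas as pd

# Boston's rounded values from the final table in this tutorial.
results = pd.DataFrame({
    "Team": ["Boston Celtics"],
    "OFF": [118.3],
    "aOFF": [118.8],
    "DEF": [109.6],
    "aDEF": [108.8],
})

# Positive values mean the adjustment improved the team's rating.
results["oSOS"] = results["aOFF"] - results["OFF"]
results["dSOS"] = results["DEF"] - results["aDEF"]  # lower DEF is better
results["SOS"] = results["oSOS"] + results["dSOS"]

# Rank teams starting from 1 instead of 0.
results.index = results.index + 1
print(results.round(1))
```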
Reminder
You can find the notebook version of this tutorial on my github: https://github.com/sravanpannala/NBA-Tutorials/blob/main/sos_adjusted_ratings/how_to_adjust_nba_team_ratings_for_sos.ipynb
Final Combined Data table:
You can save it as a csv file and then use some fancy visualization tool to create a pretty looking table and/or efficiency landscape graph
| | Team | OFF | oSOS | aOFF | DEF | dSOS | aDEF | NET | SOS | aNET |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Philadelphia 76ers | 121.2 | -0.1 | 121.1 | 110.9 | 0.2 | 110.8 | 10.3 | 0.0 | 10.3 |
| 2 | Boston Celtics | 118.3 | 0.5 | 118.8 | 109.6 | 0.9 | 108.8 | 8.7 | 1.4 | 10.1 |
| 3 | Oklahoma City Thunder | 117.6 | 0.1 | 117.7 | 110.6 | -0.2 | 110.8 | 7.0 | -0.1 | 6.9 |
| 4 | Minnesota Timberwolves | 113.3 | -0.1 | 113.2 | 106.6 | -0.1 | 106.6 | 6.7 | -0.2 | 6.6 |
| 5 | Denver Nuggets | 117.3 | 1.1 | 118.4 | 112.6 | -0.5 | 113.1 | 4.7 | 0.6 | 5.3 |
| 6 | LA Clippers | 115.4 | -0.1 | 115.4 | 110.6 | -0.5 | 111.1 | 4.8 | -0.5 | 4.3 |
| 7 | Orlando Magic | 113.8 | -0.4 | 113.4 | 109.5 | 0.2 | 109.3 | 4.3 | -0.2 | 4.1 |
| 8 | New York Knicks | 117.3 | -0.1 | 117.2 | 113.3 | 0.0 | 113.3 | 4.0 | -0.1 | 3.9 |
| 9 | Houston Rockets | 111.4 | 0.3 | 111.8 | 107.4 | -0.5 | 108.0 | 4.0 | -0.2 | 3.8 |
| 10 | Milwaukee Bucks | 119.3 | -0.7 | 118.7 | 115.7 | 0.4 | 115.3 | 3.6 | -0.3 | 3.3 |
| 11 | Brooklyn Nets | 116.9 | 0.0 | 116.9 | 115.0 | 0.4 | 114.6 | 2.0 | 0.4 | 2.4 |
| 12 | Indiana Pacers | 122.4 | 0.1 | 122.5 | 120.1 | -0.4 | 120.6 | 2.3 | -0.3 | 2.0 |
| 13 | Dallas Mavericks | 118.5 | 0.4 | 118.9 | 116.4 | -1.0 | 117.4 | 2.2 | -0.6 | 1.6 |
| 14 | New Orleans Pelicans | 114.1 | 0.0 | 114.1 | 113.1 | -0.0 | 113.1 | 1.0 | -0.0 | 1.0 |
| 15 | Golden State Warriors | 114.0 | 1.0 | 114.9 | 114.2 | 0.0 | 114.2 | -0.2 | 1.0 | 0.8 |
| 16 | Miami Heat | 114.7 | -0.6 | 114.1 | 113.5 | -0.0 | 113.5 | 1.2 | -0.6 | 0.6 |
| 17 | Atlanta Hawks | 118.7 | 0.2 | 118.9 | 118.8 | 0.4 | 118.5 | -0.1 | 0.6 | 0.5 |
| 18 | Phoenix Suns | 116.6 | 0.4 | 117.0 | 115.4 | -1.1 | 116.5 | 1.2 | -0.7 | 0.4 |
| 19 | Los Angeles Lakers | 112.2 | 0.1 | 112.3 | 111.8 | -0.5 | 112.2 | 0.4 | -0.3 | 0.1 |
| 20 | Sacramento Kings | 114.6 | 0.7 | 115.3 | 114.9 | -0.3 | 115.2 | -0.2 | 0.3 | 0.1 |
| 21 | Cleveland Cavaliers | 111.1 | -0.0 | 111.0 | 111.5 | 0.6 | 110.9 | -0.5 | 0.6 | 0.1 |
| 22 | Toronto Raptors | 112.7 | -0.3 | 112.4 | 114.9 | 0.4 | 114.5 | -2.2 | 0.1 | -2.1 |
| 23 | Chicago Bulls | 111.6 | -0.3 | 111.3 | 115.7 | -0.1 | 115.7 | -4.1 | -0.4 | -4.5 |
| 24 | Memphis Grizzlies | 106.5 | -0.2 | 106.3 | 112.7 | -0.5 | 113.2 | -6.2 | -0.7 | -6.9 |
| 25 | Portland Trail Blazers | 107.2 | -0.3 | 106.9 | 114.2 | -0.5 | 114.8 | -7.0 | -0.8 | -7.8 |
| 26 | Charlotte Hornets | 112.6 | -0.1 | 112.5 | 120.4 | -0.0 | 120.4 | -7.8 | -0.2 | -7.9 |
| 27 | Utah Jazz | 110.4 | 0.1 | 110.5 | 118.1 | -0.8 | 118.9 | -7.6 | -0.7 | -8.4 |
| 28 | Washington Wizards | 111.8 | -0.6 | 111.2 | 121.4 | 0.7 | 120.7 | -9.6 | 0.1 | -9.4 |
| 29 | San Antonio Spurs | 107.0 | 0.3 | 107.3 | 117.3 | 0.1 | 117.2 | -10.3 | 0.5 | -9.8 |
| 30 | Detroit Pistons | 106.8 | -0.5 | 106.3 | 117.8 | 0.2 | 117.6 | -11.0 | -0.2 | -11.2 |