How To: Adjusting NBA Teams' Offensive and Defensive Ratings using Strength of Schedule
Sravan January 01, 2024 [NBA] #strength-of-schedule #team-ratings #tutorial

This tutorial goes through my process of adjusting NBA teams' offensive and defensive ratings for strength of schedule (SoS). Before going any further, first read my blog post on the same topic. The blog post explains the details, including the math needed to understand the code in this tutorial.
The code for the RAPM-style approach is adapted from Ryan Davis' RAPM Tutorial, and I suggest reading that tutorial before continuing.
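If you want to run the code yourself while reading the tutorial, you can find the notebook version of this tutorial on my GitHub: https://github.com/sravanpannala/NBA-Tutorials/blob/main/sos_adjusted_ratings/how_to_adjust_nba_team_ratings_for_sos.ipynb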
First let's import the necessary packages to run this code:
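```python
import pandas as pd  # for processing data
import numpy as np  # for numerical operations on arrays
from tqdm import tqdm  # gives us a progress bar
import time  # for time-related stuff

# don't raise warnings when chaining pandas operations
pd.options.mode.chained_assignment = None
```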
Then we load the team information into two variables. There are 30 teams in the NBA, and each team has a name and a team ID:
- `teams_list` holds a list of all team IDs.
- `teams_dict` is a dictionary mapping the team IDs to the team names.
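```python
from nba_api.stats.static import teams

# static list of the 30 NBA teams (no web request needed)
teams_df = pd.DataFrame(teams.get_teams())
teams_list = teams_df["id"].to_list()
teams_dict = dict(zip(teams_df["id"], teams_df["full_name"]))
```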
Scraping the Data Required
This section covers the scraping part of the tutorial. You can skip it and go to the next section if you wish: the data has already been scraped and is available for the 2023-24 season in the data folder.
We will use `nba_api` to get the necessary data. It should already be installed if you followed the instructions in the README.
The team ratings, i.e., the offensive, defensive and net ratings, can be found for each game using the `boxscoreadvancedv3` endpoint. This endpoint needs the `GameID` to get the box scores for both teams in that game. To get the `GameIDs` for all games played in the 2023-24 season, we use the `leaguegamelog` endpoint.
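```python
from nba_api.stats.endpoints import leaguegamelog

# for 2023-24 season
season = "2023-24"
# get the information
gamelog = leaguegamelog.LeagueGameLog(season=season)
# output the information as pandas dataframe
df_gamelog = gamelog.get_data_frames()[0]
# get the GameIDs as a list
game_ids = df_gamelog["GAME_ID"].to_list()
# GameIDs are repeated twice, once for the home team and once for the away team.
# We can use numpy unique to remove the duplicates
game_ids = np.unique(game_ids)
```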
Now we have a list of `game_ids` to pass to the `boxscoreadvancedv3` endpoint. We loop over `game_ids` with a `for` loop to get the data for each game as a dataframe, and we append each game's dataframe to a list of dataframes, `dfa`. Finally, we use `pandas.concat` to concatenate all the dataframes into a single dataframe for the season.
This process might take a while (10-20 minutes, depending on the number of games played), so grab a coffee or a snack and come back after some time.
There is a small (maybe big) issue if you just run a vanilla `for` loop. The `stats.nba.com` endpoint we use to scrape the data times out when it is requested too many times in a short period, which results in an error:
HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30)
Any error stops the `for` loop, and we would have to start over. To prevent this, we wrap the call to the endpoint in a `try`/`except` block and retry the request for that `gameId` until it succeeds.
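Here is a minimal standard-library sketch of that manual retry idea. `fetch_with_retries` and the flaky `fake_endpoint` are hypothetical stand-ins for the real endpoint call, and the attempt cap and pause are parameters rather than values taken from the scrape.

```python
import time


def fetch_with_retries(fetch, game_id, max_attempts=5, wait_seconds=0.6):
    """Call fetch(game_id); on an exception, pause and try again (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(game_id)
        except Exception as err:  # e.g. a read timeout from stats.nba.com
            print(f"attempt {attempt} for {game_id} failed: {err}")
            if attempt == max_attempts:
                raise  # give up and re-raise the last error
            time.sleep(wait_seconds)


# Hypothetical endpoint that times out twice before returning data.
calls = {"count": 0}


def fake_endpoint(game_id):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Read timed out.")
    return {"gameId": game_id, "rows": 2}


result = fetch_with_retries(fake_endpoint, "0022300001", wait_seconds=0.01)
print(result)
```

Capping the number of attempts instead of retrying forever prevents a single game that keeps failing from hanging the whole loop.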
While creating this tutorial, I found a more elegant solution: the `tenacity` package.
- We import the necessary modules from `tenacity`:
  - `retry`: decorator that enables retries on a function.
  - `stop_after_attempt`: defines the maximum number of attempts. I set it to `5`.
  - `wait_fixed`: waits a fixed amount of time before retrying. I use `0.6` seconds, as recommended by the authors of `nba_api`.
- We add the `retry` decorator with these options to the `get_boxscores` function, which contains the `try`/`except` block to handle errors.
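```python
from nba_api.stats.endpoints import boxscoreadvancedv3
from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(5), wait=wait_fixed(0.6))
def get_boxscores(game_id):
    try:
        boxscore = boxscoreadvancedv3.BoxScoreAdvancedV3(game_id=game_id)
        df_game = boxscore.get_data_frames()[1]  # team-level advanced stats
        return df_game
    except Exception as e:
        print(e)
        raise  # re-raise so that tenacity sees the failure and retries
```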
- Now we run the `for` loop with the decorated `get_boxscores` function. Finally, we save the scraped data as a `csv` file in the data folder.
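```python
dfa = []
for game_id in tqdm(game_ids):
    dfa.append(get_boxscores(game_id))
df = pd.concat(dfa, ignore_index=True)
df.to_csv("data/team_boxscores_adv_2023-24.csv", index=False)  # illustrative file name
```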
13%|█▎ | 49/363 [00:59<06:26, 1.23s/it]
HTTPSConnectionPool(host='stats.nba.com', port=443): Max retries exceeded with url: /stats/boxscoreadvancedv3?EndPeriod=0&EndRange=0&GameID=0022300050&RangeType=0&StartPeriod=0&StartRange=0 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x00000207782F5090>, 'Connection to stats.nba.com timed out. (connect timeout=30)'))
100%|██████████| 363/363 [05:27<00:00, 1.11it/s]
Loading and Pre-Processing the Data
Now let's load the data. The file has many columns we don't use, so we import only the necessary ones with the `usecols` option of `pandas.read_csv()`.
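One detail about `usecols` is worth knowing. pandas ignores the order of the list and returns the columns in file order. It also parses an ID such as `0022300001` as an integer unless told otherwise, which is likely why the `gameId` values in the table below have lost their leading zeros. The small self-contained check below uses an in-memory CSV; its column names and values are made up for illustration.

```python
import io

import pandas as pd

# In-memory CSV standing in for the scraped file (made-up values).
csv_text = "teamName,gameId,possessions,extra\nPacers,0022300001,102,x\n"

# usecols ignores the order of the list: columns come back in file order.
df = pd.read_csv(io.StringIO(csv_text), usecols=["gameId", "possessions", "teamName"])
print(list(df.columns))
print(df["gameId"].iloc[0])  # parsed as an integer, leading zeros dropped

# dtype={"gameId": str} keeps the leading zeros if you need them.
df_str = pd.read_csv(io.StringIO(csv_text), usecols=["gameId", "teamName"], dtype={"gameId": str})

# Renaming by name (not by position) is robust to the column order.
df_named = df.rename(columns={"teamName": "team", "possessions": "poss"})
```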
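```python
csv_path = "data/team_boxscores_adv_2023-24.csv"  # illustrative file name
cols = {"gameId": "gameId", "teamId": "tId", "teamName": "team", "offensiveRating": "ORtg",
        "defensiveRating": "DRtg", "netRating": "NRtg", "possessions": "poss"}
df = pd.read_csv(csv_path, usecols=list(cols))
df = df.rename(columns=cols)  # rename by name: usecols returns columns in file order
df.head(4)
```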
| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
---|---|---|---|---|---|---|---|
0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
As the printed table shows, each `gameId` has two entries, one for each team in the game, and each row contains only that team's information. What we need instead is a combined row that also includes the opponent's information.
We use `pandas.groupby` to achieve this, grouping on `gameId`. This creates a `groupby` object on which further operations can be run.
```python
df1 = df.groupby("gameId")
df1
```
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000207787B7710>
We then use the `nth` operation to get the 1st and 2nd rows of each game.
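```python
# pandas >= 2.0: nth() filters rows, keeping the original index and the gameId column
df1_1 = df1.nth(0)
df1_2 = df1.nth(1)
```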
| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
---|---|---|---|---|---|---|---|
0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |

| | gameId | tId | team | ORtg | DRtg | NRtg | poss |
---|---|---|---|---|---|---|---|
1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
We then rename the columns of the first dataframe, appending `1` to every column name except `gameId` (which is needed for the merge later). For the second dataframe, we similarly append `2` to the column names.
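```python
df1_1.columns = ["gameId"] + [col + "1" for col in df1_1.columns[1:]]
df1_2.columns = ["gameId"] + [col + "2" for col in df1_2.columns[1:]]
```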
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 |
---|---|---|---|---|---|---|---|
0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
2 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |

| | gameId | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
---|---|---|---|---|---|---|---|
1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
3 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
We then merge the two dataframes `df1_1` and `df1_2` on the `gameId` column, generating the dataframe we need.
```python
df1_3 = df1_1.merge(df1_2, on="gameId")
```
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
1 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
One more step remains. Right now we have one row per game, but, as described in my blog post, we need two rows per game (one from each team's perspective). To get them, we repeat the process above with `0` and `1` flipped in the `nth` operation. Finally, we concatenate the two dataframes `df1_3` and `df1_6` to get the combined dataframe with two rows for each game.
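```python
df1_4 = df1.nth(1)
df1_5 = df1.nth(0)
df1_4.columns = ["gameId"] + [col + "1" for col in df1_4.columns[1:]]
df1_5.columns = ["gameId"] + [col + "2" for col in df1_5.columns[1:]]
df1_6 = df1_4.merge(df1_5, on="gameId")
data = pd.concat([df1_3, df1_6])
data = data.sort_values("gameId").reset_index(drop=True)
```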
| | gameId | tId1 | team1 | ORtg1 | DRtg1 | NRtg1 | poss1 | tId2 | team2 | ORtg2 | DRtg2 | NRtg2 | poss2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 22300001 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 |
1 | 22300001 | 1610612739 | Cavaliers | 112.6 | 118.6 | -6.0 | 103.0 | 1610612754 | Pacers | 118.6 | 112.6 | 6.0 | 102.0 |
2 | 22300002 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 |
3 | 22300002 | 1610612749 | Bucks | 110.0 | 104.0 | 6.0 | 100.0 | 1610612752 | Knicks | 104.0 | 110.0 | -6.0 | 101.0 |
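As a cross-check, the same two-rows-per-game layout can also be built with a single self-merge instead of `nth`. You merge the dataframe with itself on `gameId` and keep only the pairs where the two team IDs differ. This is an alternative technique, not the one used above. The sample rows are copied from the first table in this section.

```python
import pandas as pd

# Four rows copied from the loaded box-score table (one row per team per game).
df = pd.DataFrame(
    {
        "gameId": [22300001, 22300001, 22300002, 22300002],
        "tId": [1610612754, 1610612739, 1610612749, 1610612752],
        "team": ["Pacers", "Cavaliers", "Bucks", "Knicks"],
        "ORtg": [118.6, 112.6, 110.0, 104.0],
        "DRtg": [112.6, 118.6, 104.0, 110.0],
        "NRtg": [6.0, -6.0, 6.0, -6.0],
        "poss": [102.0, 103.0, 100.0, 101.0],
    }
)

# Pair every row with every row of the same game (2 x 2 = 4 pairs per game);
# suffixes turn e.g. ORtg into ORtg1 (this team) and ORtg2 (the opponent).
pairs = df.merge(df, on="gameId", suffixes=("1", "2"))

# Drop the pairs where a team was matched with itself, leaving two rows per game.
data_alt = pairs[pairs["tId1"] != pairs["tId2"]].reset_index(drop=True)

print(data_alt[["gameId", "team1", "ORtg1", "team2", "DRtg2"]])
```

The printed columns also show the identity that the blog post relies on: in every row, team 1's offensive rating equals team 2's defensive rating.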
Processing the Data
To put the data in the format required by scikit-learn's ridge regression estimator `RidgeCV`, we define the following functions:
`map_teams()`
- Makes the matrix rows used in the ridge regression.
- The weight for each team is 1/2.
- The equations per game are:
  $$\frac{1}{2}\hat{Team}^1_{OFF} + \frac{1}{2}\hat{Team}^2_{DEF} = Team^1_{OFF}$$
  $$\frac{1}{2}\hat{Team}^2_{OFF} + \frac{1}{2}\hat{Team}^1_{DEF} = Team^2_{OFF}$$
  where the hatted terms are the adjusted ratings we want to estimate.
- The reason for doing this is that, for the unadjusted values of a game, $$Team^1_{OFF} = Team^2_{DEF}$$
- So, $$Team^1_{OFF} = 0.5\times Team^1_{OFF} + 0.5\times Team^2_{DEF}$$
- Therefore, I use a similar structure for estimating the adjusted ratings.
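```python
def map_teams(row_in):
    t1 = row_in[0]  # team on offense (team 1)
    t2 = row_in[1]  # opponent on defense (team 2)
    # first half of the row: offensive columns; second half: defensive columns
    row_out = np.zeros(len(teams_list) * 2)
    row_out[teams_list.index(t1)] = 0.5
    row_out[teams_list.index(t2) + len(teams_list)] = 0.5
    return row_out
```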
`convert_to_matrices()`
- Converts each row of the `data` dataframe into a stint, i.e., the pair of team IDs (`tId1`, `tId2`).
- Maps those rows with the `map_teams` function to get the rows of matrix X.
- Gets the Y rows. Here Y is `ORtg1`, i.e., we are trying to predict the offensive rating of the first team in every row.
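```python
def convert_to_matrices(data):
    # extract only the columns we need and
    # convert the columns of team ids into a numpy matrix
    stints_x_base = data[["tId1", "tId2"]].to_numpy()
    # apply our mapping function to each row of the numpy matrix
    stint_X_rows = np.apply_along_axis(map_teams, 1, stints_x_base)
    # convert the column of target values into a numpy matrix
    stint_Y_rows = data[["ORtg1"]].to_numpy()
    # return the X and Y matrices
    return stint_X_rows, stint_Y_rows
```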
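To see what these two functions produce, here is a self-contained toy version with a hypothetical four-team league. The IDs 1 to 4 are placeholders, not NBA team IDs. Each row of X has 0.5 in the offensive column of team 1 and 0.5 in the defensive column of team 2, and the target is team 1's offensive rating.

```python
import numpy as np

# Hypothetical mini-league: team IDs are placeholders.
teams_list = [1, 2, 3, 4]
n_teams = len(teams_list)


def map_teams(row_in):
    """One design-matrix row: first half = offense columns, second half = defense columns."""
    t1, t2 = row_in[0], row_in[1]
    row_out = np.zeros(2 * n_teams)
    row_out[teams_list.index(t1)] = 0.5            # team 1 on offense
    row_out[n_teams + teams_list.index(t2)] = 0.5  # team 2 on defense
    return row_out


# Two games, each contributing two rows (one per team's offensive perspective).
pairs = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
Y = np.array([[118.6], [112.6], [110.0], [104.0]])  # target column (ORtg1)

X = np.apply_along_axis(map_teams, 1, pairs)
print(X)
```

Every row of X sums to 1, which is what lets the fitted intercept play the role of the league average.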
`lambda_to_alpha()`
- In the stats world (`R`), `glmnet()` is used for ridge regression and uses the parameter $\lambda$. Most NBA stats people use this $\lambda$ when discussing the regularization parameter. But `sklearn.linear_model.RidgeCV()` has a parameter $\alpha$, which isn't the same.
- So we need to convert $\lambda$ to the $\alpha$ needed for `RidgeCV`. More details here.
```python
def lambda_to_alpha(lambda_value, samples):
    return (lambda_value * samples) / 2.0
```
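Here is a quick arithmetic check of this conversion, $\alpha = \lambda n / 2$, where $n$ is the number of rows in X. With two rows per game and the 363 games in the scraping log above, $n = 726$. The $\lambda$ values are illustrative only.

```python
n_rows = 2 * 363  # two rows per game, 363 games (from the scraping log)

# conversion used in this tutorial: alpha = lambda * n / 2
alphas = {lam: lam * n_rows / 2.0 for lam in [0.01, 0.05, 0.1]}  # illustrative lambdas
for lam, alpha in alphas.items():
    print(f"lambda = {lam:<5} -> alpha = {alpha:.2f}")
```

Because $\alpha$ scales with $n$, the same $\lambda$ keeps the penalty comparable as the season, and with it the number of rows, grows.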
`calculate_netrtg()`
- Converts the lambdas to alphas using the `lambda_to_alpha` function.
- Defines the ridge regression problem using scikit-learn's `RidgeCV` algorithm:
  - `cv=5` is chosen, i.e., a k-fold cross-validation splitting strategy with `k=5`.
  - `fit_intercept` is set to `True`. The intercept is added later to our estimated coefficients to get the offensive and defensive ratings.
- Gets the coefficients and the intercept.
- Adds the intercept to the coefficients to get the adjusted ratings, then uses the adjusted offensive and defensive ratings to calculate the adjusted net rating.
- Creates and returns the adjusted ratings dataframe.
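```python
from sklearn.linear_model import RidgeCV


def calculate_netrtg(train_x, train_y, lambdas):
    # convert our lambdas to alphas
    alphas = [lambda_to_alpha(lam, train_x.shape[0]) for lam in lambdas]
    # create a 5 fold CV ridgeCV model. Our target data is not centered at 0, so we want to fit to an intercept.
    clf = RidgeCV(alphas=alphas, cv=5, fit_intercept=True)
    # fit our training data
    model = clf.fit(train_x, train_y)
    # convert our list of teams into a mx1 matrix
    team_arr = np.array(teams_list).reshape(-1, 1)
    # extract our coefficients into the offensive and defensive parts
    n_teams = len(teams_list)
    coef_offensive_array = model.coef_[:, :n_teams].T
    coef_defensive_array = model.coef_[:, n_teams:].T
    # concatenate the offensive and defensive values with the team ids into a mx3 matrix
    team_id_with_coef = np.concatenate([team_arr, coef_offensive_array, coef_defensive_array], axis=1)
    # build a dataframe from our matrix
    results = pd.DataFrame(team_id_with_coef, columns=["tId", "aOFF", "aDEF"])
    intercept = model.intercept_[0]
    # add the intercept to the coefficients to get the adjusted ratings
    results["aOFF"] = results["aOFF"] + intercept
    results["aDEF"] = results["aDEF"] + intercept
    # adjusted net rating = adjusted offense - adjusted defense
    results["aNET"] = results["aOFF"] - results["aDEF"]
    results["Team"] = results["tId"].astype(int).map(teams_dict)
    results = results[["tId", "Team", "aOFF", "aDEF", "aNET"]]
    results = results.sort_values("aNET", ascending=False).reset_index(drop=True)
    return results, model, intercept
```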
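Here is a numpy-only sketch of the same idea on synthetic data, to show what the fitted coefficients and intercept mean. It is not the tutorial's code. It replaces `RidgeCV` with the closed-form ridge solution for one fixed α (no cross-validation), and the four teams and their ratings are invented. With a tiny α, the fit recovers the invented ratings and the intercept recovers the league average. A large α shrinks the overall size of the team coefficients toward zero, i.e., toward the league average.

```python
import numpy as np

# Invented ratings for a hypothetical 4-team league (deviations from the league average).
league_avg = 114.0
off_dev = np.array([3.0, 1.0, -1.0, -3.0])  # higher = better offense
def_dev = np.array([-2.0, 0.0, 1.0, 1.0])   # lower = better defense
n = len(off_dev)

# One row per (offense team i, defense team j) pairing, i != j, weights 0.5 / 0.5.
rows, y = [], []
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        row = np.zeros(2 * n)
        row[i] = 0.5      # team i on offense
        row[n + j] = 0.5  # team j on defense
        rows.append(row)
        # synthetic ORtg1: league average + half offense + half opponent defense
        y.append(league_avg + 0.5 * off_dev[i] + 0.5 * def_dev[j])
X = np.array(rows)
y = np.array(y)


def ridge_with_intercept(X, y, alpha):
    """Minimize ||y - Xw||^2 + alpha * ||w||^2 with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


coef, intercept = ridge_with_intercept(X, y, alpha=1e-6)
a_off = coef[:n] + intercept  # adjusted offensive ratings
a_def = coef[n:] + intercept  # adjusted defensive ratings
a_net = a_off - a_def
print("intercept:", round(intercept, 3))
print("aOFF:", a_off.round(3), "aDEF:", a_def.round(3), "aNET:", a_net.round(3))

# A much larger alpha shrinks the team coefficients toward zero.
coef_big, _ = ridge_with_intercept(X, y, alpha=50.0)
```

The ratings can only be pinned down up to a shared shift, because adding a constant to every offensive coefficient can be offset by the intercept. The ridge penalty resolves this by picking coefficients that average to zero, which is why the intercept lines up with the league average.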
Estimating Adjusted Ratings
Next, we run the functions defined above to generate the adjusted ratings:
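```python
lambdas = [0.01, 0.05, 0.1]  # candidate lambdas (illustrative values)
train_x, train_y = convert_to_matrices(data)
results_adj, model, intercept = calculate_netrtg(train_x, train_y, lambdas)
print("Intercept =", intercept)
```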
Intercept = 114.2197043446658
The intercept here can be interpreted as the league average offensive/defensive rating. Here are the adjusted ratings.
| | tId | Team | aOFF | aDEF | aNET |
---|---|---|---|---|---|
0 | 1.610613e+09 | Philadelphia 76ers | 121.065207 | 110.772873 | 10.292335 |
1 | 1.610613e+09 | Boston Celtics | 118.828331 | 108.764236 | 10.064095 |
2 | 1.610613e+09 | Oklahoma City Thunder | 117.702690 | 110.814878 | 6.887812 |
3 | 1.610613e+09 | Minnesota Timberwolves | 113.207243 | 106.628440 | 6.578803 |
4 | 1.610613e+09 | Denver Nuggets | 118.395144 | 113.090987 | 5.304157 |
5 | 1.610613e+09 | LA Clippers | 115.366218 | 111.064890 | 4.301329 |
6 | 1.610613e+09 | Orlando Magic | 113.446035 | 109.345141 | 4.100894 |
7 | 1.610613e+09 | New York Knicks | 117.214095 | 113.291210 | 3.922885 |
8 | 1.610613e+09 | Houston Rockets | 111.781191 | 107.967128 | 3.814063 |
9 | 1.610613e+09 | Milwaukee Bucks | 118.657846 | 115.338466 | 3.319381 |
10 | 1.610613e+09 | Brooklyn Nets | 116.937071 | 114.575413 | 2.361658 |
11 | 1.610613e+09 | Indiana Pacers | 122.514626 | 120.553872 | 1.960754 |
12 | 1.610613e+09 | Dallas Mavericks | 118.932015 | 117.355888 | 1.576127 |
13 | 1.610613e+09 | New Orleans Pelicans | 114.101714 | 113.092424 | 1.009290 |
14 | 1.610613e+09 | Golden State Warriors | 114.940190 | 114.182474 | 0.757716 |
15 | 1.610613e+09 | Miami Heat | 114.132399 | 113.518409 | 0.613991 |
16 | 1.610613e+09 | Atlanta Hawks | 118.941745 | 118.485097 | 0.456648 |
17 | 1.610613e+09 | Phoenix Suns | 116.960966 | 116.528575 | 0.432391 |
18 | 1.610613e+09 | Cleveland Cavaliers | 111.023001 | 110.911526 | 0.111475 |
19 | 1.610613e+09 | Los Angeles Lakers | 112.332601 | 112.222641 | 0.109959 |
20 | 1.610613e+09 | Sacramento Kings | 115.306403 | 115.211154 | 0.095249 |
21 | 1.610613e+09 | Toronto Raptors | 112.389538 | 114.521311 | -2.131772 |
22 | 1.610613e+09 | Chicago Bulls | 111.271702 | 115.736148 | -4.464446 |
23 | 1.610613e+09 | Memphis Grizzlies | 106.322251 | 113.173911 | -6.851659 |
24 | 1.610613e+09 | Portland Trail Blazers | 106.938770 | 114.757369 | -7.818600 |
25 | 1.610613e+09 | Charlotte Hornets | 112.456339 | 120.395203 | -7.938864 |
26 | 1.610613e+09 | Utah Jazz | 110.533878 | 118.914809 | -8.380931 |
27 | 1.610613e+09 | Washington Wizards | 111.230457 | 120.661742 | -9.431285 |
28 | 1.610613e+09 | San Antonio Spurs | 107.330475 | 117.157667 | -9.827192 |
29 | 1.610613e+09 | Detroit Pistons | 106.330988 | 117.557249 | -11.226261 |
Finishing Touches
We're not done yet. We need to compare the adjusted ratings with the unadjusted ones, but we haven't calculated the unadjusted ratings yet. Let's do that now.
For a single game:
$$ PTS_{OFF} \times 100 = ORtg^1 \times poss^1 $$
$$ PTS_{DEF} \times 100 = DRtg^1 \times poss^1 $$
Applying these operations to the `data` dataframe:
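```python
# 100 x points scored and allowed by team 1 in each game
data["pts_off"] = data["ORtg1"] * data["poss1"]
data["pts_def"] = data["DRtg1"] * data["poss1"]
```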
We use the `groupby` operation again, now on the `tId1` column. After the `groupby`, we chain an `agg` (aggregate) operation, which applies a function to all rows of each group. The function we chose here is `sum`, which adds up all the points and possessions for a team.
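```python
df_tot = data.groupby("tId1").agg({"pts_off": "sum", "pts_def": "sum", "poss1": "sum"})
df_tot = df_tot.reset_index().rename(columns={"tId1": "tId", "poss1": "poss"})
```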
The unadjusted team ratings are then:
$$ OFF = \frac{100 \times PTS_{OFF}^{Total}}{poss^{Total}} $$
$$ DEF = \frac{100 \times PTS_{DEF}^{Total}}{poss^{Total}} $$
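```python
# pts_off and pts_def already hold 100 x points, so these are ratings per 100 possessions
df_tot["OFF"] = df_tot["pts_off"] / df_tot["poss"]
df_tot["DEF"] = df_tot["pts_def"] / df_tot["poss"]
df_unadj = df_tot[["tId", "OFF", "DEF"]]
```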
We then merge these ratings into the `results_adj` dataframe and compute the net rating and the strength-of-schedule columns:
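```python
results = results_adj.astype({"tId": "int64"}).merge(df_unadj, on="tId")
results["NET"] = results["OFF"] - results["DEF"]
# SoS adjustments: how much the schedule moved each rating
results["oSOS"] = results["aOFF"] - results["OFF"]
results["dSOS"] = results["DEF"] - results["aDEF"]
results["SOS"] = results["oSOS"] + results["dSOS"]
cols = ["Team", "OFF", "oSOS", "aOFF", "DEF", "dSOS", "aDEF", "NET", "SOS", "aNET"]
results = results[cols].round(1)
results = results.sort_values("aNET", ascending=False).reset_index(drop=True)
# start the index at 1 instead of 0
results.index = results.index + 1
```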
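Judging from the final table below, the strength-of-schedule columns fit together as follows:

- oSOS = aOFF - OFF
- dSOS = DEF - aDEF
- SOS = oSOS + dSOS

It follows that aNET = NET + SOS. The quick check below uses the Boston Celtics' rounded values from that table. Because the inputs are rounded, the recomputed dSOS (0.8), SOS (1.3) and aNET (10.0) differ slightly from the table's 0.9, 1.4 and 10.1.

```python
import math

# Rounded Boston Celtics values taken from the final table.
OFF, aOFF = 118.3, 118.8
DEF, aDEF = 109.6, 108.8

o_sos = aOFF - OFF  # positive: the adjustment raised the offense (tougher defenses faced)
d_sos = DEF - aDEF  # positive: the adjustment improved the defense (tougher offenses faced)
sos = o_sos + d_sos
net, a_net = OFF - DEF, aOFF - aDEF

print(round(o_sos, 1), round(d_sos, 1), round(sos, 1), round(net, 1), round(a_net, 1))
```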
Reminder
You can find the notebook version of this tutorial on my GitHub: https://github.com/sravanpannala/NBA-Tutorials/blob/main/sos_adjusted_ratings/how_to_adjust_nba_team_ratings_for_sos.ipynb
Final combined data table:
You can save it as a `csv` file and then use some fancy visualization tool to create a pretty-looking table and/or an efficiency landscape graph.
| | Team | OFF | oSOS | aOFF | DEF | dSOS | aDEF | NET | SOS | aNET |
---|---|---|---|---|---|---|---|---|---|---|
1 | Philadelphia 76ers | 121.2 | -0.1 | 121.1 | 110.9 | 0.2 | 110.8 | 10.3 | 0.0 | 10.3 |
2 | Boston Celtics | 118.3 | 0.5 | 118.8 | 109.6 | 0.9 | 108.8 | 8.7 | 1.4 | 10.1 |
3 | Oklahoma City Thunder | 117.6 | 0.1 | 117.7 | 110.6 | -0.2 | 110.8 | 7.0 | -0.1 | 6.9 |
4 | Minnesota Timberwolves | 113.3 | -0.1 | 113.2 | 106.6 | -0.1 | 106.6 | 6.7 | -0.2 | 6.6 |
5 | Denver Nuggets | 117.3 | 1.1 | 118.4 | 112.6 | -0.5 | 113.1 | 4.7 | 0.6 | 5.3 |
6 | LA Clippers | 115.4 | -0.1 | 115.4 | 110.6 | -0.5 | 111.1 | 4.8 | -0.5 | 4.3 |
7 | Orlando Magic | 113.8 | -0.4 | 113.4 | 109.5 | 0.2 | 109.3 | 4.3 | -0.2 | 4.1 |
8 | New York Knicks | 117.3 | -0.1 | 117.2 | 113.3 | 0.0 | 113.3 | 4.0 | -0.1 | 3.9 |
9 | Houston Rockets | 111.4 | 0.3 | 111.8 | 107.4 | -0.5 | 108.0 | 4.0 | -0.2 | 3.8 |
10 | Milwaukee Bucks | 119.3 | -0.7 | 118.7 | 115.7 | 0.4 | 115.3 | 3.6 | -0.3 | 3.3 |
11 | Brooklyn Nets | 116.9 | 0.0 | 116.9 | 115.0 | 0.4 | 114.6 | 2.0 | 0.4 | 2.4 |
12 | Indiana Pacers | 122.4 | 0.1 | 122.5 | 120.1 | -0.4 | 120.6 | 2.3 | -0.3 | 2.0 |
13 | Dallas Mavericks | 118.5 | 0.4 | 118.9 | 116.4 | -1.0 | 117.4 | 2.2 | -0.6 | 1.6 |
14 | New Orleans Pelicans | 114.1 | 0.0 | 114.1 | 113.1 | -0.0 | 113.1 | 1.0 | -0.0 | 1.0 |
15 | Golden State Warriors | 114.0 | 1.0 | 114.9 | 114.2 | 0.0 | 114.2 | -0.2 | 1.0 | 0.8 |
16 | Miami Heat | 114.7 | -0.6 | 114.1 | 113.5 | -0.0 | 113.5 | 1.2 | -0.6 | 0.6 |
17 | Atlanta Hawks | 118.7 | 0.2 | 118.9 | 118.8 | 0.4 | 118.5 | -0.1 | 0.6 | 0.5 |
18 | Phoenix Suns | 116.6 | 0.4 | 117.0 | 115.4 | -1.1 | 116.5 | 1.2 | -0.7 | 0.4 |
19 | Los Angeles Lakers | 112.2 | 0.1 | 112.3 | 111.8 | -0.5 | 112.2 | 0.4 | -0.3 | 0.1 |
20 | Sacramento Kings | 114.6 | 0.7 | 115.3 | 114.9 | -0.3 | 115.2 | -0.2 | 0.3 | 0.1 |
21 | Cleveland Cavaliers | 111.1 | -0.0 | 111.0 | 111.5 | 0.6 | 110.9 | -0.5 | 0.6 | 0.1 |
22 | Toronto Raptors | 112.7 | -0.3 | 112.4 | 114.9 | 0.4 | 114.5 | -2.2 | 0.1 | -2.1 |
23 | Chicago Bulls | 111.6 | -0.3 | 111.3 | 115.7 | -0.1 | 115.7 | -4.1 | -0.4 | -4.5 |
24 | Memphis Grizzlies | 106.5 | -0.2 | 106.3 | 112.7 | -0.5 | 113.2 | -6.2 | -0.7 | -6.9 |
25 | Portland Trail Blazers | 107.2 | -0.3 | 106.9 | 114.2 | -0.5 | 114.8 | -7.0 | -0.8 | -7.8 |
26 | Charlotte Hornets | 112.6 | -0.1 | 112.5 | 120.4 | -0.0 | 120.4 | -7.8 | -0.2 | -7.9 |
27 | Utah Jazz | 110.4 | 0.1 | 110.5 | 118.1 | -0.8 | 118.9 | -7.6 | -0.7 | -8.4 |
28 | Washington Wizards | 111.8 | -0.6 | 111.2 | 121.4 | 0.7 | 120.7 | -9.6 | 0.1 | -9.4 |
29 | San Antonio Spurs | 107.0 | 0.3 | 107.3 | 117.3 | 0.1 | 117.2 | -10.3 | 0.5 | -9.8 |
30 | Detroit Pistons | 106.8 | -0.5 | 106.3 | 117.8 | 0.2 | 117.6 | -11.0 | -0.2 | -11.2 |