One thing I like about the off season is it allows me more time to do research posts. This is one such post.

I've been wanting to do research on field goal shooting and field goal defense regression for Calvin, and the MIAA in general, but I didn’t really have the know-how to get started. That changed when Tom Tango (blogger, co-author of “The Book”, and the reason why I love advanced stats) came out with this post on his blog. He usually sticks to baseball-related topics, with some hockey posts sprinkled in (he’s Canadian, so I’ll give him a pass on the hockey love), but recently he’s been venturing into the realm of basketball.

In the post I linked to, he looks at how much skill is involved in free throw shooting percentages for NBA players. We obviously know that free throw percentage is related to skill, but what proportion of a player's percentage is related to skill, and how much is related to statistical noise?

As you could probably guess, there’s a lot of skill involved in shooting free throws, so we’d rarely need to regress an NBA player’s career FT percentage (although, to be precise, you ALWAYS need to regress, but you’d gain very little benefit in this application). I’d encourage you to check out the blog post.

Anyhoo, I’m going to mirror Tom’s method to determine how much skill there is (at the team level) in defensive field goal percentage for the MIAA. Actually, I’m going to do this for Field Goal Percent Defense, Three Point Field Goal Percent Defense, Two Point Field Goal Percent Defense, and Effective Field Goal Percent Defense. If you don't like math, you can skip the boring part.

**The Boring Part**

I was able to glean statistical data for the past 14 seasons (1998-2011) from the MIAA website; that gives us 108 team seasons worth of data. It’s not as much as I would like, but it’s what we have.

On average, each team faced 1531 field goal attempts (of which 495 were three point attempts). They allowed (on average) a field goal percentage of 0.440 overall (0.351 from three point, 0.497 eFG%, and 0.482 from two point). The observed standard deviation for the team defenses was 0.027 (0.026 for three pointers, 0.028 for eFG%, and 0.034 for two point). Don't get bogged down with all of the numbers just yet; I'll provide a table later on.

Now first we need to determine if the skill exists. To do that, we’ll need to compute the spread in percentages that we’d expect to find from random variation alone. Tom shows us how to do it with this calculation:

sqrt(pct*(1-pct)/n)

Where pct is the average percentage, and n is the average number of attempts.
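For anyone who wants to check the arithmetic, here is a minimal Python sketch of that calculation. The function name `random_sd` and the `league` dictionary are my own labels, not anything from Tom's post, and the attempt counts are the averages quoted in this post (using 494 three point attempts, as in the summary table).

```python
import math

def random_sd(pct, n):
    """Spread expected from random variation alone (binomial standard deviation)."""
    return math.sqrt(pct * (1 - pct) / n)

# (average pct allowed, average attempts faced per season) as quoted in the post
league = {
    "FG": (0.440, 1531),
    "3FG": (0.351, 494),
    "eFG": (0.497, 1531),
    "2FG": (0.482, 1036),
}

random_spread = {stat: random_sd(pct, n) for stat, (pct, n) in league.items()}

for stat, sd in random_spread.items():
    print(f"{stat}: {sd:.4f}")
```

Note that the fewer attempts a stat has, the larger its random spread, which is why three pointers stand out here.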

This calculation gives us expected spreads of: 0.013 (total field goal percentage), 0.022 (three point), 0.013 (eFG%), and 0.016 (two point). In each case, we observe a larger spread than we'd expect due to random variation, so it is clear that there is a difference in defensive talent among the MIAA teams (no duh, right?).

But I want to know how much of a difference there is. I want to know how much we need to regress our observed percentages in order to get a better understanding of the actual talent of a particular team.

We keep going. We have the observed spread in percentages, we calculated the expected spread due to randomness, so now we can calculate the spread in actual (true) talent using the equation:

sd(obs)^2 = sd(true)^2 + sd(random)^2

This gives us a spread in true talent of 0.024 (FG%), 0.014 (three point), 0.025 (eFG%), and 0.030 (two point).
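Rearranging that equation for the true spread is a one-liner. Here is a rough sketch using the four-decimal spreads from the summary table; `true_sd` is a name I made up for illustration, and because the inputs are rounded the results can differ from the post's numbers in the last digit.

```python
import math

def true_sd(sd_obs, sd_random):
    """Solve sd(obs)^2 = sd(true)^2 + sd(random)^2 for sd(true).

    Raises ValueError if the observed spread is smaller than the random
    spread, in which case the data show no detectable skill at all.
    """
    return math.sqrt(sd_obs**2 - sd_random**2)

# (observed SD, random SD) for each stat, as quoted in the post
spreads = {
    "FG": (0.0274, 0.0127),
    "3FG": (0.0258, 0.0215),
    "eFG": (0.0281, 0.0128),
    "2FG": (0.0339, 0.0155),
}

true_spread = {stat: true_sd(obs, rnd) for stat, (obs, rnd) in spreads.items()}

for stat, sd in true_spread.items():
    print(f"{stat}: {sd:.4f}")
```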

Finally, we can calculate the r-squared values for our sets of data. R-squared tells us what percentage of the variation is due to actual talent given our average number of shots faced. We find this with the following calculation:

r^2 = sd(true)^2 / sd(obs)^2

And from the r-squared numbers, we can compute the number of attempts a team needs to face to reach the 50-50 regression point. This is the point at which 50% of the variation is explained by team skill level.

n * [(1 - rsquared)/rsquared]

For standard field goal percentage, this point is 416 attempts, for three pointers, it’s 1105 attempts, for effective field goal percentage, it’s 399 attempts, and for two pointers, it’s 276 attempts.
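Here is a small sketch of those last two steps. It assumes the 50% point is the average number of attempts multiplied by (1 - r-squared)/r-squared, since that is the form that reproduces the attempt counts quoted above. The function names are mine, and because I'm feeding in the rounded four-decimal spreads, the results land within a couple of percent of the quoted figures rather than matching them exactly.

```python
def r_squared(sd_true, sd_obs):
    """Share of the observed variance that comes from true talent."""
    return sd_true**2 / sd_obs**2

def fifty_percent_mark(r2, n):
    """Attempts at which talent and randomness each explain half the variance."""
    return n * (1 - r2) / r2

# (true SD, observed SD, average attempts faced) as quoted in the post
inputs = {
    "FG": (0.0243, 0.0274, 1531),
    "3FG": (0.0144, 0.0258, 494),
    "eFG": (0.0250, 0.0281, 1531),
    "2FG": (0.0301, 0.0339, 1036),
}

r2 = {stat: r_squared(t, o) for stat, (t, o, n) in inputs.items()}
marks = {stat: fifty_percent_mark(r2[stat], n) for stat, (t, o, n) in inputs.items()}

for stat in inputs:
    print(f"{stat}: r^2 = {r2[stat]:.2f}, 50% mark = {marks[stat]:.0f} attempts")
```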

Here’s the chart of all of the numbers, as promised.

Stat | FG | 3FG | eFG | 2FG |
---|---|---|---|---|
Average attempts faced per year | 1531 | 494 | 1531 | 1036 |
Average PCT allowed | 0.440 | 0.351 | 0.497 | 0.482 |
Observed Std. Deviation | 0.0274 | 0.0258 | 0.0281 | 0.0339 |
Random Std. Deviation | 0.0127 | 0.0215 | 0.0128 | 0.0155 |
True Std. Deviation | 0.0243 | 0.0144 | 0.0250 | 0.0301 |
r-squared | 0.79 | 0.31 | 0.79 | 0.79 |
50% Regression (# of attempts) | 416 | 1105 | 399 | 276 |

That bottom line is the key to this whole post. The regression equation (what percentage you need to regress for any number of shots faced) is:

50% Mark / (50% Mark + Attempts)

For example, if a team faced 130 shots after the first two games (giving up a 0.500 FG%), and you wanted to figure out their expected team talent level, you'd figure:

416 / (416 + 130) = 0.76

You would need to regress 76% toward the league average, or:

0.76 x 0.440 + 0.24 x 0.500 = **0.454**

So, even though the team had given up 0.500 from the floor (total FG%), you'd expect them to be a 0.454 defense going forward.

The most interesting thing that I found through all of this is that in a typical season, you'd need to regress a defense's FG%, eFG%, and 2-pt% allowed about 21% toward the league average, but you'd need to regress their three point percentage allowed about 69% toward the league average.
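If you'd rather let a computer do the regressing, the whole thing fits in two short functions. This is just a sketch: `regression_fraction` and `regress` are names I picked, and the 50% marks and league averages are the ones from the table in this post.

```python
def regression_fraction(mark, attempts):
    """Fraction of the way to regress toward the league average."""
    return mark / (mark + attempts)

def regress(observed, attempts, mark, league_avg):
    """Blend the observed percentage with the league average."""
    w = regression_fraction(mark, attempts)
    return w * league_avg + (1 - w) * observed

# The two-game example: 0.500 FG% allowed on 130 shots faced
early_estimate = regress(observed=0.500, attempts=130, mark=416, league_avg=0.440)
print(f"Regressed FG% allowed: {early_estimate:.3f}")

# Full-season regression amounts at the average number of attempts
full_season_fg = regression_fraction(416, 1531)
full_season_3fg = regression_fraction(1105, 494)
print(f"FG%: {full_season_fg:.0%}, 3FG%: {full_season_3fg:.0%}")
```

The same `regress` call works for any of the four stats; only the 50% mark and the league average change.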

**Application**

So, what’s a practical application of this?

Well, let’s say a certain MIAA team, I dunno, Hope, was observed to perform much better in defending the three point shot in the second half of games than they were in the first half. Through eight games, we’ll suppose the following splits:

1st Half: 40-65 (0.615)

2nd Half: 17-60 (0.283)

They were observed to be better defensively in the second half by a margin of 0.332, but how much of that should we attribute to the team actually playing better on defense, and how much should we attribute to random variation?

Using our regression equation above, we find that after 60 or 65 shots, you need to regress 94.5% of the way toward league average. Doing that, we come up with the following regressed percentages. This is the best statistical estimate of the team’s defensive efforts in those two halves.

1st Half: 0.365

2nd Half: 0.347

So really, as best as we can tell from the stats, 0.314 of the difference is explained by random variation, and 0.018 is explained by a difference in skill or effort.

This isn't to say that 0.018 isn't significant. It's equal to the difference between the best (Albion) and worst (Calvin) three point defenses in 2011 (using the regressed stats, see below).
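The split example can be rerun the same way. This is a sketch under the same assumptions as the rest of the post (a three point 50% mark of 1105 attempts and a league average of 0.351); the helper name `regress` is mine. It regresses each half by its own attempt count instead of a flat 94.5%, so the third decimal can differ slightly from the figures above.

```python
THREE_PT_MARK = 1105       # 50% regression point for 3FG% defense, from the post
THREE_PT_LEAGUE_AVG = 0.351

def regress(made, attempts, mark=THREE_PT_MARK, league_avg=THREE_PT_LEAGUE_AVG):
    """Regress an observed made/attempts split toward the league average."""
    observed = made / attempts
    w = mark / (mark + attempts)   # fraction regressed toward league average
    return w * league_avg + (1 - w) * observed

first_half = regress(40, 65)
second_half = regress(17, 60)

observed_gap = 40 / 65 - 17 / 60
regressed_gap = first_half - second_half

print(f"1st half: {first_half:.3f}, 2nd half: {second_half:.3f}")
print(f"Observed gap: {observed_gap:.3f}, regressed gap: {regressed_gap:.3f}")
```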

**2011 Stats**

Here's a look at the various defensive stats from the 2010-2011 season. The observed stats are shown first, and their regressed counterparts are shown second.

Team | FG% | 3P% | eFG% | 2P% | r-FG% | r-3P% | r-eFG% | r-2P% |
---|---|---|---|---|---|---|---|---|
Adrian | 0.400 | 0.341 | 0.455 | 0.428 | 0.410 | 0.348 | 0.465 | 0.442 |
Calvin | 0.413 | 0.362 | 0.473 | 0.439 | 0.418 | 0.355 | 0.478 | 0.447 |
Kalamazoo | 0.432 | 0.321 | 0.476 | 0.475 | 0.434 | 0.343 | 0.481 | 0.476 |
Olivet | 0.435 | 0.346 | 0.480 | 0.467 | 0.436 | 0.350 | 0.484 | 0.470 |
Albion | 0.437 | 0.302 | 0.485 | 0.500 | 0.438 | 0.337 | 0.488 | 0.496 |
Trine | 0.442 | 0.351 | 0.503 | 0.490 | 0.442 | 0.351 | 0.501 | 0.488 |
Hope | 0.458 | 0.362 | 0.509 | 0.496 | 0.454 | 0.354 | 0.507 | 0.493 |
Alma | 0.463 | 0.359 | 0.514 | 0.504 | 0.458 | 0.353 | 0.510 | 0.500 |

Wow, Matt. Really glad you did this. I kept Tom's post to use for further applications after I read it the other day. So I'm glad I'm not the only one going to be toying around with it.

Great work.