This evening as I was playing College Hoops 2K6, I suddenly realized one major flaw in my methodology for the study I did a week or so ago. Maybe I should get out more if this type of thing keeps happening, but that’s a completely different point.
I was destroying Maryland in a rivalry game as Duke, and most of my edge came on the offensive glass, where I was actually pulling down the rebounds on more than half of my missed shots. I finished the game with 19 offensive rebounds to their 18 defensive boards, while they had just one measly offensive rebound to my 22 defensive. That puts my offensive rebounding percentage at about 51% against their 4%, which is an edge of more than 1,000% for me when measured as a percentage difference. Differences like that tend to throw off numbers like the ones in my study, which were typically 50% or less one way or the other.
This realization led me to a novel idea: why not just use the raw difference between the two teams’ numbers, rather than some crazy percentage number? That would automatically eliminate the possibility of such extreme outliers. I’m not sure why I hadn’t thought of it earlier, and as it turns out, this was just what the data needed to become much more powerful.
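To make the outlier problem concrete, here is a rough sketch in Python using the rebounding numbers from the game above. It assumes the old “percentage difference” was measured relative to the opponent’s rate, which is my best reconstruction rather than the exact calculation from the original study. The function and variable names are made up for illustration.

```python
def offensive_rebound_pct(off_reb, opp_def_reb):
    """Share of a team's own missed shots that it rebounds."""
    return off_reb / (off_reb + opp_def_reb)

# Box-score numbers from the Duke-Maryland game described above.
duke_or = offensive_rebound_pct(19, 18)
maryland_or = offensive_rebound_pct(1, 22)

# Old approach (assumed form): difference relative to the opponent's rate.
pct_edge = (duke_or - maryland_or) / maryland_or

# New approach: the raw difference between the two rates.
raw_edge = duke_or - maryland_or

print(f"Duke OR%: {duke_or:.3f}, Maryland OR%: {maryland_or:.3f}")
print(f"Percentage edge: {pct_edge:.0%}")
print(f"Raw edge: {raw_edge:.3f}")
```

The percentage edge blows up because the denominator (Maryland’s tiny rate) is close to zero, while the raw edge can never go beyond 1.0 in either direction.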
Let me alert you once again that if you’re not familiar with variances and regression analysis, some of the following numbers may be foreign to you. I’ll try to make sense of everything, though.
The new findings:
New variances explained (R², the share of the variation in net efficiency that each stat accounts for) for the individual parts of the four keys, with the old values in parentheses:
TS%: .706 (.707)
OR%: .237 (.113)
TR: .134 (.118)
FTM/P: .081 (.041)
Moving on to variances for two stats together:
TS%-OR%: .798 (.764)
TS%-TR: .892 (.857)
OR%-TR: .375 (.220)
Now for the main three or four areas:
TS%-OR%-TR: .984 (.904)
TS%-OR%-TR-FTM/P: .984 (.909)
The main idea from all this is that the regression model now explains a hefty 98% of the variance in net efficiency, when before it was just 90%. Statistically speaking, that’s a pretty huge jump, and it comes entirely from using raw differences instead of silly percentage differences.
The prediction model has changed as well, and it is now more accurate. It correctly predicted four more games in my sample set (159-6, or 96.4%, vs. 155-10, or 93.9%), but I get the feeling that if I had more games to sample, the difference would be wider. As it is, the six games the model still got wrong were decided by a total of 14 points. In other words, they were all close games that could have gone the other way with just a single extra play. If you’re interested in using the model for prediction yourself, the formula is now:
[TS%]*151.3 + [OR%]*49.5 – [TR]*161.1 – 0.3
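If you would rather plug numbers into a script than a calculator, the formula could be sketched in Python as below. A few assumptions, since they are not spelled out above: each input is the raw difference (your team minus the opponent) expressed as a decimal fraction, the output is predicted net efficiency, and a positive result means your team is the predicted winner. The function names and the sample inputs are hypothetical.

```python
def predict_net_efficiency(ts_diff, or_diff, tr_diff):
    """Apply the regression formula to raw team-minus-opponent differences.

    Assumes each difference is a decimal fraction (e.g. 0.05 for a
    five-point TS% advantage); the units are an assumption, not stated
    in the study itself.
    """
    return 151.3 * ts_diff + 49.5 * or_diff - 161.1 * tr_diff - 0.3


def predicted_winner(ts_diff, or_diff, tr_diff):
    """Return 'team' if the predicted net efficiency is positive."""
    net = predict_net_efficiency(ts_diff, or_diff, tr_diff)
    return "team" if net > 0 else "opponent"


# Hypothetical game: better shooting, better offensive rebounding,
# and a lower turnover rate than the opponent.
example_net = predict_net_efficiency(ts_diff=0.05, or_diff=0.10, tr_diff=-0.02)
print(f"Predicted net efficiency: {example_net:.3f}")
print(f"Predicted winner: {predicted_winner(0.05, 0.10, -0.02)}")
```

Note that the turnover term is subtracted, so committing fewer turnovers than your opponent (a negative difference) pushes the prediction in your favor.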
Another interesting finding I failed to mention before is that the better-shooting team ended up winning 85% of the time, regardless of rebounding or turnovers. I guess that shows (if it wasn’t already obvious) that you need good philosophies and execution on offense and defense to stand a chance. Rebounding and ball control are important, but not nearly as important as getting good shots and preventing your opponent from doing the same.
What does this mean for the four keys?
After this study, I would be a lot more comfortable leaving free throw shooting out of the equation, and here’s why. True shooting percentage accounts for both three-point shooting and free throws, basically measuring points per shot. It makes sense that free throw shooting by itself would not add anything to the discussion, and in this model, free throws made per possession is not a statistically significant variable (at the .01 level).
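For anyone unfamiliar with true shooting percentage, here is a minimal Python sketch of the commonly used version of the formula. The 0.44 weight on free throw attempts is the usual convention and is an assumption here, since the exact multiplier used in this study is not stated. The box-score numbers are invented for illustration.

```python
def true_shooting_pct(points, fga, fta, ft_weight=0.44):
    """Points per scoring attempt, scaled so it reads like a shooting percentage.

    ft_weight estimates how many free throw attempts equal one scoring
    attempt; 0.44 is a common convention and an assumption here.
    """
    scoring_attempts = fga + ft_weight * fta
    return points / (2 * scoring_attempts)


# Hypothetical box score: 80 points on 60 field goal attempts
# and 20 free throw attempts.
ts = true_shooting_pct(points=80, fga=60, fta=20)
print(f"TS%: {ts:.3f}")
```

Because points already include threes and made free throws, a single number captures all three kinds of scoring, which is why a separate free throw stat has so little left to add.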
So why did Dean Oliver include free throws in his four keys? I also have a good answer for that. Oliver just used field goal percentage for the shooting aspect of the game, which accounts for neither the extra point you get from a three nor the value of free throw attempts. All three (inside shots, outside shots, and free throws) are components of scoring, and should thus be included in some form. To do so, I have lumped them all into one shooting category, true shooting percentage, but it would be equally correct to split them up into five keys (adding three-point shooting to the original four keys).
I prefer to lump them all together because it’s much easier to calculate that way, and it’s more of an overall indicator of offensive proficiency. Some teams, because of team philosophy and player talent, will shoot more threes or get to the line more often, and the five-key display would reflect that, but it would make it harder to gauge overall proficiency. It’s really just a matter of preference, so my only real conclusion here is that the four keys as Oliver described them are somewhat insufficient. I may just refer to the key areas as “The Keys” from now on, since I can conclude that there are not four important areas, but rather three or five, depending on how you want to divide the shooting aspect up.