I’m going to try and crank out two posts today:
1) First, a few more comments on ESPN’s introduction to win probability
2) Second, my weekly Braves update
So, let’s get started.
Win Probability and ESPN
As I mentioned on Saturday night, I tuned in for the last inning of the Oregon State-Cal State Fullerton College World Series game only to see an interesting dropdown from the top score line. It included each team’s current win probability, and as far as I knew, it was the first time ESPN had acknowledged the concept, which Dave Studeman introduced to me over two years ago on The Hardball Times. I’ve been talking about it on this blog (and others) ever since, and I find it positively fascinating, although it’s certainly not the Holy Grail of Baseball Stats that it’s sometimes made out to be.
Honestly, I was surprised to see it on ESPN, since they’re typically more enthralled with unquantifiable topics like who is more overrated (not Andruw!), as well as ridiculous stats like the Player Rating. Good sabermetric writers like Rob Neyer get overshadowed there by guys like Jayson Stark, who has stirred things up lately with this Andruw debate despite not having anything worthwhile to say. I was expecting the worst when it came to their explanation; at least it wasn’t Joe Morgan explaining it, or my head might have exploded.
Over the weekend, Orel Hershiser and Mike Patrick took turns introducing this newfangled Win Probability thing, and they mostly did a decent job. They started out simple and put up a graphic explaining that it’s based on hundreds of thousands of real-life plays, and it includes pretty much every conceivable game situation (inning, base-out situation, score, etc.).
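To make the "based on real-life plays" idea concrete, here is a minimal sketch of how a win probability table could be built by counting outcomes from historical game states. The records, the state layout, and the function name are all made up for illustration; this is not ESPN's method or data, just the general counting idea.

```python
from collections import defaultdict

# Hypothetical toy records: (game state, did the batting team win?).
# State = (inning, half, outs, runners on base, batting team's run differential).
records = [
    ((9, "bottom", 2, "1__", -1), False),
    ((9, "bottom", 2, "1__", -1), False),
    ((9, "bottom", 2, "1__", -1), True),
    ((9, "bottom", 2, "1__", -1), False),
    ((1, "top", 0, "___", 0), True),
    ((1, "top", 0, "___", 0), False),
]

def build_win_probability_table(records):
    """Map each observed game state to the share of games the batting team won."""
    counts = defaultdict(lambda: [0, 0])  # state -> [wins, games]
    for state, batting_team_won in records:
        counts[state][0] += int(batting_team_won)
        counts[state][1] += 1
    return {state: wins / games for state, (wins, games) in counts.items()}

table = build_win_probability_table(records)
print(table[(9, "bottom", 2, "1__", -1)])  # 1 win in 4 games -> 0.25
```

With hundreds of thousands of real plays instead of six toy rows, each state would have enough games behind it for the ratio to be meaningful.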
There are two minor differences that I can see between their Win Probability and the version you could find from Dave Studeman’s WPA spreadsheet or Tangotiger’s WPA tables on FanGraphs, so let me tackle those first:
1) ESPN claims to include game data from NCAA baseball games, which would be a new development in win probability, as far as I know. I guess that would include some sort of accounting for the mercy rule, but otherwise it’s probably not that big a deal.
2) ESPN uses “position in the batting order” as a consideration, which is not something that has been implemented in either of the above examples of widespread WPA calculation.
Let me address #2 first. There has been plenty of discussion on the FanGraphs message board about areas of the game that the existing WPA tables don’t address, and I believe this is still one of them. My opinion is that WPA should account only for the game environment, which would include park factors and perhaps home-field advantage, but not the players themselves. I know that production differs for each spot in the batting order, and in some ways those differences would be consistent across teams, but that won’t always be the case. In general, I’m not sure such things are worth taking into account. I’d rather not penalize Johan Santana by starting his team with a .650 win probability each time he takes the mound, and this is basically the same idea. So, if ESPN really is doing it this way, I might have issues with that.
As for #1, it’s great if ESPN is using additional game data for the NCAA, though I wish they would make that data public, along with the rest of their tables. The mercy rule is probably the biggest difference between the NCAA and MLB, at least from a pure gameplay standpoint, but like I said, it probably wouldn’t have a big effect on the win probability.
Knowing what ESPN was tracking, I decided I would look at Sunday’s elimination game (Louisville-Mississippi State) and see if they were on track with the existing win probability research. As it turns out, they basically are.
Louisville 12, Mississippi State 4
I was tracking the game using Dave’s spreadsheet, which allowed me to do a couple of things to manipulate the win probability tables on my own. First, I made an educated guess about the Rosenblatt Stadium park factor based on previous CWS data, so I set the run environment at 6.3 runs per game (per team). That essentially makes the win probabilities less favorable to the team holding a lead, since it’s harder to hold a lead when teams are generally scoring a lot of runs. Next, I was able to assign win probability to defensive players. FanGraphs does not do this, since it would require an “official scorer” of sorts for defensive purposes, and the effects are really quite small. I did it anyway and assigned credit to defensive players where I saw fit.
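To illustrate why a higher run environment makes a lead less safe, here is a rough sketch. It assumes each team's remaining runs follow a Poisson distribution and that a tie after nine innings is a coin flip; neither assumption comes from Dave's spreadsheet, which works from its own tables. The 4.5 runs-per-game baseline and the innings-remaining values (a leader that bats first, at the start of the bottom of the 2nd) are hypothetical inputs chosen for the comparison.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k runs when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def lead_win_probability(lead, innings_left_leader, innings_left_trailer,
                         runs_per_game, max_runs=40):
    """Chance the leading team wins under a simple Poisson scoring model."""
    # Assumption: remaining runs are Poisson, scaled from a per-team
    # runs-per-game figure spread evenly over nine innings.
    mean_leader = runs_per_game / 9 * innings_left_leader
    mean_trailer = runs_per_game / 9 * innings_left_trailer
    win = 0.0
    for a in range(max_runs + 1):
        p_a = poisson_pmf(a, mean_leader)
        for b in range(max_runs + 1):
            margin = lead + a - b
            if margin > 0:
                win += p_a * poisson_pmf(b, mean_trailer)
            elif margin == 0:
                # Assumption: a tie after nine innings is a coin flip.
                win += 0.5 * p_a * poisson_pmf(b, mean_trailer)
    return win

low_scoring = lead_win_probability(3, 7, 8, runs_per_game=4.5)
high_scoring = lead_win_probability(3, 7, 8, runs_per_game=6.3)
print(f"3-run lead: {low_scoring:.3f} at 4.5 R/G vs {high_scoring:.3f} at 6.3 R/G")
```

The exact numbers from a toy model like this won't match a table built from real games, but the direction should: the same three-run lead is worth less when both teams are expected to score more.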
This really wasn’t much of a game, and as it turned out, ESPN only showed the win probability graphic twice during the broadcast:
- Starting the bottom of the 2nd inning, Louisville was already up 3-0, and they showed a 68% chance of winning for Louisville. My number was slightly more conservative at 67%, but well within a reasonable margin of error.
- In the bottom of the 6th, Mississippi State turned an 8-0 game into an 8-3 game, and the graphic showed Louisville with a 90% win probability. Again, my number was one percentage point more conservative at 89%.
My guess is that the park factor adjustment is the main reason for this difference, but it’s a very small difference, so ESPN appears to be basically on track with their win probability calculation. If there’s a batting order effect in ESPN’s number, it appears to be small.
Final Comments
I’m hoping that ESPN will continue using win probability in their broadcasts, and I think this is a good sign for them. The College World Series is a good way for them to test the waters with a more obscure, but incredibly interesting statistic, and they’re off to a good start with it. They made it so the graphic didn’t interfere with the broadcast, as it simply dropped down from the score line occasionally and then went away. I just hope their advertisers don’t have a problem with them telling fans at a particular moment that there’s only a 10% chance the game will end with a different winner than the current score suggests. I think baseball fans are smart enough to realize that the outcome is never decided until the final out and that part of the fun of watching is seeing a team come back from a 3% chance of winning all the way to a victory. At any rate, this is a good start for WPA’s advancement into the mainstream.
UPDATE: I’ve uploaded the spreadsheet I used to track the game, so you can look at that here (right-click and save the file to your computer if you want to keep it from loading in your browser…it’s over 1MB).
You can view my previous post on this topic here and my latest ones here and here.
A thought on whether to include batting order position in win probability calculations:
Ultimately it all depends on what you’re intending to do with the win probability calculation. If you’re trying to accurately assess the probability of Team A winning the ballgame, either for entertainment or gambling purposes, then of course you make use of as much information as possible.
If you’re trying to use win probability as a tool to either evaluate past performance or to predict future performance, and you’re intending to compare players who hit in different spots in the batting order, then I’m not sure it makes as much sense.
For example, if you changed win probability figures to include batting order spot, then any decent NL hitter’s WPA would go way up if his manager (absurdly) started batting him 9th.
That’s exactly the right idea, and it’s probably the biggest question as to the direction that win probability stats will take next.
My motive, at least right now, is to figure out which players are adding win probability, as a method of evaluating performance. To me, it’s just another piece of the puzzle for player analysis. As a result, I prefer not to include all of those complicating factors.
At the same time, it would be interesting to see it implemented with all kinds of additional adjustments (home field, batting order, individual player performance, etc.) to get the “real” win probability.
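For the player-evaluation use described above, the basic bookkeeping is simple: each play moves the team's win probability from one value to another, and the difference is credited to the player involved. Here is a minimal sketch of that tally. The player names and win probability values are invented for illustration, and real WPA accounting also splits credit with the pitcher (and, in my spreadsheet, sometimes the fielders), which this leaves out.

```python
from collections import defaultdict

# Hypothetical plays: (player, team win probability before, after).
plays = [
    ("Batter A", 0.50, 0.58),
    ("Batter B", 0.58, 0.52),
    ("Batter A", 0.52, 0.71),
]

def tally_wpa(plays):
    """Sum the change in win probability credited to each player."""
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)

for player, wpa in tally_wpa(plays).items():
    print(f"{player}: {wpa:+.2f}")
```

If the before and after values came from a table that also depended on batting order spot, the same play would produce a different credit, which is exactly the issue raised in the comment above.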