Possible improvements to Win Probability stats

Since I am constantly referring to win-probability-based stats on the site, it seems fitting every now and then to take a look at what WPA actually tells us, what adjustments I’m already making to help it tell us more, and what adjustments could be made to make it even better.  Now it’s time to take such a look.

What WPA tells us

I’ve probably written this a dozen different ways in the past, so let’s go with something I’ve already written, from the glossary on the Braves WPA page:

WPA Basics

WPA – Win Probability Added – Simply put, a player either adds or subtracts from his team’s probability of winning the game based on his actions on the field. WPA measures that change in probability based on tables derived from historical results and some higher math.

LI/Leverage – The average game situation has a leverage rating of 1, so “more important” situations have leverage greater than 1, and “less important” situations are less than 1. Tom Tango should be credited for the math behind this number.

WPA/LI – WPA is great, but it’s hugely dependent on game context. High-leverage situations generate larger changes in WPA, so this number neutralizes that effect. You could call it leverage-neutral or context-neutral WPA, and it’s probably a better indicator of overall ability within the WPA framework.

Clutch – This is what FanGraphs calls clutch, at least, and it’s calculated as simply [Clutch = WPA – WPA/LI]. Even though it subtracts WPA/LI, “Clutch” is still somewhat dependent on the leverage a player faces, so I break it down further into “EC,” which is the leverage-dependent portion of “clutch” and “CP,” which is actual clutch performance.

That’s basically what you see when you look at FanGraphs, which is really a groundbreaking resource for a win probability geek like myself.  Tango gets a great deal of credit for the theory side of things, while David Appelman is the man behind the site.

My own adjustments

I prefer to look at more than just WPA, LI, WPA/LI, and “Clutch” because that doesn’t tell us as much as we can possibly know from a simple play-by-play account of a game.

There is actually a problem with calling “Clutch” a measure of clutch performance because it is dependent on leverage.  I explained how in a post last year introducing my CP (Clutch Performance) and EC (Expected Clutch) statistics:

When you look at WPA alongside its sister statistic, Leverage Index, you can theoretically remove some of the statistical “noise” of WPA and make it more meaningful.  The idea is that WPA/LI gives you a better indicator of the part of a player’s performance that the player actually controls, regardless of the situation in which he is used.  The end result is that you have WPA/LI, which is the leverage-neutral performance, total WPA, and what we have simply called Clutch – the difference between the two.

The problem with Clutch as an encompassing measure of clutch hitting is that it actually measures more than that.  We know that the part of WPA that is influenced by the leverage of a situation can be factored out, and we get WPA/LI as a result.  However, it’s also important to realize that Clutch, the resulting difference between WPA and WPA/LI, is itself a product of Leverage Index.

Let me offer an example:

If Kelly Johnson always came to the plate when the LI was 1.19, as it was in the fifth inning of Saturday’s game, and he tripled in every at-bat (as he did then), his WPA would always be +.119, his WPA/LI would always be +.100, and his Clutch would always be +.019.  This is not entirely the result of Kelly’s own “clutch ability,” though.  To put it differently, if his “real ability” is +.100 in WPA/LI, you would actually expect him to get +.119 in WPA and +.019 in Clutch every time he came up to the plate, as long as the LI is 1.19.
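The arithmetic in that example is easy to check.  Here is a minimal Python sketch, assuming that WPA/LI is accumulated by dividing each play's WPA by that play's LI and that Clutch is simply WPA minus WPA/LI, as in the glossary definition above.  The function name wpa_summary is made up for illustration; it is not part of any FanGraphs tool.

```python
def wpa_summary(plays):
    """Summarize a list of (wpa, li) pairs, one pair per plate appearance.

    Returns (total WPA, WPA/LI, Clutch), with Clutch = WPA - WPA/LI.
    Assumption: WPA/LI is the sum of each play's WPA divided by its LI.
    """
    wpa = sum(w for w, _ in plays)
    # Leverage-neutral WPA: scale each play by its own leverage index.
    wpa_li = sum(w / li for w, li in plays)
    return wpa, wpa_li, wpa - wpa_li


# One plate appearance: the triple at LI = 1.19 from the example.
wpa, wpa_li, clutch = wpa_summary([(0.119, 1.19)])
print(f"WPA {wpa:+.3f}  WPA/LI {wpa_li:+.3f}  Clutch {clutch:+.3f}")
```

For that single plate appearance the sketch gives +0.119, +0.100, and +0.019, the same three numbers as in the example.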

It would seem reasonable, given that example, to conclude that a certain part of a player’s Clutch rating is to be expected, given his “real” WPA/LI performance and the leverage index at that juncture of the game.  Clutch is, therefore, a function of both the player’s innate clutch ability (however large or small that may be) and the leverage he has faced.

Now, let me try to separate those two factors of Clutch.  Seeing that Johnson’s pLI for the season is 0.91, we can “expect” his Clutch rating to fall below zero by 9% (1.00 - 0.91) of his “real” WPA/LI performance.  Why below zero?  If his pLI were exactly 1.00 in every plate appearance, his WPA/LI and WPA would be identical, therefore resulting in a zero clutch rating, regardless of his actual clutch ability.  With a below-average pLI, he would have a negative expected clutch.  In effect, because his plate appearances have come at less crucial times on average, his clutch rating has been penalized.  Taking that -9% of his WPA/LI out as “expected clutch” performance, we can then find his actual clutch performance.

So my adjustment essentially removes Expected Clutch from Clutch and calls it Clutch Performance.  To my knowledge, that is the best number we have today for actually quantifying a player’s performance in the clutch.  I am hesitant to call that an actual “ability” for reasons outlined in the quoted post.
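Put as formulas, the quoted passage works out to EC = WPA/LI x (pLI - 1) and CP = Clutch - EC.  A short Python sketch of that split follows.  The function name split_clutch is hypothetical, and only the 0.91 pLI comes from the post; the season WPA and WPA/LI totals below are invented purely to show the arithmetic.

```python
def split_clutch(wpa, wpa_li, pli):
    """Split Clutch into Expected Clutch (EC) and Clutch Performance (CP)."""
    clutch = wpa - wpa_li
    # EC: the part of Clutch implied by the leverage faced, not by the player.
    expected_clutch = wpa_li * (pli - 1.0)
    # CP: what remains once EC is removed from Clutch.
    clutch_performance = clutch - expected_clutch
    return clutch, expected_clutch, clutch_performance


# pLI = 0.91 is from the post; the WPA and WPA/LI totals are made up.
clutch, ec, cp = split_clutch(wpa=0.95, wpa_li=1.00, pli=0.91)
print(f"Clutch {clutch:+.3f}  EC {ec:+.3f}  CP {cp:+.3f}")
```

As a sanity check, feeding in the always-triples example (WPA +.119, WPA/LI +.100, LI 1.19) gives an EC of +.019 and a CP of zero: all of that "clutch" was expected from the leverage alone.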

That sums up the changes I make to FanGraphs’ WPA stats, so now let’s look at some areas in which current win probability stats are lacking.

Possible Improvements

Once you get beyond determining how much the win probability changed during a play (using the game situation and win probability tables), the main part of assigning positive and negative win probability is deciding who gets credit for what on a particular play.  This is one of the primary areas of possible improvement to WPA.


On offense, that’s fairly easy.  The batter usually deserves the credit for what happens, and the numbers we have already account for stolen bases and other similar base-advancement plays in which the batter is not involved.  They do not account for plays where a runner other than the batter takes an extra base, like going first-to-third on a single or scoring from first on a double (something that actually requires a degree of baserunning skill), nor do they account for silly baserunning errors.  Those plays currently give (undeserved) full credit or blame to the batter.


On defense, dividing up the credit is a little trickier.  Pitchers clearly deserve full credit for walks, strikeouts, and homers (unless you want to credit the catcher for calling for a pitch), but fielders are involved on all other plays.  Currently, the pitcher gets full credit for all defensive plays (even errors).  I recommend taking in this discussion on the FanGraphs forum, which includes a number of ideas for crediting fielders.

I like the idea of figuring out the baselines for plays hit to a certain “zone” on the field, or something akin to John Dewan’s The Fielding Bible method.  I also like the idea of asking fans to provide their own opinions on plays.  In the past, I’ve attempted to allocate percentages of credit myself when scoring a game using the spreadsheet I linked to in the discussion, but even that has its limitations (only one fielder allowed).
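To make the percentage idea concrete, here is a rough Python sketch of splitting one play's defensive WPA among the pitcher and any number of fielders.  Everything in it is an assumption for illustration: the function name split_defensive_wpa, the share values, and the rule that unassigned credit stays with the pitcher.  It is not how the spreadsheet or FanGraphs works.

```python
def split_defensive_wpa(wpa_change, fielder_shares):
    """Divide the defense's WPA for one play between pitcher and fielders.

    fielder_shares maps a fielder label to a fraction of the credit
    (0.0 to 1.0).  Whatever is not assigned stays with the pitcher.
    """
    assigned = sum(fielder_shares.values())
    # Small tolerance so shares like 0.1 + 0.2 + 0.7 are not rejected.
    if any(s < 0 for s in fielder_shares.values()) or assigned > 1.0 + 1e-9:
        raise ValueError("shares must be non-negative and sum to at most 1")
    credit = {name: wpa_change * share for name, share in fielder_shares.items()}
    credit["pitcher"] = wpa_change * (1.0 - assigned)
    return credit


# Hypothetical play worth +.060 to the defense, scored by hand:
# half to the shortstop, 20% to the first baseman, the rest to the pitcher.
print(split_defensive_wpa(0.060, {"SS": 0.5, "1B": 0.2}))
```

The pieces always add back up to the play's total WPA, so a scheme like this only redistributes credit within the defense.  The hard part is still choosing the shares, whether by zone baselines or by fan opinion.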

Once we understand more about pitch location and speed from the PITCHf/x system, that knowledge could also be applicable to this area.

Other Factors

Park Factors are actually a fairly large omission from the current (FanGraphs) numbers; they are not being used as far as I can tell.  These are especially important because the value of a run drives which side (hitting/pitching) gets more credit in the long run for a certain level of performance.

If win probability tables for an average run environment (approximately 5 per game) are used at an extreme pitchers’ park like San Diego’s Petco Park, pitchers will get undeserved win probability credit because it is more difficult to score a run there than at the average park.  On the flip side, hitters will not receive enough credit at such a park.  The effect is the opposite at a hitters’ park.
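A toy model shows why the run environment matters to the table itself.  The Python sketch below assumes runs per half-inning follow a Poisson distribution with a mean of one-ninth of the per-team runs per game, and that extra innings are a coin flip.  Both are simplifying assumptions of mine, as is reading "5 per game" as a per-team figure; real win probability tables are built from historical results, not from this model.  The function names are made up.

```python
import math


def poisson_pmf(k, mean):
    """P(exactly k runs in a half-inning) under the Poisson assumption."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def home_win_prob_up_one(runs_per_team_game, max_runs=30):
    """Toy win probability: home team leads by one entering the top of the 9th."""
    mean = runs_per_team_game / 9.0
    p = [poisson_pmf(k, mean) for k in range(max_runs + 1)]
    win = p[0]  # visitors are shut out, so the lead holds and the game ends
    for visitor_runs in range(1, max_runs + 1):
        deficit = visitor_runs - 1  # home deficit entering the bottom of the 9th
        p_walk_off = sum(p[deficit + 1:])  # home scores more than the deficit
        p_extras = p[deficit]  # home scores exactly the deficit; extras = coin flip
        win += p[visitor_runs] * (p_walk_off + 0.5 * p_extras)
    return win


for runs in (4.0, 5.0):
    print(f"{runs:.1f} runs per team-game: {home_win_prob_up_one(runs):.3f}")
```

Under these assumptions the one-run lead is somewhat safer in the 4-run environment than in the 5-run one.  Scoring a plate appearance at a pitchers' park with the league-average table therefore misstates how much the game situation actually changed.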

While it is not easy to make the tables dynamic to reflect each park’s run environment, this effect is large enough to call into question the reliability of some of the overall leaderboards on FanGraphs.  It’s something to keep in mind when you start comparing players like Matt Holliday and Adrian Gonzalez, who lie on opposite ends of the park factor spectrum.  It’s also worth noting that the aforementioned WPA-tracking spreadsheet can account for a user-input park-specific run environment.

Of course, criticism like this is all relative.  It’s better to have one table for the entire league based on 5 runs per game than not to track WPA at all.  I’m just throwing this out there to say that park factors should be one of the main focuses for improving upon the current plain vanilla version of WPA in the future.  There are other areas to address, such as home-field advantage and batter/pitcher handedness, that could also be lumped into this category.


Win Probability-based stats are still in the early stages of development as a useful baseball performance measure, and right now they are only as good as the play-by-play data upon which they are based.  The more detail you can pack into play-by-play game accounts, the more accurately you can divide up the credit for performance.

WPA has a long way to go, and some possible improvements to widely available data (such as adding park factors) will be easier to implement than others (such as allocating defensive WPA).  Taking those steps to improve it will be up to those of us who enjoy observing the game and all the willing researchers and coders out there today.

