Thursday, August 21, 2008

The Arbitrarian: Assigning Credit for Game Outcomes

David Sparks is the Arbitrarian. His column appears every Thursday here at Hardwood Paroxysm. This week's stattastic column elaborates on the BoxScores measure he discussed previously. Your feedback is welcome in the comments, puny human... I mean... dear friends.

Two weeks ago, we explored a statistical estimator of value, BoxScores, which estimates player contributions to team success at the season level. Aside from the time-honored complaint that it doesn't account for defense, there are at least two other shortcomings that I would like to address to improve the accuracy of this value estimator.

First is the problem of trades, and more generally, varying team success across the duration of the season. As it stands now, if player A is traded from team X to team Y in the middle of the season, his BoxScores are calculated by finding his PVC to team X's entire season's worth of MEV and multiplying that by X's entire season's worth of wins; then adding to that the same calculation for team Y to find the player's season-cumulative BoxScores figure. This is good enough, for an estimate.
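The season-level calculation for a traded player can be sketched roughly as follows. This is only an illustration: it assumes PVC is the player's share of his team's season MEV, the function name `box_scores` is made up for this example, and the numbers are hypothetical rather than taken from any real player.

```python
def box_scores(stints):
    """Sum PVC * team wins over every team a player appeared for.

    Each stint is (player_mev, team_season_mev, team_season_wins).
    Assumption: PVC is the player's share of the team's season MEV.
    """
    total = 0.0
    for player_mev, team_mev, team_wins in stints:
        pvc = player_mev / team_mev  # share of the team's season production
        total += pvc * team_wins     # credit against the team's full-season wins
    return total


# Hypothetical traded player (illustrative numbers only).
stints = [
    (500.0, 8000.0, 30),  # time with team X
    (600.0, 8000.0, 50),  # time with team Y
]
season_box_scores = box_scores(stints)
print(round(season_box_scores, 3))  # 5.625
```

Note that each stint is multiplied by the team's wins for the whole season, including games played before the player arrived or after he left.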

However, imagine if both teams are made significantly better by player A. Team X might be on pace for a very successful season up until the trade, and might begin to tank once he leaves. Team Y may have had an inauspicious start, but with the addition of player A, they might turn the season around. If this is the case, player A might be responsible for more success than his BoxScores indicate. Alternatively, similar situations can be envisioned in which much-injured players' contributions are over- or under-estimated, since BoxScores (using season-level counting statistics) cannot account for game-level success and variations thereof.

Another problem is with comparability, especially comparisons of good players on bad teams to good players on good teams. According to BoxScores, Al Jefferson was less valuable in 2007-08 than was Andris Biedrins. This could be true, but it could be that while Al Jefferson did more every game to help his team win, he could not, (essentially) alone, carry his team enough to get very many wins. The point was made by a commenter on a previous post that a team of Michael Jordan and eleven pre-schoolers would never win an NBA game, though Jordan could be incredibly productive. BoxScores, multiplying productivity by success, would assign Jordan and his eleven weaker teammates the same value: 0. This is certainly an extreme example, but it highlights a possible shortcoming in the BoxScores methodology--wins are discrete, binary events. Either a team wins a game, or it does not. Regardless of whether the score was 101-100, or 130-70, a win counts the same.

The solution, in the form of a more specific metric

The appeal of BoxScores has been (among other things) that it can be applied to every professional basketball player, because season-level box score stats are very widely available. The downside to a more specific, game-level estimator is that the increased accuracy comes at the cost of universality: Game-by-game box score statistics are only available going back to the 1986-87 season. Nevertheless, here I will develop a value estimator that works at the game level, to give us an even more accurate picture of just how much each player contributes.

For each game, we first must calculate each player's MEV. (See this post for a very detailed description of how this is done.) Then we calculate each player's Marginal Victories Produced (MVP):
MVP = Player MEV / total MEV sum for both teams
As you can see, in each game, there is a total of 1.00 MVP to be allocated. Each individual's contribution to the total production in the game is considered their Marginal Victory Production. This way, players on losing teams can be seen as producing valuable contributions--they might be valuable enough to get their team right to the cusp of victory--and this value shows up in MVP (but not in BoxScores).
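As a minimal sketch of the formula above, the per-game allocation might look like this in Python. The function name `marginal_victories` and the player MEV values are hypothetical, chosen only to show that the shares for everyone in the game add up to 1.00.

```python
def marginal_victories(mev_by_player):
    """Return each player's MVP: his MEV divided by the game's total MEV.

    mev_by_player maps every player in the game (both teams) to his MEV.
    """
    total_mev = sum(mev_by_player.values())
    if total_mev <= 0:
        raise ValueError("total MEV for the game must be positive")
    return {player: mev / total_mev for player, mev in mev_by_player.items()}


# Hypothetical game with three players per side (illustrative numbers only).
game = {
    "Home A": 30.0, "Home B": 15.0, "Home C": 10.0,
    "Away A": 25.0, "Away B": 12.0, "Away C": 8.0,
}
mvp = marginal_victories(game)
print(round(sum(mvp.values()), 6))  # the whole game is allocated: 1.0
```

Because the denominator covers both teams, players on the losing side still receive a positive share.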

Here is an example of MVP calculated for a game on April 11, 2008, between the LA Lakers and New Orleans Hornets:

The Lakers won, 107-104. Total MEV for the Lakers was 110.7, and for the Hornets, it was 106.0, so the Lakers' total MVP allocation was 0.511, versus the Hornets' 0.489. If we were focusing on wins and losses alone, the Lakers would get 100% of the credit for this game. Arguably, though, the Hornets produced something of value here--they got within four points of winning, and thus MVP is a much more accurate estimator of value.

One interesting way to think of MVP numbers is to note that a team needs a total of at least 0.5 MVP to win a game.¹ Thus, in the game detailed above, Bryant got his team almost a third of the way to the win (0.165 MVP), Paul/Chandler/Stojakovic together got their team 2/3 of the way to a win (0.335 MVP), etc.
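The arithmetic for this game can be checked with a few lines of Python. The team MEV totals and the MVP figures come from the text above; the helper name `progress_to_win` is just a label for the "how far toward 0.5" shorthand.

```python
WIN_THRESHOLD = 0.5  # a team needs at least half of the game's MVP to win

lakers_mev = 110.7
hornets_mev = 106.0
total_mev = lakers_mev + hornets_mev

lakers_share = lakers_mev / total_mev
hornets_share = hornets_mev / total_mev
print(f"{lakers_share:.3f} {hornets_share:.3f}")  # 0.511 0.489


def progress_to_win(mvp):
    """Fraction of the 0.5 MVP needed for a win that this MVP covers."""
    return mvp / WIN_THRESHOLD


bryant_progress = progress_to_win(0.165)         # almost a third
hornets_trio_progress = progress_to_win(0.335)   # about two thirds
print(round(bryant_progress, 2), round(hornets_trio_progress, 2))  # 0.33 0.67
```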

MVP value at the season level

To estimate a player's value for the duration of a season or career, we need only sum their game-level MVP. One nice property of MVP is that the sum total of MVP is equal to the total number of games played--the "value of each game" is divided among each participant, so that all games are accounted for in their entirety. Further, team season-total MVP can be translated to wins and losses by a method similar to the Pythagorean win projection (more on this sometime in the future). How many marginal victories did your favorite players produce? See below...
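Here is a rough sketch of the season-level sum, assuming each game is stored as a mapping from player to MEV. The data layout, the function names, and the three tiny games are all hypothetical; the point is only that the season totals add up to the number of games played.

```python
from collections import defaultdict


def game_mvp(mev_by_player):
    """Split one game's worth of credit (1.00) in proportion to MEV."""
    total_mev = sum(mev_by_player.values())
    return {player: mev / total_mev for player, mev in mev_by_player.items()}


def season_mvp(games):
    """Sum each player's game-level MVP over every game he appeared in."""
    totals = defaultdict(float)
    for game in games:
        for player, share in game_mvp(game).items():
            totals[player] += share
    return dict(totals)


# Three hypothetical games; player B misses the third one.
games = [
    {"A": 30.0, "B": 20.0, "C": 25.0, "D": 25.0},
    {"A": 40.0, "B": 10.0, "C": 35.0, "D": 15.0},
    {"A": 10.0, "C": 20.0, "D": 10.0},
]
totals = season_mvp(games)
print(round(sum(totals.values()), 6))  # equals the number of games: 3.0
```

A player who sits out a game simply accumulates nothing for it, which is how trades and injuries are handled without any special cases.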

The first tab ("07-08 MVP") lists the total number of MVP for each player last season. I would argue that this is a valid way of identifying the league Most Valuable Player. Just as in the BoxScores rankings, Chris Paul comes out on top, followed by LeBron James and Kobe Bryant. However, the differences in the two estimators can be instructive. According to BoxScores, Al Jefferson is the 74th most valuable player--by MVP, Jefferson is 11th, just behind the player for whom he was traded, Kevin Garnett. Dwyane Wade moves from 188th most valuable (BXS) to 67th (MVP) for his injury-shortened season. Good players on bad teams are not "punished" for having low-quality teammates. Rather, everyone is rewarded based on their contributions to competitiveness, even if that competitiveness doesn't result in winning every time.

The "86-08 MVP Seasons" tab lists just that--the most valuable seasons from my limited dataset according to MVP. Unsurprisingly, Jordan dominates this list, along with other modern luminaries. Keep in mind that the MVP number is not a number of wins--it's "Marginal Victories"--but also keep in mind that teams need only 0.5 total MVP in a game to win it. One way, thus, to look at season-total MVP numbers is to say that, for example, Jordan in 87-88 contributed enough MVP to help his team win the equivalent of about 27 (13.52 / 0.5) games. Bear in mind, though, that this is just an interesting shorthand, because summing this figure for each team will not come close to matching that team's win total. If Jordan had accumulated 0.5 MVP in each of 27 games, and sat out the rest of the season, his team would have won each of those games, and he'd be credited with 13.5 MVP. However, Jordan played for a whole season, and accumulated MVP in pieces (never as many as 0.5 at a time--no player won any game "single-handedly"), so the 27 win estimate is interesting, but not literal.
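The shorthand, and the reason it cannot be summed across a roster, can be illustrated with a short sketch. The 13.52 figure is from the text; the 82-game team that averages exactly half of the MEV in its games is a hypothetical chosen to make the point.

```python
WIN_THRESHOLD = 0.5  # MVP a team needs in a single game to win it


def equivalent_wins(season_mvp):
    """Shorthand only: season MVP expressed in units of 0.5-MVP wins."""
    return season_mvp / WIN_THRESHOLD


jordan_equivalent = equivalent_wins(13.52)
print(round(jordan_equivalent, 2))  # 27.04

# Hypothetical team that averages exactly 0.5 MVP per game over 82 games.
team_season_mvp = 0.5 * 82
team_equivalent = equivalent_wins(team_season_mvp)
print(team_equivalent)  # 82.0
```

Summing the shorthand over such a roster gives 82 "equivalent wins" in an 82-game season, far more than a team that evenly matched with its opponents could expect to win, which is why the figure is not a literal win count.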

The final tab, "86-08 MVP Careers," lists the most valuable players during the period covered by the data set. Thus, many of Larry Bird's and Magic Johnson's best years are excluded, as are the first years of Jordan, Olajuwon, Stockton, etc. since they came prior to the 86-87 season. This is important to keep in mind when viewing game-average MVP numbers. Larry Bird falls relatively low on the list in no small part because we're comparing his later years to the primes of LeBron James, Chris Paul, and Dwyane Wade.

Bearing this in mind, the list is still highly instructive. Since it takes a total of 0.5 MVP to win the game, players from Jordan down to Garnett are generating at least a quarter of the value their teams need to win (0.125 / 0.5 = 0.25). Players with MVP over 0.1 are doing more than a fifth of the work needed to get a win, and since no player plays over a fifth of his team's minutes, these are obviously some of the most valuable players--overrepresented in value relative to playing time. The rankings on this list are unsurprising, and read like a roll-call of the best players of the last 20 years. These are the guys around whom you'd want to build a team.

Greatest single-game performances

What's the point of having a game-by-game data set if you don't look at game-by-game value? Below is a list of the 100 most valuable performances of the 07-08 season, and the 500 most valuable performances from 1986-2008. These are the herculean efforts from which legends are made. Here we can see how MVP automatically adjusts for pace, and assigns value above and beyond MEV's measure of productivity. Since MVP is a percent of total production, it makes no difference how fast the game is played, how long the game is, or how much is produced in total: contributions to winning are measured against the other players in the game. Also, since a player's MVP increases as opponents' MEV decreases, the more a player contributes defensively (i.e. outside of his box score stats, but visible in the other team's production), the higher his MVP will be. The margin column, incidentally, indicates the final point spread in favor of a given player's team. If it is negative, that player's team lost the game.
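Both properties, the pace adjustment and the indirect credit for defense, fall straight out of the formula. The sketch below uses made-up MEV values and a hypothetical helper, `mvp_share`, to show them.

```python
def mvp_share(player_mev, team_mev, opponent_mev):
    """A player's MVP: his MEV over the combined MEV of both teams."""
    return player_mev / (team_mev + opponent_mev)


# Baseline hypothetical game: the player produces 30 of his team's 100 MEV.
base = mvp_share(30.0, 100.0, 100.0)

# Same game played 20% faster: every MEV figure scales up together.
fast = mvp_share(36.0, 120.0, 120.0)

# Same box score line, but the opponent is held to 80 MEV.
stingy = mvp_share(30.0, 100.0, 80.0)

print(round(base, 4), round(fast, 4), round(stingy, 4))
```

Scaling every number by the same factor leaves the share unchanged, while holding the opponent down raises it even though the player's own line is identical.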

Both lists are topped by players you might expect to be there, but interspersed are some surprises: John Salmons? Willie Burton? It goes to show that on any night, any player can be a hero, and that a single sample can be very misleading. Nevertheless, there is a lot of data to be gleaned here. Note that the best games played see players generating over a third of the total production, which gets their team 2/3 of the way to a win. Not even the greatest can win completely on their own.

I'd like to digress here briefly, on the subject of Kobe's 81 point game. Note that he produced about 1/3 of the total valuable contributions in that game, but look at his MEV: 68.96. That means that by missing 18 field goals, and doing very little other than shooting, he cost his team about 12 points in the final margin. The Lakers still won by 18 points, but to me the 81 point achievement is somewhat underwhelming, because of what it took to get there. Edit: Apparently, you put in one little paragraph about Kobe Bryant, and it makes your whole post about Kobe Bryant... All I'm trying to say here is that Kobe, by missing 18 shots (and turning the ball over, while not doing a lot of rebounding or box score defending) cost his team a few points. Most players couldn't dream of generating 69 points, and this is an impressive feat, but also, most other players don't even take 18 shots (doing so would put them in the 94th percentile of all games in the data set). All I'm saying is that it might be somewhat less impressive than some of the others on the list, like, for example, Jordan's incredible performance against Cleveland.

The future

In the future, I plan on developing an approximation of MVP based on season-level statistics, for those seasons in which game-by-game data is unavailable. Next week, I am planning on applying some of the methods discussed here to the performance of the US Men's Olympic basketball team. Today, I have three requests for you: First, please leave insights or any questions you might have in this post's comments. Second, please take a moment to fill out the survey below with your thoughts, ideas, and criticisms. Third, if you found this post interesting, click the little "Buzz up!" button below, to express your approval.

¹ Game-total MEV margins correlate with game-level point margins at 0.947, and looking at MEV winners correctly classifies actual point winners 92% of the time.
