Thursday, July 24, 2008

The Arbitrarian: A Statistical Primer

The Arbitrarian is the resident smart kid of HP. He sits in front, answers all the questions, always shows his work on the math homework, and volunteers to clean the chalkboard. We all make fun of him and then in thirty years he's making four times what we do while we're slinging shakes at the local Chick-fil-A. His Arbitrarian Column runs every Thursday here at Hardwood Paroxysm. You can read more of his work at his own blog. This week he begins his column with a discussion of relevant basketball stats. -MM

In many academic articles, the author often begins by citing previous work that set the stage for what he or she is writing about. This is often called a "literature review," and it discusses some of the strengths and weaknesses of past theoretical and empirical work, often with an eye toward explaining the need for their particular contribution. This post will serve as an introduction to some of the so-called "advanced" statistics, and I'm counting it as my literature review. Please forgive me if much of this is exceedingly basic or familiar to you--I hope that this single post can help most readers get "on the same page," with respect to some more recent developments in statistical analysis.

Most basketball fans are familiar with what I call the "counting" statistics, which are just simple sums of the number of times each player or team was credited with something tracked in the box score. Minutes, points, field goal attempts, personal fouls, etc. all fall under this category. Another set of statistics which almost everyone uses are what I'll call "simple ratio" statistics, wherein one counting statistic is divided by another. In baseball, people often cite batting averages; in basketball, we often see points per game, free throw percentages, even assist-to-turnover ratios. By and large, this level of sophistication is sufficient for most fans--scoring average is probably the single most highly-regarded estimator of player quality among the vast majority of fans, and to be sure, PPG correlates positively with productivity.

Somewhat less commonly seen is the use of per-minute statistics. Recognizing that some players, by virtue of playing longer minutes, have more opportunity to score, collect rebounds, etc., it is sometimes useful to compare statistical production at the minute-level, which allows "fairer" (in some sense) comparisons among, for example, bench players and starters, or point guards and centers (who typically play fewer minutes). Adjustments are also often made by position, with the idea that, for instance, shooting guards as a group aren't in as good a position to rebound as are power forwards, and so a shooting guard's rebounding prowess should be measured against others playing that position.
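To make the per-minute idea concrete, here is a minimal sketch. The 36-minute scaling is a common convention that I am assuming for illustration, and both players and their season totals are invented, not real data.

```python
def per_36(total, minutes):
    """Scale a counting stat to a per-36-minute rate (36 is an assumed convention)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 36.0 * total / minutes


# Hypothetical season totals, invented for illustration only.
starter = {"points": 1640, "minutes": 2952}  # 82 games at 36 minutes per game
reserve = {"points": 820, "minutes": 1230}   # 82 games at 15 minutes per game

starter_rate = per_36(starter["points"], starter["minutes"])
reserve_rate = per_36(reserve["points"], reserve["minutes"])

print(f"starter: {starter_rate:.1f} points per 36 minutes")
print(f"reserve: {reserve_rate:.1f} points per 36 minutes")
```

In this made-up case the reserve averages half the starter's points per game but scores at a higher per-minute rate. Whether he would keep that rate over a starter's minutes is a separate question.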

Volumes could be written detailing each permutation and variety of statistic, but I'll include only one more specific example. At the team level, team success is closely related to the scoring differential--the average difference between a team's and its opponents' points scored. Also, teams that score many points per game are not necessarily the best offenses, nor are teams that give up many points per game necessarily the worst defenses. Due to the differences in the pace at which various teams play, own and opponent scoring totals are not good indicators of the quality of an offense or defense. Rather, by estimating the number of offensive and defensive possessions each team sees in a game, analysts often look at "efficiency." Offensive efficiency divides points scored by possessions (higher is better), while defensive efficiency divides opponent points scored by opponent possessions (lower is better). It turns out that this is much more useful than simple points for or points against averages on their own.
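As a rough sketch of the efficiency calculation: the possession estimate below (field goal attempts minus offensive rebounds plus turnovers plus 0.44 times free throw attempts) is one commonly used approximation that I am assuming here, not the only one in circulation. Scaling to points per 100 possessions is likewise a convention, and the two box-score lines are invented.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.44):
    """Approximate possessions from box-score totals (assumed formula)."""
    return fga - orb + tov + ft_weight * fta


def efficiency(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions


# Hypothetical single-game box-score lines, invented for illustration.
fast_team = {"points": 112, "fga": 95, "orb": 12, "tov": 16, "fta": 25}
slow_team = {"points": 98, "fga": 78, "orb": 10, "tov": 12, "fta": 25}

fast_poss = estimate_possessions(
    fast_team["fga"], fast_team["orb"], fast_team["tov"], fast_team["fta"]
)
slow_poss = estimate_possessions(
    slow_team["fga"], slow_team["orb"], slow_team["tov"], slow_team["fta"]
)

fast_eff = efficiency(fast_team["points"], fast_poss)
slow_eff = efficiency(slow_team["points"], slow_poss)

print(f"fast team: {fast_poss:.0f} possessions, {fast_eff:.1f} points per 100")
print(f"slow team: {slow_poss:.0f} possessions, {slow_eff:.1f} points per 100")
```

With these made-up numbers the slower team scores fewer points but does more with each possession, which is the distinction that raw scoring totals hide. The same function gives defensive efficiency when fed the opponent's points and possessions.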

Standing on their shoulders

I originally planned on briefly listing some of the better-known basketball analysts, along with a little bit of background and a critique of their methodology. Fortunately for all of us, most of that task has been very competently accomplished already, at's "Analytics 101." I would highly recommend perusing that extensive collection of links, to familiarize yourself with some of the work that is being done, as well as some of the very capable individuals involved. Also, I would like to direct you to a post at the APBRmetrics Forum for the link to and discussion of a comparison of many of the more widely-used "advanced" metrics. In fact, while at the APBRmetrics forum, take a look around... much of the debate going on there is on the very cutting edge, and it won't take long to get a sense of the disputes that still rage: how can we measure defense? Does efficiency decrease with use? Are there diminishing marginal returns to player productivity? Etc, etc. The individuals posting in that forum constitute a large portion of the most capable and intelligent basketball analysts working today.

Since the basics have been so thoroughly covered by others, I will briefly consider several of the most widely-used statistics, and offer my opinion on their strengths and weaknesses. It is important to note that I feel that the use of more than one approach (keeping in mind the strengths and weaknesses of each approach) permits a much more well-rounded and robust analysis of any problem. Further, my opinion is not authoritative on any of the material covered here, and I welcome a discussion on the relative merits of each methodology.

Plus/Minus (Raw, Adjusted, or Statistical)
Background: Rosenbaum (see also), Lewin (also), Witus, Ilardi

Pros: Arguably the most computationally-intensive of the metrics I will discuss here, the plus/minus statistic, which is now being officially tracked by the NBA, has a lot to recommend it. One of the most useful aspects of this statistic is that it accounts for defense better than any metric based solely on box score statistics. The other nice thing is that well-reasoned and well-applied statistical methods have been employed in converting between "raw" plus/minus and "adjusted" plus/minus, in order to control for the quality of a player's teammates and opposition. Further, plus/minus figures are often counter-intuitive. This by itself is not always a good thing, but it may indicate that this particular measure tells us things about the game that other, more conventional methods keep hidden. As this listing indicates, many of the players we might expect to top the list are at or near the top, but the orderings are sometimes somewhat surprising (see "cons" below, however). There is also the added value of having separate offensive and defensive ratings, to identify those players who are especially undervalued by offensively-oriented box score stats. As it stands currently, Plus/Minus is possibly the best single-number estimator of a player's influence on the game, at least for contemporary players.

Cons: Personally, I can offer very little to recommend against Plus/Minus, especially in its adjusted form. There are only two criticisms I can muster. First, you can see from the list linked above that each estimate is accompanied by an error term. Unless I am mistaken, this means that a player with an Adjusted +/- of 14 and an error term of 11 is 95% likely to have an actual +/- value between 3 and 25 (incidentally, it also means that there is a 1 in 20 chance that the actual value is outside those bounds, but this is conventionally disregarded). This example (Dwight Howard in 07-08) is a particularly egregious one, but it exemplifies the problems of relying exclusively on +/-. It means that, within the range of their error terms, it is difficult to identify the correct ordering of any set of players, much less their exact value in terms of points. It is, nevertheless, very instructive to review players' +/- ratings--certain unheralded players, possibly undervalued by box score methodologies, often show up at the top of these lists, notably the Pistons' Amir Johnson and Houston's Chuck Hayes, who led the league in defensive +/- rating last season.
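The arithmetic behind that range can be sketched in a few lines. I do not know how the linked list defines its error term, so the code shows two readings: the one used above, in which the error term is already the half-width of a 95% interval, and an alternative in which it is a single standard error, so that a normal-approximation 95% interval is about 1.96 times wider. The second player is hypothetical.

```python
def interval(estimate, error_term, multiplier=1.0):
    """Return (low, high) for estimate +/- multiplier * error_term."""
    half_width = multiplier * error_term
    return estimate - half_width, estimate + half_width


def overlaps(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Reading 1: the error term is the 95% half-width (the reading used in the text).
as_half_width = interval(14, 11)

# Reading 2 (assumption): the error term is one standard error.
as_std_error = interval(14, 11, multiplier=1.96)

# Hypothetical second player, invented to illustrate the ordering problem.
other_player = interval(6, 5)

print("reading 1:", as_half_width)
print("reading 2:", as_std_error)
print("intervals overlap:", overlaps(as_half_width, other_player))
```

Under either reading, the two intervals overlap, so the data alone cannot say with much confidence which of the two players ranks higher.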

My other quibble is more pragmatic than theoretical: at this point, the public only has access to a few seasons' worth of plus/minus data. While this has very little to do with the validity of the statistic as an estimator of value, it makes historical comparison essentially impossible. Sadly, to the extent that one is interested in comparing players across eras, plus/minus becomes a less functional tool.

Player Efficiency Rating (PER)
Background: Hollinger (at ESPN), Wikipedia

Pros: My understanding of PER is that Hollinger developed the weightings he employs with a theoretical (rather than statistically-derived) basis (please correct me if I am wrong). This doesn't necessarily make it better than any other metric, but it is at least a distinctive and thoughtful approach to the problem of value assessment. Also, PER adjusts for pace, unlike many of the more conventional statistics with which we are familiar, and this helps control for the advantage held by players on "run and gun" teams, due to the greater number of opportunities they have to accumulate counting statistics. Another (arguable) virtue is that PER is assessed on a per-minute basis, which accounts for the disparity in minutes played across individual players and position types.

Cons: Hollinger himself admits that PER, as a box score-based statistic, fails to account for the type of defense that, while it may not produce a block or a steal, still prevents scoring. Notably, players like Bruce Bowen and Shane Battier, of whom it is often said, "his contribution didn't necessarily show up in the box score," may be undervalued by PER and similar stats.

Another minor complaint I might lodge is that PER, with its pace adjustment, per-minute rating, adjustments for team assists, and league rebounding correction, becomes more of a "rating" than a metric. By this, I mean only that while it is straightforward to compare one player's PER to that of another, the statistic is somewhat decontextualized. What unit is PER measured in? How does it relate to scoring, or scoring prevention, or winning, or is there a direct relationship? Etc.

Finally, the decision to make PER per-minute, rather than per-game or per-season, while it does enable comparisons across players who play different minutes, comes with certain assumptions. Namely, when comparing "low-usage" players against those who play a substantial number of minutes per game, we must assume that efficiency does not vary with usage. In other words, if low-minutes player A appears to be more efficient (according to PER) than high-minutes player B, we must qualify such a comparison by saying explicitly "Player A, in the time he plays, is more efficient than is player B, in the time he plays." We cannot say, without making additional assumptions about usage versus efficiency, that player A would be as efficient as B if he were playing the same number of minutes as B.

Wins Produced
Background: Berri, calculation

Pros: I think that Berri is essentially correct in his finding that scoring may be overvalued by "laypersons" -- it is my subjective observation that scoring numbers are the most often cited by casual fans and media outlets alike in their discussion of the contributions of individual players and their relative value/quality. I have written up a very simplistic game-theoretic model in which, assuming that players want higher salaries (which I think is fairly easy to stipulate), and assuming that high-scoring players achieve higher salaries (which would need some argument and evidence), players have an incentive to eschew "team play" in favor of pursuing a high number of shot attempts. This theoretical argument would support some of the claims Berri makes.

Additionally, I agree with Berri's use of a regression model to estimate coefficients for the weighting of box score statistics--I have made the same choice, as I will discuss next week. This seems, at least a priori, more methodologically sound than guesstimating values based on a theoretical argument, as Hollinger does (again, please correct me if I am wrong about this last statement).

Cons: There are almost too many counterarguments to the WP methodology to list here. In fact, since it has already been done so well, I will refer you to this topic at the APBRmetrics forum, where a lot of smart, analytically capable people tear into Berri's work.

To this, I would only add a few specific criticisms: First, Berri's model weighs rebounding extremely strongly--many would say he overweights the value of a rebound--and this leads to findings such as Dennis Rodman being more valuable (per-minute) than Michael Jordan. (Edit: Apparently, Berri modified his methods for the publication of WoW, and at such time, he identified Jordan as the better of the two. See this post.) Findings such as these have been roundly criticized by essentially everyone, but I am willing to concede at least the theoretical possibility that they are true. My main problem is that the author's typical response to such criticism has been to refer to the econometric work performed in his book and various articles, claiming objectivity--in other words, Berri is just the messenger, the numbers themselves reveal the actual truth, and the actual truth indicates Rodman > Jordan.

This, to me, appears to be a cop-out. (Be advised that I have not read Berri's book or articles, seen his regression output, or attempted to replicate his results--as such, my critique should be taken with a large grain of salt.) Others have suggested that Berri's work fails the "smell test," as in, its results are so illegitimate as to seem suspicious. The term I would use is that Berri's model lacks "face validity;" it does not appear to measure what it purports to measure.

Further, Berri's deflection of responsibility to the regression seems somewhat delusory. It is well-known to almost anyone who has performed such analysis that regression models can be fit to support almost any conclusion. I could show you, for instance, a case in which the mere inclusion or exclusion of an intercept term in a model changes the coefficients of the other predictors from insignificant to significant. It is not my intention to "pick a fight" with Berri's analysis, because his work appears very thorough and reasonably well thought-out, and I have not read it. However, passing the buck of responsibility for his results to the regression itself seems somewhat disingenuous.
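Here is a toy version of the intercept point, offered as a sketch and not as a reconstruction of anyone's actual model. The five data points are invented: y hovers around 10 regardless of x. The code fits a one-predictor least-squares line twice, with and without an intercept, and reports the slope and its t-statistic using the textbook formulas for simple regression.

```python
import math

# Invented data: y stays near 10 no matter what x is.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.1, 9.9, 10.0, 10.2, 9.8]
n = len(x)


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x; returns (slope, t-statistic)."""
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)  # two parameters estimated
    return slope, slope / std_err


def fit_without_intercept(x, y):
    """Least squares for y = b*x, forced through the origin."""
    sum_xx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 1) / sum_xx)  # one parameter estimated
    return slope, slope / std_err


slope_with, t_with = fit_with_intercept(x, y)
slope_without, t_without = fit_without_intercept(x, y)

print(f"with intercept:    slope = {slope_with:.3f}, t = {t_with:.2f}")
print(f"without intercept: slope = {slope_without:.3f}, t = {t_without:.2f}")
```

With the intercept, the slope is close to zero and its t-statistic is small. Without it, the line is forced through the origin, so the slope has to absorb the level of y and comes out large with a t-statistic above the usual two-sided 5% cutoff for four degrees of freedom (about 2.78). Same data, opposite conclusion about whether x "matters."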


I am sure there are substantial swaths of existing analytic literature that I have not covered here, as well as numerous names which I have not mentioned. Their exclusion was not intentional, except as I have only limited time to assess and discuss an essentially infinite body of work. I have elected instead to examine some of the more widely-known and used methodologies, with an eye toward familiarizing the "uninitiated" with some of the basics. I would also add the caveat that a wise person takes into account multiple sources of information and various perspectives in making any assessment or decision, and the truly wise see folly in trying to encapsulate the entirety of one player's value in a single number.

Next week, I plan on introducing a novel value metric, which has some of the strengths and some of the weaknesses embodied in each of the above-discussed measures, and blithely commits the folly of distilling value into a single number. I hope I have made some progress here toward justifying the creation of yet another statistic, and if not, I hope you will indulge me, as that's exactly what I've done.
