so don’t make these stupid analytical comparisons! The same goes for Bill Russell. People who say Russell is better than Chamberlain are all crazy! He is just another Dikembe Mutombo on a championship team. If Wilt had been on that team, they would’ve won the same number of championships. Russell and Rodman are just one-dimensional players. If Michael Jordan hadn’t retired for 2 years, he would’ve won 8 championships in a row. That is why he is the greatest player of all time. ]]>

If we’re looking for the bizarre loophole, maybe looking at other examples of large statistical leaps from year to year+1 would yield worthwhile insights.

I also wonder if Rodman can recall any conscious change in his approach or strategy for games during those years. Worth asking him before he gets locked away in North Korea? :)

]]>The non-random selection for that “quick and dirty” model was entirely a function of me having to cut and paste data from basketball-reference at the time. I agree that it’s bad practice and would never do it these days (and you’ll see, later in the series, I move toward more “entire history of the league” datasets).

However, the margin for error for this analysis is so huge that I think the particular model is incidental and is provided mainly as an example of how estimating old RB%’s isn’t nearly as difficult as people make it out to be. In reality, any reasonable method you use to compare Rodman with Chamberlain and Russell will tell you the same thing: Rodman grabbed a much higher % of his available rebounds, and it’s not close.

]]>But I had an observation/criticism/question. A couple of times so far, I notice that you will take non-random sub-samples of your data at the player level and then use that sub-sample for further analysis. In this post, it’s when you created the PV statistic based on five players — “Dennis Rodman, Dwight Howard, Tim Duncan, David Robinson, and Hakeem Olajuwon” to create a 60 season sub-sample. Why these? You write, “I picked these names to represent top-level rebounders in a variety of different situations (and though these are somewhat arbitrary, this analysis doesn’t require a large sample).”

My concern is that you are introducing spurious correlation between rebounding, time on the court, and overall shot accuracy (team share rebounding) because of the “conditioning on a collider” problem. (You’ve no doubt seen this nice explanation of it by Gabriel Rossman but if not, it’s great and I use it now in my causal inference class every semester).

Here’s my graphical representation of the problem.

1. f(M,A,R) –> S

2. g(M,A) –> R

Where “R” is the player’s rebounding rate, “M” is the number of minutes on the court in a game, “A” is the overall accuracy of players taking shots, “f” is a function that determines “top-level rebounders in a variety of different situations”, and “g” relates minutes and accuracy to rebounding itself. The “S” variable is the sample of five players, or the “top-level rebounders in a variety of different situations”.

So notice: M, A and R are jointly correlated, and that needs to be adjusted for in order to get a pure measure of R that accounts for those omitted variables. But you are conditioning on top-R individuals, and therefore on low A (inaccurate shooting) and high M (minutes in the game) values.

I’m not exactly sure what to expect from this, except that if you draw out these two equations as a directed acyclic graph (DAG), you’ll see that when you condition on the sub-sample, you will have opened a backdoor path between R, A and M, which is equivalent to introducing spurious correlations across these variables. I’m also not sure we can say that, in doing so, you are preserving the ordering that you’re trying to model. Your PV variable is the ratio of (Player rebounds per team rebounds) and (Player minutes per team minutes).
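To make the PV ratio concrete, here is a minimal Python sketch of the definition as I read it. The function name `pv` and the box-score numbers are hypothetical, chosen only for illustration; they are not taken from the post’s data.

```python
def pv(player_reb, team_reb, player_min, team_min):
    """Ratio of a player's share of team rebounds to his share of team minutes.

    All four arguments are hypothetical box-score totals for the same games.
    """
    rebound_share = player_reb / team_reb
    minute_share = player_min / team_min
    return rebound_share / minute_share


# Made-up example: 15 of the team's 45 rebounds in 36 of 240 team minutes.
example = pv(15, 45, 36, 240)
print(f"PV = {example:.3f}")
```

A value above 1 would mean the player’s share of the team’s rebounds is larger than his share of the team’s minutes.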

Maybe you’re fine, but I worry that this non-random selection is still not very transparent and at least raises some red flags. I wonder if you shouldn’t just do this for the entire sample and not a sub-sample, particularly not a sub-sample that conditions on the dependent variable of interest (R in this case).
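A toy simulation may show the kind of thing I’m worried about. This is only a sketch under made-up assumptions, not a model of real NBA data: M and A are drawn independently, R is assumed to rise with M and fall with A (the coefficients and noise are invented), and the “sample” is simply the top 5% of players by R. The names `simulate` and `pearson` are my own.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, top_share=0.05, seed=0):
    """Return corr(M, A) in the full population and in the top-R sub-sample."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        m = rng.gauss(0, 1)               # minutes (standardized, hypothetical)
        a = rng.gauss(0, 1)               # shot accuracy, independent of m
        r = m - a + rng.gauss(0, 1)       # assumed g(M, A): more M, less A -> more R
        players.append((m, a, r))

    # Non-random selection: keep only the top rebounders (conditioning on R).
    players.sort(key=lambda p: p[2], reverse=True)
    selected = players[: int(n * top_share)]

    corr_full = pearson([p[0] for p in players], [p[1] for p in players])
    corr_selected = pearson([p[0] for p in selected], [p[1] for p in selected])
    return corr_full, corr_selected


corr_full, corr_selected = simulate()
print(f"corr(M, A), everyone:       {corr_full:+.3f}")
print(f"corr(M, A), top rebounders: {corr_selected:+.3f}")
```

In the full population M and A are uncorrelated by construction, but inside the top-R group they come out clearly correlated, purely because of how the group was picked. Whether the effect is big enough to matter for five real players is a separate question.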

]]>I know this post is old, but in case you ever come back and look at this, I had a question about your analysis.

The trend line definitely makes sense. The more time you have in the possession, the choosier you can be. But aren’t there other plausible explanations? For example, we might expect that a team that is poor offensively and/or facing a strong defensive team would have fewer points per first shot than average. That same team would also likely have fewer “good” shot opportunities per possession. So wouldn’t it make sense that those teams would find themselves stuck shooting late in the clock more often? Conversely, a good offensive team may be able to make it happen in the last few seconds of a possession, but, as a result of being good offensively, may just find itself in that position much less often.
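Here is a rough Python sketch of what I mean, with entirely invented numbers. Each hypothetical team has a fixed make probability, weaker teams are assumed to end up shooting late more often, and within a team the timing of the shot has no effect at all on whether it goes in. The function name and every parameter are assumptions for illustration.

```python
import random


def simulate_possessions(n_teams=30, poss_per_team=2000, seed=1):
    """Pooled points per shot for early-clock vs late-clock shots."""
    rng = random.Random(seed)
    early, late = [], []
    for _ in range(n_teams):
        quality = rng.uniform(0.35, 0.55)   # hypothetical make probability
        p_late = 1.5 - 2.5 * quality        # assumed: weaker offense -> more late shots
        for _ in range(poss_per_team):
            is_late = rng.random() < p_late
            # Within a team, timing does NOT change the chance of scoring.
            points = 2 if rng.random() < quality else 0
            (late if is_late else early).append(points)
    return sum(early) / len(early), sum(late) / len(late)


early_mean, late_mean = simulate_possessions()
print(f"points per shot, early clock: {early_mean:.3f}")
print(f"points per shot, late clock:  {late_mean:.3f}")
```

Pooled across teams, late-clock shots look worse even though the clock does nothing in this toy setup; the gap comes only from which teams take those shots. That doesn’t mean the trend in the post is spurious, just that team quality would need to be controlled for to tell the two stories apart.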

]]>Really liked the Randy Moss thread. The point you made about Daunte Culpepper made me laugh because of all the “homer” love for Matthew Stafford hereabouts. Too bad Calvin has not had more q-backs.

If you get a chance, though, could you look at Stafford’s stats? He has a reputation here based on his comebacks. That, of course, ignores the obvious fact that he and the Lions have been losing through three quarters.

All those yards do not make a winner outside of a fantasy league.

Typical D sports fan… rant, rant, rant. ]]>

He’s long been a great rebounder and an under-the-rim scrapper, but he put up a truly Rodman-esque rebounding season (actually better than nearly every Rodman season, albeit next to Brook Lopez), and it seems he’s kicked into this high gear at the same age Rodman did, including the season when he didn’t get very many minutes. Obviously, he was never the defender Rodman was, but how does his rebounding impact stack up? And what exactly is missing from his game that he’s not Rodman circa 1996?

]]>Anyway, not trying to argue the numbers here, just a thought.

]]>