I described the methods I used to estimate true talent distributions in part 2. In short, I determined the maximum likelihood parameters for true talent, assuming a Gaussian distribution, then generated a simulated sample based on that distribution and compared it to the data.
I looked at 4 different quantities: OBP, wOBA, FIP-, and ERA-. All data was downloaded from the FanGraphs leaderboards, for the years 1969-2013.
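To make the fitting step concrete, here is a minimal sketch of a maximum likelihood fit of this kind. It is not necessarily the likelihood from part 2: it assumes each observed OBP is approximately normal around the population mean, with variance equal to the true talent variance plus binomial noise of mu(1 - mu)/PA. The names `log_likelihood` and `grid_mle`, the grid search, and the three toy observations are all made up for illustration.

```python
import math

def log_likelihood(mu, sigma, observations):
    """Log-likelihood of (observed OBP, PA) pairs under a Gaussian talent model.

    Assumption: observed OBP ~ Normal(mu, sigma^2 + mu * (1 - mu) / PA),
    i.e. talent spread plus a normal approximation to binomial noise.
    """
    total = 0.0
    for obp, pa in observations:
        var = sigma ** 2 + mu * (1.0 - mu) / pa
        total += -0.5 * math.log(2.0 * math.pi * var) - (obp - mu) ** 2 / (2.0 * var)
    return total

def grid_mle(observations, mus, sigmas):
    """Return the (mu, sigma) pair on the grid with the highest log-likelihood."""
    candidates = ((m, s) for m in mus for s in sigmas)
    return max(candidates, key=lambda p: log_likelihood(p[0], p[1], observations))

# Toy data for illustration only, not real player seasons.
toy_obs = [(0.30, 400), (0.33, 400), (0.36, 400)]
best_mu, best_sigma = grid_mle(toy_obs, [0.30, 0.33, 0.36], [0.01, 0.03, 0.05])
```

Because the noise term shrinks with PA, players with few plate appearances are automatically given less weight in a fit like this.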
For OBP in the time frame 1969-1992, and only considering players with at least 150 PA, the maximum likelihood values for the mean μ and standard deviation σ of the true talent distribution were 0.329 and 0.0308, respectively. Since the number of plate appearances you use to regress to the mean follows from μ and σ, you regress OBP by about 260 PA.
Here are the 1 and 2 sigma contours for OBP,
If I generate a simulated sample of true talents based on the maximum likelihood values, and then a simulated set of observations based on the binomial distribution, the comparison looks like this,
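A minimal sketch of that two-stage simulation, using only the standard library. The function name `simulate_obp` and the PA counts are hypothetical; the default mean and standard deviation are the fitted OBP values quoted above. Real PA counts would come from the FanGraphs data.

```python
import random

def simulate_obp(pa_counts, mu=0.329, sigma=0.0308, seed=0):
    """Draw a true talent OBP per player, then a binomial observed OBP."""
    rng = random.Random(seed)
    observed = []
    for pa in pa_counts:
        # True talent from the Gaussian, clipped to a valid probability.
        talent = min(max(rng.gauss(mu, sigma), 0.0), 1.0)
        # Binomial draw written as a sum of Bernoulli trials, one per PA.
        times_on_base = sum(rng.random() < talent for _ in range(pa))
        observed.append(times_on_base / pa)
    return observed

# Hypothetical PA counts between 150 and 599, for illustration only.
pa_counts = [150 + (i % 450) for i in range(500)]
sim = simulate_obp(pa_counts)
```

The simulated observations are wider than the true talent distribution, because each one adds binomial noise that depends on that player's PA.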
If I take the ratio of observed to simulated, I get this,
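The ratio can be sketched as a bin-by-bin division of two histograms. This is an illustration under assumptions, not the code behind the plot: the helper names `histogram` and `ratio_by_bin` are made up, the simulated counts are rescaled to the size of the data sample in case the two differ, and bins with no simulated events return `None` rather than dividing by zero.

```python
def histogram(values, lo, hi, n_bins):
    """Count values falling in n_bins equal-width bins over [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating point rounding at the top edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def ratio_by_bin(data, simulated, lo, hi, n_bins):
    """Observed / simulated counts per bin; None where the simulation is empty."""
    d = histogram(data, lo, hi, n_bins)
    s = histogram(simulated, lo, hi, n_bins)
    scale = len(simulated) / len(data)  # put both samples on the same footing
    return [dc * scale / sc if sc > 0 else None for dc, sc in zip(d, s)]

# Toy values for illustration only.
toy_data = [0.25, 0.35, 0.35, 0.45]
toy_sim = [0.25, 0.25, 0.35, 0.35, 0.35, 0.35, 0.45, 0.45]
ratios = ratio_by_bin(toy_data, toy_sim, 0.2, 0.5, 3)
```

A ratio above 1 in a tail bin means the data has more players there than the Gaussian model predicts, which is the signature of a fat tail.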
The number above each bar on the bar graph is the count in the corresponding bin of the data histogram. So the distributions don't match well in the tails. For OBP from about 0.400 to 0.450 I get more events in the simulation than in the real data. Right at around OBP = 0.45 I get an uptick in the data.
If I bump the PA limit from 150 to 300, this is what it looks like,
So that seems to support my conjecture (that the tails are fat) better than the analysis with a 150 PA limit. It’s worth pointing out that my method for determining the parameters of the distribution accounts for the higher variance of players with low PA. A more likely flaw is that PAs are correlated with true talent.
Here is the same (1969-1992, 300 PA limit), using wOBA,
and here it is for OBP and wOBA for 1993-2005,
I zoomed in on the y-axis of the OBP and wOBA plots. The values at OBP > 0.5 that go off the top are clearly Barry Bonds, and they occur about 10-20 times more often in the data than in the simulation.