### The (fat?) tails of true talent – Part 3.1 – Results for Gaussian True Talent (Batting)

I described the methods I used to estimate true talent distributions in part 2. In short, I determined the maximum likelihood parameters for true talent, assuming a Gaussian distribution, then generated a simulated sample based on that distribution and compared it to the data.
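To make the fitting step concrete, here is a minimal sketch of a maximum likelihood fit for a Gaussian true talent distribution. It is not the code from part 2. It assumes each player's observed rate is roughly normal, with variance equal to the talent variance $\sigma_t^2$ plus a sampling term $k^2/\mathrm{PA}$, where $k \approx 0.5$ is the per-PA standard deviation for OBP. The function names and the brute-force grid search are hypothetical choices made for illustration.

```python
import math

def log_likelihood(t0, sigma_t, obs, pa, k=0.5):
    """Log-likelihood of observed rates for a Gaussian talent distribution.

    Assumption (not from the original post): the binomial sampling noise is
    approximated as normal with variance k**2 / PA for each player.
    """
    total = 0.0
    for x, n in zip(obs, pa):
        # Talent variance plus this player's sampling variance.
        var = sigma_t ** 2 + k ** 2 / n
        total += -0.5 * (math.log(2 * math.pi * var) + (x - t0) ** 2 / var)
    return total

def grid_max_likelihood(obs, pa, t0_grid, sigma_grid, k=0.5):
    """Return the (t0, sigma_t) pair on the grid with the highest likelihood."""
    best = max(
        (log_likelihood(t0, s, obs, pa, k), t0, s)
        for t0 in t0_grid
        for s in sigma_grid
    )
    return best[1], best[2]
```

Because the sampling variance is computed per player, low-PA players are automatically given less weight. Evaluating `log_likelihood` over the whole grid, rather than only taking the maximum, is also what you would need to draw likelihood contours.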

I looked at 4 different quantities: OBP, wOBA, FIP-, and ERA-. All data was downloaded from the FanGraphs leaderboards, for the years 1969-2013.

For OBP in the time frame 1969-1992, and only considering players with at least 150 PA, the maximum likelihood values for $t_0$ and $\sigma_t$ were 0.329 and 0.0308, respectively. Since the number of plate appearances you use to regress to the mean is $k^2/\sigma^2_t$ and $k \approx 0.5$ for OBP, it follows that you regress OBP by about 260 PA.
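As a quick check of that arithmetic, the sketch below plugs the quoted values into $k^2/\sigma^2_t$. The `regress_to_mean` helper is a hypothetical illustration of the usual shrinkage step (adding that many PA of average performance); it is an assumption about how the number gets used, not something taken from the analysis itself.

```python
# Values quoted in the text for OBP, 1969-1992, minimum 150 PA.
t0 = 0.329        # mean of the true talent distribution
sigma_t = 0.0308  # standard deviation of the true talent distribution
k = 0.5           # approximate per-PA standard deviation for OBP

def regression_pa(k, sigma_t):
    """Number of PA of league-average performance to add: k^2 / sigma_t^2."""
    return k ** 2 / sigma_t ** 2

def regress_to_mean(observed, pa, t0, k, sigma_t):
    """Hypothetical helper: shrink an observed rate toward t0."""
    n0 = regression_pa(k, sigma_t)
    return (observed * pa + t0 * n0) / (pa + n0)

n0 = regression_pa(k, sigma_t)
print(round(n0, 1))  # a little under 264, i.e. "about 260 PA"
```

With these numbers, a hitter with roughly 260 PA gets regressed about halfway back to 0.329.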

Here are the 1 and 2 sigma contours for OBP,

If I generate a simulated sample of true talents based on the maximum likelihood values, and then a simulated set of observations based on the binomial distribution, the comparison looks like this,
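The two-step simulation can be sketched roughly as follows: draw a true talent for each player from the fitted Gaussian, then draw the number of times on base from a binomial with that player's PA. The function name, the seed, and the example PA list are made up for illustration; a real run would use the actual PA of each player in the data.

```python
import random

def simulate_observed_obp(pa_list, t0=0.329, sigma_t=0.0308, seed=0):
    """Simulate one observed OBP per player (standard library only)."""
    rng = random.Random(seed)
    observed = []
    for pa in pa_list:
        # Step 1: true talent from the fitted Gaussian, kept inside [0, 1].
        talent = min(max(rng.gauss(t0, sigma_t), 0.0), 1.0)
        # Step 2: binomial draw, written as a sum of Bernoulli trials.
        on_base = sum(rng.random() < talent for _ in range(pa))
        observed.append(on_base / pa)
    return observed

# Hypothetical PA values, only to show the call.
pa_list = [150, 300, 450, 600] * 250
sim = simulate_observed_obp(pa_list)
```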

If I take the ratio of observed to simulated, I get this,

The number above each bar on the bar graph is the count in that bin of the data histogram. So the distributions don't match well in the tails. For OBP from about 0.400 to 0.450 I get more events in the simulation than in the real data. Right at around OBP = 0.45 I get an uptick in the data.
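For reference, a ratio like this can be computed bin by bin along the following lines. This is a sketch with hypothetical function names; it assumes the simulated counts are rescaled to the size of the observed sample before dividing, and it leaves the ratio undefined (`None`) in bins where the simulation has no events.

```python
def histogram_counts(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def observed_to_simulated_ratio(observed, simulated, edges):
    """Return (observed counts, observed / expected-from-simulation per bin)."""
    obs_counts = histogram_counts(observed, edges)
    sim_counts = histogram_counts(simulated, edges)
    # Rescale so a larger simulated sample does not shrink every ratio.
    scale = len(observed) / len(simulated)
    ratios = [
        o / (s * scale) if s > 0 else None
        for o, s in zip(obs_counts, sim_counts)
    ]
    return obs_counts, ratios
```

A ratio above 1 in the highest bins means more players out there in the data than the Gaussian model predicts, which is what a fat tail would look like.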

If I bump up the PA limit to only look at players with 200 or more PA, this is what the ratio looks like,

and if I bump it to 300, this is what it looks like,

So that seems to support my conjecture (that the tails are fat) better than the analysis with a 150 PA limit. It’s worth pointing out that my method for determining the parameters of the distribution accounts for the higher variance of players with low PA. A more likely flaw is that PAs are correlated with true talent.

Here is the same (1969-1992, 300 PA limit), using wOBA,

and here it is for OBP and wOBA for 1993-2005,

I zoomed in the y-axis of the OBP and wOBA plots. The values at OBP > 0.5 that go off the top are clearly Barry Bonds, and they show up about 10-20 times more often in the data than in the simulation.