In part 1 I presented a few equations regarding estimating true talent, and using regression to the mean, in a Bayesian framework.
In this post I will describe what I did to estimate the true talent distribution, $f(t)$.
Given a true talent distribution $f(t)$ and a noise (luck) distribution $g$, the probability to observe a performance $x$ is
$$P(x) = \int f(t)\, g(x - t \mid t, \mathrm{PA})\, dt .$$
For example, say $x$ is a 0.400 OBP. This says that the probability to observe OBP = 0.400 is the probability that true talent is 0.200 and luck was 0.200… plus the probability that true talent is 0.400 and luck was 0… plus the probability that true talent was 0.800 and luck was -0.400… and everything else in between.
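To make the sum over true talent and luck described above concrete, here is a minimal Python sketch that evaluates it numerically with the midpoint rule. The numbers (`MU`, `SIGMA_T`, `PA`) and the binomial-style noise width are illustrative assumptions, not values from my fits, and the function names are made up for this example.

```python
import math

def normal_pdf(x, mean, sd):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def prob_observe(x, talent_pdf, noise_pdf, lo=0.0, hi=1.0, steps=2000):
    """Midpoint-rule estimate of the integral of talent_pdf(t) * noise_pdf(x - t, t) dt."""
    dt = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt          # candidate true talent
        total += talent_pdf(t) * noise_pdf(x - t, t)  # luck = x - t
    return total * dt

# Hypothetical values, chosen only for illustration.
MU, SIGMA_T, PA = 0.330, 0.030, 500

def talent_pdf(t):
    # Assumed Gaussian true talent distribution.
    return normal_pdf(t, MU, SIGMA_T)

def noise_pdf(luck, t):
    # Assumed Gaussian noise whose width depends on t and PA (binomial-style).
    sd = math.sqrt(t * (1.0 - t) / PA)
    return normal_pdf(luck, 0.0, sd)

# Probability density of observing a 0.400 OBP under these assumptions.
p_400 = prob_observe(0.400, talent_pdf, noise_pdf)
print(p_400)
```

Each term in the loop is one "true talent was t and luck was x - t" contribution; the integral just adds them all up.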
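Now, given a set of observations $x_i$, we can form a likelihood function,
$$\mathcal{L}(\mu, \sigma_t) = \prod_i P(x_i \mid \mu, \sigma_t),$$
where $P(x_i \mid \mu, \sigma_t)$ is the probability to observe $x_i$, and $\mu$ and $\sigma_t$ are the mean and standard deviation of the true talent distribution. We can find the maximum likelihood values of $\mu$ and $\sigma_t$ from
$$\frac{\partial \mathcal{L}}{\partial \mu} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \sigma_t} = 0,$$
or, if we take $\ell = -\log \mathcal{L}$, we can find the minimum of $\ell$ from
$$\frac{\partial \ell}{\partial \mu} = 0, \qquad \frac{\partial \ell}{\partial \sigma_t} = 0 .$$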
Now, if the noise distribution $g$ is Gaussian (it’s close enough), and the true talent distribution $f$ is Gaussian (that’s the assumption I’m testing), AND we assume the noise variance $\sigma_{\mathrm{noise}}^2$ doesn’t depend on $t$ (the dependence on PA is much more important than the dependence on $t$), AND we use a fixed (effective) value for PA, then we can get an explicit solution for the maximum likelihood values. It says that $\hat{\mu}$ equals the sample mean, weighted by plate appearances, and that the variance of the true talent distribution is the sample variance minus the statistical (noise) variance, $\hat{\sigma}_t^2 = s^2 - \sigma_{\mathrm{noise}}^2$.
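A minimal sketch of that explicit solution in Python follows. It makes a few assumptions that are not spelled out above: the noise variance is taken to be binomial, mean × (1 − mean) / PA; the sample variance is weighted by plate appearances; and the "effective" PA is the arithmetic mean of PA, which is the value that matches a PA-weighted variance. The function name is hypothetical.

```python
import math

def estimate_talent_distribution(obp, pa):
    """Return (mu_hat, sigma_t_hat) from observed rates and plate appearances."""
    total_pa = sum(pa)
    # PA-weighted sample mean.
    mu_hat = sum(x * n for x, n in zip(obp, pa)) / total_pa
    # PA-weighted sample variance of the observed rates (assumption).
    sample_var = sum(n * (x - mu_hat) ** 2 for x, n in zip(obp, pa)) / total_pa
    # Effective PA: arithmetic mean, consistent with the PA weighting above.
    pa_eff = total_pa / len(pa)
    # Assumed binomial noise variance at the effective PA.
    noise_var = mu_hat * (1.0 - mu_hat) / pa_eff
    # True talent variance = sample variance - noise variance, floored at zero.
    talent_var = max(sample_var - noise_var, 0.0)
    return mu_hat, math.sqrt(talent_var)

# Toy input, purely illustrative.
mu_hat, sigma_t_hat = estimate_talent_distribution([0.300, 0.360], [500, 500])
print(mu_hat, sigma_t_hat)
```

The floor at zero is there because in a small or noisy sample the observed spread can come out smaller than the expected noise, which would otherwise give a negative variance.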
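If the noise standard deviation depends on PA, you can still get a closed form solution for the convolution integral,
$$P(x_i) = \frac{1}{\sqrt{2\pi\left(\sigma_t^2 + \sigma_i^2\right)}} \exp\left(-\frac{(x_i - \mu)^2}{2\left(\sigma_t^2 + \sigma_i^2\right)}\right),$$
where $\mu$ and $\sigma_t$ are the mean and standard deviation of the Gaussian true talent distribution and $\sigma_i$ is the noise standard deviation for player $i$’s number of plate appearances.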
In the more general case, where the true talent distribution may not necessarily be Gaussian, one can still compute the maximum likelihood solutions for its parameters. To do this I used minuit (specifically, PyMinuit) to minimize the negative log-likelihood with respect to the true talent distribution parameters.
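The sketch below shows the shape of such a fit for the Gaussian case with PA-dependent noise, using only the Python standard library. It is not the PyMinuit code I ran: a golden-section search stands in for Minuit, the mean is profiled out analytically (for a fixed talent width, the best mean is the inverse-variance weighted mean), and the data are simulated with made-up parameters. For a non-Gaussian true talent distribution, the per-player term would be replaced by a numerical convolution.

```python
import math
import random

def neg_log_likelihood(mu, sigma_t, obs, noise_var):
    """-log L for Gaussian talent plus Gaussian noise with per-player variance."""
    total = 0.0
    for x, s2 in zip(obs, noise_var):
        v = sigma_t ** 2 + s2  # variance of the convolved Gaussian
        total += 0.5 * math.log(2.0 * math.pi * v) + (x - mu) ** 2 / (2.0 * v)
    return total

def profile_mu(sigma_t, obs, noise_var):
    """Best mu for a fixed sigma_t: the inverse-variance weighted mean."""
    w = [1.0 / (sigma_t ** 2 + s2) for s2 in noise_var]
    return sum(wi * x for wi, x in zip(w, obs)) / sum(w)

def fit(obs, noise_var, lo=0.0, hi=0.2, tol=1e-7):
    """Golden-section search over sigma_t (a simple stand-in for Minuit)."""
    def f(s):
        return neg_log_likelihood(profile_mu(s, obs, noise_var), s, obs, noise_var)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    sigma_hat = 0.5 * (a + b)
    return profile_mu(sigma_hat, obs, noise_var), sigma_hat

# Simulated data with hypothetical "true" parameters, for illustration only.
rng = random.Random(1)
TRUE_MU, TRUE_SIGMA = 0.330, 0.030
pas = [rng.randint(200, 700) for _ in range(2000)]
noise_var = [TRUE_MU * (1.0 - TRUE_MU) / n for n in pas]  # assumed binomial noise
obs = [rng.gauss(TRUE_MU, TRUE_SIGMA) + rng.gauss(0.0, math.sqrt(v)) for v in noise_var]

mu_hat, sigma_hat = fit(obs, noise_var)
print(mu_hat, sigma_hat)
```

A general-purpose minimizer like Minuit earns its keep once the distribution has more parameters or the likelihood has no convenient closed form; the one-dimensional search here only works because the Gaussian case is so well behaved.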
I looked at both Gaussian true talent distributions and non-Gaussian true talent distributions (part 4). For the Gaussian case, I then generated a set of random numbers from the maximum likelihood distribution, which serves as a simulation of the underlying true talent for a set of players. Then for each player in the set of observed performances, I generated a simulated “noise” or “luck” value based on the known true talent and the known number of plate appearances. This gives me a simulated “observed” sample that I can compare with the real observed sample.
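The simulation step can be sketched roughly as follows. The parameter values, the Gaussian noise with a binomial-style width, and the function name are assumptions for illustration; in the real comparison the mean and width would be the maximum likelihood values and `pas` would be the actual plate appearances of the observed players.

```python
import math
import random
import statistics

def simulate_observed(pas, mu, sigma_t, rng):
    """For each player, draw a true talent, then add PA-dependent luck."""
    results = []
    for n in pas:
        t = rng.gauss(mu, sigma_t)   # simulated true talent
        t = min(max(t, 0.0), 1.0)    # keep the rate in [0, 1]
        noise_sd = math.sqrt(t * (1.0 - t) / n)  # assumed binomial-style width
        results.append((t, t + rng.gauss(0.0, noise_sd)))
    return results

# Hypothetical fitted values and plate appearances.
rng = random.Random(42)
pas = [500] * 5000
sims = simulate_observed(pas, 0.330, 0.030, rng)

talents = [t for t, _ in sims]
observed = [x for _, x in sims]
# The observed spread is wider than the talent spread because of the added noise.
print(statistics.pvariance(talents), statistics.pvariance(observed))
```

The simulated `observed` values are what get compared with the real sample, for example by overlaying histograms.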
For the non-Gaussian true talent distribution, I fit a “fat-tailed” model, but have not (yet?) generated a set of simulated true talents to compare with reality. Even without making that comparison, however, one can use numerical integration routines to figure out the mean of the posterior distribution of true talent, and compare this to the Gaussian case.
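As a sketch of that numerical integration, the posterior mean of true talent given an observation is the talent-weighted integral of prior × noise likelihood, divided by the same integral without the talent weight. The code below computes it for a Gaussian prior and for a Student's t prior, which I use here only as a stand-in for a fat-tailed model; its degrees of freedom, the shared scale, and the noise width are all hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def student_t_pdf(x, loc, scale, dof):
    """Location-scale Student's t density (stand-in fat-tailed prior)."""
    z = (x - loc) / scale
    norm = math.gamma((dof + 1) / 2.0) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2.0) * scale)
    return norm * (1.0 + z * z / dof) ** (-(dof + 1) / 2.0)

def posterior_mean(x, prior_pdf, noise_sd, lo=0.0, hi=1.0, steps=4000):
    """Midpoint-rule posterior mean of true talent given observation x."""
    dt = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        w = prior_pdf(t) * normal_pdf(x, t, noise_sd)  # prior times likelihood
        num += t * w
        den += w
    return num / den

# Hypothetical parameters for illustration.
MU, SCALE, NOISE_SD = 0.330, 0.030, 0.020

gauss_post = posterior_mean(0.450, lambda t: normal_pdf(t, MU, SCALE), NOISE_SD)
fat_post = posterior_mean(0.450, lambda t: student_t_pdf(t, MU, SCALE, 3), NOISE_SD)
print(gauss_post, fat_post)
```

With these made-up numbers, an extreme observation is regressed toward the mean less under the fat-tailed prior than under the Gaussian one, which is the kind of difference the comparison is meant to expose.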