The (fat?) tails of true talent – Part 2 – The Method

In part 1 I presented a few equations regarding estimating true talent, and using regression to the mean, in a Bayesian framework.

In this post I will describe what I did to estimate the true talent distribution, T(t).

Given a true talent distribution T(t | t_0, \sigma_t) and a luck (noise) distribution S, the probability of observing a performance x is

p(x|t_0, \sigma_t) = \int dt' S(x-t') T(t'|t_0, \sigma_t)

For example, say x is a 0.400 OBP. This says that the probability of observing OBP = 0.400 is the probability that true talent is 0.200 and luck was +0.200, plus the probability that true talent is 0.400 and luck was 0, plus the probability that true talent is 0.800 and luck was -0.400, and everything else in between.

Now, given a set of observations {x_i}, we can form a likelihood function,

L = \prod_i p(x_i | t_0, \sigma_t).

We can find the maximum likelihood values of t_0 and \sigma_t from,

\frac{\partial L}{\partial t_0} = 0
\frac{\partial L}{\partial \sigma_t} = 0,

or, equivalently, if we define l = -2 \ln{L} = -2 \sum_i \ln{p(x_i | t_0, \sigma_t)}, we can find the minimum of l from

\frac{\partial l}{\partial t_0} = 0
\frac{\partial l}{\partial \sigma_t} = 0

Now, if S is Gaussian (it’s close enough), and T is Gaussian (that’s the assumption I’m testing), AND we assume \sigma_x (the width of S) doesn’t depend on t (the dependence on PA is much more important than the dependence on t), AND we use a fixed (effective) value for PA, then we can get an explicit solution for the maximum likelihood values: t_0 equals the sample mean, weighted by plate appearances, and the variance of the true talent distribution is the sample variance minus the statistical (noise) variance.
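As a rough illustration of that explicit solution, here is a minimal Python sketch. It is not the code used for this post. It assumes binomial noise for OBP, so that the noise variance is t_0(1 - t_0)/PA, and it takes the effective PA to be the mean PA, which matches a PA-weighted sample variance. The function name `gaussian_talent_estimate` and the made-up data are hypothetical.

```python
import numpy as np

def gaussian_talent_estimate(obp, pa):
    """Closed-form estimate of (t_0, sigma_t) when S and T are both Gaussian."""
    obp = np.asarray(obp, dtype=float)
    pa = np.asarray(pa, dtype=float)
    # t_0: sample mean weighted by plate appearances.
    t0 = np.average(obp, weights=pa)
    # PA-weighted sample variance of the observed performances.
    sample_var = np.average((obp - t0) ** 2, weights=pa)
    # ASSUMPTION: binomial noise, with a fixed effective PA equal to the mean PA.
    pa_eff = pa.mean()
    noise_var = t0 * (1.0 - t0) / pa_eff
    # Talent variance = sample variance minus noise variance (floored at zero).
    talent_var = max(sample_var - noise_var, 0.0)
    return t0, np.sqrt(talent_var)

# Made-up data for illustration only: true talent ~ N(0.330, 0.030).
rng = np.random.default_rng(42)
n_players = 5000
pa = rng.integers(300, 700, size=n_players)
true_talent = rng.normal(0.330, 0.030, size=n_players)
obp = rng.binomial(pa, true_talent) / pa

t0_hat, sigma_t_hat = gaussian_talent_estimate(obp, pa)
print(t0_hat, sigma_t_hat)
```

With enough players the two estimates should land close to the values used to generate the fake sample.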

If \sigma_x depends on PA, you can still get a closed form solution for the convolution integral,

p(x|t_0, \sigma_t) = \int dt' S(x-t') T(t'|t_0, \sigma_t) \propto \frac{1}{\sqrt{\sigma^2_x + \sigma^2_t}} \exp\left[-\frac{(x-t_0)^2}{2(\sigma^2_x + \sigma^2_t)}\right]
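A quick numerical check of this closed form can be sketched as follows: do the convolution integral as a simple sum on a fine grid and compare it with a Gaussian whose variance is \sigma^2_x + \sigma^2_t. The numbers (x = 0.400, t_0 = 0.330, \sigma_t = 0.030, \sigma_x = 0.025) are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_pdf(z, mean, sigma):
    """Normalized Gaussian density."""
    return np.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def p_numeric(x, t0, sigma_t, sigma_x, n=20001):
    """Convolution integral over true talent t', done as a Riemann sum."""
    t = np.linspace(t0 - 10.0 * sigma_t, t0 + 10.0 * sigma_t, n)
    dt = t[1] - t[0]
    # S(x - t') is the luck term, T(t' | t0, sigma_t) the talent term.
    integrand = gaussian_pdf(x, t, sigma_x) * gaussian_pdf(t, t0, sigma_t)
    return np.sum(integrand) * dt

def p_closed(x, t0, sigma_t, sigma_x):
    """Closed form: a Gaussian with variance sigma_x^2 + sigma_t^2."""
    return gaussian_pdf(x, t0, np.sqrt(sigma_x ** 2 + sigma_t ** 2))

# Illustrative values only (not fitted to any real data).
x, t0, sigma_t, sigma_x = 0.400, 0.330, 0.030, 0.025
numeric = p_numeric(x, t0, sigma_t, sigma_x)
closed = p_closed(x, t0, sigma_t, sigma_x)
print(numeric, closed)
```

The two numbers should agree to many digits, which is all the closed form claims.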

In the more general case, where T may not necessarily be Gaussian, one can still compute the maximum likelihood solutions for the parameters of T numerically. To do this I used Minuit (specifically, PyMinuit) to minimize l = -2 \ln{L} with respect to the true talent distribution parameters.
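To give a feel for what the numerical fit looks like, here is a minimal sketch that minimizes l = -2 \ln{L} for the Gaussian case with a PA-dependent \sigma_x. It swaps in `scipy.optimize.minimize` (Nelder-Mead) in place of PyMinuit, so it is not the code behind the results in this series. It also assumes binomial noise, with \sigma_x fixed from the PA-weighted sample mean, and it fits \ln \sigma_t rather than \sigma_t to keep the width positive. The data are made up.

```python
import numpy as np
from scipy.optimize import minimize

def neg2_log_likelihood(params, x, sigma_x):
    """l = -2 ln L for Gaussian talent and Gaussian luck with per-player sigma_x."""
    t0, log_sigma_t = params
    total_var = sigma_x ** 2 + np.exp(2.0 * log_sigma_t)
    # -2 ln of a Gaussian density with variance total_var, summed over players.
    return np.sum(np.log(2.0 * np.pi * total_var) + (x - t0) ** 2 / total_var)

# Made-up sample: true talent ~ N(0.330, 0.030), PA between 200 and 699.
rng = np.random.default_rng(7)
n_players = 2000
pa = rng.integers(200, 700, size=n_players)
true_talent = rng.normal(0.330, 0.030, size=n_players)
x = rng.binomial(pa, true_talent) / pa

# ASSUMPTION: binomial noise evaluated at the sample mean, so sigma_x depends on PA only.
x_bar = np.average(x, weights=pa)
sigma_x = np.sqrt(x_bar * (1.0 - x_bar) / pa)

start = np.array([x.mean(), np.log(x.std())])
result = minimize(
    neg2_log_likelihood,
    start,
    args=(x, sigma_x),
    method="Nelder-Mead",
    options={"xatol": 1e-7, "fatol": 1e-6, "maxiter": 2000},
)
t0_hat = result.x[0]
sigma_t_hat = np.exp(result.x[1])
print(t0_hat, sigma_t_hat)
```

For a non-Gaussian T the same structure applies, except that p(x_i | ...) has to come from a numerical convolution instead of the closed form.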

I looked at both Gaussian true talent distributions and non-Gaussian true talent distributions (part 4). For the Gaussian case, I then generated a set of random numbers from the maximum likelihood distribution, which serves as a simulation of the underlying true talent for a set of players. Then, for each player in the set of observed performances, I generated a simulated “noise” or “luck” value based on that player’s simulated (and therefore known) true talent and known number of plate appearances. This gives me a simulated “observed” sample that I can compare with the real observed sample.
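The simulation step can be sketched roughly like this. It is a minimal example under stated assumptions, not the original code: luck is modeled as a binomial draw given each player's simulated talent and PA (a Gaussian luck term would be an equally valid reading of the description above), and the fitted values t_0 = 0.330 and \sigma_t = 0.030 and the PA values are placeholders.

```python
import numpy as np

def simulate_observed(t0, sigma_t, pa, rng):
    """Draw true talents from the fitted Gaussian, then add luck for each player."""
    pa = np.asarray(pa, dtype=int)
    # Simulated true talent, one per player, clipped to a valid probability.
    talent = np.clip(rng.normal(t0, sigma_t, size=pa.shape), 0.0, 1.0)
    # ASSUMPTION: luck enters through a binomial draw with the player's own PA.
    times_on_base = rng.binomial(pa, talent)
    return talent, times_on_base / pa

rng = np.random.default_rng(0)
pa = rng.integers(200, 700, size=1000)  # placeholder PA values, not real data
talent, simulated_obp = simulate_observed(0.330, 0.030, pa, rng)

# Summary statistics one could compare with the same statistics of the real sample.
print(np.percentile(simulated_obp, [1, 50, 99]))
```

In practice `pa` would be the real players' plate appearances, so that the simulated sample has the same mix of sample sizes as the observed one.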

For the non-Gaussian true talent distribution, I fit a “fat-tailed” model, but have not (yet?) generated a set of simulated true talents to compare with reality. Even without making that comparison, however, one can use numerical integration routines to figure out the mean of the posterior distribution of true talent, and compare this to the Gaussian case.
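Here is a minimal sketch of that kind of numerical integration. The fat-tailed model itself is left to part 4, so a Student-t shape with 3 degrees of freedom is used purely as a hypothetical stand-in, and all numbers are made up. Note that a Student-t with the same scale as the Gaussian does not have the same variance, so the comparison is only qualitative.

```python
import numpy as np

def posterior_mean(x, sigma_x, prior_pdf, grid):
    """Mean of the posterior over true talent, integrated on a uniform grid."""
    likelihood = np.exp(-0.5 * ((x - grid) / sigma_x) ** 2)  # Gaussian luck S(x - t)
    weights = likelihood * prior_pdf(grid)                   # unnormalized posterior
    # The grid spacing and all normalization constants cancel in the ratio.
    return np.sum(grid * weights) / np.sum(weights)

t0, sigma_t = 0.330, 0.030  # illustrative prior parameters

def gaussian_prior(t):
    return np.exp(-0.5 * ((t - t0) / sigma_t) ** 2)

def student_t_prior(t, nu=3.0):
    # HYPOTHETICAL fat-tailed stand-in; not the model fitted in this series.
    return (1.0 + ((t - t0) / sigma_t) ** 2 / nu) ** (-(nu + 1.0) / 2.0)

grid = np.linspace(0.0, 1.0, 20001)
x, sigma_x = 0.450, 0.025  # an extreme observed OBP and its noise width

gauss_mean = posterior_mean(x, sigma_x, gaussian_prior, grid)
fat_mean = posterior_mean(x, sigma_x, student_t_prior, grid)

# Gaussian case has a closed form: regress toward t0 by sigma_t^2 / (sigma_t^2 + sigma_x^2).
closed_form = t0 + (x - t0) * sigma_t ** 2 / (sigma_t ** 2 + sigma_x ** 2)
print(gauss_mean, closed_form, fat_mean)
```

For an extreme observation like this one, the fat-tailed prior regresses the estimate toward the mean less strongly than the Gaussian prior does.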
