In parts 3.1 and 3.2 I discussed approximating true talent distributions with Gaussians, and comparing these to data. In this post I will talk about my attempts to model true talent with non-Gaussian distributions.
As a reminder, the probability to make an observation x is the convolution of the noise (or luck) distribution, S(x), with the true talent distribution, T(t).
So far I’ve limited T to be Gaussian, which is convenient because the convolution can be done analytically, and all we have to do is add up the negative log-likelihoods and minimize.
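As a rough illustration of why the Gaussian case is convenient: if the noise for each player is also approximated as a Gaussian, the convolution is again a Gaussian whose variance is the sum of the two variances, so the likelihood has a closed form. The sketch below assumes that Gaussian-noise approximation and, for a rate stat like OBP, a binomial-style noise variance of mu(1 - mu)/n. The function name, the toy data, and the crude grid search are all hypothetical stand-ins, not the fits from parts 3.1 and 3.2.

```python
import math

def neg_log_likelihood(mu, sigma, observations):
    """Negative log-likelihood for Gaussian true talent plus Gaussian noise.

    observations: list of (x, n) pairs, an observed rate x over n trials.
    Assumption: the noise variance is binomial-like, mu * (1 - mu) / n.
    """
    total = 0.0
    for x, n in observations:
        noise_var = mu * (1.0 - mu) / n
        # Convolving two Gaussians adds their variances.
        var = sigma ** 2 + noise_var
        total += 0.5 * math.log(2.0 * math.pi * var) + 0.5 * (x - mu) ** 2 / var
    return total

# Toy data, made up for illustration: (rate, plate appearances)
toy = [(0.310, 450), (0.345, 600), (0.372, 520), (0.298, 300), (0.401, 650)]

# Crude grid search over (mu, sigma) standing in for a real minimizer.
best_nll, best_mu, best_sigma = min(
    (neg_log_likelihood(m / 1000.0, s / 1000.0, toy), m / 1000.0, s / 1000.0)
    for m in range(300, 400, 2)
    for s in range(1, 80)
)
print(best_mu, best_sigma, best_nll)
```

A real fit would hand `neg_log_likelihood` to a numerical optimizer instead of scanning a grid; the grid is used here only to keep the example dependency-free.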
As an alternative to a Gaussian distribution, I tried using a Cauchy and a Student’s t distribution, but the tails of both are way too fat; more precisely, they give you huge outliers (like FIP- = 1000 or OBP = 2) too often to be useful approximations for what I’m doing. The comparisons I showed in 3.1 and 3.2 suggest the true talent distribution may be a little wide, but not that dramatically.
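To put a rough number on "too fat", the sketch below compares how often a Gaussian and a Cauchy land more than k width units from the center. It is only an illustration: the Cauchy scale parameter is not a standard deviation (the Cauchy has no finite variance), so the comparison is qualitative, and the helper names are hypothetical.

```python
import math

def gaussian_tail(k):
    """P(|X - mean| > k standard deviations) for a Gaussian."""
    return math.erfc(k / math.sqrt(2.0))

def cauchy_tail(k):
    """P(|X - location| > k scale units) for a Cauchy."""
    # Cauchy CDF is 0.5 + atan(x) / pi, so the two-sided tail is 1 - 2 * atan(k) / pi.
    return 1.0 - (2.0 / math.pi) * math.atan(k)

for k in (2, 5, 10):
    print(f"k={k:2d}  gaussian={gaussian_tail(k):.3e}  cauchy={cauchy_tail(k):.3e}")
```

The Gaussian tail falls off exponentially in k squared while the Cauchy tail only falls off like 1/k, which is why the Cauchy keeps producing values far outside any plausible range.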
So, instead I cooked up an ad-hoc model that looks a lot like a Gaussian, but basically says the standard deviation gets a bit bigger as the values get further away from the mean; specifically,
in other words, it is almost a Gaussian, but f (the fat-parameter) stretches out the standard deviation when the distance from the mean gets large (relative to the standard deviation).
If I fit this model for its 3 parameters (the mean, the standard deviation, and f), here are the results:
Let me look at the quantity and time frame with the largest value of the fat-parameter, f, which is OBP for 1993-2005. During this time, Barry Bonds had a 0.559 OBP in 2443 PA. Using the Gaussian approximation, I would regress by about 230 PA and estimate his true talent as,
on the other hand if I use the fat-Gaussian as my true talent distribution and compute the mean of the posterior probability distribution I get,
So that’s about as extreme a difference as the “fat” tail of OBP can make. I started off this project thinking about Pedro’s 5-year peak of FIP-, which I’m taking to be 1999-2003, during which he had a 43 FIP- and 3644 batters faced. Based on the value of , the number of PAs to regress is and the regressed answer would be,
on the other hand, with f at 2 times the standard deviation of the maximum likelihood value () it would be,
Here is how Pedro’s true talent estimate would vary as a function of the fat-parameter, f.
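The two kinds of estimate compared in this post (Gaussian regression to the mean versus the posterior mean under a fatter-tailed prior), and a scan over f of the sort described above, can be sketched numerically. The example uses the Bonds OBP line quoted earlier (0.559 in 2443 PA, regressing about 230 PA), because binomial noise is simple to write down for OBP. Everything else is an assumption made for illustration: the population mean of 0.340 is a placeholder, the noise is approximated as Gaussian, and the fat-Gaussian form in `make_log_prior` (width growing linearly with distance from the mean) is a hypothetical stand-in, not the formula used in this post.

```python
import math

def regress_to_mean(obs, n, pop_mean, k):
    """Gaussian-prior estimate: blend n observed PA with k PA of the population mean."""
    return (obs * n + pop_mean * k) / (n + k)

def posterior_mean(obs, noise_sd, log_prior, lo=0.0, hi=1.0, steps=20001):
    """Posterior mean of true talent t by brute-force summation on a grid."""
    dt = (hi - lo) / (steps - 1)
    num = den = 0.0
    for i in range(steps):
        t = lo + i * dt
        # Gaussian approximation to the noise (luck) distribution.
        log_like = -0.5 * ((obs - t) / noise_sd) ** 2
        w = math.exp(log_like + log_prior(t))
        num += t * w
        den += w
    return num / den

def make_log_prior(mu, sigma, f):
    """Hypothetical fat-Gaussian: the width grows linearly with distance from mu.

    NOT the formula from the post; f = 0 recovers an ordinary Gaussian.
    """
    def log_prior(t):
        d = abs(t - mu)
        return -0.5 * (d / (sigma + f * d)) ** 2
    return log_prior

OBS, N, K = 0.559, 2443, 230               # Bonds's line and regression amount from the text
MU = 0.340                                 # assumed population mean OBP (placeholder)
SIGMA = math.sqrt(MU * (1.0 - MU) / K)     # talent spread implied by regressing K PA
NOISE_SD = math.sqrt(MU * (1.0 - MU) / N)  # binomial noise, evaluated at MU

gaussian_estimate = regress_to_mean(OBS, N, MU, K)
scan = {f: posterior_mean(OBS, NOISE_SD, make_log_prior(MU, SIGMA, f))
        for f in (0.0, 0.05, 0.1, 0.2)}
for f, est in scan.items():
    print(f"f={f:.2f}  estimate={est:.4f}  (Gaussian regression: {gaussian_estimate:.4f})")
```

With f = 0 the grid calculation reproduces the simple regression formula, which is a useful sanity check; as f grows, the prior pulls an extreme observation back toward the mean less strongly. The same scan could be run for FIP- given a noise model for that stat, which is not specified here.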
So in conclusion, this work shows no substantive impact due to fat tails on FIP-. For OBP and wOBA, it may make a difference on the order of 10 points.