Like many baseball fans, I spent the weeks leading up to the Hall of Fame voting results obsessing over the publicly released ballots, especially the data gathered by Ryan Thibodaux (@NotMrTibbs) in his Hall of Fame tracker. This post describes how I used Ryan’s data to build a model that predicts Hall of Fame voting results. The model grew in part out of discussions on Tom Tango’s blog, and I’ll comment on some of that below. All of my code is available on GitHub:
https://github.com/bdilday/hofTracker
The basic idea behind the model is to take a linear combination of the public ballots and use it to predict the overall results, which include both public and non-public ballots. I downloaded Ryan’s HOF tracker data going back to 2011 and used it to train the model. A number of things have changed between 2011 and the present, and these affect which data to use.
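To make the "linear combination" idea concrete, here is a minimal sketch under a simplifying assumption: every public ballot gets equal weight, so a player’s public vote share is just the average of the 0/1 public ballots, and the overall share is fit as `overall = a + b * public` by ordinary least squares. This is not necessarily how the model in the repository weights ballots; the function names and toy numbers below are hypothetical.

```python
from typing import List, Tuple


def public_share(ballots: List[List[int]], player: int) -> float:
    """Fraction of public ballots (each a 0/1 list over players) that include `player`.

    Equal weighting per ballot is an assumption of this sketch.
    """
    return sum(ballot[player] for ballot in ballots) / len(ballots)


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b).

    x: public vote shares from past years, y: final (public + non-public) shares.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a: float, b: float, share: float) -> float:
    """Predicted overall vote share for a given public vote share."""
    return a + b * share


if __name__ == "__main__":
    # Hypothetical toy data: four public ballots over three players.
    ballots = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1]]
    print(public_share(ballots, 0))  # 0.75

    # Hypothetical training pairs (public share, final share) from earlier years.
    a, b = fit_linear([0.9, 0.6, 0.3], [0.85, 0.55, 0.25])
    print(round(predict(a, b, public_share(ballots, 0)), 3))  # 0.7
```

Letting each public ballot carry its own weight, fit against past results, would turn this into a more general linear combination at the cost of needing enough training years to pin those weights down.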