# baseball with n outs… and m bases.

This post describes some work I did on modeling baseball with n outs and m bases. The code to generate the run-expectancy matrix and linear weights is available on GitHub here,
https://github.com/bdilday/mlbMarkov_nm

This work was motivated by a few different things. A few months back I made an interactive run-expectancy matrix using Mathematica (published on the Wolfram “demonstrations” site here http://demonstrations.wolfram.com/RunExpectancyMatrixInBaseball/ ). When doing that, I basically hardcoded the system of equations, but thought to myself there must be a better and more flexible way; I didn’t really pursue it since it was more expedient to just hardcode it. Later, Joe Posnanski answered an email asking what would happen if baseball had 4 outs (I don’t have the link handy), and Tango took on the topic on his blog as well,
http://tangotiger.com/index.php/site/comments/how-many-runs-would-we-have-with-4-out-innings . In the back of my mind, I was also thinking about better ways to express the Markov chains in my (American) football Markov chain code
https://fivetwentyone.wordpress.com/2014/08/02/nfl-markov-1-of-n-a-basic-markov-chain/

So all of these led me to work on coding something that was more flexible. For me, the key to formulating the problem in a more flexible way was realizing that the expectation value of the number of runs scored could be related not just to the probability of transitioning from one state to the next, but also to the value associated with that transition. Symbolically it looks like,

$e_i = e_j T_{ji} + V_{ji} T_{ji}$

where $e_i$ is the expectation value for the state i, $e_j$ the expectation value for state j, $T_{ji}$ the probability to transition from state i to state j, and $V_{ji}$ the value associated with transitioning from state i to state j; a sum over the repeated index j is implied. For example, with a bases-empty home run, you return to the same state, but there is a value of 1 associated with the transition.
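As a toy check of this relation, under a simplified setup assumed here purely for illustration, take an inning with a single out where the only events are a home run, with probability $p$, and an out, with probability $1-p$. There is only one state, bases empty with no outs. A home run returns to it with a value of 1, and an out ends the inning, so it contributes nothing. The relation then reads

$e = e \, p + 1 \cdot p \quad \Rightarrow \quad e = \frac{p}{1-p}$

which for $p = 0.025$ gives $e \approx 0.0256$ runs per inning.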

I can write this in a matrix form in the following way,

$\vec{e} = \vec{e} \cdot \vec{M_T} + \mathrm{diag}(\vec{M_V}^{T} \cdot \vec{M_T}) \\ = (\vec{M_T})^{T} \cdot \vec{e} + \mathrm{diag}(\vec{M_V}^{T} \cdot \vec{M_T})$

where $\vec{M_T}$ is the transition matrix (with elements $T_{ij}$, but renamed to avoid confusion with transposition), $\vec{M_V}$ is the matrix of values associated with each transition, and the $T$ superscript denotes the transpose. So what my code does is generate the transition and value matrices, then use linear algebra algorithms to solve the set of equations above for the expectation value of runs scored from any given state. It optionally generates linear weights as well, by computing the marginal run value per 27 outs for each event.

The code has the limitation that it doesn’t allow runners to take an extra base. So, for example, a double with a man on first results in a man on second and a man on third, and no run scores. The reason for this is that I don’t see a straightforward way to model that in general. It is not so complicated for baseball with 3 bases, but suppose you are modeling baseball with 9 bases; does a runner score from first on a 7-base hit? Does he take 9th on a 6-base hit? I am thinking of adding some sort of take-a-base model, but I’m not sure how much time I want to put into that since the whole project is non-realistic anyway. My hope is that as long as your reference point is a 3-base, 3-out game without runners taking any extra bases, you can still learn something meaningful about how things change as you add outs or add bases.
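The procedure can be sketched in a few lines of numpy. This is a minimal illustration written to follow the notation above, not the code in the repository: the function names `advance` and `run_expectancy`, the state ordering, and the state labels are assumptions, chosen to mimic the output format of the examples below. It assumes the no-extra-base rule just described, and that the probability of an out is nonzero (otherwise the linear system is singular).

```python
import numpy as np


def advance(bases, k, nbases):
    """Move the batter and every runner k bases; return (new_bases, runs)."""
    # bases[b] == 1 means base b+1 is occupied; the batter starts at position 0.
    positions = [0] + [b + 1 for b in range(nbases) if bases[b]]
    moved = [pos + k for pos in positions]
    runs = sum(1 for pos in moved if pos > nbases)
    new_bases = tuple(1 if (b + 1) in moved else 0 for b in range(nbases))
    return new_bases, runs


def run_expectancy(probs, nbases=3, nouts=3):
    """probs[0] is P(out); probs[k] is P(k-base hit). Returns {label: runs}."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()  # rescale so the probabilities sum to 1

    # Enumerate (bases, outs) states; first base is the lowest bit.
    states = []
    for code in range(2 ** nbases):
        bases = tuple((code >> b) & 1 for b in range(nbases))
        states.extend((bases, outs) for outs in range(nouts))
    index = {s: i for i, s in enumerate(states)}
    n = len(states)

    # M_T[j, i]: probability of going from state i to state j.
    # M_V[j, i]: runs scored on that transition.
    M_T = np.zeros((n, n))
    M_V = np.zeros((n, n))
    for (bases, outs), i in index.items():
        if outs + 1 < nouts:  # the final out ends the inning: no transition
            M_T[index[(bases, outs + 1)], i] += p[0]
        for k in range(1, len(p)):
            new_bases, runs = advance(bases, k, nbases)
            j = index[(new_bases, outs)]
            M_T[j, i] += p[k]
            M_V[j, i] = runs

    # Column sums of the elementwise product equal diag(M_V^T . M_T).
    reward = np.sum(M_V * M_T, axis=0)
    # Solve (I - M_T^T) e = reward for the run expectancy of every state.
    e = np.linalg.solve(np.eye(n) - M_T.T, reward)

    # Label states with the last base leftmost, e.g. "001_00" is a man on first.
    labels = ["".join(str(x) for x in reversed(b)) + "_%02d" % o for b, o in states]
    return dict(zip(labels, e))


if __name__ == "__main__":
    table = run_expectancy([0.69, 0.23, 0.05, 0.005, 0.025])
    for label, value in table.items():
        print(label, "%.6f" % value)
```

Storing the matrices with the destination state as the row index keeps the code term-by-term comparable with the matrix equation, at the cost of needing the transpose in the call to `np.linalg.solve`.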

Along those lines, here are some examples and some results.

The input to the code is through command-line arguments of the form -pX f, where X is the number of bases for the event and f is its probability (a float). For example, -p1 0.23 means the probability of a single is 0.23, -p4 0.025 means the probability of a 4-base hit (a home run) is 0.025, etc. As long as runners don’t take an extra base, there is no distinction between a single and a walk, so -p1 is actually the sum of the probabilities for those two events. -p0 gives the probability of an out. If the probabilities given on the command line don’t add up to 1, the code rescales them so that they do.

This first example shows the defaults, with 3 bases and 3 outs.

    python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025
    prob 0 0.69
    prob 1 0.23
    prob 2 0.05
    prob 3 0.005
    prob 4 0.025
    idx state expectRunsPerInn
    0 000_00 0.369329
    1 000_01 0.198906
    2 000_02 0.076208
    3 001_00 0.660603
    4 001_01 0.380633
    5 001_02 0.156507
    6 010_00 0.814148
    7 010_01 0.503800
    8 010_02 0.227508
    9 011_00 1.105423
    10 011_01 0.685527
    11 011_02 0.307807
    12 100_00 1.040820
    13 100_01 0.722806
    14 100_02 0.386208
    15 101_00 1.332094
    16 101_01 0.904533
    17 101_02 0.466507
    18 110_00 1.485639
    19 110_01 1.027700
    20 110_02 0.537508
    21 111_00 1.776914
    22 111_01 1.209427
    23 111_02 0.617807

Here is the same, but generating linear weights also,

    python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1
    . . .
    LW: 1 +0.3231
    LW: 2 +0.6270
    LW: 3 +1.0299
    LW: 4 +1.5281
    LW: 0 -0.2159

which says the linear weight for a single is 0.323, for a double 0.627, …, for an out, -0.216.

Here is what happens if I add an out,

    python ./mlbMarkov_nm.py -nbases 3 -nouts 4 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1
    idx state expectRunsPerInn
    0 000_00 0.585274
    . .
    LW: 1 +0.5477
    LW: 2 +1.0017
    LW: 3 +1.5437
    LW: 4 +2.1174
    LW: 0 -0.3430

Here is what happens if I add a base, and make the probability for a “quadruple” 0,

    python ./mlbMarkov_nm.py -nbases 4 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.00 -p5 0.025 -doLinearWeights 1
    idx state expectRunsPerInn
    0 0000_00 0.280176
    . .
    LW: 1 +0.2144
    LW: 2 +0.4126
    LW: 3 +0.7166
    LW: 4 +1.1195
    LW: 5 +1.6177
    LW: 0 -0.1652

Here is baseball with walks and strikeouts only,

    python ./mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.5 -p1 0.5 -p2 0.0 -p3 0 -p4 0 -doLinearWeights 1
    idx state expectRunsPerInn
    0 000_00 0.937500
    . .
    LW: 1 +0.6563
    LW: 2 +1.1562
    LW: 3 +1.6146
    LW: 4 +1.9062
    LW: 0 -0.6563

I’ll post some additional examples later if I think of anything interesting.