This post describes some work I did on modeling baseball with n outs and m bases. The code to generate the run-expectancy matrix and the linear weights is available on GitHub here,

https://github.com/bdilday/mlbMarkov_nm

This work was motivated from a few different directions. A few months back I made an interactive run-expectancy matrix using Mathematica (published on the Wolfram “Demonstrations” site here http://demonstrations.wolfram.com/RunExpectancyMatrixInBaseball/ ). When doing that, I basically hardcoded the system of equations, but thought to myself there must be a better and more flexible way; I didn’t really pursue it since it was more expedient to just hardcode it. Later, Joe Posnanski answered an email asking what would happen if baseball had 4 outs (I don’t have the link handy), and Tango took on the topic on his blog also,

http://tangotiger.com/index.php/site/comments/how-many-runs-would-we-have-with-4-out-innings . In the back of my mind, I was also thinking about better ways to express the Markov chains in my (American) football Markov chain code,

https://fivetwentyone.wordpress.com/2014/08/02/nfl-markov-1-of-n-a-basic-markov-chain/

So all of these led me to work on coding something that was more flexible. For me, the key to formulating the problem in a more flexible way was realizing that the expectation values of number of runs scored could be related not just to the transition from one state to the next, but also *the value* associated with that transition. Symbolically it looks like,

$$E_i = \sum_j T_{ij}\,\left(E_j + V_{ij}\right)$$

where $E_i$ is the expectation value for the state i, $E_j$ the expectation value for state j, $T_{ij}$ the probability to transition *from* state i *to* state j, and $V_{ij}$ the value associated with transitioning from state i to state j. For example, with a bases empty home run, you return to the same state, but there is a value of 1 associated with the transition.
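As a minimal worked case of this relation, consider a hypothetical toy game (an assumption made purely for illustration) with one out per inning and only two events: a home run with probability $p$ and an out with probability $1 - p$. There is a single live state, with expectation value $E$, so the sum has just two terms:

$$E = p\,(E + 1) + (1 - p)\,(0 + 0) \quad\Longrightarrow\quad E = \frac{p}{1 - p}$$

The home run returns to the same state with a value of 1, while the out moves to the end of the inning, which has expectation value 0 and a transition value of 0.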

I can write this in matrix form in the following way,

$$E = P\,E + \mathrm{diag}\left(P\,V^{T}\right)$$

where $P$ is the transition matrix (with elements $T_{ij}$, but renamed to avoid confusion with transposition), $V$ is the matrix of values associated with each transition, and the exponent $T$ denotes the transpose. So what my code does is generate the transition and value matrices, then use linear algebra algorithms to solve the set of equations above for the expectation value of runs scored from any given state. It optionally generates linear weights also, by computing the marginal run value per 27 outs for each event.

The code has the limitation that it doesn’t allow runners to take an extra base. So, for example, a double with a man on first results in a man on second and a man on third and no run scores. The reason for this is that I don’t see a straightforward way to model that in general. It is not so complicated for baseball with 3 bases, but suppose you are modeling baseball with 9 bases; does a runner score from first on a 7-base hit? Does he take 9th on a 6-base hit? I am thinking of adding some sort of take-a-base model, but I’m not sure how much time I want to put into that since the whole project is non-realistic anyway. My hope is that as long as your reference point is a 3-base, 3-out game without runners taking any bases, you can still learn something meaningful about how things change as you add outs or add bases.
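The transition-and-value formulation described above can be sketched in a few lines of Python. This is a minimal illustration written for this post's description, not the code from the repository: the function names `build_matrices` and `run_expectancy`, the bitmask encoding of the bases, and the choice to leave the end-of-inning state out of the matrices are all assumptions of the sketch.

```python
import numpy as np

def build_matrices(probs, nbases=3, nouts=3):
    """Return (P, V): transition probabilities and run values between live states.

    probs maps k -> probability of a k-base event, with k = 0 meaning an out.
    State index = base_bitmask * nouts + outs; bit 0 of the mask is first base.
    (Hypothetical helper for illustration only.)
    """
    mask = 2 ** nbases - 1
    n = (mask + 1) * nouts
    P = np.zeros((n, n))
    V = np.zeros((n, n))
    for bases in range(mask + 1):
        for outs in range(nouts):
            i = bases * nouts + outs
            for k, p in probs.items():
                if k == 0:
                    # An out leaves the runners in place. The final out ends the
                    # inning, an absorbing state worth 0 that is left out of P.
                    if outs + 1 < nouts:
                        P[i, i + 1] += p
                    continue
                # Every runner and the batter advance exactly k bases.
                shifted = (bases << k) | (1 << (k - 1))
                j = (shifted & mask) * nouts + outs
                P[i, j] += p
                # Runs scored = runners pushed past the last base.
                V[i, j] = bin(shifted >> nbases).count("1")
    return P, V

def run_expectancy(probs, nbases=3, nouts=3):
    """Solve (I - P) E = diag(P V^T) for the expected runs from each state."""
    P, V = build_matrices(probs, nbases, nouts)
    reward = (P * V).sum(axis=1)  # elementwise form of diag(P @ V.T)
    return np.linalg.solve(np.eye(len(P)) - P, reward)

probs = {0: 0.69, 1: 0.23, 2: 0.05, 3: 0.005, 4: 0.025}
E = run_expectancy(probs)
for idx in (0, 3, 12):
    bases, outs = divmod(idx, 3)
    print(idx, f"{bases:03b}_{outs:02d}", round(E[idx], 6))
```

Under these assumptions the bases-empty, no-out value comes out at about 0.3693 runs per inning for the default probabilities, which agrees with the first row of the default output shown below.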

Along those lines, here are some examples and some results.

The input to the code is through command-line arguments of the form -pX f, where X is the number of bases and f is the probability (a float). For example, -p1 0.23 means the probability for a single is 0.23, -p4 0.025 means the probability for a 4-base hit (a home run) is 0.025, etc. As long as runners don’t take a base, there is no distinction between a single and a walk, so -p1 is actually the sum of the probabilities for those two events. -p0 gives the probability for an out. If the probabilities given on the command line don’t add up to 1, the code rescales them so that they do.
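The rescaling step can be sketched as follows. This is only an illustration of the idea; the function name `rescale` and the dictionary keyed by number of bases are assumptions, not names taken from the repository.

```python
def rescale(probs):
    """Divide every event probability by the total so that they sum to 1.

    probs maps the number of bases X (0 for an out) to the value given
    with -pX. Hypothetical helper for illustration only.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    return {k: p / total for k, p in probs.items()}

# Inputs that sum to 2 are scaled down by a factor of 2.
scaled = rescale({0: 1.0, 1: 0.5, 2: 0.5})
print(scaled)
```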

This first example shows the defaults, with 3 bases and 3 outs.

python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025

prob 0 0.69
prob 1 0.23
prob 2 0.05
prob 3 0.005
prob 4 0.025
idx state expectRunsPerInn
0 000_00 0.369329
1 000_01 0.198906
2 000_02 0.076208
3 001_00 0.660603
4 001_01 0.380633
5 001_02 0.156507
6 010_00 0.814148
7 010_01 0.503800
8 010_02 0.227508
9 011_00 1.105423
10 011_01 0.685527
11 011_02 0.307807
12 100_00 1.040820
13 100_01 0.722806
14 100_02 0.386208
15 101_00 1.332094
16 101_01 0.904533
17 101_02 0.466507
18 110_00 1.485639
19 110_01 1.027700
20 110_02 0.537508
21 111_00 1.776914
22 111_01 1.209427
23 111_02 0.617807

Here is the same, but generating linear weights also,

python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1

.
.
.
LW: 1 +0.3231
LW: 2 +0.6270
LW: 3 +1.0299
LW: 4 +1.5281
LW: 0 -0.2159

which says the linear weight for a single is 0.323, for a double 0.627, …, for an out, -0.216.
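One way to read “marginal run value per 27 outs” is as the derivative of runs per 27 outs with respect to the number of times each event occurs per 27 outs. The sketch below takes that reading and estimates the derivative with a small finite difference. It is an assumption about the method, not the repository's implementation: the names `runs_per_inning` and `linear_weights`, the step size `eps`, and counting a game as 27 / nouts innings are all choices made for the illustration.

```python
import numpy as np

def runs_per_inning(probs, nbases=3, nouts=3):
    """Expected runs from bases empty and no outs, with no extra-base advancement."""
    mask = 2 ** nbases - 1
    n = (mask + 1) * nouts
    P = np.zeros((n, n))
    reward = np.zeros(n)  # expected runs scored on the next event from each state
    for bases in range(mask + 1):
        for outs in range(nouts):
            i = bases * nouts + outs
            for k, p in probs.items():
                if k == 0:
                    if outs + 1 < nouts:  # the final out ends the inning
                        P[i, i + 1] += p
                else:
                    shifted = (bases << k) | (1 << (k - 1))
                    P[i, (shifted & mask) * nouts + outs] += p
                    reward[i] += p * bin(shifted >> nbases).count("1")
    return np.linalg.solve(np.eye(n) - P, reward)[0]

def linear_weights(probs, nbases=3, nouts=3, outs_per_game=27, eps=1e-6):
    """Finite-difference change in runs per game when one event count is nudged."""
    # Expected number of each event per outs_per_game outs.
    counts = {k: outs_per_game * p / probs[0] for k, p in probs.items()}
    innings = outs_per_game / nouts  # assumption: a game is 27 / nouts innings

    def runs_per_game(c):
        total = sum(c.values())
        rescaled = {k: v / total for k, v in c.items()}
        return innings * runs_per_inning(rescaled, nbases, nouts)

    base = runs_per_game(counts)
    weights = {}
    for k in counts:
        bumped = dict(counts)
        bumped[k] += eps
        weights[k] = (runs_per_game(bumped) - base) / eps
    return weights

lw = linear_weights({0: 0.69, 1: 0.23, 2: 0.05, 3: 0.005, 4: 0.025})
for k, w in lw.items():
    print(f"LW: {k} {w:+.4f}")
```

As a check on this reading, with walks and strikeouts only (-p0 0.5 -p1 0.5) the sketch gives roughly +0.656 for a 1-base event and -0.656 for an out.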

Here is what happens if I add an out,

python ./mlbMarkov_nm.py -nbases 3 -nouts 4 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1

idx state expectRunsPerInn
0 000_00 0.585274
.
.
LW: 1 +0.5477
LW: 2 +1.0017
LW: 3 +1.5437
LW: 4 +2.1174
LW: 0 -0.3430

Here is what happens if I add a base, and make the probability for a “quadruple” 0,

python ./mlbMarkov_nm.py -nbases 4 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.00 -p5 0.025 -doLinearWeights 1

idx state expectRunsPerInn
0 0000_00 0.280176
.
.
LW: 1 +0.2144
LW: 2 +0.4126
LW: 3 +0.7166
LW: 4 +1.1195
LW: 5 +1.6177
LW: 0 -0.1652

Here is baseball with walks and strikeouts only,

python ./mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.5 -p1 0.5 -p2 0.0 -p3 0 -p4 0 -doLinearWeights 1

idx state expectRunsPerInn
0 000_00 0.937500
.
.
LW: 1 +0.6563
LW: 2 +1.1562
LW: 3 +1.6146
LW: 4 +1.9062
LW: 0 -0.6563

I’ll post some additional examples later if I think of anything interesting.