baseball with n outs… and m bases.

This post describes some work I did on modeling baseball with n outs and m bases. The code to generate the run-expectancy matrix and the linear weights is available on GitHub here,
https://github.com/bdilday/mlbMarkov_nm

This work came from a few different directions. A few months back I made an interactive run-expectancy matrix using Mathematica (published on the Wolfram “demonstrations” site here http://demonstrations.wolfram.com/RunExpectancyMatrixInBaseball/ ). When doing that, I basically hardcoded the system of equations, but thought to myself there must be a better and more flexible way; I didn’t really pursue it since it was more expedient to just hardcode it. Later, Joe Posnanski answered an email asking what would happen if baseball had 4 outs (I don’t have the link handy), and Tango took on the topic on his blog as well,
http://tangotiger.com/index.php/site/comments/how-many-runs-would-we-have-with-4-out-innings . In the back of my mind I was also thinking about better ways to express the Markov chains in my (American) football Markov chain code,
https://fivetwentyone.wordpress.com/2014/08/02/nfl-markov-1-of-n-a-basic-markov-chain/

So all of these led me to work on coding something that was more flexible. For me, the key to formulating the problem in a more flexible way was realizing that the expectation values of number of runs scored could be related not just to the transition from one state to the next, but also the value associated with that transition. Symbolically it looks like,

e_i = \sum_j e_j T_{ji} + \sum_j V_{ji} T_{ji}

where e_i is the expectation value for the state i, e_j the expectation value for state j, T_{ji} the probability to transition from state i to state j, and V_{ji} the value associated with transitioning from state i to state j. For example, with a bases empty home run, you return to the same state, but there is a value of 1 associated with the transition.
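
As a small worked illustration (the two-event league here is my assumption for the sake of the example, not one of the configurations run below): suppose the only events are outs, with probability p_0, and home runs, with probability p_4 = 1 - p_0, and take the bases-empty state with one out remaining in the inning. A home run returns to the same state with value 1, and an out ends the inning, which has expectation 0. The relation above then reduces to

e = (e + 1) p_4

so that

e = p_4 / (1 - p_4) = p_4 / p_0

With p_4 = 0.025 this gives e of about 0.0256 runs.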

I can write this in a matrix form in the following way,

\vec{e} = \vec{e} \cdot \vec{M_T} + \mathrm{diag}(\vec{M_V}^{T} \cdot \vec{M_T}) \\ = (\vec{M_T})^{T} \cdot \vec{e} + \mathrm{diag}(\vec{M_V}^{T} \cdot \vec{M_T})

where \vec{M_T} is the transition matrix (with elements T_{ij}, but renamed to avoid confusion with transposition), \vec{M_V} is the matrix of values associated with each transition, and the T superscript denotes the transpose. So what my code does is generate the transition and value matrices, then use linear algebra routines to solve the set of equations above for the expectation value of runs scored from any given state. It optionally generates linear weights as well, by computing the marginal run value per 27 outs for each event.

The code has the limitation that it doesn’t allow runners to take an extra base. So, for example, a double with a man on first results in a man on second and a man on third, and no run scores. The reason for this is that I don’t see a straightforward way to model that in general. It is not so complicated for baseball with 3 bases, but suppose you are modeling baseball with 9 bases; does a runner score from first on a 7-base hit? Does he take 9th on a 6-base hit? I am thinking of adding some sort of take-a-base model, but I’m not sure how much time I want to put into that since the whole project is unrealistic anyway. My hope is that as long as your reference point is a 3-base, 3-out game without runners taking any extra bases, you can still learn something meaningful about how things change as you add outs or add bases.
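
To make the solve step concrete, here is a minimal sketch of the idea in Python with NumPy. It is not the code from the repository: the function name run_expectancy, the bitmask encoding of the bases, and the state ordering are my assumptions for illustration. It builds T and V under the no-extra-base rule, forms diag(M_V^T M_T), and solves the linear system for e.

```python
import numpy as np

def run_expectancy(nbases, nouts, probs):
    """Hypothetical sketch. probs[k] is the probability of a k-base event;
    probs[0] is the probability of an out."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                      # rescale so the probabilities sum to 1
    mask = 2 ** nbases - 1               # bit (b - 1) set means a runner on base b
    nstates = (2 ** nbases) * nouts      # the end-of-inning state is left out (e = 0)
    T = np.zeros((nstates, nstates))     # T[j, i]: probability of going from i to j
    V = np.zeros((nstates, nstates))     # V[j, i]: runs scored on that transition
    for bases in range(2 ** nbases):
        for outs in range(nouts):
            i = bases * nouts + outs
            # An out keeps the runners in place; the final out ends the inning.
            if outs + 1 < nouts:
                T[bases * nouts + outs + 1, i] += p[0]
            for k in range(1, len(p)):
                if p[k] == 0.0:
                    continue
                # Every runner moves exactly k bases; the batter lands on base k.
                shifted = (bases << k) | (1 << (k - 1))
                runs = bin(shifted >> nbases).count("1")  # anyone past the last base scores
                j = (shifted & mask) * nouts + outs
                T[j, i] += p[k]
                V[j, i] = runs
    r = (V * T).sum(axis=0)              # r[i] = sum_j V[j, i] T[j, i] = diag(V^T T)[i]
    # e = T^T e + r  is equivalent to  (I - T^T) e = r
    return np.linalg.solve(np.eye(nstates) - T.T, r)
```

Called as run_expectancy(3, 3, [0.5, 0.5, 0, 0, 0]), the first entry (bases empty, no outs) comes out to 0.9375, which agrees with the walks-and-strikeouts example further down.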

Along those lines, here are some examples and some results.

 

The input to the code is through command line arguments of the form -pX f, where X is the number of bases for the event and f is its probability (a float); i.e. -p1 0.23 means the probability for a single is 0.23, -p4 0.025 means the probability for a 4-base hit (a home run) is 0.025, etc. As long as runners don’t take an extra base, there is no distinction between a single and a walk, so -p1 is actually the sum of the probabilities for those two events. -p0 gives the probability for an out. If the probabilities given on the command line don’t add up to 1, the code rescales them so that they do.

This first example shows the defaults, with 3 bases and 3 outs.


python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025
prob 0 0.69
prob 1 0.23
prob 2 0.05
prob 3 0.005
prob 4 0.025
idx state expectRunsPerInn
0 000_00 0.369329
1 000_01 0.198906
2 000_02 0.076208
3 001_00 0.660603
4 001_01 0.380633
5 001_02 0.156507
6 010_00 0.814148
7 010_01 0.503800
8 010_02 0.227508
9 011_00 1.105423
10 011_01 0.685527
11 011_02 0.307807
12 100_00 1.040820
13 100_01 0.722806
14 100_02 0.386208
15 101_00 1.332094
16 101_01 0.904533
17 101_02 0.466507
18 110_00 1.485639
19 110_01 1.027700
20 110_02 0.537508
21 111_00 1.776914
22 111_01 1.209427
23 111_02 0.617807
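
To help read the table above: the state label appears to be the base occupancy written as a binary string, followed by the number of outs, with idx = bases * nouts + outs. Judging from the values, the rightmost digit is first base and the leftmost is third (a runner on third is worth the most). That reading is my inference from the output, and state_label below is a hypothetical helper, not a function from the repository.

```python
def state_label(idx, nbases, nouts):
    """Rebuild a label such as '001_00' from a row index (assumed encoding)."""
    bases, outs = divmod(idx, nouts)          # idx = bases * nouts + outs
    return f"{bases:0{nbases}b}_{outs:02d}"   # zero-padded binary bases, then outs

for idx in (3, 13, 23):
    print(idx, state_label(idx, nbases=3, nouts=3))
```

For the three rows printed here this gives 001_00, 100_01 and 111_02, the same labels as rows 3, 13 and 23 of the table.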

Here is the same, but generating linear weights also,


python mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1
.
.
.
LW: 1 +0.3231
LW: 2 +0.6270
LW: 3 +1.0299
LW: 4 +1.5281
LW: 0 -0.2159

which says the linear weight for a single is 0.323, for a double 0.627, …, for an out, -0.216.

Here is what happens if I add an out,


python ./mlbMarkov_nm.py -nbases 3 -nouts 4 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.025 -doLinearWeights 1
idx state expectRunsPerInn
0 000_00 0.585274
.
.
LW: 1 +0.5477
LW: 2 +1.0017
LW: 3 +1.5437
LW: 4 +2.1174
LW: 0 -0.3430

Here is what happens if I add a base, and make the probability for a “quadruple” 0,

python ./mlbMarkov_nm.py -nbases 4 -nouts 3 -p0 0.69 -p1 0.23 -p2 0.05 -p3 0.005 -p4 0.00 -p5 0.025 -doLinearWeights 1
idx state expectRunsPerInn
0 0000_00 0.280176
.
.
LW: 1 +0.2144
LW: 2 +0.4126
LW: 3 +0.7166
LW: 4 +1.1195
LW: 5 +1.6177
LW: 0 -0.1652

Here is baseball with walks and strikeouts only,

python ./mlbMarkov_nm.py -nbases 3 -nouts 3 -p0 0.5 -p1 0.5 -p2 0.0 -p3 0 -p4 0 -doLinearWeights 1
idx state expectRunsPerInn
0 000_00 0.937500
.
.
LW: 1 +0.6563
LW: 2 +1.1562
LW: 3 +1.6146
LW: 4 +1.9062
LW: 0 -0.6563

I’ll post some additional examples later if I think of anything interesting.
