NFL Markov: 4 of n (the yards gained distributions (transition matrix))

In football, a basic state consists of a set of down-distance-yardline values. One could also include score differential or time remaining, I suppose, but I’m not considering those here. The transition matrix can be built from the yards-gained distribution, along with the probabilities of running a play as opposed to punting or attempting a field goal. So the main questions I want to address here are:

  • what are the probabilities for passing versus running versus kicking?
  • given that a play is run, what is the yards gained distribution?
    I am ignoring some context-specific distinctions such as time on the clock, score differential, and weather. This is very much in the same vein as something like RE24 (from the more developed field of baseball analytics), in that it accounts for game state but is otherwise context neutral.

    With that being said, it is not immediately obvious what the dependence of the yards-gained distribution on down, distance, and yardline should be. So my starting point was to grab some data and start slicing and dicing and making some graphs. To clarify, yardline is encoded as “yards-from-own-goal”, abbreviated yfog, where 1 means you are backed up against your end zone, and 99 means you are 1 yard away from scoring a touchdown.
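To make the state encoding concrete, here is a minimal sketch of how a (down, yards-to-go, yfog) state could move to the next state once the yards gained on a play are known. The names `State`, `first_down_at`, and `next_state` are hypothetical, and the sketch assumes a plain scrimmage play: no interceptions, fumbles, penalties, or kicks.

```python
from typing import NamedTuple, Optional, Tuple


class State(NamedTuple):
    down: int   # 1 through 4
    togo: int   # yards needed for a first down
    yfog: int   # yards from own goal, 1..99


def first_down_at(yfog: int) -> State:
    """1st and 10, or 1st and goal when fewer than 10 yards remain."""
    return State(down=1, togo=min(10, 100 - yfog), yfog=yfog)


def next_state(s: State, yards: int) -> Tuple[str, Optional[State]]:
    """Outcome of a scrimmage play that gains `yards` (negative for a loss).

    Returns (label, state). The label is 'touchdown', 'safety', 'offense'
    (same team keeps the ball) or 'defense' (turnover on downs, with the
    state written from the new offense's point of view).
    Assumption: no fumbles, interceptions, penalties, or kicks.
    """
    new_yfog = s.yfog + yards
    if new_yfog >= 100:
        return ("touchdown", None)
    if new_yfog <= 0:
        return ("safety", None)
    if yards >= s.togo:
        return ("offense", first_down_at(new_yfog))
    if s.down == 4:
        # Failed fourth down: flip the field for the other team.
        return ("defense", first_down_at(100 - new_yfog))
    return ("offense", State(s.down + 1, s.togo - yards, new_yfog))


print(next_state(State(1, 10, 20), 4))
# ('offense', State(down=2, togo=6, yfog=24))
```

A row of the transition matrix for a given state would then come from weighting these outcomes by the yards-gained probabilities and by the probabilities of each play choice.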

  • Between, say, yfog=20 and yfog=75, mean yards gained doesn’t depend strongly on field position (Fig. \ref{fignm1}).
  • For plays originating between yfog=20 and yfog=75, mean yards gained doesn’t depend strongly on yards-to-go (Fig. \ref{fignm2}).
  • The fraction of plays that are passes depends strongly on down and distance, and less strongly on field position (Figs. \ref{fignm3} \& \ref{fignm5}).
  • On fourth down, the probabilities to punt versus try a field goal versus go for it vary rapidly as a function of yards-to-go and yards-from-own-goal. This is particularly true for yfog \sim 50 to yfog \sim 80 (Figs. \ref{figep1}, \ref{figep2}, and \ref{figep3}).
    So now the question is: how can we model the yards-gained distribution? Since mean yards gained doesn’t depend too strongly on down, distance, and field position, it is instructive to pool a bunch of states and look in more detail at the distribution to get a feel for its shape. In Fig. \ref{fignm6}, I show distributions for passes (left) and rushes (right), for all first-down plays originating between yfog=20 and yfog=75.

    After playing around with some functions, the best general agreement I could find came from the following,

    p(y) = A ~\frac{e^{(y-y_0)/\sigma_1}} {1+e^{(y-y_0) (\sigma_1+\sigma_2)/(\sigma_1 \sigma_2)}} + G ~e^{- (y-g_0)^2/ (2 \sigma_g^2)}.

    I refer to this as a “Bazin plus Gauss” function: Bazin because I first encountered the “ratio of exponentials” in a paper by Bazin et al. that used it to model supernova light curves (http://arxiv.org/abs/1109.0948), and Gauss for obvious reasons. The first (Bazin) term basically stitches together two exponentials at the location y_0. For y \ll y_0, it looks like a rising exponential with a scale factor \sigma_1, and for y \gg y_0, like a declining exponential with a scale factor \sigma_2. The Gaussian part describes being sacked, and in the football application, G is identically 0 for rushes.
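The functional form can be written out directly. This is a minimal sketch, not the code used for the post: the function names are hypothetical, and the Bazin term is evaluated in log space (via a stable log(1 + e^x)) purely to avoid overflow at large |y|, which is an implementation choice of this sketch.

```python
import math


def softplus(x):
    """log(1 + e^x), written so that exp() never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bazin(y, y0, s1, s2):
    """e^{(y-y0)/s1} / (1 + e^{(y-y0)(s1+s2)/(s1 s2)})."""
    a = (y - y0) / s1
    b = (y - y0) * (s1 + s2) / (s1 * s2)
    return math.exp(a - softplus(b))


def bazin_plus_gauss(y, A, y0, s1, s2, G=0.0, g0=0.0, sg=1.0):
    """A * Bazin term + G * Gaussian term; G = 0 gives the rushing form."""
    gauss = G * math.exp(-((y - g0) ** 2) / (2.0 * sg ** 2))
    return A * bazin(y, y0, s1, s2) + gauss


# Far below y0 the curve rises by a factor e per s1 yards;
# far above y0 it falls by a factor e per s2 yards.
y0, s1, s2 = 1.0, 1.5, 3.5
rise = bazin(y0 - 30.0, y0, s1, s2) / bazin(y0 - 30.0 - s1, y0, s1, s2)
fall = bazin(y0 + 60.0, y0, s1, s2) / bazin(y0 + 60.0 + s2, y0, s1, s2)
print(rise, fall, math.e)
```

At y = y_0 the Bazin term equals exactly 1/2, since numerator and denominator are 1 and 2 there.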

    With this functional form, I use the function-minimization package pyminuit to determine the maximum-likelihood values of the parameters: y_0, \sigma_1, and \sigma_2 for rushes, and additionally G/A, g_0, and \sigma_g for passes. For rushes, typical values are y_0 \sim 1, \sigma_1 \sim 1.5, \sigma_2 \sim 3.5. For passes, typical values are y_0 \sim 4.5, \sigma_1 \sim 1.8, \sigma_2 \sim 8.0, G/A \sim 0.12, g_0 \sim -6.5, and \sigma_g \sim 3.0. Figs. \ref{figye1} \& \ref{figym1} compare the model to the empirical distribution for 1st and 10 plays from the 20.
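As a rough illustration of what such a fit minimizes, here is a sketch of a negative log-likelihood for binned yards-gained counts. It is not the pyminuit code: it only defines the objective that a minimizer would be handed. Several things here are assumptions of the sketch, not statements from the fit above: yards are treated as integers on a truncated range (y_min to y_max), the overall amplitude A is fixed by normalizing over that range (so only G/A appears), and all function names are hypothetical.

```python
import math


def softplus(x):
    """log(1 + e^x), written so that exp() never overflows."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def shape(y, y0, s1, s2, g_over_a=0.0, g0=0.0, sg=1.0):
    """p(y)/A: the Bazin term plus (G/A) times the Gaussian term."""
    a = (y - y0) / s1
    b = (y - y0) * (s1 + s2) / (s1 * s2)
    gauss = g_over_a * math.exp(-((y - g0) ** 2) / (2.0 * sg ** 2))
    return math.exp(a - softplus(b)) + gauss


def pmf(params, y_min=-20, y_max=80):
    """Normalize the shape over integer yards in [y_min, y_max] (assumed range)."""
    ys = range(y_min, y_max + 1)
    weights = [shape(y, *params) for y in ys]
    total = sum(weights)
    return {y: w / total for y, w in zip(ys, weights)}


def neg_log_likelihood(params, counts, y_min=-20, y_max=80):
    """-sum_y n_y log p(y) for counts given as {yards: number of plays}."""
    p = pmf(params, y_min, y_max)
    return -sum(n * math.log(p[y]) for y, n in counts.items() if n > 0)


# Toy check with made-up data: counts generated from the typical rushing
# values quoted above, then a coarse scan over y0 with the scales held fixed.
true_rush = (1.0, 1.5, 3.5)
counts = {y: round(10000 * p) for y, p in pmf(true_rush).items()}
scan = {y0: neg_log_likelihood((y0, 1.5, 3.5), counts)
        for y0 in (0.0, 0.5, 1.0, 1.5, 2.0)}
best_y0 = min(scan, key=scan.get)
print(best_y0)
```

In a real fit, `neg_log_likelihood` would be passed to a minimizer with all parameters free; the coarse scan is only there to show that the objective is smallest near the values used to generate the toy counts.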

    In the next section I will describe how my model is implemented in code.

    [Figures: modelYardsDist, empYardsDist, empProbs1, empProbs2, empProbs3, nflMarkov1, nflMarkov2, nflMarkov3, nflMarkov4, nflMarkov6, nflMarkov7]
