The availability of tracking data has been an exciting development in baseball statistics and analytics over the last several years. Most significantly this includes PITCHf/x and, more recently, batted-ball data from MLBAM Statcast. The data I am working with here are a limited set from the HITf/x system, from April 2009, but it is straightforward to extend the ideas outlined here to a larger data set from Statcast. I have smoothed the data with an ad hoc Gaussian kernel. The batted-ball data include measured quantities such as launch angle off the bat, speed off the bat, hang time, and distance. A natural question to ask is how the outcome, specifically the probability that a batted ball is a hit as opposed to an out, depends on launch angle and speed. This presents the challenge of representing three variables in a two-dimensional graphic. One way of doing this is with a heatmap, i.e., using a 2D plane for the explanatory variables and color for the outcome. Here’s an example:
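Apart from the figure, the quantity behind a heatmap like this can be sketched in a few lines of plain JavaScript. This is only a minimal sketch of one way to do Gaussian kernel smoothing (a kernel-weighted average of hit/out outcomes), not necessarily the ad hoc smoothing used for the graphics in this post. The record shape `{ angle, speed, hit }`, the function names, and the default bandwidths are all assumptions made for illustration.

```javascript
// Hypothetical record shape: { angle: degrees, speed: mph, hit: 0 or 1 }.
// The default bandwidths below are illustrative assumptions, not values
// taken from the original analysis.

// Unnormalized Gaussian weight for a distance d and a given bandwidth.
function gaussianWeight(d, bandwidth) {
  return Math.exp(-0.5 * (d / bandwidth) ** 2);
}

// Kernel-weighted hit probability at a single (angle, speed) point.
function smoothedHitProbability(balls, angle, speed, angleBw = 3, speedBw = 3) {
  let weightSum = 0;
  let hitSum = 0;
  for (const b of balls) {
    const w =
      gaussianWeight(b.angle - angle, angleBw) *
      gaussianWeight(b.speed - speed, speedBw);
    weightSum += w;
    hitSum += w * b.hit;
  }
  // NaN marks grid points with no usable data nearby.
  return weightSum > 0 ? hitSum / weightSum : NaN;
}

// Evaluate on a regular grid: one row per launch angle, one column per speed.
function hitProbabilityGrid(balls, angles, speeds) {
  return angles.map((a) =>
    speeds.map((s) => smoothedHitProbability(balls, a, s))
  );
}
```

Each cell of the resulting grid would then be mapped to a color to draw the heatmap. Wider bandwidths give a smoother but blurrier picture, which matters for a sample as small as one month of data.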
Data visualization research suggests that spatial separation and length are the most effective ways of showing quantitative comparisons, and in particular that color is better suited to categorical variables than to quantitative ones. My goal here is to explore an alternative to a heatmap that uses a line graph instead of color to show the quantitative dependence of batting average on launch speed. One complication is that the way batting average changes with launch angle depends on launch speed, which gives the data interesting structure in the launch-angle/launch-speed plane. To try to keep this information, I came up with the idea of brushing on the launch angle variable: highlighting a given value of launch angle along with its few neighboring values, to show the gradient in the launch angle direction. The result looks like this:
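The data handling behind this kind of brushing can be sketched as follows. This is a rough illustration in plain JavaScript under assumed names, not the code behind the graphic: it supposes the smoothed batting averages sit in a hypothetical `grid[i][j]` indexed by launch angle bin `i` and launch speed bin `j`, turns each angle row into a curve, and then picks the highlighted curve plus a few neighbors on either side.

```javascript
// grid[i][j]: batting average at angles[i] and speeds[j] (hypothetical layout).
// Returns one curve per launch angle: batting average as a function of speed.
function curvesByAngle(grid, angles, speeds) {
  return angles.map((angle, i) => ({
    angle,
    points: speeds.map((speed, j) => ({ speed, avg: grid[i][j] })),
  }));
}

// Select the brushed curve and up to halfWidth neighbors on each side.
// Each selected curve gets a signed `offset` (in bins) from the brushed angle,
// so neighbors below and above can be styled differently.
function brushedCurves(curves, index, halfWidth = 2) {
  const lo = Math.max(0, index - halfWidth);
  const hi = Math.min(curves.length - 1, index + halfWidth);
  return curves
    .slice(lo, hi + 1)
    .map((curve, k) => ({ ...curve, offset: lo + k - index }));
}
```

Drawing only the selected curves, with the `offset` controlling their styling, is what lets a line graph hint at how the curve shifts as launch angle changes. The neighborhood size (`halfWidth`) is an illustrative choice.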
The idea is that you can mouse over the bar on the left-hand side to highlight a particular value of the launch angle, and the blue-to-red color variation shows how the curve changes at adjacent values. The graphic and the source code, which uses d3.js, are available on my GitHub page. I also have a version that uses hang time and distance as the variables, as shown below, and one that uses batted-ball wOBA instead of batted-ball batting average.
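For readers curious how the mouseover interaction might be wired up, here is a small sketch in plain JavaScript. It is not taken from the actual source on GitHub. The function names are hypothetical, and both the bar orientation (pointer position measured from one end of the bar) and the choice of which side of the highlighted angle is blue and which is red are assumptions.

```javascript
// Map the pointer's position along the bar (0..barLength) to an angle bin index.
function angleIndexFromPointer(position, barLength, nAngles) {
  const t = Math.min(Math.max(position / barLength, 0), 1); // clamp to [0, 1]
  return Math.min(nAngles - 1, Math.floor(t * nAngles));
}

// Map a signed bin offset in [-halfWidth, halfWidth] to a CSS color running
// from blue (most negative offset) to red (most positive offset).
function offsetColor(offset, halfWidth) {
  const t = halfWidth === 0 ? 0.5 : (offset + halfWidth) / (2 * halfWidth);
  const red = Math.round(255 * t);
  const blue = Math.round(255 * (1 - t));
  return `rgb(${red}, 0, ${blue})`;
}

// Example: a pointer 55 px along a 100 px bar with 10 angle bins.
const brushedIndex = angleIndexFromPointer(55, 100, 10);
const neighborColors = [-2, -1, 0, 1, 2].map((o) => offsetColor(o, 2));
```

In a d3.js page, the index would typically be recomputed in a mouse event handler on the bar, with the colors applied as the stroke of each highlighted curve.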