Graphicacy major league dataviz challenge – my entry and the winners

As I mentioned in my previous post, I submitted an entry to the Graphicacy major league data challenge, and I want to share my entry here.

I will describe my entry in more detail below, but here is a screenshot. A fully interactive version is available here. My entry was awarded an honorable mention, which puts it somewhere in the top 10, I think in the top 7.


Just to clarify what my chart is showing: I left my y-axes unlabeled to make the visualization less cluttered (and because of time constraints), but my units are always “best of all time”. So when you hover on Rickey Henderson – Steals, you will see that the thin blue line goes all the way to the top (1 in units of best of all time), and his bar chart in the top left, which plots individual year totals as a function of age, also goes to the top for the 1982 season (when he was 23). In the bar-chart inset for Babe Ruth – Home Runs shown in the screenshot, you can see he had 3 seasons (1920, 1921, 1927) where he was about 0.8 in units of “best of all time”. You can also see the big dip he had in 1925, when he had health problems of some kind.
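To make the “best of all time” unit concrete, here is a minimal Python sketch of the scaling, not the code behind the chart (which was written in D3.js). The function name `to_best_of_all_time` is made up for illustration. The home run totals are the well-known season figures (54, 59 and 60 for Ruth, against a single-season record of 73); they are typed in by hand, not pulled from my data.

```python
def to_best_of_all_time(values, all_time_best):
    """Scale raw totals so that the all-time best value maps to 1.0."""
    if all_time_best <= 0:
        raise ValueError("all_time_best must be positive")
    return [v / all_time_best for v in values]


# Illustrative input: Babe Ruth's home run totals in three seasons.
ruth_hr = {1920: 54, 1921: 59, 1927: 60}
SINGLE_SEASON_HR_RECORD = 73  # assumed denominator for the season bar chart

scaled = dict(
    zip(ruth_hr, to_best_of_all_time(ruth_hr.values(), SINGLE_SEASON_HR_RECORD))
)

for year, value in scaled.items():
    print(year, round(value, 2))
```

All three seasons land at roughly 0.8, which matches the bars in the inset. The same scaling with a career record as the denominator would give the thin cumulative lines.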

The contest was to visualize the careers of the top players in major league baseball history. The rules were open-ended as far as which players to include. They offered a list, but also said that if you had some criteria you wanted to apply and could justify them, then go ahead. I took the top 20 batters, ordered by baseball-reference rWAR, including only WAR accumulated from 1901 to the present. In practice this means I computed career rWAR myself, using my copy of the baseball-reference daily-updated WAR database, and joined that with season-by-season statistics from the Lahman database. The original announcement said one should visualize the careers of the top 10 batters and top 10 pitchers, but after working on my design, I decided to use the top 20 batters. I wrote my visualization from scratch using D3.js.
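The selection step can be sketched in a few lines of Python. This is only an illustration of the logic (sum WAR from 1901 onward, rank, keep the top N, then pull the matching season rows). The field names `player_id`, `year`, `war` and `hr`, the function names, and the tiny input rows are all made up; they are not the actual baseball-reference or Lahman schemas.

```python
from collections import defaultdict


def top_batters_by_war(war_rows, n=20, first_year=1901):
    """Rank players by WAR summed over seasons from first_year onward."""
    totals = defaultdict(float)
    for row in war_rows:
        if row["year"] >= first_year:  # ignore seasons before the cutoff
            totals[row["player_id"]] += row["war"]
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]


def seasons_for(player_ids, season_rows):
    """Keep only the season-by-season rows for the selected players."""
    keep = set(player_ids)
    return [row for row in season_rows if row["player_id"] in keep]


# Made-up rows, for illustration only.
war_rows = [
    {"player_id": "a", "year": 1899, "war": 5.0},  # dropped: before 1901
    {"player_id": "a", "year": 1902, "war": 3.0},
    {"player_id": "b", "year": 1905, "war": 4.0},
    {"player_id": "c", "year": 1910, "war": 1.0},
]
season_rows = [
    {"player_id": "a", "year": 1902, "hr": 2},
    {"player_id": "b", "year": 1905, "hr": 7},
    {"player_id": "c", "year": 1910, "hr": 1},
]

top = top_batters_by_war(war_rows, n=2)
top_seasons = seasons_for(top, season_rows)
print(top)
```

Note that player "a" ranks below "b" here only because the 1899 season is excluded, which is the effect of the 1901 cutoff.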

My first idea was to do a “mountains out of molehills” type chart, e.g.


My concern with that was that it’d be difficult to make comparisons from player to player, but it would be interesting to try it. A stacked-area or stream chart would also look pretty cool, I think, e.g.,


but it would be challenging to make it actually informative, because there are so many statistical categories and such a large span of time.

My next design was a complicated series of grouped and stacked bar charts, kind of a mash-up of these two,


I got about this far with that idea,


If it’s not obvious, that’s WAR and PA, by age, for my top 10 batters (can you figure out who is who?). My next step was to make grouped bars for hits (singles, doubles, triples, home runs) and run-related stats (runs and RBIs), then add some sorting options, and then repeat for 10 pitchers, but it seemed like the result was going to end up being informative but kind of dry. I happened to be looking at Edward Tufte’s Visual Display of Quantitative Information, which features the Paris to Lyon train schedule on the cover,


It’s a really cool looking visualization and I wanted to go for a similar aesthetic: densely packed information laid over a fine grid, with some interactive elements for highlighting. After making my design, I decided that a fine grid of years/percentiles would make it too busy, so I didn’t add that part.

The gold, silver and bronze winners are listed at the Graphicacy website, here (note: they have changed the URL; I updated this link on 06/01/2016). One feature I really like that the winners have in common is including a picture of the players.

The gold medal winner had a similar layout to mine, using thin lines to represent the quantities. One difference is that it uses a drop-down menu to isolate one stat, instead of showing all of them and using mouse hover (via Voronoi tessellation) to highlight one stat. Both designs have merit, I think. It also includes a number of different viewing options, like shifting the x-axis from calendar year to career year, and shifting from cumulative to year-by-year, which are great additions. It also explicitly allows comparison between two individuals, which is a really nice touch.
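For anyone unfamiliar with the Voronoi hover trick: each data point owns the region of the screen that is closer to it than to any other point, so finding the Voronoi cell under the cursor is the same as finding the nearest point. The Python sketch below shows that idea with a brute-force search. It is not the D3.js code from my chart, and the function name, pixel coordinates and labels are made up for illustration.

```python
import math


def nearest_point(cursor, points):
    """Return the label of the point closest to the cursor.

    points is a list of (x, y, label) tuples in screen coordinates.
    The winner is the point whose Voronoi cell contains the cursor.
    """
    cx, cy = cursor
    closest = min(points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    return closest[2]


# Hypothetical screen positions for points on three different lines.
points = [
    (10, 10, "Rickey Henderson – Steals"),
    (10, 50, "Babe Ruth – Home Runs"),
    (60, 30, "Ty Cobb – Hits"),
]

print(nearest_point((12, 40), points))
```

A precomputed tessellation gives the same answer without scanning every point on each mouse move, which matters when the chart has thousands of points.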

The silver medal winner is quite different and has a lot of bells and whistles. They focus on WAR and use text to highlight years in which a player led the league (or “black ink”). They also show a little icon of a trophy to highlight years in which a player’s team won the championship, which is a really clever addition.

The bronze medal winner is a static PNG file that basically uses stacked area charts, colored by player.


