Under the Knife: The effects of Tommy John surgery

3/15/2015 Matt Chambers

Lately I've been a little covered up with work and life in general, so I was ecstatic when Dash Davidson asked Matt Cobb and me to team up for a guest blog post. With newfound inspiration, we began to look at potential data sources, and Matt Cobb found a list of all pitchers who have had Tommy John surgery. Tommy John surgery is becoming fairly common for pitchers, but there is a misconception that the surgery actually enhances performance. Uninjured pitchers have even sought out the surgery in hopes of improving their performance.
Naturally, we wanted to see if there was any impact on key pitching statistics before and after surgery, but the data set we found only contained the list of pitchers who had the surgery and not their stats. However, I noticed that there was a column for mlbamid, which corresponds to the player_id used on mlb.com.

Now that I had a way to pair the list of pitchers who had surgery with their stats, I used import.io to scrape the data. import.io makes scraping pages easy by letting you extract table data directly from a page. I pointed import.io at mlb.com and used the player id to retrieve each player's page and extract the stats table.

Once this was working, I needed to run a batch search to find the stats of all of the pitchers. import.io does not currently support batch searches directly, but it provides a Google Sheet that adds this functionality by letting users paste in a list of URLs for batch processing.
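The list of URLs to paste into that sheet can be generated from the mlbamid column. Here is a minimal Python sketch of that step. The function name `build_player_urls` is made up for illustration, and the URL pattern is taken from the cURL example in this post, so it reflects mlb.com as it was in 2015 and may no longer resolve.

```python
from urllib.parse import urlencode

# URL pattern copied from the cURL example in this post (mlb.com, 2015).
BASE_URL = "http://mlb.com/team/player.jsp"


def build_player_urls(mlbam_ids):
    """Return one player-page URL per MLBAM id, skipping blanks and duplicates."""
    urls = []
    seen = set()
    for raw_id in mlbam_ids:
        player_id = str(raw_id).strip()
        if not player_id or player_id in seen:
            continue
        seen.add(player_id)
        urls.append(f"{BASE_URL}?{urlencode({'player_id': player_id})}")
    return urls


# 150359 is the id used in the cURL example; a real run would pass in the
# whole mlbamid column from the surgery list.
print("\n".join(build_player_urls(["150359"])))
```

The output is one URL per line, which is the shape the batch sheet expects.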

This worked great except for one issue. The API that import.io created was not able to follow the redirect that mlb.com issues for current players. For example, if you visit a URL for an inactive player, there is no redirect, and import.io handles this perfectly. However, for active players like A.J. Burnett, there is a redirect from mlb.com to pittsburgh.pirates.mlb.com, and the import.io API does not handle this properly. In order to follow the redirect and capture the URL, I built a script that runs the following cURL command:

curl -w "%{url_effective}\n" -L -s -S "http://mlb.com/team/player.jsp?player_id=150359"

This allows the redirected URL to be captured. Once I ran this for all active pitchers, I was able to run the resulting URLs through the batch search process and complete the data set.
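For anyone who would rather not shell out to cURL, the same idea can be sketched in Python. This is not the script I used, just a rough equivalent under a few assumptions: the helper names `resolve_final_url` and `resolve_all` are invented here, and the `opener` parameter exists only so the function can be exercised without hitting the network. The standard library's `urllib.request.urlopen` follows HTTP redirects by default and exposes the final address on the response.

```python
import urllib.request


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Follow redirects for `url` and return the address that finally answered."""
    with opener(url, timeout=timeout) as response:
        return response.url


def resolve_all(urls, opener=urllib.request.urlopen):
    """Map each URL to its final address, or None if the request failed."""
    resolved = {}
    for url in urls:
        try:
            resolved[url] = resolve_final_url(url, opener=opener)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            resolved[url] = None
    return resolved
```

Calling `resolve_all` on the list of player URLs gives a lookup from the original mlb.com address to the team-site address, and any URL that maps to `None` can be retried or checked by hand.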

With a complete data set combining all pitchers who have had Tommy John surgery with their career stats, we were finally able to start visualizing the data. After looking at several statistics for pitchers, we decided that walks plus hits per inning pitched (WHIP) was the fairest statistic because it counts only the walks and hits a pitcher gives up per inning, which the pitcher largely controls. Earned run average (ERA) is not always a fair statistic because it depends heavily on the defense behind the pitcher as well as the size of the pitcher’s home stadium.
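For reference, both statistics are simple ratios: WHIP is walks plus hits divided by innings pitched, and ERA is earned runs scaled to nine innings. The sketch below shows the arithmetic. The function names are made up for illustration, and the innings helper assumes the usual box-score notation where "6.1" means six and one-third innings and "6.2" means six and two-thirds.

```python
def innings_to_float(innings_pitched):
    """Convert box-score innings ("6.1" = 6 1/3, "6.2" = 6 2/3) to a number."""
    whole, _, outs = str(innings_pitched).partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"unexpected innings value: {innings_pitched!r}")
    return int(whole or 0) + outs / 3


def whip(walks, hits, innings_pitched):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings_to_float(innings_pitched)


def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_to_float(innings_pitched)
```

A pitcher with 50 walks and 150 hits over 200 innings has a WHIP of exactly 1.0. Lower is better for both numbers.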

We wanted to focus on building a visualization that easily lets a user compare an individual pitcher to the rest of the cohort and also see the overall differences in the statistics before and after surgery. We worked with Dave Lee on the design, and he built an awesome graphic for us to build the viz around.

As it turns out, both WHIP and ERA are higher after the surgery. Since lower is better for both statistics, the data shows no statistical advantage from having the surgery. The surgery simply allows pitchers to come back from an injury that was once a career-ender.
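The viz itself was built around the scraped data, but the before-and-after comparison for a single pitcher can be sketched in a few lines of Python. This is an illustration rather than the exact calculation behind the viz, and it rests on several assumptions: the dictionary keys "year", "bb", "h" and "ip" are hypothetical names, innings are already plain numbers, the surgery season itself is left out of both sides, and WHIP is pooled across seasons (total walks plus hits over total innings) instead of averaging season WHIPs.

```python
def split_whip(seasons, surgery_year):
    """Return (WHIP before surgery, WHIP after surgery) for one pitcher.

    `seasons` is an iterable of dicts with hypothetical keys "year", "bb",
    "h" and "ip" (innings already converted to a plain number).
    """
    totals = {"before": [0, 0.0], "after": [0, 0.0]}  # [walks + hits, innings]
    for season in seasons:
        if season["year"] == surgery_year:
            continue  # assumption: the surgery season belongs to neither side
        side = "before" if season["year"] < surgery_year else "after"
        totals[side][0] += season["bb"] + season["h"]
        totals[side][1] += season["ip"]

    def ratio(baserunners, innings):
        # None means the pitcher has no innings on that side of the surgery.
        return baserunners / innings if innings else None

    return ratio(*totals["before"]), ratio(*totals["after"])


# Made-up numbers for illustration only, not a real pitcher's stats.
example = [
    {"year": 2008, "bb": 40, "h": 160, "ip": 200.0},
    {"year": 2011, "bb": 60, "h": 180, "ip": 200.0},
]
print(split_whip(example, surgery_year=2010))  # (1.0, 1.2)
```

Pooling innings keeps a short injury-shortened season from counting as much as a full one, which matters for pitchers who only threw a handful of innings right before or after the operation.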
