Stats and baseball go hand in hand

Credit: Anna Boyle/Art Editor

The Carnegie Mellon University Baseball Analytics Workshop was held last Saturday, April 7. It was the first installment in what Carnegie Mellon’s Department of Statistics & Data Science hopes will be a series of sports analytics workshops. Around 60 participants of all ages gathered to learn some of baseball statisticians’ tools of the trade.

Mark Patterson, Director of the Quantitative Social Science Scholars Program at Carnegie Mellon, gave the participants an introduction to R, a statistical programming language, and RStudio, the development environment built around it, which together can be used to do a plethora of fun things with baseball data. Patterson walked the group through the Lahman package, an immense collection of season-by-season statistics for both players and teams, and showed a few ways to plot baseball metrics such as home runs, walks, and strikeouts.

After lunch, Jim Albert, a professor of mathematics and statistics at Bowling Green State University, gave the afternoon’s keynote speech on new developments in advanced statistical analysis in baseball, specifically what can be done with PitchFX and Statcast. PitchFX records data on every single pitch thrown in a baseball game, including variables like location, movement, and velocity. Statcast, introduced in 2015, tracks players and their movement on the field. Launch angle, exit velocity, and other “hot” statistics are now discussed more prominently because of Statcast’s integration into MLB stadiums.

Ron Yurko, a Ph.D. student in statistics at Carnegie Mellon, led the group through a more advanced exploration of RStudio. Because the Pittsburgh Pirates were due to play the Cincinnati Reds later that day, Yurko took a look at Reds first baseman Joey Votto’s statistics, creating a few different graphics in RStudio. He analyzed how Votto’s OPS and whiff rate in 2017 varied by pitch type, finding that Votto does incredibly well against sinkers and curveballs.
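
For readers curious how numbers like these are computed, here is a minimal sketch in Python (the workshop itself used R). It assumes whiff rate means swings that miss divided by total swings, and uses the standard definition of OPS as on-base percentage plus slugging percentage. The pitch records and function names are invented for illustration; they are not Votto’s data or the code shown at the workshop.

```python
from collections import defaultdict

# Hypothetical pitch-level records, invented for illustration only.
# Each record: (pitch_type, batter_swung, made_contact)
pitches = [
    ("sinker", True, True),
    ("sinker", True, True),
    ("sinker", False, False),
    ("curveball", True, True),
    ("curveball", True, False),
    ("slider", True, False),
    ("slider", True, False),
    ("slider", True, True),
    ("slider", False, False),
]


def whiff_rate_by_pitch_type(records):
    """Whiff rate per pitch type: swings that miss / total swings (assumed definition)."""
    swings = defaultdict(int)
    whiffs = defaultdict(int)
    for pitch_type, swung, contact in records:
        if swung:
            swings[pitch_type] += 1
            if not contact:
                whiffs[pitch_type] += 1
    # Only pitch types with at least one swing appear in the result.
    return {pt: whiffs[pt] / swings[pt] for pt in swings}


def ops(hits, walks, hbp, at_bats, sac_flies, total_bases):
    """OPS = on-base percentage + slugging percentage."""
    obp = (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)
    slg = total_bases / at_bats
    return obp + slg


rates = whiff_rate_by_pitch_type(pitches)
for pitch_type, rate in sorted(rates.items()):
    print(f"{pitch_type}: whiff rate {rate:.2f}")
```

A lower whiff rate against a pitch type means the batter misses less often when he swings at it; in practice the same grouping would be applied to a full season of pitch-tracking data rather than a handful of rows.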

Yurko also created a heat map of where Votto was most likely to swing at pitches. Because Votto has incredible discipline as a hitter, his heat map was hot inside the strike zone and cooled off sharply outside it. For comparison, Yurko showed the heat map of Reds center fielder Billy Hamilton, whose swing probabilities extended well outside the strike zone. Finally, Yurko brought up a spray chart of where Votto’s batted balls were most likely to land.
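
The idea behind a swing heat map can be sketched with simple binning: divide the area around the plate into a grid and compute the share of pitches in each cell that the batter swung at. The Python sketch below does just that. It is an assumption-laden simplification, not the method used at the workshop (which may well have used a smoothed model), and the pitch locations, grid edges, and function name are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical pitch locations, invented for illustration only.
# Each record: (horizontal position in feet, height in feet, batter_swung)
pitches = [
    (-0.5, 2.5, True),
    (-0.2, 2.0, True),
    (0.3, 3.0, True),
    (0.6, 2.2, False),
    (1.5, 2.5, False),
    (-1.6, 1.0, False),
    (0.4, 4.2, False),
]

# Assumed grid edges; a real analysis would choose these from the data.
x_edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
z_edges = [0.0, 1.5, 3.5, 5.0]


def swing_rate_grid(records, x_edges, z_edges):
    """Return swing rates on a grid (rows = height bins, columns = horizontal bins).

    Cells with no pitches are None. Pitches outside the grid are ignored.
    """
    n_cols = len(x_edges) - 1
    n_rows = len(z_edges) - 1
    swings = [[0] * n_cols for _ in range(n_rows)]
    totals = [[0] * n_cols for _ in range(n_rows)]
    for x, z, swung in records:
        col = bisect_right(x_edges, x) - 1
        row = bisect_right(z_edges, z) - 1
        if 0 <= col < n_cols and 0 <= row < n_rows:
            totals[row][col] += 1
            swings[row][col] += int(swung)
    return [
        [swings[r][c] / totals[r][c] if totals[r][c] else None for c in range(n_cols)]
        for r in range(n_rows)
    ]


grid = swing_rate_grid(pitches, x_edges, z_edges)
for row in reversed(grid):  # print the highest pitches first
    print(row)
```

A disciplined hitter would show high values in the cells covering the strike zone and values near zero elsewhere; a free swinger’s high values would spill into the outer cells.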

After the presentation, the group traveled to PNC Park for a pregame Q&A session with Pirates Director of Baseball Informatics Dan Fox and his analytics staff, where they answered many questions concerning their day-to-day workflow, resume advice, and personal opinions on baseball statistics.

After the session, those who were a bit more daring stayed to catch a chilly game between the Pirates and the Reds. Ultimately, the Bucs fell 7–4.