This is a project for DATA 115, Introduction to Data Analytics, at Washington State University. I was tasked with picking data that I found interesting and performing all the necessary steps of a data analysis.
The data I chose came from the MLB regular seasons of 2009-2021, excluding 2020.
It would seem that MLB’s focus on increased velocity stems from a growing emphasis on both biomechanics and specialized training. This is evident in programs such as Driveline Baseball, which has a significant role in using biomechanics and data analytics to build training programs. However, could this increased focus on velocity be having a negative effect on control? Could increased velocity lead to more hit-by-pitches?
The data collection, cleaning, and transformation were all fairly simple. I downloaded the CSV from https://www.fangraphs.com/. From the FanGraphs homepage I clicked on Team Stats, went down to Multiple Seasons, and entered 2007 to 2020. Then I went down to Custom Leaderboards and created my own leaderboard by selecting the columns Season, Team, Win, Loss, ERA, Base_on_Balls, HBP, WP, Balls, Strikes, K/9, BB/9, K/BB, and Velo. After looking at the data I realized that the 2020 season, in which only 60 games were played, would be an outlier, so I excluded it from my analysis. To do this, I used only odd-numbered years, starting with 2009 and ending with 2021.
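The filtering step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the export keeps the column name Season from the custom leaderboard; the function name `load_team_seasons` and the sample rows are hypothetical and are not real FanGraphs data.

```python
import csv
import io


def load_team_seasons(csv_file, first=2009, last=2021):
    """Return the rows for odd-numbered seasons between first and last."""
    rows = []
    for row in csv.DictReader(csv_file):
        season = int(row["Season"])
        # Keeping only odd years also drops the shortened 2020 season.
        if first <= season <= last and season % 2 == 1:
            rows.append(row)
    return rows


# Tiny made-up sample, used only to show the filter at work.
sample = io.StringIO(
    "Season,Team,HBP,Velo\n"
    "2008,AAA,50,91.0\n"
    "2009,AAA,52,91.2\n"
    "2020,AAA,30,93.0\n"
    "2021,AAA,70,93.5\n"
)
kept = load_team_seasons(sample)
print([row["Season"] for row in kept])
```

With a real export, the same function could be called on an opened file, for example `load_team_seasons(open("fangraphs_export.csv", newline=""))`, where the file name is a placeholder.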
My analysis of the data concludes that a higher average velocity does lead to a higher number of hit-by-pitches. As the graphs show, both trend upward and closely resemble each other. For every two-mile-per-hour increase in average velocity, we can expect hit-by-pitches to increase by one.
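A rate like "one more hit-by-pitch per two miles per hour" corresponds to a slope of 0.5 on a straight-line fit of hit-by-pitches against velocity. The write-up does not say how the figure was computed, so the sketch below simply assumes an ordinary least-squares line. The helper `least_squares_slope` is hypothetical, and the three data points are made up so that they fall exactly on a line with that slope.

```python
def least_squares_slope(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, not the project's data.
velo = [90.0, 92.0, 94.0]  # average velocity in mph
hbp = [50.0, 51.0, 52.0]   # hit-by-pitches

slope, intercept = least_squares_slope(velo, hbp)
print(slope)      # change in hit-by-pitches per 1 mph
print(2 * slope)  # change in hit-by-pitches per 2 mph
```

Running the same function on the Velo and HBP columns of the cleaned data would show whether the fitted slope is actually close to 0.5.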
We know that pitchers’ average velocity can’t keep increasing indefinitely, so I would expect there to be a ceiling. The upper end of the average velocity graph seems to be flattening out, which might be the start of that ceiling. More time and data will be needed to see if this is true. However, with more specialized training thanks to increased use of data and biomechanics, we can expect more pitchers to reach higher average velocities that can be sustained for longer periods of time. This may ultimately lead to more hit-by-pitch incidents.
One thing to note is that more analysis needs to be done on individual pitch types to see which ones account for the highest number of hit-by-pitches. More analysis also needs to be done on individual pitchers, as opposed to teams, to see if there is any correlation between pitchers’ roles and hit-by-pitches.