This was my senior project for my undergraduate mathematics degree. The goal was to pull baseball pitch data from public sources into RStudio, use the tools within R to tidy and analyze the data, and plot the results. This was the process:
- Download data from baseballsavant.mlb.com into RStudio on the pitches faced by 30 MLB hitters.
- Clean the data into a more workable form using the tidyverse packages.
- Score each outcome of the pitches by creating a calculated field.
- Group the hitters based on the pitches they were most successful against.
- Group the pitchers based on the pitches they threw the most.
- Perform a multidimensional linear regression on each of the pitcher types against the hitter types.
- Re-group the pitchers using k-means clustering on their success against each of the hitter types.
- Plot the results.
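
As a rough illustration of the scoring and k-means steps above, here is a minimal sketch in base R. It runs on synthetic data rather than the Baseball Savant download, and the column names (`pitcher`, `hitter_type`, `outcome`), the outcome weights, and the choice of three clusters are all assumptions made for the example, not the values used in the project.

```r
# Synthetic pitch-level data standing in for the cleaned download.
# All column names and category labels here are hypothetical.
set.seed(42)
n <- 600
pitches <- data.frame(
  pitcher     = rep(paste0("P", 1:12), each = 50),
  hitter_type = rep(c("fastball", "breaking", "offspeed"), times = 200),
  outcome     = sample(
    c("single", "double", "home_run", "strikeout", "field_out"),
    n, replace = TRUE
  ),
  stringsAsFactors = FALSE
)

# Calculated field: map each pitch outcome to a numeric score.
# These weights are illustrative only.
score_map <- c(strikeout = -1, field_out = -1,
               single = 1, double = 2, home_run = 4)
pitches$score <- unname(score_map[pitches$outcome])

# Mean score each pitcher allowed against each hitter type
# (rows are pitchers, columns are hitter types).
success <- tapply(
  pitches$score,
  list(pitches$pitcher, pitches$hitter_type),
  mean
)

# Re-group the pitchers with k-means on those per-hitter-type means.
# centers = 3 is an assumed number of groups.
km <- kmeans(success, centers = 3, nstart = 25)

pitcher_groups <- data.frame(
  pitcher = rownames(success),
  cluster = km$cluster,
  row.names = NULL
)
print(pitcher_groups)
```

Because the outcomes here are random, the resulting clusters carry no meaning; the point is only the shape of the pipeline, from a scored pitch table to a pitcher-by-hitter-type matrix to cluster labels.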