The latest release of baseballr includes a function for acquiring player statistics from the NCAA’s website for baseball teams across the three major divisions (I, II, III).

The function, ncaa_scrape, requires the user to pass values for three parameters:

school_id: numerical code used by the NCAA for each school
year: a four-digit year
type: whether to pull data for batters or pitchers

If you want to pull batting statistics for Vanderbilt for the 2021 season, you would use the following:

library(baseballr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
ncaa_scrape(736, 2021, "batting") %>%
  select(year:OBPct)
#> ── NCAA Baseball Team Stats data from stats.ncaa.org ───────────────────
#>  Data updated: 2022-04-30 07:17:50 UTC
#> # A tibble: 41 × 12
#>     year school     conference division Jersey Player  Yr    Pos      GP
#>    <int> <chr>      <chr>         <dbl> <chr>  <chr>   <chr> <chr> <dbl>
#>  1  2021 Vanderbilt SEC               1 51     Bradfi… Fr    OF       67
#>  2  2021 Vanderbilt SEC               1 25     Noland… So    INF      66
#>  3  2021 Vanderbilt SEC               1 99     Gonzal… So    INF      61
#>  4  2021 Vanderbilt SEC               1 9      Young,… So    INF      61
#>  5  2021 Vanderbilt SEC               1 12     Keegan… Jr    UT       60
#>  6  2021 Vanderbilt SEC               1 8      Thomas… Jr    OF       59
#>  7  2021 Vanderbilt SEC               1 5      Rodrig… So    C        58
#>  8  2021 Vanderbilt SEC               1 16     Bulger… Fr    UT       50
#>  9  2021 Vanderbilt SEC               1 6      Kolwyc… Jr    INF      43
#> 10  2021 Vanderbilt SEC               1 19     LaNeve… So    OF       37
#> # … with 31 more rows, and 3 more variables: GS <dbl>, BA <dbl>,
#> #   OBPct <dbl>

The same can be done for pitching, just by changing the type parameter:

ncaa_scrape(736, 2021, "pitching") %>%
  select(year:ERA)
#> ── NCAA Baseball Team Stats data from stats.ncaa.org ───────────────────
#>  Data updated: 2022-04-30 07:17:52 UTC
#> # A tibble: 41 × 12
#>     year school     conference division Jersey Player  Yr    Pos      GP
#>    <int> <chr>      <chr>         <dbl> <chr>  <chr>   <chr> <chr> <dbl>
#>  1  2021 Vanderbilt SEC               1 51     Bradfi… Fr    OF       67
#>  2  2021 Vanderbilt SEC               1 25     Noland… So    INF      66
#>  3  2021 Vanderbilt SEC               1 99     Gonzal… So    INF      61
#>  4  2021 Vanderbilt SEC               1 9      Young,… So    INF      61
#>  5  2021 Vanderbilt SEC               1 12     Keegan… Jr    UT       60
#>  6  2021 Vanderbilt SEC               1 8      Thomas… Jr    OF       59
#>  7  2021 Vanderbilt SEC               1 5      Rodrig… So    C        58
#>  8  2021 Vanderbilt SEC               1 16     Bulger… Fr    UT       50
#>  9  2021 Vanderbilt SEC               1 6      Kolwyc… Jr    INF      43
#> 10  2021 Vanderbilt SEC               1 19     LaNeve… So    OF       37
#> # … with 31 more rows, and 3 more variables: App <dbl>, GS <dbl>,
#> #   ERA <dbl>
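
Since the batting and pitching calls differ only in the type argument, pulling several seasons for one school comes down to looping over years. The following is a minimal sketch, assuming ncaa_scrape takes its three arguments positionally as in the calls above. The helper name scrape_seasons and its scraper argument are hypothetical and not part of baseballr.

```r
library(dplyr)

# Hypothetical helper (not part of baseballr): scrape one school across
# several seasons and stack the results into a single data frame.
# `scraper` defaults to baseballr::ncaa_scrape and is assumed to accept
# (school_id, year, type) positionally, as in the examples above.
scrape_seasons <- function(school_id, years, type = "batting",
                           scraper = baseballr::ncaa_scrape) {
  type <- match.arg(type, c("batting", "pitching"))
  seasons <- lapply(years, function(yr) {
    # Skip a season that fails to download instead of aborting the whole run
    tryCatch(
      scraper(school_id, yr, type),
      error = function(e) {
        warning("Skipping ", yr, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  # Drop the skipped seasons, then stack the rest row-wise
  bind_rows(Filter(Negate(is.null), seasons))
}

# Usage (requires baseballr and network access):
# vandy_pitching <- scrape_seasons(736, 2019:2021, "pitching")
```

Passing the scraper as an argument keeps the loop easy to try out with a stand-in function before hitting the NCAA site. Note that bind_rows will raise an error if a column comes back with incompatible types in different seasons.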

The function depends on the user knowing the school_id used by the NCAA website. To help with that, I’ve included an ncaa_school_id_lu function so that users can find the school_id they need.

Just pass a string to the function and it will return possible matches based on the school’s name:

ncaa_school_id_lu("Vand")
#> # A tibble: 10 × 6
#>    school     conference school_id  year division conference_id
#>    <chr>      <chr>          <dbl> <dbl>    <dbl>         <dbl>
#>  1 Vanderbilt SEC              736  2013        1           911
#>  2 Vanderbilt SEC              736  2014        1           911
#>  3 Vanderbilt SEC              736  2015        1           911
#>  4 Vanderbilt SEC              736  2016        1           911
#>  5 Vanderbilt SEC              736  2017        1           911
#>  6 Vanderbilt SEC              736  2018        1           911
#>  7 Vanderbilt SEC              736  2019        1           911
#>  8 Vanderbilt SEC              736  2020        1           911
#>  9 Vanderbilt SEC              736  2021        1           911
#> 10 Vanderbilt SEC              736  2022        1           911
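
The lookup result can feed directly into ncaa_scrape. The following is a minimal sketch, assuming the lookup table carries the school, year, and school_id columns shown in the output above. The helper name pick_school_id is hypothetical and not part of baseballr.

```r
# Hypothetical helper (not part of baseballr): given a lookup table with
# `school`, `year`, and `school_id` columns, return the single ID for an
# exact school name and season. Stops if the match is missing or ambiguous.
pick_school_id <- function(lookup, school_name, season) {
  # which() ignores NA comparisons, so incomplete rows never match
  hits <- which(lookup$school == school_name & lookup$year == season)
  ids <- unique(lookup$school_id[hits])
  if (length(ids) != 1) {
    stop("Expected exactly one school_id for ", school_name, " in ", season,
         ", found ", length(ids), call. = FALSE)
  }
  ids
}

# Usage (requires baseballr and network access):
# id <- pick_school_id(baseballr::ncaa_school_id_lu("Vand"), "Vanderbilt", 2021)
# baseballr::ncaa_scrape(id, 2021, "batting")
```

Matching on the exact school name after the partial-string lookup guards against a search term that returns more than one school.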