With the recent availability of play-by-play NFL data, I got to analyzing my favorite team, the New York Giants, with some very hasty exploratory data analysis (EDA).
From the graph above you can see that on 1st down Eli Manning preferred to throw to Hakeem Nicks, while on 2nd and 3rd downs he slightly favored Victor Cruz.
The code for the analysis is after the break.
I’ve only had the data for a few hours, so I am just going to look at who Eli passed to on each down during the Giants' Super Bowl XLVI season. Hopefully I’ll get to more analysis later.
```r
# cache chunk results so re-knitting the post doesn't rerun every step
knitr::opts_chunk$set(cache = TRUE)
```
For this I used the following packages:
```r
library(stringr)  # regular expressions for parsing play descriptions
library(plyr)     # rename() and join()
library(ggplot2)  # plotting
```
Next we load the data and winnow it down to just passing plays by the Giants.
```r
# read in the play-by-play data for the 2011 season
# (read.csv uses sep = "," and dec = "." by default; read.csv2 expects ";" and ",")
allGames <- read.csv("../data/2011_nfl_pbp_data.csv", header = TRUE,
                     stringsAsFactors = FALSE)

# just keep the Giants games
nyg <- allGames[which(allGames$off == "NYG" | allGames$def == "NYG"), ]

# just the offensive plays; kickoffs and punts have no down, so drop them
nygOff <- nyg[nyg$off == "NYG" & !is.na(nyg$down), ]

# just passing plays (convert descriptions to character before matching)
nygOff$description <- as.character(nygOff$description)
nygPass <- nygOff[str_detect(nygOff$description, "pass"), ]

## extract the receiver, e.g. "to H.Nicks " or "to V.Cruz."
nygPass$Receiver <- str_extract(nygPass$description,
                                "to [A-Za-z]\\.[A-Za-z]+( |\\.)")
## strip the leading "to " and the trailing space or period
nygPass$Receiver <- str_replace_all(string = nygPass$Receiver,
                                    pattern = "(^to )|( $)|(\\.$)",
                                    replacement = "")
```
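To see what the two regular expressions do, here is a minimal sketch that runs them on a few made-up description strings. The strings in `sampleDesc` are hypothetical and only imitate the play-by-play format; it assumes stringr is installed.

```r
library(stringr)

# hypothetical play descriptions imitating the play-by-play format
sampleDesc <- c(
    "(10:05) E.Manning pass short right to H.Nicks to NYG 45 for 8 yards.",
    "(3:12) E.Manning pass incomplete deep left to V.Cruz.",
    "(1:40) A.Bradshaw up the middle to NYG 30 for 4 yards."
)

# same pattern as above: "to ", an initial, a period, a surname, then a space or period
raw <- str_extract(sampleDesc, "to [A-Za-z]\\.[A-Za-z]+( |\\.)")
print(raw)

# strip the leading "to " and the trailing space or period
receivers <- str_replace_all(raw, "(^to )|( $)|(\\.$)", "")
print(receivers)
# expected: "H.Nicks" "V.Cruz" NA
```

The first match wins, so "to NYG 45" after the receiver is ignored, and a run like the third string, which has no "to X.Name" pattern, yields `NA`.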
Now we will look at how many times each receiver was passed to (including incompletes) on a given down.
```r
# how many times each receiver was passed to on each down
# (length() just counts rows; the formula interface drops rows with an NA Receiver)
downRec <- aggregate(offscore ~ down + Receiver, nygPass, length)
# make down a factor for easier plotting
downRec$down <- factor(downRec$down)
# rename the offscore column to Passes
downRec <- rename(downRec, c(offscore = "Passes"))

## calculate the total number of passes to each receiver throughout the
## season so we can remove receivers who weren't passed to often
totalPasses <- aggregate(Passes ~ Receiver, downRec, sum)
totalPasses <- rename(totalPasses, c(Passes = "Total"))

## join that into the passing data and keep receivers with at least 10 passes
downRec <- join(downRec, totalPasses, by = "Receiver")
downRec <- downRec[which(downRec$Total >= 10), ]
head(downRec, 10)
```
```
##    down   Receiver Passes Total
## 1     1 A.Bradshaw     31    67
## 2     2 A.Bradshaw     23    67
## 3     3 A.Bradshaw     13    67
## 4     1   B.Jacobs     16    29
## 5     2   B.Jacobs     11    29
## 6     3   B.Jacobs      2    29
## 7     1   B.Pascoe     12    21
## 8     2   B.Pascoe      7    21
## 9     3   B.Pascoe      2    21
## 18    1     D.Ware     16    38
```
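`aggregate(..., length)` is simply counting pass attempts per down/receiver combination. As a sanity check, here is a minimal sketch on a small made-up data frame (`toyPass`, `byAgg` and `byTable` are hypothetical names, not the Giants data) showing that base R's `table()` gives the same counts. Both drop the attempt whose receiver is `NA`.

```r
# hypothetical toy data: one row per pass attempt
toyPass <- data.frame(
    down = c(1, 1, 2, 3, 1, 2, 2),
    Receiver = c("H.Nicks", "V.Cruz", "V.Cruz", "V.Cruz", "H.Nicks", NA, "H.Nicks"),
    offscore = c(0, 0, 7, 7, 14, 14, 14),
    stringsAsFactors = FALSE
)

# count attempts per down and receiver, as in the post
byAgg <- aggregate(offscore ~ down + Receiver, toyPass, length)
names(byAgg)[names(byAgg) == "offscore"] <- "Passes"
print(byAgg)

# the same counts from a contingency table (NA receivers are excluded by default)
byTable <- as.data.frame(table(down = toyPass$down, Receiver = toyPass$Receiver),
                         responseName = "Passes", stringsAsFactors = FALSE)
# table() also lists zero-count combinations, which aggregate() omits
byTable <- byTable[byTable$Passes > 0, ]
print(byTable)
```

In both results `down` varies fastest within each receiver, which is why the `head()` output above lists downs 1, 2, 3 for one receiver before moving to the next.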
Now that the data are ready, we can produce the graph from the top of this page, shown here again.
```r
# reorder() sorts receivers by their mean number of passes across downs
ggplot(downRec, aes(x = reorder(Receiver, Passes), y = Passes)) +
    geom_bar(aes(group = down, color = down, fill = down), stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    facet_wrap(~down) +
    scale_color_discrete("Down") +
    scale_fill_discrete("Down") +
    labs(x = "Receiver")
```
We can see that on 1st down Eli preferred to throw to Hakeem Nicks over anyone else. On later downs (not much happened on 4th down) he slightly favored Victor Cruz, and those passes include Cruz's 99-yard touchdown reception against the Jets in December.
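Raw counts depend on how many passes were thrown on each down, so "favored" is easier to judge as a share of that down's targets. Here is a minimal sketch using only the Bradshaw, Jacobs and Pascoe counts printed by `head(downRec, 10)`. The shares are therefore relative to those three players only, not to every receiver. `xtabs()` builds the down-by-receiver table and `prop.table()` normalizes within each down. The names `counts`, `tab` and `shares` are illustrative.

```r
# counts copied from the head(downRec, 10) output (three receivers only)
counts <- data.frame(
    down = rep(1:3, times = 3),
    Receiver = rep(c("A.Bradshaw", "B.Jacobs", "B.Pascoe"), each = 3),
    Passes = c(31, 23, 13, 16, 11, 2, 12, 7, 2),
    stringsAsFactors = FALSE
)

# down x receiver table of pass counts
tab <- xtabs(Passes ~ down + Receiver, data = counts)

# share of each down's passes going to each receiver (each row sums to 1)
shares <- prop.table(tab, margin = 1)
print(round(shares, 2))
```

Applied to the full `downRec`, the same two calls should give each receiver's share of targets per down among all receivers with at least 10 targets.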
More advanced analysis will hopefully come soon.
By the way, this is my first post built with knitr, and it made life SO much easier. I highly recommend knitr for any web content involving code or the results of code.
Jared Lander is the Chief Data Scientist of Lander Analytics (a New York data science firm), Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences, and author of R for Everyone.