Statistician

Data Scientist

Math & Statistics

\[ f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{x-\mu}{\sigma^2}} \]

Substantive Experience

Hacking Skills

Play-by-Play

Questions?

  • Pass vs Rush
  • Favored Receiver
  • Rush Direction
  • Blitzing

S

John Chambers

John Chambers * Bell Labs * 1976

R

gentleman ihaka

Robert Gentleman & Ross Ihaka * University of Auckland, New Zealand * 1993

Statistics

Modeling

Machine Learning

Hierarchical clustering of wine data

Hierarchical clustering of wine data

Graphics

The Data

Interesting Part

Read the Data

g14 <- read.csv2(file.path(dataDir, 'football', 'pbp-2014.csv'), 
                 sep=',', header=TRUE, stringsAsFactors=FALSE)
g15 <- read.csv2(file.path(dataDir, 'football', 'pbp-2015.csv'), 
                 sep=',', header=TRUE, stringsAsFactors=FALSE)

games <- bind_rows(g14, g15)

Giants

Pass vs Rush

Focus on One Team's Offense

oneOff <- games %>%
    filter(OffenseTeam == 'NYG', PlayType %in% c('PASS', 'RUSH')) %>%
    mutate(PlayType=factor(PlayType, levels=c('RUSH', 'PASS')), 
           Down=factor(Down, levels=c(1, 2, 3, 4)))

Pass vs Rush

ggplot(oneOff, aes(x=PlayType)) + geom_bar(fill='darkgrey')

Pass vs Rush by Down

ggplot(oneOff, aes(x=PlayType)) + geom_bar(fill='darkgrey') + facet_wrap(~Down)

Binary Regression

\[ p(y_i=1) = \text{logit}^{-1}(\boldsymbol{X}_i\boldsymbol{\beta}) \]

\[ \text{logit}^{-1}(x) = \frac{e^x}{1+e^x} = \frac{1}{1+e^{-x}} \]

Pass vs Rush Modeled

passRushMod1 <- glm(PlayType ~ Down - 1, data=oneOff, family=binomial)
summary(passRushMod1)
Call:
glm(formula = PlayType ~ Down - 1, family = binomial, data = oneOff)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9174  -1.1486   0.5887   0.9610   1.2065  

Coefficients:
      Estimate Std. Error z value Pr(>|z|)    
Down1 -0.06811    0.06527  -1.044  0.29671    
Down2  0.53305    0.07709   6.914  4.7e-12 ***
Down3  1.66501    0.13034  12.774  < 2e-16 ***
Down4  1.27297    0.42762   2.977  0.00291 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2958.4  on 2134  degrees of freedom
Residual deviance: 2672.6  on 2130  degrees of freedom
AIC: 2680.6

Number of Fisher Scoring iterations: 4

Pass vs Rush Modeled

coefplot(passRushMod1, title='Probability of Pass')

Inverse Logit

invlogit <- function(x)
{
    1/(1 + exp(-x))
}

round(invlogit(-6:6), 2)
 [1] 0.00 0.01 0.02 0.05 0.12 0.27 0.50 0.73 0.88 0.95 0.98 0.99 1.00

Probabilities

coefplot(passRushMod1, trans=invlogit, title='Probability of Pass')

With Yards to Gain

passRushMod2 <- glm(PlayType ~ Down + ToGo - 1, data=oneOff, family=binomial)
coefplot(passRushMod2, trans=invlogit, title='Probability of Pass')

Scenarios

# make grid of scenarios
scenarios <- expand.grid(ToGo=1:15, Down=1:4) %>% as.tbl %>% 
    mutate(Down=factor(Down, levels=c(1, 2, 3, 4)))
# make prediction based on model
scenarioPredict <- predict(passRushMod2, 
                           newdata=scenarios, type='response', se.fit=TRUE)
# build confidence intervals
scenarios <- scenarios %>% mutate(Prediction=scenarioPredict$fit, 
                                  Lower=Prediction - 2*scenarioPredict$se.fit,
                                  Upper=Prediction + 2*scenarioPredict$se.fit)

Probability of Pass

ggplot(scenarios, aes(x=ToGo)) + scale_y_continuous(label=scales::percent) +
    geom_ribbon(aes(ymin=Lower, ymax=Upper), fill='lightgrey') +
    geom_line(aes(y=Prediction)) + facet_wrap(~Down, nrow=2)

Who Gets the Pass?

Who Gets the Pass?

onePass <- oneOff %>% filter(PlayType == 'PASS') %>%
    mutate(Receiver=str_extract_all(Description, pattern=' \\d{1,2}-\\w\\.\\w+( |\\.)',
                                    simplify=TRUE) %>% `[`(, 2), 
           Receiver=str_replace_all(Receiver, '(\\d{1,2}-)|(\\.$)', ''), 
           Receiver= str_trim(Receiver)) %>% 
                      filter(Receiver %in% 
                                 c("L.DONNELL", "V.CRUZ", "R.RANDLE", 
                                   "O.BECKHAM", "H.NICKS"))

Who Gets the Pass?

Decision Tree

\[ \hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k) \]

Decision Tree

passTree <- rpart(Receiver ~ Down + ToGo + Quarter + Minute + Second + 
                      YardLine + YardLineDirection + Formation, 
                  data=onePass)

Decision Tree

rpart.plot(passTree, extra=104, branch=1, uniform=FALSE, varlen=0, faclen=9, under=TRUE,
           box.col=c(8), border.col=1, shadow.col='darkgray', col=4, eq=": ",
           branch.col='blue', nspace=.1, minbranch=.9, cex=.8)

Decision Tree

What Else?

  • Direction of Rush
  • Length of Pass

Better Models

  • Include More Variables
  • More Advanced Machine Learning

Game Management

  • When to Blitz
  • When to Call Timeouts
  • When to Go For it on Fourth Down
  • When to Go For Two

Personnel

  • Better Drafting
  • Free Agent Valuation
  • Salary Cap Management
  • Player Comparison

Jared P. Lander

Packages

The Tools