Monday, October 20, 2014

Regression

\[ Y = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \]

\[ \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \dots & X_{1p} \\ 1 & X_{21} & X_{22} & \dots & X_{2p} \\ 1 & X_{31} & X_{32} & \dots & X_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \dots & X_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \vdots \\ \epsilon_n \end{bmatrix} \]

Beta

\[ \boldsymbol{\hat{\beta}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{Y} \]
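As a quick sanity check (not on the original slides), the closed-form solution can be computed directly and compared against `lm()`; the simulated data and variable names below are illustrative:

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# design matrix with an intercept column
X <- cbind(1, x1, x2)

# beta-hat = (X'X)^{-1} X'y, solved as a linear system for stability
betaHat <- solve(t(X) %*% X, t(X) %*% y)

# should match lm's coefficients
coef(lm(y ~ x1 + x2))
```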

Which Variables Matter?

Acres  FamilyIncome  FamilyType   NumBedrooms  NumChildren  NumPeople
1-10   150           Married      4            1            3
1-10   180           Female Head  3            2            4
1-10   280           Female Head  4            0            2
1-10   330           Female Head  2            1            2
1-10   330           Male Head    3            1            2
1-10   480           Male Head    0            3            4

All Subsets

\[ {p \choose 1} + {p \choose 2} + \dots + {p \choose p-1} + {p \choose p} = 2^p - 1 \]
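The number of candidate models grows explosively with \(p\), which is why all-subsets search breaks down beyond a few dozen variables. A one-line check (the helper name is invented):

```r
# number of non-empty subsets of p predictors
num_models <- function(p) sum(choose(p, 1:p))

num_models(10)  # 1023 candidate models
num_models(30)  # over a billion
```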

Criteria

  • AIC
  • BIC
  • Deviance
  • Cross-Validation

10's of Variables

100's of Variables

1000's of Variables

Curse of Dimensionality

Shrinkage and Regularization

OLS

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 \right] \]

Ridge

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda \sum_{j=1}^p \beta_j^2 \right] \]

Lasso

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[\sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda \sum_{j=1}^p |\beta_j| \right] \]

Penalty Term

\[ \text{Ridge: } \beta_j^2 \qquad \text{Lasso: } |\beta_j| \]
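In the orthonormal-design case the two penalties act on each coefficient separately: ridge shrinks proportionally, while the lasso soft-thresholds, which is why it can set coefficients exactly to zero. A minimal sketch (function names invented):

```r
# single-coefficient shrinkage under an orthonormal design:
# ridge divides by (1 + lambda); lasso subtracts lambda, truncating at zero
ridge_shrink <- function(z, lambda) z / (1 + lambda)
lasso_shrink <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

z <- c(-3, -0.5, 0.5, 3)
ridge_shrink(z, lambda = 1)  # every coefficient shrinks, none reach zero
lasso_shrink(z, lambda = 1)  # small coefficients are set exactly to zero
```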

Elastic Net

\[ \min_{\beta_{0},\beta \in \mathbb{R}^{p+1}} \left[ \frac{1}{2N} \sum_{i=1}^N \left( y_i - \beta_0 -x_i^T\beta \right)^2 + \lambda P_{\alpha} \left(\beta \right) \right] \] where \[ P_{\alpha} \left(\beta \right) = \left(1 - \alpha \right) \frac{1}{2}||\Gamma\beta||_{\mathit{l}_2}^2 + \alpha ||\Gamma\beta||_{\mathit{l}_1} \]

Gaussian

Binomial

Poisson

Multinomial

Cox

Multivariate Gaussian

Lasso Features

  • Fast Model Fitting
  • Same order of computation as a single OLS
  • Variable Selection
  • p >> N
  • More Stable Predictions

LARS Algorithm

  1. Standardize the predictors to have mean zero and unit norm. Start with the residual \(r = y - \bar{y}\), \(\beta_1, \beta_2, \ldots, \beta_p = 0\).
  2. Find the predictor \(x_j\) most correlated with \(r\).
  3. Move \(\beta_j\) from 0 towards its least squares coefficient \(\langle x_j, r \rangle\), until some other competitor \(x_k\) has as much correlation with the current residual as does \(x_j\).
  4. Move \(\beta_j\) and \(\beta_k\) in the direction defined by their joint least squares coefficient of the current residual on \((x_j, x_k)\), until some other competitor \(x_l\) has as much correlation with the current residual. If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least squares direction.
  5. Continue in this way until all \(p\) predictors have been entered. After \(\min(N - 1, p)\) steps, we arrive at the full least squares solution.
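Steps 1 and 2 above can be sketched directly; the simulated data below is illustrative, with the signal placed on the second predictor:

```r
set.seed(1)
n <- 50; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- X[, 2] * 3 + rnorm(n)

# step 1: center y; standardize the columns to mean zero and unit norm
Xs <- scale(X, center = TRUE, scale = FALSE)
Xs <- sweep(Xs, 2, sqrt(colSums(Xs^2)), "/")
r  <- y - mean(y)

# step 2: the predictor most correlated with the residual enters first
j <- which.max(abs(t(Xs) %*% r))
j  # with this simulated y, predictor 2 should enter first
```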

Coefficient Path

glmnet Code

# build.x and build.y come from the useful package; cv.glmnet from glmnet
library(useful)
library(glmnet)

# build X
acsX <- build.x(FamilyIncome ~ NumBedrooms + NumChildren + NumPeople + 
                    NumRooms + NumUnits + NumVehicles + NumWorkers + 
                    OwnRent + YearBuilt + ElectricBill + FoodStamp + 
                    HeatingFuel + Insurance + Language - 1, 
                data=acs, contrasts=FALSE)

# build Y
acsY <- build.y(FamilyIncome ~ NumBedrooms + NumChildren + NumPeople + 
                    NumRooms + NumUnits + NumVehicles + NumWorkers + 
                    OwnRent + YearBuilt + ElectricBill + FoodStamp + 
                    HeatingFuel + Insurance + Language - 1, 
                data=acs)

set.seed(1863561)
# run the cross-validated glmnet
acsCV1 <- cv.glmnet(x=acsX, y=acsY, family="gaussian", nfolds=5)
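The acs data is not reproduced here, so the following self-contained sketch on synthetic data shows how coefficients come out of the cross-validated fit (all names and dimensions invented):

```r
library(glmnet)

# synthetic stand-in for the acs data used on the slides
set.seed(1863561)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:3] %*% c(3, -2, 1)) + rnorm(n)

cv1 <- cv.glmnet(x = X, y = y, family = "gaussian", nfolds = 5)

# coefficients at the one-standard-error lambda; many are exactly zero
coef(cv1, s = "lambda.1se")
plot(cv1)  # cross-validation error over the lambda path
```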

Coefficients

Inference

  • Confidence Intervals
  • Significance Tests

Refit as LM

Bootstrap Confidence Intervals

\[ \tilde{\beta}_j = \hat{\beta}_j I\left( |\hat{\beta}_j| > a_n \right) \]

\[ r_i = y_i - x_i^T\tilde{\boldsymbol{\beta}} \]

\[ e_i = r_i - \bar{r} \]

\[ y_i^* = x_i^T\tilde{\boldsymbol{\beta}} + e_i^* \]

\[ \hat{\boldsymbol{\beta}}^* = \underset{u \in \mathbb{R}^p}{\operatorname{argmin}} \left[ \sum_{i=1}^N (y_i^* - x_i^Tu)^2 + \hat{\lambda}\sum_{j=1}^p|u_j| \right] \]

\[ \left[ \hat{\beta} + \tilde{\beta} - \hat{\beta}_{1-\alpha/2}^*, \hat{\beta} + \tilde{\beta} - \hat{\beta}_{\alpha/2}^* \right] \]
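The resampling scheme above can be sketched in a few lines. This is a rough illustration on simulated data, not the authors' implementation: glmnet is used for the lasso refits, and the threshold \(a_n = 1/\sqrt{n}\) and all names are illustrative choices:

```r
library(glmnet)

set.seed(2014)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, 0, 0, 0)) + rnorm(n)

fit    <- cv.glmnet(X, y)
lamHat <- fit$lambda.min
bHat   <- as.numeric(coef(fit, s = lamHat))[-1]  # drop the intercept

# thresholded estimator: zero out coefficients below a_n
a_n  <- 1 / sqrt(n)
bTil <- bHat * (abs(bHat) > a_n)

# centered residuals from the thresholded fit
r <- y - drop(X %*% bTil)
e <- r - mean(r)

# resample residuals, rebuild y*, refit the lasso at the same lambda
B <- 200
bStar <- replicate(B, {
  yStar <- drop(X %*% bTil) + sample(e, n, replace = TRUE)
  as.numeric(coef(glmnet(X, yStar, lambda = lamHat)))[-1]
})

# interval for beta_1 following the formula above
ci <- bHat[1] + bTil[1] - quantile(bStar[1, ], c(0.975, 0.025))
```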

Covariance Test Statistic

\[ \tilde{\beta}_A(\lambda_{k+1}) = \underset{\beta_A \in \mathbb{R}^{|A|}}{\operatorname{argmin}} \left[\frac{1}{2} ||y - \boldsymbol{X}_A\beta_A||_2^2 + \lambda_{k+1}||\beta_A||_1 \right] \]

\[ T_k = \left( \langle y,\boldsymbol{X}\hat{\beta}(\lambda_{k+1}) \rangle - \langle y,\boldsymbol{X}_A\tilde{\beta}_A(\lambda_{k+1}) \rangle \right) / \sigma^2 \]

\[ T_k \overset{d}{\rightarrow} \text{Exp}(1) \]

covTest

library(lars)
library(covTest)

set.seed(1863561)
acsLasso <- lars(x=acsX, y=acsY)
acsTest <- covTest(acsLasso, x = acsX, y = acsY)
 Predictor_Number Drop_in_covariance P-value
               39          2270.8716   0.000
                4            55.6591   0.000
                9           311.0317   0.000
               28            45.0275   0.000
                8             5.1157   0.006
               12                  0       1
              -15                 NA      NA
                7                  0       1
              -35                 NA      NA
              -30                 NA      NA

Significance Test for glmnet

\[ \hat{\beta}^{en} = \underset{\beta \in \mathbb{R}^{p}}{\operatorname{argmin}} \left[\frac{1}{2} ||y - \boldsymbol{X}\beta||_2^2 + \lambda||\beta||_1 + \frac{\gamma}{2}||\beta||_2^2 \right] \]

\[ T_k = \left( \langle y,\boldsymbol{X}\hat{\beta}^{en}(\lambda_{k+1},\gamma) \rangle - \langle y,\boldsymbol{X}_A\tilde{\beta}_A^{en}(\lambda_{k+1},\gamma) \rangle \right) / \sigma^2 \]

\[ (1 + \gamma) \cdot T_k \overset{d}{\rightarrow} \text{Exp}(1) \]

Next Steps

  • Confidence Intervals
  • Prediction Intervals
  • Software for glmnet

Jared P. Lander

The Tools