Portfolio Optimization and Bayesian Regression

Jared P. Lander

\(\pi\) Day, 2014

Portfolio Optimization

Definitions

\(n\) Assets
\(r_i\) Return for asset \(i\)
\(w_i\) Proportion of Portfolio in Asset \(i\)
\(\boldsymbol{\Sigma}\) Covariance Matrix
\(\mu_p\) Expected Portfolio Return
\(\sigma_p^2\) Portfolio Variance

Asset Weights

\[ \sum_{i=1}^n w_i = 1 \]

Asset Weights

\[ \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} \]
\[ \mathbf{w}^T\mathbf{I} = 1 \]

Expected Return

\[ \mu_p = E\left[ \sum_{i=1}^n w_ir_i \right] = \sum_{i=1}^n w_iE[r_i] = \sum_{i=1}^n w_i\mu_i = \mathbf{w}^T\mathbf{\mu} \]

Covariance Matrix

\[ \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{1}^2 & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_{2}^2 & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_{n}^2 \end{bmatrix} \] where \[ \sigma_{ij} = \sigma_{ji} = E\left(\left(r_i-\mu_i\right)\left(r_j-\mu_j\right)\right) \]

Portfolio Variance

\[ \sigma_p^2 = E\left[(r-\mu)^2\right] = \sum_{i=1}^N \sum_{j=1}^N w_i w_j \sigma_{ij} = \mathbf{w}^T\mathbf{\Sigma}\mathbf{w} \]

\[ \sigma_p = \sqrt{\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}} \]

The Data

Dow Jones

                  MMM        AXP          T        BA       CAT        CVX
2010-01-05 -0.0062832 -0.0022018 -0.0049106  0.032227  0.011885  0.0070583
2010-01-06  0.0140822  0.0160353 -0.0296186  0.029883  0.003033  0.0001256
2010-01-07  0.0007168  0.0117409 -0.0112913  0.039684  0.004030 -0.0037745
2010-01-08  0.0070217 -0.0007149 -0.0073530 -0.009693  0.011166  0.0017632
2010-01-11 -0.0040404 -0.0115082 -0.0048086 -0.011921  0.060917  0.0175870
2010-01-12  0.0008332  0.0131754 -0.0003709 -0.007255 -0.029914 -0.0058280
2010-01-13 -0.0032175  0.0030890 -0.0119404  0.012008  0.001445 -0.0076150
2010-01-14 -0.0033477  0.0124957 -0.0170362  0.006519 -0.005631 -0.0031377
2010-01-15 -0.0015581 -0.0068179 -0.0153908 -0.012094 -0.030469 -0.0040307
2010-01-19  0.0207735  0.0133570  0.0157726 -0.002799  0.013383  0.0056636
require(quantmod)
require(plyr)
# download Dow data
dow <- alply(dowSymbols, .margins=1, .fun=getSymbols, 
             auto.assign=FALSE)
# get closing data
dow <- llply(dow, function(x) x[, 4])
dow <- Reduce(f=cbind, x=dow)
# build returns
dowReturns <- diff(log(dow))[-1, ]

Create Portfolios

make.portfolio <- function(returns, 
                           weights=rep(1/ncol(returns), ncol(returns)))
{
    portCov <- cov(returns)
    expectedReturns <- colMeans(returns)
    c(Return=weights %*% expectedReturns, 
      Volatility=sqrt(weights %*% portCov %*% weights))
}

make.weights <- function(num, zero.prob=0.6, max.draw=100)
{
    remainingProb <- rep((1-zero.prob)/max.draw, times=max.draw)
    probs <- c(zero.prob, remainingProb)
    randoms <- sample(x=0:max.draw, size=num, replace=TRUE, prob=probs)
    randoms/sum(randoms)
}

generate.portfolios <- function(returns, n=100, zero.prob=0.9)
{
    replicants <- replicate(n=n, expr=make.portfolio(returns=returns, 
                            weights=make.weights(num=ncol(returns), 
                                                 zero.prob=zero.prob)))
    as.data.frame(t(replicants))
}
set.seed(2678176)
sampleReturns <- generate.portfolios(returns=dowReturns, n=1000)
require(ggplot2)
require(scales)
g <- ggplot(sampleReturns, aes(x=Volatility, y=Return)) + geom_point() + 
    ylab("Expected Return") + scale_y_continuous(label=percent)
g

Optimal Portfolio

Markowitz

Harry Markowitz

Minimize Volatility

Minimize \(\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\)

Solve \[ \mathbf{w} = \arg\min_{\mathbf{w}}\{\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\} \] subject to \[ \mathbf{w}^T\mathbf{I} = \sum_{i=1}^N w_i = 1 \]

Solve \[ \mathbf{w} = \arg\min_{\mathbf{w}}\{\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\} \] subject to
\[ \mathbf{w}^T\mathbf{I} = \sum_{i=1}^N w_i = 1 \]
\[ \mathbf{w}^T\mathbf{\mu} = \sum_{i=1}^N w_i \mu_i = \mu_0 \]

Lagrange Multipliers

\[ L(\mathbf{w}, \lambda_1, \lambda_2) = \mathbf{w}^T\mathbf{\Sigma}\mathbf{w} + \lambda_1(\mathbf{w}^T\mathbf{I} - 1) + \lambda_2(\mathbf{w}^T\mathbf{\mu} - \mu_0) \]

Gradient

\[ \nabla_L = \mathbf{0} \]

Objective Function

\[ L(\mathbf{w}, \lambda_1, \lambda_2) = \mathbf{w}^T\mathbf{\Sigma}\mathbf{w} + \lambda_1(\mathbf{w}^T\mathbf{I} - 1) + \lambda_2(\mathbf{w}^T\mathbf{\mu} - \mu_0) \]
First Order Conditions

\[ \frac{\partial L(\mathbf{w}, \lambda_1, \lambda_2)}{\partial\mathbf{w}} = 2\mathbf{\Sigma}\mathbf{w} + \lambda_1 \mathbf{I} + \lambda_2 \mathbf{\mu} = \mathbf{0} \]
\[ \frac{\partial L(\mathbf{w}, \lambda_1, \lambda_1)}{\partial\lambda_1} = \mathbf{w}^T\mathbf{I} - 1 = 0 \]
\[ \frac{\partial L(\mathbf{w}, \lambda_1, \lambda_2)}{\partial\lambda_2} = \mathbf{w}^T\mathbf{\mu} - \mu_0 = 0 \]

First Order Conditions

\[ \begin{bmatrix} 2\mathbf{\Sigma} & \mathbf{I} & \mathbf{\mu} \\ \mathbf{I}^T & 0 & 0 \\ \mathbf{\mu}^T & 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{w} \\ \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 1 \\ \mu_0 \end{bmatrix} \]

\[ \mathbf{A}\mathbf{z}_w = \mathbf{b}_0 \] where \[ \mathbf{A} = \begin{bmatrix} 2\mathbf{\Sigma} & \mathbf{1} & \mathbf{\mu} \\ \mathbf{1}^T & 0 & 0 \\ \mathbf{\mu}^T & 0 & 0 \end{bmatrix}, \mathbf{z}_w= \begin{bmatrix} \mathbf{w} \\ \lambda_1 \\ \lambda_2 \end{bmatrix}, \mathbf{b}_0 = \begin{bmatrix} \mathbf{0} \\ 1 \\ \mu_0 \end{bmatrix} \]

Solution

\[ \mathbf{z}_w = \mathbf{A}^{-1}\mathbf{b}_0 \]

\[ \begin{bmatrix} \mathbf{w} \\ \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 2\mathbf{\Sigma} & \mathbf{1} & \mathbf{\mu} \\ \mathbf{1}^T & 0 & 0 \\ \mathbf{\mu}^T & 0 & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{0} \\ 1 \\ \mu_0 \end{bmatrix} \]

Quadratic Programming

One Optimal Portfolio

   Asset     Weight
1    MMM  4.525e-17
2    AXP  9.381e-17
3      T  2.827e-17
4     BA  2.220e-18
5    CAT -2.305e-17
6    CVX  3.118e-17
7   CSCO -1.885e-18
8     KO -1.034e-17
9     DD -7.973e-17
10   XOM  3.885e-19
11    GE -1.021e-16
12    GS -1.475e-17
13    HD  6.726e-02
14  INTC  6.417e-18
15   IBM  0.000e+00
16   JNJ  2.276e-01
17   JPM  1.218e-16
18   MCD  2.508e-01
19   MRK -2.841e-18
20  MSFT -3.249e-17

Getting information out of a table is like getting sunshine out of a cucumber.

-Farquhar & Farquhar (1891)

-Serge Belongie

Visualize

Calculate Portfolio

require(tseries)
meanPortfolio <- portfolio.optim(dowReturns)
meanWeights <- data.frame(Asset=names(dowReturns), 
                          Weight=meanPortfolio$pw,
                          stringsAsFactors=FALSE)
head(meanWeights, 20)
   Asset     Weight
1    MMM  4.525e-17
2    AXP  9.381e-17
3      T  2.827e-17
4     BA  2.220e-18
5    CAT -2.305e-17
6    CVX  3.118e-17
7   CSCO -1.885e-18
8     KO -1.034e-17
9     DD -7.973e-17
10   XOM  3.885e-19
11    GE -1.021e-16
12    GS -1.475e-17
13    HD  6.726e-02
14  INTC  6.417e-18
15   IBM  0.000e+00
16   JNJ  2.276e-01
17   JPM  1.218e-16
18   MCD  2.508e-01
19   MRK -2.841e-18
20  MSFT -3.249e-17
require(scales)
ggplot(meanWeights, aes(x=Asset, ymin=0, ymax=Weight)) + 
    geom_linerange() + 
    theme(axis.text.x=element_text(angle=90, hjust=1, vjust=.5)) + 
    scale_y_continuous(label=percent)

One Optimal Portfolio

qplot(x=meanPortfolio$pm, y=meanPortfolio$ps, geom="point", 
      xlim=c(-.0025, .0025), ylim=c(-.01, .01), xlab="Volatility", 
      ylab="Expected Return") + scale_y_continuous(label=percent)

Efficient Frontier

# Adapted from Guy Yollin
get.optimal <- function(desired.return, returns, riskless=FALSE, rf=0.0, 
                        shorts=FALSE)
{
    optimal <- NULL
    # catch errors
    try(optimal <- portfolio.optim(returns, pm=desired.return, riskless=riskless, 
                                   rf=rf, short=shorts), silent=TRUE)
    # if successful, return the Return and Volatility
    if(!is.null(optimal))
    {
        data.frame(Return=optimal$pm, Volatility=optimal$ps)
    }
}

frontier <- function(returns, n.portfolios=20, riskless=FALSE, rf=0.0, 
                        shorts=FALSE)
{
    # Find the best return and its negative
    max.return <- max(colMeans(returns))
    min.return <- -max.return
    
    # construct a sequence of desired returns
    desired.returns <- seq(from=min.return, to=max.return, 
                           length.out=n.portfolios)
    
    ## find the optimal portfolio for each of the desired returns
    adply(desired.returns, .margins=1, get.optimal, 
          returns=dowReturns, riskless=riskless, rf=rf, shorts=shorts)
}
portfolios <- frontier(returns=dowReturns, n.portfolios=50)
g + geom_path(data=portfolios)

Shorts

\[ w_i < 0 \]
\[ \mathbf{w}^T\mathbf{I} = 1 \]

portfoliosShort <- frontier(returns=dowReturns, n.portfolios=50, shorts=TRUE)
g + geom_path(data=portfolios) + geom_path(data=portfoliosShort, color="red")

Risk Free Asset

\[ \mu_{pf} = w\mu_p + (1-w)\mu_f \]
\[ \sigma_{pf} = \sqrt{w^2\sigma_p^2 + (1-w)^2\sigma_f^2} = w\sigma_p \]

portfoliosRiskless <- frontier(returns=dowReturns, n.portfolios=50, 
                               riskless=TRUE, rf=.0005)
g + geom_path(data=portfolios) + 
    geom_path(data=portfoliosRiskless[portfoliosRiskless$Return > 0, ], 
              color="blue")

Pros and Cons

  • Rigorous mathematical foundation
  • Doesn’t account for unexpected shocks

Next Steps

  • GARCH model for variance
  • Capital Asset Pricing Model
  • Sharpe Ratio
  • Value at Risk

Bayesian Regression

Bayes’ Theorem

\[ \text{P}(A|B) = \frac{\text{P}(B|A)\text{P}(A)}{\text{P}(B)} \]

\[ \text{P}(AB) = \text{P}(A)\text{P}(B|A) = \text{P}(B)\text{P}(A|B) \]


\[ \text{P}(A|B) = \frac{\text{P}(B|A)\text{P}(A)}{\text{P}(B)} \]

\[ \pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{m(x)} \]

\[ \pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int_\theta f(x|\theta)\pi(\theta)d\theta} \]
\[ \color{red}{\pi(\theta|x)} = \frac{\color{blue}{f(x|\theta)}\color{green}{\pi(\theta)}}{\color{gray}{\int_\theta f(x|\theta)\pi(\theta)d\theta}} \]
\[ \color{red}{\text{Posterior}} = \frac{\color{blue}{\text{Likelihood}} * \color{green}{\text{Prior}}}{\color{gray}{\text{Normalizing Constant}}} \]
\[ \color{red}{\text{Posterior}} \propto \color{blue}{\text{Likelihood}} * \color{green}{\text{Prior}} \]

Simulation

MCMC

Markov Chain Monte Carlo

STAN

STAN

  • Stanislaw Ulam (1909-–1984)
  • Co-inventor of MCMC
  • Co-inventor of the hydrogen bomb

STAN Interfaces

  • RStan
  • PyStan
  • CmdStan

Start Simple

Census Data

  FamilyIncome  FamilyType NumBedrooms NumChildren NumPeople NumRooms
1          150     Married           4           1         3        9
2          180 Female Head           3           2         4        6
3          280 Female Head           4           0         2        8
4          330 Female Head           2           1         2        4
5          330   Male Head           3           1         2        5
6          480   Male Head           0           3         4        1
  NumVehicles NumWorkers  OwnRent   YearBuilt HouseCosts ElectricBill
1           1          0 Mortgage   1950-1959       1800           90
2           2          0   Rented Before 1939        850           90
3           3          1 Mortgage   2000-2004       2600          260
4           1          0   Rented   1950-1959       1800          140
5           1          0 Mortgage Before 1939        860          150
6           0          0   Rented Before 1939        700          140
  FoodStamp HeatingFuel Insurance
1        No         Gas      2500
2        No         Oil         0
3        No         Oil      6600
4        No         Oil         0
5        No         Gas       660
6        No         Gas         0
acs <- read.table("http://www.jaredlander.com/data/acs_ny.csv", 
                      sep=",", header=TRUE, stringsAsFactors=FALSE)
acs <- acs[, c("FamilyIncome", "FamilyType", "NumBedrooms", 
               "NumChildren", "NumPeople", "NumRooms", "NumVehicles", 
               "NumWorkers", "OwnRent", "YearBuilt", "HouseCosts", 
               "ElectricBill", "FoodStamp", "HeatingFuel", "Insurance")]
head(acs)
  FamilyIncome  FamilyType NumBedrooms NumChildren NumPeople NumRooms
1          150     Married           4           1         3        9
2          180 Female Head           3           2         4        6
3          280 Female Head           4           0         2        8
4          330 Female Head           2           1         2        4
5          330   Male Head           3           1         2        5
6          480   Male Head           0           3         4        1
  NumVehicles NumWorkers  OwnRent   YearBuilt HouseCosts ElectricBill
1           1          0 Mortgage   1950-1959       1800           90
2           2          0   Rented Before 1939        850           90
3           3          1 Mortgage   2000-2004       2600          260
4           1          0   Rented   1950-1959       1800          140
5           1          0 Mortgage Before 1939        860          150
6           0          0   Rented Before 1939        700          140
  FoodStamp HeatingFuel Insurance
1        No         Gas      2500
2        No         Oil         0
3        No         Oil      6600
4        No         Oil         0
5        No         Gas       660
6        No         Gas         0

Regression

acs1 <- lm(FamilyIncome ~ NumRooms + NumVehicles + NumWorkers, data=acs)
require(coefplot)
coefplot(acs1)

Bayesian Regression

acsB1 <- arm::bayesglm(FamilyIncome ~ NumRooms + NumVehicles + NumWorkers, 
                       data=acs, family=gaussian(), prior.scale=2.5, prior.df=1)
coefplot(acsB1)

No Real Change

Voter Data

  Year       Vote Age Gender  Race                             Education
1 1948   democrat  NA   male white     grade school of less (0-8 grades)
2 1948 republican  NA female white high school (12 grades or fewer, incl
3 1948   democrat  NA female white high school (12 grades or fewer, incl
4 1948 republican  NA female white some college(13 grades or more,but no
5 1948   democrat  NA   male white some college(13 grades or more,but no
6 1948 republican  NA female white high school (12 grades or fewer, incl
                Income                  Religion
1  34 to 67 percentile                protestant
2 96 to 100 percentile                protestant
3  68 to 95 percentile catholic (roman catholic)
4 96 to 100 percentile                protestant
5  68 to 95 percentile catholic (roman catholic)
6 96 to 100 percentile                protestant
load(url("http://www.jaredlander.com/data/ideo.rdata"))
head(ideo)
  Year       Vote Age Gender  Race                             Education
1 1948   democrat  NA   male white     grade school of less (0-8 grades)
2 1948 republican  NA female white high school (12 grades or fewer, incl
3 1948   democrat  NA female white high school (12 grades or fewer, incl
4 1948 republican  NA female white some college(13 grades or more,but no
5 1948   democrat  NA   male white some college(13 grades or more,but no
6 1948 republican  NA female white high school (12 grades or fewer, incl
                Income                  Religion
1  34 to 67 percentile                protestant
2 96 to 100 percentile                protestant
3  68 to 95 percentile catholic (roman catholic)
4 96 to 100 percentile                protestant
5  68 to 95 percentile catholic (roman catholic)
6 96 to 100 percentile                protestant

Secret Weapon

theYears <- as.character(unique(ideo$Year))

# preallocating the object makes the code run faster
results <- vector(mode="list", length=length(theYears))
# give good names to the list
names(results) <- theYears

# fit a model on the subset of data for each year
for(i in theYears)
{
    results[[i]] <- glm(Vote ~ Race + Income + Gender + Education, 
                        data=ideo, subset=Year==i, 
                        family=binomial(link="logit"))
}
multiplot(results, coefficients="Raceblack", secret.weapon=TRUE) + 
    coord_flip(xlim=c(-20, 10))

1964

Weakly Informative Priors

resultsB <- vector(mode="list", length=length(theYears))
names(resultsB) <- theYears

for(i in theYears)
{
    # fit model with Cauchy priors with a scale of 2.5
    resultsB[[i]] <- 
        arm::bayesglm(Vote ~ Race + Income + Gender + Education, 
                      data=ideo[ideo$Year==i, ],
                      family=binomial(link="logit"), 
                      prior.scale=2.5, prior.df=1)
}
multiplot(resultsB, coefficients="Raceblack", secret.weapon=TRUE)

qplot(x=rcauchy(1000, scale = 2.5), geom="density", main="Cauchy Distribution")
qplot(x=rt(1000, df = 1), geom="density", main="t Distribution")

multiplot(results, coefficients="Raceblack", secret.weapon=TRUE) + 
    coord_flip(xlim=c(-20, 10))
multiplot(resultsB, coefficients="Raceblack", secret.weapon=TRUE)

STAN

Compiles Model in C++

No U-Turn Sampler

Blocks

data
{
}

parameters
{
}

model
{
}

data
{
}

transformed data
{
}

parameters
{
}

transformed parameters
{
}

model
{
}

generated quantities
{
}

Data

Census

  FamilyIncome  FamilyType NumBedrooms NumChildren NumPeople NumRooms
1          150     Married           4           1         3        9
2          180 Female Head           3           2         4        6
3          280 Female Head           4           0         2        8
4          330 Female Head           2           1         2        4
5          330   Male Head           3           1         2        5
6          480   Male Head           0           3         4        1
  NumVehicles NumWorkers  OwnRent   YearBuilt HouseCosts ElectricBill FoodStamp
1           1          0 Mortgage   1950-1959       1800           90        No
2           2          0   Rented Before 1939        850           90        No
3           3          1 Mortgage   2000-2004       2600          260        No
4           1          0   Rented   1950-1959       1800          140        No
5           1          0 Mortgage Before 1939        860          150        No
6           0          0   Rented Before 1939        700          140        No
  HeatingFuel Insurance
1         Gas      2500
2         Oil         0
3         Oil      6600
4         Oil         0
5         Gas       660
6         Gas         0

22,745 Rows

Data Block

data
{
    # number of data points
    int<lower=0> N;
    
    # variable for response of length N containing real values
    vector[N] y;

    # variable for x1 length N containing real values
    vector[N] x1;
}

Standardized Data

\[ y' = \frac{y - \bar{y}}{\text{sd}(y)} \]
\[ x' = \frac{x - \bar{x}}{\text{sd}(x)} \]

Transformed Data Block

transformed data
{
    vector[N] y_std;    # Response
    vector[N] x1_std;   # Predictor

    y_std  <- (y - mean(y)) / sd(y);
    x1_std <- (x1 - mean(x1)) / sd(x1);
}

Parameters Block

parameters
{
    # establish there is an intercept
    real alpha_std;

    # establish slope
    real beta1_std;

    # make sigma parameter
    real<lower=0> sigma_std;
}

The Model

\[ y \tilde{} \text{N}(\alpha + \beta x, \sigma) \]
\[ \sigma\tilde{} \text{cauchy}(0, 5) \]
\[ \alpha \tilde{} \text{N}(0, 10) \]
\[ \beta \tilde{} \text{N}(0, 10) \]

Model Block

model
{
    alpha_std ~ normal(0, 10);
    beta1_std ~ normal(0, 10);

    sigma_std ~ cauchy(0, 5);

    y_std ~ normal(alpha_std + beta1_std*x1_std, sigma_std);
}

Back Transform Coefficients

\[ \alpha = sd(y)*\left(\alpha' - \beta'\frac{\bar{x}}{\text{sd}(x)}\right) + \bar{y} \]
\[ \beta = \beta'\frac{\text{sd}(y)}{\text{sd}(x)} \]
\[ \sigma = \text{sd}(y)\sigma' \]
\[ \nu = z(u) = \frac{u-\bar{u}}{\text{sd}(u)} \]
\[ u = z^{-1}(\nu) = \text{sd}(u)\nu + \bar{u} \]
\[ \begin{eqnarray} y_n &=& z_y^{-1}(z_y(y_n)) \nonumber \\ &=& z_y^{-1}\left( \alpha' + \beta'z_x(x_n) + \epsilon_n' \right) \nonumber \\ &=& z_y^{-1}\left( \alpha' + \beta'\left(\frac{x_n-\bar{x}}{\text{sd}(x)}\right) + \epsilon_n' \right) \nonumber \\ &=& \text{sd}(y)\left( \alpha' + \beta'\left(\frac{x_n-\bar{x}}{\text{sd}(x)}\right) + \epsilon_n' \right) + \bar{y} \nonumber \\ &=& \left( \text{sd}(y)\left( \alpha' - \beta'\frac{\bar{x}}{\text{sd}(x)} \right) + \bar{y} \right) + \left( \beta'\frac{\text{sd}(y)}{\text{sd}(x)}\right)x_n + \text{sd}(y)\epsilon_n' \nonumber \end{eqnarray} \]

Generated Quantities Block

generated quantities
{
    real alpha;
    real beta1;
    real<lower=0> sigma;

    # transform back the intercept
    alpha <- sd(y)*(alpha_std - beta1_std*(mean(x1)/sd(x1))) + mean(y);
    
    # transform back the slope
    beta1 <- beta1_std*sd(y)/sd(x1);

    # transform back the standard deviation
    sigma <- sd(y)*sigma_std;
}

All Together

.stan File

data
{
    int<lower=0> N;
    ...
}

transformed data
{
    vector[N] y_std;    # Response
    ...
}

parameters
{
    real alpha_std;
    ...
}

model
{
    alpha_std ~ normal(0, 10);
    ...
}

generated quantities
{
    real alpha;
    ...
}

Compile the Model

With Python

Or R

Or the Command Line

Let’s Go with R

RStan

require(rstan)
acsStan <- stan(file="GenericRegression.stan", 
                data=list(y=acs$FamilyIncome, 
                          x1=acs$NumWorkers, N=NROW(acs)))

Inference for Stan model: GenericRegression.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean     sd     2.5%      25%      50%      75%    97.5%
alpha_std      0.0     0.0    0.0      0.0      0.0      0.0      0.0      0.0
beta1_std      0.2     0.0    0.0      0.2      0.2      0.2      0.2      0.2
sigma_std      1.0     0.0    0.0      1.0      1.0      1.0      1.0      1.0
alpha      61219.2    31.7 1507.4  58208.2  60186.4  61259.6  62285.1  64045.0
beta1      28112.6    16.2  779.3  26649.3  27572.9  28098.7  28648.2  29641.4
sigma      97769.6     8.6  464.8  96840.1  97464.7  97767.5  98083.3  98669.7
lp__      -10753.7     0.0    1.2 -10756.6 -10754.3 -10753.4 -10752.8 -10752.3
          n_eff Rhat
alpha_std  2670    1
beta1_std  2313    1
sigma_std  2907    1
alpha      2265    1
beta1      2313    1
sigma      2907    1
lp__       1368    1

Samples were drawn using NUTS(diag_e) at Thu Mar 13 13:27:06 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Getting information out of a table is like getting sunshine out of a cucumber.

-Farquhar & Farquhar (1891)

-Serge Belongie

Visualize

Coefplot

acsSumm <- as.data.frame(rstan:::summary_sim(acsStan@sim)$msd)
acsSumm$Parameter <- rownames(acsSumm)
ggplot(acsSumm[4:5, ], aes(x=mean, y=Parameter)) + 
    geom_errorbarh(aes(xmin=mean-2*sd, xmax=mean+2*sd), 
                   height=0, color="blue") + 
    geom_errorbarh(aes(xmin=mean-1*sd, xmax=mean+1*sd), 
                   height=0, size=1, color="blue") + 
    geom_point(, color="blue") + 
    geom_vline(xintercept=0, linetype=2, color="grey") + 
    scale_x_continuous(label=scales::comma)

Pairs Plot

pairs(acsStan)

Traceplot

traceplot(acsStan)

Advantages

Parameter Distributions

acsSims <- extract(acsStan, pars=dimnames(acsStan)[[3]], permuted=FALSE)
acsSims <- as.data.frame(apply(acsSims, MARGIN="parameters", FUN=function(y) y))
require(reshape2)
simsMelt <- melt(acsSims[, c("alpha", "beta1", "sigma")], 
                 variable.name="Parameter", value.name="Estimate")
require(ggplot2)
require(scales)
ggplot(simsMelt, aes(x=Estimate, fill=Parameter, color=Parameter)) + 
    geom_density(alpha=1/2) + facet_wrap(~Parameter, scales="free_x") + 
    theme(axis.text.x=element_text(angle=90, hjust=1, vjust=.5)) + 
    scale_x_continuous(label=comma) + scale_color_discrete(guide=FALSE) + 
    scale_fill_discrete(guide=FALSE)

Multilevel Models

require(lme4)
acsML <- lmer(FamilyIncome ~ (1 + NumWorkers | OwnRent), data=acs)
acsPred <- cbind(acs, Fitted=predict(acsML, acs))
ggplot(acsPred, aes(x=NumWorkers, y=Fitted)) + geom_jitter(alpha=1/2) + 
    facet_wrap(~OwnRent)

Reduce Variance

Disadvantages

stan Runtime ~ 4 Minutes
lm Runtime ~ .03 Seconds

Further Steps

  • Test model with simulation
  • Cross-validation
  • Parallel chains
  • More advanced models

Jared P. Lander

The Tools