Anomaly Detection in R

Jared P. Lander
Tibco Financial Services Conference

May 2, 2013

Motivation

Currencies

More than 160 World Currencies
12,720 Possible Exchange Rates (160 × 159 / 2 Currency Pairs)

Stocks

About 3,200 Stocks on NYSE & 2,600 Stocks on NASDAQ

Commodities

Thousands of Commodities

Including Frozen Concentrated Orange Juice

Problems

The Solution

Advanced Analytics

About R

S

John Chambers * Bell Labs * 1976

S+

R

Robert Gentleman & Ross Ihaka * University of Auckland, New Zealand * 1993

R

R Community

Finding Anomalies

Based on a True Story


The story you are about to see is true. The names have been changed to protect the innocent.

AT&T Stock

Old Fashioned Ways

Better Way

Fit a Model on Historical Data!

Appropriate History

ARIMA



    X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}



    Z_t \sim \text{WN}(0, \sigma^2)
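
To make the equation concrete, here is a minimal sketch in base R that simulates an ARMA(1,1) process and then recovers its coefficients. The coefficient values (0.6 and 0.3), the sample size, and the seed are illustrative assumptions, not values from the talk.

```r
# Illustrative ARMA(1,1): X_t - 0.6 X_{t-1} = Z_t + 0.3 Z_{t-1}
# (coefficients, n, and seed are assumed for this sketch)
set.seed(42)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 1000)

# Fit the same structure (p = 1, d = 0, q = 1) and inspect the estimates
fit <- arima(x, order = c(1, 0, 1))
print(coef(fit))
```

The fitted `ar1` and `ma1` should land near the values used in the simulation, with some sampling error.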

AT&T ARIMA Model

## Series: ts(attClose) 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##         ar1  intercept
##       0.995     32.622
## s.e.  0.004      2.279
## 
## sigma^2 estimated as 0.11:  log likelihood=-187.1
## AIC=380.2   AICc=380.3   BIC=393.4
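
A summary like the one above can be produced by fitting an AR(1) with a non-zero mean. The original price data is not included here, so this sketch uses a simulated stand-in series (`closeSim`, a hypothetical name) and base R's `arima()`. The printed layout will differ from the slide, which appears to come from a different fitting function.

```r
# Hypothetical stand-in for the closing prices: AR(1) around a mean of 32.6.
# The AR coefficient, noise level, length, and seed are assumptions.
set.seed(2013)
closeSim <- 32.6 + arima.sim(model = list(ar = 0.95), n = 500, sd = 0.33)

# ARIMA(1,0,0) with a non-zero mean; ML avoids the CSS start-up step,
# which can fail when the series is close to a unit root
fit <- arima(closeSim, order = c(1, 0, 0), method = "ML")
print(fit)
```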

Forecast 10 Days Out
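
A sketch of the 10-day forecast step, assuming the same kind of simulated stand-in series (`closeSim`, hypothetical) and a 95% normal prediction interval. The interval level is an assumption; the talk does not state which one was used.

```r
# Simulated stand-in for the price history (assumed parameters)
set.seed(2013)
closeSim <- 32.6 + arima.sim(model = list(ar = 0.95), n = 500, sd = 0.33)
fit <- arima(closeSim, order = c(1, 0, 0), method = "ML")

# Point forecasts and standard errors for the next 10 periods
fc <- predict(fit, n.ahead = 10)

# 95% prediction bands: forecast +/- z * standard error
z <- qnorm(0.975)
bands <- data.frame(
  step     = 1:10,
  forecast = as.numeric(fc$pred),
  lower    = as.numeric(fc$pred - z * fc$se),
  upper    = as.numeric(fc$pred + z * fc$se)
)
print(bands)
```

The bands widen with the horizon, because forecast uncertainty accumulates at each step.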

Actual Results
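
One simple way to turn a forecast into an anomaly check is to flag any actual value that falls outside the prediction bands. The helper `flagAnomalies()` is a hypothetical name, and the numbers below are made-up toy values for illustration, not AT&T prices.

```r
# Flag observations that fall outside their prediction interval
flagAnomalies <- function(actual, lower, upper) {
  stopifnot(length(actual) == length(lower),
            length(actual) == length(upper))
  actual < lower | actual > upper
}

# Toy values (invented for this sketch)
actual <- c(33.1, 33.0, 35.9, 33.2)
lower  <- c(32.4, 32.2, 32.0, 31.9)
upper  <- c(33.8, 34.0, 34.2, 34.3)

flags <- flagAnomalies(actual, lower, upper)
print(flags)  # only the third value (35.9 > 34.2) is flagged
```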

GARCH



    \epsilon_t = \sigma_t e_t


    \sigma_t^2 = \alpha_0 + \alpha_1\epsilon_{t-1}^2 + \dots + \alpha_m\epsilon_{t-m}^2 + \beta_1\sigma_{t-1}^2 + \dots + \beta_s\sigma_{t-s}^2


    e_t \sim \text{GWN}(0, 1)
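
To show how the variance recursion behaves, here is a minimal base R simulation of a GARCH(1,1) process, the case m = s = 1 of the equations above. The parameter values are illustrative assumptions chosen so that alpha_1 + beta_1 < 1.

```r
# GARCH(1,1) simulation; parameters, n, and seed are assumed
set.seed(1)
n      <- 1000
alpha0 <- 0.1
alpha1 <- 0.1
beta1  <- 0.8

e      <- rnorm(n)      # e_t ~ GWN(0, 1)
sigma2 <- numeric(n)    # conditional variances sigma_t^2
eps    <- numeric(n)    # innovations epsilon_t

# Start the recursion at the unconditional variance
sigma2[1] <- alpha0 / (1 - alpha1 - beta1)
eps[1]    <- sqrt(sigma2[1]) * e[1]

for (i in 2:n) {
  sigma2[i] <- alpha0 + alpha1 * eps[i - 1]^2 + beta1 * sigma2[i - 1]
  eps[i]    <- sqrt(sigma2[i]) * e[i]
}

summary(sigma2)
```

Plotting `eps` shows the volatility clustering that the model is meant to capture: a large shock raises the next period's variance.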

Vector Autoregressive



    \mathbf{X}_t = \Phi_1 \mathbf{X}_{t-1} + \dots + \Phi_p \mathbf{X}_{t-p} + \mathbf{Z}_t


    \{\mathbf{Z}_t\} \sim \text{WN}(\mathbf{0}, \mathbf{\Sigma})
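
A minimal sketch of the vector case in base R: simulate a two-series VAR(1) and estimate the coefficient matrix by least squares. The matrix `Phi`, the identity noise covariance, the sample size, and the seed are all illustrative assumptions.

```r
# Bivariate VAR(1): X_t = Phi X_{t-1} + Z_t, with Z_t ~ WN(0, I)
set.seed(7)
n   <- 2000
Phi <- matrix(c(0.5, 0.1,
                0.2, 0.3), nrow = 2, byrow = TRUE)  # assumed, stationary

X <- matrix(0, nrow = n, ncol = 2)
Z <- matrix(rnorm(n * 2), ncol = 2)
for (i in 2:n) {
  X[i, ] <- Phi %*% X[i - 1, ] + Z[i, ]
}

# Least squares: regress X_t on X_{t-1}.
# In row form, Xcur = Xlag %*% t(Phi) + noise, so transpose the solution.
Xlag   <- X[-n, ]
Xcur   <- X[-1, ]
PhiHat <- t(solve(crossprod(Xlag), crossprod(Xlag, Xcur)))
print(round(PhiHat, 2))
```

The off-diagonal entries of `PhiHat` measure how one series' past helps predict the other, which is what a single-series ARIMA cannot express.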

Other Options

Time Series

Regression

What Next?

Spotfire Dashboard

Spotfire R Interface

Conclusions

FTW

About Me

Jared P. Lander

Data Scientist
Adjunct Professor at Columbia University
Organizer of New York Open Statistical Programming Meetup
Author of R for Non-Statisticians (August 2013)

Twitter: @jaredlander
Website: http://www.jaredlander.com

The Tools