Last night we celebrated Rounded Pi Day by rounding at the 10,000’s digit to get 3.1416 which nicely works with the date 3/14/16.  This was great after Mega Pi Day worked out so perfectly last year.  And this all built upon previous years’ celebrations.

We ate a large quantity of pizza at Lombardi’s. and for the second year in a row we got the Pi Cake from Empire Cakes with peanut butter and chocolate flavors.  The base was inscribed with historic approximations of Pi:  25/8, 256/81, 339/108, 223/71, 377/120, 3927/1250, 355/113, 62832/20000, 22/7.

Some pictures from the fantastic night:

IMG_20160314_193523 IMG_20160314_203411 IMG_20160314_203443

Previous year’s Pi Cakes:


Earlier this week, my company, Lander Analytics, organized our first public Bayesian short course, taught by Andrew Gelman, Bob Carpenter and Daniel Lee.  Needless to say the class sold out very quickly and left a long wait list.  So we will schedule another public training (exactly when tbd) and will make the same course available for private training.

This was the first time we utilized three instructors (as opposed to a main instructor and assistants which we often use for large classes) and it led to an amazing dynamic.  Bob laid the theoretical foundation for Markov chain Monte Carlo (MCMC), explaining both with math and geometry, and discussed the computational considerations of performing simulation draws.  Daniel led the participants through hands-on examples with Stan, covering everything from how to describe a model, to efficient computation to debugging.  Andrew gave his usual, crowd dazzling performance use previous work as case studies of when and how to use Bayesian methods.

It was an intensive three days of training with an incredible amount of information.  Everyone walked away knowing a lot more about Bayes, MCMC and Stan and eager to try out their new skills, and an autographed copy of Andrew’s book, BDA3.

A big help, as always was Daniel Chen who put in so much effort making the class run smoothly from securing the space, physically moving furniture and running all the technology.

Pi Cake 2015
This year we celebrated Mega Pi Day with the date (3/14/15) covering the first four digits of Pi. And of course, we unveiled the Pi Cake at 9:26 to get the next three digits.  This year the cake came from Empire Cakes and was peanut butter flavored.  We even had the bakery put as many digits as would fit around the cake.

A large group from the NYC Data Mafia came out and Scott Wiener of Scott’s Pizza Tours ensured we had the perfect assortment and quantity of pizza.


A look at Pi Cakes from previous years:

Michael Malecki recently shared a link to a Business Insider article that discussed the Monty Hall Problem.

The problem starts with three doors, one of which has a car and two of which have a goat. You choose one door at random and then the host reveals one door (not the one you chose) that holds a goat. You can then choose to stick with your door or choose the third, remaining door.

Probability theory states that people who switch win the car two-thirds of the time and those who don’t switch only win one-third of time.

But people often still do not believe they should switch based on the probability argument alone. So let’s run some simulations.

This function randomly assigns goats and cars behind three doors, chooses a door at random, reveals a goat door, then either switches doors or does not.

monty <- function(switch=TRUE)
    # randomly assign goats and cars
    doors <- sample(x=c("Car", "Goat", "Goat"), size=3, replace=FALSE)

    # randomly choose a door
    doorChoice <- sample(1:3, size=1)

    # get goat doors
    goatDoors <- which(doors == "Goat")
    # show a door with a goat
    goatDoor <- goatDoors[which(goatDoors != doorChoice)][1]

        # if we are switching choose the other remaining door
        return(doors[-c(doorChoice, goatDoor)])
        # otherwise keep the current door

Now we simulate switching 10,000 times and not switching 10,0000 times

withSwitching <- replicate(n = 10000, expr = monty(switch = TRUE), simplify = TRUE)
withoutSwitching <- replicate(n = 10000, expr = monty(switch = FALSE), simplify = TRUE)

## [1] "Goat" "Car"  "Car"  "Goat" "Car"  "Goat"
## [1] "Goat" "Car"  "Car"  "Car"  "Car"  "Car"

mean(withSwitching == "Car")
## [1] 0.6678
mean(withoutSwitching == "Car")
## [1] 0.3408

Plotting the results really shows the difference.

## Loading required package: ggplot2
## Loading required package: scales
qplot(withSwitching, geom = "bar", fill = withSwitching) + scale_fill_manual("Prize", 
    values = c(Car = muted("blue"), Goat = "orange")) + xlab("Switch") + ggtitle("Monty Hall with Switching")

qplot(withoutSwitching, geom = "bar", fill = withoutSwitching) + scale_fill_manual("Prize", 
    values = c(Car = muted("blue"), Goat = "orange")) + xlab("Don't Switch") + 
    ggtitle("Monty Hall without Switching")

(How are these colors? I’m trying out some new combinations.)

This clearly shows that switching is the best strategy.

The New York Times has a nice simulator that lets you play with actual doors.

Distribution of Lottery Winners based on 1,000 Simulations

With tonight’s Mega Millions jackpot estimated to be over $640 million there are long lines of people waiting to buy tickets.  Of course you always hear about the probability of winning which is easy enough to calculate:  Five numbers ranging from 1 through 56 are drawn (without replacement) then a sixth ball is pulled from a set of 1 through 46.  That means there are choose(56, 5) * 46 = 175,711,536 possible different combinations.  That is why people are constantly reminded of how unlikely they are to win.

But I want to see how likely it is that SOMEONE will win tonight.  So let’s break out R and ggplot!

As of this afternoon it was reported (sorry no source) that two tickets were sold for every American.  So let’s assume that each of these tickets is an independent Bernoulli trial with probability of success of 1/175,711,536.

Running 1,000 simulations we see the distribution of the number of winners in the histogram above.

So we shouldn’t be surprised if there are multiple winners tonight.

The R code:

winners <- rbinom(n=1000, size=600000000, prob=1/175000000)
qplot(winners, geom="histogram", binwidth=1, xlab="Number of Winners")