Thanks to Rachel Schutt, who I’m teaching with at Columbia, and Cathy O’Neil from MathBabe, I had the opportunity to go on TV and talk about the statistics of tonight’s Powerball lottery.

There’s an article with a brief quote from me and a video where I make a very quick appearance at the 1:14 mark. My interview during the live broadcast actually went on for about three minutes, but I can’t find that online. If I can transfer the video from my DVR, I’ll post that too.

In the longer interview I discussed the probability of winning, the expected value of a given ticket and other such statistical nuggets. In particular, I broke down how choosing numbers based on birthdays eliminates every number higher than 31, which means you are missing out on 28 of the 59 possible numbers, each of which is equally likely to be drawn. Hopefully I’ll find that longer cut.
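To put a number on the birthday effect, here is a small R sketch that counts combinations. It assumes five numbers are drawn without replacement from 1 through 59 and ignores the separate Powerball; the variable names are just for illustration. It computes the share of numbers a birthday-only player can never pick and the share of five-number combinations made up entirely of numbers 1 through 31.

```r
# Assumption: five numbers drawn without replacement from 1 through 59
# (the separate Powerball is ignored in this sketch)
white_balls <- 59L
birthday_max <- 31L  # no day of the month is higher than 31

# Share of the 59 numbers a birthday-only player can never pick
missing_share <- (white_balls - birthday_max) / white_balls

# Share of all five-number combinations that use only birthday numbers
birthday_combos <- choose(birthday_max, 5)
all_combos <- choose(white_balls, 5)
birthday_share <- birthday_combos / all_combos

cat(sprintf("Numbers never picked: %d of %d (%.1f%%)\n",
            white_balls - birthday_max, white_balls, 100 * missing_share))
cat(sprintf("All-birthday combinations: %s of %s (%.1f%%)\n",
            format(birthday_combos, big.mark = ","),
            format(all_combos, big.mark = ","), 100 * birthday_share))
```

Only about 3.4 percent of the possible five-number combinations consist purely of birthday numbers. Every individual combination is still equally likely; the point is just how narrow a slice of the number space birthday picks cover.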

The video can be found here:  Video

Jared Lander is the Chief Data Scientist of Lander Analytics a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences and author of R for Everyone.

Distribution of Lottery Winners based on 1,000 Simulations

With tonight’s Mega Millions jackpot estimated to be over $640 million, there are long lines of people waiting to buy tickets. Of course, you always hear about the probability of winning, which is easy enough to calculate: five numbers ranging from 1 through 56 are drawn (without replacement), then a sixth ball is drawn from a set numbered 1 through 46. That means there are choose(56, 5) * 46 = 175,711,536 possible combinations. That is why people are constantly reminded of how unlikely they are to win.
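The count is a one-liner in R. This sketch simply reproduces it, taking the game format (five balls from 1 through 56, plus one ball from 1 through 46) from the paragraph above.

```r
# Five balls from 1-56 where order doesn't matter, times 46 choices for the sixth ball
five_ball_combos <- choose(56, 5)
combinations <- five_ball_combos * 46

# Probability that a single ticket matches all six balls
p_win <- 1 / combinations

print(format(combinations, big.mark = ","))
print(p_win)
```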

But I want to see how likely it is that SOMEONE will win tonight.  So let’s break out R and ggplot!

As of this afternoon, it was reported (sorry, no source) that two tickets had been sold for every American. So let’s assume that each of these tickets is an independent Bernoulli trial with a success probability of 1/175,711,536.
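Because the number of winners is then a sum of independent Bernoulli trials, it follows a binomial distribution, so the chance that somebody wins can also be computed exactly instead of simulated. This sketch assumes 600 million tickets, the rounded figure used in the simulation code, and also shows the Poisson approximation, which is very accurate when the number of trials is huge and the success probability tiny.

```r
# Assumption: roughly 600 million tickets (about two per American),
# each an independent Bernoulli trial
n_tickets <- 600e6
p_win <- 1 / (choose(56, 5) * 46)  # 1 in 175,711,536

# Mean of a binomial(n, p) is n * p
expected_winners <- n_tickets * p_win

# Exact binomial probabilities
p_none <- dbinom(0, size = n_tickets, prob = p_win)
p_at_least_one <- 1 - p_none
p_multiple <- pbinom(1, size = n_tickets, prob = p_win, lower.tail = FALSE)

# Poisson approximation with rate lambda = n * p
p_none_poisson <- dpois(0, lambda = expected_winners)

cat(sprintf("Expected winners: %.2f\n", expected_winners))
cat(sprintf("P(at least one winner): %.3f\n", p_at_least_one))
cat(sprintf("P(two or more winners): %.3f\n", p_multiple))
cat(sprintf("P(no winner): exact %.4f, Poisson %.4f\n", p_none, p_none_poisson))
```

Under these assumptions the expected number of winners is about 3.4, the chance of no winner at all is only around 3 percent, and two or more winners is the most likely outcome by far.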

Running 1,000 simulations, we get the distribution of the number of winners shown in the histogram above.

So we shouldn’t be surprised if there are multiple winners tonight.

The R code:

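library(ggplot2)
# 1,000 draws: ~600 million tickets (about two per American), each winning with probability 1/175,711,536
winners <- rbinom(n=1000, size=600000000, prob=1/175711536)
ggplot(data.frame(winners), aes(x=winners)) + geom_histogram(binwidth=1) + labs(x="Number of Winners")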
