Thanks to Rachel Schutt, who I’m teaching with at Columbia, and Cathy O’Neil from MathBabe, I had the opportunity to go on TV and talk about the statistics of tonight’s Powerball lottery.
There’s an article with a brief quote from me and a video where I make a very quick appearance at the 1:14 mark. My interview during the live broadcast actually went on for about three minutes, but I can’t find that online. If I can transfer the video from my DVR, I’ll post that too.
In the longer interview I discussed the probability of winning, the expected value of a given ticket and other such statistical nuggets. In particular, I broke down how choosing numbers based on birthdays eliminates any number higher than 31, which means you are missing out on 28 of the 59 possible numbers, all of which are equally likely to be drawn. Hopefully I’ll find that longer cut.
A friend of mine has told me on numerous occasions that since 1960 the Yankees have not won a World Series while a Republican was President. Upon hearing this, my Republican friends (both Yankee and Red Sox fans) turn incredulous and say that this is ridiculous. So I decided to investigate. To be clear, this in no way shows causality; it just checks the numbers.
The plot above shows every Yankee win (and loss) since 1960 and the party of the President at the time. It is clear that all nine Yankees World Series wins came while a Democrat inhabited the White House. The fluctuation plot below shows Yankee wins both before and after 1960, and the complete lack of a block for Republican/Post-1960 makes the case.
There are similar plots for the American League after the jump.
With tonight’s Mega Millions jackpot estimated to be over $640 million there are long lines of people waiting to buy tickets. Of course you always hear about the probability of winning which is easy enough to calculate: Five numbers ranging from 1 through 56 are drawn (without replacement) then a sixth ball is pulled from a set of 1 through 46. That means there are choose(56, 5) * 46 = 175,711,536 possible different combinations. That is why people are constantly reminded of how unlikely they are to win.
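As a quick check of that count, here is a minimal Python sketch (not the R I used for the post; the variable names are just for illustration) that multiplies the number of ways to pick five of the 56 white balls by the 46 choices for the sixth ball.

```python
import math

# Ways to choose 5 distinct numbers from 1..56 when order does not matter
white_ball_combos = math.comb(56, 5)

# The sixth ball comes from a separate set numbered 1..46
mega_ball_choices = 46

total_combinations = white_ball_combos * mega_ball_choices
win_probability = 1 / total_combinations

print(f"{total_combinations:,}")  # 175,711,536
print(f"Chance for a single ticket: {win_probability:.3e}")
```

In R the same count is `choose(56, 5) * 46`.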
But I want to see how likely it is that SOMEONE will win tonight. So let’s break out R and ggplot!
As of this afternoon it was reported (sorry, no source) that two tickets were sold for every American. So let’s assume that each of these tickets is an independent Bernoulli trial with a probability of success of 1/175,711,536.
Running 1,000 simulations, we see the distribution of the number of winners in the histogram above.
So we shouldn’t be surprised if there are multiple winners tonight.
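The same conclusion can be reached without simulating, since the number of winners under this model is binomial. The sketch below is Python rather than the R behind the histogram, and it assumes a U.S. population of roughly 310 million, a figure that does not come from the report about ticket sales. Treat the outputs as ballpark numbers under that assumption.

```python
import math

TOTAL_COMBINATIONS = math.comb(56, 5) * 46  # 175,711,536 possible tickets

# ASSUMPTION: roughly 310 million Americans; the post gives no population figure
ASSUMED_POPULATION = 310_000_000
tickets_sold = 2 * ASSUMED_POPULATION  # "two tickets for every American"

p = 1 / TOTAL_COMBINATIONS  # success probability of one Bernoulli trial

# Mean of a Binomial(tickets_sold, p) distribution
expected_winners = tickets_sold * p

# P(0 winners) = (1 - p)^n, computed via logs for numerical stability
prob_no_winner = math.exp(tickets_sold * math.log1p(-p))

# P(exactly 1 winner) = n * p * (1 - p)^(n - 1)
prob_one_winner = tickets_sold * p * math.exp((tickets_sold - 1) * math.log1p(-p))

prob_at_least_one = 1 - prob_no_winner
prob_two_or_more = 1 - prob_no_winner - prob_one_winner

print(f"Expected winners:     {expected_winners:.2f}")
print(f"P(at least 1 winner): {prob_at_least_one:.3f}")
print(f"P(2 or more winners): {prob_two_or_more:.3f}")
```

The model treats every ticket as an independent random pick, so it ignores people choosing the same popular numbers, exactly as the Bernoulli assumption does.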
Shortly after the Giants’ fantastic defeat of the Patriots in Super Bowl XLVI (I was a little disappointed that Eli, Coughlin and the Vince Lombardi Trophy all got off the parade route early and the views of City Hall were obstructed by construction trailers, but Steve Weatherford was awesome as always) a friend asked me to settle a debate amongst some people in a Super Bowl pool.
We have 10 participants in a superbowl pool. The pool is a “pick the player who scores first” type pool. In a hat, there are 10 Giants players. Each participant picks 1 player out of the hat (in no particular order) until the hat is emptied. Then 10 Patriots players go in the hat and each participant picks again.
In the end, each of the 10 participants has 1 Giants player and 1 Patriots player. No one has any duplicate players as 10 different players from each team were selected. Pool looks as follows:
Winners = First Player to score wins half the pot. First player to score in 2nd half wins the remaining half of the pot.
The question is, what are the odds that someone wins Both the 1st and 2nd half. Remember, the picks were random.
Before anyone asks about the safety, one of the slots was for Special Teams/Defense.
There are two probabilistic ways of thinking about this. Both hinge on the fact that who scores first in one half is independent of who scores first in the other, and that the two events are not mutually exclusive: the same player can score first in both halves.
First, let’s look at the two halves individually. In a given half any of 20 players can score first (10 from the Giants and 10 from the Patriots), and an individual participant can win with two of those. Treating each of the 20 as equally likely, a participant has a 2/20 = 1/10 chance of winning a half. Thus that participant has a (1/10) * (1/10) = 1/100 chance of winning both halves. Only one participant can win both halves, so the 10 participants’ chances add up, giving an overall probability of 10 * (1/100) = 1/10 that some participant wins both halves.
The other way is to think a little more combinatorially. There are 20 * 20 = 400 different combinations of players scoring first in each half. A participant has two players, each of which is valid for each half, giving them four of the possible combinations and a 4 / 400 = 1/100 probability that a single participant will win both halves. Again, there are 10 participants, giving an overall 10% chance that some participant wins both halves.
Since both methods agreed I am pretty confident in the results, but just in case I ran some simulations in R, which you can find after the break.
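For a rough idea of what such a check looks like, here is a minimal Python sketch (a stand-in, not the R code after the break). It assumes, as the calculations above do, that each of the 20 slots is equally likely to score first in each half and that the halves are independent. The slot numbering, where participant i holds Giants slot i and Patriots slot i + 10, is purely illustrative.

```python
import random
from fractions import Fraction
from itertools import product

NUM_PARTICIPANTS = 10
NUM_SLOTS = 2 * NUM_PARTICIPANTS  # 10 Giants slots + 10 Patriots slots


def owner(slot):
    """Participant holding a slot: slots i and i + 10 belong to participant i."""
    return slot % NUM_PARTICIPANTS


# Exact answer: enumerate all 20 * 20 (first-half, second-half) scorer pairs
pairs = list(product(range(NUM_SLOTS), repeat=2))
same_owner_pairs = sum(1 for first, second in pairs if owner(first) == owner(second))
exact_probability = Fraction(same_owner_pairs, len(pairs))

# Monte Carlo check under the same equal-likelihood assumption
rng = random.Random(2012)  # fixed seed so the run is repeatable
TRIALS = 100_000
hits = 0
for _ in range(TRIALS):
    first_half_scorer = rng.randrange(NUM_SLOTS)
    second_half_scorer = rng.randrange(NUM_SLOTS)
    if owner(first_half_scorer) == owner(second_half_scorer):
        hits += 1
simulated_probability = hits / TRIALS

print(f"Exact:     {exact_probability}")  # 1/10
print(f"Simulated: {simulated_probability:.4f}")
```

Real players are of course not equally likely to score first, so this only confirms the arithmetic, not the football.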