My time slot was up against Nate Silver, so I didn’t expect many people to attend. Much to my surprise, when I entered the room every seat was taken and people were lining the walls and sitting in the aisles.
My presentation, which was unrelated to the work I did, analyzed the Giants’ probability of passing versus rushing and the probability of which receiver was targeted. It is available at the talks section of my site.
Visually, we see that until 2011 the Giants preferred to run on first and second down. Third down is usually a do-or-die down so passes will dominate on third-and-long. The grey vertical lines mark Super Bowls XLII and XLVI.
Shortly after the Giants’ fantastic defeat of the Patriots in Super Bowl XLVI (I was a little disappointed that Eli, Coughlin and the Vince Lombardi Trophy all got off the parade route early and the views of City Hall were obstructed by construction trailers, but Steve Weatherford was awesome as always) a friend asked me to settle a debate amongst some people in a Super Bowl pool.
We have 10 participants in a Super Bowl pool. The pool is a “pick the player who scores first” type pool. In a hat, there are 10 Giants players. Each participant picks 1 player out of the hat (in no particular order) until the hat is emptied. Then 10 Patriots players go in the hat and each participant picks again.
In the end, each of the 10 participants has 1 Giants player and 1 Patriots player. No one has any duplicate players as 10 different players from each team were selected. Pool looks as follows:
Winners = First Player to score wins half the pot. First player to score in 2nd half wins the remaining half of the pot.
The question is, what are the odds that someone wins both the 1st and 2nd half? Remember, the picks were random.
Before anyone asks about the safety, one of the slots was for Special Teams/Defense.
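There are two probabilistic ways of thinking about this. Both hinge on the fact that who scores first in the first half and who scores first in the second half are independent events that are not mutually exclusive: a participant can win either half, both, or neither.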
First, let’s look at the two halves individually. In a given half any of 20 players can score first (10 from the Giants and 10 from the Patriots), and an individual participant holds two of them. Treating each of the 20 as equally likely, a participant has a 2/20 = 1/10 chance of winning a half. Because the halves are independent, that participant has a (1/10) * (1/10) = 1/100 chance of winning both. No two participants can both sweep, so these events are mutually exclusive and their probabilities add: with 10 participants there is an overall probability of 10 * (1/100) = 1/10 that some participant wins both halves.
The other way is to think a little more combinatorially. There are 20 * 20 = 400 different combinations of players scoring first in each half. A participant has two players, each of whom can score first in either half, giving that participant four of the possible combinations and a 4 / 400 = 1/100 probability of winning both halves. Again, there are 10 participants, giving an overall 10% chance that some participant wins both halves.
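The counting argument is small enough to check by brute force. Here is a minimal Python sketch that lists all 400 combinations and counts the ones where the same participant holds both scorers. The player numbering and the `owner` helper are my own labelling for illustration, and every player is assumed equally likely to score first.

```python
from fractions import Fraction
from itertools import product

N_PARTICIPANTS = 10

# Assumption: players 0-9 are Giants, 10-19 are Patriots, and participant p
# holds Giants player p and Patriots player 10 + p. Any one-to-one assignment
# gives the same count, so this labelling is only for convenience.
def owner(player):
    return player % N_PARTICIPANTS

players = range(2 * N_PARTICIPANTS)

# Every (first-half scorer, second-half scorer) pair, assumed equally likely.
outcomes = list(product(players, repeat=2))

# A sweep happens when one participant holds both scorers.
sweeps = sum(1 for first, second in outcomes if owner(first) == owner(second))

p_sweep = Fraction(sweeps, len(outcomes))
print(len(outcomes), sweeps, p_sweep)  # 400 40 1/10
```

The 40 sweeping combinations are the four per participant counted above, times 10 participants.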
Since both methods agreed I am pretty confident in the results, but just in case I ran some simulations in R which you can find after the break.
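For readers who want to try this themselves, here is a rough Python sketch of such a simulation. It is not the R code mentioned above, just one way to set it up: the function name, trial count and seed are my own choices, and each of the 20 players is assumed equally likely to score first in a half.

```python
import random

def simulate_pool(n_trials=100_000, n_participants=10, seed=42):
    """Estimate the chance that one participant wins both halves."""
    rng = random.Random(seed)
    n_players = 2 * n_participants
    sweeps = 0
    for _ in range(n_trials):
        # Draw from the hat: shuffle who holds which player on each team.
        giants = list(range(n_participants))
        patriots = list(range(n_participants))
        rng.shuffle(giants)
        rng.shuffle(patriots)
        owner = giants + patriots  # owner[i] = participant holding player i

        # Pick the first scorer of each half independently and uniformly.
        first_half = rng.randrange(n_players)
        second_half = rng.randrange(n_players)
        if owner[first_half] == owner[second_half]:
            sweeps += 1
    return sweeps / n_trials

estimate = simulate_pool()
print(estimate)  # should land close to 0.10
```

Reshuffling the hat on every trial is not strictly necessary, since any assignment gives the same probability, but it mirrors how the pool was actually drawn.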
With the Super Bowl only hours away, now is your last chance to buy your boxes. Assuming the last digits are not assigned randomly, you can maximize your chances with a little analysis. While I’ve seen plenty of sites giving the raw numbers, I thought a little visualization was in order.
In the graph above (made using ggplot2 in R, of course) the bigger squares represent greater frequency. The axes are labelled “Home” and “Away” for orientation, but in the Super Bowl that probably doesn’t matter too much, especially considering that Indianapolis is (Peyton) Manning territory so the locals will most likely be rooting for the Giants. Further, I believe Super Bowl XLII, featuring the same two teams, had a disproportionate number of Giants fans. Bias disclaimer: GO BIG BLUE!!!
Below is the same graph broken down by year to see how the distribution has changed over the past 20 years.
I’m a few days behind on my posts, so please excuse my tardiness and the slew of posts that should be forthcoming.
A-Rod finally reached 600 home runs a couple weeks ago. While that may have relieved pressure on him, now people are looking toward Jeter’s 3,000th hit. The Wall Street Journal ran a piece predicting that Jeter should hit the 3,000 mark around June 6th next year.
They looked at his historical numbers, took into account the 27 other players to hit that number and determined that Jeter should get a hit every 3.66 at-bats next season. I’m not sure what method they used to calculate 3.66, but I would guess some sort of simple average. Then, based on how many hits he needs (128 at the time of the article), his average number of at-bats per game, the average number of games he plays a season and the Yankees’ typical schedule, they determined the June 6th date.
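The first step of that arithmetic is easy to reproduce. The Python sketch below uses only the two figures quoted above (128 hits needed, one hit per 3.66 at-bats) and, as a simplification of my own, applies the 3.66 rate to every remaining hit. The at-bats-per-game value passed in at the end is a made-up placeholder, not a number from the article, and turning games into a calendar date would also need the schedule.

```python
import math

HITS_NEEDED = 128        # from the article, at the time it was written
AT_BATS_PER_HIT = 3.66   # the article's projected rate for next season

# Expected at-bats to collect the remaining hits at that rate.
at_bats_needed = HITS_NEEDED * AT_BATS_PER_HIT

def games_needed(at_bats_per_game):
    """Games required at a given (assumed) at-bats-per-game rate."""
    return math.ceil(at_bats_needed / at_bats_per_game)

# Hypothetical rate of 4.0 at-bats per game, for illustration only.
print(round(at_bats_needed, 2), games_needed(4.0))  # 468.48 118
```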
I don’t really have much to add other than that this seems like a solid method. What do the sabermetricians think? By the way, that looks like an awesome cast.