A friend recently posted the following problem:
There are 10 green balls, 20 red balls, and 25 blue balls in a jar. I choose a ball at random. If I choose a green ball then I take out all the green balls, if I choose a red ball then I take out all the red balls, and if I choose a blue ball then I take out all the blue balls. What is the probability that I will choose a red ball on my second try?
The math works out fairly easily. It’s the probability of first drawing a green ball AND then drawing a red ball, OR the probability of drawing a blue ball AND then drawing a red ball.
\[
\frac{10}{10+20+25} \times \frac{20}{20+25} + \frac{25}{10+20+25} \times \frac{20}{10+20} \approx 0.3838
\]
But I always prefer simulations over probability, so let’s break out the R code like we did for the Monty Hall Problem and calculating lottery odds. The results are after the break.
First, let’s create a vector containing our ball counts.
balls <- c(green = 10, red = 20, blue = 25)
Then let’s build functions for doing the sampling. This could have all been done in one function, but compartmentalized code will be more reusable.
# Draws one ball and returns the chosen ball
pick.ball <- function(balls) {
    sample(x = names(balls), size = 1, prob = balls)
}
# Draws a ball then returns the vector of remaining balls.
first.draw <- function(balls) {
    theDraw <- pick.ball(balls = balls)
    balls[names(balls) != theDraw]
}
# Draws the first ball, reducing the vector, then returns the color of the
# second draw
experiment <- function(balls) {
    balls <- first.draw(balls = balls)
    pick.ball(balls = balls)
}
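Before running the full simulation, it can help to sanity-check the helpers on a single run. This is a minimal sketch, assuming the `balls` vector and the three functions above (repeated here so the snippet runs on its own). The seed value and the `remaining` and `secondDraw` names are my own additions, used only for illustration.

```r
balls <- c(green = 10, red = 20, blue = 25)

pick.ball <- function(balls) {
    sample(x = names(balls), size = 1, prob = balls)
}
first.draw <- function(balls) {
    theDraw <- pick.ball(balls = balls)
    balls[names(balls) != theDraw]
}
experiment <- function(balls) {
    balls <- first.draw(balls = balls)
    pick.ball(balls = balls)
}

# Arbitrary seed, only so repeated runs give the same draws
set.seed(1234)

# After the first draw, exactly two colors should remain in the jar
remaining <- first.draw(balls = balls)
length(remaining)

# A full experiment returns a single color name from the original jar
secondDraw <- experiment(balls = balls)
secondDraw %in% names(balls)
```

Passing the raw counts as `prob` works because `sample` rescales the weights to sum to one.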
We repeat the experiment 100,000 times.
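outcomes <- replicate(n = 1e+05, expr = experiment(balls = balls), simplify = TRUE)
# Tabulate the second-draw colors, then convert the counts to proportions
crossTab <- table(outcomes)
crossTab/NROW(outcomes)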
## outcomes
## blue green red
## 0.3632 0.2539 0.3829
And we see that red comes out 38285 times for a probability of 0.3829.
The simulation differs from the exact answer of 0.3838 by about 0.001, which is the kind of gap you would expect from random sampling with 100,000 trials.
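To put the simulated proportions next to exact values for all three colors, the same "first draw AND second draw" reasoning can be wrapped in a small function. This is a sketch rather than part of the original analysis: `second.draw.prob`, `exact` and `se.red` are hypothetical names introduced here, and the standard error uses the usual binomial formula for a proportion.

```r
balls <- c(green = 10, red = 20, blue = 25)

# Exact probability that `target` is the second draw: sum over every
# other color being drawn (and therefore removed) first
second.draw.prob <- function(balls, target) {
    others <- setdiff(names(balls), target)
    total <- sum(balls)
    sum(sapply(others, function(first) {
        (balls[[first]] / total) * (balls[[target]] / (total - balls[[first]]))
    }))
}

# Exact second-draw probabilities for each color
exact <- sapply(names(balls), second.draw.prob, balls = balls)
round(exact, 4)

# Standard error of a simulated proportion over n trials, for red
n <- 1e+05
se.red <- sqrt(exact[["red"]] * (1 - exact[["red"]]) / n)
round(se.red, 4)
```

For red the standard error comes out to roughly 0.0015, so a simulated value of 0.3829 sits within one standard error of the exact 0.3838.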
Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm; Adjunct Professor at Columbia University; Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences; and author of R for Everyone.