A friend recently posted the following problem:
There are 10 green balls, 20 red balls, and 25 blue balls in a jar. I choose a ball at random. If I choose a green ball then I take out all the green balls, if I choose a red ball then I take out all the red balls, and if I choose a blue ball I take out all the blue balls. What is the probability that I will choose a red ball on my second try?
The math works out fairly easily. It’s the probability of first drawing a green ball AND then drawing a red ball, OR the probability of first drawing a blue ball AND then drawing a red ball. (Drawing red first removes every red ball, so that path contributes nothing.)
\[
\frac{10}{10+20+25} \times \frac{20}{20+25} + \frac{25}{10+20+25} \times \frac{20}{10+20} \approx 0.3838
\]
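As a quick check of the arithmetic, the same two paths can be written out in R. This is only a sketch of the formula above; the variable names (p.green.then.red and so on) are my own and not part of the original problem.

```r
# Ball counts as stated in the problem
balls <- c(green = 10, red = 20, blue = 25)
total <- sum(balls)

# Path 1: green first, then red drawn from the remaining red and blue balls
p.green.then.red <- (balls[["green"]] / total) *
  (balls[["red"]] / (balls[["red"]] + balls[["blue"]]))

# Path 2: blue first, then red drawn from the remaining green and red balls
p.blue.then.red <- (balls[["blue"]] / total) *
  (balls[["red"]] / (balls[["green"]] + balls[["red"]]))

# The two paths are mutually exclusive, so their probabilities add
p.red.second <- p.green.then.red + p.blue.then.red
round(p.red.second, 4)
```

The exact value is 38/99, which rounds to 0.3838.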
But I always prefer simulations over probability, so let’s break out the R code like we did for the Monty Hall Problem and for calculating lottery odds. The results are after the break.
First, let’s create a vector containing our ball counts.
balls <- c(green = 10, red = 20, blue = 25)
Then let’s build functions for doing the sampling. This could have all been done in one function, but compartmentalized code will be more reusable.
# Draws one ball and returns the chosen ball
pick.ball <- function(balls) {
  sample(x = names(balls), size = 1, prob = balls)
}

# Draws a ball then returns the vector of remaining balls.
first.draw <- function(balls) {
  theDraw <- pick.ball(balls = balls)
  balls[names(balls) != theDraw]
}

# Draws the first ball, reducing the vector, then returns the color of the
# second draw
experiment <- function(balls) {
  balls <- first.draw(balls = balls)
  pick.ball(balls = balls)
}
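For comparison, the all-in-one version mentioned earlier might look something like the sketch below. The name second.draw is hypothetical and the seed is arbitrary; it is only meant to show what the three small functions are doing when collapsed together.

```r
# Hypothetical single-function variant: draw once, drop that color,
# then draw again from what is left and return the second color
second.draw <- function(balls) {
  first <- sample(x = names(balls), size = 1, prob = balls)
  remaining <- balls[names(balls) != first]
  sample(x = names(remaining), size = 1, prob = remaining)
}

set.seed(42)  # arbitrary seed, only so the run is repeatable
one.result <- second.draw(c(green = 10, red = 20, blue = 25))
one.result
```

It gives the same kind of answer as experiment, but pick.ball and first.draw can no longer be reused on their own.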
We repeat the experiment 100,000 times.
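outcomes <- replicate(n = 1e+05, expr = experiment(balls = balls), simplify = TRUE)
# Count how often each color came up on the second draw
crossTab <- table(outcomes)
# Convert the counts to proportions
crossTab/NROW(outcomes)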
## outcomes
## blue green red
## 0.3632 0.2539 0.3829
And we see that red comes out 38,285 times out of 100,000, for an estimated probability of 0.3829.
The simulation is a little off from the math, but the gap of about 0.001 is the kind of sampling noise you would expect from 100,000 runs.
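To put a number on how far off the simulation is, one can compute the exact second-draw probability for every color along with the standard error of a simulated proportion. This is a rough sketch: the helper exact.second and the variable n.sims are names I made up for illustration, and the standard error uses the usual binomial formula sqrt(p(1 - p)/n).

```r
balls <- c(green = 10, red = 20, blue = 25)
n.sims <- 1e5  # number of simulated experiments, matching the run above

# Exact probability that the second draw is `target`: sum over every other
# color that could have been drawn (and removed) first
exact.second <- function(target, balls) {
  firsts <- setdiff(names(balls), target)
  sum(sapply(firsts, function(f) {
    remaining <- balls[names(balls) != f]
    (balls[[f]] / sum(balls)) * (remaining[[target]] / sum(remaining))
  }))
}

# Exact probabilities for all three colors, named by color
exact <- sapply(names(balls), exact.second, balls = balls)

# Binomial standard error of a proportion estimated from n.sims runs
std.err <- sqrt(exact * (1 - exact) / n.sims)

round(exact, 4)
round(std.err, 4)
```

For red this gives 38/99, about 0.3838, with a standard error of roughly 0.0015, so the simulated 0.3829 sits within one standard error of the exact answer.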
Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science and AI firm; Adjunct Professor at Columbia University; Organizer of the New York Open Statistical Programming meetup and the New York and Government Data Science and AI Conferences; and author of R for Everyone.