A friend recently posted the following problem:

There are 10 green balls, 20 red balls, and 25 blue balls in a jar. I choose a ball at random. If I choose a green ball then I take out all the green balls, if I choose a red ball then I take out all the red balls, and if I choose a blue ball then I take out all the blue balls. What is the probability that I will choose a red ball on my second try?

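The math works out fairly easily. Drawing red first removes every red ball, so a red on the second try requires a green or a blue on the first. The answer is the probability of drawing a green ball AND then a red ball, OR the probability of drawing a blue ball AND then a red ball, where each second draw is taken from the balls left in the jar.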

$\frac{10}{10+20+25} \times \frac{20}{20+25} + \frac{25}{10+20+25} \times \frac{20}{10+20} = \frac{38}{99} \approx 0.3838$
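The same arithmetic can be checked directly in R before simulating anything. This is a small sketch that simply transcribes the formula; the names `total`, `p.green.then.red` and `p.blue.then.red` are illustrative and not part of the code that follows.

```r
# Ball counts from the problem statement
balls <- c(green = 10, red = 20, blue = 25)
total <- sum(balls)

# Green first (greens removed, reds and blues remain), then red
p.green.then.red <- balls[["green"]] / total *
    balls[["red"]] / (balls[["red"]] + balls[["blue"]])

# Blue first (blues removed, greens and reds remain), then red
p.blue.then.red <- balls[["blue"]] / total *
    balls[["red"]] / (balls[["green"]] + balls[["red"]])

# Red first is impossible to follow with red, so these two paths are all
p.red.second <- p.green.then.red + p.blue.then.red
round(p.red.second, 4)
```

The two terms work out to 8/99 and 30/99, which is where the exact 38/99 comes from.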

But I always prefer simulations over probability, so let’s break out the R code like we did for the Monty Hall problem and for calculating lottery odds.

First, let’s create a vector containing our ball counts.

```r
balls <- c(green = 10, red = 20, blue = 25)
```


Then let’s build functions to do the sampling. All of this could have been done in one function, but compartmentalized code is more reusable.

```r
# Draws one ball and returns the chosen ball
pick.ball <- function(balls) {
    sample(x = names(balls), size = 1, prob = balls)
}

# Draws a ball then returns the vector of remaining balls.
first.draw <- function(balls) {
    theDraw <- pick.ball(balls = balls)
    balls[names(balls) != theDraw]
}

# Draws the first ball, reducing the vector, then returns the color of the
# second draw
experiment <- function(balls) {
    balls <- first.draw(balls = balls)
    pick.ball(balls = balls)
}
```
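As an aside, the functions above run one experiment per call. A vectorized sketch of the same two-step draw is below; `simulate.second` is a hypothetical helper, not part of the original code, and it assumes `balls` is a named vector of counts like the one above. It draws every first ball in a single call to `sample()`, then, for each possible first color, draws all the matching second balls from the jar with that color removed.

```r
# Hypothetical vectorized alternative to replicate(experiment(...)):
# same two-step draw, but each step samples many experiments at once.
balls <- c(green = 10, red = 20, blue = 25)

simulate.second <- function(balls, n) {
    colors <- names(balls)
    # Step 1: the first ball for all n experiments, weighted by the counts
    first <- sample(colors, size = n, replace = TRUE, prob = balls)
    second <- character(n)
    for (color in colors) {
        # Experiments whose first ball was this color
        idx <- which(first == color)
        # That color is removed from the jar; the others keep their counts
        remaining <- balls[colors != color]
        # Step 2: second ball for those experiments, from the reduced jar
        second[idx] <- sample(names(remaining), size = length(idx),
                              replace = TRUE, prob = remaining)
    }
    second
}

# Share of experiments whose second ball is red
red.share <- mean(simulate.second(balls, n = 1e5) == "red")
```

The loop runs once per color rather than once per experiment, which is the main reason to prefer it when the number of trials gets large; the one-at-a-time functions remain easier to read and reuse.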


We repeat the experiment 100,000 times.

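```r
outcomes <- replicate(n = 1e+05, expr = experiment(balls = balls), simplify = TRUE)
# Tabulate the color of each second draw, then convert counts to proportions
crossTab <- table(outcomes)
crossTab/NROW(outcomes)

## outcomes
##   blue  green    red
## 0.3632 0.2539 0.3829
```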
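To compare the whole table against theory, not just red, the same conditioning argument can be applied to every color. This is an illustrative sketch; `exact.second` is a hypothetical helper that loops over the possible first draws and adds P(first) × P(second | first) for each color still in the jar.

```r
# Exact probability that each color is drawn second, by conditioning on the
# color of the first draw (law of total probability)
exact.second <- function(balls) {
    colors <- names(balls)
    probs <- setNames(numeric(length(colors)), colors)
    for (first in colors) {
        p.first <- balls[[first]] / sum(balls)
        # The first color is removed; draw the second from what is left
        remaining <- balls[colors != first]
        probs[names(remaining)] <- probs[names(remaining)] +
            p.first * remaining / sum(remaining)
    }
    probs
}

balls <- c(green = 10, red = 20, blue = 25)
round(exact.second(balls), 4)
# green ~ 0.2554, red ~ 0.3838, blue ~ 0.3608
```

All three simulated proportions (0.2539, 0.3829 and 0.3632) sit close to these exact values.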


And we see that red comes up 38,285 times, for an estimated probability of 0.3829.
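To put a number on how close is close enough, one rough check is the binomial standard error of a proportion estimated from 100,000 independent draws. This sketch plugs in the exact value 38/99 and the 38,285 reds from the run above; the variable names are illustrative.

```r
# How far should a 100,000-trial estimate typically stray from 38/99?
n <- 1e5
p.exact <- 38 / 99
red.count <- 38285          # red count reported by the simulation above
p.sim <- red.count / n

# Standard error of a proportion estimated from n independent trials
se <- sqrt(p.exact * (1 - p.exact) / n)
# Distance of the simulated value from the exact one, in standard errors
z <- (p.sim - p.exact) / se
c(se = se, z = z)
```

The standard error comes out to roughly 0.0015, so the simulated 0.3829 sits less than one standard error from the exact answer.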

The simulation is a little off from the math, 0.3829 versus 0.3838, but a gap of about 0.001 is well within the sampling noise expected from 100,000 trials.

Jared Lander is the Chief Data Scientist of Lander Analytics, a New York data science firm, Adjunct Professor at Columbia University, Organizer of the New York Open Statistical Programming meetup and the New York and Washington DC R Conferences, and author of R for Everyone.