Last week, for pedagogical reasons, I had to solve the following problem:
In a bag there are
blue and red balls. We randomly take balls from the bag without replacement. Let the random variable denote the number of blue balls we get. Find out the expectation and variance of .
The problem is quite a simple one. For finding the expectation, we can use the definition
We can also compute
Then we get the variance using the formula
It is a straightforward exercise to calculate the probabilities. I got
I wanted to double-check the values. Instead of retracing the steps as a sane person would have done2, I wrote a small program to simulate the experiment.
The Simulation
Here is the class RedBlue which defines the essential methods:
import random as rand
from enum import Enum
class Color(Enum):
RED = 1
BLUE = 2
class RedBlue:
def __init__(self, num_reds, num_blues):
self.num_reds = num_reds
self.num_blues = num_blues
self.bag = [Color.RED] * num_reds + [Color.BLUE] * num_blues
def prob_blues(self, num_picks, num_trials):
"""Return the probability of getting 'i' blue balls when 'num_picks' balls are
randomly taken from 'self.bag' without replacement. The return value is
a list whose ith element is the probability of getting 'i' blue balls.
"""
counts = [0] * (num_picks + 1)
for i in range(num_trials):
counts[ self.count_blues(num_picks) ] += 1
return [ x/num_trials for x in counts ]
def count_blues(self, num_picks):
"""Returns the number of blue balls observed when 'num_picks' balls are taken
from 'self.bag' without replacement."""
balls = rand.sample(self.bag, num_picks)
return balls.count(Color.BLUE)
The sample function in Python’s random library is used to get a random
sample sample from the input population, without replacement. The count_blues
function gets a sample, and then counts the number of blue balls it contains.
The prob_blues function repeatedly calls count_balls to estimate the
probability of getting each possible number of blue balls.
My initial idea was to compute the probabilities using the prob_blues
function, and then to use the formulas for
It then stuck me that it was an unnecessarily roundabout way. Expectation is
ultimately just the mean of the random variable. So we could as well take the
average of the values that count_blues returns to get the Expectation.
So I ended up not using the prob_blues function. The rest of the script is
given below:
rb = RedBlue(5, 4)
# Compute the expectation and variance
s = 0 # For E[X]
s2 = 0 # For E[X^2]
num_trials = int(1e5)
num_picks = 3
for i in range(num_trials):
num_blues = rb.count_blues(num_picks)
s += num_blues
s2 += num_blues * num_blues
mean = s/num_trials
mean_x2 = s2/num_trials
var = mean_x2 - mean * mean
print("Mean = {0}, Variance = {1}".format(mean, var))
The program outputs the estimated Expectation as
Advantages
Is this method more efficient than computing the probabilities first and then
finding the weighted average? The answer is no. Whether we compute the
probabilities or not, we end up doing essentially the same work. If we do
In the function given above, all of the computation happens in a loop which gets
executed
There is, however, one important advantage: for computing the probabilities, we
need to spend
The Geometric Random Variable
The geometric random variable is a case in point. It stands for the number of
trials we need to perform for being successful. If we encounter
The geometric random variable can have any positive integer value. Therefore, to
estimate its expectation and variance using the probability method, we need
Even more importantly, it also works for continuous random variables.
The Uniform Random Variable
Possibly the simplest continuous random variable is the Uniform random variable.
Python’s random module’s random() function is a uniform random variable with
range prob_blues function to evaluate it.
But it is much clearer to use the direct method:
s, s2 = 0, 0
num_trials = int(1e5)
for i in range(num_trials):
x = rand.random()
s += x
s2 += x*x
mean = s/num_trials
mean_x2 = s2/num_trials
var = mean_x2 - mean * mean
print("Estimated mean = {0:.4f}, variance = {1:.4f}".format(mean, var))
When you run the code, you should get the estimated mean to be close to
If you want to play around with the code, you can get the files from
the Expectation and Variance sub-directory of this git repository.