Basic Theory
Definitions
In this section, our random experiment is to sample repeatedly, with replacement, from the population . This generates a sequence of independent random variables , each uniformly distributed on . We will often interpret the sampling in terms of a coupon collector: each time the collector buys a certain product (bubble gum or Cracker Jack, for example) she receives a coupon (a baseball card or a toy, for example) which is equally likely to be any one of types. Thus, in this setting, is the coupon type received on the th purchase.
Let denote the number of distinct values in the first selections, for . This is the random variable studied in the last section on the birthday problem. Our interest is in this section is the sample size needed to get a specified number of distinct sample values
For , let
the sample size needed to get distinct sample values. takes values in the set .
In terms of the coupon collector, this random variable gives the number of products required to get distinct coupon types. We will be particularly interested in , the sample size needed to get the entire population. In terms of the coupon collector, this is the number of products required to get the entire set of coupons.
The Probability Density Function
Now let's find the distribution of . The results of the previous section will be very helpful
For , the probability density function of is given by
Details:
Note first that if and only if and . Hence
Using the PDF of from the birthday problem gives the result.
In the coupon collector experiment, vary the parameters and note the shape of and position of the probability density function. For selected values of the parameters, run the experiment 1000 times and compare the relative frequency function to the probability density function.
An alternate approach to the probability density function of is via a recursion formula.
For fixed , let denote the probability density function of . Then
Decomposition as a Sum
We will now show that can be decomposed as a sum of independent, geometrically distributed random variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.
For , let denote the number of additional samples needed to go from distinct values to distinct values. Then is a sequence of independent random variables, and has the geometric distribution on with parameter . Moreover,
This result shows clearly that each time a new coupon is obtained, it becomes harder to get the next new coupon.
In the coupon collector experiment, run the experiment in single-step mode a few times for selected values of the parameters. In particular, try this with large and near .
Moments
The decomposition as a sum of independent variables provides an easy way to compute the mean and other moments of .
The mean and variance of the sample size needed to get distinct values are
Details:
These results follow from [6] and standard results for the geometric distribution, since and .
In the coupon collector experiment, vary the parameters and note the shape and location of the mean standard deviation bar. For selected values of the parameters, run the experiment 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.
The probability generating function of is given by
Details:
This follows from [6] and standard results for the geometric distribution on , since .
Examples and Applications
Suppose that people are sampled at random until 40 distinct birthdays are obtained. Find each of the following:
- The probability density function of the sample size.
- The mean of the sample size.
- The variance of the sample size.
- The probability generating function of the sample size.
Details:
Let denote the sample size.
- for
- for
Suppose that a standard, fair die is thrown until all 6 scores have occurred. Find each of the following:
- The probability density function of the number of throws.
- The mean of the number of throws.
- The variance of the number of throws.
- The probability that at least 10 throws are required.
Details:
Let denote the number of throws.
- for
A box of a certain brand of cereal comes with a special toy. There are 10 different toys in all. A collector buys boxes of cereal until she has all 10 toys. Find each of the following:
- The probability density function of the number boxes purchased.
- The mean of the number of boxes purchased.
- The variance of the number of boxes purchased.
- The probability that no more than 15 boxes were purchased.
Details:
Let denote the number of boxes purchased.
- , for