1. Random
  2. 11. Finite Sampling Models
  3. 1
  4. 2
  5. 3
  6. 4
  7. 5
  8. 6
  9. 7
  10. 8
  11. 9

7. The Coupon Collector Problem

Basic Theory

Definitions

In this section, our random experiment is to sample repeatedly, with replacement, from the population D={1,2,,m}. This generates a sequence of independent random variables X=(X1,X2,), each uniformly distributed on D. We will often interpret the sampling in terms of a coupon collector: each time the collector buys a certain product (bubble gum or Cracker Jack, for example) she receives a coupon (a baseball card or a toy, for example) which is equally likely to be any one of m types. Thus, in this setting, XiD is the coupon type received on the ith purchase.

Let Vn denote the number of distinct values in the first n selections, for nN+. This is the random variable studied in the last section on the birthday problem. Our interest is in this section is the sample size needed to get a specified number of distinct sample values

For k{1,2,,m}, let Wk=min{nN+:Vn=k} the sample size needed to get k distinct sample values. Wk takes values in the set {k,k+1,}.

In terms of the coupon collector, this random variable gives the number of products required to get k distinct coupon types. We will be particularly interested in Wm, the sample size needed to get the entire population. In terms of the coupon collector, this is the number of products required to get the entire set of coupons.

In the coupon collector experiment, run the experiment in single-step mode a few times for selected values of the parameters.

The Probability Density Function

Now let's find the distribution of Wk. The results of the previous section will be very helpful

For k{1,2,,m}, the probability density function of Wk is given by P(Wk=n)=(m1k1)j=0k1(1)j(k1j)(kj1m)n1,n{k,k+1,}

Details:

Note first that Wk=n if and only if Vn1=k1 and Vn=k. Hence P(Wk=n)=P(Vn1=k1)P(Vn=kVn1=k1)=mk+1mP(Vn1=k1) Using the PDF of Vn1 from the birthday problem gives the result.

In the coupon collector experiment, vary the parameters and note the shape of and position of the probability density function. For selected values of the parameters, run the experiment 1000 times and compare the relative frequency function to the probability density function.

An alternate approach to the probability density function of Wk is via a recursion formula.

For fixed m, let gk denote the probability density function of Wk. Then

  1. gk(n+1)=k1mgk(n)+mk+1mgk1(n)
  2. g1(1)=1

Decomposition as a Sum

We will now show that Wk can be decomposed as a sum of k independent, geometrically distributed random variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.

For i{1,2,,m}, let Zi denote the number of additional samples needed to go from i1 distinct values to i distinct values. Then Z=(Z1,Z2,,Zm) is a sequence of independent random variables, and Zi has the geometric distribution on N+ with parameter pi=mi+1m. Moreover, Wk=i=1kZi,k{1,2,,m}

This result shows clearly that each time a new coupon is obtained, it becomes harder to get the next new coupon.

In the coupon collector experiment, run the experiment in single-step mode a few times for selected values of the parameters. In particular, try this with m large and k near m.

Moments

The decomposition as a sum of independent variables provides an easy way to compute the mean and other moments of Wk.

The mean and variance of the sample size needed to get k distinct values are

  1. E(Wk)=i=1kmmi+1
  2. var(Wk)=i=1k(i1)m(mi+1)2
Details:

These results follow from [6] and standard results for the geometric distribution, since E(Wk)=i=1kE(Zi) and var(Wk)=i=1kvar(Zi).

In the coupon collector experiment, vary the parameters and note the shape and location of the mean ± standard deviation bar. For selected values of the parameters, run the experiment 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.

The probability generating function of Wk is given by E(tWk)=i=1kmi+1m(i1)t,|t|<mk1

Details:

This follows from [6] and standard results for the geometric distribution on N+, since E(tWk)=i=1kE(tZi).

Examples and Applications

Suppose that people are sampled at random until 40 distinct birthdays are obtained. Find each of the following:

  1. The probability density function of the sample size.
  2. The mean of the sample size.
  3. The variance of the sample size.
  4. The probability generating function of the sample size.
Details:

Let W denote the sample size.

  1. P(W=n)=(364n)j=030(1)j(39j)(39j365)n1 for n{40,41,}
  2. E(W)=42.3049
  3. var(W)=2.4878
  4. E(tW)=i=140366i365(i1)t for |t|<36539

Suppose that a standard, fair die is thrown until all 6 scores have occurred. Find each of the following:

  1. The probability density function of the number of throws.
  2. The mean of the number of throws.
  3. The variance of the number of throws.
  4. The probability that at least 10 throws are required.
Details:

Let W denote the number of throws.

  1. P(W=n)=j=05(1)j(5j)(5j6)n1 for n{6,7,}
  2. E(W)=14.7
  3. var(W)=38.99
  4. P(W10)=105112960.81096

A box of a certain brand of cereal comes with a special toy. There are 10 different toys in all. A collector buys boxes of cereal until she has all 10 toys. Find each of the following:

  1. The probability density function of the number boxes purchased.
  2. The mean of the number of boxes purchased.
  3. The variance of the number of boxes purchased.
  4. The probability that no more than 15 boxes were purchased.
Details:

Let W denote the number of boxes purchased.

  1. P(W=n)=j=09(1)j(9j)(9j10)n1, for n{10,11,}
  2. E(W)=29.2897
  3. var(W)=125.6871
  4. P(W15)=0.04595