Random › 10. Bernoulli Trials

3. The Geometric Distribution

Basic Theory

Definitions

Suppose again that our random experiment is to perform a sequence of Bernoulli trials $X = (X_1, X_2, \ldots)$ with success parameter $p \in (0, 1]$. In this section we will study the random variable $N$ that gives the trial number of the first success and the random variable $M$ that gives the number of failures before the first success.

Let $N = \min\{n \in \mathbb{N}_+ : X_n = 1\}$, the trial number of the first success, and let $M = N - 1$, the number of failures before the first success. The distribution of $N$ is the geometric distribution on $\mathbb{N}_+$ and the distribution of $M$ is the geometric distribution on $\mathbb{N}$. In both cases, $p$ is the success parameter of the distribution.

Since $N$ and $M$ differ by a constant, the properties of their distributions are very similar. Nonetheless, there are applications where it is more natural to use one rather than the other, and in the literature, the term geometric distribution can refer to either. In this section, we will concentrate on the distribution of $N$, pausing occasionally to summarize the corresponding results for $M$. Throughout this text, we use the terms geometric distribution on $\mathbb{N}_+$ and geometric distribution on $\mathbb{N}$ to distinguish the two, as in [1].

The Probability Density Function

$N$ has probability density function $f$ given by $f(n) = p (1 - p)^{n-1}$ for $n \in \mathbb{N}_+$.

Details:

Note first that $\{N = n\} = \{X_1 = 0, \ldots, X_{n-1} = 0, X_n = 1\}$. By independence, the probability of this event is $(1 - p)^{n-1} p$.

To check that $f$ is a valid PDF, we use standard results for geometric series: $$\sum_{n=1}^\infty P(N = n) = \sum_{n=1}^\infty (1 - p)^{n-1} p = \frac{p}{1 - (1 - p)} = 1$$

A priori, we might have thought it possible to have $N = \infty$ with positive probability; that is, we might have thought that we could run Bernoulli trials forever without ever seeing a success. However, we now know this cannot happen when the success parameter $p$ is positive.

The probability density function of $M$ is given by $P(M = n) = p (1 - p)^n$ for $n \in \mathbb{N}$.
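The density formulas above are easy to check numerically. A minimal sketch in plain Python (the helper name `geometric_pdf` is ours, not from any library):

```python
def geometric_pdf(n, p):
    """f(n) = p * (1 - p) ** (n - 1), the PDF of N on the positive integers."""
    return p * (1 - p) ** (n - 1)

p = 0.3
# The probabilities sum to 1; truncating far into the tail suffices here.
total = sum(geometric_pdf(n, p) for n in range(1, 500))
print(abs(total - 1) < 1e-12)  # True

# The PDF of M = N - 1 is the same mass shifted to 0, 1, 2, ...
m_pdf = lambda n: p * (1 - p) ** n
print(abs(m_pdf(0) - geometric_pdf(1, p)) < 1e-15)  # True
```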

In the negative binomial experiment, set k=1 to get the geometric distribution on N+. Vary p with the scrollbar and note the shape and location of the probability density function. For selected values of p, run the simulation 1000 times and compare the relative frequency function to the probability density function.

Note that the probability density functions of N and M are decreasing, and hence have modes at 1 and 0, respectively. The geometric form of the probability density functions also explains the term geometric distribution.

Distribution Functions and the Memoryless Property

Suppose that $T$ is a random variable taking values in $\mathbb{N}_+$. Recall that the ordinary distribution function of $T$ is the function $n \mapsto P(T \le n)$. In this section, the complementary function $n \mapsto P(T > n)$ will play a fundamental role. We will refer to this function as the right distribution function of $T$. Of course both functions completely determine the distribution of $T$. Suppose again that $N$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p \in (0, 1]$.

$N$ has right distribution function $G$ given by $G(n) = (1 - p)^n$ for $n \in \mathbb{N}$.

Details:

For a proof using the Bernoulli trials formulation, note that $\{N > n\} = \{X_1 = 0, \ldots, X_n = 0\}$. By independence, the probability of this event is $(1 - p)^n$.

For a direct proof, we use geometric series: $$P(N > n) = \sum_{k=n+1}^\infty P(N = k) = \sum_{k=n+1}^\infty (1 - p)^{k-1} p = \frac{p (1 - p)^n}{1 - (1 - p)} = (1 - p)^n$$

From [5], it follows that the ordinary (left) distribution function of $N$ is given by $$F(n) = 1 - (1 - p)^n, \quad n \in \mathbb{N}$$ We will now explore another characterization known as the memoryless property.

For $m \in \mathbb{N}$, the conditional distribution of $N - m$ given $N > m$ is the same as the distribution of $N$. That is, $$P(N > n + m \mid N > m) = P(N > n), \quad m, n \in \mathbb{N}$$

Details:

From [5] and the definition of conditional probability, $$P(N > n + m \mid N > m) = \frac{P(N > n + m)}{P(N > m)} = \frac{(1 - p)^{n+m}}{(1 - p)^m} = (1 - p)^n = P(N > n)$$

So if the first success has not occurred by trial number m, then the remaining number of trials needed to achieve the first success has the same distribution as the trial number of the first success in a fresh sequence of Bernoulli trials. In short, Bernoulli trials have no memory. This fact has implications for a gambler betting on Bernoulli trials (such as in the casino games roulette or craps). No betting strategy based on observations of past outcomes of the trials can possibly help the gambler.
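The memoryless property can be verified directly from the right distribution function $G(n) = (1-p)^n$. A quick numerical check, illustrative only:

```python
p = 0.25

def right_tail(n, p):
    """G(n) = P(N > n) = (1 - p) ** n."""
    return (1 - p) ** n

# P(N > n + m | N > m) = P(N > n) for a grid of m, n values
for m in range(10):
    for n in range(10):
        conditional = right_tail(n + m, p) / right_tail(m, p)
        assert abs(conditional - right_tail(n, p)) < 1e-12
print("memoryless property verified")
```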

Conversely, if $T$ is a random variable taking values in $\mathbb{N}_+$ that satisfies the memoryless property above, then $T$ has a geometric distribution.

Details:

Let $G(n) = P(T > n)$ for $n \in \mathbb{N}$. The memoryless property and the definition of conditional probability imply that $G(m + n) = G(m) G(n)$ for $m, n \in \mathbb{N}$. Note that this is the law of exponents for $G$. It follows that $G(n) = G^n(1)$ for $n \in \mathbb{N}$. Hence $T$ has the geometric distribution with parameter $p = 1 - G(1)$.

Moments

Suppose again that $N$ is the trial number of the first success in a sequence of Bernoulli trials, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p \in (0, 1]$. The mean and variance of $N$ can be computed in several different ways.

$E(N) = \frac{1}{p}$

Details:

Our first proof uses the probability density function in [2]. Using the derivative of the geometric series, $$E(N) = \sum_{n=1}^\infty n p (1 - p)^{n-1} = p \sum_{n=1}^\infty n (1 - p)^{n-1} = -p \sum_{n=1}^\infty \frac{d}{dp} (1 - p)^n = -p \frac{d}{dp} \sum_{n=0}^\infty (1 - p)^n = -p \frac{d}{dp} \frac{1}{p} = -p \left(-\frac{1}{p^2}\right) = \frac{1}{p}$$

For our second proof, recall that since $N$ takes positive integer values, its expected value can be computed as the sum of the right distribution function. Hence $$E(N) = \sum_{n=0}^\infty P(N > n) = \sum_{n=0}^\infty (1 - p)^n = \frac{1}{p}$$

For our third proof, we condition on the first trial $X_1$: If $X_1 = 1$ then $N = 1$ and hence $E(N \mid X_1 = 1) = 1$. If $X_1 = 0$ (equivalently $N > 1$) then by the memoryless property, $N - 1$ has the same distribution as $N$. Hence $E(N \mid X_1 = 0) = 1 + E(N)$. In short, $$E(N \mid X_1) = 1 + (1 - X_1) E(N)$$ It follows that $E(N) = E[E(N \mid X_1)] = 1 + (1 - p) E(N)$. Solving gives $E(N) = \frac{1}{p}$.

Exercise [8] makes intuitive sense: in a sequence of Bernoulli trials with success parameter $p$, we would expect to wait $1/p$ trials for the first success.
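The result $E(N) = 1/p$ is easy to corroborate by simulation. A sketch in plain Python, assuming nothing beyond the standard library (`first_success_trial` is a hypothetical helper):

```python
import random

def first_success_trial(p, rng):
    """Simulate Bernoulli trials until the first success; return the trial number."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(2024)
p = 0.2
samples = [first_success_trial(p, rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 1/p = 5
```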

$\mathrm{var}(N) = \frac{1 - p}{p^2}$

Details:

Our first proof uses the probability density function in [2] to compute $E[N(N-1)]$. This is an example of a factorial moment, and we will compute the general factorial moments in [12]. Using derivatives of the geometric series again, $$E[N(N-1)] = \sum_{n=2}^\infty n (n - 1) p (1 - p)^{n-1} = p (1 - p) \sum_{n=2}^\infty n (n - 1) (1 - p)^{n-2} = p (1 - p) \frac{d^2}{dp^2} \sum_{n=0}^\infty (1 - p)^n = p (1 - p) \frac{d^2}{dp^2} \frac{1}{p} = p (1 - p) \frac{2}{p^3} = 2 \frac{1 - p}{p^2}$$ Since $E(N) = \frac{1}{p}$, it follows that $E(N^2) = \frac{2 - p}{p^2}$ and hence $\mathrm{var}(N) = \frac{1 - p}{p^2}$.

Our second proof uses a conditioning argument. From the proof of the previous result, $E(N \mid X_1) = 1 + (1 - X_1) E(N) = 1 + \frac{1}{p}(1 - X_1)$, and by the same reasoning, $\mathrm{var}(N \mid X_1) = (1 - X_1) \mathrm{var}(N)$. Hence $$\mathrm{var}(N) = \mathrm{var}[E(N \mid X_1)] + E[\mathrm{var}(N \mid X_1)] = \frac{1}{p^2} p (1 - p) + (1 - p) \mathrm{var}(N)$$ Solving gives $\mathrm{var}(N) = \frac{1 - p}{p^2}$.

Note that $\mathrm{var}(N) = 0$ if $p = 1$, hardly surprising since $N$ is deterministic (taking just the value 1) in this case. At the other extreme, $\mathrm{var}(N) \to \infty$ as $p \downarrow 0$.

In the negative binomial experiment, set k=1 to get the geometric distribution. Vary p with the scrollbar and note the location and size of the mean±standard deviation bar. For selected values of p, run the simulation 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.

The probability generating function $P$ of $N$ is given by $$P(t) = E(t^N) = \frac{p t}{1 - (1 - p) t}, \quad |t| < \frac{1}{1 - p}$$

Details:

This result follows from yet another application of geometric series: $$E(t^N) = \sum_{n=1}^\infty t^n p (1 - p)^{n-1} = p t \sum_{n=1}^\infty [t (1 - p)]^{n-1} = \frac{p t}{1 - (1 - p) t}, \quad |(1 - p) t| < 1$$

Recall again that for $x \in \mathbb{R}$ and $k \in \mathbb{N}$, the falling power of $x$ of order $k$ is $x^{(k)} = x (x - 1) \cdots (x - k + 1)$. If $X$ is a random variable, then $E[X^{(k)}]$ is the factorial moment of $X$ of order $k$.

The factorial moments of $N$ are given by $$E[N^{(k)}] = k! \frac{(1 - p)^{k-1}}{p^k}, \quad k \in \mathbb{N}_+$$

Details:

Our first proof uses the probability density function in [2]. Using derivatives of geometric series again, $$E[N^{(k)}] = \sum_{n=k}^\infty n^{(k)} p (1 - p)^{n-1} = p (1 - p)^{k-1} \sum_{n=k}^\infty n^{(k)} (1 - p)^{n-k} = p (1 - p)^{k-1} (-1)^k \frac{d^k}{dp^k} \sum_{n=0}^\infty (1 - p)^n = p (1 - p)^{k-1} (-1)^k \frac{d^k}{dp^k} \frac{1}{p} = k! \frac{(1 - p)^{k-1}}{p^k}$$

Our second proof uses the probability generating function $P$ in [11]. Recall that $E[N^{(k)}] = P^{(k)}(1)$. So the result follows from standard calculus.
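The closed form for the factorial moments can be checked against a direct (truncated) evaluation of the defining series. An illustrative sketch:

```python
import math

def factorial_moment(k, p, terms=2000):
    """E[N^(k)] from the defining series, truncated far into the tail."""
    total = 0.0
    for n in range(k, terms):
        falling = math.prod(range(n, n - k, -1))  # n (n-1) ... (n-k+1)
        total += falling * p * (1 - p) ** (n - 1)
    return total

p = 0.4
for k in range(1, 5):
    closed_form = math.factorial(k) * (1 - p) ** (k - 1) / p ** k
    assert abs(factorial_moment(k, p) - closed_form) < 1e-8
print("factorial moments match")
```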

Suppose that $p \in (0, 1)$. The skewness and kurtosis of $N$ are

  1. $\mathrm{skew}(N) = \frac{2 - p}{\sqrt{1 - p}}$
  2. $\mathrm{kurt}(N) = 9 + \frac{p^2}{1 - p}$
Details:

The factorial moments in [12] can be used to find the moments of N about 0. The results then follow from the standard computational formulas for skewness and kurtosis.

Note that the geometric distribution is always positively skewed. Moreover, $\mathrm{skew}(N) \to \infty$ and $\mathrm{kurt}(N) \to \infty$ as $p \uparrow 1$.

Suppose now that $M = N - 1$, so that $M$ (the number of failures before the first success) has the geometric distribution on $\mathbb{N}$. Then

  1. $E(M) = \frac{1 - p}{p}$
  2. $\mathrm{var}(M) = \frac{1 - p}{p^2}$
  3. $\mathrm{skew}(M) = \frac{2 - p}{\sqrt{1 - p}}$
  4. $\mathrm{kurt}(M) = 9 + \frac{p^2}{1 - p}$
  5. $E(t^M) = \frac{p}{1 - (1 - p) t}$ for $|t| < \frac{1}{1 - p}$

Of course, the fact that the variance, skewness, and kurtosis are unchanged follows easily, since N and M differ by a constant.

The Quantile Function

Let $F$ denote the distribution function of $N$, so that $F(n) = 1 - (1 - p)^n$ for $n \in \mathbb{N}$. Recall that $F^{-1}(r) = \min\{n \in \mathbb{N}_+ : F(n) \ge r\}$ for $r \in (0, 1)$ is the quantile function of $N$.

The quantile function of $N$ is $$F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(1 - p)} \right\rceil, \quad r \in (0, 1)$$

Of course, the quantile function, like the probability density function and the distribution function, completely determines the distribution of N. Moreover, we can compute the median and quartiles to get measures of center and spread.

The first quartile, the median (or second quartile), and the third quartile are

  1. $F^{-1}\left(\frac{1}{4}\right) = \left\lceil \ln(3/4) / \ln(1 - p) \right\rceil \approx \left\lceil -0.2877 / \ln(1 - p) \right\rceil$
  2. $F^{-1}\left(\frac{1}{2}\right) = \left\lceil \ln(1/2) / \ln(1 - p) \right\rceil \approx \left\lceil -0.6931 / \ln(1 - p) \right\rceil$
  3. $F^{-1}\left(\frac{3}{4}\right) = \left\lceil \ln(1/4) / \ln(1 - p) \right\rceil \approx \left\lceil -1.3863 / \ln(1 - p) \right\rceil$
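The quantile function can be sketched in a few lines; here we use $p = 1/6$ (anticipating the fair-die exercise below), where the quartiles work out to 2, 4 and 8:

```python
import math

def geometric_quantile(r, p):
    """F^{-1}(r) = ceil(ln(1 - r) / ln(1 - p)), the smallest n with F(n) >= r."""
    return math.ceil(math.log(1 - r) / math.log(1 - p))

p = 1 / 6  # waiting for an ace with a fair die
quartiles = [geometric_quantile(r, p) for r in (0.25, 0.5, 0.75)]
print(quartiles)  # [2, 4, 8]
```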

Open the quantile app, and select the geometric distribution and CDF view. Vary p and note the shape and location of the CDF/quantile function. For various values of p, compute the median and the first and third quartiles.

The Constant Rate Property

Suppose that $T$ is a random variable taking values in $\mathbb{N}_+$, which we interpret as the first time that some event of interest occurs.

The function $h$ given by $$h(n) = P(T = n \mid T \ge n) = \frac{P(T = n)}{P(T \ge n)}, \quad n \in \mathbb{N}_+$$ is the rate function of $T$.

If $T$ is interpreted as the (discrete) lifetime of a device, then $h$ is a discrete version of the failure rate function studied in reliability theory. However, in our usual formulation of Bernoulli trials, the event of interest is success rather than failure (or death), so we will simply use the term rate function to avoid confusion. The constant rate property characterizes the geometric distribution. As usual, let $N$ denote the trial number of the first success in a sequence of Bernoulli trials with success parameter $p \in (0, 1)$, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$.

N has constant rate p.

Details:

From the results above, $P(N = n) = p (1 - p)^{n-1}$ and $P(N \ge n) = P(N > n - 1) = (1 - p)^{n-1}$, so $P(N = n) / P(N \ge n) = p$ for $n \in \mathbb{N}_+$.

Conversely, if $T$ has constant rate $p \in (0, 1)$ then $T$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p$.

Details:

Let $H(n) = P(T \ge n)$ for $n \in \mathbb{N}_+$. From the constant rate property, $P(T = n) = p H(n)$ for $n \in \mathbb{N}_+$. Next note that $P(T = n) = H(n) - H(n + 1)$ for $n \in \mathbb{N}_+$. Thus, $H$ satisfies the recurrence relation $H(n + 1) = (1 - p) H(n)$ for $n \in \mathbb{N}_+$. Also $H$ satisfies the initial condition $H(1) = 1$. Solving the recurrence relation gives $H(n) = (1 - p)^{n-1}$ for $n \in \mathbb{N}_+$.

Relation to the Uniform Distribution

Suppose again that $X = (X_1, X_2, \ldots)$ is a sequence of Bernoulli trials with success parameter $p \in (0, 1)$. For $n \in \mathbb{N}_+$, recall that $Y_n = \sum_{i=1}^n X_i$, the number of successes in the first $n$ trials, has the binomial distribution with parameters $n$ and $p$. As before, $N$ denotes the trial number of the first success.

Suppose that $n \in \mathbb{N}_+$. The conditional distribution of $N$ given $Y_n = 1$ is uniform on $\{1, 2, \ldots, n\}$.

Details:

Our first proof is a sampling argument. We showed in the last section that given $Y_n = k$, the trial numbers of the successes form a random sample of size $k$ chosen without replacement from $\{1, 2, \ldots, n\}$. This result is a simple corollary with $k = 1$.

For a direct proof, let $j \in \{1, 2, \ldots, n\}$. Then $$P(N = j \mid Y_n = 1) = \frac{P(N = j, Y_n = 1)}{P(Y_n = 1)} = \frac{P(Y_{j-1} = 0, X_j = 1, Y_n - Y_j = 0)}{P(Y_n = 1)}$$ In words, the events in the numerator of the last fraction are that there are no successes in the first $j - 1$ trials, a success on trial $j$, and no successes in trials $j + 1$ to $n$. These events are independent, so $$P(N = j \mid Y_n = 1) = \frac{(1 - p)^{j-1} p (1 - p)^{n-j}}{n p (1 - p)^{n-1}} = \frac{1}{n}$$

Note that the conditional distribution does not depend on the success parameter $p$. If we know that there is exactly one success in the first $n$ trials, then the trial number of that success is equally likely to be any of the $n$ possibilities. Another connection between the geometric distribution and the uniform distribution is given for the alternating coin tossing game in [43]: the conditional distribution of $N$ given $N \le n$ converges to the uniform distribution on $\{1, 2, \ldots, n\}$ as $p \downarrow 0$.
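The conditional uniformity is easy to see empirically: simulate blocks of $n$ trials, keep those with exactly one success, and tabulate the trial number of that success. An illustrative simulation:

```python
import random
from collections import Counter

rng = random.Random(7)
p, n = 0.3, 6
counts, kept = Counter(), 0
while kept < 20_000:
    run = [1 if rng.random() < p else 0 for _ in range(n)]
    if sum(run) == 1:                  # condition on exactly one success
        counts[run.index(1) + 1] += 1  # trial number of that success
        kept += 1

# Each trial number 1, ..., n appears with relative frequency near 1/n.
for j in range(1, n + 1):
    print(j, counts[j] / kept)
```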

Relation to the Exponential Distribution

The Poisson process on $[0, \infty)$, named for Simeon Poisson, is a model for random points in continuous time. There are many deep and interesting connections between the Bernoulli trials process (which can be thought of as a model for random points in discrete time) and the Poisson process. These connections are explored in detail in the chapter referenced above. In this section we just give the most famous and important result—the convergence of the geometric distribution to the exponential distribution. The geometric distribution, as we know, governs the time of the first random point in the Bernoulli trials process, while the exponential distribution governs the time of the first random point in the Poisson process.

The exponential distribution with rate parameter $r \in (0, \infty)$ has distribution function $F(x) = 1 - e^{-r x}$ for $x \in [0, \infty)$. The mean of the exponential distribution is $1/r$ and the variance is $1/r^2$. In addition, the moment generating function is $s \mapsto \frac{1}{1 - s/r}$ for $s < r$.

For $n \in \mathbb{N}_+$, suppose that $U_n$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p_n \in (0, 1)$, where $n p_n \to r > 0$ as $n \to \infty$. Then the distribution of $U_n / n$ converges to the exponential distribution with parameter $r$ as $n \to \infty$.

Details:

Let $F_n$ denote the CDF of $U_n / n$. Then for $x \in [0, \infty)$, $$F_n(x) = P\left(\frac{U_n}{n} \le x\right) = P(U_n \le n x) = P(U_n \le \lfloor n x \rfloor) = 1 - (1 - p_n)^{\lfloor n x \rfloor}$$ But by a famous limit from calculus, $$(1 - p_n)^n = \left(1 - \frac{n p_n}{n}\right)^n \to e^{-r} \text{ as } n \to \infty$$ and hence $(1 - p_n)^{n x} \to e^{-r x}$ as $n \to \infty$. But by definition, $\lfloor n x \rfloor \le n x < \lfloor n x \rfloor + 1$, or equivalently, $n x - 1 < \lfloor n x \rfloor \le n x$, so it follows that $(1 - p_n)^{\lfloor n x \rfloor} \to e^{-r x}$ as $n \to \infty$. Hence $F_n(x) \to 1 - e^{-r x}$ as $n \to \infty$, which is the CDF of the exponential distribution.

Note that the condition $n p_n \to r$ as $n \to \infty$ in [22] is the same condition required for the convergence of the binomial distribution to the Poisson.
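The convergence can be observed numerically by evaluating the CDF of $U_n / n$ at a fixed point for increasing $n$, with $p_n = r/n$. A sketch:

```python
import math

def scaled_geometric_cdf(x, n, r):
    """CDF of U_n / n at x, where U_n is geometric on N+ with p_n = r / n."""
    p_n = r / n
    return 1 - (1 - p_n) ** math.floor(n * x)

r, x = 2.0, 0.75
exact = 1 - math.exp(-r * x)
for n in (10, 100, 10_000):
    print(n, scaled_geometric_cdf(x, n, r), exact)
```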

Relation to the Binomial Distribution

The following result explores a connection between the binomial distribution studied in the last section and the geometric distribution studied in this section.

Suppose that $X = (X_1, X_2, \ldots)$ is a sequence of Bernoulli trials with success parameter $p \in (0, 1)$, so that $Y_n = \sum_{i=1}^n X_i$ has the binomial distribution with parameters $n$ and $p$ for each $n \in \mathbb{N}$. Now suppose that $N$ has the geometric distribution on $\mathbb{N}$ with success parameter $q \in (0, 1)$ and that $N$ is independent of $X$. Then $Y_N$ has the geometric distribution on $\mathbb{N}$ with success parameter $q / (p + q - p q)$.

Details:

As is often the case, there are both analytic and probabilistic proofs. We start with the analytic proof, which of course involves conditioning on $N$: $$P(Y_N = k) = \sum_{n=k}^\infty P(Y_N = k \mid N = n) P(N = n) = \sum_{n=k}^\infty \binom{n}{k} p^k (1 - p)^{n-k} (1 - q)^n q = [p(1 - q)]^k q \sum_{n=k}^\infty \binom{n}{k} [(1 - p)(1 - q)]^{n-k}, \quad k \in \mathbb{N}$$ Using derivatives of the geometric series for the last sum, we have $$P(Y_N = k) = [p(1 - q)]^k q \frac{1}{[1 - (1 - p)(1 - q)]^{k+1}} = \frac{q}{p + q - p q} \left(\frac{p - p q}{p + q - p q}\right)^k, \quad k \in \mathbb{N}$$

For the probabilistic proof, suppose that $W = (W_1, W_2, \ldots)$ is a sequence of Bernoulli trials with success probability $q$, and independent of $X$. Of course, for $n \in \mathbb{N}$ we can interpret $Y_n$ as the number of successes in the $X$ sequence in the first $n$ trials. But now, we can also interpret $N$ as the number of failures before the first success in the $W$ sequence. So $Y_N$ is the number of successes in the $X$ sequence before the first success in the $W$ sequence. Equivalently, $Y_N$ is the number of failures before the first success in a sequence of Bernoulli trials where the success probability on trial $j \in \mathbb{N}_+$ is $$P(W_j = 1 \mid X_j = 1 \text{ or } W_j = 1) = \frac{q}{p + q - p q}$$
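The probabilistic proof suggests a direct simulation check: draw $N$ as the number of failures before the first success in the $W$ sequence, then draw $Y_N$ as a binomial count. An illustrative sketch (variable names are ours):

```python
import random

rng = random.Random(11)
p, q = 0.6, 0.3
count_zero = 0
reps = 100_000
for _ in range(reps):
    n = 0                          # failures before first success in the W sequence
    while rng.random() >= q:
        n += 1
    y = sum(1 for _ in range(n) if rng.random() < p)  # binomial(n, p)
    count_zero += (y == 0)

target = q / (p + q - p * q)       # claimed success parameter of Y_N
print(count_zero / reps, target)   # both near 0.4167
```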

We will see a similar relationship between the exponential distribution and the Poisson distribution in the chapter on the Poisson process.

Special Families

The geometric distribution on $\mathbb{N}$ is an infinitely divisible distribution and is a compound Poisson distribution. For the details, visit these individual sections and see the next section on the negative binomial distribution.

Examples and Applications

Simple Exercises

A standard, fair die is thrown until an ace occurs. Let N denote the number of throws. Find each of the following:

  1. The probability density function of N.
  2. The mean of N.
  3. The variance of N.
  4. The probability that the die will have to be thrown at least 5 times.
  5. The quantile function of N.
  6. The median and the first and third quartiles.
Answer:
  1. $P(N = n) = \left(\frac{5}{6}\right)^{n-1} \frac{1}{6}$ for $n \in \mathbb{N}_+$
  2. $E(N) = 6$
  3. $\mathrm{var}(N) = 30$
  4. $P(N \ge 5) = (5/6)^4 = 625/1296 \approx 0.4823$
  5. $F^{-1}(r) = \left\lceil \ln(1 - r) / \ln(5/6) \right\rceil$ for $r \in (0, 1)$
  6. Quartiles $q_1 = 2$, $q_2 = 4$, $q_3 = 8$

A type of missile has failure probability 0.02 on each launch. Let $N$ denote the number of the launch on which the first failure occurs. Find each of the following:

  1. The probability density function of N.
  2. The mean of N.
  3. The variance of N.
  4. The probability of 20 consecutive successful launches.
  5. The quantile function of N.
  6. The median and the first and third quartiles.
Details:
  1. $P(N = n) = \left(\frac{49}{50}\right)^{n-1} \frac{1}{50}$ for $n \in \mathbb{N}_+$
  2. $E(N) = 50$
  3. $\mathrm{var}(N) = 2450$
  4. $P(N > 20) = (49/50)^{20} \approx 0.6676$
  5. $F^{-1}(r) = \left\lceil \ln(1 - r) / \ln(0.98) \right\rceil$ for $r \in (0, 1)$
  6. Quartiles $q_1 = 15$, $q_2 = 35$, $q_3 = 69$

A student takes a multiple choice test with 10 questions, each with 5 choices (only one correct). The student blindly guesses and gets one question correct. Find the probability that the correct question was one of the first 4.

Details:

0.4

Recall that an American roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. Suppose that you observe red or green on 10 consecutive spins. Give the conditional distribution of the number of additional spins needed for black to occur.

Details:

Geometric on $\mathbb{N}_+$ with $p = \frac{18}{38} = \frac{9}{19}$

The game of roulette is studied in more detail in the chapter on games of chance.

In the negative binomial experiment, set $k = 1$ to get the geometric distribution and set $p = 0.3$. Run the experiment 1000 times. Compute the appropriate relative frequencies and empirically investigate the memoryless property $$P(V > 5 \mid V > 2) = P(V > 3)$$

The Petersburg Problem

We will now explore a gambling situation, known as the Petersburg problem, which leads to some famous and surprising results.

Suppose that we are betting on a sequence of Bernoulli trials with success parameter $p \in (0, 1)$. We can bet any amount of money on a trial at even stakes: if the trial results in success, we receive that amount, and if the trial results in failure, we must pay that amount. We will use the following strategy, known as a martingale strategy:

  1. We bet c units on the first trial.
  2. Whenever we lose a trial, we double the bet for the next trial.
  3. We stop as soon as we win a trial.

Let $N$ denote the number of trials played, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$, and let $W$ denote our net winnings when we stop.

W=c

Details:

The first win occurs on trial $N$, so the initial bet was doubled $N - 1$ times. The net winnings are $$W = -c \sum_{i=0}^{N-2} 2^i + c \, 2^{N-1} = c\left[-(2^{N-1} - 1) + 2^{N-1}\right] = c$$

So W is not random and W is independent of p! Since c is an arbitrary constant, it would appear that we have an ideal strategy. However, let us study the amount of money Z needed to play the strategy.

$Z = c (2^N - 1)$

The expected amount of money needed for the martingale strategy is $$E(Z) = \begin{cases} \frac{c}{2p - 1}, & p > \frac{1}{2} \\ \infty, & p \le \frac{1}{2} \end{cases}$$

So the strategy is fatally flawed when the trials are unfavorable and even when they are fair, since we need infinite expected capital to make the strategy work in these cases.
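The martingale strategy is easy to simulate, and doing so illustrates both results at once: the net winnings equal $c$ on every run, while the capital required is highly variable. A sketch (note that the sample mean of $Z$ fluctuates heavily because $Z$ has very heavy tails):

```python
import random

def martingale_run(p, c, rng):
    """Play the doubling strategy once; return (net winnings, capital used)."""
    bet, losses, capital = c, 0, 0
    while True:
        capital += bet                 # money that must be on hand for this bet
        if rng.random() < p:           # win: collect the bet and stop
            return bet - losses, capital
        losses += bet                  # lose: double the bet and continue
        bet *= 2

rng = random.Random(3)
runs = [martingale_run(0.55, 100, rng) for _ in range(10_000)]
print(all(w == 100 for w, _ in runs))   # True: W = c on every run
# Sample mean of Z; E(Z) = c/(2p - 1) = 1000 here, but the sample mean
# fluctuates considerably because Z is heavy-tailed.
print(sum(z for _, z in runs) / len(runs))
```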

Compute E(Z) explicitly if c=100 and p=0.55.

Details:

$1000

In the negative binomial experiment, set k=1. For each of the following values of p, run the experiment 100 times. For each run compute Z (with c=1). Find the average value of Z over the 100 runs:

  1. p=0.2
  2. p=0.5
  3. p=0.8

For more information about gambling strategies, see the section on red and black and the chapter on martingales.

The Alternating Coin-Tossing Game

A coin has probability of heads $p \in (0, 1)$. There are $n \in \mathbb{N}_+$ players who take turns tossing the coin in round-robin style: player 1 first, then player 2, continuing until player $n$, then player 1 again, and so forth. The first player to toss heads wins the game.

Let $N$ denote the number of the first toss that results in heads. Of course, $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$. Additionally, let $R$ denote the number of complete tail rounds played (all $n$ players toss tails) and let $W$ denote the winner of the game. So $R$ takes values in $\mathbb{N}$ and $W$ in the set $\{1, 2, \ldots, n\}$. We are interested in the joint distribution of $(R, W)$.

Random variables $N$, $R$ and $W$ are related by $N = n R + W$, and so $(R, W)$ has joint probability density function given by $$P(R = k, W = i) = (1 - p)^{n k + i - 1} p, \quad k \in \mathbb{N}, \; i \in \{1, 2, \ldots, n\}$$

Details:

Clearly if there are $k \in \mathbb{N}$ complete tail rounds and then player $i \in \{1, 2, \ldots, n\}$ tosses heads, then the trial number of the first head is $k n + i$. Conversely, if the trial number of the first head is $m \in \mathbb{N}_+$ then there are $m - 1$ initial tails, and so the number of complete tail rounds is $k = \lfloor (m - 1) / n \rfloor$ and the winning player is $i = m - k n$. That is, $R = \lfloor (N - 1) / n \rfloor$ and $W = N - R n$. The result then follows since $N$ has the geometric distribution with parameter $p$.

Random variables R and W are independent.

  1. $R$ has the geometric distribution on $\mathbb{N}$ with parameter $1 - (1 - p)^n$.
  2. $W$ has probability density function given by $$P(W = i) = \frac{(1 - p)^{i-1} p}{1 - (1 - p)^n}, \quad i \in \{1, 2, \ldots, n\}$$
Details:

The independence of $R$ and $W$ follows from [35] and the factoring theorem, since the joint density is a product of a function of $k$ and a function of $i$. Explicitly, we can rewrite the joint density function in [35] in the form $$P(R = k, W = i) = \left[(1 - p)^n\right]^k \left[1 - (1 - p)^n\right] \frac{(1 - p)^{i-1} p}{1 - (1 - p)^n}, \quad k \in \mathbb{N}, \; i \in \{1, 2, \ldots, n\}$$

Open the alternating coin experiment. Vary the parameters and note the shape and location of the probability density functions. For selected values of the parameters, run the simulation 1000 times and compare the relative frequency functions to the probability density functions.

Describe the distributions of R and W when the coin is fair so that p=1/2.

Details:

The number of rounds $R$ has the geometric distribution on $\mathbb{N}$ with parameter $1 - (1/2)^n$. The winning player $W$ has probability density function given by $$P(W = i) = \frac{2^{n-i}}{2^n - 1}, \quad i \in \{1, 2, \ldots, n\}$$

Describe the distributions of R and W with two players, so that n=2.

Details:

The number of rounds $R$ has the geometric distribution on $\mathbb{N}$ with parameter $p(2 - p)$. The winning player $W$ has probability density function given by $$P(W = 1) = \frac{1}{2 - p}, \quad P(W = 2) = \frac{1 - p}{2 - p}$$

Combining [38] and [39]: with two players and a fair coin, the number of rounds $R$ has the geometric distribution on $\mathbb{N}$ with parameter $3/4$. Player 1 wins with probability $2/3$ while player 2 wins with probability $1/3$.
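The two-player probabilities follow directly from the density of $W$. A minimal numerical check:

```python
p, n = 0.5, 2
denom = 1 - (1 - p) ** n
pdf_w = [(1 - p) ** (i - 1) * p / denom for i in range(1, n + 1)]
print(pdf_w)  # player 1: 2/3, player 2: 1/3
```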

Let's return to the case of general $n \in \mathbb{N}_+$ and $p \in (0, 1)$. Since $R$ has a geometric distribution on $\mathbb{N}$, all of the basic theory above applies, with $p$ replaced by $1 - (1 - p)^n$. So for most of the remainder of this subsection, we will concentrate on the random variable $W$, the winning player.

$P(W = i) = (1 - p)^{i-1} P(W = 1)$ for $i \in \{1, 2, \ldots, n\}$.

Details:

This result can be argued directly, using the memoryless property of the geometric distribution in [6]. In order for player $i$ to win, the previous $i - 1$ players must first all toss tails. Then, player $i$ effectively becomes the first player in a new sequence of tosses. This result can be used to give another derivation of the probability density function in [36].

Note that $P(W = i)$ is a decreasing function of $i \in \{1, 2, \ldots, n\}$. Not surprisingly, the lower the toss order the better for the player. Note also from [36] that $W$ itself has a truncated geometric distribution.

The distribution of $W$ is the same as the conditional distribution of $N$ given $N \le n$: $$P(W = i) = P(N = i \mid N \le n), \quad i \in \{1, 2, \ldots, n\}$$

Details:

This follows from the same probabilistic argument as in the details of [40]. For a direct proof, $$P(N = i \mid N \le n) = \frac{P(N = i)}{P(N \le n)} = \frac{(1 - p)^{i-1} p}{1 - (1 - p)^n}, \quad i \in \{1, 2, \ldots, n\}$$

Our next results concern the moments of W.

The probability generating function of $W$ is given by $$E(t^W) = \frac{p t}{1 - t(1 - p)} \cdot \frac{1 - [t(1 - p)]^n}{1 - (1 - p)^n}, \quad |t| < \frac{1}{1 - p}$$

Details:

The result can be obtained by direct computation and standard results for geometric series. But an easier and better method is to use [35], the independence of $R$ and $W$, and the probability generating functions of $N$ and $R$: $$E(t^N) = E(t^{n R + W}) = E[(t^n)^R] \, E(t^W)$$ But $$E(t^N) = \frac{p t}{1 - (1 - p) t}, \quad |t| < \frac{1}{1 - p}; \qquad E[(t^n)^R] = \frac{1 - (1 - p)^n}{1 - (1 - p)^n t^n}, \quad |t| < \frac{1}{1 - p}$$

The mean and variance of W are

  1. $E(W) = \frac{1}{p} - \frac{n (1 - p)^n}{1 - (1 - p)^n}$
  2. $\mathrm{var}(W) = \frac{1 - p}{p^2} - \frac{n^2 (1 - p)^n}{\left[1 - (1 - p)^n\right]^2}$
Details:

The results can be obtained from the definitions of mean and variance and derivatives of geometric series. The results can also be obtained by taking derivatives of the probability generating function in [42]. For both methods the computations are extremely tedious. But the computations are easy using [35], the independence of R and W, and the corresponding moments of R and N.

  1. $E(N) = n E(R) + E(W)$. But $E(N) = 1/p$ and $E(R) = (1 - p)^n / [1 - (1 - p)^n]$.
  2. $\mathrm{var}(N) = n^2 \mathrm{var}(R) + \mathrm{var}(W)$. But $\mathrm{var}(N) = (1 - p)/p^2$ and $\mathrm{var}(R) = (1 - p)^n / [1 - (1 - p)^n]^2$.

Open the alternating coin experiment. Vary the parameters and note the shape and location of the mean±standard deviation bars. For selected values of the parameters, run the simulation 1000 times and compare the empirical means and standard deviations to the corresponding distribution values.

The following results explore some limiting distributions related to the alternating coin-tossing game.

Limiting distributions:

  1. As $n \to \infty$ for fixed $p \in (0, 1)$, the distribution of $R$ converges to point mass at 0 and the distribution of $W$ converges to the geometric distribution on $\mathbb{N}_+$ with parameter $p$.
  2. As $p \downarrow 0$ for fixed $n \in \mathbb{N}_+$, the distribution of $R$ converges to point mass at $\infty$ and the distribution of $W$ converges to the uniform distribution on $\{1, 2, \ldots, n\}$.
Details:
  1. Fix $p \in (0, 1)$ and let $n \to \infty$. Then $1 - (1 - p)^n \to 1$ and so the geometric distribution of $R$ converges to point mass at 0. From [36], $P(W = i) \to (1 - p)^{i-1} p$ for $i \in \mathbb{N}_+$. Alternately, from [41], $P(W = i) \to P(N = i)$.
  2. Fix $n \in \mathbb{N}_+$ and let $p \downarrow 0$. Then $1 - (1 - p)^n \to 0$ and so the geometric distribution of $R$ converges to point mass at $\infty$. From [36] again and L'Hospital's rule, $P(W = i) \to 1/n$ for $i \in \{1, 2, \ldots, n\}$. The probability generating function in [42] could also be used.

Players at the end of the tossing order should hope for a coin biased towards tails.

Open the alternating coin experiment. Although the parameter values are limited, you can begin to see the convergence results in [45].

  1. Set p=0.5 and increase n from 2 to 10. Note how the density function of W begins to resemble the density function of N.
  2. Set n=5 and decrease p from 0.95 to 0.05. Note how the density function of W becomes increasingly uniform.

Odd Man Out

In the game of odd man out, we start with a specified number of players, each with a coin that has the same probability of heads. The players toss their coins at the same time. If there is an odd man, that is a player with an outcome different than all of the other players, then the odd player is eliminated; otherwise no player is eliminated. In any event, the remaining players continue the game in the same manner. A slight technical problem arises with just two players, since different outcomes would make both players odd. So in this case, we might (arbitrarily) make the player with tails the odd man.

Suppose there are $k \in \{2, 3, \ldots\}$ players and $p \in [0, 1]$. In a single round, the probability of an odd man is $$r_k(p) = \begin{cases} 2 p (1 - p), & k = 2 \\ k p (1 - p)^{k-1} + k p^{k-1} (1 - p), & k \in \{3, 4, \ldots\} \end{cases}$$

Details:

Let $Y$ denote the number of heads. If $k = 2$, the event that there is an odd man is $\{Y = 1\}$. If $k \ge 3$, the event that there is an odd man is $\{Y \in \{1, k - 1\}\}$. The result now follows since $Y$ has a binomial distribution with parameters $k$ and $p$.

The graph of rk is more interesting than you might think.

[Figure: the graphs of $r_k$ for $k \in \{3, 4, 5, 6\}$]

For $k \in \{2, 3, \ldots\}$, $r_k$ has the following properties:

  1. $r_k(0) = r_k(1) = 0$
  2. $r_k$ is symmetric about $p = \frac{1}{2}$
  3. For fixed $p \in [0, 1]$, $r_k(p) \to 0$ as $k \to \infty$.
Details:

These properties are clear from the functional form of $r_k(p)$. Note that $r_k(p) = r_k(1 - p)$.

For $k \in \{2, 3, 4\}$, $r_k$ has the following properties:

  1. $r_k$ increases and then decreases, with maximum at $p = \frac{1}{2}$.
  2. $r_k$ is concave downward.
Details:

This follows by computing the first derivatives: $r_2'(p) = 2(1 - 2p)$, $r_3'(p) = 3(1 - 2p)$, $r_4'(p) = 4(1 - 2p)^3$, and the second derivatives: $r_2''(p) = -4$, $r_3''(p) = -6$, $r_4''(p) = -24(1 - 2p)^2$.

For $k \in \{5, 6, \ldots\}$, $r_k$ has the following properties:

  1. The maximum occurs at two points of the form $p_k$ and $1 - p_k$, where $p_k \in (0, \frac{1}{2})$ and $p_k \to 0$ as $k \to \infty$.
  2. The maximum value $r_k(p_k) \to 1/e \approx 0.3679$ as $k \to \infty$.
  3. The graph has a local minimum at $p = \frac{1}{2}$.
Details:

Note that $r_k(p) = s_k(p) + s_k(1 - p)$ where $s_k(t) = k t^{k-1} (1 - t)$ for $t \in [0, 1]$. Also, $p \mapsto s_k(p)$ is the dominant term when $p > \frac{1}{2}$, while $p \mapsto s_k(1 - p)$ is the dominant term when $p < \frac{1}{2}$. A simple analysis of the derivative shows that $s_k$ increases and then decreases, reaching its maximum at $(k - 1)/k$. Moreover, the maximum value is $s_k[(k - 1)/k] = (1 - 1/k)^{k-1} \to e^{-1}$ as $k \to \infty$. Also, $s_k$ is concave upward and then downward, with inflection point at $(k - 2)/k$.

Suppose $p \in (0, 1)$, and let $N_k$ denote the number of rounds until an odd man is eliminated, starting with $k$ players. Then $N_k$ has the geometric distribution on $\mathbb{N}_+$ with parameter $r_k(p)$. The mean and variance are

  1. $\mu_k(p) = 1 / r_k(p)$
  2. $\sigma_k^2(p) = [1 - r_k(p)] / r_k^2(p)$

As we might expect, $\mu_k(p) \to \infty$ and $\sigma_k^2(p) \to \infty$ as $k \to \infty$ for fixed $p \in (0, 1)$. On the other hand, from [50], $\mu_k(p_k) \to e$ and $\sigma_k^2(p_k) \to e^2 - e$ as $k \to \infty$.
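The rate function $r_k$ and its limiting behavior can be explored numerically. A sketch, using $p = 1/k$ as a convenient stand-in for the exact maximizer $p_k$ (an approximation, since $p_k \to 0$ roughly like $1/k$):

```python
import math

def odd_man_rate(k, p):
    """r_k(p): probability of an odd man in one round, k players, heads probability p."""
    if k == 2:
        return 2 * p * (1 - p)
    return k * p * (1 - p) ** (k - 1) + k * p ** (k - 1) * (1 - p)

# Near the maximizer (roughly 1/k for large k), the rate approaches 1/e.
for k in (5, 20, 100, 1000):
    print(k, odd_man_rate(k, 1 / k))
print(math.exp(-1))  # 1/e ≈ 0.3679
```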

Suppose we start with $k \in \{2, 3, \ldots\}$ players and $p \in (0, 1)$. The number of rounds until a single player remains is $M_k = \sum_{j=2}^k N_j$, where $(N_2, N_3, \ldots, N_k)$ are independent and $N_j$ has the geometric distribution on $\mathbb{N}_+$ with parameter $r_j(p)$. The mean and variance are

  1. $E(M_k) = \sum_{j=2}^k 1 / r_j(p)$
  2. $\mathrm{var}(M_k) = \sum_{j=2}^k [1 - r_j(p)] / r_j^2(p)$
Details:

The form of $M_k$ follows from [51]: $N_k$ is the number of rounds until the first player is eliminated. Then the game continues independently with $k - 1$ players, so $N_{k-1}$ is the number of additional rounds until the second player is eliminated, and so forth. Parts (a) and (b) follow from [51] and standard properties of expected value and variance.

Starting with $k$ players and probability of heads $p \in (0, 1)$, the total number of coin tosses is $T_k = \sum_{j=2}^k j N_j$. The mean and variance are

  1. $E(T_k) = \sum_{j=2}^k j / r_j(p)$
  2. $\mathrm{var}(T_k) = \sum_{j=2}^k j^2 [1 - r_j(p)] / r_j^2(p)$
Details:

As before, the form of $T_k$ follows from [49]: $N_k$ is the number of rounds until the first player is eliminated, and each of these rounds has $k$ tosses. Then the game continues independently with $k - 1$ players, so $N_{k-1}$ is the number of additional rounds until the second player is eliminated, with each round having $k - 1$ tosses, and so forth. Parts (a) and (b) also follow from [49] and standard properties of expected value and variance.

Number of Trials Before a Pattern

Consider again a sequence of Bernoulli trials $X = (X_1, X_2, \ldots)$ with success parameter $p \in (0, 1)$. Recall that the number of trials $M$ before the first success (outcome 1) occurs has the geometric distribution on $\mathbb{N}$ with parameter $p$. A natural generalization is the random variable that gives the number of trials before a specific finite sequence of outcomes occurs for the first time. (Such a sequence is sometimes referred to as a word from the alphabet $\{0, 1\}$, or simply a bit string.) In general, finding the distribution of this variable is a difficult problem, with the difficulty depending very much on the nature of the word. The problem of finding just the expected number of trials before a word occurs can be solved using powerful tools from the theory of renewal processes and from the theory of martingales.

To set up the notation, let $x$ denote a finite bit string and let $M_x$ denote the number of trials before $x$ occurs for the first time. Finally, let $q = 1 - p$. Note that $M_x$ takes values in $\mathbb{N}$. In the following exercises, we will consider $x = 10$, a success followed by a failure. As always, try to derive the results yourself before expanding the details.

The probability density function $f_{10}$ of $M_{10}$ is given as follows:

  1. If $p \ne \frac{1}{2}$ then $f_{10}(n) = p q \frac{p^{n+1} - q^{n+1}}{p - q}$ for $n \in \mathbb{N}$.
  2. If $p = \frac{1}{2}$ then $f_{10}(n) = (n + 1) \left(\frac{1}{2}\right)^{n+2}$ for $n \in \mathbb{N}$.
Details:

For $n \in \mathbb{N}$, the event $\{M_{10} = n\}$ can only occur if there is an initial string of 0s of length $k \in \{0, 1, \ldots, n\}$, followed by a string of 1s of length $n - k$, and then 1 on trial $n + 1$ and 0 on trial $n + 2$. Hence $$f_{10}(n) = P(M_{10} = n) = \sum_{k=0}^n q^k p^{n-k} p q, \quad n \in \mathbb{N}$$ The stated result then follows from standard results on geometric series.

It's interesting to note that $f_{10}$ is symmetric in $p$ and $q$, that is, symmetric about $p = \frac{1}{2}$. It follows that the distribution function, probability generating function, expected value, and variance, which we consider below, are all also symmetric about $p = \frac{1}{2}$. It's also interesting to note that $f_{10}(0) = f_{10}(1) = p q$, and this is the largest value. So regardless of $p \in (0, 1)$, the distribution is bimodal with modes 0 and 1.

The distribution function $F_{10}$ of $M_{10}$ is given as follows:

  1. If $p \ne \frac{1}{2}$ then $F_{10}(n) = 1 - \frac{p^{n+3} - q^{n+3}}{p - q}$ for $n \in \mathbb{N}$.
  2. If $p = \frac{1}{2}$ then $F_{10}(n) = 1 - (n + 3) \left(\frac{1}{2}\right)^{n+2}$ for $n \in \mathbb{N}$.
Details:

By definition, $F_{10}(n) = \sum_{k=0}^n f_{10}(k)$ for $n \in \mathbb{N}$. The stated result then follows from [54], standard results on geometric series, and some algebra.

The probability generating function $P_{10}$ of $M_{10}$ is given as follows:

  1. If $p \ne \frac{1}{2}$ then $P_{10}(t) = \frac{p q}{p - q} \left(\frac{p}{1 - t p} - \frac{q}{1 - t q}\right)$ for $|t| < \min\{1/p, 1/q\}$.
  2. If $p = \frac{1}{2}$ then $P_{10}(t) = 1 / (t - 2)^2$ for $|t| < 2$.
Details:

By definition, $P_{10}(t) = E(t^{M_{10}}) = \sum_{n=0}^\infty f_{10}(n) t^n$ for all $t \in \mathbb{R}$ for which the series converges absolutely. The stated result then follows from the theorem above, and once again, standard results on geometric series.

The mean of $M_{10}$ is given as follows:

  1. If $p \ne \frac{1}{2}$ then $E(M_{10}) = \frac{p^4 - q^4}{p q (p - q)}$.
  2. If $p = \frac{1}{2}$ then $E(M_{10}) = 2$.
Details:

Recall that $E(M_{10}) = P_{10}'(1)$, so the stated result follows from calculus and [56]. The mean can also be computed from the definition $E(M_{10}) = \sum_{n=0}^\infty n f_{10}(n)$ using standard results from geometric series, but this method is more tedious.
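Since $p + q = 1$, the mean in part (a) simplifies: $\frac{p^4 - q^4}{p q (p - q)} = \frac{(p^2 + q^2)(p + q)}{p q} = \frac{1 - 2 p q}{p q} = \frac{1}{p q} - 2$. This makes a simulation check straightforward; an illustrative sketch:

```python
import random

def trials_before_pattern_10(p, rng):
    """Number of trials strictly before the pattern '10' first occurs."""
    t, prev = 0, None
    while True:
        t += 1
        cur = 1 if rng.random() < p else 0
        if prev == 1 and cur == 0:
            return t - 2        # exclude the two trials forming the pattern
        prev = cur

rng = random.Random(5)
p, q = 0.3, 0.7
reps = 100_000
mean = sum(trials_before_pattern_10(p, rng) for _ in range(reps)) / reps
print(mean, 1 / (p * q) - 2)    # sample mean near E(M_10) = 1/(pq) - 2 ≈ 2.762
```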

The graph of $E(M_{10})$ as a function of $p \in (0, 1)$ is given below. It's not surprising that $E(M_{10}) \to \infty$ as $p \downarrow 0$ and as $p \uparrow 1$, and that the minimum value occurs when $p = \frac{1}{2}$.

[Figure: $E(M_{10})$ as a function of $p$]

The variance of $M_{10}$ is given as follows:

  1. If $p \ne \frac{1}{2}$ then $$\mathrm{var}(M_{10}) = \frac{2}{p^2 q^2}\left(\frac{p^6 - q^6}{p - q}\right) + \frac{1}{p q}\left(\frac{p^4 - q^4}{p - q}\right) - \frac{1}{p^2 q^2}\left(\frac{p^4 - q^4}{p - q}\right)^2$$
  2. If $p = \frac{1}{2}$ then $\mathrm{var}(M_{10}) = 4$.
Details:

Let $P_{10}$ denote the probability generating function in [56]. Recall that $P_{10}''(1) = E[M_{10}(M_{10} - 1)]$, the second factorial moment, and so $$\mathrm{var}(M_{10}) = P_{10}''(1) + P_{10}'(1) - [P_{10}'(1)]^2$$ The stated result then follows from calculus.