Suppose again that our random experiment is to perform a sequence of Bernoulli trials $X = (X_1, X_2, \ldots)$ with success parameter $p \in (0, 1]$. In this section we will study the random variable $N$ that gives the trial number of the first success and the random variable $M$ that gives the number of failures before the first success.
Let $N = \min\{n \in \mathbb{N}_+ : X_n = 1\}$, the trial number of the first success, and let $M = N - 1$, the number of failures before the first success. The distribution of $N$ is the geometric distribution on $\mathbb{N}_+$ and the distribution of $M$ is the geometric distribution on $\mathbb{N}$. In both cases, $p$ is the success parameter of the distribution.
Since $N$ and $M$ differ by a constant, the properties of their distributions are very similar. Nonetheless, there are applications where it is more natural to use one rather than the other, and in the literature, the term geometric distribution can refer to either. In this section, we will concentrate on the distribution of $N$, pausing occasionally to summarize the corresponding results for $M$. Throughout this text, we use the terms geometric distribution on $\mathbb{N}_+$ and geometric distribution on $\mathbb{N}$ to distinguish the two, as in [1].
The Probability Density Function
$N$ has probability density function $f$ given by $f(n) = p (1 - p)^{n-1}$ for $n \in \mathbb{N}_+$.
Details:
Note first that $\{N = n\} = \{X_1 = 0, \ldots, X_{n-1} = 0, X_n = 1\}$. By independence, the probability of this event is $(1 - p)^{n-1} p$.
To check that $f$ is a valid PDF, we use standard results for geometric series: $\sum_{n=1}^\infty p (1 - p)^{n-1} = \frac{p}{1 - (1 - p)} = 1$
A priori, we might have thought it possible to have $N = \infty$ with positive probability; that is, we might have thought that we could run Bernoulli trials forever without ever seeing a success. However, we now know this cannot happen when the success parameter $p$ is positive.
The probability density function $g$ of $M$ is given by $g(n) = p (1 - p)^n$ for $n \in \mathbb{N}$.
In the negative binomial experiment, set $k = 1$ to get the geometric distribution on $\mathbb{N}_+$. Vary $p$ with the scrollbar and note the shape and location of the probability density function. For selected values of $p$, run the simulation 1000 times and compare the relative frequency function to the probability density function.
Note that the probability density functions of $N$ and $M$ are decreasing, and hence have modes at 1 and 0, respectively. The geometric form of the probability density functions also explains the term geometric distribution.
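The density function is easy to check empirically. Below is a minimal simulation sketch in Python; the parameter value, seed, and function name are illustrative choices of ours, not part of the text. It records the trial number of the first success over many runs and compares the relative frequencies to $f(n) = p (1 - p)^{n-1}$.

```python
import random

def first_success_trial(p, rng):
    """Return the trial number of the first success in simulated Bernoulli trials."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(12345)
p, runs = 0.3, 100_000
counts = {}
for _ in range(runs):
    n = first_success_trial(p, rng)
    counts[n] = counts.get(n, 0) + 1

# compare relative frequencies with f(n) = p * (1 - p)**(n - 1)
for n in range(1, 8):
    print(n, counts.get(n, 0) / runs, p * (1 - p) ** (n - 1))
```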
Distribution Functions and the Memoryless Property
Suppose that $T$ is a random variable taking values in $\mathbb{N}_+$. Recall that the ordinary distribution function of $T$ is the function $n \mapsto P(T \le n)$. In this section, the complementary function $n \mapsto P(T > n)$ will play a fundamental role. We will refer to this function as the right distribution function of $T$. Of course both functions completely determine the distribution of $T$. Suppose again that $N$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p \in (0, 1]$.
$N$ has right distribution function $F^c$ given by $F^c(n) = P(N > n) = (1 - p)^n$ for $n \in \mathbb{N}$.
Details:
For a proof using the Bernoulli trials formulation, note that $\{N > n\} = \{X_1 = 0, X_2 = 0, \ldots, X_n = 0\}$. By independence, the probability of this event is $(1 - p)^n$.
For a direct proof, we use geometric series: $P(N > n) = \sum_{k=n+1}^\infty p (1 - p)^{k-1} = p \frac{(1 - p)^n}{1 - (1 - p)} = (1 - p)^n$
From [5], it follows that the ordinary (left) distribution function of $N$ is given by $F(n) = 1 - (1 - p)^n$ for $n \in \mathbb{N}$.
We will now explore another characterization known as the memoryless property.
For $m, n \in \mathbb{N}$, the conditional distribution of $N - m$ given $N > m$ is the same as the distribution of $N$. That is, $P(N > m + n \mid N > m) = P(N > n)$
So if the first success has not occurred by trial number $m$, then the remaining number of trials needed to achieve the first success has the same distribution as the trial number of the first success in a fresh sequence of Bernoulli trials. In short, Bernoulli trials have no memory. This fact has implications for a gambler betting on Bernoulli trials (such as in the casino games roulette or craps). No betting strategy based on observations of past outcomes of the trials can possibly help the gambler.
Conversely, if $T$ is a random variable taking values in $\mathbb{N}_+$ that satisfies the memoryless property in [6], then $T$ has a geometric distribution.
Details:
Let $F^c(n) = P(T > n)$ for $n \in \mathbb{N}$. The memoryless property and the definition of conditional probability imply that $F^c(m + n) = F^c(m) F^c(n)$ for $m, n \in \mathbb{N}$. Note that this is the law of exponents for $F^c$. It follows that $F^c(n) = \left[F^c(1)\right]^n$ for $n \in \mathbb{N}$. Hence $T$ has the geometric distribution with parameter $p = 1 - F^c(1)$.
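The memoryless property can also be checked by simulation. A small sketch under assumed illustrative values ($p = 0.25$, $m = 2$, $n = 3$, and the seed are our choices): both the conditional and unconditional relative frequencies should be close to $(1 - p)^3$.

```python
import random

rng = random.Random(2024)
p, runs, m, n = 0.25, 200_000, 2, 3

def first_success_trial():
    t = 1
    while rng.random() >= p:
        t += 1
    return t

samples = [first_success_trial() for _ in range(runs)]
tail_m = [s for s in samples if s > m]
lhs = sum(s > m + n for s in tail_m) / len(tail_m)  # P(N > m + n | N > m)
rhs = sum(s > n for s in samples) / runs            # P(N > n)
print(lhs, rhs, (1 - p) ** n)                       # all three should agree
```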
Moments
Suppose again that $N$ is the trial number of the first success in a sequence of Bernoulli trials, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p \in (0, 1]$. The mean and variance of $N$ can be computed in several different ways. First, $E(N) = \frac{1}{p}$.
Details:
Our first proof uses the probability density function in [2]. Using the derivative of the geometric series, $E(N) = \sum_{n=1}^\infty n p (1 - p)^{n-1} = \frac{p}{[1 - (1 - p)]^2} = \frac{1}{p}$
For our second proof, recall that since $N$ takes positive integer values, its expected value can be computed as the sum of the right distribution function. Hence $E(N) = \sum_{n=0}^\infty P(N > n) = \sum_{n=0}^\infty (1 - p)^n = \frac{1}{p}$
For our third proof, we condition on the first trial $X_1$:
If $X_1 = 1$ then $N = 1$ and hence $E(N \mid X_1 = 1) = 1$. If $X_1 = 0$ (equivalently $N > 1$) then by the memoryless property, $N - 1$ has the same distribution as $N$. Hence $E(N \mid X_1 = 0) = 1 + E(N)$. In short, $E(N \mid X_1) = 1 + (1 - X_1) E(N)$.
It follows that $E(N) = E[E(N \mid X_1)] = 1 + (1 - p) E(N)$.
Solving gives $E(N) = \frac{1}{p}$.
Exercise [8] makes intuitive sense. In a sequence of Bernoulli trials with success parameter $p$ we would expect to wait $\frac{1}{p}$ trials for the first success. Next, the variance is $\operatorname{var}(N) = \frac{1 - p}{p^2}$.
Details:
Our first proof uses the probability density function in [2] to compute $E[N (N - 1)]$. This is an example of a factorial moment, and we will compute the general factorial moments in [12]. Using derivatives of the geometric series again, $E[N (N - 1)] = \sum_{n=2}^\infty n (n - 1) p (1 - p)^{n-1} = p (1 - p) \frac{2}{p^3} = \frac{2 (1 - p)}{p^2}$
Since $E(N) = \frac{1}{p}$, it follows that $E(N^2) = \frac{2 (1 - p)}{p^2} + \frac{1}{p}$ and hence $\operatorname{var}(N) = \frac{2 (1 - p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1 - p}{p^2}$
Our second proof uses a conditioning argument. From the proof of [8], $E(N^2 \mid X_1 = 1) = 1$,
and by the same reasoning, $E(N^2 \mid X_1 = 0) = E\left[(1 + N)^2\right] = 1 + 2 E(N) + E(N^2)$. Hence $E(N^2) = p + (1 - p)\left[1 + 2 E(N) + E(N^2)\right]$
Solving gives $E(N^2) = \frac{2 - p}{p^2}$, and hence $\operatorname{var}(N) = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2}$.
Note that $\operatorname{var}(N) = 0$ if $p = 1$, hardly surprising since $N$ is deterministic (taking just the value 1) in this case. At the other extreme, $\operatorname{var}(N) \to \infty$ as $p \downarrow 0$.
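As a quick numerical check of the mean and variance formulas, here is a sketch; the parameter values, seed, and sample size are arbitrary choices of ours.

```python
import random, statistics

rng = random.Random(7)

def first_success_trial(p):
    n = 1
    while rng.random() >= p:
        n += 1
    return n

for p in (0.2, 0.5, 0.8):
    samples = [first_success_trial(p) for _ in range(100_000)]
    print(p, statistics.mean(samples), 1 / p,              # sample mean vs 1/p
          statistics.variance(samples), (1 - p) / p ** 2)  # sample var vs (1-p)/p^2
```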
In the negative binomial experiment, set $k = 1$ to get the geometric distribution. Vary $p$ with the scrollbar and note the location and size of the mean $\pm$ standard deviation bar. For selected values of $p$, run the simulation 1000 times and compare the sample mean and standard deviation to the distribution mean and standard deviation.
The skewness and kurtosis of $N$ are $\operatorname{skew}(N) = \frac{2 - p}{\sqrt{1 - p}}$ and $\operatorname{kurt}(N) = 9 + \frac{p^2}{1 - p}$.
Details:
The factorial moments in [12] can be used to find the moments of $N$ about 0. The results then follow from the standard computational formulas for skewness and kurtosis.
Note that the geometric distribution is always positively skewed. Moreover, $\operatorname{skew}(N) \to \infty$ and $\operatorname{kurt}(N) \to \infty$ as $p \uparrow 1$.
Suppose now that $M = N - 1$, so that $M$ (the number of failures before the first success) has the geometric distribution on $\mathbb{N}$. Then
$E(M) = \frac{1 - p}{p}$
$\operatorname{var}(M) = \frac{1 - p}{p^2}$
$\operatorname{skew}(M) = \frac{2 - p}{\sqrt{1 - p}}$
$\operatorname{kurt}(M) = 9 + \frac{p^2}{1 - p}$
Of course, the fact that the variance, skewness, and kurtosis are unchanged follows easily, since $M$ and $N$ differ by a constant.
The Quantile Function
Let $F$ denote the distribution function of $N$, so that $F(n) = 1 - (1 - p)^n$ for $n \in \mathbb{N}$. Recall that $F^{-1}(r) = \min\{n \in \mathbb{N}_+ : F(n) \ge r\}$ for $r \in (0, 1)$ is the quantile function of $N$.
The quantile function of $N$ is $F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(1 - p)} \right\rceil$ for $r \in (0, 1)$.
Of course, the quantile function, like the probability density function and the distribution function, completely determines the distribution of $N$. Moreover, we can compute the median and quartiles to get measures of center and spread.
The first quartile, the median (or second quartile), and the third quartile are $F^{-1}\left(\frac{1}{4}\right) = \left\lceil \frac{\ln(3/4)}{\ln(1 - p)} \right\rceil$, $F^{-1}\left(\frac{1}{2}\right) = \left\lceil \frac{\ln(1/2)}{\ln(1 - p)} \right\rceil$, and $F^{-1}\left(\frac{3}{4}\right) = \left\lceil \frac{\ln(1/4)}{\ln(1 - p)} \right\rceil$, respectively.
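The closed form of the quantile function translates directly into code. A sketch, where the function name and the value $p = 0.2$ are our illustrative choices:

```python
import math

def geometric_quantile(r, p):
    """Smallest positive integer n with 1 - (1 - p)**n >= r."""
    return math.ceil(math.log(1 - r) / math.log(1 - p))

p = 0.2
print([geometric_quantile(r, p) for r in (0.25, 0.5, 0.75)])  # quartiles: [2, 4, 7]
```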
Open the quantile app, and select the geometric distribution and CDF view. Vary $p$ and note the shape and location of the CDF/quantile function. For various values of $p$, compute the median and the first and third quartiles.
The Constant Rate Property
Suppose that $T$ is a random variable taking values in $\mathbb{N}_+$, which we interpret as the first time that some event of interest occurs.
The function $r$ given by
$r(n) = P(T = n \mid T \ge n) = \frac{P(T = n)}{P(T \ge n)}, \quad n \in \mathbb{N}_+$
is the rate function of $T$.
If $T$ is interpreted as the (discrete) lifetime of a device, then $r$ is a discrete version of the failure rate function studied in reliability theory. However, in our usual formulation of Bernoulli trials, the event of interest is success rather than failure (or death), so we will simply use the term rate function to avoid confusion. The constant rate property characterizes the geometric distribution. As usual, let $N$ denote the trial number of the first success in a sequence of Bernoulli trials with success parameter $p \in (0, 1]$, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$.
$N$ has constant rate $p$.
Details:
From the results above, $P(N = n) = p (1 - p)^{n-1}$ and $P(N \ge n) = P(N > n - 1) = (1 - p)^{n-1}$, so $r(n) = p$ for $n \in \mathbb{N}_+$.
Conversely, if $T$ has constant rate $p \in (0, 1]$ then $T$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p$.
Details:
Let $G(n) = P(T \ge n)$ for $n \in \mathbb{N}_+$. From the constant rate property, $P(T = n) = p G(n)$ for $n \in \mathbb{N}_+$. Next note that $P(T = n) = G(n) - G(n + 1)$ for $n \in \mathbb{N}_+$. Thus, $G$ satisfies the recurrence relation $G(n + 1) = (1 - p) G(n)$ for $n \in \mathbb{N}_+$. Also $G$ satisfies the initial condition $G(1) = 1$. Solving the recurrence relation gives $G(n) = (1 - p)^{n-1}$ for $n \in \mathbb{N}_+$, and hence $P(T = n) = p (1 - p)^{n-1}$ for $n \in \mathbb{N}_+$.
Relation to the Uniform Distribution
Suppose again that $X = (X_1, X_2, \ldots)$ is a sequence of Bernoulli trials with success parameter $p \in (0, 1)$. For $n \in \mathbb{N}_+$, recall that $Y_n = \sum_{i=1}^n X_i$, the number of successes in the first $n$ trials, has the binomial distribution with parameters $n$ and $p$. As before, $N$ denotes the trial number of the first success.
Suppose that $n \in \mathbb{N}_+$. The conditional distribution of $N$ given $Y_n = 1$ is uniform on $\{1, 2, \ldots, n\}$.
Details:
Our first proof is a sampling argument. We showed in the last section that given $Y_n = k$, the trial numbers of the successes form a random sample of size $k$ chosen without replacement from $\{1, 2, \ldots, n\}$. This result is a simple corollary with $k = 1$.
For a direct proof, let $j \in \{1, 2, \ldots, n\}$. Then
$P(N = j \mid Y_n = 1) = \frac{P(N = j, Y_n = 1)}{P(Y_n = 1)}$
In words, the events in the numerator of the last fraction are that there are no successes in the first $j - 1$ trials, a success on trial $j$, and no successes in trials $j + 1$ to $n$. These events are independent, so
$P(N = j \mid Y_n = 1) = \frac{(1 - p)^{j-1} p (1 - p)^{n-j}}{n p (1 - p)^{n-1}} = \frac{1}{n}$
Note that the conditional distribution does not depend on the success parameter $p$. If we know that there is exactly one success in the first $n$ trials, then the trial number of that success is equally likely to be any of the $n$ possibilities. Another connection between the geometric distribution and the uniform distribution is given for the alternating coin-tossing game in [43]: the conditional distribution of $N$ given $N \le k$ converges to the uniform distribution on $\{1, 2, \ldots, k\}$ as $p \downarrow 0$.
Relation to the Exponential Distribution
The Poisson process on $[0, \infty)$, named for Simeon Poisson, is a model for random points in continuous time. There are many deep and interesting connections between the Bernoulli trials process (which can be thought of as a model for random points in discrete time) and the Poisson process. These connections are explored in detail in the chapter referenced above. In this section we just give the most famous and important result: the convergence of the geometric distribution to the exponential distribution. The geometric distribution, as we know, governs the time of the first random point in the Bernoulli trials process, while the exponential distribution governs the time of the first random point in the Poisson process.
The exponential distribution with rate parameter $r \in (0, \infty)$ has distribution function $t \mapsto 1 - e^{-r t}$ for $t \in [0, \infty)$. The mean of the exponential distribution is $\frac{1}{r}$ and the variance is $\frac{1}{r^2}$. In addition, the moment generating function is $s \mapsto \frac{r}{r - s}$ for $s < r$.
For $n \in \mathbb{N}_+$, suppose that $U_n$ has the geometric distribution on $\mathbb{N}_+$ with success parameter $p_n \in (0, 1)$, where $n p_n \to r \in (0, \infty)$ as $n \to \infty$. Then the distribution of $U_n / n$ converges to the exponential distribution with parameter $r$ as $n \to \infty$.
Details:
Let $F_n$ denote the CDF of $U_n / n$. Then for $t \in [0, \infty)$,
$F_n(t) = P\left(\frac{U_n}{n} \le t\right) = P\left(U_n \le \lfloor n t \rfloor\right) = 1 - (1 - p_n)^{\lfloor n t \rfloor}$
But by a famous limit from calculus, $\left(1 - \frac{n p_n}{n}\right)^n \to e^{-r}$ as $n \to \infty$, and hence $(1 - p_n)^{n t} \to e^{-r t}$ as $n \to \infty$. But by definition, $\lfloor n t \rfloor \le n t < \lfloor n t \rfloor + 1$, so it follows that $(1 - p_n)^{\lfloor n t \rfloor} \to e^{-r t}$ as $n \to \infty$. Hence $F_n(t) \to 1 - e^{-r t}$ as $n \to \infty$, which is the CDF of the exponential distribution.
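The convergence is easy to see numerically. A sketch with illustrative choices $r = 2$ and $t = 0.75$ (both ours): as $n$ grows with $p_n = r / n$, the CDF values of $U_n / n$ approach $1 - e^{-r t}$.

```python
import math

r, t = 2.0, 0.75
for n in (10, 100, 1000, 10_000):
    p_n = r / n                                   # success parameter, so n * p_n -> r
    print(n, 1 - (1 - p_n) ** math.floor(n * t))  # P(U_n / n <= t)
print("limit:", 1 - math.exp(-r * t))             # exponential CDF at t
```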
The following result explores a connection between the binomial distribution studied in the last section and the geometric distribution studied in this section.
Suppose that $X = (X_1, X_2, \ldots)$ is a sequence of Bernoulli trials with success parameter $p \in (0, 1)$, so that $Y_n = \sum_{i=1}^n X_i$ has the binomial distribution with parameters $n$ and $p$ for each $n \in \mathbb{N}_+$. Now suppose that $V$ has the geometric distribution on $\mathbb{N}$ with success parameter $q \in (0, 1)$ and that $V$ is independent of $X$. Then $Y_V$ has the geometric distribution on $\mathbb{N}$ with success parameter $\frac{q}{p + q - p q}$.
Details:
As is often the case, there are both analytic and probabilistic proofs. We start with the analytic proof, which of course involves conditioning on $V$:
$P(Y_V = k) = \sum_{n=k}^\infty P(V = n) P(Y_n = k) = \sum_{n=k}^\infty q (1 - q)^n \binom{n}{k} p^k (1 - p)^{n-k}, \quad k \in \mathbb{N}$
Using derivatives of the geometric series for the last sum we have
$P(Y_V = k) = \frac{q}{p + q - p q} \left(\frac{p (1 - q)}{p + q - p q}\right)^k, \quad k \in \mathbb{N}$
For the probabilistic proof, suppose that $Z = (Z_1, Z_2, \ldots)$ is a sequence of Bernoulli trials with success probability $q$, and independent of $X$. Of course, for $n \in \mathbb{N}_+$ we can interpret $Y_n$ as the number of successes in the $X$ sequence in the first $n$ trials. But now, we can also interpret $V$ as the number of failures before the first success in the $Z$ sequence. So $Y_V$ is the number of successes in the $X$ sequence before the first success in the $Z$ sequence. Equivalently, $Y_V$ is the number of failures before the first success in a sequence of Bernoulli trials where success on trial $n$ is the event $\{Z_n = 1\}$, failure is the event $\{Z_n = 0, X_n = 1\}$, and trials with neither outcome are ignored. The conditional probability of success on such a trial is $\frac{q}{q + (1 - q) p} = \frac{q}{p + q - p q}$.
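The result can be checked by simulation. A sketch with illustrative parameters $p = 0.4$ and $q = 0.3$ (our choices): we simulate $V$, then $Y_V$, and compare the relative frequencies to the geometric density with success parameter $q / (p + q - p q)$.

```python
import random

rng = random.Random(99)
p, q, runs = 0.4, 0.3, 200_000

def failures_before_success(prob):
    n = 0
    while rng.random() >= prob:
        n += 1
    return n

counts = {}
for _ in range(runs):
    n = failures_before_success(q)               # V, geometric on {0, 1, 2, ...}
    y = sum(rng.random() < p for _ in range(n))  # Y_V given V = n is binomial(n, p)
    counts[y] = counts.get(y, 0) + 1

a = q / (p + q - p * q)                          # claimed success parameter of Y_V
for k in range(5):
    print(k, counts.get(k, 0) / runs, a * (1 - a) ** k)
```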
A standard, fair die is thrown until an ace occurs. Let $N$ denote the number of throws. Find each of the following:
The probability density function of $N$.
The mean of $N$.
The variance of $N$.
The probability that the die will have to be thrown at least 5 times.
The quantile function of $N$.
The median and the first and third quartiles.
Answer:
$f(n) = \left(\frac{5}{6}\right)^{n-1} \frac{1}{6}$ for $n \in \mathbb{N}_+$; $E(N) = 6$; $\operatorname{var}(N) = 30$; $P(N \ge 5) = \left(\frac{5}{6}\right)^4 \approx 0.4823$
$F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(5/6)} \right\rceil$ for $r \in (0, 1)$
Quartiles $q_1 = 2$, $q_2 = 4$ (median), $q_3 = 8$
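The answers can be verified with a few lines of code; this is a sketch using only the formulas derived above, with nothing specific to the text beyond $p = 1/6$.

```python
import math

p = 1 / 6
print(1 / p)                # mean: 6
print((1 - p) / p ** 2)     # variance: 30
print((1 - p) ** 4)         # P(N >= 5) = (5/6)^4, about 0.4823
quartiles = [math.ceil(math.log(1 - r) / math.log(1 - p)) for r in (0.25, 0.5, 0.75)]
print(quartiles)            # [2, 4, 8]
```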
A type of missile has failure probability 0.02 on each launch. Let $M$ denote the number of launches before the first failure. Find each of the following:
The probability density function of $M$.
The mean of $M$.
The variance of $M$.
The probability of 20 consecutive successful launches.
The quantile function of $M$.
The median and the first and third quartiles.
Details:
$g(n) = 0.02 (0.98)^n$ for $n \in \mathbb{N}$; $E(M) = 49$; $\operatorname{var}(M) = 2450$; $P(M \ge 20) = 0.98^{20} \approx 0.6676$
$F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(0.98)} \right\rceil - 1$ for $r \in (0, 1)$
Quartiles $q_1 = 14$, $q_2 = 34$ (median), $q_3 = 68$
A student takes a multiple choice test with 10 questions, each with 5 choices (only one correct). The student blindly guesses and gets exactly one question correct. Find the probability that the correct question was one of the first 4.
Details:
0.4
Recall that an American roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. Suppose that you observe red or green on 10 consecutive spins. Give the conditional distribution of the number of additional spins needed for black to occur.
In the negative binomial experiment, set $k = 1$ to get the geometric distribution, and choose a value of $p$. Run the experiment 1000 times. Compute the appropriate relative frequencies and empirically investigate the memoryless property.
The Petersburg Problem
We will now explore a gambling situation, known as the Petersburg problem, which leads to some famous and surprising results.
Suppose that we are betting on a sequence of Bernoulli trials with success parameter $p \in (0, 1)$. We can bet any amount of money on a trial at even stakes: if the trial results in success, we receive that amount, and if the trial results in failure, we must pay that amount. We will use the following strategy, known as a martingale strategy:
We bet $c$ units on the first trial.
Whenever we lose a trial, we double the bet for the next trial.
We stop as soon as we win a trial.
Let $N$ denote the number of trials played, so that $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$, and let $W$ denote our net winnings when we stop. Then $W = c$.
Details:
The first win occurs on trial $N$, so the initial bet was doubled $N - 1$ times. The net winnings are
$W = c 2^{N-1} - \sum_{i=0}^{N-2} c 2^i = c 2^{N-1} - c \left(2^{N-1} - 1\right) = c$
So $W$ is not random and $W$ is independent of $p$! Since $c$ is an arbitrary constant, it would appear that we have an ideal strategy. However, let us study the amount of money needed to play the strategy.
The expected amount of money needed for the martingale strategy is $E\left[c \left(2^N - 1\right)\right] = \frac{c}{2 p - 1}$ if $p > \frac{1}{2}$, and $E\left[c \left(2^N - 1\right)\right] = \infty$ if $p \le \frac{1}{2}$.
So the strategy is fatally flawed when the trials are unfavorable and even when they are fair, since we need infinite expected capital to make the strategy work in these cases.
Compute the expected amount of money needed explicitly if $c = 100$ and $p = 0.55$.
Details:
$1000
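A simulation makes the flaw in the martingale strategy vivid. In the sketch below (the parameter choices and seed are ours), the net winnings equal $c$ on every run, but the capital needed per run, $c(2^N - 1)$, has a very heavy tail; for $p \le \frac{1}{2}$ its true expectation is infinite, so the sample average is large and unstable.

```python
import random

rng = random.Random(31)
p, c, runs = 0.5, 1, 10_000

total_needed, max_needed = 0, 0
for _ in range(runs):
    bet, spent = c, 0
    while True:
        spent += bet
        if rng.random() < p:           # win: stop
            net = bet - (spent - bet)  # winnings on last bet minus prior losses
            assert net == c            # net winnings are always exactly c
            break
        bet *= 2                       # loss: double the bet
    total_needed += spent
    max_needed = max(max_needed, spent)

print("average capital needed:", total_needed / runs)
print("largest capital needed:", max_needed)
```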
In the negative binomial experiment, set $k = 1$. For each of several values of $p$, run the experiment 100 times. For each run compute the total amount of money needed (with $c = 1$). Find the average value over the 100 runs.
For more information about gambling strategies, see the section on red and black and the chapter on martingales.
The Alternating Coin-Tossing Game
A coin has probability of heads $p \in (0, 1)$. There are $k$ players who take turns tossing the coin in round-robin style: player 1 first, then player 2, continuing until player $k$, then player 1 again, and so forth. The first player to toss heads wins the game.
Let $N$ denote the number of the first toss that results in heads. Of course, $N$ has the geometric distribution on $\mathbb{N}_+$ with parameter $p$. Additionally, let $U$ denote the number of complete tail rounds played (all $k$ players toss tails) and let $W$ denote the winner of the game. So $U$ takes values in $\mathbb{N}$ and $W$ in the set $\{1, 2, \ldots, k\}$. We are interested in the joint distribution of $(U, W)$.
Random variables $U$ and $W$ are related by $N = k U + W$, and so $(U, W)$ has joint probability density function given by
$P(U = n, W = w) = (1 - p)^{k n} (1 - p)^{w - 1} p, \quad n \in \mathbb{N}, \; w \in \{1, 2, \ldots, k\}$
Details:
Clearly if there are $n$ complete tail rounds and then player $w$ tosses heads, then the trial number of the first head is $k n + w$. Conversely, if the trial number of the first head is $N$, then the number of complete tail rounds is $U = \lfloor (N - 1) / k \rfloor$ and the winning player is $W = N - k U$. The result then follows since $N$ has the geometric distribution with parameter $p$.
Random variables $U$ and $W$ are independent. $U$ has the geometric distribution on $\mathbb{N}$ with success parameter $1 - (1 - p)^k$, and $W$ has probability density function given by
$P(W = w) = \frac{p (1 - p)^{w - 1}}{1 - (1 - p)^k}, \quad w \in \{1, 2, \ldots, k\}$
Details:
The independence of $U$ and $W$ follows from [35] and the factoring theorem, since the joint density is a product of a function of $n$ and a function of $w$. Explicitly, we can rewrite the joint density function in [35] in the form
$P(U = n, W = w) = \left[\left(1 - (1 - p)^k\right) (1 - p)^{k n}\right] \left[\frac{p (1 - p)^{w - 1}}{1 - (1 - p)^k}\right]$
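The joint distribution is easy to explore by simulation. A sketch with illustrative values $p = 0.4$ and $k = 3$ (ours): we recover $U$ and $W$ from the first head $N$, compare the marginal of $W$ to the truncated geometric density, and spot-check independence via $P(U = 0, W = w) \approx P(U = 0) P(W = w)$.

```python
import random

rng = random.Random(5)
p, k, runs = 0.4, 3, 200_000

counts_w = [0] * (k + 1)
counts_u0w = [0] * (k + 1)
for _ in range(runs):
    n = 1
    while rng.random() >= p:
        n += 1
    u, w = (n - 1) // k, (n - 1) % k + 1   # N = k * U + W
    counts_w[w] += 1
    if u == 0:
        counts_u0w[w] += 1

q = 1 - (1 - p) ** k                       # success parameter of U, so P(U = 0) = q
for w in range(1, k + 1):
    pw = counts_w[w] / runs
    exact = p * (1 - p) ** (w - 1) / q
    print(w, pw, exact, counts_u0w[w] / runs, q * pw)
```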
Open the alternating coin experiment. Vary the parameters and note the shape and location of the probability density functions. For selected values of the parameters, run the simulation 1000 times and compare the relative frequency functions to the probability density functions.
Describe the distributions of $U$ and $W$ when the coin is fair, so that $p = \frac{1}{2}$.
Details:
The number of tail rounds $U$ has the geometric distribution on $\mathbb{N}$ with parameter $1 - \left(\frac{1}{2}\right)^k$. The winning player $W$ has probability density function given by $P(W = w) = \frac{(1/2)^w}{1 - (1/2)^k}$ for $w \in \{1, 2, \ldots, k\}$.
Describe the distributions of $U$ and $W$ with two players, so that $k = 2$.
Details:
The number of tail rounds $U$ has the geometric distribution on $\mathbb{N}$ with parameter $1 - (1 - p)^2 = p (2 - p)$. The winning player $W$ has probability density function given by $P(W = 1) = \frac{1}{2 - p}$, $P(W = 2) = \frac{1 - p}{2 - p}$.
Combining [38] and [39], we have two players with a fair coin: the number of tail rounds has the geometric distribution on $\mathbb{N}$ with parameter $\frac{3}{4}$, player 1 wins with probability $\frac{2}{3}$, while player 2 wins with probability $\frac{1}{3}$.
Let's return to the case of general $k \in \{2, 3, \ldots\}$ and $p \in (0, 1)$. Since $U$ has a geometric distribution on $\mathbb{N}$, all of the basic theory above applies, with $p$ replaced by $1 - (1 - p)^k$. So for most of the remainder of this subsection, we will concentrate on the random variable $W$, the winning player.
$P(W = w) = (1 - p)^{w - 1} P(W = 1)$ for $w \in \{1, 2, \ldots, k\}$.
Details:
This result can be argued directly, using the memoryless property of the geometric distribution in [6]. In order for player $w$ to win, the previous $w - 1$ players must first all toss tails. Then, player $w$ effectively becomes the first player in a new sequence of tosses. This result can be used to give another derivation of the probability density function in [36]: since the probabilities must sum to 1, $P(W = 1) = \frac{p}{1 - (1 - p)^k}$.
Note that $w \mapsto P(W = w)$ is a decreasing function of $w$. Not surprisingly, the lower the toss order the better for the player. Note also from [36] that $W$ itself has a truncated geometric distribution.
The distribution of $W$ is the same as the conditional distribution of $N$ given $N \le k$:
$P(W = w) = P(N = w \mid N \le k), \quad w \in \{1, 2, \ldots, k\}$
Details:
This follows from the same probabilistic argument as in the details of [40]. For a direct proof,
$P(N = w \mid N \le k) = \frac{P(N = w)}{P(N \le k)} = \frac{p (1 - p)^{w - 1}}{1 - (1 - p)^k} = P(W = w), \quad w \in \{1, 2, \ldots, k\}$
The probability generating function of $W$ is given by
$E\left(t^W\right) = \frac{p t}{1 - (1 - p) t} \cdot \frac{1 - (1 - p)^k t^k}{1 - (1 - p)^k}$
Details:
The result can be obtained by direct computation and standard results for geometric series. But an easier and better method is to use [35], the independence of $U$ and $W$, and the probability generating functions of $N$ and $U$: since $N = k U + W$, we have $E\left(t^N\right) = E\left(t^{k U}\right) E\left(t^W\right)$, so $E\left(t^W\right) = E\left(t^N\right) / E\left(t^{k U}\right)$.
But $E\left(t^N\right) = \frac{p t}{1 - (1 - p) t}$ and $E\left(t^{k U}\right) = \frac{1 - (1 - p)^k}{1 - (1 - p)^k t^k}$.
The mean and variance of $W$ are
$E(W) = \frac{1}{p} - k \frac{(1 - p)^k}{1 - (1 - p)^k}$
$\operatorname{var}(W) = \frac{1 - p}{p^2} - k^2 \frac{(1 - p)^k}{\left[1 - (1 - p)^k\right]^2}$
Details:
The results can be obtained from the definitions of mean and variance and derivatives of geometric series. The results can also be obtained by taking derivatives of the probability generating function in [42]. For both methods the computations are extremely tedious. But the computations are easy using [35], the independence of $U$ and $W$, and the corresponding moments of $N$ and $U$.
$E(W) = E(N) - k E(U)$. But $E(N) = \frac{1}{p}$ and $E(U) = \frac{(1 - p)^k}{1 - (1 - p)^k}$.
$\operatorname{var}(W) = \operatorname{var}(N) - k^2 \operatorname{var}(U)$. But $\operatorname{var}(N) = \frac{1 - p}{p^2}$ and $\operatorname{var}(U) = \frac{(1 - p)^k}{\left[1 - (1 - p)^k\right]^2}$.
Open the alternating coin experiment. Vary the parameters and note the shape and location of the mean $\pm$ standard deviation bars. For selected values of the parameters, run the simulation 1000 times and compare the empirical means and standard deviations to the corresponding distribution values.
The following result explores some limiting distributions related to the alternating coin-tossing game.
Limiting distributions:
As $k \to \infty$ for fixed $p \in (0, 1)$, the distribution of $U$ converges to point mass at 0 and the distribution of $W$ converges to the geometric distribution on $\mathbb{N}_+$ with parameter $p$.
As $p \downarrow 0$ for fixed $k$, the distribution of $U$ converges to point mass at $\infty$ and the distribution of $W$ converges to the uniform distribution on $\{1, 2, \ldots, k\}$.
Details:
Fix $p \in (0, 1)$ and let $k \to \infty$. Then $1 - (1 - p)^k \to 1$ and so the geometric distribution of $U$ converges to point mass at 0. From [36], $P(W = w) = \frac{p (1 - p)^{w - 1}}{1 - (1 - p)^k} \to p (1 - p)^{w - 1}$ for $w \in \mathbb{N}_+$. Alternately, from [41], $P(W = w) = P(N = w \mid N \le k) \to P(N = w)$.
Fix $k$ and let $p \downarrow 0$. Then $1 - (1 - p)^k \to 0$ and so the geometric distribution of $U$ converges to point mass at $\infty$. From [36] again and L'Hospital's rule, $P(W = w) \to \frac{1}{k}$ for $w \in \{1, 2, \ldots, k\}$. The probability generating function in [42] could also be used.
Players at the end of the tossing order should hope for a coin biased towards tails.
Open the alternating coin experiment. Although the parameter values are limited, you can begin to see the convergence results in [45].
Fix the value of $p$ and increase $k$ from 2 to 10. Note how the density function of $W$ begins to resemble the density function of $N$.
Fix the value of $k$ and decrease $p$ from 0.95 to 0.05. Note how the density function of $W$ becomes increasingly uniform.
Odd Man Out
In the game of odd man out, we start with a specified number of players, each with a coin that has the same probability of heads $p \in (0, 1)$. The players toss their coins at the same time. If there is an odd man, that is, a player with an outcome different from those of all the other players, then the odd player is eliminated; otherwise no player is eliminated. In any event, the remaining players continue the game in the same manner. A slight technical problem arises with just two players, since different outcomes would make both players odd. So in this case, we might (arbitrarily) make the player with tails the odd man.
Suppose there are $k \in \{2, 3, \ldots\}$ players and $p \in (0, 1)$. In a single round, the probability of an odd man is
$r_k(p) = k p (1 - p)^{k-1} + k p^{k-1} (1 - p)$ for $k \ge 3$, and $r_2(p) = 2 p (1 - p)$
Details:
Let $Y$ denote the number of heads. If $k = 2$, the event that there is an odd man is $\{Y = 1\}$. If $k \ge 3$, the event that there is an odd man is $\{Y = 1\} \cup \{Y = k - 1\}$. The result now follows since $Y$ has the binomial distribution with parameters $k$ and $p$.
The graph of $r_k$ is more interesting than you might think.
The graphs of $r_k$ for various values of $k$
For $k \in \{2, 3, \ldots\}$, $r_k$ has the following properties:
$r_k$ is symmetric about $p = \frac{1}{2}$.
For fixed $p \in (0, 1)$, $r_k(p) \to 0$ as $k \to \infty$.
Details:
These properties are clear from the functional form of $r_k$. Note that $r_k(1 - p) = k (1 - p) p^{k-1} + k (1 - p)^{k-1} p = r_k(p)$.
For $k \in \{2, 3, 4\}$, $r_k$ has the following properties:
$r_k$ increases and then decreases, with maximum at $p = \frac{1}{2}$.
$r_k$ is concave downward.
Details:
This follows by computing the first derivatives: $r_2'(p) = 2 (1 - 2 p)$, $r_3'(p) = 3 (1 - 2 p)$, $r_4'(p) = 4 (1 - 2 p)^3$, and the second derivatives: $r_2''(p) = -4$, $r_3''(p) = -6$, $r_4''(p) = -24 (1 - 2 p)^2$.
For $k \in \{5, 6, \ldots\}$, $r_k$ has the following properties:
The maximum occurs at two points of the form $p_k$ and $1 - p_k$, where $0 < p_k < \frac{1}{2}$ and $p_k \to 0$ as $k \to \infty$.
The maximum value $r_k(p_k) \to \frac{1}{e} \approx 0.3679$ as $k \to \infty$.
The graph has a local minimum at $p = \frac{1}{2}$.
Details:
Note that $r_k(p) = u_k(p) + u_k(1 - p)$ where $u_k(p) = k p (1 - p)^{k-1}$ for $p \in [0, 1]$. Also, $u_k(p)$ is the dominant term when $p < \frac{1}{2}$ while $u_k(1 - p)$ is the dominant term when $p > \frac{1}{2}$. A simple analysis of the derivative shows that $u_k$ increases and then decreases, reaching its maximum at $p = \frac{1}{k}$. Moreover, the maximum value is $u_k\left(\frac{1}{k}\right) = \left(1 - \frac{1}{k}\right)^{k-1} \to \frac{1}{e}$ as $k \to \infty$. Also, $u_k$ is concave downward and then upward, with inflection point at $p = \frac{2}{k}$.
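The analysis of $r_k$ can be confirmed numerically. A sketch using a simple grid search (the function name and values of $k$ are our choices): the maximum of $r_k$ over $p$ occurs near $p = 1/k$ and decreases toward $1/e \approx 0.3679$ as $k$ grows.

```python
def odd_man_prob(k, p):
    """Probability of an odd man in one round with k players."""
    if k == 2:
        return 2 * p * (1 - p)   # convention: with two players, tails is odd
    return k * p * (1 - p) ** (k - 1) + k * p ** (k - 1) * (1 - p)

for k in (5, 10, 20, 50):
    best_p, best = max(((i / 1000, odd_man_prob(k, i / 1000)) for i in range(1, 1000)),
                       key=lambda pair: pair[1])
    print(k, round(best_p, 3), round(best, 4))   # best_p near 1/k, best -> 1/e
```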
Suppose $p \in (0, 1)$, and let $L_k$ denote the number of rounds until an odd man is eliminated, starting with $k$ players. Then $L_k$ has the geometric distribution on $\mathbb{N}_+$ with parameter $r_k(p)$. The mean and variance are
$E(L_k) = \frac{1}{r_k(p)}$
$\operatorname{var}(L_k) = \frac{1 - r_k(p)}{r_k^2(p)}$
As we might expect, $E(L_k) \to \infty$ and $\operatorname{var}(L_k) \to \infty$ as $k \to \infty$ for fixed $p$. On the other hand, from [50], $E(L_k) \to e$ and $\operatorname{var}(L_k) \to e^2 - e$ as $k \to \infty$ when $p = p_k$, the value that maximizes $r_k$.
Suppose we start with $k$ players and $p \in (0, 1)$. The number of rounds until a single player remains is $L = \sum_{j=2}^k L_j$, where $(L_2, L_3, \ldots, L_k)$ are independent and $L_j$ has the geometric distribution on $\mathbb{N}_+$ with parameter $r_j(p)$. The mean and variance are
$E(L) = \sum_{j=2}^k \frac{1}{r_j(p)}$
$\operatorname{var}(L) = \sum_{j=2}^k \frac{1 - r_j(p)}{r_j^2(p)}$
Details:
The form of $L$ follows from [51]: $L_k$ is the number of rounds until the first player is eliminated. Then the game continues independently with $k - 1$ players, so $L_{k-1}$ is the number of additional rounds until the second player is eliminated, and so forth. Parts (a) and (b) follow from [51] and standard properties of expected value and variance.
Starting with $k$ players and probability of heads $p \in (0, 1)$, the total number of coin tosses is $T = \sum_{j=2}^k j L_j$. The mean and variance are
$E(T) = \sum_{j=2}^k \frac{j}{r_j(p)}$
$\operatorname{var}(T) = \sum_{j=2}^k j^2 \frac{1 - r_j(p)}{r_j^2(p)}$
Details:
As before, the form of $T$ follows from [49]: $L_k$ is the number of rounds until the first player is eliminated, and each of these rounds has $k$ tosses. Then the game continues independently with $k - 1$ players, so $L_{k-1}$ is the number of additional rounds until the second player is eliminated, with each round having $k - 1$ tosses, and so forth. Parts (a) and (b) also follow from [49] and standard properties of expected value and variance.
Number of Trials Before a Pattern
Consider again a sequence of Bernoulli trials $X = (X_1, X_2, \ldots)$ with success parameter $p \in (0, 1)$. Recall that the number of trials before the first success (outcome 1) occurs has the geometric distribution on $\mathbb{N}$ with parameter $p$. A natural generalization is the random variable that gives the number of trials before a specific finite sequence of outcomes occurs for the first time. (Such a sequence is sometimes referred to as a word from the alphabet $\{0, 1\}$, or simply a bit string.) In general, finding the distribution of this variable is a difficult problem, with the difficulty depending very much on the nature of the word. The problem of finding just the expected number of trials before a word occurs can be solved using powerful tools from the theory of renewal processes and from the theory of martingales.
To set up the notation, let $\omega$ denote a finite bit string and let $M_\omega$ denote the number of trials before $\omega$ occurs for the first time. Finally, let $q = 1 - p$. Note that $M_\omega$ takes values in $\mathbb{N}$. In the following exercises, we will consider the word $10$, a success followed by a failure, and write $M = M_{10}$. As always, try to derive the results yourself before expanding the details.
The probability density function of $M$ is given as follows:
If $p \ne \frac{1}{2}$ then $P(M = n) = p q \frac{p^{n+1} - q^{n+1}}{p - q}$ for $n \in \mathbb{N}$.
If $p = \frac{1}{2}$ then $P(M = n) = \frac{n + 1}{2^{n+2}}$ for $n \in \mathbb{N}$.
Details:
For $n \in \mathbb{N}$, the event $\{M = n\}$ can only occur if there is an initial string of 0s of length $j \in \{0, 1, \ldots, n\}$, followed by a string of 1s of length $n - j$, and then 1 on trial $n + 1$ and 0 on trial $n + 2$. Hence
$P(M = n) = \sum_{j=0}^n q^j p^{n-j} p q = p q \sum_{j=0}^n q^j p^{n-j}$
The stated result then follows from standard results on geometric series.
It's interesting to note that $P(M = n)$ is symmetric in $p$ and $q$, that is, symmetric about $p = \frac{1}{2}$. It follows that the distribution function, probability generating function, expected value, and variance, which we consider below, are all also symmetric about $p = \frac{1}{2}$. It's also interesting to note that $P(M = 0) = P(M = 1) = p q$, and this is the largest value. So regardless of $p$, the distribution is bimodal with modes 0 and 1.
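Here is a simulation sketch for the word $10$ (the parameter value and seed are ours): we run trials until a 1 is immediately followed by a 0, record the number of trials before the word, and compare with the density formula for $p \ne \frac{1}{2}$.

```python
import random

rng = random.Random(17)
p, runs = 0.3, 200_000
q = 1 - p

counts = {}
for _ in range(runs):
    t, prev = 0, None
    while True:
        t += 1
        x = 1 if rng.random() < p else 0
        if prev == 1 and x == 0:   # the word 10 is completed on trial t
            break
        prev = x
    m = t - 2                      # number of trials before the word occurs
    counts[m] = counts.get(m, 0) + 1

for n in range(6):
    exact = p * q * (p ** (n + 1) - q ** (n + 1)) / (p - q)
    print(n, counts.get(n, 0) / runs, exact)
```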
The distribution function of $M$ is given as follows:
If $p \ne \frac{1}{2}$ then $P(M \le n) = 1 - \frac{p^{n+3} - q^{n+3}}{p - q}$ for $n \in \mathbb{N}$.
If $p = \frac{1}{2}$ then $P(M \le n) = 1 - \frac{n + 3}{2^{n+2}}$ for $n \in \mathbb{N}$.
Details:
By definition, $P(M \le n) = \sum_{j=0}^n P(M = j)$ for $n \in \mathbb{N}$. The stated result then follows from [54], standard results on geometric series, and some algebra.
The probability generating function $G$ of $M$ is given as follows:
If $p \ne \frac{1}{2}$ then $G(t) = E\left(t^M\right) = \frac{p q}{p - q} \left(\frac{p}{1 - p t} - \frac{q}{1 - q t}\right)$ for $|t| < \frac{1}{\max\{p, q\}}$.
If $p = \frac{1}{2}$ then $G(t) = E\left(t^M\right) = \frac{1}{(2 - t)^2}$ for $|t| < 2$.
Details:
By definition,
$G(t) = E\left(t^M\right) = \sum_{n=0}^\infty P(M = n) t^n$
for all $t$ for which the series converges absolutely. The stated result then follows from the theorem above, and once again, standard results on geometric series.
The mean of $M$ is given as follows:
If $p \ne \frac{1}{2}$ then $E(M) = \frac{p}{q} + \frac{q}{p}$.
If $p = \frac{1}{2}$ then $E(M) = 2$.
Details:
Recall that $E(M) = G'(1)$, so the stated result follows from calculus and [56]. The mean can also be computed from the definition using standard results from geometric series, but this method is more tedious.
The graph of $E(M)$ as a function of $p$ is given below. It's not surprising that $E(M) \to \infty$ as $p \downarrow 0$ and as $p \uparrow 1$, and that the minimum value of 2 occurs when $p = \frac{1}{2}$.
$E(M)$ as a function of $p$
The variance of $M$ is given as follows:
If $p \ne \frac{1}{2}$ then $\operatorname{var}(M) = \frac{p}{q^2} + \frac{q}{p^2}$.
If $p = \frac{1}{2}$ then $\operatorname{var}(M) = 4$.
Details:
Let $G$ denote the probability generating function in [56]. Recall that $G''(1) = E[M (M - 1)]$, the second factorial moment, and so
$\operatorname{var}(M) = G''(1) + G'(1) - \left[G'(1)\right]^2$
The stated result then follows from calculus.