Conditional Distributions

A number of characterizations of the standard exponential distribution deal with conditional distributions in various ways. Once again our starting point in this section is a measurable space \((S, \ms S)\) and a measurable semigroup \((S, \cdot)\) as discussed in Section 1. Recall that the graph associated with \((S, \cdot)\) is \((S, \rta)\) where \(x \rta y\) means that \(y \in x S\) for \(x, \, y \in S\). We assume that random variables in \(S\) are supported by \((S, \cdot)\).

Suppose that \(X\) and \(Y\) are independent random variables in \(S\) with reliability functions \(F\) and \(G\) for \((S, \rta)\), respectively. Suppose also that \(X\) has an exponential distribution on \((S, \cdot)\) and that \(Y\) has a memoryless distribution on \((S, \cdot)\). Then the conditional distribution of \(X\) given \(Y \in X S\) is exponential on \((S, \cdot)\), with reliability function \(F G\).

Details:

First, since both distributions are memoryless, we have \begin{align*} (F G)(x y) &= F(x y) G(x y) = F(x) F(y) G(x) G(y) \\ &= [F(x) G(x)] [F(y) G(y)] = (FG)(x) (FG)(y), \quad x, \, y \in S \end{align*} Next, \(X\) has constant rate \(\alpha \in (0, \infty)\) with respect to a left-invariant measure \(\lambda\). Since \(G: S \to (0, 1]\) we have \[\frac{1}{\beta} := \int_S F(x) G(x) d\lambda(x) \le \int_S F(x) d\lambda(x) = \frac{1}{\alpha} \lt \infty\] From the characterization in Section 5, it follows that \(F G\) is the reliability function of an exponential distribution that has constant rate \(\beta\) with respect to \(\lambda\). It remains to show that this distribution is the conditional distribution of \(X\) given \(Y \in X S\). Towards this end, note that \begin{align*} \P(Y \in X S) &= \E[\P(Y \in X S \mid X)] = \E[G(X)] \\ &= \int_S G(x) \alpha F(x) d\lambda(x) = \alpha \int_S G(x) F(x) d\lambda(x) = \frac{\alpha}{\beta} \end{align*} Next, if \(A \in \ms S\), then by independence \begin{align*} \P(X \in A, Y \in X S) &= \E[\P(X \in A, Y \in X S \mid X)] \\ &= \E[G(X), X \in A] = \int_A G(x) \alpha F(x) d\lambda(x) \end{align*} and therefore, dividing by \(\P(Y \in X S) = \alpha / \beta\), \[\P(X \in A \mid Y \in X S) = \int_A \beta F(x) G(x) d\lambda(x)\] So the conditional density of \(X\) given \(Y \in X S\) is \(\beta F G\). Using the integral version of left invariance in Section 3, \begin{align*} \P(X \in x S \mid Y \in X S) &= \beta \int_{x S} F(y) G(y) d\lambda(y) = \beta \int_S F(x z) G(x z) d\lambda(z) \\ &= F(x) G(x) \int_S \beta F(z) G(z) d\lambda(z) = F(x) G(x), \quad x \in S \end{align*} so \(F G\) is the reliability function of \(X\) given \(Y \in X S\).

In the context of the previous proposition, suppose that \(X\) and \(Y\) are independent variables, each with an exponential distribution on \((S, \cdot)\). Then the conditional distribution of \(X\) given \(Y \in X S\) and the conditional distribution of \(Y\) given \(X \in Y S\) are the same: exponential with reliability function \(F G\). There is a closely related result in the setting of a positive semigroup. It is the abstract version of the result in the standard continuous setting that states that the minimum of independent exponential variables is also exponential.

Suppose that \((S, \cdot)\) is a positive semigroup whose associated partial order graph \((S, \preceq)\) is a lower semi-lattice. Suppose also that random variables \(X\) and \(Y\) in \(S\) are right independent and memoryless with reliability functions \(F\) and \(G\) respectively. Then \(X \wedge Y\) is memoryless and has reliability function \(F G\).

Details:

The proof is trivial. \[\P(X \wedge Y \succeq x) = \P(X \succeq x, Y \succeq x) = \P(X \succeq x) \P(Y \succeq x) = F(x) G(x), \quad x \in S \] As above, \(F G\) is multiplicative.

We can reinterpret the first proposition of this section in the setting of the previous proposition: \(Y \in X S\) if and only if \(X \preceq Y\) if and only if \(X \wedge Y = X\). Also, as in the first proposition, if either \(X\) or \(Y\) is exponential then so is \(X \wedge Y\).

Suppose that \(X\) and \(Y\) are independent random variables in \(S\), and that the distribution of \(Y\) is exponential on \((S, \cdot)\) with reliability function \(G\).

The variables \(X\) and \(X^{-1} Y\) are conditionally independent given \(Y \in X S\).
The conditional distribution of \(X^{-1} Y\) given \(Y \in X S\) is the same as the distribution of \(Y\).
The conditional distribution of \(X\) given \(Y \in X S\) is defined by \[\P(X \in A \mid Y \in X S) = \frac{\E[G(X), X \in A]}{\E[G(X)]}, \quad A \in \ms S\]

Details:

As in the proof of the first proposition of this section, let \[\frac{1}{\beta} = \P(Y \in X S) = \E[\P(Y \in X S \mid X)] = \E[G(X)]\] Let \(A, \, B \in \ms S\). Using the exponential property of \(Y\), \begin{align*} \P(X \in A, X^{-1} Y \in B \mid Y \in X S) &= \frac{\P(X \in A, X^{-1} Y \in B, Y \in X S)}{\P(Y \in X S)} \\ &= \beta \P(X \in A, Y \in X B) \\ &= \beta \E[\P(X \in A, Y \in X B \mid X)] \\ &= \beta \E[\P(Y \in X B \mid X), X \in A] \\ &= \beta \E[\P(Y \in X S \mid X) \P(Y \in B \mid X), X \in A]\\ &= \beta \E[G(X) \P(Y \in B), X \in A] = \beta \P(Y \in B) \E[G(X), X \in A] \end{align*} It follows that \(X\) and \(X^{-1}Y\) are conditionally independent given \(Y \in X S\). Letting \(A = S\), \[\P(X^{-1} Y \in B \mid Y \in X S) = \P(Y \in B), \quad B \in \ms S\] Letting \(B = S\), \[\P(X \in A \mid Y \in X S) = \frac{\E[G(X), X \in A]}{\E[G(X)]}, \quad A \in \ms S\]

Suppose that \(X\) and \(Y\) are independent random variables, each with an exponential distribution on \((S, \cdot)\), and with reliability functions \(F\) and \(G\).

The variables \(X\) and \(X^{-1} Y\) are conditionally independent given \(Y \in X S\).
The conditional distribution of \(X^{-1} Y\) given \(Y \in X S\) is the same as the distribution of \(Y\).
The conditional distribution of \(X\) given \(Y \in X S\) is exponential with reliability function \(F G\).

Suppose now that \((T, \cdot)\) is a measurable sub-semigroup of \((S, \cdot)\). Recall that the underlying measurable space for \((T, \cdot)\) is \((T, \ms T)\) where \(\ms T = \{A \in \ms S: A \subseteq T\}\).

Suppose that \(X\) has an exponential distribution on \((S, \cdot)\) with reliability function \(F\) and that \(\P(X \in T) \gt 0\). Then

The reliability function of \(X\) given \(X \in T\) on \((T, \cdot)\) is the restriction of \(F\) to \(T\).
The distribution of \(X\) given \(X \in T\) is exponential on \((T, \cdot)\).
If \(X\) has constant rate \(\alpha \in (0, \infty)\) on \((S, \cdot)\) with respect to a left-invariant measure \(\lambda\), then the distribution of \(X\) given \(X \in T\) has constant rate \(\alpha / \P(X \in T)\) on \((T, \cdot)\) with respect to the restriction of \(\lambda\) to \(\ms T\).

Details:

Let \(F_T\) denote the reliability function of \(X\) given \(X \in T\) relative to \((T, \cdot)\). That is, \[F_T(x) = \P(X \in x T \mid X \in T), \quad x \in T\] Since \(x T \subseteq T\) and by the exponential property of \(X\) we have \[F_T(x) = \frac{\P(X \in x T)}{\P(X \in T)} = \frac{\P(X \in x S) \P(X \in T)}{\P(X \in T)} = \P(X \in x S) = F(x), \quad x \in T\]
Let \(A \in \ms T\) and \(x \in T\). Again since \(x A \subseteq T\) and by the exponential property of \(X\) we have \[\P(X \in x A \mid X \in T) = \frac{\P(X \in x A)}{\P(X \in T)} = \frac{\P(X \in x S) \P(X \in A)}{\P(X \in T)} = F_T(x) \P(X \in A \mid X \in T), \quad x \in T\]
Let \(f = \alpha F\) on \(S\) so that \(f\) is a density function of \(X\). The density function \(f_T\) of \(X\) given \(X \in T\) is \(f_T(x) = f(x) / \P(X \in T)\) for \(x \in T\). Hence \[f_T(x) = \frac{f(x)}{\P(X \in T)} = \frac{\alpha F(x)}{\P(X \in T)} = \frac{\alpha}{\P(X \in T)} F_T(x), \quad x \in T\]
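The three parts of this proposition can be checked numerically in a concrete case. The following is a minimal sketch, not part of the general theory: it assumes that \((S, \cdot)\) is \((\N, +)\) with counting measure, that \(X\) is geometric with a hypothetical success parameter `p = 0.3`, and that \(T\) is the sub-semigroup of even numbers. The infinite sums are truncated at a hypothetical cutoff `N`, and the helper names are illustrative only.

```python
p = 0.3          # hypothetical success parameter: the constant rate of X on (N, +)
N = 400          # truncation point; the neglected tail mass (1 - p)**N is negligible

def f(x):
    """Density of X with respect to counting measure on N."""
    return p * (1 - p) ** x

# P(X in T), where T is the set of even numbers (truncated at N)
prob_T = sum(f(y) for y in range(0, N, 2))

def F_T(x):
    """P(X in x + T | X in T) for even x: reliability on the sub-semigroup."""
    return sum(f(y) for y in range(x, N, 2)) / prob_T

def rate_T(x):
    """Conditional density divided by conditional reliability, for even x."""
    return (f(x) / prob_T) / F_T(x)

for x in (0, 2, 4, 10):
    print(x, F_T(x), (1 - p) ** x, rate_T(x), p / prob_T)
```

Under these assumptions, `F_T(x)` agrees with the restriction \((1 - p)^x\) of \(F\) to \(T\), and `rate_T(x)` is constant and equal to `p / prob_T`, which here is \(p (2 - p)\) since \(\P(X \in T) = 1 / (2 - p)\).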

We will extend this result in Section 8 on quotient spaces. Suppose now that \(\lambda\) is a fixed left-invariant measure for \((S, \cdot)\), so that the restriction of \(\lambda\) to \(\ms T\) is left invariant for \((T, \cdot)\). The following result revisits the previous proposition, but with only the constant rate property.

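Suppose that \(X\) is supported by \((T, \cdot)\) and has constant rate \(\alpha \in (0, \infty)\) on \((S, \cdot)\). Let \(F\) denote the reliability function of \(X\) on \((S, \cdot)\) and \(F_T\) the reliability function of \(X\) given \(X \in T\) on \((T, \cdot)\). Then the conditional distribution of \(X\) given \(X \in T\) has rate function \(r_T\) on \((T, \cdot)\) defined by \[r_T(x) = \frac{\alpha}{\P(X \in T)} \frac{F(x)}{F_T(x)} = \alpha \frac{\P(X \in x S)}{\P(X \in x T)}, \quad x \in T\]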

Details:

Let \(F\) denote the reliability function of \(X\) on \((S, \cdot)\) so that \(F(x) = \P(X \in x S)\) for \(x \in S\). By assumption, \(f = \alpha F\) is a density of \(X\). The reliability function \(F_T\) of \(X\) given \(X \in T\) on \((T, \cdot)\) is defined by \[F_T(x) = \P(X \in x T \mid X \in T) = \frac{\P(X \in x T)}{\P(X \in T)}, \quad x \in T\] The density function \(f_T\) of \(X\) given \(X \in T\) is defined by \[f_T(x) = \frac{f(x)}{\P(X \in T)} = \alpha \frac{F(x)}{\P(X \in T)}\] Hence the rate function \(r_T\) of \(X\) given \(X \in T\) on \((T, \cdot)\) is defined by \[r_T(x) = \frac{f_T(x)}{F_T(x)} = \alpha \frac{\P(X \in x S)}{\P(X \in x T)}, \quad x \in T\]

So in general, the conditional distribution of \(X\) given \(X \in T\) does not have constant rate on \((T, \cdot)\), but there is an important exception. For the corollary and proposition that follow, suppose that \((S, \cdot)\) is a discrete positive semigroup with identity element \(e\), and let \(S_+ = \{x \in S: x \ne e\}\). By definition, \((S_+, \cdot)\) is a sub-semigroup of \((S, \cdot)\). The relation \(\preceq\) associated with \((S, \cdot)\) is a partial order, and the relation \(\prec\) associated with \((S_+, \cdot)\) is the corresponding strict partial order restricted to \(S_+\). Of course, counting measure \(\#\) is left invariant for \((S, \cdot)\), and is the unique such measure, up to multiplication by positive constants.

Suppose that \(X\) is a random variable in \(S\).

If \(X\) has an exponential distribution for \((S, \cdot)\) then the distribution of \(X\) given \(X \in S_+\) is exponential for \((S_+, \cdot)\).
If \(X\) has constant rate \(\alpha \in (0, 1)\) for \((S, \preceq)\) then the conditional distribution of \(X\) given \(X \in S_+\) has constant rate \(\beta = \alpha / (1 - \alpha)\) for \((S_+, \prec)\).

Details:

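This follows directly from the result above for an exponential distribution conditioned on a sub-semigroup.
Let \(f\) denote the probability density function of \(X\) and let \(F\) denote the reliability function of \(X\) for \((S, \preceq)\). Note that \(\P(X \in x S_+) = \P(X \succ x) = F(x) - f(x)\) for \(x \in S_+\). From the rate function formula for a sub-semigroup given above and the constant rate property, the rate function \(r_+\) of \(X\) given \(X \in S_+\) for \((S_+, \prec)\) is given by \[r_+(x) = \alpha \frac{F(x)}{F(x) - f(x)} = \alpha \frac{F(x)}{F(x) - \alpha F(x)} = \frac{\alpha}{1 - \alpha}, \quad x \in S_+\]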

Suppose that random variable \(Y\) in \(S_+\) has constant rate \(\beta \in (0, \infty)\) for \((S_+, \prec)\). Let \(\alpha = \beta / (1 + \beta)\) and define \(X\) on \(S\) by setting \(X = e\) with probability \(\alpha\) and \(X = Y\) with probability \(1 - \alpha\), where the choice is made independently of \(Y\). Then \(X\) has constant rate \(\alpha\) on \((S, \preceq)\). Moreover, if \(Y\) has an exponential distribution on \((S_+, \cdot)\) then \(X\) has an exponential distribution on \((S, \cdot)\).

Details:

Let \(f\) and \(g\) denote the probability density functions of \(X\) and \(Y\) respectively. By assumption, \(f(e) = \alpha\) and \(f(x) = (1 - \alpha) g(x)\) for \(x \in S_+\). Let \(F\) and \(G\) denote the reliability functions of \(X\) and \(Y\) for \((S, \preceq)\) and \((S_+, \prec)\) respectively. Then \(F(e) = 1\) since \(e \preceq y\) for every \(y \in S\) and, by the constant rate property of \(Y\) and the definition of \(\alpha\), \begin{align*} F(x) &= \sum_{x \preceq y} f(y) = \sum_{x \preceq y}(1 - \alpha) g(y) = (1 - \alpha) g(x) + \sum_{x \prec y} (1 - \alpha) g(y) \\ &= (1 - \alpha) \beta G(x) + (1 - \alpha) G(x) = (1 - \alpha)(1 + \beta) G(x) = G(x), \quad x \in S_+ \end{align*} But also, \(f(e) = \alpha = \alpha F(e)\) and \[f(x) = (1 - \alpha) g(x) = (1 - \alpha) \beta G(x) = \alpha F(x), \quad x \in S_+\] Hence \(X\) has constant rate \(\alpha\) for \((S, \preceq)\). Moreover, if \(Y\) has an exponential distribution for \((S_+, \cdot)\) then \(Y\) is memoryless. Hence \[F(x y) = G(x y) = G(x) G(y) = F(x) F(y), \quad x, \, y \in S_+\] Trivially this also holds if \(x = e\) or \(y = e\), since \(F(e) = 1\). Hence \(X\) is also memoryless, and since \(X\) has constant rate as well, \(X\) has an exponential distribution on \((S, \cdot)\).
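The construction in this proof can be traced with exact arithmetic in the simplest case. The sketch below assumes that \((S, \cdot)\) is \((\N, +)\), so that \(S_+ = \N_+\) with the strict order \(\lt\), and it takes a hypothetical rate `beta = 3/2`. The density `g` is the geometric density on \(\N_+\), chosen so that it has constant rate `beta` for \((\N_+, \lt)\). All function names are illustrative.

```python
from fractions import Fraction

beta = Fraction(3, 2)            # hypothetical rate of Y for (N_+, <)
alpha = beta / (1 + beta)        # rate of X claimed by the proposition

def g(x):
    """Density of Y on N_+ (geometric), chosen so that g(x) = beta * P(Y > x)."""
    return alpha * (1 - alpha) ** (x - 1)

def G(x):
    """Reliability of Y for (N_+, <): P(Y > x) = 1 - P(Y <= x)."""
    return 1 - sum(g(y) for y in range(1, x + 1))

def f(x):
    """Density of X on N: mass alpha at 0, and (1 - alpha) * g(x) otherwise."""
    return alpha if x == 0 else (1 - alpha) * g(x)

def F(x):
    """Reliability of X for (N, <=): P(X >= x) = 1 - P(X < x)."""
    return 1 - sum(f(y) for y in range(x))

for x in range(1, 6):
    print(x, g(x) == beta * G(x), f(x) == alpha * F(x), F(x) == G(x))
```

With these choices `alpha` is \(3/5\), and the checks confirm that \(g = \beta G\) on \(\N_+\), that \(F = G\) on \(\N_+\), and that \(f = \alpha F\) on \(\N\), so \(X\) is geometric on \(\N\) with success parameter \(\alpha\).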

Clearly the proposition is a converse to the corollary, since the conditional distribution of \(X\) given \(X \in S_+\) is the distribution of \(Y\).

Examples

Recall that the standard continuous semigroup is \(([0, \infty), +)\), with associated total order \(\le\), where \([0, \infty)\) is given the usual Borel \(\sigma\)-algebra with Lebesgue measure \(\lambda\) as the reference measure. For this semigroup, the basic results above are standard and well known. An exponential distribution for this semigroup is a standard exponential distribution on \([0, \infty)\), with reliability function of the form \(x \mapsto e^{-\alpha x}\) where \(\alpha \in (0, \infty)\) is the constant rate.

Suppose that \(X, \, Y\) are independent, exponential variables in \([0, \infty)\), with rate parameters \(\alpha, \, \beta \in (0, \infty)\), respectively. Then the distribution of \(X \wedge Y\) and the conditional distribution of \(X\) given \(X \le Y\) are the same, namely exponential with rate \(\alpha + \beta\).
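As a rough numerical illustration of the conditional part of this result, the sketch below approximates \(\P(X \ge x \mid X \le Y)\) by quadrature and compares it with \(e^{-(\alpha + \beta) x}\). The rate values, the truncation point `UPPER`, and the midpoint-rule helper `integrate` are all assumptions made for the example.

```python
import math

alpha, beta = 1.5, 0.5       # hypothetical rate parameters of X and Y
UPPER = 40.0                 # truncation point; the neglected tail is negligible

def integrate(h, a, b, n=20_000):
    """Midpoint rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def joint_density(x):
    """Density of X at x times P(Y >= x): the density of X on the event X <= Y."""
    return alpha * math.exp(-alpha * x) * math.exp(-beta * x)

# P(X <= Y), which should be close to alpha / (alpha + beta)
prob_x_le_y = integrate(joint_density, 0.0, UPPER)

def conditional_reliability(x):
    """Numerical approximation of P(X >= x | X <= Y)."""
    return integrate(joint_density, x, UPPER) / prob_x_le_y

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, conditional_reliability(x), math.exp(-(alpha + beta) * x))
```

The two printed columns should agree up to the quadrature error, which is consistent with the conditional distribution being exponential with rate \(\alpha + \beta\).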

Suppose that \(X, \, Y\) are independent variables in \([0, \infty)\) and that \(X\) has density function \(f\) and \(Y\) is exponential with rate \(\beta \in (0, \infty)\).

\(X\) and \(Y - X\) are conditionally independent given \(X \le Y\).
The conditional distribution of \(Y - X\) given \(X \le Y\) is also exponential with rate \(\beta\).
The conditional distribution of \(X\) given \(X \le Y\) has density function \(f_1\) defined by \(f_1(x) = f(x) e^{-\beta x} / C\), where \(C = \int_0^\infty f(t) e^{-\beta t} dt\) is the normalizing constant.
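A small simulation gives a sanity check of the three parts above. It is only a sketch under assumed inputs: \(X\) is taken to be uniform on \([0, 1]\) (so \(f = 1\) on that interval), \(Y\) has a hypothetical rate `beta = 2.0`, and the seed and sample size are arbitrary. Zero sample covariance is of course only a necessary condition for conditional independence, not a proof of it.

```python
import math
import random

random.seed(12345)            # fixed seed so that the run is reproducible
beta = 2.0                    # hypothetical rate of Y
n = 200_000                   # number of simulated pairs

xs, ds = [], []
for _ in range(n):
    x = random.random()               # assumed distribution of X: uniform on [0, 1]
    y = random.expovariate(beta)      # Y exponential with rate beta
    if x <= y:                        # keep only the pairs with X <= Y
        xs.append(x)
        ds.append(y - x)

m = len(xs)
mean_x = sum(xs) / m                  # estimates E(X | X <= Y)
mean_d = sum(ds) / m                  # estimates E(Y - X | X <= Y)
cov_xd = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds)) / m

# Values predicted by the proposition for this choice of f
C = (1 - math.exp(-beta)) / beta                                   # normalizing constant
theory_mean_x = (1 - (1 + beta) * math.exp(-beta)) / beta**2 / C   # mean of density f_1
theory_mean_d = 1 / beta                                           # mean of exponential(beta)

print(m / n, C)
print(mean_x, theory_mean_x)
print(mean_d, theory_mean_d)
print(cov_xd)
```

The acceptance fraction `m / n` estimates the normalizing constant \(C\), the sample means should be close to the predicted values, and the sample covariance should be close to 0.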

The standard continuous space, and related spaces, are studied in Chapter 3. Recall next that the standard discrete space is \((\N, +)\) with associated total order \(\le\). Of course \(\N\) is given the \(\sigma\)-algebra \(\ms P(\N)\) of all subsets and counting measure \(\#\) is the reference measure. For this space, as with the standard continuous space, the basic results above are standard and well known. An exponential distribution for \((\N, +)\) is a standard geometric distribution on \(\N\). The constant rate \(p \in (0, 1)\) is the success parameter so that the reliability function is \(x \mapsto (1 - p)^x\).

Suppose that \(X, \, Y\) are independent, geometric variables in \(\N\), with success parameters \(p, \, q \in (0, 1)\), respectively. Then the distribution of \(X \wedge Y\) and the conditional distribution of \(X\) given \(X \le Y\) are the same, namely geometric with success parameter \(p + q - p q\).
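This result can be checked with exact rational arithmetic. The sketch below uses hypothetical parameters `p = 1/4` and `q = 1/3`, and the helper names `density` and `reliability` are illustrative. To avoid an infinite sum, the conditional distribution is checked up to its normalizing constant: the quantity \(\P(X = x, X \le Y)\) is compared with \(\P(X = 0, X \le Y)\).

```python
from fractions import Fraction

p, q = Fraction(1, 4), Fraction(1, 3)   # hypothetical success parameters
r = p + q - p * q                        # parameter claimed by the result

def density(s):
    """Geometric density on N with success parameter s."""
    return lambda x: s * (1 - s) ** x

def reliability(h):
    """P(Z >= x) = 1 - P(Z < x) for a variable Z on N with density h."""
    return lambda x: 1 - sum(h(y) for y in range(x))

f, g = density(p), density(q)
F, G = reliability(f), reliability(g)

def min_reliability(x):
    """P(min(X, Y) >= x) = P(X >= x) P(Y >= x), by independence."""
    return F(x) * G(x)

def joint(x):
    """P(X = x, X <= Y) = f(x) P(Y >= x), by independence."""
    return f(x) * G(x)

for x in range(6):
    print(x, min_reliability(x), joint(x) / joint(0), (1 - r) ** x)
```

Both `min_reliability(x)` and the ratio `joint(x) / joint(0)` equal \((1 - r)^x\), so the minimum has the geometric reliability function with parameter \(r\), and the conditional density of \(X\) given \(X \le Y\) is proportional to \((1 - r)^x\), hence geometric with the same parameter after normalizing.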

Suppose that \(X, \, Y\) are independent variables in \(\N\) and that \(X\) has density function \(f\) and \(Y\) is geometric with success parameter \(q \in (0, 1)\).

\(X\) and \(Y - X\) are conditionally independent given \(X \le Y\).
The conditional distribution of \(Y - X\) given \(X \le Y\) is also geometric with success parameter \(q\).
The conditional distribution of \(X\) given \(X \le Y\) has density function \(f_1\) defined by \(f_1(x) = f(x) (1 - q)^x / C\), where \(C = \sum_{t = 0}^\infty f(t) (1 - q)^t\) is the normalizing constant.
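The three parts above can be verified exactly for a specific choice of \(f\). The sketch below assumes, purely for illustration, that \(X\) is uniform on \(\{0, 1, 2, 3\}\) and that \(Y\) has success parameter `q = 1/3`. It uses the standard fact that \(\P(Y \ge x) = (1 - q)^x\) for the geometric distribution on \(\N\), and the function names are hypothetical.

```python
from fractions import Fraction

q = Fraction(1, 3)                          # hypothetical success parameter of Y
f = {x: Fraction(1, 4) for x in range(4)}   # assumed density of X: uniform on {0, 1, 2, 3}

def g(y):
    """Geometric density of Y on N."""
    return q * (1 - q) ** y

# Normalizing constant C = P(X <= Y) = sum over x of f(x) P(Y >= x)
C = sum(fx * (1 - q) ** x for x, fx in f.items())

def joint(x, k):
    """P(X = x, Y - X = k | X <= Y), using the independence of X and Y."""
    return f[x] * g(x + k) / C

def marginal_x(x):
    """The density f_1 from part 3: f(x) (1 - q)^x / C."""
    return f[x] * (1 - q) ** x / C

def marginal_diff(k):
    """P(Y - X = k | X <= Y), obtained by summing the joint over x."""
    return sum(joint(x, k) for x in f)

for k in range(4):
    print(k, marginal_diff(k), g(k))
```

Here `marginal_diff(k)` equals `g(k)`, so \(Y - X\) given \(X \le Y\) is again geometric with parameter \(q\), and `joint(x, k)` factors as `marginal_x(x) * marginal_diff(k)`, which is the conditional independence in part 1.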

Suppose that \(X\) has the geometric distribution on \(\N\) with success parameter \(p \in (0, 1)\). Then the conditional distribution of \(X\) given \(X \in \N_+\) is geometric on \(\N_+\) with success parameter \(p\), and has constant rate \(p / (1 - p)\) for \((\N_+, \lt)\).
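The value \(p / (1 - p)\) is the constant rate of the conditional distribution relative to the strict order on \(\N_+\), which the following exact computation illustrates. It is a sketch with a hypothetical parameter `p = 1/3`, and the names `f_plus`, `F_plus`, and `rate_plus` are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 3)               # hypothetical success parameter of X on N

def f(x):
    """Geometric density of X on N."""
    return p * (1 - p) ** x

prob_pos = 1 - f(0)              # P(X in N_+)

def f_plus(x):
    """Conditional density of X given X in N_+, for x >= 1."""
    return f(x) / prob_pos

def F_plus(x):
    """Reliability for (N_+, <): P(X > x | X >= 1) = 1 - P(X <= x | X >= 1)."""
    return 1 - sum(f_plus(y) for y in range(1, x + 1))

def rate_plus(x):
    """Rate function of the conditional distribution for (N_+, <)."""
    return f_plus(x) / F_plus(x)

for x in range(1, 6):
    print(x, f_plus(x), rate_plus(x))
```

For this choice of `p`, the conditional density is \(p (1 - p)^{x - 1}\) and `rate_plus(x)` is \(1/2 = p / (1 - p)\) for every \(x \in \N_+\).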

The standard discrete space, and related spaces, are studied in more detail in Chapter 4. In addition, the basic results in this section are explored for the free semigroup in Chapter 5 and the arithmetic semigroups in Chapter 6.
