Two of the most important modes of convergence in probability theory are convergence with probability 1 and convergence in mean. As we have noted several times, neither mode of convergence implies the other. However, if we impose an additional condition on the sequence of variables, convergence in probability (and hence also convergence with probability 1) implies convergence in mean. The purpose of this brief section is to explore the additional condition that is needed. This section is particularly important for the theory of martingales.
As usual, our starting point is a random experiment modeled by a probability space \( (\Omega, \mathscr{F}, \P) \), so that \( \Omega \) is the set of outcomes, \( \mathscr{F} \) is the \( \sigma \)-algebra of events, and \( \P \) is the probability measure on the sample space \( (\Omega, \mathscr F) \). In this section, all random variables that are mentioned are assumed to be real valued, unless otherwise noted. Next, recall that for \( k \in [1, \infty) \), \( \mathscr{L}_k \) is the vector space of random variables \( X \) with \( \E(|X|^k) \lt \infty \), endowed with the norm \( \|X\|_k = \left[\E(|X|^k)\right]^{1/k} \). In particular, \( X \in \mathscr{L}_1 \) simply means that \( \E(|X|) \lt \infty \) so that \( \E(X) \) exists as a real number. Finally, recall the following notation for the expected value of a random variable \(X\) over an event \(A\), assuming of course that the expected value makes sense: \[ \E(X; A) = \E(X \bs{1}_A) = \int_A X \, d\P \]
The following result is motivation for the main definition in this section.
If \( X \) is a random variable then \( \E(|X|) \lt \infty \) if and only if \( \E(|X|; |X| \gt x) \to 0 \) as \( x \to \infty \).
Note that \( |X| \bs{1}(|X| \le x) \) is nonnegative, increasing in \( x \in [0, \infty) \), and \( |X| \bs{1}(|X| \le x) \to |X| \) as \( x \to \infty \). From the monotone convergence theorem, \( \E(|X|; |X| \le x) \to \E(|X|) \) as \( x \to \infty \). On the other hand, \[ \E(|X|) = \E(|X|; |X| \le x) + \E(|X|; |X| \gt x) \] If \( \E(|X|) \lt \infty \) then taking limits in the displayed equation shows that \( \E(|X|; |X| \gt x) \to 0 \) as \( x \to \infty \). On the other hand, \( \E(|X|; |X| \le x) \le x \). So if \( \E(|X|) = \infty \) then \( \E(|X|; |X| \gt x) = \infty \) for every \( x \in [0, \infty) \).
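For a concrete illustration (our example, not part of the argument above), suppose that \( X \) has the standard exponential distribution. Then \( \E(X) = 1 \lt \infty \), and indeed \[ \E(X; X \gt x) = \int_x^\infty t e^{-t} \, dt = (x + 1) e^{-x} \to 0 \text{ as } x \to \infty \]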
Suppose now that \( X_i \) is a random variable for each \( i \) in a nonempty index set \( I \) (not necessarily countable). The critical definition for this section is to require the convergence in the previous theorem to hold uniformly for the collection of random variables \( \bs X = \{X_i: i \in I\} \).
The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if for each \( \epsilon \gt 0 \) there exists \( x \gt 0 \) such that for all \( i \in I \), \[ \E(|X_i|; |X_i| \gt x) \lt \epsilon \] Equivalently \( \E(|X_i|; |X_i| \gt x) \to 0 \) as \( x \to \infty \) uniformly in \( i \in I \).
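As an illustrative numerical sketch (ours; the family and all names are hypothetical), the defining condition can be estimated by Monte Carlo for a simple parametric family. Here the family consists of exponential variables \( X_t \) with rate \( t \ge 1 \); exact computation gives \( \E(X_t; X_t \gt x) = (x + 1/t) e^{-t x} \le (x + 1) e^{-x} \), a bound free of \( t \), so the family is uniformly integrable and the estimated tail expectations shrink together.

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_expectation(sample, x):
    """Monte Carlo estimate of E(|X|; |X| > x) from a sample of X."""
    a = np.abs(sample)
    return np.mean(np.where(a > x, a, 0.0))

# Hypothetical family: X_t exponential with rate t >= 1.  The tail bound
# (x + 1) e^{-x} holds for every rate in the family, so the estimated
# tail expectations converge to 0 uniformly over the grid of rates.
rates = [1.0, 2.0, 5.0, 10.0]
samples = {t: rng.exponential(scale=1.0 / t, size=1_000_000) for t in rates}
for x in [1.0, 2.0, 4.0, 8.0]:
    worst = max(tail_expectation(samples[t], x) for t in rates)
    print(f"x = {x}: sup over t of E(X_t; X_t > x) is about {worst:.5f}")
```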
Our next discussion centers on conditions that ensure that the collection of random variables \( \bs X = \{X_i: i \in I\} \) is uniformly integrable. Here is an equivalent characterization:
The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if and only if the following conditions hold:
(a) \( \E(|X_i|) \) is bounded in \( i \in I \).
(b) For every \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \).
Suppose that \( \bs X \) is uniformly integrable. With \( \epsilon = 1 \) there exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt 1 \) for all \( i \in I \). Hence \[ \E(|X_i|) = \E(|X_i|; |X_i| \le x) + \E(|X_i|; |X_i| \gt x) \lt x + 1, \quad i \in I \] so (a) holds. For (b), let \( \epsilon \gt 0 \). There exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 \) for all \( i \in I \). Let \( \delta = \epsilon / 2 x \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[ \E(|X_i|; A) = \E(|X_i|; A \cap \{|X_i| \le x\}) + \E(|X_i|; A \cap \{|X_i| \gt x\}) \le x \P(A) + \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 + \epsilon / 2 = \epsilon\] Conversely, suppose that (a) and (b) hold. By (a), there exists \( c \gt 0 \) such that \( \E(|X_i|) \le c \) for all \( i \in I \). Let \( \epsilon \gt 0 \). By (b) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \). Next, by Markov's inequality, \[ \P(|X_i| \gt x) \le \frac{\E(|X_i|)}{x} \le \frac{c}{x}, \quad i \in I \] Pick \( x \gt 0 \) such that \( c / x \lt \delta \), so that \(\P(|X_i| \gt x) \lt \delta\) for each \( i \in I \). Then for each \( j \in I \), \( \E(|X_i|; |X_j| \gt x) \lt \epsilon \) for all \( i \in I \), and so in particular, \( \E(|X_i|; |X_i| \gt x) \lt \epsilon \) for all \( i \in I \). Hence \( \bs X \) is uniformly integrable.
Condition (a) means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_1 \). Trivially, a finite collection of absolutely integrable random variables is uniformly integrable.
Suppose that \( I \) is finite and that \( \E(|X_i|) \lt \infty \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.
A subset of a uniformly integrable set of variables is also uniformly integrable.
If \( \{X_i: i \in I\} \) is uniformly integrable and \( J \) is a nonempty subset of \( I \), then \( \{X_j: j \in J\} \) is uniformly integrable.
If the random variables in the collection are dominated in absolute value by a random variable with finite mean, then the collection is uniformly integrable.
Suppose that \( Y \) is a nonnegative random variable with \( \E(Y) \lt \infty \) and that \( |X_i| \le Y \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable. Indeed, \( |X_i| \le Y \) implies \( \{|X_i| \gt x\} \subseteq \{Y \gt x\} \), so \( \E(|X_i|; |X_i| \gt x) \le \E(Y; Y \gt x) \); the last expression does not depend on \( i \in I \) and converges to 0 as \( x \to \infty \) by the motivating result above.
The following result is more general, but essentially the same proof works.
Suppose that \( \bs Y = \{Y_j: j \in J\} \) is uniformly integrable, and \( \bs X = \{X_i: i \in I\} \) is a set of variables with the property that for each \( i \in I \) there exists \( j \in J \) such that \( |X_i| \le |Y_j| \). Then \( \bs X \) is uniformly integrable.
As a simple corollary, if the variables are bounded in absolute value then the collection is uniformly integrable.
If there exists \( c \gt 0 \) such that \( |X_i| \le c \) for all \( i \in I \) then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.
Just having \( \E(|X_i|) \) bounded in \( i \in I \) (condition (a) in the characterization above) is not sufficient for \( \bs X = \{X_i: i \in I\} \) to be uniformly integrable; the example below is a counterexample. However, if \( \E\left(|X_i|^k\right) \) is bounded in \( i \in I \) for some \( k \gt 1 \), then \( \bs X \) is uniformly integrable. This condition means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_k \).
If \( \left\{\E\left(|X_i|^k\right): i \in I\right\} \) is bounded for some \( k \gt 1 \), then \( \{X_i: i \in I\} \) is uniformly integrable.
Suppose that for some \( k \gt 1 \) and \( c \gt 0 \), \( \E\left(|X_i|^k\right) \le c \) for all \( i \in I \). Then \( k - 1 \gt 0 \) and so \( t \mapsto t^{k-1} \) is increasing on \( (0, \infty) \). So if \( |X_i| \gt x \) for \( x \gt 0 \) then \[ |X_i|^k = |X_i| |X_i|^{k-1} \ge |X_i| x^{k-1} \] Hence \( |X_i| \le |X_i|^k / x^{k-1} \) on the event \( |X_i| \gt x \). Therefore \[ \E(|X_i|; |X_i| \gt x) \le \E\left(\frac{|X_i|^k}{x^{k-1}}; |X_i| \gt x\right) \le \frac{\E(|X_i|^k)}{x^{k-1}} \le \frac{c}{x^{k-1}} \] The last expression is independent of \( i \in I \) and converges to 0 as \( x \to \infty \). Hence \( \bs X \) is uniformly integrable.
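In particular, taking \( k = 2 \): if the second moments are bounded, say \( \E(X_i^2) \le c \) for all \( i \in I \), then \[ \E(|X_i|; |X_i| \gt x) \le \frac{c}{x}, \quad i \in I \] so a family with bounded means and bounded variances is uniformly integrable.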
Uniform integrability is closed under the operations of addition and scalar multiplication.
Suppose that \( \bs X = \{X_i: i \in I\} \) and \( \bs Y = \{Y_i: i \in I\} \) are uniformly integrable and that \( c \in \R \). Then each of the following collections is also uniformly integrable:
(a) \( \{X_i + Y_i: i \in I\} \)
(b) \( \{c X_i: i \in I\} \)
We use the characterization above. There exists \( C \gt 0 \) such that \( \E(|X_i|) \le C \) and \( \E(|Y_i|) \le C \) for all \( i \in I \), so by the triangle inequality, \( \E(|X_i + Y_i|) \le 2 C \) and \( \E(|c X_i|) = |c| \E(|X_i|) \le |c| C \) for all \( i \in I \). Next, let \( \epsilon \gt 0 \). There exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon / 2 \) and \( \E(|Y_i|; A) \lt \epsilon / 2 \) for all \( i \in I \). For such \( A \), \( \E(|X_i + Y_i|; A) \le \E(|X_i|; A) + \E(|Y_i|; A) \lt \epsilon \) for all \( i \in I \). The same argument, with \( \epsilon / 2 \) replaced by \( \epsilon / (|c| + 1) \), handles \( \{c X_i: i \in I\} \).
The following corollary is trivial, but will be needed in our discussion of convergence below.
Suppose that \( \{X_i: i \in I\} \) is uniformly integrable and that \( X \) is a random variable with \( \E(|X|) \lt \infty \). Then \( \{X_i - X: i \in I\} \) is uniformly integrable.
We now come to the main results, and the reason for the definition of uniform integrability in the first place. To set up the notation, suppose that \( X_n \) is a random variable for \( n \in \N_+ \) and that \( X \) is a random variable. We know that if \( X_n \to X \) as \( n \to \infty \) in mean then \( X_n \to X \) as \( n \to \infty \) in probability. The converse is also true if and only if the sequence is uniformly integrable. Here is the first half:
If \( X_n \to X \) as \( n \to \infty \) in mean, then \( \{X_n: n \in \N_+\} \) is uniformly integrable.
The hypothesis means that \( X_n \to X \) as \( n \to \infty \) in the vector space \( \mathscr{L}_1 \). That is, \( \E(|X_n|) \lt \infty \) for \( n \in \N_+ \), \( \E(|X|) \lt \infty \), and \( \E(|X_n - X|) \to 0 \) as \( n \to \infty \). From the section on vector spaces of random variables, we know that this implies that \( \E(|X_n|) \to \E(|X|) \) as \( n \to \infty \), so \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Let \( \epsilon \gt 0 \). Then there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \E(|X_n - X|) \lt \epsilon/2 \). Since all of our variables are in \( \mathscr{L}_1 \), for each \( n \in \N_+ \) there exists \( \delta_n \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_n \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \). Similarly, there exists \( \delta_0 \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_0 \) then \( \E(|X|; A) \lt \epsilon / 2 \). Let \( \delta = \min\{\delta_n: n \in \{0, 1, \ldots, N\}\} \) so \( \delta \gt 0 \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[\E(|X_n|; A) = \E(|X_n - X + X|; A) \le \E(|X_n - X|; A) + \E(|X|; A), \quad n \in \N_+\] If \( n \le N \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_n \). If \( n \gt N \) then \( \E(|X_n - X|; A) \le \E(|X_n - X|) \lt \epsilon / 2 \). For all \( n \), \( \E(|X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_0 \). So for all \( n \in \N_+ \), \( \E(|X_n|; A) \lt \epsilon \) and hence \( \{X_n: n \in \N_+\} \) is uniformly integrable.
Here is the more important half, known as the uniform integrability theorem:
If \( \{X_n: n \in \N_+\} \) is uniformly integrable and \( X_n \to X \) as \( n \to \infty \) in probability, then \( X_n \to X \) as \( n \to \infty \) in mean.
Since \( X_n \to X \) as \( n \to \infty \) in probability, we know that there exists a subsequence \( \left(X_{n_k}: k \in \N_+\right) \) of \( (X_n: n \in \N_+) \) such that \( X_{n_k} \to X \) as \( k \to \infty \) with probability 1. By uniform integrability, \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Hence by Fatou's lemma, \[ \E(|X|) = \E\left(\liminf_{k \to \infty} \left|X_{n_k}\right|\right) \le \liminf_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \le \limsup_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \lt \infty \] Let \( Y_n = X_n - X \) for \( n \in \N_+ \). From the corollary above, we know that \( \{Y_n: n \in \N_+\} \) is uniformly integrable, and we also know that \( Y_n \to 0 \) as \( n \to \infty \) in probability. Hence it suffices to show that \( Y_n \to 0 \) as \( n \to \infty \) in mean. Let \( \epsilon \gt 0 \). By uniform integrability, there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|Y_n|; A) \lt \epsilon / 2 \) for all \( n \in \N_+ \). Since \( Y_n \to 0 \) as \( n \to \infty \) in probability, there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \P(|Y_n| \gt \epsilon / 2) \lt \delta \). Hence if \( n \gt N \) then \[ \E(|Y_n|) = \E(|Y_n|; |Y_n| \le \epsilon / 2) + \E(|Y_n|; |Y_n| \gt \epsilon / 2) \lt \epsilon / 2 + \epsilon / 2 = \epsilon \] Hence \( Y_n \to 0 \) as \( n \to \infty \) in mean.
As a corollary, recall that if \( X_n \to X \) as \( n \to \infty \) with probability 1, then \( X_n \to X \) as \( n \to \infty \) in probability. Hence if \( \bs X = \{X_n: n \in \N_+\} \) is uniformly integrable then \( X_n \to X \) as \( n \to \infty \) in mean.
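Here is a small numerical sketch of this corollary (our own toy example; the variable names are ours). With \( U \) standard uniform, \( X_n = U^n \) is bounded by 1, hence uniformly integrable, and \( X_n \to 0 \) with probability 1; the corollary then gives convergence in mean, in agreement with the exact value \( \E(U^n) = 1/(n+1) \).

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)

# X_n = U^n is bounded by 1, hence uniformly integrable, and X_n -> 0 with
# probability 1.  The corollary then forces convergence in mean:
# E|X_n - 0| = E(U^n) = 1 / (n + 1) -> 0 as n -> infinity.
for n in [1, 10, 100, 1000]:
    print(f"n = {n}: E|X_n| is about {np.mean(u ** n):.5f}, exact = {1 / (n + 1):.5f}")
```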
Our first example shows that bounded \( \mathscr{L}_1 \) norm is not sufficient for uniform integrability.
Suppose that \( U \) is uniformly distributed on the interval \( (0, 1) \) (so \( U \) has the standard uniform distribution). For \( n \in \N_+ \), let \( X_n = n \bs{1}(U \le 1 / n) \). Then
(a) \( \E(X_n) = 1 \) for every \( n \in \N_+ \).
(b) \( \E(X_n; X_n \gt x) = 1 \) for \( x \in (0, \infty) \) and \( n \gt x \).
First note that \( |X_n| = X_n \) since \( X_n \ge 0 \). For part (a), \( \E(X_n) = n \P(U \le 1 / n) = n (1 / n) = 1 \). For part (b), if \( n \gt x \) then the event \( \{X_n \gt x\} \) is the same as \( \{X_n = n\} = \{U \le 1 / n\} \), so \( \E(X_n; X_n \gt x) = n \P(U \le 1 / n) = 1 \).
By part (b), \( \E(|X_n|; |X_n| \gt x) \) does not converge to 0 as \( x \to \infty \) uniformly in \( n \in \N_+ \), so \( \bs X = \{X_n: n \in \N_+\} \) is not uniformly integrable.
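A quick Monte Carlo check of this counterexample (a sketch; the exact values follow from parts (a) and (b)): for any threshold \( x \), choosing \( n \gt x \) leaves the tail expectation at about 1, so no single \( x \) works for all \( n \).

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)  # one sample of U, reused for every n

def tail_expectation(n, x):
    """Monte Carlo estimate of E(X_n; X_n > x) where X_n = n * 1(U <= 1/n)."""
    xn = n * (u <= 1.0 / n)
    return np.mean(np.where(xn > x, xn, 0.0))

for x in [10, 100, 1000]:
    n = 2 * x  # any n > x exhibits the failure of uniformity
    print(f"x = {x}, n = {n}: E(X_n; X_n > x) is about {tail_expectation(n, x):.3f}")
```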
The next example gives an important application to conditional expected value. Recall that if \( X \) is a random variable with \( \E(|X|) \lt \infty \) and \( \ms{G} \) is a sub \( \sigma \)-algebra of \( \ms{F} \) then \( \E(X \mid \mathscr{G}) \) is the expected value of \( X \) given the information in \( \mathscr{G} \), and is the \( \mathscr{G} \)-measurable random variable closest to \( X \) in a sense. Indeed if \( X \in \ms{L}_2(\ms{F}) \) then \( \E(X \mid \ms{G}) \) is the projection of \( X \) onto \( \ms{L}_2(\ms{G}) \).
Suppose that \(\{X_i: i \in I\}\) is a uniformly integrable collection of random variables defined on the probability space \((\Omega, \ms F, \P)\). Then the following collection is also uniformly integrable:
\[ \{\E(X_i \mid \ms G): i \in I, \ms G \text{ is a sub } \sigma \text{-algebra of } \ms F\} \]
The proof uses both the characterization above and the definition. First, there exists \(C \in (0, \infty)\) with \(\E(|X_i|) \le C\) for all \(i \in I\). Let \(\epsilon \gt 0\). Then there exists \(\delta \gt 0\) such that if \(A \in \ms F\) with \(\P(A) \lt \delta\) then \(\E(|X_i|; A) \lt \epsilon\) for all \(i \in I\). Now let \(x = 2 C / \delta\). If \(i \in I\) and \(\ms G\) is a sub \(\sigma\)-algebra of \(\ms F\) then by Markov's inequality and basic properties of conditional expected value,
\[ \P[|\E(X_i \mid \ms G)| \gt x] \le \E[|\E(X_i \mid \ms G)|] / x \le \E[\E(|X_i| \mid \ms G)] / x = \E(|X_i|) / x \le C / x = \delta / 2 \lt \delta\]
Hence since \(\{|\E(X_i \mid \ms G)| \gt x\} \in \ms G\) we have
\[ \E[|\E(X_i \mid \ms G)|; |\E(X_i \mid \ms G)| \gt x] \le \E[\E(|X_i| \mid \ms G); |\E(X_i \mid \ms G)| \gt x] = \E[|X_i|; |\E(X_i \mid \ms G)| \gt x] \lt \epsilon \]
As a simple but important corollary, if \(X\) is a random variable with \(\E(|X|) \lt \infty \) then the collection of conditional expected values of \(X\), \[ \{\E(X \mid \ms G): \ms G \text{ is a sub } \sigma \text{-algebra of }\ms F \} \] is uniformly integrable. The conditional expected values range from \( \E(X) \), when \( \mathscr{G} = \{\emptyset, \Omega\} \), to \( X \) itself, when \( \mathscr{G} = \mathscr{F} \).
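For a concrete intermediate case (our illustration): if \( \ms G = \sigma\{A\} = \{\emptyset, A, A^c, \Omega\} \) for an event \( A \) with \( 0 \lt \P(A) \lt 1 \), then \[ \E(X \mid \ms G) = \frac{\E(X; A)}{\P(A)} \bs{1}_A + \frac{\E(X; A^c)}{\P(A^c)} \bs{1}_{A^c} \]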