The subject of this section is the integral associated with a positive measure, a concept of fundamental importance in probability theory. Computing probabilities by means of probability density functions involves such integrals. In addition, expected value, a basic concept in probability, can be interpreted as an integral with respect to a probability measure. Beyond probability, the general theory of integration is critically important in many areas of mathematics.
Our starting point is a measure space \( (S, \ms S, \mu) \). That is, \( S \) is a set, \( \ms S \) is a \( \sigma \)-algebra of subsets of \( S \), and \( \mu \) is a positive measure on \( \ms S \). Here are the most important special cases for us:
Special measure spaces
In the context of Euclidean spaces, our measure space is often a subspace \((S, \ms S, \lambda^n) \) where \(S \in \ms R^n\) for some \(n \in \N_+\) with \(\lambda^n(S) \gt 0\) and where \(\ms S = \{A \in \ms R^n: A \subseteq S\}\). The following definition reflects the fact that in measure theory, sets of measure 0 are often considered unimportant.
Consider a statement with \(x \in S \) as a free variable. Technically such a statement is a predicate on \( S \). Suppose that \( A \in \ms S \).
A typical statement that we have in mind is an equation or an inequality with \( x \in S \) as a free variable.
Our goal in this section is to define the integral of certain measurable functions \( f: S \to \R \), with respect to the measure \( \mu \). Here is the terminology and notation that we will use:
Suppose that \(f: S \to \R\) is measurable.
When it exists, the integral of \(f\) is denoted variously by \[ \int_S f \, d\mu, \; \int_S f(x) \, d\mu(x), \; \int_S f(x) \mu(dx) \]
Many authors use integrable to mean what we have defined as absolutely integrable. Trivially if \(f\) is absolutely integrable then it is integrable, but as we will see very shortly, the converse is not true. Since the set of extended real numbers \( \R^* = \R \cup \{-\infty, \infty\} \) plays an important role in the theory, we need to recall the arithmetic of \( \infty \) and \( -\infty \). Here are the conventions that are appropriate for integration:
Arithmetic on \( \R^* \)
However, \( \infty - \infty \) is not defined (because it does not make consistent sense) and we must be careful never to produce this indeterminate form. You might recall from calculus that \( 0 \cdot \infty \) is also an indeterminate form. However, for the theory of integration, the convention that \( 0 \cdot \infty = 0 \) is convenient and consistent. In terms of order of course, \(-\infty \lt a \lt \infty\) for \( a \in \R \).
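Floating-point arithmetic in Python does not follow these conventions: `0 * math.inf` and `math.inf - math.inf` both evaluate to `nan`. As a minimal sketch (the helper names `ext_add` and `ext_mul` are our own, not from any library), the conventions for integration can be encoded as follows. The indeterminate form raises an error instead of producing a value.

```python
import math

INF = math.inf

def ext_add(x: float, y: float) -> float:
    """Sum in the extended reals; inf + (-inf) is indeterminate and raises."""
    if {x, y} == {INF, -INF}:
        raise ValueError("indeterminate form inf - inf")
    return x + y

def ext_mul(x: float, y: float) -> float:
    """Product in the extended reals, using the convention 0 * (+/-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y

print(0 * INF)            # nan: plain float arithmetic
print(ext_mul(0, INF))    # 0.0: the convention used for integration
print(ext_add(INF, 5.0))  # inf
```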
We also need to extend topology and measure to \( \R^* \). In terms of the first, \( (a, \infty] \) is an open neighborhood of \( \infty \) and \( [-\infty, a) \) is an open neighborhood of \( -\infty \) for every \( a \in \R \). This ensures that if \( x_n \in \R \) for \( n \in \N_+ \) then \( x_n \to \infty \) or \( x_n \to -\infty \) as \( n \to \infty \) has its usual calculus meaning. Technically this topology results in the two-point compactification of \( \R \). Now we can give \( \R^* \) the Borel \( \sigma \)-algebra \( \ms R^* \), that is, the \( \sigma \)-algebra generated by the topology. Basically, this simply means that if \( A \in \ms R \) then \( A \cup \{\infty\} \), \( A \cup \{-\infty\} \), and \( A \cup \{-\infty, \infty\} \) are all in \( \ms R^* \).
As motivation for the definition, every version of integration should satisfy some basic properties. First, the integral of the indicator function of a measurable set should simply be the size of the set, as measured by \( \mu \). This gives our first definition:
If \( A \in \ms S \) then \( \int_S \bs 1_A \, d\mu = \mu(A) \).
This definition hints at the intimate relationship between measure and integration. We will construct the integral from the measure \( \mu \) in this section, but this first property shows that if we started with the integral, we could recover the measure. This property also shows why we need \( \infty \) as a possible value of the integral, and coupled with some of the properties below, why \( -\infty \) is also needed. Here is a simple corollary of our first definition.
\( \int_S 0 \, d\mu = 0 \)
Note that \( \int_S 0 \, d\mu = \int_S \bs 1_\emptyset \, d\mu = \mu(\emptyset) = 0 \).
We give three more essential properties that we want. First are the linearity properties in two parts—part (a) is the additive property and part (b) is the scaling property.
The following properties should hold:
The additive property almost implies the scaling property. The following steps do not constitute a proof because questions of the existence of the integrals are ignored and because the limit interchange in the last step is not justified. Still, the argument shows the close relationship between the additive property and the scaling property.
To be more explicit, we want the additive property (a) to hold if at least one of the integrals on the right is finite, or if both are \( \infty \), or if both are \( -\infty \). What is ruled out are the two cases where one integral is \( \infty \) and the other is \( -\infty \), and this is what is meant by the indeterminate form \( \infty - \infty \). Our next essential properties are the order properties, again in two parts: part (a) is the positive property and part (b) is the increasing property.
The following properties should hold:
The additive property and the positive property imply the increasing property: Suppose that \(f\) and \(g\) are integrable and \( f \le g \) on \( S \). Then \( g - f \ge 0 \) on \( S \) and \( g = f + (g - f) \). If \( \int_S f \, d\mu = -\infty \), then trivially \( \int_S f \, d\mu \le \int_S g \, d\mu \). Otherwise, by the additive property, \[ \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu\] But \( \int_S (g - f) \, d\mu \ge 0 \) by the positive property (so in particular the right side is not \( -\infty + \infty \)), and hence \( \int_S g \, d\mu \ge \int_S f \, d\mu \).
Our last essential property is perhaps the least intuitive, but is a type of continuity property of integration, and is closely related to the continuity property of positive measure. The official name is the monotone convergence theorem.
The following property should hold: If \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \) and \( f_n \) is increasing in \( n \) then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n d \mu \]
Note that since \( f_n \) is increasing in \( n \), \( \lim_{n \to \infty} f_n(x) \) exists in \( \R \cup \{\infty\} \) for each \( x \in S \) (and the limit defines a measurable function). This property shows that it is sometimes convenient to allow nonnegative functions to take the value \( \infty \). Note also that by the increasing property, \( \int_S f_n \, d\mu \) is increasing in \( n \) and hence also has a limit in \( \R \cup \{\infty\} \).
To see the connection with measure, suppose that \( (A_1, A_2, \ldots) \) is an increasing sequence of sets in \( \ms S \), and let \( A = \bigcup_{i=1}^\infty A_i \). Note that \( \bs 1_{A_n} \) is increasing in \( n \in \N_+ \) and \( \bs 1_{A_n} \to \bs 1_{A} \) as \( n \to \infty \). For this reason, the union \( A \) is sometimes called the limit of \( A_n \) as \( n \to \infty \). The continuity theorem of positive measure states that \( \mu(A_n) \to \mu(A) \) as \( n \to \infty \). Equivalently, \(\int_S \bs 1_{A_n} \, d\mu \to \int_S \bs 1_A \, d\mu\) as \( n \to \infty \), so the continuity theorem of positive measure is a special case of the monotone convergence theorem.
Armed with the properties that we want, the definition of the integral is fairly straightforward, and proceeds in stages. We give the definition successively for
Of course, each definition should agree with the previous one on the functions that are in both collections.
A simple function on \( S \) is simply a measurable, real-valued function with finite range.
Simple functions are usually expressed as linear combinations of indicator functions.
Representations of simple functions
You might wonder why we don't just always use the canonical representation for simple functions. The problem is that even if we start with canonical representations, when we combine simple functions in various ways, the resulting representations may not be canonical. The collection of simple functions is closed under the basic arithmetic operations, and in particular, forms a vector space.
Suppose that \( f \) and \( g \) are simple functions with representations \( f = \sum_{i \in I} a_i \bs 1_{A_i} \) and \( g = \sum_{j \in J} b_j \bs 1_{B_j} \), and that \( c \in \R \). Then
Since \( f \) and \( g \) are measurable, so are \( f + g \), \( f g \), and \( c f \). Moreover, since \( f \) and \( g \) have finite range, so do \( f + g \), \( f g \), and \( c f \). For the representations in parts (a) and (b), note that \( I \times J \) is finite, \( \left\{A_i \cap B_j: (i, j) \in I \times J\right\} \) is a collection of sets in \( \ms S \) that partition \( S \), and on \( A_i \cap B_j \), \( f + g = a_i + b_j \) and \( f g = a_i b_j \).
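You can check the refinement argument mechanically on a finite set. In the minimal sketch below (our own encoding, not from the text), a representation is a list of `(coefficient, block)` pairs whose blocks partition \( S \). The function `combine` builds the representation of \( f + g \) or \( f g \) on the common refinement \( \{A_i \cap B_j\} \). Some of the resulting blocks may be empty and some coefficients may repeat, so the result need not be canonical.

```python
import operator
from itertools import product

# A representation of a simple function on a finite set S is a list of
# (coefficient, block) pairs, where the blocks (frozensets) partition S.

def evaluate(rep, x):
    """Value at x: the coefficient of the unique block containing x."""
    return sum(a for a, block in rep if x in block)

def combine(f_rep, g_rep, op):
    """Representation of op(f, g) on the refinement {A_i & B_j: (i, j) in I x J}."""
    return [(op(a, b), A & B) for (a, A), (b, B) in product(f_rep, g_rep)]

S = frozenset(range(6))
f = [(1.0, frozenset({0, 1, 2})), (3.0, frozenset({3, 4, 5}))]
g = [(2.0, frozenset({0, 3})), (-1.0, frozenset({1, 2, 4, 5}))]
f_plus_g = combine(f, g, operator.add)
f_times_g = combine(f, g, operator.mul)
```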
As we alluded to earlier, note that even if the representations of \( f \) and \( g \) are canonical, the representations for \( f + g \) and \( f g \) may not be. The next result treats composition, and will be important for the change of variables theorem in the next section.
Suppose that \( (T, \ms T) \) is another measurable space, and that \( f: S \to T \) is measurable. If \( g \) is a simple function on \( T \) with representation \( g = \sum_{i \in I} b_i \bs 1_{B_i} \), then \( g \circ f \) is a simple function on \( S \) with representation \(g \circ f = \sum_{i \in I} b_i \bs 1_{f^{-1}(B_i)}\).
Recall that \( g \circ f : S \to \R \) and \( \range(g \circ f) \subseteq \range(g) \) so \( g \circ f \) has finite range. \( f \) is measurable, and inverse images preserve all set operations, so \( \left\{f^{-1}(B_i): i \in I\right\} \) is a measurable partition of \( S \). Finally, if \( x \in f^{-1}(B_i) \) then \( f(x) \in B_i \) so \( g\left[f(x)\right] = b_i \).
Given the definition of the integral of an indicator function, and given that we want the linearity properties to hold, there is no question as to how we should define the integral of a nonnegative simple function.
Suppose that \( f \) is a nonnegative simple function, with the representation \( f = \sum_{i \in I} a_i \bs 1_{A_i} \) where \( a_i \ge 0 \) for \( i \in I \). We define \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \]
We need to show that the definition is consistent. A simple function can have more than one representation as a linear combination of indicator functions, and hence we must show that all such representations lead to the same value for the integral. Let \(\{b_j: j \in J\}\) denote the set of distinct elements among the numbers \( a_i \) where \(i \in I\) and \(A_i \neq \emptyset \). For \( j \in J \), let \( I_j = \{i \in I: a_i = b_j\} \) and let \( B_j = \bigcup_{i \in I_j} A_i \). Thus, \( f = \sum_{j \in J} b_j \bs 1_{B_j} \), and this is the canonical representation. Note that \[ \sum_{i \in I} a_i \mu(A_i) = \sum_{j \in J} \sum_{i \in I_j} a_i \mu(A_i) = \sum_{j \in J} b_j \sum_{i \in I_j} \mu(A_i) = \sum_{j \in J} b_j \mu(B_j) \] The first sum is the integral defined in terms of the general representation \( f = \sum_{i \in I} a_i \bs 1_{A_i} \) while the last sum is the integral defined in terms of the unique canonical representation \( f = \sum_{j \in J} b_j \bs 1_{B_j} \). Thus, any representation of a simple function \( f \) leads to the same value for the integral.
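A numerical check makes the consistency argument concrete. The sketch below is our own illustration on a finite set \( S \), and it assumes that \( \mu \) is given by finite point masses. The function `canonical` merges the blocks that share a coefficient (dropping empty blocks, as in the argument above). Both representations give the same value of \( \sum_{i \in I} a_i \mu(A_i) \).

```python
from collections import defaultdict

def measure(weights, A):
    """mu(A) for a measure on a finite set given by finite point masses."""
    return sum(weights[x] for x in A)

def simple_integral(rep, weights):
    """Sum of a_i * mu(A_i) over a representation [(a_i, A_i), ...] with a_i >= 0."""
    return sum(a * measure(weights, A) for a, A in rep)

def canonical(rep):
    """Merge blocks with equal coefficients into the sets B_j; drop empty blocks."""
    merged = defaultdict(frozenset)
    for a, A in rep:
        if A:
            merged[a] = merged[a] | A
    return list(merged.items())

weights = {0: 0.5, 1: 1.0, 2: 2.0, 3: 0.25}
rep = [(2.0, frozenset({0})), (2.0, frozenset({1, 2})), (5.0, frozenset({3})), (7.0, frozenset())]
print(simple_integral(rep, weights), simple_integral(canonical(rep), weights))  # 8.25 8.25
```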
Note that if \( f \) is a nonnegative simple function, then \( \int_S f \, d\mu \) exists in \( [0, \infty] \), so the positive property holds. We next show that the linearity properties are satisfied for nonnegative simple functions.
Suppose that \( f \) and \( g \) are nonnegative simple functions, and that \( c \in [0, \infty) \). Then
Suppose that \( f \) and \( g \) are nonnegative simple functions with the representations \( f = \sum_{i \in I} a_i \bs 1_{A_i} \) and \(g = \sum_{j \in J} b_j \bs 1_{B_j} \). Thus \( a_i \ge 0 \) for \( i \in I \), \( b_j \ge 0 \) for \( j \in J \), and \begin{align*} \int_S f \, d\mu &= \sum_{i \in I} a_i \mu(A_i) \\ \int_S g \, d\mu &= \sum_{j \in J} b_j \mu(B_j) \end{align*}
The increasing property holds for nonnegative simple functions.
Suppose that \( f \) and \( g \) are nonnegative simple functions and \( f \le g \) on \( S \). Then \( \int_S f \, d\mu \le \int_S g \, d\mu \)
Next we give a version of the continuity theorem for simple functions. It's not completely general, but will be needed for the next subsection where we do prove the general version.
Suppose that \( f \) is a nonnegative simple function and that \( (A_1, A_2, \ldots) \) is an increasing sequence of sets in \( \ms S \) with \( A = \bigcup_{n=1}^\infty A_n \). Then \[ \int_S \bs 1_{A_n} f \, d\mu \to \int_S \bs 1_A f \, d\mu \text{ as } n \to \infty\]
Suppose that \( f \) has the representation \( f = \sum_{i \in I} b_i \bs 1_{B_i}\). Then \( \bs 1_{A_n} f = \sum_{i \in I} b_i \bs 1_{A_n} \bs 1_{B_i} = \sum_{i \in I} b_i \bs 1_{A_n \cap B_i} \) and similarly, \( \bs 1_A f = \sum_{i \in I} b_i \bs 1_{A \cap B_i} \). But for each \( i \in I \), \( B_i \cap A_n \) is increasing in \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty (B_i \cap A_n) = B_i \cap A \). By the continuity theorem for positive measures, \( \mu(B_i \cap A_n) \to \mu(B_i \cap A) \) as \( n \to \infty \) for each \( i \in I \). Since \( I \) is finite, \[ \int_S \bs 1_{A_n} f \, d\mu = \sum_{i \in I} b_i \mu(A_n \cap B_i) \to \sum_{i \in I} b_i \mu(A \cap B_i) = \int_S \bs 1_A f \, d\mu \text{ as } n \to \infty \]
Note that \( \bs 1_{A_n} f \) is increasing in \( n \in \N_+ \) and \( \bs 1_{A_n} f \to \bs 1_A f \) as \( n \to \infty \), so this really is a special case of the monotone convergence theorem.
Next we will consider nonnegative measurable functions on \( S \). First we note that a function of this type is the limit of nonnegative simple functions.
Suppose that \( f: S \to [0, \infty) \) is measurable. Then there exists an increasing sequence \( \left(f_1, f_2, \ldots\right) \) of nonnegative simple functions with \( f_n \to f \) on \( S \) as \( n \to \infty \).
For \( n \in \N_+ \) and \( k \in \left\{1, 2, \ldots, n 2^n\right\} \), let \( I_{n,k} = \left[(k -1) \big/ 2^n, k \big/ 2^n\right) \) and \( I_n = [n, \infty) \). Note that
Note that the \( n \)th partition divides the interval \( [0, n) \) into \( n 2^n \) subintervals of length \( 1 \big/ 2^n \). Thus, (b) follows because the \( (n + 1) \)st partition divides each of the \( n 2^n \) intervals of the \( n \)th partition in half, and (c) follows because the \( (n + 1) \)st partition divides the interval \( [n, n + 1) \) into subintervals of length \( 1 \big/ 2^{n + 1} \). Now let \( A_{n,k} = f^{-1}\left(I_{n,k}\right) \) and \( A_n = f^{-1}\left(I_n\right) \) for \( n \in \N_+ \) and \( k \in \left\{1, 2, \ldots, n 2^n\right\} \). Since inverse images preserve all set operations, (a), (b), and (c) hold with \( A \) replacing \( I \) everywhere, and \( S \) replacing \( [0, \infty) \) in (a). Moreover, since \( f \) is measurable, \( A_n \in \ms S \) and \( A_{n, k} \in \ms S \) for each \( n \) and \( k \). Now, define \[ f_n = \sum_{k = 1}^{ n 2^n} \frac{k - 1}{2^n} \bs 1_{A_{n, k}} + n \bs 1_{A_n} \] Then \( f_n \) is a simple function and \( 0 \le f_n \le f \) for each \( n \in \N_+ \). To show convergence, fix \( x \in S \). If \( n \gt f(x) \), which holds for all sufficiently large \( n \), then \( \left|f(x) - f_n(x)\right| \le 2^{-n} \), and hence \( f_n(x) \to f(x) \) as \( n \to \infty \). All that remains is to show that \( f_n \) is increasing in \( n \). Let \( x \in S \) and \( n \in \N_+ \). If \( x \in A_{n,k} \) for some \( k \in \left\{1, 2, \ldots, n 2^n\right\} \), then \( f_n(x) = (k - 1) \big/ 2^n \). But either \(f_{n + 1}(x) = (2 k - 2) \big/ 2^{n + 1} \) or \( f_{n + 1}(x) = (2 k - 1) \big/ 2^{n + 1} \). If \( x \in A_n \) then \( f_n(x) = n \). But either \( f_{n+1}(x) = (k - 1) \big/ 2^{n + 1} \) for some \( k \in \left\{n 2^{n+1} + 1, \ldots, (n + 1) 2^{n+1}\right\} \) or \( f_{n+1}(x) = n + 1 \). In all cases, \( f_{n + 1}(x) \ge f_n(x) \).
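Because \( f_n(x) \) depends on \( x \) only through \( f(x) \), the construction can be computed pointwise. For \( f(x) \lt n \), the index \( k \) with \( f(x) \in I_{n,k} \) satisfies \( k - 1 = \lfloor 2^n f(x) \rfloor \), so \( f_n(x) = \min\left\{n, \lfloor 2^n f(x) \rfloor \big/ 2^n\right\} \). Here is a short Python sketch (the function name is our own):

```python
import math

def dyadic_approx(fx: float, n: int) -> float:
    """f_n(x) for the n-th dyadic simple approximation, given the value fx = f(x) >= 0.

    If fx lies in I_{n,k} = [(k-1)/2^n, k/2^n) with k <= n 2^n, the value is
    (k-1)/2^n = floor(2^n fx)/2^n; if fx lies in I_n = [n, inf), the value is n.
    """
    if fx >= n:
        return float(n)
    return math.floor(fx * 2**n) / 2**n

# The approximations of the value pi increase in n and eventually stay within 2^-n of pi.
print([dyadic_approx(math.pi, n) for n in range(1, 6)])  # [1.0, 2.0, 3.0, 3.125, 3.125]
```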
The previous theorem shows how to define the integral of a measurable function \( f: S \to [0, \infty) \) in terms of the integrals of simple functions. If \( g \) is a nonnegative simple function with \( g \le f \), then by the order properties, we need \( \int_S g \, d\mu \le \int_S f \, d\mu \). On the other hand, there exists an increasing sequence of nonnegative simple functions converging to \( f \). Thus the continuity property suggests the following definition:
If \( f: S \to [0, \infty) \) is measurable, define \[ \int_S f \, d\mu = \sup\left\{ \int_S g \, d\mu: g \text{ is simple and } 0 \le g \le f \right\} \]
Note that \( \int_S f \, d\mu \) exists in \( [0, \infty] \), so the positive property holds. Note also that if \( f \) is simple, the new definition agrees with the old one, since by the increasing property for nonnegative simple functions, the supremum is attained at \( g = f \). As always, we need to establish the essential properties. First, the increasing property holds.
If \( f, \, g: S \to [0, \infty) \) are measurable and \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \).
Note that \( \{h: h \text{ is simple and } 0 \le h \le f\} \subseteq \{ h: h \text { is simple and } 0 \le h \le g\} \). Therefore \[ \int_S f \, d\mu = \sup\left\{\int_S h \, d\mu: h \text{ is simple and } 0 \le h \le f \right\} \le \sup\left\{\int_S h \, d\mu: h \text{ is simple and } 0 \le h \le g\right\} = \int_S g \, d\mu \]
We can now prove the continuity property known as the monotone convergence theorem in full generality.
Suppose that \( f_n: S \to [0, \infty) \) is measurable for \( n \in \N_+ \) and that \( f_n \) is increasing in \( n \). Then \[ \int_S \lim_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_S f_n d \mu \]
Let \( f = \lim_{n \to \infty} f_n \). By the increasing property, \( \int_S f_n \, d\mu \) is increasing in \( n \in \N_+\) and hence has a limit in \( [0, \infty] \), which we will denote by \( c \). Note that \( f_n \le f \) on \( S \) for \( n \in \N_+ \), so by the increasing property again, \(\int_S f_n \, d\mu \le \int_S f \, d\mu\) for \( n \in \N_+ \). Letting \( n \to \infty \) gives \( c \le \int_S f \, d\mu \). To show that \( c \ge \int_S f \, d\mu \) we need to show that \( c \ge \int_S g \, d\mu \) for every simple function \( g \) with \( 0 \le g \le f \). Fix \( a \in (0, 1) \) and let \( A_n = \{ x \in S: f_n(x) \ge a g(x)\} \). Since \( f_n \) is increasing in \( n \), \( A_n \subseteq A_{n+1} \). Moreover, \( \bigcup_{n=1}^\infty A_n = S \): if \( f(x) = 0 \) then \( g(x) = 0 \) and \( x \in A_1 \), while if \( f(x) \gt 0 \) then \( a g(x) \lt f(x) \) since \( a \lt 1 \), so \( f_n(x) \ge a g(x) \) for all sufficiently large \( n \) because \( f_n(x) \to f(x) \). By definition, \( a g \le f_n \) on \( A_n \), so \[ a \int_S \bs 1_{A_n} g \, d\mu = \int_S a \bs 1_{A_n} g \, d\mu \le \int_S \bs 1_{A_n} f_n \, d\mu \le \int_S f_n \, d\mu \] Letting \( n \to \infty \) in the extreme parts of the displayed inequality and using the version of the monotone convergence theorem for simple functions, we have \( a \int_S g \, d\mu \le c \) for every \( a \in (0, 1) \). Finally, letting \( a \uparrow 1 \) gives \( \int_S g \, d\mu \le c \).
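To see the dyadic construction and the monotone convergence theorem working together, take \( S = [0, 1] \) with Lebesgue measure and \( f(x) = x^2 \). We chose this example for illustration; it was not in the text above. Each \( A_{n,k} \) is then an interval whose length is explicit, so \( \int_S f_n \, d\lambda \) can be computed exactly as a simple-function integral (up to rounding). The values should increase toward \( \int_0^1 x^2 \, dx = \frac13 \), with error at most \( 2^{-n} \) since \( 0 \le f - f_n \le 2^{-n} \) almost everywhere on \( S \).

```python
import math

def integral_dyadic_square(n: int) -> float:
    """Integral over S = [0, 1], w.r.t. Lebesgue measure, of the n-th dyadic
    approximation f_n of f(x) = x**2, computed as a simple-function integral.

    A_{n,k} = {x in [0, 1]: (k-1)/2^n <= x**2 < k/2^n} is an interval of length
    min(1, sqrt(k/2^n)) - min(1, sqrt((k-1)/2^n)); the set A_n = {x**2 >= n}
    has measure 0 in [0, 1], so it contributes nothing.
    """
    total = 0.0
    for k in range(1, n * 2**n + 1):
        length = min(1.0, math.sqrt(k / 2**n)) - min(1.0, math.sqrt((k - 1) / 2**n))
        total += (k - 1) / 2**n * length
    return total

for n in (1, 2, 4, 8, 12):
    print(n, integral_dyadic_square(n))  # increases toward 1/3
```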
If \( f: S \to [0, \infty) \) is measurable, then as shown above, there exists an increasing sequence \( \left(f_1, f_2, \ldots\right) \) of nonnegative simple functions with \( f_n \to f \) as \( n \to \infty \). By the monotone convergence theorem, \( \int_S f_n \, d\mu \to \int_S f \, d\mu \) as \( n \to \infty \). These two facts can be used to establish other properties of the integral of a nonnegative function based on our knowledge that the properties hold for simple functions. This type of argument is known as bootstrapping. We use bootstrapping to show that the linearity properties hold:
If \( f, \, g: S \to [0, \infty) \) are measurable and \( c \in [0, \infty) \), then
Our final step is to define the integral of a measurable function \( f: S \to \R \). First, recall the positive and negative parts of \( x \in \R \): \[ x^+ = \max\{x, 0\}, \; x^- = \max\{-x, 0\} \] Note that \( x^+ \ge 0 \), \( x^- \ge 0 \), \( x = x^+ - x^- \), and \( \left|x\right| = x^+ + x^- \). Given that we want the integral to have the linearity properties, there is no question as to how we should define the integral of \( f \) in terms of the integrals of \( f^+ \) and \( f^- \) (defined pointwise by \( f^+(x) = f(x)^+ \) and \( f^-(x) = f(x)^- \)), which, being nonnegative, are defined in the previous subsection.
If \( f: S \to \R \) is measurable, we define \[ \int_S f \, d\mu = \int_S f^+ \, d\mu - \int_S f^- \, d\mu \] assuming that at least one of the integrals on the right is finite.
Assuming that either the integral of the positive part or the integral of the negative part is finite ensures that we do not get the dreaded indeterminate form \( \infty - \infty \). Of course, if both are finite, then \( f \) is absolutely integrable.
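The definition translates directly into code on a finite set whose measure is given by point masses. This setting is our own assumption for illustration, and we allow a point to have mass \( \infty \). On such a space every function is simple. A zero value at a point of infinite mass contributes nothing, following the convention \( 0 \cdot \infty = 0 \). The sketch returns `None` when the integral does not exist.

```python
import math

def ext_mul(x: float, y: float) -> float:
    """Product on [0, inf] with the convention 0 * inf = 0."""
    return 0.0 if x == 0 or y == 0 else x * y

def integral(f, weights):
    """Integral of f w.r.t. the measure with point masses weights[x] in [0, inf].

    Returns int f+ dmu - int f- dmu as an extended real, or None when both
    parts are infinite (the indeterminate form inf - inf).
    """
    pos = sum(ext_mul(max(f(x), 0.0), w) for x, w in weights.items())
    neg = sum(ext_mul(max(-f(x), 0.0), w) for x, w in weights.items())
    if math.isinf(pos) and math.isinf(neg):
        return None
    return pos - neg

weights = {"a": 1.0, "b": 2.0, "c": math.inf}
print(integral({"a": 3.0, "b": -1.0, "c": 0.0}.get, weights))               # 1.0
print(integral({"a": -1.0, "b": 0.0, "c": 2.0}.get, weights))               # inf
print(integral({"a": 1.0, "b": -1.0}.get, {"a": math.inf, "b": math.inf}))  # None
```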
Suppose that \( f: S \to \R \) is measurable. Then \( f \) is absolutely integrable if and only if \( \int_S \left|f \right| \, d\mu \lt \infty \).
Suppose first that \( f \) is absolutely integrable, so that \( \int_S f^+ \, d\mu \lt \infty \) and \( \int_S f^- \, d\mu \lt \infty \). Recall that \( \left| f \right| = f^+ + f^- \), so by the additive property for nonnegative functions, \( \int_S \left| f \right| \, d\mu = \int_S f^+ \, d\mu + \int_S f^- \, d\mu \lt \infty \). Conversely, suppose that \( \int_S \left| f \right| \, d\mu \lt \infty \). Then \( f^+ \le \left| f \right| \) and \( f^- \le \left| f \right|\), so by the increasing property for nonnegative functions, \( \int_S f^+ \, d\mu \le \int_S \left| f \right| \, d\mu \lt \infty \) and \( \int_S f^- \, d\mu \le \int_S \left| f \right| \, d\mu \lt \infty \).
Note that if \( f \) is nonnegative, then our new definition agrees with our old one, since \( f^+ = f \) and \( f^- = 0 \). For simple functions the integral has the same basic form as for nonnegative simple functions:
Suppose that \( f \) is a simple function with the representation \( f = \sum_{i \in I} a_i \bs 1_{A_i} \). Then \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \] assuming that the sum does not have both \( \infty \) and \( -\infty \) terms.
Note that \( f^+ \) and \( f^- \) are also simple, with the representations \( f^+ = \sum_{i \in I} a_i^+ \bs 1_{A_i} \) and \( f^- = \sum_{i \in I} a_i^- \bs 1_{A_i} \). Hence \[ \int_S f \, d\mu = \sum_{i \in I} a_i^+ \mu(A_i) - \sum_{i \in I} a_i^- \mu(A_i) \] as long as one of the sums is finite. Given that this is the case, we can recombine the sums to get \[ \int_S f \, d\mu = \sum_{i \in I} a_i \mu(A_i) \]
Once again, we need to establish the essential properties. Our first result is an intermediate step towards linearity.
If \( f, \, g: S \to [0, \infty) \) are measurable then \( \int_S (f - g) \, d\mu = \int_S f \, d\mu - \int_S g \, d\mu \) as long as at least one of the integrals on the right is finite.
We take cases. Suppose first that \( \int_S f \, d\mu \lt \infty \) and \( \int_S g \, d\mu \lt \infty \). Note that \( (f - g)^+ \le f \) and \( (f - g)^- \le g \). By the increasing property for nonnegative functions, \( \int_S (f - g)^+ \, d\mu \le \int_S f \, d\mu \lt \infty \) and \( \int_S (f - g)^- \, d\mu \le \int_S g \, d\mu \lt \infty \). Thus \( f - g \) is absolutely integrable. Next we have \( f - g = (f - g)^+ - (f - g)^- \) and therefore \( f + (f - g)^- = g + (f - g)^+ \). All four of the functions in the last equation are nonnegative, and therefore by the additive property for nonnegative functions, we have \[ \int_S f \, d\mu + \int_S (f - g)^- \, d\mu = \int_S g \, d\mu + \int_S (f - g)^+ \, d\mu \] All of these integrals are finite, and hence \[\int_S (f - g) \, d\mu = \int_S (f - g)^+ \, d\mu - \int_S (f - g)^- \, d\mu = \int_S f \, d\mu - \int_S g \, d\mu\]
Next suppose that \( \int_S f \, d\mu = \infty \) and \( \int_S g \, d\mu \lt \infty \). Then \( f - g \le (f - g)^+ \) and hence \( f \le (f - g)^+ + g \). Using the additivity and increasing properties for nonnegative functions, we have \( \infty = \int_S f \, d\mu \le \int_S (f - g)^+ \, d\mu + \int_S g \, d\mu\). Since \( \int_S g \, d\mu \lt \infty \) we must have \( \int_S (f - g)^+ \, d\mu = \infty \). On the other hand, \( (f - g)^- \le g \) so \( \int_S (f - g)^- \, d\mu \le \int_S g \, d\mu \lt \infty \). Hence \( \int_S (f - g) \, d\mu = \infty = \int_S f \, d\mu - \int_S g \, d\mu \)
Finally, suppose that \( \int_S f \, d\mu \lt \infty \) and \( \int_S g \, d\mu = \infty \). By the argument in the last paragraph, we have \( \int_S (g - f)^+ \, d\mu = \infty \) and \( \int_S (g - f)^- \, d\mu \lt \infty \). Equivalently, \( \int_S (f - g)^+ \, d\mu \lt \infty \) and \( \int_S (f - g)^- \, d\mu = \infty \). Hence \( \int_S (f - g) \, d\mu = -\infty = \int_S f \, d\mu - \int_S g \, d\mu \).
We finally have the linearity properties in full generality.
If \( f, \, g: S \to \R \) are integrable and \( c \in \R \), then
In particular, note that if \( f \) and \( g \) are absolutely integrable, then so are \( f + g \) and \( c f \) for \( c \in \R \). Thus, the set of absolutely integrable functions on \( (S, \ms S, \mu) \) forms a vector space, which is denoted \( \ms L(S, \ms S, \mu) \). The \( \ms L \) is in honor of Henri Lebesgue, who first developed the theory. This vector space, and other related ones, will be studied in more detail in the section on function spaces.
We also have the increasing property in full generality.
If \( f, \, g: S \to \R \) are integrable, and if \( f \le g \) on \( S \) then \( \int_S f \, d\mu \le \int_S g \, d\mu \)
We can use the proof based on the additive property given earlier. First, \( g = f + (g - f) \) and \( g - f \ge 0 \) on \( S \). If \( \int_S f \, d\mu = -\infty \) then trivially, \( \int_S f \, d\mu \le \int_S g \, d\mu \). Otherwise \( \int_S (g - f) \, d\mu \ge 0 \) and therefore \( \int_S g \, d\mu = \int_S f \, d\mu + \int_S (g - f) \, d\mu \ge \int_S f \, d\mu \).
Now that we have defined the integral of a measurable function \( f \) over all of \( S \), there is a natural extension to the integral of \( f \) over a set \(A \in \ms S\): We simply replace \(f\) with \(\bs 1_A f\).
If \( f: S \to \R \) is measurable and \( A \in \ms S \), we define \[ \int_A f \, d\mu = \int_S \bs 1_A f \, d\mu \] assuming that the integral on the right exists.
We use the same terminology as before: \(f\) can be integrable over \(A\), absolutely integrable over \(A\), or the integral can fail to exist.
Suppose that \(A, \, B \in \ms S\) with \(A \subseteq B\). If \( f: S \to \R \) is integrable over \(B\) then \(f\) is integrable over \(A\).
Note that \( \bs 1_A f^+ \le \bs 1_B f^+ \) and \( \bs 1_A f^- \le \bs 1_B f^- \). If \(f\) is integrable over \(B\), then either \( \int_S \bs 1_B f^+ \, d\mu \lt \infty \) or \( \int_S \bs 1_B f^- \, d\mu \lt \infty \). By the increasing property, it follows that either \( \int_S \bs 1_A f^+ \, d\mu \lt \infty \) or \( \int_S \bs 1_A f^- \, d\mu \lt \infty \), so \(f\) is integrable over \(A\).
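On the other hand, it's possible for \(f\) to be integrable over \( A \in \ms S \) but not integrable over a larger set \(B \in \ms S\). For example, with Lebesgue measure \( \lambda \) on \( \R \), the function \( f(x) = x \) is absolutely integrable over \( [0, 1] \) but is not integrable over \( \R \), since \( \int_\R f^+ \, d\lambda = \int_\R f^- \, d\lambda = \infty \).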
We could also simply think of \( \int_A f \, d\mu \) as the integral of a measurable function \( f: A \to \R \) over the measure space \( (A, \ms S_A, \mu_A) \), where \( \ms S_A = \{ B \in \ms S: B \subseteq A\} = \{C \cap A: C \in \ms S\} \) is the \( \sigma \)-algebra of measurable subsets of \( A \), and where \( \mu_A \) is the restriction of \( \mu \) to \( \ms S_A \). It follows that all of the essential properties hold for integrals over \( A \): the linearity properties, the order properties, and the monotone convergence theorem. The following property is a simple consequence of the general additive property, and is known as the additive property for disjoint domains.
Suppose that \( f: S \to \R \) is integrable and that \( A, \, B \in \ms S \) are disjoint. Then \[ \int_{A \cup B} f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu \]
Recall that \( \bs 1_{A \cup B} = \bs 1_A + \bs 1_B \). Hence by the additive property and the previous result, \[ \int_{A \cup B} f \, d\mu = \int_S \bs 1_{A \cup B} f \, d\mu = \int_S \left(\bs 1_A f + \bs 1_B f\right) \, d\mu = \int_S \bs 1_A f \, d\mu + \int_S \bs 1_B f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu \]
By induction, the additive property holds for a finite collection of disjoint domains. The extension to a countably infinite collection of disjoint domains will be considered in the next section on properties of the integral.
Recall again that the measure space \((S, \ms S, \#)\) is discrete if \(S\) is countable, \(\ms S\) is the collection of all subsets of \(S\), and \( \# \) is counting measure on \( (S, \ms S) \). Thus all functions \( f: S \to \R \) are measurable, and as we will see, integrals with respect to \( \# \) are simply sums.
If \( f: S \to \R \) then \[ \int_S f \, d\# = \sum_{x \in S} f(x) \] as long as either the sum of the positive terms or the sum of the negative terms is finite.
The proof is a bootstrapping argument.
If the sum of the positive terms and the sum of the negative terms are both finite, then \( f \) is absolutely integrable with respect to \( \# \), but the usual term from calculus is that the series \( \sum_{x \in S} f(x) \) is absolutely convergent. The result will look more familiar in the special case \( S = \N_+ \). Functions on \( S \) are simply sequences, so we can use the more familiar notation \( a_i \) rather than \( a(i) \) for a function \( a: S \to \R \). Part (b) of the proof (with \( A_n = \{1, 2, \ldots, n\} \)) is just the definition of an infinite series of nonnegative terms as the limit of the partial sums: \[ \sum_{i=1}^\infty a_i = \lim_{n \to \infty} \sum_{i=1}^n a_i \] Part (c) of the proof is just the definition of a general infinite series \[ \sum_{i=1}^\infty a_i = \sum_{i=1}^\infty a_i^+ - \sum_{i=1}^\infty a_i^- \] as long as one of the series on the right is finite. Again, when both are finite, the series is absolutely convergent. In calculus we also consider conditionally convergent series. This means that \( \sum_{i=1}^\infty a_i^+ = \infty \), \( \sum_{i=1}^\infty a_i^- = \infty \), but \( \lim_{n \to \infty} \sum_{i=1}^n a_i \) exists in \( \R \). Such series have no place in general integration theory. Also, you may recall that such series are pathological in the sense that, given any number in \( \R^* \), there exists a rearrangement of the terms so that the rearranged series converges to the given number.
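The alternating harmonic series \( 1 - \frac12 + \frac13 - \cdots \) shows the rearrangement phenomenon concretely, since its positive and negative parts both diverge. In the sketch below (a standard greedy scheme; the function name is our own), unused positive terms \( 1, \frac13, \frac15, \ldots \) are taken in order while the partial sum is at most the target. Otherwise the next unused negative term \( -\frac12, -\frac14, \ldots \) is taken. Because both parts diverge, every term is eventually used exactly once. The result is therefore a rearrangement, and its partial sums approach the target.

```python
def rearranged_partial_sum(target: float, n_terms: int) -> float:
    """Partial sum after n_terms of a greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ...

    Positive terms 1/1, 1/3, 1/5, ... are used in order while the partial sum is
    at most target; otherwise the next negative term -1/2, -1/4, ... is used.
    """
    next_odd, next_even = 1, 2
    s = 0.0
    for _ in range(n_terms):
        if s <= target:
            s += 1.0 / next_odd
            next_odd += 2
        else:
            s -= 1.0 / next_even
            next_even += 2
    return s

# The same terms, summed in different orders, approach different limits.
for target in (-1.0, 0.0, 2.0):
    print(target, rearranged_partial_sum(target, 200_000))
```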
Consider first the one-dimensional Euclidean measure space \( (\R, \ms R, \lambda) \) where \( \ms R \) is the usual \( \sigma \)-algebra of Borel measurable sets and \( \lambda \) is Lebesgue measure. The theory developed above applies, of course, for the integral of a measurable function \( f: \R \to \R \) over a set \( A \in \ms R \). It's not surprising that in this special case, the theory of integration is referred to as Lebesgue integration in honor of our good friend Henri Lebesgue, who first developed the theory. However instead of \( \int_A f(x) \, d\lambda(x) \), the usual notation is \[\int_A f(x) \, dx \] The same notation is used for the ordinary Riemann integral of calculus, named for our other good friend Georg Riemann, and for good reason:
The Lebesgue integral extends the Riemann integral.
To understand this important theorem we need to review the definition of the Riemann integral. Consider first the standard case where the domain of integration is a closed, bounded interval. Here are the preliminary definitions that we will need.
Suppose that \( f: [a, b] \to \R \), where \( a, \, b \in \R \) and \( a \lt b \).
Note that the Riemann sum is simply the integral, with respect to \( \lambda \), of the simple function \( g = \sum_{i \in I} f(x_i) \bs 1_{A_i} \). Since \( A_i \) is an interval for each \( i \in I \), \( g \) is a step function: it is constant on each interval of a finite collection of disjoint intervals. For the same reason, \( \lambda(A_i) \) is simply the length of the subinterval \( A_i \), so of course measure theory per se is not needed for Riemann integration. Now for the definition from calculus:
\( f \) is Riemann integrable on \( [a, b] \) if there exists \( r \in \R \) with the property that for every \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( \ms{A} \) is a partition of \( [a, b] \) with \( \|\ms{A}\| \lt \delta \) then \( \left| r - R\left(f, \ms{A}, B\right) \right| \lt \epsilon \) for every set of points \( B \) associated with \( \ms{A} \). In this case we define the integral by \[ \int_a^b f(x) \, dx = r\]
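The definition can be explored numerically. The following Python sketch is illustrative, and the function `riemann_sum` and its `tag` parameter are hypothetical names. It computes \( R(f, \ms{A}, B) \) for uniform partitions of \( [a, b] \) into \( n \) subintervals, so \( \|\ms{A}\| = (b - a) / n \). It lets the associated points sit anywhere in their subintervals. For a Riemann integrable function, the sums should settle on a single number \( r \) as the norm shrinks, however the points are chosen.

```python
from typing import Callable


def riemann_sum(f: Callable[[float], float], a: float, b: float,
                n: int, tag: float = 0.5) -> float:
    """Riemann sum R(f, A, B) for the uniform partition A of [a, b] into
    n subintervals of length (b - a) / n, so the norm of A is (b - a) / n.

    The associated point x_i sits a fraction `tag` of the way across the
    i-th subinterval [s_{i-1}, s_i); 0 <= tag < 1 keeps x_i inside it.
    """
    if not 0.0 <= tag < 1.0:
        raise ValueError("tag must satisfy 0 <= tag < 1")
    width = (b - a) / n
    # Every subinterval has lambda(A_i) = width.
    return sum(f(a + (i + tag) * width) * width for i in range(n))


if __name__ == "__main__":
    # Left endpoints, midpoints, and points 90% of the way across
    for tag in (0.0, 0.5, 0.9):
        print(tag, riemann_sum(lambda x: x * x, 0.0, 1.0, 10_000, tag))
```

For \( f(x) = x^2 \) on \( [0, 1] \), all three choices of points give sums close to \( \int_0^1 x^2 \, dx = \frac{1}{3} \), as the definition requires.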
Here is our main theorem of this subsection.
If \( f: [a, b] \to \R \) is Riemann integrable on \( [a, b] \) then \( f \) is Lebesgue integrable on \( [a, b] \) and \[ \int_{[a, b]} f \, d\lambda = \int_a^b f(x) \, dx \]
On the other hand, there are lots of functions that are Lebesgue integrable but not Riemann integrable. In fact there are indicator functions of this type, the simplest of functions from the point of view of general integration.
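Consider the function \( \bs 1_\Q \) where as usual, \( \Q \) is the set of rational numbers in \( \R \). Then (a) \( \bs 1_\Q \) is Lebesgue integrable on \( \R \) with \( \int_\R \bs 1_\Q \, d\lambda = 0 \), and (b) \( \bs 1_\Q \) is not Riemann integrable on any interval \( [a, b] \) with \( a \lt b \).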
Part (a) follows from the definition of the Lebesgue integral: \[ \int_\R \bs 1_\Q \, d\lambda = \lambda(\Q) = 0 \] For part (b), note that every interval of \( \R \) of positive length contains both rational and irrational numbers (the rational numbers and the irrational numbers are both dense in \( \R \)). Thus, given any partition \( \ms{A} = \{A_i: i \in I\} \) of \( [a, b] \), no matter how small the norm, there are Riemann sums that are 0 (take \( x_i \in A_i \) irrational for each \( i \in I \)) and Riemann sums that are \( b - a \) (take \( x_i \in A_i \) rational for each \( i \in I \)). Since \( b - a \gt 0 \), no number \( r \) can be within \( \epsilon \lt (b - a) / 2 \) of both kinds of sums, so \( \bs 1_\Q \) is not Riemann integrable on \( [a, b] \).
The following fundamental theorem completes the picture.
\( f: [a, b] \to \R \) is Riemann integrable on \( [a, b] \) if and only if \( f \) is bounded on \( [a, b] \) and \( f \) is continuous almost everywhere on \( [a, b] \).
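The Riemann integral can be extended to domains other than closed, bounded intervals, using limits and additivity over disjoint domains. When both the Lebesgue integral and the extended Riemann integral exist, they agree. However, an improper Riemann integral can exist in the same way that a conditionally convergent series does. For example, \( \int_0^\infty \frac{\sin x}{x} \, dx \) exists as a limit, but the positive and negative parts of the integrand both have infinite integrals, so the Lebesgue integral does not exist.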
For \(n \in \N_+\), consider now the \(n\)-dimensional Euclidean measure space \((\R^n, \ms R^n, \lambda^n)\) where \(\ms R^n\) is the Borel \(\sigma\)-algebra and \(\lambda^n\) is Lebesgue measure on \((\R^n, \ms R^n)\). As we have noted several times before, just as \(\lambda\) extends length to \(\ms R\), so \(\lambda^2\) extends area to \(\ms R^2\), \(\lambda^3\) extends volume to \(\ms R^3\), and generally \(\lambda^n\) is \(n\)-dimensional volume on \(\ms R^n\). The Lebesgue integral of a measurable function \(f: \R^n \to \R\) over a set \(A \in \ms R^n\) is denoted as before by
\[\int_A f(x) \, dx\]
But for \(n \ge 2\) this is now a multiple integral in the sense that \(x = (x_1, x_2, \ldots, x_n) \in \R^n\) and \(dx = d(x_1, x_2, \ldots, x_n)\). Once again, the Lebesgue integral agrees with the ordinary (multiple) Riemann integral when \(f\) and \(A\) are sufficiently nice. Recall that, as \(\sigma\)-algebras, \(\ms R^n\) is also the power of \(\ms R\) of order \(n\), and as measures, \(\lambda^n\) is the power of \(\lambda\) of order \(n\). So a natural question, which we will address in the next section, is how the multiple integral relates to the corresponding iterated integrals.
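To make the multiple integral concrete, here is a minimal Python sketch of a two-dimensional Riemann sum. The function names are hypothetical. The rectangle is cut into an \( n \times n \) grid of cells, which play the role of the subintervals \( A_i \), with area \( \lambda^2 \) in place of length, and \( f \) is evaluated at the midpoint of each cell. Applied to the indicator function of the unit disk, the sum approximates \( \lambda^2 \) of the disk, which is \( \pi \). This works because the disk is "sufficiently nice": its boundary has measure 0.

```python
def grid_integral_2d(f, a1, b1, a2, b2, n):
    """Approximate the double integral of f(x1, x2) over the rectangle
    [a1, b1] x [a2, b2] by a Riemann sum on an n-by-n grid of cells,
    evaluating f at the midpoint of each cell. Each cell has
    lambda^2-measure (area) h1 * h2."""
    h1 = (b1 - a1) / n
    h2 = (b2 - a2) / n
    total = 0.0
    for i in range(n):
        x1 = a1 + (i + 0.5) * h1
        for j in range(n):
            x2 = a2 + (j + 0.5) * h2
            total += f(x1, x2)
    return total * h1 * h2


def disk_indicator(x1, x2):
    """Indicator function of the closed unit disk in R^2."""
    return 1.0 if x1 * x1 + x2 * x2 <= 1.0 else 0.0


if __name__ == "__main__":
    print(grid_integral_2d(lambda x1, x2: x1 * x2, 0.0, 1.0, 0.0, 1.0, 100))
    print(grid_integral_2d(disk_indicator, -1.0, 1.0, -1.0, 1.0, 400))
```

The first call approximates \( \int_{[0,1]^2} x_1 x_2 \, d(x_1, x_2) = \frac{1}{4} \). The second approximates the area \( \pi \) of the unit disk.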
Consider again the measurable space \( (\R, \ms R) \) where \( \ms R \) is the usual \( \sigma \)-algebra of Borel measurable subsets of \( \R \). Suppose that \( F: \R \to \R \) is a general distribution function, so that by definition, \( F \) is increasing and continuous from the right. Recall that the Lebesgue-Stieltjes measure \( \mu \) associated with \( F \) is the unique measure on \( \ms R \) that satisfies \[ \mu(a, b] = F(b) - F(a); \quad a, \, b \in \R, \; a \lt b \] The integral with respect to the measure \( \mu \) is, appropriately enough, referred to as the Lebesgue-Stieltjes integral with respect to \( F \), and like the measure, is named for the ubiquitous Henri Lebesgue and for Thomas Stieltjes. In addition to our usual notation \( \int_S f \, d\mu \), the Lebesgue-Stieltjes integral is also denoted \( \int_S f \, dF\) and \(\int_S f(x) \, dF(x) \).
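The defining property \( \mu(a, b] = F(b) - F(a) \) suggests approximating \( \int_{(a, b]} f \, dF \) by sums \( \sum_k f(s_k) [F(s_k) - F(s_{k-1})] \) over a fine partition of \( (a, b] \) into half-open intervals \( (s_{k-1}, s_k] \). This is the same idea as a Riemann sum, with \( \mu \)-measure in place of length. The Python sketch below assumes that \( f \) is continuous on \( [a, b] \). The function names and the example distribution function `F_mixed` are hypothetical illustrations. `F_mixed` corresponds to Lebesgue measure plus a unit point mass at \( \frac{1}{2} \).

```python
def stieltjes_sum(f, F, a, b, n):
    """Approximate the integral of f with respect to the Lebesgue-Stieltjes
    measure mu of F over (a, b], assuming f is continuous on [a, b].

    (a, b] is split into n half-open subintervals (s_{k-1}, s_k]; each has
    mu-measure F(s_k) - F(s_{k-1}), and f is evaluated at the right
    endpoint s_k, which belongs to that subinterval."""
    total = 0.0
    prev = a
    for k in range(1, n + 1):
        s = a + (b - a) * k / n
        total += f(s) * (F(s) - F(prev))
        prev = s
    return total


def F_mixed(x):
    """Hypothetical distribution function: increasing and right-continuous,
    with a jump of size 1 at x = 1/2 on top of F(x) = x."""
    return x + (1.0 if x >= 0.5 else 0.0)


if __name__ == "__main__":
    print(stieltjes_sum(lambda x: x, F_mixed, 0.0, 1.0, 1000))
```

For this \( F \), \( \int_{(0, 1]} x \, dF(x) = \int_0^1 x \, dx + \frac{1}{2} \cdot 1 = 1 \), and the sum above comes out close to 1. The point mass contributes \( f\left(\frac{1}{2}\right) \) times the jump of \( F \) at \( \frac{1}{2} \).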
Let \( g(x) = \frac{1}{1 + x^2} \) for \( x \in \R \).
The function \( g \) in the last exercise is important in the study of the Cauchy distribution, named for Augustin Cauchy. The graph of \( g \) is known as the witch of Agnesi, named for Maria Agnesi.
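As a worked illustration, \( g \) is nonnegative and continuous, so its integral over \( \R \) is the limit of its integrals over the increasing intervals \( [-t, t] \), each of which can be computed with the antiderivative \( \arctan \): \[ \int_\R g(x) \, dx = \lim_{t \to \infty} \int_{-t}^t \frac{1}{1 + x^2} \, dx = \lim_{t \to \infty} \left[\arctan t - \arctan(-t)\right] = \frac{\pi}{2} + \frac{\pi}{2} = \pi \] By contrast, for \( h(x) = x g(x) \), \[ \int_\R h^+(x) \, dx = \lim_{t \to \infty} \int_0^t \frac{x}{1 + x^2} \, dx = \lim_{t \to \infty} \frac{1}{2} \ln\left(1 + t^2\right) = \infty \] and by symmetry \( \int_\R h^-(x) \, dx = \infty \) as well, so \( \int_\R x g(x) \, dx \) does not exist.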
Let \( g(x) = \frac{1}{x^b} \) for \( x \in [1, \infty) \) where \( b \gt 0 \) is a parameter. Find \( \int_1^\infty g(x) \, dx \).
\(\int_1^\infty g(x) \, dx = \begin{cases} \infty, & 0 \lt b \le 1 \\ \frac{1}{b - 1}, & b \gt 1 \end{cases} \)
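The answer can be checked with a short calculation. Since \( g \) is positive and continuous, the integral over \( [1, \infty) \) is the limit of the integrals over the increasing intervals \( [1, t] \). For \( b \ne 1 \), \[ \int_1^t x^{-b} \, dx = \frac{t^{1 - b} - 1}{1 - b} \to \begin{cases} \infty, & 0 \lt b \lt 1 \\ \frac{1}{b - 1}, & b \gt 1 \end{cases} \quad \text{as } t \to \infty \] because \( t^{1 - b} \to \infty \) when \( b \lt 1 \) and \( t^{1 - b} \to 0 \) when \( b \gt 1 \). For \( b = 1 \), \( \int_1^t x^{-1} \, dx = \ln t \to \infty \).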
The function \( g \) in the last exercise is important in the study of the Pareto distribution, named for Vilfredo Pareto.
Suppose that \( f(x) = 0 \) if \( x \in \Q \) and \( f(x) = \sin(x) \) if \( x \in \R - \Q \).
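As a worked illustration of how the theory applies here, note that \( f = \sin \cdot \bs 1_{\R - \Q} \) is measurable, and \( f = \sin \) except on \( \Q \), which has Lebesgue measure 0. Hence for \( a \lt b \), \[ \int_{[a, b]} f \, d\lambda = \int_a^b \sin x \, dx = \cos a - \cos b \] so, for example, \( \int_{[0, \pi]} f \, d\lambda = 2 \). On the other hand, \( f \) is continuous at \( x \) only if \( \sin x = 0 \), that is, only at the integer multiples of \( \pi \). At any other point \( x \), there are rational points arbitrarily close to \( x \) where \( f = 0 \), and irrational points arbitrarily close to \( x \) where \( f \) is near \( \sin x \ne 0 \). So the set of discontinuities of \( f \) has positive measure in every interval. By the characterization of Riemann integrability above, \( f \) is therefore not Riemann integrable on any interval \( [a, b] \) with \( a \lt b \).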