Our goal in this section is to continue the broad sketch of the general theory of Markov processes. As with the last section, some of the statements are not completely precise and rigorous, because we want to focus on the main ideas without being overly burdened by technicalities. If you are a new student of probability, or are primarily interested in applications, you may want to skip ahead to the study of discrete-time Markov chains.
As usual, our starting point is a probability space \( (\Omega, \mathscr{F}, \P) \), so that \( \Omega \) is the set of outcomes, \( \mathscr{F} \) the \( \sigma \)-algebra of events, and \( \P \) the probability measure on the sample space \( (\Omega, \mathscr{F}) \). The set of times \( T \) is either \( \N \), discrete time with the discrete topology, or \( [0, \infty) \), continuous time with the usual Euclidean topology. The time set \( T \) is given the Borel \( \sigma \)-algebra \( \mathscr{T} \), which is just the power set if \( T = \N \), and then the time space \( (T, \mathscr{T}) \) is given the usual measure, counting measure in the discrete case and Lebesgue measure in the continuous case. The set of states \( S \) has an LCCB topology (locally compact, Hausdorff, with a countable base), and is also given the Borel \( \sigma \)-algebra \( \mathscr{S} \). Recall that to say that the state space is discrete means that \( S \) is countable with the discrete topology, so that \( \mathscr{S} \) is the power set of \( S \). The topological assumptions mean that the state space \( (S, \mathscr{S}) \) is nice enough for a rich mathematical theory and general enough to encompass the most important applications. There is often a natural Borel measure \( \lambda \) on \( (S, \mathscr{S}) \), counting measure \( \# \) if \( S \) is discrete, and for example, Lebesgue measure if \( S = \R^k \) for some \( k \in \N_+ \).
Recall also that there are several spaces of functions on \( S \) that are important. Let \( \mathscr{B} \) denote the set of bounded, measurable functions \( f: S \to \R \). Let \( \mathscr{C} \) denote the set of bounded, continuous functions \( f: S \to \R \), and let \( \mathscr{C}_0 \) denote the set of continuous functions \( f: S \to \R \) that vanish at \( \infty \) in the sense that for every \( \epsilon \gt 0 \), there exists a compact set \( K \subseteq S \) such that \( \left|f(x)\right| \lt \epsilon \) for \( x \in K^c \). These are all vector spaces under the usual (pointwise) addition and scalar multiplication, and \( \mathscr{C}_0 \subseteq \mathscr{C} \subseteq \mathscr{B} \). The supremum norm, defined by \( \left\| f \right\| = \sup\{\left|f(x)\right|: x \in S\} \) for \( f \in \mathscr{B} \), is the norm used on all of these spaces.
Suppose now that \(\bs{X} = \{X_t: t \in T\}\) is a time-homogeneous Markov process with state space \( (S, \mathscr{S}) \) defined on the probability space \( (\Omega, \mathscr{F}, \P) \). As before, we also assume that we have a filtration \( \mathfrak{F} = \{\mathscr{F}_t: t \in T\} \), that is, an increasing family of sub \( \sigma \)-algebras of \( \mathscr{F} \), indexed by the time space, with the property that \( X_t \) is measurable with respect to \( \mathscr{F}_t \) for \( t \in T \). Intuitively, \( \mathscr{F}_t \) is the collection of events up to time \( t \in T \).
As usual, we let \( P_t \) denote the transition probability kernel for an increase in time of size \( t \in T \). Thus \[ P_t(x, A) = \P(X_t \in A \mid X_0 = x), \quad x \in S, \, A \in \mathscr{S} \] Recall that for \( t \in T \), the transition kernel \( P_t \) defines two operators, on the left with measures and on the right with functions. So, if \( \mu \) is a measure on \( (S, \mathscr{S}) \) then \( \mu P_t \) is the measure on \( (S, \mathscr{S}) \) given by \[ \mu P_t(A) = \int_S \mu(dx) P_t(x, A), \quad A \in \mathscr{S} \] If \( \mu \) is the distribution of \( X_0 \) then \( \mu P_t \) is the distribution of \( X_t \) for \( t \in T \). If \( f \in \mathscr{B} \) then \( P_t f \in \mathscr{B} \) is defined by \[ P_t f(x) = \int_S P_t(x, dy) f(y) = \E\left[f(X_t) \mid X_0 = x\right] \] Recall that the collection of transition operators \( \bs{P} = \{P_t: t \in T\} \) is a semigroup because \( P_s P_t = P_{s+t} \) for \( s, \, t \in T \). Just about everything in this section is defined in terms of the semigroup \( \bs{P} \), which is one of the main analytic tools in the study of Markov processes.
We make the same assumptions as in the Introduction. Here is a brief review:
We assume that the Markov process \( \bs{X} = \{X_t: t \in T\} \) satisfies the following properties (and hence is a Feller Markov process):
Part (a) is an assumption on continuity in space, while part (b) is an assumption on continuity in time. If \( S \) is discrete then (a) automatically holds, and if \( T \) is discrete then (b) automatically holds. As we will see, the Feller assumptions are sufficient for a very nice mathematical theory, and yet are general enough to encompass the most important continuous-time Markov processes.
The process \( \bs{X} = \{X_t: t \in T\} \) has the following properties:
The Feller assumptions on the Markov process have equivalent formulations in terms of the transition semigroup.
The transition semigroup \( \bs{P} = \{P_t: t \in T\} \) has the following properties:
As before, part (a) is a condition on continuity in space, while part (b) is a condition on continuity in time. Once again, (a) is trivial if \( S \) is discrete, and (b) trivial if \( T \) is discrete. The first condition means that \( P_t \) is a linear operator on \( \mathscr{C}_0 \) (as well as being a linear operator on \( \mathscr{B} \)). The second condition leads to a stronger continuity result.
For \( f \in \mathscr{C}_0 \), the mapping \( t \mapsto P_t f \) is continuous on \( T \). That is, for \( t \in T \), \[ \|P_s f - P_t f\| = \sup\{\left|P_s f(x) - P_t f(x) \right|: x \in S\} \to 0 \text{ as } s \to t\]
Our interest in this section is primarily the continuous time case. However, we start with the discrete time case since the concepts are clearer and simpler, and we can avoid some of the technicalities that inevitably occur in continuous time.
Suppose that \( T = \N \), so that time is discrete. Recall that the transition kernels are just powers of the one-step kernel. That is, we let \( P = P_1 \) and then \( P_n = P^n \) for \( n \in \N \).
For \( \alpha \in (0, 1] \), the \( \alpha \)-potential kernel \( R_\alpha \) of \( \bs{X} \) is defined as follows: \[ R_\alpha(x, A) = \sum_{n=0}^\infty \alpha^n P^n(x, A), \quad x \in S, \, A \in \mathscr{S} \]
The function \( x \mapsto R_\alpha(x, A) \) from \( S \) to \( [0, \infty) \) is measurable for \( A \in \mathscr{S} \) since \( x \mapsto P^n(x, A) \) is measurable for each \( n \in \N \). The mapping \( A \mapsto R_\alpha(x, A) \) is a positive measure on \( \mathscr{S} \) for \( x \in S \) since \( A \mapsto P^n(x, A) \) is a probability measure for each \( n \in \N \). Finally, the interpretation of \( R(x, A) \), where \( R = R_1 \), for \( x \in S \) and \( A \in \mathscr{S} \) comes from interchanging sum and expected value, which is allowed since the terms are nonnegative: \[ R(x, A) = \sum_{n=0}^\infty P^n(x, A) = \sum_{n=0}^\infty \E[\bs{1}(X_n \in A) \mid X_0 = x] = \E\left( \sum_{n=0}^\infty \bs{1}(X_n \in A) \biggm| X_0 = x\right) = \E[\#\{n \in \N: X_n \in A\} \mid X_0 = x] \]
Note that it's quite possible that \( R(x, A) = \infty \) for some \( x \in S \) and \( A \in \mathscr{S} \). In fact, knowing when this is the case is of considerable importance in the study of Markov processes. As with all kernels, the potential kernel \( R_\alpha \) defines two operators, operating on the right on functions, and operating on the left on positive measures. For the right potential operator, if \( f: S \to \R \) is measurable then \[R_\alpha f(x) = \sum_{n=0}^\infty \alpha^n P^n f(x) = \sum_{n=0}^\infty \alpha^n \int_S P^n(x, dy) f(y) = \sum_{n=0}^\infty \alpha^n \E[f(X_n) \mid X_0 = x], \quad x \in S \] assuming as usual that the expected values and the infinite series make sense. This will be the case, in particular, if \( f \) is nonnegative or if \( \alpha \in (0, 1) \) and \( f \in \mathscr{B} \).
If \( \alpha \in (0, 1) \), then \( R_\alpha(x, S) = \frac{1}{1 - \alpha} \) for all \( x \in S \).
Using geometric series, \[ R_\alpha(x, S) = \sum_{n=0}^\infty \alpha^n P^n(x, S) = \sum_{n=0}^\infty \alpha^n = \frac{1}{1 - \alpha} \]
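The row-sum identity is easy to check numerically. The Python sketch below is purely illustrative: the two-state transition matrix is a hypothetical example (any stochastic matrix would do), and \( R_\alpha \) is computed by truncating the defining series, which converges geometrically, so the truncation error is negligible.

```python
import math

# Hypothetical two-state transition matrix (rows sum to 1), chosen
# arbitrarily to illustrate the series definition of R_alpha.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def potential_kernel(alpha, terms=400):
    """Truncated series R_alpha = sum_n alpha^n P^n; the tail is O(alpha^terms)."""
    R = [[0.0, 0.0], [0.0, 0.0]]
    Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0 = I
    for n in range(terms):
        R = [[R[i][j] + alpha ** n * Pn[i][j] for j in range(2)] for i in range(2)]
        Pn = mat_mul(Pn, P)
    return R

alpha = 0.5
R = potential_kernel(alpha)
# Each row sum R_alpha(x, S) should be 1 / (1 - alpha) = 2, up to truncation error.
print([sum(row) for row in R])
```

The same loop works for any finite chain; only the matrix dimensions change.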
It follows that for \( \alpha \in (0, 1) \), the right operator \( R_\alpha \) is a bounded, linear operator on \( \mathscr{B} \) with \(\left\|R_\alpha \right\| = \frac{1}{1 - \alpha}\). It also follows that \( (1 - \alpha) R_\alpha \) is a probability kernel. There is a nice interpretation of this kernel.
If \( \alpha \in (0, 1) \) then \( (1 - \alpha) R_\alpha(x, \cdot) \) is the conditional distribution of \( X_N \) given \( X_0 = x \in S \), where \( N \) is independent of \( \bs{X} \) and has the geometric distribution on \( \N \) with parameter \( 1 - \alpha \).
Suppose that \( x \in S \) and \( A \in \mathscr{S} \). Conditioning on \( N \) gives \[ \P(X_N \in A \mid X_0 = x) = \sum_{n=0}^\infty \P(N = n) \P(X_N \in A \mid N = n, X_0 = x) \] But by the substitution rule and the assumption of independence, \[ \P(X_N \in A \mid N = n, X_0 = x) = \P(X_n \in A \mid N = n, X_0 = x) = \P(X_n \in A \mid X_0 = x) = P^n(x, A) \] Since \( N \) has the geometric distribution on \( \N \) with parameter \( 1 - \alpha \) we have \( \P(N = n) = (1 - \alpha) \alpha^n \) for \( n \in \N \). Substituting gives \[ \P(X_N \in A \mid X_0 = x) = \sum_{n=0}^\infty (1 - \alpha) \alpha^n P^n(x, A) = (1 - \alpha) R_\alpha(x, A)\]
So \( (1 - \alpha)R_\alpha \) is a transition probability kernel, just as \( P_n \) is a transition probability kernel, but corresponding to the random time \( N \) (with \( \alpha \in (0, 1) \) as a parameter), rather than the deterministic time \( n \in \N \). An interpretation of the potential kernel \( R_\alpha \) for \( \alpha \in (0, 1) \) can be also given in economic terms. Suppose that \( A \in \mathscr{S} \) and that we receive one monetary unit each time the process \( \bs{X} \) visits \( A \). Then as above, \( R(x, A) \) is the expected total amount of money we receive, starting at \( x \in S \). However, typically money that we will receive at times distant in the future has less value to us now than money that we will receive soon. Specifically suppose that a monetary unit received at time \( n \in \N \) has a present value of \( \alpha^n \), where \( \alpha \in (0, 1) \) is an inflation factor (sometimes also called a discount factor). Then \( R_\alpha(x, A) \) gives the expected, total, discounted amount we will receive, starting at \( x \in S \). A bit more generally, if \( f \in \mathscr{B} \) is a reward function, so that \( f(x) \) is the reward (or cost, depending on the sign) that we receive when we visit state \( x \in S \), then for \( \alpha \in (0, 1) \), \( R_\alpha f(x) \) is the expected, total, discounted reward, starting at \( x \in S \).
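The geometric-time interpretation lends itself to simulation. The following Python sketch uses a hypothetical two-state chain (the matrix and parameter values are arbitrary choices for illustration): it compares the empirical distribution of \( X_N \), where \( N \) is geometric with parameter \( 1 - \alpha \) and independent of the chain, to \( (1 - \alpha) R_\alpha(x, \cdot) \) computed from the truncated series.

```python
import math
import random

random.seed(42)

# Hypothetical two-state chain, used purely for illustration.
P = [[0.9, 0.1],
     [0.2, 0.8]]
alpha = 0.6

def geometric_kernel(x, terms=200):
    """(1 - alpha) R_alpha(x, .) via the truncated series sum_n (1-alpha) alpha^n P^n."""
    dist = [0.0, 0.0]
    row = [1.0 if j == x else 0.0 for j in range(2)]  # row x of P^0 = I
    for n in range(terms):
        dist = [dist[j] + (1 - alpha) * alpha ** n * row[j] for j in range(2)]
        row = [sum(row[k] * P[k][j] for k in range(2)) for j in range(2)]
    return dist

def sample_X_N(x):
    """Run the chain from x for N steps, where N is geometric on {0, 1, ...}
    with parameter 1 - alpha, independent of the chain."""
    state = x
    while random.random() < alpha:  # take one more step with probability alpha
        state = 0 if random.random() < P[state][0] else 1
    return state

reps = 100_000
freq = sum(sample_X_N(0) for _ in range(reps)) / reps  # empirical P(X_N = 1 | X_0 = 0)
print(freq, geometric_kernel(0)[1])
```

With this many replications the two numbers agree to within ordinary Monte Carlo error.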
For the left potential operator, if \( \mu \) is a positive measure on \( \mathscr{S} \) then \[\mu R_\alpha(A) = \sum_{n=0}^\infty \alpha^n \mu P^n(A) = \sum_{n=0}^\infty \alpha^n \int_S \mu(dx) P^n(x, A), \quad A \in \mathscr{S}\] In particular, if \( \mu \) is a probability measure and \( X_0 \) has distribution \( \mu \) then \( \mu P^n \) is the distribution of \( X_n \) for \( n \in \N \), so from the last result, \((1 - \alpha) \mu R_\alpha \) is the distribution of \( X_N \) where again, \( N \) is independent of \( \bs{X} \) and has the geometric distribution on \( \N \) with parameter \( 1 - \alpha \). The family of potential kernels gives the same information as the family of transition kernels.
The potential kernels \( \bs{R} = \{R_\alpha: \alpha \in (0, 1)\} \) completely determine the transition kernels \( \bs{P} = \{P_n: n \in \N\} \).
Note that for \( x \in S \) and \( A \in \mathscr{S} \), the function \( \alpha \mapsto R_\alpha(x, A) \) is a power series in \( \alpha \) with coefficients \( n \mapsto P^n(x, A) \). In the language of combinatorics, \( \alpha \mapsto R_\alpha(x, A) \) is the ordinary generating function of the sequence \( n \mapsto P^n(x, A) \). As noted above, this power series has radius of convergence at least 1, so we can extend the domain to \( \alpha \in (-1, 1) \). Thus, given the potential kernels, we can recover the transition kernels by taking derivatives and evaluating at 0: \[ P^n(x, A) = \frac{1}{n!}\left[\frac{d^n}{d\alpha^n} R_\alpha(x, A) \right]_{\alpha = 0} \]
Of course, it's really only necessary to determine \( P \), the one step transition kernel, since the other transition kernels are powers of \( P \). In any event, it follows that the kernels \( \bs{R} = \{R_\alpha: \alpha \in (0, 1)\} \), along with the initial distribution, completely determine the finite dimensional distributions of the Markov process \( \bs{X} \). The potential kernels commute with each other and with the transition kernels.
Suppose that \( \alpha, \, \beta \in (0, 1] \) and \( k \in \N \). Then (as kernels), (a) \( P^k R_\alpha = R_\alpha P^k = \sum_{n=0}^\infty \alpha^n P^{n+k} \), and (b) \( R_\alpha R_\beta = R_\beta R_\alpha = \sum_{j=0}^\infty \sum_{k=0}^\infty \alpha^j \beta^k P^{j+k} \).
Suppose that \( f \in \mathscr{B} \) is nonnegative. The interchange of the sums with the kernel operation is allowed since the kernels are nonnegative. The other tool used is the semigroup property.
The same identities hold for the right operators on the entire space \( \mathscr{B} \), with the additional restrictions that \( \alpha \lt 1 \) and \( \beta \lt 1 \). The fundamental equation that relates the potential kernels is given next.
If \( \alpha, \, \beta \in (0, 1] \) with \( \alpha \le \beta \) then (as kernels), \[ \beta R_\beta = \alpha R_\alpha + (\beta - \alpha) R_\alpha R_\beta \]
If \( \alpha = \beta \) the equation is trivial, so assume \( \alpha \lt \beta \). Suppose that \( f \in \mathscr{B} \) is nonnegative. From the previous result, \[ R_\alpha R_\beta f = \sum_{j=0}^\infty \sum_{k=0}^\infty \alpha^j \beta^k P^{j+k} f \] Changing variables to sum over \( n = j + k \) and \( j \) gives \[ R_\alpha R_\beta f = \sum_{n=0}^\infty \sum_{j=0}^n \alpha^j \beta^{n-j} P^n f = \sum_{n=0}^\infty \sum_{j=0}^n \left(\frac{\alpha}{\beta}\right)^j \beta^n P^n f = \sum_{n=0}^\infty \frac{1 - \left(\frac{\alpha}{\beta}\right)^{n+1}}{1 - \frac{\alpha}{\beta}} \beta^n P^n f \] Simplifying gives \[ R_\alpha R_\beta f = \frac{1}{\beta - \alpha} (\beta R_\beta f - \alpha R_\alpha f)\] Note that since \( \alpha \lt 1 \), \( R_\alpha f\) is finite, so we don't have to worry about the dreaded indeterminate form \( \infty - \infty \).
The same identity holds for the right operators on the entire space \( \mathscr{B} \), with the additional restriction that \( \beta \lt 1 \).
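As a numerical sanity check (not a proof), the fundamental equation can be verified for a hypothetical two-state chain, with each potential kernel computed by truncating its defining series far enough that the geometric tail is negligible:

```python
import math

# Hypothetical two-state transition matrix, chosen arbitrarily.
P = [[0.9, 0.1],
     [0.2, 0.8]]

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def R(alpha, terms=600):
    """Truncated series for the alpha-potential kernel of P."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(terms):
        out = [[out[i][j] + alpha ** n * Pn[i][j] for j in range(2)] for i in range(2)]
        Pn = mat_mul(Pn, P)
    return out

alpha, beta = 0.4, 0.8
Ra, Rb = R(alpha), R(beta)
RaRb = mat_mul(Ra, Rb)
# beta R_beta = alpha R_alpha + (beta - alpha) R_alpha R_beta, entrywise:
lhs = [[beta * Rb[i][j] for j in range(2)] for i in range(2)]
rhs = [[alpha * Ra[i][j] + (beta - alpha) * RaRb[i][j] for j in range(2)]
       for i in range(2)]
print(lhs, rhs)
```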
If \( \alpha \in (0, 1] \), then (as kernels), \( I + \alpha R_\alpha P = I + \alpha P R_\alpha = R_\alpha \).
Suppose that \( f \in \mathscr{B} \) is nonnegative. From the result above, \[ (I + \alpha R_\alpha P) f = (I + \alpha P R_\alpha) f = f + \sum_{n=0}^\infty \alpha^{n+1} P^{n+1} f = \sum_{n = 0}^\infty \alpha^n P^n f = R_\alpha f \]
The same identity holds for the right operators on the entire space \( \mathscr{B} \), with the additional restriction that \( \alpha \lt 1 \). This leads to the following important result:
If \( \alpha \in (0, 1) \), then as operators on the space \( \mathscr{B} \), (a) \( I - \alpha P \) is invertible and \( R_\alpha = (I - \alpha P)^{-1} \), and (b) \( P = \frac{1}{\alpha}\left(I - R_\alpha^{-1}\right) \).
The operators are bounded, so we can subtract. The identity \( I + \alpha R_\alpha P = R_\alpha \) leads to \( R_\alpha(I - \alpha P) = I \) and the identity \( I + \alpha P R_\alpha = R_\alpha \) leads to \( (I - \alpha P) R_\alpha = I \). Hence (a) holds. Part (b) follows from (a).
The last result shows again that the potential operator \( R_\alpha \) determines the transition operator \( P \).
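For a quick numerical illustration with a hypothetical two-state chain, the truncated defining series for \( R_\alpha \) agrees with the closed form \( (I - \alpha P)^{-1} \) obtained from the identities \( R_\alpha(I - \alpha P) = (I - \alpha P) R_\alpha = I \):

```python
import math

# Hypothetical two-state transition matrix, chosen arbitrarily.
P = [[0.9, 0.1],
     [0.2, 0.8]]
alpha = 0.5

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# Closed form: R_alpha = (I - alpha P)^{-1}
ImaP = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(2)]
        for i in range(2)]
R_closed = inv2(ImaP)

# Defining series, truncated far enough that the geometric tail is negligible.
R_series = [[0.0, 0.0], [0.0, 0.0]]
Pn = [[1.0, 0.0], [0.0, 1.0]]
for n in range(400):
    R_series = [[R_series[i][j] + alpha ** n * Pn[i][j] for j in range(2)]
                for i in range(2)]
    Pn = mat_mul(Pn, P)
print(R_closed, R_series)
```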
Our first example considers the binomial process as a Markov process.
Let \( \bs{I} = \{I_n: n \in \N_+\} \) be a sequence of Bernoulli Trials with success parameter \( p \in (0, 1) \). Define the Markov process \( \bs{X} = \{X_n: n \in \N\} \) by \( X_n = X_0 + \sum_{k=1}^n I_k \) where \( X_0 \) takes values in \( \N \) and is independent of \( \bs{I} \).
Recall that \( \bs{X} \) is a Markov process since it has stationary, independent increments.
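For the binomial process, \( X_n - X_0 \) has the binomial distribution with parameters \( n \) and \( p \), so the \( n \)-step kernel is explicit: \( P^n(x, \{y\}) = \binom{n}{y - x} p^{y - x} (1 - p)^{n - (y - x)} \) for \( y - x \in \{0, 1, \ldots, n\} \). The sketch below (with an arbitrary value of \( p \)) verifies the semigroup property \( P^m P^n = P^{m+n} \) in a particular case, which here amounts to the Vandermonde convolution identity:

```python
import math

p = 0.3  # success parameter, chosen arbitrarily for illustration

def P_n(n, x, y):
    """n-step transition probability of the binomial process:
    P^n(x, {y}) = C(n, y - x) p^(y - x) (1 - p)^(n - (y - x))."""
    k = y - x
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Semigroup (Chapman-Kolmogorov) check: P^m P^n = P^{m+n}.
m, n, x, y = 3, 4, 0, 2
lhs = sum(P_n(m, x, z) * P_n(n, z, y) for z in range(x, x + m + 1))
rhs = P_n(m + n, x, y)
print(lhs, rhs)
```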
With the discrete-time setting as motivation, we now turn to the more important continuous-time case where \( T = [0, \infty) \).
For \( \alpha \in [0, \infty) \), the \( \alpha \)-potential kernel \( U_\alpha \) of \( \bs{X} \) is defined as follows: \[ U_\alpha(x, A) = \int_0^\infty e^{-\alpha t} P_t(x, A) \, dt, \quad x \in S, \, A \in \mathscr{S} \]
Since \( \bs{P} = \{P_t: t \in T\} \) is a Feller semigroup of transition operators, the mapping \((t, x) \mapsto P_t(x, A)\) from \( [0, \infty) \times S \) to \( [0, 1] \) is jointly measurable for \( A \in \mathscr{S} \). Thus, \( U_\alpha(x, A) \) makes sense for \( x \in S \) and \( A \in \mathscr{S} \) and \( x \mapsto U_\alpha(x, A) \) from \( S \) to \( [0, \infty) \) is measurable for \( A \in \mathscr{S} \). That \( A \mapsto U_\alpha(x, A) \) is a measure on \( \mathscr{S} \) follows from the usual interchange of sum and integral, via Fubini's theorem: Suppose that \( \{A_j: j \in J\} \) is a countable collection of disjoint sets in \( \mathscr{S} \), and let \( A = \bigcup_{j \in J} A_j \) \begin{align*} U_\alpha(x, A) & = \int_0^\infty e^{-\alpha t} P_t(x, A) \, dt = \int_0^\infty \left[\sum_{j \in J} e^{-\alpha t} P_t(x, A_j)\right] \, dt\\ & = \sum_{j \in J} \int_0^\infty e^{-\alpha t} P_t(x, A_j) \, dt = \sum_{j \in J} U_\alpha(x, A_j) \end{align*} Finally, the interpretation of \( U(x, A) \), where \( U = U_0 \), for \( x \in S \) and \( A \in \mathscr{S} \) is another interchange of integrals: \[ U(x, A) = \int_0^\infty P_t(x, A) \, dt = \int_0^\infty \E[\bs{1}(X_t \in A) \mid X_0 = x] \, dt = \E\left( \int_0^\infty \bs{1}(X_t \in A) \, dt \biggm| X_0 = x\right) \] The inside integral is the Lebesgue measure of \( \{t \in [0, \infty): X_t \in A\} \).
As with discrete time, it's quite possible that \( U(x, A) = \infty \) for some \( x \in S \) and \( A \in \mathscr{S} \), and knowing when this is the case is of considerable interest. As with all kernels, the potential kernel \( U_\alpha \) defines two operators, operating on the right on functions, and operating on the left on positive measures. If \( f: S \to \R \) is measurable then, giving the right potential operator in its many forms, \begin{align*} U_\alpha f(x) & = \int_S U_\alpha(x, dy) f(y) = \int_0^\infty e^{-\alpha t} P_t f(x) \, dt \\ & = \int_0^\infty e^{-\alpha t} \int_S P_t(x, dy) f(y) = \int_0^\infty e^{-\alpha t} \E[f(X_t) \mid X_0 = x] \, dt, \quad x \in S \end{align*} assuming that the various integrals make sense. This will be the case in particular if \( f \) is nonnegative, or if \( f \in \mathscr{B} \) and \( \alpha \gt 0 \).
If \( \alpha \gt 0 \), then \( U_\alpha(x, S) = \frac{1}{\alpha} \) for all \( x \in S \).
For \( x \in S \), \[ U_\alpha(x, S) = \int_0^\infty e^{-\alpha t} P_t(x, S) \, dt = \int_0^\infty e^{-\alpha t} dt = \frac{1}{\alpha} \]
It follows that for \( \alpha \in (0, \infty) \), the right potential operator \( U_\alpha \) is a bounded, linear operator on \( \mathscr{B} \) with \( \|U_\alpha\| = \frac{1}{\alpha} \). It also follows that \( \alpha U_\alpha \) is a probability kernel. This kernel has a nice interpretation.
If \( \alpha \gt 0 \) then \( \alpha U_\alpha (x, \cdot) \) is the conditional distribution of \( X_\tau \) given \( X_0 = x \in S \), where \( \tau \) is independent of \( \bs{X} \) and has the exponential distribution on \( [0, \infty) \) with parameter \( \alpha \).
Suppose that \( x \in S \) and \( A \in \mathscr{S} \). The random time \( \tau \) has PDF \( f(t) = \alpha e^{-\alpha t} \) for \( t \in [0, \infty) \). Hence, conditioning on \( \tau \) gives \[ \P(X_\tau \in A \mid X_0 = x) = \int_0^\infty \alpha e^{-\alpha t} \P(X_\tau \in A \mid \tau = t, X_0 = x) \, dt \] But by the substitution rule and the assumption of independence, \[ \P(X_\tau \in A \mid \tau = t, X_0 = x) = \P(X_t \in A \mid \tau = t, X_0 = x) = \P(X_t \in A \mid X_0 = x) = P_t(x, A) \] Substituting gives \[ \P(X_\tau \in A \mid X_0 = x) = \int_0^\infty \alpha e^{-\alpha t} P_t(x, A) \, dt = \alpha U_\alpha(x, A)\]
So \( \alpha U_\alpha \) is a transition probability kernel, just as \( P_t \) is a transition probability kernel, but corresponding to the random time \( \tau \) (with \( \alpha \in (0, \infty) \) as a parameter), rather than the deterministic time \( t \in [0, \infty) \). As in the discrete case, the potential kernel can also be interpreted in economic terms. Suppose that \( A \in \mathscr{S} \) and that we receive money at a rate of one unit per unit time whenever the process \( \bs{X} \) is in \( A \). Then \( U(x, A) \) is the expected total amount of money that we receive, starting in state \( x \in S \). But again, money that we receive later is of less value to us now than money that we will receive sooner. Specifically, suppose that one monetary unit received at time \( t \in [0, \infty) \) has a present value of \( e^{-\alpha t} \), where \( \alpha \in (0, \infty) \) is the inflation factor or discount factor. Then \( U_\alpha(x, A) \) is the total, expected, discounted amount that we receive, starting in \( x \in S \). A bit more generally, suppose that \( f \in \mathscr{B} \) and that \( f(x) \) is the reward (or cost, depending on the sign) per unit time that we receive when the process is in state \( x \in S \). Then \( U_\alpha f(x) \) is the expected, total, discounted reward, starting in state \( x \in S \).
For the left potential operator, if \( \mu \) is a positive measure on \( \mathscr{S} \) then \begin{align*} \mu U_\alpha(A) & = \int_S \mu(dx) U_\alpha(x, A) = \int_0^\infty e^{-\alpha t} \mu P_t (A) \, dt\\ & = \int_0^\infty e^{-\alpha t} \left[\int_S \mu(dx) P_t(x, A)\right] dt = \int_0^\infty e^{-\alpha t} \left[\int_S \mu(dx) \P(X_t \in A \mid X_0 = x) \right] dt, \quad A \in \mathscr{S} \end{align*} In particular, suppose that \( \alpha \gt 0 \) and that \( \mu \) is a probability measure and \( X_0 \) has distribution \( \mu \). Then \( \mu P_t \) is the distribution of \( X_t \) for \( t \in [0, \infty) \), and hence from the last result, \( \alpha \mu U_\alpha \) is the distribution of \( X_\tau \), where again, \( \tau \) is independent of \( \bs{X} \) and has the exponential distribution on \( [0, \infty) \) with parameter \( \alpha \). The family of potential kernels gives the same information as the family of transition kernels.
The resolvent \(\bs{U} = \{U_\alpha: \alpha \in (0, \infty)\} \) completely determines the family of transition kernels \( \bs{P} = \{P_t: t \in (0, \infty)\} \).
Note that for \( x \in S \) and \( A \in \mathscr{S} \), the function \( \alpha \mapsto U_\alpha(x, A) \) on \( (0, \infty) \) is the Laplace transform of the function \( t \mapsto P_t(x, A) \) on \( [0, \infty) \). The Laplace transform of a function determines the function completely.
It follows that the resolvent \( \bs{U} = \{U_\alpha: \alpha \in (0, \infty)\} \), along with the initial distribution, completely determines the finite dimensional distributions of the Markov process \( \bs{X} \). This is much more important here in the continuous-time case than in the discrete-time case, since the transition kernels \( P_t \) cannot be generated from a single transition kernel. The potential kernels commute with each other and with the transition kernels.
Suppose that \( \alpha, \, \beta, \, t \in [0, \infty) \). Then (as kernels), (a) \( P_t U_\alpha = U_\alpha P_t = \int_0^\infty e^{-\alpha s} P_{s + t} \, ds \), and (b) \( U_\alpha U_\beta = U_\beta U_\alpha = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_{s + t} \, dt \, ds \).
Suppose that \( f \in \mathscr{B} \) is nonnegative. The interchanges of operators and integrals below are interchanges of integrals, and are justified since the integrands are nonnegative. The other tool used is the semigroup property of \( \bs{P} = \{P_t: t \in [0, \infty)\} \).
The same identities hold for the right operators on the entire space \( \mathscr{B} \) under the additional restriction that \( \alpha \gt 0 \) and \( \beta \gt 0 \). The fundamental equation that relates the potential kernels, known as the resolvent equation, is given in the next theorem:
If \( \alpha, \, \beta \in [0, \infty) \) with \( \alpha \le \beta \) then (as kernels) \( U_\alpha = U_\beta + (\beta - \alpha) U_\alpha U_\beta \).
If \( \alpha = \beta \) the equation is trivial, so assume \( \alpha \lt \beta \). Suppose that \( f \in \mathscr{B} \) is nonnegative. From the previous result, \[ U_\alpha U_\beta f = \int_0^\infty \int_0^\infty e^{-\alpha s} e^{-\beta t} P_{s + t} f \, dt \, ds \] The transformation \( u = s + t, \, v = s \) maps \( [0, \infty)^2 \) one-to-one onto \( \{(u, v) \in [0, \infty)^2: u \ge v\} \). The inverse transformation is \( s = v, \, t = u - v \) with Jacobian \( -1 \). Hence we have \begin{align*} U_\alpha U_\beta f & = \int_0^\infty \int_0^u e^{-\alpha v} e^{-\beta(u - v)} P_u f \, dv \, du = \int_0^\infty \left(\int_0^u e^{(\beta - \alpha) v} dv\right) e^{-\beta u} P_u f \, du \\ & = \frac{1}{\beta - \alpha} \int_0^\infty \left[e^{(\beta - \alpha) u} - 1\right] e^{-\beta u} P_u f du\\ & = \frac{1}{\beta - \alpha}\left(\int_0^\infty e^{-\alpha u} P_u f \, du - \int_0^\infty e^{-\beta u} P_u f \, du\right) = \frac{1}{\beta - \alpha}\left(U_\alpha f - U_\beta f\right) \end{align*} Simplifying gives the result. Note that \( U_\beta f \) is finite since \( \beta \gt 0 \).
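The resolvent equation can also be checked concretely. For a hypothetical two-state chain in continuous time with jump rates \( a \) (from state 0 to 1) and \( b \) (from 1 to 0), the transition semigroup has the standard closed form \( P_t = \Pi + e^{-(a + b) t}(I - \Pi) \), where \( \Pi \) is the matrix with the stationary distribution \( (b/(a+b), a/(a+b)) \) in each row, so the integral defining \( U_\alpha \) can be evaluated exactly:

```python
import math

a, b = 1.0, 2.0        # hypothetical jump rates 0 -> 1 and 1 -> 0
c = a + b
PI = [[b / c, a / c],  # each row is the stationary distribution
      [b / c, a / c]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def U(alpha):
    """Exact potential kernel: integral of e^{-alpha t} [PI + e^{-c t}(I - PI)] dt
    = PI / alpha + (I - PI) / (alpha + c)."""
    return [[PI[i][j] / alpha + (I2[i][j] - PI[i][j]) / (alpha + c)
             for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

alpha, beta = 0.5, 1.5
Ua, Ub = U(alpha), U(beta)
UaUb = mat_mul(Ua, Ub)
# Resolvent equation: U_alpha = U_beta + (beta - alpha) U_alpha U_beta
rhs = [[Ub[i][j] + (beta - alpha) * UaUb[i][j] for j in range(2)] for i in range(2)]
# Row sums give U_alpha(x, S) = 1 / alpha.
print(Ua, rhs, [sum(row) for row in Ua])
```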
The same identity holds for the right potential operators on the entire space \( \mathscr{B} \), under the additional restriction that \( \alpha \gt 0 \). For \( \alpha \in (0, \infty) \), \( U_\alpha \) is also an operator on the space \( \mathscr{C}_0 \).
If \( \alpha \in (0, \infty) \) and \( f \in \mathscr{C}_0 \) then \( U_\alpha f \in \mathscr{C}_0 \).
Suppose that \( f \in \mathscr{C}_0 \) and that \( (x_1, x_2, \ldots) \) is a sequence in \( S \). Then \( P_t f \in \mathscr{C}_0 \) for \( t \in [0, \infty) \). Hence if \( x_n \to x \in S \) as \( n \to \infty \) then \( e^{-\alpha t} P_t f(x_n) \to e^{-\alpha t} P_t f(x) \) as \( n \to \infty \) for each \( t \in [0, \infty) \). By the dominated convergence theorem, \[ U_\alpha f(x_n) = \int_0^\infty e^{-\alpha t} P_t f(x_n) \, dt \to \int_0^\infty e^{-\alpha t} P_t f(x) \, dt = U_\alpha f(x) \text{ as } n \to \infty \] Hence \( U_\alpha f \) is continuous. Next suppose that \( x_n \to \infty \) as \( n \to \infty \). This means that for every compact \( C \subseteq S\), there exists \( m \in \N_+ \) such that \( x_n \notin C \) for \( n \gt m \). Then \( e^{-\alpha t} P_t f(x_n) \to 0 \) as \( n \to \infty \) for each \( t \in [0, \infty) \). Again by the dominated convergence theorem, \[ U_\alpha f(x_n) = \int_0^\infty e^{-\alpha t} P_t f(x_n) \, dt \to 0 \text{ as } n \to \infty \] So \( U_\alpha f \in \mathscr{C}_0 \).
If \( f \in \mathscr{C}_0 \) then \( \alpha U_\alpha f \to f \) as \( \alpha \to \infty \).
Convergence is with respect to the supremum norm on \( \mathscr{C}_0 \), of course. Suppose that \( f \in \mathscr{C}_0 \). Note first that with a change of variables \( s = \alpha t \), \[ \alpha U_\alpha f = \int_0^\infty \alpha e^{-\alpha t} P_t f \, dt = \int_0^\infty e^{-s} P_{s/\alpha} f \, ds \] and hence \[ \left|\alpha U_\alpha f - f\right| = \left|\int_0^\infty e^{-s} \left(P_{s/\alpha} f - f\right) ds\right| \le \int_0^\infty e^{-s} \left|P_{s/\alpha} f - f\right| \, ds \le \int_0^\infty e^{-s} \left\|P_{s/\alpha} f - f\right\| \, ds \] So it follows that \[ \left\|\alpha U_\alpha f - f\right\| \le \int_0^\infty e^{-s} \left\|P_{s/\alpha} f - f\right\| \, ds\] But \( \left\|P_{s/\alpha} f - f\right\| \to 0 \) as \( \alpha \to \infty \) and hence by the dominated convergence theorem, \( \int_0^\infty e^{-s} \left\|P_{s/\alpha} f - f\right\| \, ds \to 0 \) as \( \alpha \to \infty \).
In continuous time, it's not at all clear how we could construct a Markov process with desired properties, say to model a real system of some sort. Stated mathematically, the existential problem is how to construct the family of transition kernels \( \{P_t: t \in [0, \infty)\} \) so that the semigroup property \(P_s P_t = P_{s + t}\) is satisfied for all \( s, \, t \in [0, \infty) \). The answer, as for similar problems in the deterministic world, comes essentially from calculus, from a type of derivative.
The infinitesimal generator of the Markov process \( \bs{X} \) is the operator \( G: \mathscr{D} \to \mathscr{C}_0 \) defined by \[ G f = \lim_{t \downarrow 0} \frac{P_t f - f}{t} \] on the domain \( \mathscr{D} \subseteq \mathscr{C}_0 \) for which the limit exists.
As usual, the limit is with respect to the supremum norm on \( \mathscr{C}_0 \), so \( f \in \mathscr{D} \) and \( G f = g \) means that \( f, \, g \in \mathscr{C}_0 \) and \[ \left\|\frac{P_t f - f}{t} - g \right\| = \sup\left\{\left| \frac{P_t f(x) - f(x)}{t} - g(x) \right|: x \in S \right\} \to 0 \text{ as } t \downarrow 0 \] So in particular, \[ G f(x) = \lim_{t \downarrow 0} \frac{P_t f(x) - f(x)}{t} = \lim_{t \downarrow 0} \frac{\E[f(X_t) \mid X_0 = x] - f(x)}{t}, \quad x \in S \]
The domain \( \mathscr{D} \) is a subspace of \( \mathscr{C}_0 \) and the generator \( G \) is a linear operator on \( \mathscr{D} \).
These are simple results that depend on the linearity of \( P_t \) for \( t \in [0, \infty) \) and basic results on convergence.
Note that \( G \) is the (right) derivative at \( 0 \) of the function \( t \mapsto P_t f \). Because of the semigroup property, differentiability at \( 0 \) implies differentiability at arbitrary \( t \in [0, \infty) \). Moreover, the infinitesimal operator and the transition operators commute:
If \( f \in \mathscr{D} \) and \( t \in [0, \infty) \), then \( P_t f \in \mathscr{D} \), and the following derivative rules hold with respect to the supremum norm: \[ \frac{d}{dt} P_t f = G P_t f = P_t G f \]
Let \( f \in \mathscr{D} \). All limits and statements about derivatives and continuity are with respect to the supremum norm.
The last result gives a possible solution to the dilemma that motivated this discussion in the first place. If we want to construct a Markov process with desired properties, to model a real system for example, we can start by constructing an appropriate generator \( G \) and then solve the initial value problem \[P^\prime_t = G P_t, \quad P_0 = I \] to obtain the transition operators \( \bs{P} = \{P_t: t \in [0, \infty)\} \). The next theorem gives the relationship between the potential operators and the infinitesimal operator, which in some ways is better. This relationship is analogous to the relationship between the potential operators and the one-step operator in discrete time.
Suppose \( \alpha \in (0, \infty) \). If \( f \in \mathscr{C}_0 \) then \( U_\alpha f \in \mathscr{D} \), and \( G U_\alpha f = \alpha U_\alpha f - f \).
For \( \alpha \gt 0 \), the operators \( U_\alpha \) and \( G \) have an inverse relationship.
Suppose again that \( \alpha \in (0, \infty) \). If \( f \in \mathscr{D} \) then \( U_\alpha G f = \alpha U_\alpha f - f \).
Recall that \( U_\alpha: \mathscr{C}_0 \to \mathscr{D} \) and \( G: \mathscr{D} \to \mathscr{C}_0 \).
So, from the generator \( G \) we can determine the potential operators \( \bs{U} = \{U_\alpha: \alpha \in (0, \infty)\} \), which in turn determine the transition operators \( \bs{P} = \{P_t: t \in (0, \infty)\} \). In continuous time, transition operators \( \bs{P} = \{P_t: t \in [0, \infty)\} \) can be obtained from the single, infinitesimal operator \( G \) in a way that is reminiscent of the fact that in discrete time, the transition operators \( \bs{P} = \{P^n: n \in \N\} \) can be obtained from the single, one-step operator \( P \).
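When the state space is finite, the inverse relationship between \( U_\alpha \) and \( G \) is concrete: the generator acts as a rate matrix \( Q \), and \( U_\alpha = (\alpha I - Q)^{-1} \). The sketch below checks this for a hypothetical two-state chain with jump rates \( a \) and \( b \), using the standard closed form \( P_t = \Pi + e^{-(a + b) t}(I - \Pi) \) (with \( \Pi \) having the stationary distribution in each row) to compute \( U_\alpha \) exactly:

```python
import math

a, b = 1.0, 2.0          # hypothetical jump rates 0 -> 1 and 1 -> 0
c = a + b
Q = [[-a, a],            # generator (rate) matrix of the two-state chain
     [b, -b]]
PI = [[b / c, a / c], [b / c, a / c]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

alpha = 0.7
# Exact potential kernel from P_t = PI + e^{-c t}(I - PI).
U_alpha = [[PI[i][j] / alpha + (I2[i][j] - PI[i][j]) / (alpha + c)
            for j in range(2)] for i in range(2)]
# (alpha I - Q) U_alpha should be the identity, i.e. U_alpha = (alpha I - Q)^{-1}.
aImQ = [[alpha * I2[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
prod = mat_mul(aImQ, U_alpha)
print(prod)
```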
Our first example is essentially deterministic.
Consider the Markov process \( \bs{X} = \{X_t: t \in [0, \infty)\} \) on \( \R \) satisfying the ordinary differential equation \[ \frac{d}{dt} X_t = g(X_t), \quad t \in [0, \infty) \] where \( g: \R \to \R \) is Lipschitz continuous. The infinitesimal operator \( G \) is given by \( G f(x) = f^\prime(x) g(x)\) for \( x \in \R \) on the domain \( \mathscr{D} \) of functions \( f: \R \to \R \) where \( f \in \mathscr{C}_0\) and \(f^\prime \in \mathscr{C}_0 \).
Recall that the only source of randomness in this process is the initial state \( X_0 \). By the continuity assumptions on \( g \), there exists a unique solution \( X_t(x) \) to the differential equation with initial value \( X_0 = x \), defined for all \( t \in [0, \infty) \). The transition operator \( P_t \) for \( t \in [0, \infty) \) is defined on \( \mathscr{B} \) by \( P_t f(x) = f[X_t(x)] \) for \( x \in \R \). By the ordinary chain rule, if \( f \) is differentiable, \[ \frac{P_t f(x) - f(x)}{t} = \frac{f[X_t(x)] - f(x)}{t} \to f^\prime(x) g(x) \text{ as } t \downarrow 0 \]
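To make the chain-rule computation tangible, here is a small numerical check with a hypothetical drift \( g(x) = -x \) (so that \( X_t(x) = x e^{-t} \)) and \( f(x) = e^{-x^2} \), both satisfying the stated conditions: the difference quotient at a small time \( t \) is close to \( f^\prime(x) g(x) \).

```python
import math

def g(x):
    """Hypothetical drift: g(x) = -x, so the flow is X_t(x) = x e^{-t}."""
    return -x

def flow(t, x):
    """Solution of dX/dt = g(X) with X_0 = x."""
    return x * math.exp(-t)

def f(x):
    """A smooth function in C_0 with derivative in C_0."""
    return math.exp(-x * x)

def fprime(x):
    return -2.0 * x * math.exp(-x * x)

x, t = 1.5, 1e-6
diff_quot = (f(flow(t, x)) - f(x)) / t   # (P_t f(x) - f(x)) / t
generator = fprime(x) * g(x)             # G f(x) = f'(x) g(x)
print(diff_quot, generator)
```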
Our next example considers the Poisson process as a Markov process. Compare this with the binomial process considered earlier.
Let \( \bs{N} = \{N_t: t \in [0, \infty)\} \) denote the Poisson process on \( \N \) with rate \( \beta \in (0, \infty) \). Define the Markov process \( \bs{X} = \{X_t: t \in [0, \infty)\} \) by \( X_t = X_0 + N_t \) where \( X_0 \) takes values in \( \N \) and is independent of \( \bs{N} \).
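For this process, \( P_t f(x) = \E[f(x + N_t)] \) where \( N_t \) has the Poisson distribution with parameter \( \beta t \), and a standard computation (stated here as a claim to be checked numerically, not derived) gives the generator as \( G f(x) = \beta[f(x + 1) - f(x)] \). A sketch, with arbitrary choices of \( \beta \) and test function:

```python
import math

beta = 2.0  # rate of the Poisson process, chosen arbitrarily

def f(x):
    """A bounded test function on the nonnegative integers."""
    return 1.0 / (1.0 + x)

def P_t_f(t, x, terms=60):
    """P_t f(x) = E[f(x + N_t)] with N_t ~ Poisson(beta t), truncated series."""
    return sum(math.exp(-beta * t) * (beta * t) ** k / math.factorial(k) * f(x + k)
               for k in range(terms))

x, t = 3, 1e-6
diff_quot = (P_t_f(t, x) - f(x)) / t
generator = beta * (f(x + 1) - f(x))   # claimed: G f(x) = beta [f(x+1) - f(x)]
print(diff_quot, generator)
```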