\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\var}{\,\text{var}}\) \(\newcommand{\skw}{\,\text{skew}}\) \(\newcommand{\kur}{\,\text{kurt}}\)
  1. Random
  2. 4. Special Distributions
  3. The Wald Distribution

The Wald Distribution

The Wald distribution, named for Abraham Wald, is important in the study of Brownian motion. Specifically, the distribution governs the first time that a Brownian motion with positive drift hits a fixed, positive value. In Brownian motion, the distribution of the random position at a fixed time has a normal (Gaussian) distribution, and thus the Wald distribution, which governs the random time at a fixed position, is sometimes called the inverse Gaussian distribution.

The Basic Wald Distribution

Distribution Functions

As usual, let \( \Phi \) denote the standard normal distribution function. This function is often considered a basic, special function in mathematics.

The basic Wald distribution with shape parameter \( \lambda \in (0, \infty) \) is a continuous distribution on \( (0, \infty) \) with distribution function \( G \) given by \[ G(u) = \Phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] + e^{2 \lambda} \Phi\left[-\sqrt{\frac{\lambda}{u}} (u + 1)\right], \quad u \in (0, \infty) \] The special case \( \lambda = 1 \) gives the standard Wald distribution.

Details:

Note that as \( u \to \infty \), \( \sqrt{\frac{\lambda}{u}}(u - 1) \to \infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty \), and hence \( G(u) \to 1 \). As \( u \downarrow 0 \), \(\sqrt{\frac{\lambda}{u}}(u - 1) \to -\infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty\), and hence \( G(u) \to 0 \). Of course, \( G \) is clearly continuous on \( (0, \infty) \), so it remains to show that \( G \) is increasing on this interval. Differentiating gives \[ G^\prime(u) = \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] \left[\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} + u^{-3/2}\right)\right] + e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] \left[-\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} - u^{-3/2}\right)\right] \] where \( \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2} \) is the standard normal PDF. Simple algebra shows that \[ \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] = e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] = \frac{1}{\sqrt{2 \pi}} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right]\] so simplifying further gives \[ G^\prime(u) = \sqrt{\frac{\lambda}{2 \pi}} u^{-3/2} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right] \gt 0, \quad u \in (0, \infty)\]

The probability density function \( g \) is given by \[ g(u) = \sqrt{\frac{\lambda}{2 \pi u^3}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right], \quad u \in (0, \infty) \]

  1. \( g \) increases and then decreases with mode \( u_0 = \sqrt{1 + \left(\frac{3}{2 \lambda}\right)^2} - \frac{3}{2 \lambda} \)
  2. \( g \) is concave upward then downward then upward again.
Details:

The formula for the PDF follows immediately from the details of the CDF in , since \( g = G^\prime \). The first order properties come from \[ g^\prime(u) = \sqrt{\frac{\lambda}{8 \pi u^7}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right] \left[\lambda(1 - u^2) - 3 u\right], \quad u \in (0, \infty) \] and the second order properties from \[ g^{\prime\prime}(u) = \sqrt{\frac{\lambda}{32 \pi u^{11}}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right]\left[15 u^2 + \lambda^2 (u^2 - 1)^2 + 2 \lambda u(3 u^2 - 5)\right], \quad u \in (0, \infty) \]

So \( g \) has the classic unimodal shape, but the inflection points are very complicated functions of \( \lambda \). For the mode, note that \( u_0 \downarrow 0 \) as \( \lambda \downarrow 0 \) and \( u_0 \uparrow 1 \) as \( \lambda \uparrow \infty \). The probability density function of the standard Wald distribution is \[ g(u) = \sqrt{\frac{1}{2 \pi u^3}} \exp\left[-\frac{1}{2 u}(u - 1)^2\right], \quad u \in (0, \infty) \]

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the shape of the probability density function. For various values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.

The quantile function of the basic Wald distribution does not have a simple closed form, so the median and other quantiles must be approximated.

Open the quantile app and select the Wald distribution. Vary the shape parameter and note the shape of the distribution and probability density functions. For selected values of the parameter, compute approximate values of the first quartile, the median, and the third quartile.

Moments

Suppose that random variable \( U \) has the basic Wald distribution with shape parameter \( \lambda \in (0, \infty) \).

\( U \) has moment generating function \( m \) given by \[ m(t) = \E\left(e^{t U}\right) = \exp\left[\lambda \left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2} \]

Details:

The proof requires some facts about the modified Bessel function of the second kind, denoted \( K_\alpha \) where the parameter \( \alpha \in \R \). This function is one of the two linearly independent solutions of the differential equation \[ x^2 \frac{d^2 y}{d x^2} + x \frac{d y}{dx} - (x^2 + \alpha^2) y = 0 \] The other solution, appropriately enough, is the modified Bessel function of the first kind. The function of the second kind, the one that we care about here, is the solution that decays exponentially as \( x \to \infty \). The first fact we need is that \[ K_{-1/2}(x) = \frac{1}{\sqrt{x}} e^{-x}, \quad x \in (0, \infty) \] which you can verify be direct substitution into the differential equation. The second fact that we need is the identity \[ \int_0^\infty x^{p-1} \exp\left[-\frac{1}{2}\left(a x + \frac{b}{x}\right)\right] dx = \frac{2 K_p(\sqrt{a b})}{(a / b)^{p/2}}, \quad a, \, b \in (0, \infty); \; p \in \R \] Now, for the moment generating function of \( U \) we have \[m(t) = \int_0^\infty e^{t x} \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left[-\frac{\lambda}{2x}(x - 1)^2\right] dx\] Combining the exponentials and doing some algebra, we can rewrite this as \[ m(t) = \sqrt{\frac{\lambda}{2 \pi}} e^\lambda \int_0^\infty x^{-3/2} \exp\left[-\frac{1}{2}(\lambda - 2 t) x - \frac{1}{2} \frac{\lambda}{x}\right] dx \] The integral now has the form of the identity given above with \( p = -1/2 \), \( a = \lambda - 2 t \), and \( b = \lambda \). Hence we have \[ m(t) = \sqrt{\frac{\lambda}{2 \pi}} e^\lambda \frac{2 K_{-1/2}\left[\sqrt{\lambda (\lambda - 2 t)}\right]}{[(\lambda - 2 t) / \lambda]^{-1/4}} \] Using the explicit form of \( K_{-1/2} \) given above and doing more algebra we get \[ m(t) = \exp\left[\lambda - \sqrt{\lambda (\lambda - 2 t)}\right] = \exp\left[\lambda\left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right] \]

Since the moment generating function is finite in an interval containing 0, the basic Wald distribution has moments of all orders.

The mean and variance of \( U \) are

  1. \( \E(U) = 1 \)
  2. \( \var(U) = \frac{1}{\lambda} \)
Details:

Differentiating the moment generating function \(m\) in gives \begin{align} m^\prime(t) & = m(t) \left(1 - \frac{2 t}{\lambda}\right)^{-1/2} \\ m^{\prime\prime}(t) &= m(t) \left[\left(1 - \frac{2 t}{\lambda}\right)^{-1} + \frac{1}{\lambda} \left(1 - \frac{2 t}{\lambda}\right)^{-3/2}\right] \end{align} and hence \( \E(U) = m^\prime(0) = 1 \) and \( \E\left(U^2\right) = m^{\prime\prime}(0) = 1 + \frac{1}{\lambda} \).

So interestingly, the mean is 1 for all values of the shape parameter, while \( \var(U) \to \infty \) as \( \lambda \downarrow 0 \) and \( \var(U) \to 0 \) as \( \lambda \to \infty \).

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( U \) are

  1. \( \skw(U) = 3 / \sqrt{\lambda} \)
  2. \( \kur(U) = 3 + 15 / \lambda \)
Details:

The main tool is the differential equation for the moment generating function in that we used in computing the mean and variance in the details of : \[ m^\prime(t) = m(t) \left(1 - \frac{2 t}{\lambda}\right)^{-1/2} \] Using this recursively, we can find the first four moments of \( U \). We already know the first two: \( m^\prime(0) = \E(U) = 1 \), \( m^{\prime \prime}(0) = \E(U^2) = 1 + 1 / \lambda \). The third and fourth are \begin{align} m^{(3)}(0) = & E(U^3) = 1 + 3 / \lambda + 3 / \lambda^2 \\ m^{(4)}(0) = & E(U^4) = 1 + 6 / \lambda + 15 / \lambda^2 + 15 / \lambda^3 \end{align} The results then follow from the standard computational formulas for the skewness and kurtosis in terms of the moments.

It follows that the excess kurtosis is \( \kur(U) - 3 = 15 / \lambda \). Note that \( \skw(U) \to \infty \) as \( \lambda \to 0 \) and \( \skw(U) \to 0 \) as \( \lambda \to \infty \). Similarly, \( \kur(U) \to \infty \) as \( \lambda \to 0 \) and \( \kur(U) \to 3 \) as \( \lambda \to \infty \).

The General Wald Distribution

The basic Wald distribution is generalized into a scale family. Scale parameters often correspond to a change of units, and so are of basic importance.

Suppose that \( \lambda, \, \mu \in (0, \infty) \) and that \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \). Then \( X = \mu U \) has the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \).

Justification for the name of the parameter \( \mu \) as the mean is given in . Note that the generalization is consistent—when \( \mu = 1 \) we have the basic Wald distribution with shape parameter \( \lambda \).

Distribution Functions

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). Again, we let \( \Phi \) denote the standard normal distribution function.

\( X \) has distribution function \( F \) given by \[ F(x) = \Phi\left[\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} - 1\right)\right] + \exp\left(\frac{2 \lambda}{\mu}\right) \Phi\left[-\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} + 1\right)\right], \quad x \in (0, \infty) \]

Details:

Recall that the CDF \( F \) of \( X \) is related to the CDF \( G \) of \( U \) by \[ F(x) = G\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] so the result follows from the CDF in , with \( \lambda \) replaced by \( \lambda / \mu \), and \( x \) with \( x / \mu \).

\( X \) has probability density function \( f \) given by \[ f(x) = \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left[\frac{-\lambda (x - \mu)^2}{2 \mu^2 x}\right], \quad x \in (0, \infty)

  1. \( f \) increases and then decreases with mode \( x_0 = \mu\left[\sqrt{1 + \left(\frac{3 \mu}{2 \lambda}\right)^2} - \frac{3 \mu}{2 \lambda}\right] \)
  2. \( f \) is concave upward then downward then upward again.
Details:

Recall that the PDF \( f \) of \( X \) is related to the PDF \( g \) of \( U \) by \[ f(x) = \frac{1}{\mu} g\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] Hence the result follows from the PDF in with \( \lambda \) replaced by \( \lambda / \mu \) and \( x \) with \( x / \mu \).

Once again, the graph of \( f \) has the classic unimodal shape, but the inflection points are complicated functions of the parameters.

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the shape of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.

Again, the quantile function cannot be expressed in a simple closed form, so the median and other quantiles must be approximated.

Open the quantile app and select the Wald distribution. Vary the parameters and note the shape of the distribution and density functions. For selected values of the parameters, compute approximate values of the first quartile, the median, and the third quartile.

Moments

Suppose again that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). By definition , we can take \( X = \mu U \) where \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \).

\( X \) has moment generating function \( M \) given by \[ M(t) = \exp\left[\frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2 \mu^2} \]

Details:

Recall that the MGF \( M \) of \( X \) is related to the MGF \( m \) of \( U \) by \( M(t) = m(t \mu) \). Hence the result follows from the MFG in with \( \lambda \) replaced by \( \lambda / \mu \) and \( t \) with \( t \mu \).

As promised, the parameter \( \mu \) is the mean of Wald distribution.

The mean and variance of \( X \) are

  1. \( \E(X) = \mu \)
  2. \( \var(X) = \mu^3 / \lambda \)
Details:

From and basic properties of expected value and variance, we have \( \E(X) = \mu \E(U) = \mu \cdot 1 \) and \( \var(X) = \mu^2 \var(U) = \mu^2 \frac{\mu}{\lambda} \).

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( X \) are

  1. \( \skw(X) = 3 \sqrt{\mu / \lambda} \)
  2. \( \kur(U) = 3 + 15 \mu / \lambda \)
Details:

Skewness and kurtosis are invariant under scale transformations, so \( \skw(X) = \skw(U) \) and \( \kur(X) = \kur(U) \). The results then follow from the skewness and kurtosis in , with \( \lambda \) replaced by \( \lambda / \mu \).

Related Distribution

As noted earlier, the Wald distribution is a scale family, although neither of the parameters is a scale parameter.

Suppose that \( X \) has the Wald distribution with shape parameters \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \) and that \( c \in (0, \infty) \). Then \( Y = c X \) has the Wald distribution with shape parameter \( c \lambda \) and mean \( c \mu \).

Details:

By definition , we can take \( X = \mu U \) where \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \). Then \( Y = c X = (c \mu) U \). Since \( U \) has shape parameter \( c \lambda / c \mu \), the result follows from the definition.

For the next result, it's helpful to re-parameterize the Wald distribution with the mean \( \mu \) and the ratio \( r = \lambda / \mu^2 \in (0, \infty) \). This parametrization is clearly equivalent, since we can recover the shape parameter from the mean and ratio as \( \lambda = r \mu^2 \). Note also that \( r = \E(X) \big/ \var(X) \), the ratio of the mean to the variance. Finally, note that the moment generating function in becomes \[ M(t) = \exp\left[r \mu \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] and of course, this function characterizes the Wald distribution with this parametrization. Our next result is that the Wald distribution is closed under convolution (corresponding to sums of independent variables) when the ratio is fixed.

Suppose that \( X_1 \) has the Wald distribution with mean \( \mu_1 \in (0, \infty) \) and ratio \( r \in (0, \infty) \); \( X_2 \) has the Wald distribution with mean \( \mu_2 \in (0, \infty) \) and ratio \( r \); and that \( X_1 \) and \( X_2 \) are independent. Then \( Y = X_1 + X_2 \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

Details:

For \( i \in \{1, 2\} \), the MGF of \( X_i \) is \[ M_i(t) = \exp\left[r \mu_i \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence the MGF of \( Y = X_i + X_2 \) is \[ M(t) = M_1(t) M_2(t) = \exp\left[r\left(\mu_1 + \mu_2\right) \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence \( Y \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

In , note that the shape parameter of \( X_1 \) is \(r \mu_1^2\), the shape parameter of \( X_2 \) is \( r \mu_2^2 \), and the shape parameter of \( Y \) is \( \lambda = r (\mu_1 + \mu_2)^2 \). Also, of course, the result generalizes to a sum of any finite number of independent Wald variables, as long as the ratio is fixed. The next couple of results are simple corollaries.

Suppose that \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent variables, each with the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). Then

  1. \( Y_n = \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n^2 \lambda \) and mean \( n \mu \).
  2. \( M_n = \frac{1}{n} \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n \lambda \) and mean \( \mu \).
Details:
  1. This follows from and induction. The mean of \( Y_n \) of course is \( n \mu \). The common ratio is \( r = \lambda / \mu^2 \), and hence the shape parameter of \( Y_n \) is \( (\lambda / \mu^2) (n \mu)^2 = n^2 \lambda \).
  2. This follows from (a) and . The mean of \( M_n \) of course is \( \mu \) and the shape parameter is \( (1 / n) (n^2 \lambda) = n \lambda \).

In the context of , \( (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the Wald distribution, and \( \frac{1}{n} \sum_{i = 1}^n X_i \) is the sample mean. The Wald distribution is infinitely divisible:

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty)\). For every \( n \in \N_+ \), \( X \) has the same distribution as \( \sum_{i = 1}^n X_i \) where \( (X_1, X_2, \ldots, X_n) \) are independent, and each has the Wald distribution with shape parameter \( \lambda / n^2 \) and mean \( \mu / n \).

The Lévy distribution is related to the Wald distribution, not surprising since the Lévy distribution governs the first time that a standard Brownian motion hits a fixed positive value.

For fixed \( \lambda \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \in (0, \infty) \) converges to the Lévy distribution with location parameter 0 and scale parameter \(\lambda \) as \( \mu \to \infty \).

Details:

From the formula for the CDF in , note that \[ F(x) \to \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) + \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2 \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2\left[1 - \Phi\left(\sqrt{\frac{\lambda}{x}}\right)\right] \text{ as } \mu \to \infty \] But the last expression is the distribution function of the Lévy distribution with location parameter 0 and shape parameter \( \lambda \).

The other limiting distribution, this time with the mean fixed, is less interesting.

For fixed \( \mu \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \) converges to point mass at \( \mu \) as \( \lambda \to \infty \).

Details:

This time, it's better to use \( M \), the moment generating function in . By rationalizing we see that for fixed \( \mu \in (0, \infty)\) and \( t \in \R \) \[ \frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right) = \frac{2 \mu t}{1 + \sqrt{1 - 2 \mu^2 t / \lambda}} \to \mu t \text{ as } \lambda \to \infty\] Hence \( M(t) \to e^{\mu t} \) as \( \lambda \to \infty \) and the limit is the MGF of the constant random variable \(\mu\).

Finally, the Wald distribution is a member of the general exponential family of distributions.

The Wald distribution is a general exponential distribution with natural parameters \( -\lambda / (2 \mu^2) \) and \( -\lambda / 2 \), and natural statistics \( X \) and \( 1 / X \).

Details:

This follows from the PDF \( f \) in . If we expand the square and simplify, we can write \( f \) in the form \[ f(x) = \sqrt{\frac{\lambda}{2 \pi}} \exp\left(\frac{\lambda}{2 \mu}\right) x^{-3/2} \exp\left(-\frac{\lambda}{2 \mu^2} x - \frac{\lambda}{2} \frac{1}{x}\right), \quad x \in (0, \infty) \]