\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\var}{\,\text{var}}\) \(\newcommand{\skw}{\,\text{skew}}\) \(\newcommand{\kur}{\,\text{kurt}}\)
  1. Random
  2. 4. Special Distributions
  3. The Wald Distribution

The Wald Distribution

The Wald distribution, named for Abraham Wald, is important in the study of Brownian motion. Specifically, the distribution governs the first time that a Brownian motion with positive drift hits a fixed, positive value. In Brownian motion, the distribution of the random position at a fixed time has a normal (Gaussian) distribution, and thus the Wald distribution, which governs the random time at a fixed position, is sometimes called the inverse Gaussian distribution.

The Basic Wald Distribution

Distribution Functions

As usual, let \( \Phi \) denote the standard normal distribution function.

Random variable \( U \) has the basic Wald distribution with shape parameter \( \lambda \in (0, \infty) \) if \( U \) has a continuous distribution on \( (0, \infty) \) with distribution function \( G \) given by \[ G(u) = \Phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] + e^{2 \lambda} \Phi\left[-\sqrt{\frac{\lambda}{u}} (u + 1)\right], \quad u \in (0, \infty) \]

Proof that \( G \) is a distribution function:

Note that as \( u \to \infty \), \( \sqrt{\frac{\lambda}{u}}(u - 1) \to \infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty \), and hence \( G(u) \to 1 \). As \( u \downarrow 0 \), \(\sqrt{\frac{\lambda}{u}}(u - 1) \to -\infty \) and \( -\sqrt{\frac{\lambda}{u}}(u + 1) \to -\infty\), and hence \( G(u) \to 0 \). Of course, \( G \) is clearly continuous on \( (0, \infty) \), so it remains to show that \( G \) is increasing on this interval. Differentiating gives \[ G^\prime(u) = \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] \left[\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} + u^{-3/2}\right)\right] + e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] \left[-\frac{\sqrt{\lambda}}{2}\left(u^{-1/2} - u^{-3/2}\right)\right] \] where \( \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2} \) is the standard normal PDF. Simple algebra shows that \[ \phi\left[\sqrt{\frac{\lambda}{u}}(u - 1)\right] = e^{2 \lambda} \phi\left[-\sqrt{\frac{\lambda}{u}}(u + 1)\right] = \frac{1}{\sqrt{2 \pi}} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right]\] so simplifying further gives \[ G^\prime(u) = \sqrt{\frac{\lambda}{2 \pi}} u^{-3/2} \exp\left[-\frac{\lambda}{2 u} (u - 1)^2\right] \gt 0, \quad u \in (0, \infty)\]

In the special case \( \lambda = 1 \), \( U \) has the standard Wald distribution

\( U \) has probability density function \( g \) given by \[ g(u) = \sqrt{\frac{\lambda}{2 \pi u^3}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right], \quad u \in (0, \infty) \]

  1. \( g \) increases and then decreases with mode \( u_0 = \sqrt{1 + \left(\frac{3}{2 \lambda}\right)^2} - \frac{3}{2 \lambda} \)
  2. \( g \) is concave upward then downward then upward again. The inflection points are very complicated functions of \( \lambda \).
Proof:

The formula for the PDF follows immediately from the proof of the distribution function above, since \( g = G^\prime \). The first order properties come from \[ g^\prime(u) = \sqrt{\frac{\lambda}{8 \pi u^7}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right] \left[\lambda(1 - u^2) - 3 u\right], \quad u \in (0, \infty) \] and the second order properties from \[ g^{\prime\prime}(u) = \sqrt{\frac{\lambda}{32 \pi u^{11}}} \exp\left[-\frac{\lambda}{2 u}(u - 1)^2\right]\left[15 u^2 + \lambda^2 (u^2 - 1)^2 + 2 \lambda u(3 u^2 - 5)\right], \quad u \in (0, \infty) \]

Thus \( g \) has the classic unimodal shape. For the mode, note that \( u_0 \downarrow 0 \) as \( \lambda \downarrow 0 \) and \( u_0 \uparrow 1 \) as \( \lambda \uparrow \infty \).

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the shape of the probability density function. For various values of the parameter, run the simulation 1000 times and compare the empirical density function to the probability density function.

The quantile function of the standard Wald distribution does not have a simple closed form, so the median and other quantiles must be approximated.

Open the special distribution calculator and select the Wald distribution. Vary the shape parameter and note the shape of the distribution and probability density functions. For selected values of the parameter, compute approximate values of the first quartile, the median, and the third quartile.

Moments

Suppose again that random variable \( U \) has the standard Wald distribution with shape parameter \( \lambda \in (0, \infty) \).

\( U \) has moment generating function \( m \) given by \[ m(t) = \E\left(e^{t U}\right) = \exp\left[\lambda \left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2} \]

Since the moment generating function is finite in an interval containing 0, the basic Wald distribution has moments of all orders.

The mean and variance of \( U \) are

  1. \( \E(U) = 1 \)
  2. \( \var(U) = \frac{1}{\lambda} \)
Proof:

Differentiating gives \begin{align} m^\prime(t) & = \exp\left[\lambda \left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right] \left(1 - \frac{2 t}{\lambda}\right)^{-1/2} \\ m^{\prime\prime}(t) &= \exp\left[\lambda \left(1 - \sqrt{1 - \frac{2 t}{\lambda}}\right)\right] \left[\left(1 - \frac{2 t}{\lambda}\right)^{-1} + \frac{1}{\lambda} \left(1 - \frac{2 t}{\lambda}\right)^{-3/2}\right] \end{align} and hence \( \E(U) = m^\prime(0) = 1 \) and \( \E\left(U^2\right) = m^{\prime\prime}(0) = 1 + \frac{1}{\lambda} \).

So interestingly, the mean is 1 for all values of the shape parameter, while \( \var(U) \to \infty \) as \( \lambda \downarrow 0 \) and \( \var(U) \to 0 \) as \( \lambda \to \infty \).

Open the special distribution simulator and select the Wald distribution. Vary the shape parameter and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( U \) are

  1. \( \skw(U) = 3 / \sqrt{\lambda} \)
  2. \( \kur(U) = 3 + 15 / \lambda \)

It follows that the excess kurtosis is \( \kur(U) - 3 = 15 / \lambda \). Note that \( \skw(U) \to \infty \) as \( \lambda \downarrow 0 \) and \( \skw(U) \to 0 \) as \( \lambda \to \infty \). Similarly, \( \kur(U) \to \infty \) as \( \lambda \downarrow 0 \) and \( \kur(U) \to 3 \) as \( \lambda \to \infty \).

The General Wald Distribution

The basic Wald distribution is generalized into a scale family.

Suppose that \( \lambda, \, \mu \in (0, \infty) \) and that \( U \) has the basic Wald distribution with shape parameter \( \lambda / \mu \). Then \( X = \mu U \) has the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \).

Justification for the name of the parameter \( \mu \) as the mean is given below. Note that the generalization is consistent—when \( \mu = 1 \) we have the basic Wald distribution with shape parameter \( \lambda \).

Distribution Functions

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \).

\( X \) has distribution function \( F \) given by \[ F(x) = \Phi\left[\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} - 1\right)\right] + \exp\left(\frac{2 \lambda}{\mu}\right) \Phi\left[-\sqrt{\frac{\lambda}{x}} \left(\frac{x}{\mu} + 1\right)\right], \quad x \in (0, \infty) \] where again, \( \Phi \) is the standard normal distribution function.

Proof:

Recall that the CDF \( F \) of \( X \) is related to the CDF \( G \) of \( U \) by \[ F(x) = G\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] so the result follows from the CDF above, with \( \lambda \) replaced by \( \lambda / \mu \), and \( x \) with \( x / \mu \).

\( X \) has probability density function \( f \) given by \[ f(x) = \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left[\frac{-\lambda (x - \mu)^2}{2 \mu^2 x}\right], \quad x \in (0, \infty) \]

  1. \( f \) increases and then decreases with mode \( x_0 = \mu\left[\sqrt{1 + \left(\frac{3 \mu}{2 \lambda}\right)^2} - \frac{3 \mu}{2 \lambda}\right] \)
  2. \( f \) is concave upward then downward then upward again. The inflection points are complicated functions of the parameters.
Proof:

Recall that the PDF \( f \) of \( X \) is related to the PDF \( g \) of \( U \) by \[ f(x) = \frac{1}{\mu} g\left(\frac{x}{\mu}\right), \quad x \in (0, \infty) \] Hence the result follows from the PDF above with \( \lambda \) replaced by \( \lambda / \mu \) and \( x \) with \( x / \mu \).

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the shape of the probability density function. For various values of the parameters, run the simulation 1000 times and compare the empirical density function to the probability density function.

Again, the quantile function cannot be expressed in a simple closed form, so the median and other quantiles must be approximated.

Open the special distribution calculator and select the Wald distribution. Vary the parameters and note the shape of the distribution and density functions. For selected values of the parameters, compute approximate values of the first quartile, the median, and the third quartile.

Moments

Suppose again that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \)

\( X \) has moment generating function \( M \) given by \[ M(t) = \exp\left[\frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right)\right], \quad t \lt \frac{\lambda}{2 \mu^2} \]

Proof:

Recall that the MGF \( M \) of \( X \) is related to the MGF \( m \) of \( U \) by \( M(t) = m(t \mu) \). Hence the result follows from the result above with \( \lambda \) replaced by \( \lambda / \mu \) and \( t \) with \( t \mu \).

As promised, the parameter \( \mu \) is the mean of Wald distribution.

The mean and variance of \( X \) are

  1. \( \E(X) = \mu \)
  2. \( \var(X) = \mu^3 / \lambda \)
Proof:

From the results above and basic properties of expected value and variance, we have \( \E(X) = \mu \E(U) = \mu \cdot 1 \) and \( \var(X) = \mu^2 \var(U) = \mu^2 \frac{\mu}{\lambda} \).

Open the special distribution simulator and select the Wald distribution. Vary the parameters and note the size and location of the mean \( \pm \) standard deviation bar. For various values of the parameters, run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

The skewness and kurtosis of \( X \) are

  1. \( \skw(X) = 3 \sqrt{\mu / \lambda} \)
  2. \( \kur(U) = 3 + 15 \mu / \lambda \)
Proof:

Skewness and kurtosis are invariant under scale transformations, so \( \skw(X) = \skw(U) \) and \( \kur(X) = \kur(U) \). The results then follow from the results above, with \( \lambda \) replaced by \( \lambda / \mu \).

Related Distribution

As noted earlier, the Wald distribution is a scale family, although neither of the parameters is a scale parameter.

Suppose that \( X \) has the Wald distribution with shape parameters \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \) and that \( c \in (0, \infty) \). Then \( Y = c X \) has the Wald distribution with shape parameter \( c \lambda \) and mean \( c \mu \).

Proof:

The PDF \( h \) of \( Y \) is related to the PDF \( f \) of \( X \), given above, by \[ h(x) = \frac{1}{c} f\left(\frac{x}{c}\right), \quad x \in (0, \infty) \] Substituting and simplifying gives the result.

For the next result, it's helpful to re-parameterize the Wald distribution with the mean \( \mu \) and the ratio \( r = \lambda / \mu^2 \in (0, \infty) \). This parametrization is clearly equivalent, since we can recover the shape parameter from the mean and ratio as \( \lambda = r \mu^2 \). Note also that \( r = \E(X) \big/ \var(X) \), the ratio of the mean to the variance. Finally, note that the moment generating function above becomes \[ M(t) = \exp\left[r \mu \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] and of course, this function characterizes the Wald distribution with this parametrization. Our next result is that the Wald distribution is closed under convolution (corresponding to sums of independent variables) when the ratio is fixed.

Suppose that \( X_1 \) has the Wald distribution with mean \( \mu_1 \in (0, \infty) \) and ratio \( r \in (0, \infty) \); \( X_2 \) has the Wald distribution with mean \( \mu_2 \in (0, \infty) \) and ratio \( r \); and that \( X_1 \) and \( X_2 \) are independent. Then \( Y = X_1 + X_2 \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

Proof:

For \( i \in \{1, 2\} \), the MGF of \( X_i \) is \[ M_i(t) = \exp\left[r \mu_i \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence the MGF of \( Y = X_i + X_2 \) is \[ M(t) = M_1(t) M_2(t) = \exp\left[r\left(\mu_1 + \mu_2\right) \left(1 - \sqrt{1 - \frac{2}{r}t}\right)\right], \quad t \lt \frac{r}{2} \] Hence \( Y \) has the Wald distribution with mean \( \mu_1 + \mu_2 \) and ratio \( r \).

In the previous result, note that the shape parameter of \( X_1 \) is \(r \mu_1^2\), the shape parameter of \( X_2 \) is \( r \mu_2^2 \), and the shape parameter of \( Y \) is \( \lambda = r (\mu_1 + \mu_2)^2 \). Also, of course, the result generalizes to a sum of any finite number of independent Wald variables, as long as the ratio is fixed. The next couple of results are simple corollaries.

Suppose that \( (X_1, X_2, \ldots, X_n) \) is a sequence of independent variables, each with the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty) \). Then

  1. \( Y_n = \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n^2 \lambda \) and mean \( n \mu \).
  2. \( M_n = \frac{1}{n} \sum_{i=1}^n X_i \) has the Wald distribution with shape parameter \( n \lambda \) and mean \( \mu \).
Proof:
  1. This follows from the previous result and induction. The mean of \( Y_n \) of course is \( n \mu \). The common ratio is \( r = \lambda / \mu^2 \), and hence the shape parameter of \( Y_n \) is \( (\lambda / \mu^2) (n \mu)^2 = n^2 \lambda \).
  2. This follows from (a) and the scaling result above. The mean of \( M_n \) of course is \( \mu \) and the shape parameter is \( (1 / n) (n^2 \lambda) = n \lambda \).

In the context of the previous theorem, \( (X_1, X_2, \ldots, X_n) \) is a random sample of size \( n \) from the Wald distribution, and \( \frac{1}{n} \sum_{i=1}^n X_i \) is the sample mean. The Wald distribution is infinitely divisible:

Suppose that \( X \) has the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \in (0, \infty)\). For every \( n \in \N_+ \), \( X \) has the same distribution as \( \sum_{i=1}^n X_i \) where \( (X_1, X_2, \ldots, X_n) \) are independent, and each has the Wald distribution with shape parameter \( \lambda / n^2 \) and mean \( \mu / n \).

The Lévy distribution is related to the Wald distribution, not surprising since the Lévy distribution governs the first time that a standard Brownian motion hits a fixed positive value.

For fixed \( \lambda \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \) and mean \( \mu \in (0, \infty) \) converges to the Lévy distribution with location parameter 0 and scale parameter \(\lambda \) as \( \mu \to \infty \).

Proof:

From the formula for the CDF above, note that \[ F(x) \to \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) + \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2 \Phi\left(-\sqrt{\frac{\lambda}{x}}\right) = 2\left[1 - \Phi\left(\sqrt{\frac{\lambda}{x}}\right)\right] \text{ as } \mu \to \infty \] But the last expression is the distribution function of the Lévy distribution with location parameter 0 and shape parameter \( \lambda \).

The other limiting distribution, this time with the mean fixed, is also interesting.

For fixed \( \mu \in (0, \infty) \), the Wald distribution with shape parameter \( \lambda \in (0, \infty) \) and mean \( \mu \) converges to the normal distribution with mean \( \mu \) and variance 1 as \( \lambda \to \infty \).

Proof:

This time, it's better to use the moment generating function \( M \) above. By rationalizing we see that for fixed \( \mu \in (0, \infty)\) and \( t \in \R \) \[ \frac{\lambda}{\mu} \left(1 - \sqrt{1 - \frac{2 \mu^2 t}{\lambda}}\right) = \frac{2 \mu t}{1 + \sqrt{1 - 2 \mu^2 t / \lambda}} \to \mu t \text{ as } \lambda \to \infty\] Hence \( M(t) \to e^{\mu t} \) as \( \lambda \to \infty \) and the limit is the MGF of the normal distribution with mean \( \mu \) and variance 1.

Finally, the Wald distribution is a member of the general exponential family of distributions.

The Wald distribution is a general exponential distribution with natural parameters \( -\lambda / (2 \mu^2) \) and \( -\lambda / 2 \), and natural statistics \( X \) and \( 1 / X \).

Proof:

This follows from the PDF \( f \) above. If we expand the square and simplify, we can write \( f \) in the form \[ f(x) = \sqrt{\frac{\lambda}{2 \pi}} \exp\left(\frac{\lambda}{2 \mu}\right) x^{-3/2} \exp\left(-\frac{\lambda}{2 \mu^2} x - \frac{\lambda}{2} \frac{1}{x}\right), \quad x \in (0, \infty) \]