The Student t Distribution

In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of a standardized version of the sample mean when the underlying distribution is normal.

Basic Theory

Definition

Suppose that \(Z\) has the standard normal distribution, \(V\) has the chi-square distribution with \(n \in (0, \infty)\) degrees of freedom, and that \(Z\) and \(V\) are independent. Then the random variable \[ T = \frac{Z}{\sqrt{V / n}} \] has the student \(t\) distribution with \(n\) degrees of freedom.

The student \( t \) distribution is well defined for any \(n \gt 0\), but in usual practice, only positive integer values of \(n\) are of interest. This distribution was first studied by William Gosset, who published under the pseudonym Student.
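
The definition translates directly into a simulation. The following is a minimal sketch using only the Python standard library; the names `sample_t` and `sample_many` are illustrative and not part of the text. It assumes the standard fact that the chi-square distribution with \(n\) degrees of freedom is the gamma distribution with shape parameter \(n / 2\) and scale parameter 2, which also covers non-integer \(n\).

```python
import math
import random


def sample_t(n, rng):
    """Draw one value of T = Z / sqrt(V / n) for n > 0 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    # Chi-square(n) generated as gamma with shape n/2 and scale 2 (assumed fact).
    v = rng.gammavariate(n / 2.0, 2.0)
    return z / math.sqrt(v / n)


def sample_many(n, size, seed=0):
    """Return a list of `size` independent draws, reproducible via `seed`."""
    rng = random.Random(seed)
    return [sample_t(n, rng) for _ in range(size)]


if __name__ == "__main__":
    draws = sample_many(n=5, size=10_000)
    print("sample mean:", sum(draws) / len(draws))
```

Passing an explicit `random.Random` instance keeps the draws reproducible without touching the global generator.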

Distribution Functions

Recall that the gamma function \(\Gamma\) is the special function defined by \[ \Gamma(k) = \int_0^\infty x^{k-1} e^{-x} \, dx, \quad k \in (0, \infty) \]

Suppose that \(T\) has the \( t \) distribution with \( n \in (0, \infty) \) degrees of freedom. Then \( T \) has a continuous distribution on \( \R \) with probability density function \(f\) given by \[ f(t) = \frac{\Gamma[(n + 1) / 2]}{\sqrt{n \pi} \, \Gamma(n / 2)} \left( 1 + \frac{t^2}{n} \right)^{-(n + 1) / 2}, \quad t \in \R \]

Details:

For \( v \gt 0 \), the conditional distribution of \( T \) given \( V = v \) is normal with mean 0 and variance \( n / v \). By definition, \( V \) has the chi-square distribution with \( n \) degrees of freedom. Hence, the joint PDF of \( (T, V) \) is \[ g(t, v) = \sqrt{\frac{v}{2 \pi n}} e^{-v t^2 / (2 n)} \frac{1}{2^{n/2} \Gamma(n/2)} v^{n/2-1} e^{-v/2} = \frac{1}{2^{(n+1)/2} \sqrt{n \pi} \, \Gamma(n/2)} v^{(n+1)/2 - 1} e^{-v(1 + t^2/n)/2}, \quad t \in \R, \, v \in (0, \infty) \] The PDF of \( T \) is \[ f(t) = \int_0^\infty g(t, v) \, dv = \frac{1}{2^{(n+1)/2} \sqrt{n \pi} \, \Gamma(n/2)} \int_0^\infty v^{(n+1)/2 - 1} e^{-v(1 + t^2/n)/2} \, dv, \quad t \in \R \] Except for the missing normalizing constant, the integrand is the gamma PDF with shape parameter \( (n + 1)/2 \) and scale parameter \( 2 \big/ (1 + t^2/n) \). Hence \[ f(t) = \frac{1}{2^{(n+1)/2} \sqrt{n \pi} \, \Gamma(n/2)} \Gamma\left[(n + 1)/2\right] \left(\frac{2}{1 + t^2/n}\right)^{(n+1)/2}, \quad t \in \R\] Simplifying gives the result.
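
The last two steps can be spelled out as follows; this only makes the algebra above explicit. For \( a, \, b \in (0, \infty) \), the gamma integral is \[ \int_0^\infty v^{a - 1} e^{-v / b} \, dv = \Gamma(a) \, b^a \] Taking \( a = (n + 1) / 2 \) and \( b = 2 \big/ (1 + t^2 / n) \), the powers of 2 cancel: \[ f(t) = \frac{\Gamma[(n + 1) / 2]}{\sqrt{n \pi} \, \Gamma(n / 2)} \cdot \frac{2^{(n+1)/2}}{2^{(n+1)/2}} \left( 1 + \frac{t^2}{n} \right)^{-(n + 1) / 2} = \frac{\Gamma[(n + 1) / 2]}{\sqrt{n \pi} \, \Gamma(n / 2)} \left( 1 + \frac{t^2}{n} \right)^{-(n + 1) / 2}, \quad t \in \R \]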

The proof of the density formula provides a good way of thinking of the \(t\) distribution: the distribution arises when the variance of a mean 0 normal distribution is randomized in a certain way.
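
The density formula can be evaluated and checked numerically. The sketch below is one possible implementation, not the only one: `t_pdf` and `simpson` are hypothetical helper names, and working with `math.lgamma` on the log scale is an implementation choice to avoid overflow for large \(n\).

```python
import math


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def simpson(func, a, b, steps=20_000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    # The total mass on a wide interval should be close to 1.
    print(simpson(lambda t: t_pdf(t, 5), -100.0, 100.0))
```

Integrating over a wide but finite interval ignores a small amount of tail mass, so the result is only approximately 1, and the approximation is worse for small \(n\) where the tails are heavy.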

In the special distribution simulator, select the student \(t\) distribution. Vary \(n\) and note the shape of the probability density function. For selected values of \(n\), run the simulation 1000 times and compare the empirical density function to the true probability density function.

The student \(t\) probability density function \( f \) with \( n \in (0, \infty) \) degrees of freedom has the following properties:

\(f\) is symmetric about \(t = 0\).
\(f\) is increasing and then decreasing with mode \(t = 0\).
\(f\) is concave upward, then downward, then upward again with inflection points at \( \pm \sqrt{n / (n + 2)}\).
\(f(t) \to 0\) as \(t \to \infty\) and as \(t \to -\infty\).

In particular, the distribution is unimodal with mode and median at \(t = 0\). Note also that the inflection points converge to \( \pm 1 \) as \( n \to \infty \).
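
The location of the inflection points can be checked numerically. The sketch below (with hypothetical helper names) approximates \( f^{\prime\prime} \) by a central difference and bisects for its sign change on the positive half-line, so the output can be compared with the closed-form expression in the list above. The step size `h` and the search interval are assumptions chosen for illustration.

```python
import math


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def second_derivative(t, n, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (t_pdf(t + h, n) - 2 * t_pdf(t, n) + t_pdf(t - h, n)) / (h * h)


def positive_inflection(n, lo=1e-3, hi=5.0, tol=1e-10):
    """Bisect for the point where f'' changes from negative to positive."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if second_derivative(mid, n) < 0:
            lo = mid  # still concave downward: inflection lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (1, 5, 30, 1000):
        print(n, positive_inflection(n))
```

By symmetry, the negative inflection point is the mirror image of the value returned.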

The distribution function and the quantile function of the general \(t\) distribution do not have simple, closed-form representations. Approximate values of these functions can be obtained from most mathematical and statistical software packages.
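
As an illustration of how such approximate values might be computed, here is a minimal sketch using only the standard library. It is not how any particular software package works: `t_cdf` integrates the density with Simpson's rule (using the symmetry about 0), and `t_quantile` inverts it by bisection. All function names and the step counts are assumptions.

```python
import math


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def t_cdf(x, n, steps=2000):
    """P(T <= x): 1/2 plus the signed Simpson integral of the density from 0 to x."""
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, which makes the integral negative
    total = t_pdf(0.0, n) + t_pdf(x, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 + total * h / 3


def t_quantile(p, n, tol=1e-9):
    """Quantile of order p in (0, 1), found by bisection on t_cdf."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, n) > p:
        lo *= 2
    while t_cdf(hi, n) < p:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(t_quantile(0.25, 5), t_quantile(0.75, 5))
```

The fixed number of integration steps is adequate for moderate values of \(p\); for \(p\) very close to 0 or 1 with small \(n\), the bracketing interval becomes wide and a finer grid would be needed.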

In the quantile app, select the student \(t\) distribution. Vary the parameter and note the shape of the probability density, distribution, and quantile functions. In each of the following cases, find the first and third quartiles:

\(n = 2\)
\(n = 5\)
\(n = 10\)
\(n = 20\)

Moments

Suppose that \(T\) has a \(t\) distribution. The representation in the definition can be used to find the mean, variance and other moments of \(T\). The main point to remember in the proofs that follow is that since \( V \) has the chi-square distribution with \( n \) degrees of freedom, \( \E\left(V^k\right) = \infty \) if \( k \le -\frac{n}{2} \), while if \( k \gt -\frac{n}{2} \), \[ \E\left(V^k\right) = 2^k \frac{\Gamma(k + n / 2)}{\Gamma(n/2)} \]
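
The chi-square moment formula is easy to evaluate, including for the negative powers used in the proofs below. A minimal sketch, with the hypothetical name `chi_square_moment`:

```python
import math


def chi_square_moment(k, n):
    """E(V^k) for V chi-square with n > 0 degrees of freedom and real k."""
    if k <= -n / 2:
        return math.inf
    # 2^k * Gamma(k + n/2) / Gamma(n/2), computed on the log scale.
    return 2.0 ** k * math.exp(math.lgamma(k + n / 2) - math.lgamma(n / 2))


if __name__ == "__main__":
    n = 5
    print(chi_square_moment(1, n))     # mean of V
    print(chi_square_moment(-0.5, n))  # used for E(T)
    print(chi_square_moment(-1, n))    # used for E(T^2)
```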

Suppose that \(T\) has the \(t\) distribution with \(n \in (0, \infty)\) degrees of freedom. Then

\(\E(T)\) is undefined if \(0 \lt n \le 1\)
\(\E(T) = 0\) if \(1 \lt n \lt \infty\)

Details:

By independence, \( \E(T) = \sqrt{n} \E\left(V^{-1/2}\right) \E(Z) \). Of course \( \E(Z) = 0\). On the other hand, \( \E\left(V^{-1/2}\right) = \infty \) if \( n \le 1 \) and \( \E\left(V^{-1/2}\right) \lt \infty \) if \( n \gt 1 \).

Suppose again that \(T\) has the \(t\) distribution with \(n \in (0, \infty)\) degrees of freedom. Then

\(\var(T)\) is undefined if \(0 \lt n \le 1\)
\(\var(T) = \infty\) if \(1 \lt n \le 2\)
\(\var(T) = \frac{n}{n - 2}\) if \(2 \lt n \lt \infty\)

Details:

By independence, \( \E\left(T^2\right) = n \E\left(Z^2\right) \E\left(V^{-1}\right) \). Of course \( \E\left(Z^2\right) = 1 \). On the other hand, \( \E\left(V^{-1}\right) = \infty \) if \( n \le 2 \) and \( \E\left(V^{-1}\right) = 1 \big/ (n - 2) \) if \( n \gt 2 \). The results now follow from the previous result on the mean, since \( \var(T) = \E\left(T^2\right) - [\E(T)]^2 \).

In the special distribution simulator, select the student \(t\) distribution. Vary \(n\) and note the location and shape of the mean \( \pm \) standard deviation bar. For selected values of \(n\), run the simulation 1000 times and compare the empirical mean and standard deviation to the distribution mean and standard deviation.

Suppose again that \(T\) has the \(t\) distribution with \(n \in (0, \infty)\) degrees of freedom and \( k \in \N \). Then

\(\E\left(T^k\right)\) is undefined if \(k\) is odd and \(k \ge n\)
\(\E\left(T^k\right) = \infty\) if \(k\) is even and \(k \ge n\)
\(\E\left(T^k\right) = 0\) if \(k\) is odd and \(k \lt n\)
If \(k\) is even and \(k \lt n\) then \[ \E\left(T^k\right) = \frac{n^{k/2} 1 \cdot 3 \cdots (k - 1) \Gamma\left((n - k) \big/ 2\right)}{2^{k/2} \Gamma(n/2)} = \frac{n^{k/2} k! \Gamma\left((n - k)\big/2\right)}{2^k (k/2)! \Gamma(n/2)} \]

Details:

By independence, \( \E\left(T^k\right) = n^{k/2} \E\left(Z^k\right) \E\left(V^{-k/2}\right) \). Recall that \( \E\left(Z^k\right) = 0 \) if \( k \) is odd, while \[ \E\left(Z^k\right) = 1 \cdot 3 \cdots (k - 1) = \frac{k!}{(k/2)! 2^{k/2}} \] if \( k \) is even. Also, \( \E\left(V^{-k/2}\right) = \infty \) if \( k \ge n \), while \[ \E\left(V^{-k/2}\right) = \frac{2^{-k/2} \Gamma\left((n - k) \big/ 2\right)}{\Gamma(n/2)} \] if \( k \lt n \). The results now follow by considering the various cases.
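
The four cases of the moment result can be collected into one function. This is a sketch under the conventions stated in its docstring; the name `t_moment` and the use of `None` for an undefined moment are illustrative choices, not notation from the text.

```python
import math


def t_moment(k, n):
    """E(T^k) for a nonnegative integer k and n > 0 degrees of freedom.

    Returns math.inf for an infinite moment and None for an undefined one.
    """
    if k >= n:
        return math.inf if k % 2 == 0 else None
    if k % 2 == 1:
        return 0.0
    odd_product = math.prod(range(1, k, 2))  # 1 * 3 * ... * (k - 1); 1 if k == 0
    gamma_ratio = math.exp(math.lgamma((n - k) / 2) - math.lgamma(n / 2))
    return n ** (k / 2) * odd_product * gamma_ratio / 2 ** (k / 2)


if __name__ == "__main__":
    n = 10
    print(t_moment(2, n), n / (n - 2))  # the two values should agree
```

With \( k = 2 \) the function reproduces the variance \( n \big/ (n - 2) \) for \( n \gt 2 \).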

Suppose again that \( T \) has the \( t \) distribution with \( n \in (0, \infty) \) degrees of freedom. Then

\( \skw(T) = 0 \) if \( n \gt 3 \)
\( \kur(T) = 3 + \frac{6}{n - 4} \) if \( n \gt 4 \)

Details:

This follows from the symmetry of the distribution of \( T \), although \( \skw(T) \) only exists if \( \E\left(T^3\right) \) exists.
For \( n \gt 4 \), \[ \kur(T) = \frac{\E(T^4)}{\left[\E\left(T^2\right)\right]^2} = \frac{3 n^2 \Gamma\left[(n - 4) / 2\right] \big/ 4 \Gamma(n/2)}{\left(n \big/ (n - 2) \right)^2} = \frac{3 (n - 2)^2 \Gamma\left[(n - 4) / 2\right]}{4 \Gamma(n/2)}\] But \( \Gamma(n/2) = (n/2 - 1) (n/2 - 2) \Gamma(n/2 - 2) \). Simplifying gives the result.
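
Written out, the simplification uses \( \Gamma(n/2) = \frac{n - 2}{2} \cdot \frac{n - 4}{2} \, \Gamma\left[(n - 4) / 2\right] \), so that the gamma factors cancel: \[ \kur(T) = \frac{3 (n - 2)^2 \, \Gamma\left[(n - 4) / 2\right]}{4 \cdot \frac{n - 2}{2} \cdot \frac{n - 4}{2} \, \Gamma\left[(n - 4) / 2\right]} = \frac{3 (n - 2)}{n - 4} = 3 + \frac{6}{n - 4} \]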

Note that \( \kur(T) \to 3 \) as \( n \to \infty \) and hence the excess kurtosis \( \kur(T) - 3 \to 0 \) as \( n \to \infty \).

In the special distribution simulator, select the student \(t\) distribution. Vary \(n\) and note the shape of the probability density function in light of the previous result on skewness and kurtosis. For selected values of \(n\), run the simulation 1000 times and compare the empirical density function to the true probability density function.

Since \( T \) does not have moments of all orders, there is no interval about 0 on which the moment generating function of \( T \) is finite. The characteristic function exists, of course, but has no simple representation, except in terms of special functions.

Related Distributions

The \(t\) distribution with 1 degree of freedom is known as the Cauchy distribution. The probability density function is \[ f(t) = \frac{1}{\pi (1 + t^2)}, \quad t \in \R \]

You probably noticed that, qualitatively at least, the \(t\) probability density function is very similar to the standard normal probability density function. The similarity is quantitative as well:

Let \( f_n \) denote the \( t \) probability density function with \( n \in (0, \infty) \) degrees of freedom, given above. Then for fixed \(t \in \R\), \[ f_n(t) \to \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} t^2} \text{ as } n \to \infty \]

Details:

From a basic limit theorem in calculus, \[ \left( 1 + \frac{t^2}{n} \right)^{-(n + 1) / 2} \to e^{-t^2/2} \text{ as } n \to \infty \] An application of Stirling's approximation shows that \[ \frac{\Gamma[(n + 1) / 2]}{\sqrt{n \pi} \, \Gamma(n / 2)} \to \frac{1}{\sqrt{2 \pi}} \text{ as } n \to \infty \]
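
The convergence can also be observed numerically. The following sketch (hypothetical helper names, an arbitrary grid on \([-5, 5]\)) reports the largest gap between \( f_n \) and the standard normal density over the grid for increasing \(n\). It illustrates the limit but is of course not a proof.

```python
import math


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def max_gap(n, grid=None):
    """Largest |f_n(t) - phi(t)| over a grid of points (default: -5 to 5 by 0.1)."""
    if grid is None:
        grid = [i / 10 for i in range(-50, 51)]
    return max(abs(t_pdf(t, n) - normal_pdf(t)) for t in grid)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, max_gap(n))
```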

Note that the function on the right is the probability density function of the standard normal distribution. We can also get convergence of the \( t \) distribution to the standard normal distribution from the basic random variable representation in the definition.

Suppose that \( T_n \) has the \( t \) distribution with \( n \in \N_+ \) degrees of freedom, so that we can represent \( T_n \) as \[ T_n = \frac{Z}{\sqrt{V_n / n}} \] where \( Z \) has the standard normal distribution, \( V_n \) has the chi-square distribution with \( n \) degrees of freedom, and \( Z \) and \( V_n \) are independent. Then \( T_n \to Z \) as \( n \to \infty \) with probability 1.

Details:

We can represent \( V_n \) as \( V_n = Z_1^2 + Z_2^2 + \cdots + Z_n^2 \) where \( (Z_1, Z_2, \ldots) \) is a sequence of independent, standard normal variables, independent of \( Z \). Note that \(V_n / n \to 1\) as \(n \to \infty\) with probability 1 by the strong law of large numbers, since \( \E\left(Z_i^2\right) = 1 \). Hence \( T_n = Z \big/ \sqrt{V_n / n} \to Z \) as \( n \to \infty \) with probability 1.
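
The construction in this proof can be simulated along a single sample path. In the sketch below, `t_path` is a hypothetical name; it draws one \( Z \), builds \( V_n \) as a running sum of squared standard normal variables, and records \( T_n \) for each \( n \), so the drift of \( T_n \) toward \( Z \) can be inspected.

```python
import math
import random


def t_path(n_max, seed=0):
    """Return (z, [T_1, ..., T_n_max]) built from one Z and a running chi-square sum."""
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)
    running_sum = 0.0  # V_n = Z_1^2 + ... + Z_n^2
    path = []
    for n in range(1, n_max + 1):
        running_sum += rng.gauss(0.0, 1.0) ** 2
        path.append(z / math.sqrt(running_sum / n))
    return z, path


if __name__ == "__main__":
    z, path = t_path(100_000)
    for n in (1, 10, 1000, 100_000):
        print(n, path[n - 1], "target:", z)
```

A single path only illustrates the almost sure convergence; it does not verify it.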

The \(t\) distribution has more probability in the tails, and consequently less probability near 0, compared to the standard normal distribution.
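
This comparison can be made concrete by computing central probabilities for both distributions. The sketch below uses Simpson's rule for the \(t\) distribution and `statistics.NormalDist` from the standard library for the normal distribution; the helper names and the cutoffs 0.5 and 2 are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def t_central_prob(a, n, steps=2000):
    """P(-a < T < a) for a > 0: Simpson's rule on [0, a], doubled by symmetry."""
    h = a / steps
    total = t_pdf(0.0, n) + t_pdf(a, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 2 * total * h / 3


def normal_central_prob(a):
    """P(-a < Z < a) for a standard normal variable Z."""
    std = NormalDist()
    return std.cdf(a) - std.cdf(-a)


if __name__ == "__main__":
    for n in (1, 5, 30):
        print(n,
              "near 0:", t_central_prob(0.5, n), normal_central_prob(0.5),
              "tails:", 1 - t_central_prob(2.0, n), 1 - normal_central_prob(2.0))
```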

The Non-Central \( t \) Distribution

One natural way to generalize the student \( t \) distribution is to replace the standard normal variable \( Z \) in the definition with a normal variable having an arbitrary mean (but still unit variance). This particular generalization is important because it arises in hypothesis tests about the mean based on a random sample from the normal distribution, when the null hypothesis is false. For details see the sections on tests in the normal model and tests in the bivariate normal model in the chapter on hypothesis testing.

Suppose that \(Z\) has the standard normal distribution, \( \mu \in \R \), \(V\) has the chi-square distribution with \(n \in (0, \infty)\) degrees of freedom, and that \(Z\) and \(V\) are independent. Then the random variable \[ T = \frac{Z + \mu}{\sqrt{V / n}} \] has the non-central student \(t\) distribution with \(n\) degrees of freedom and non-centrality parameter \( \mu \).

The standard functions that characterize a distribution (the probability density function, distribution function, and quantile function) do not have simple representations for the non-central \( t \) distribution, and can only be expressed in terms of other special functions. Similarly, the moments do not have simple, closed-form expressions. For the beginning student of statistics, the most important fact is that the probability density function of the non-central \( t \) distribution is similar to (but not exactly the same as) that of the standard \( t \) distribution with the same degrees of freedom, but shifted and scaled. The density function is shifted to the right or left, depending on whether \( \mu \gt 0 \) or \( \mu \lt 0 \).
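
Although the standard functions are complicated, the non-central \( t \) distribution is easy to simulate from its definition. The sketch below is a standard-library illustration with a hypothetical function name; as before, it assumes that the chi-square variable can be generated as a gamma variable with shape \( n / 2 \) and scale 2.

```python
import math
import random


def sample_noncentral_t(n, mu, size, seed=0):
    """Draw `size` values of T = (Z + mu) / sqrt(V / n)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        shifted = rng.gauss(mu, 1.0)        # Z + mu is normal with mean mu, variance 1
        v = rng.gammavariate(n / 2.0, 2.0)  # chi-square(n) as a gamma variable
        draws.append(shifted / math.sqrt(v / n))
    return draws


if __name__ == "__main__":
    for mu in (-1.0, 0.0, 1.0):
        draws = sorted(sample_noncentral_t(10, mu, 20_000))
        print(mu, "sample median:", draws[len(draws) // 2])
```

Comparing the sample medians for several values of \( \mu \) shows the direction of the shift; a histogram of the draws would show the change in shape as well.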

Computational Exercises

Suppose that \(T\) has the \(t\) distribution with \(n = 10\) degrees of freedom. For each of the following, compute the true value using the quantile app and then compute the normal approximation. Compare the results.

\(\P(-0.8 \lt T \lt 1.2)\)
The 90th percentile of \(T\).

Details:

True value: \(\P(-0.8 \lt T \lt 1.2) = 0.650\); normal approximation: \(\P(-0.8 \lt T \lt 1.2) \approx 0.673\)
True value: \(x_{0.90} = 1.372\); normal approximation: \(x_{0.90} \approx 1.282\)
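
For readers without access to the quantile app, the same quantities can be approximated with a short script. This is a sketch, not the method used by the app: the \(t\) values come from Simpson integration and bisection (hypothetical helpers `t_cdf` and `t_quantile`), and the normal values come from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def t_pdf(t, n):
    """Density of the t distribution with n > 0 degrees of freedom at t."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))


def t_cdf(x, n, steps=2000):
    """P(T <= x): 1/2 plus the signed Simpson integral of the density from 0 to x."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, n) + t_pdf(x, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 + total * h / 3


def t_quantile(p, n, tol=1e-9):
    """Quantile of order p in (0, 1), found by bisection on t_cdf."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, n) > p:
        lo *= 2
    while t_cdf(hi, n) < p:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n, std = 10, NormalDist()
    print("t probability:     ", t_cdf(1.2, n) - t_cdf(-0.8, n))
    print("normal probability:", std.cdf(1.2) - std.cdf(-0.8))
    print("t 90th percentile:     ", t_quantile(0.9, n))
    print("normal 90th percentile:", std.inv_cdf(0.9))
```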