The normal distribution holds an honored role in probability and statistics, mostly because of the central limit theorem, one of the fundamental theorems that forms a bridge between the two subjects. In addition, as we will see, the normal distribution has many nice mathematical properties. The normal distribution is also called the Gaussian distribution, in honor of Carl Friedrich Gauss, who was among the first to use the distribution.
The standard normal distribution is a continuous distribution on \( \R \) with probability density function \(\phi\) given by \[ \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2}, \quad z \in \R \]
Let \(c = \int_{-\infty}^{\infty} e^{-z^2 / 2} dz\). We need to show that \( c = \sqrt{2 \pi} \). That is, \(\sqrt{2 \pi}\) is the normalizing constant for the function \(z \mapsto e^{-z^2 / 2}\). The proof uses a nice trick: \[ c^2 = \int_{-\infty}^\infty e^{-x^2 / 2} \, dx \int_{-\infty}^\infty e^{-y^2 / 2} \, dy = \int_{\R^2} e^{-(x^2 + y^2) / 2} \, d(x, y)\] We now convert the double integral to polar coordinates: \( x = r \cos \theta \), \( y = r \sin \theta \) where \( r \in [0, \infty) \) and \( \theta \in [0, 2 \pi) \). So, \( x^2 + y^2 = r^2 \) and \( d(x, y) = r \, d(r, \theta) \). Thus, converting back to iterated integrals, \[ c^2 = \int_0^{2 \pi} \int_0^\infty r e^{-r^2 / 2} \, dr \, d\theta \] Substituting \( u = r^2 / 2 \) in the inner integral gives \( \int_0^\infty e^{-u} \, du = 1 \) and then the outer integral is \( \int_0^{2 \pi} 1 \, d\theta = 2 \pi \). Thus, \( c^2 = 2 \pi \) and so \( c = \sqrt{2 \pi} \).
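As a quick numerical sanity check, here is a minimal Python sketch (assuming SciPy is available; it is not part of the original development) that evaluates the integral directly and compares it with \(\sqrt{2 \pi}\):

```python
# Numerical check that the normalizing constant is sqrt(2*pi),
# using SciPy's adaptive quadrature over the whole real line.
import numpy as np
from scipy.integrate import quad

c, _ = quad(lambda z: np.exp(-z**2 / 2), -np.inf, np.inf)
print(c, np.sqrt(2 * np.pi))  # both approximately 2.5066282746
```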
The standard normal probability density function has the famous bell shape that is known to just about everyone.
The standard normal density function \(\phi\) satisfies the following properties:
(a) \(\phi\) is symmetric about \(z = 0\).
(b) \(\phi\) increases and then decreases, with mode \(z = 0\).
(c) \(\phi\) is concave upward, then downward, then upward again, with inflection points at \(z = \pm 1\).
(d) \(\phi(z) \to 0\) as \(z \to \infty\) and as \(z \to -\infty\).
These results follow from standard calculus. Note that \(\phi^\prime(z) = - z \phi(z)\) (which gives (b)) and hence also \( \phi^{\prime \prime}(z) = (z^2 - 1) \phi(z) \) (which gives (c)).
In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and location of the standard normal density function. Run the simulation 1000 times, and compare the empirical density function to the probability density function.
The standard normal distribution function \(\Phi\), given by \[ \Phi(z) = \int_{-\infty}^z \phi(t) \, dt = \int_{-\infty}^z \frac{1}{\sqrt{2 \pi}} e^{-t^2 / 2} \, dt \] and its inverse, the quantile function \(\Phi^{-1}\), cannot be expressed in closed form in terms of elementary functions. However approximate values of these functions can be obtained from the quantile app, and from most mathematics and statistics software. Indeed these functions are so important that they are considered special functions of mathematics.
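For example, here is a minimal Python sketch (assuming SciPy, whose `scipy.stats.norm` object exposes `cdf` and `ppf` for \(\Phi\) and \(\Phi^{-1}\)) of how approximate values are obtained in software:

```python
# Approximate values of the standard normal distribution function
# and quantile function via SciPy.
from scipy.stats import norm

print(norm.cdf(1.96))   # Phi(1.96)      ~ 0.975
print(norm.ppf(0.975))  # Phi^{-1}(0.975) ~ 1.96
```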
The standard normal distribution function \(\Phi\) satisfies the following properties:
(a) \(\Phi(-z) = 1 - \Phi(z)\) for \(z \in \R\).
(b) \(\Phi^{-1}(p) = -\Phi^{-1}(1 - p)\) for \(p \in (0, 1)\).
(c) \(\Phi(0) = \frac{1}{2}\), so the median is 0.
Part (a) follows from the symmetry of \( \phi \). Part (b) follows from part (a). Part (c) follows from part (a) with \( z = 0 \).
In the quantile app, select the normal distribution and keep the default settings. Find the quantiles of the following orders for the standard normal distribution:
Suppose that random variable \( Z \) has the standard normal distribution.
The mean and variance of \( Z \) are
(a) \(\E(Z) = 0\)
(b) \(\var(Z) = 1\)
In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and size of the mean \( \pm \) standard deviation bar. Run the simulation 1000 times, and compare the empirical mean and standard deviation to the true mean and standard deviation.
More generally, we can compute all of the moments. The key is the following recursion formula.
For \( n \in \N_+ \), \( \E\left(Z^{n+1}\right) = n \E\left(Z^{n-1}\right) \)
First we use the differential equation noted in the proof above, namely \( \phi^\prime(z) = - z \phi(z) \). \[ \E\left(Z^{n+1}\right) = \int_{-\infty}^\infty z^{n+1} \phi(z) \, dz = \int_{-\infty}^\infty z^n z \phi(z) \, dz = - \int_{-\infty}^\infty z^n \phi^\prime(z) \, dz \] Now we integrate by parts, with \( u = z^n \) and \( dv = \phi^\prime(z) \, dz \) to get \[ \E\left(Z^{n+1}\right) = -z^n \phi(z) \bigg|_{-\infty}^\infty + \int_{-\infty}^\infty n z^{n-1} \phi(z) \, dz = 0 + n \E\left(Z^{n-1}\right) \]
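A brief numerical check of the recursion, sketched in Python with SciPy's `moment` method (the range of orders is illustrative):

```python
# Verify E(Z^{n+1}) = n * E(Z^{n-1}) for the standard normal,
# comparing raw moments computed by SciPy.
from scipy.stats import norm

for n in range(2, 8):
    lhs = norm.moment(n + 1)
    rhs = n * norm.moment(n - 1)
    print(n, lhs, rhs)  # lhs and rhs agree for each n
```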
The moments of the standard normal distribution are now easy to compute.
For \(n \in \N\),
(a) \(\E\left(Z^{2n}\right) = 1 \cdot 3 \cdots (2n - 1) = \frac{(2n)!}{n! \, 2^n}\)
(b) \(\E\left(Z^{2n+1}\right) = 0\)
The result follows from the mean and variance above and the recursion relation.
Of course, the fact that the odd-order moments are 0 also follows from the symmetry of the distribution. The following theorem gives the skewness and kurtosis of the standard normal distribution.
The skewness and kurtosis of \( Z \) are
(a) \(\skw(Z) = 0\)
(b) \(\kur(Z) = 3\)
Because of the last result (and the use of the standard normal distribution literally as a standard), the excess kurtosis of a random variable is defined to be the ordinary kurtosis minus 3. Thus, the excess kurtosis of the normal distribution is 0.
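As an illustrative aside (a sketch assuming SciPy), note that SciPy follows this convention and reports excess kurtosis:

```python
# SciPy's stats method returns skewness and *excess* kurtosis,
# so for the standard normal both values are 0.
from scipy.stats import norm

skew, kurt = norm.stats(moments='sk')
print(skew, kurt)  # 0.0 0.0  (ordinary kurtosis is 3)
```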
Many other important properties of the normal distribution are most easily obtained using the moment generating function or the characteristic function.
The moment generating function \( m \) and characteristic function \( \chi \) of \( Z \) are given by
(a) \(m(t) = e^{t^2 / 2}\) for \(t \in \R\)
(b) \(\chi(t) = e^{-t^2 / 2}\) for \(t \in \R\)
Thus, the standard normal distribution has the curious property that the characteristic function is a multiple of the probability density function: \[ \chi = \sqrt{2 \pi} \phi \] The moment generating function can be used to give another derivation of the moments of \( Z \), since we know that \( \E\left(Z^n\right) = m^{(n)}(0) \).
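As a sketch of that derivation, SymPy (assumed available) can differentiate \(m\) symbolically and recover the moments computed above:

```python
# Recover E(Z^n) = m^{(n)}(0) from the MGF m(t) = exp(t^2 / 2)
# by symbolic differentiation with SymPy.
import sympy as sp

t = sp.symbols('t')
m = sp.exp(t**2 / 2)
moments = [sp.diff(m, t, n).subs(t, 0) for n in range(7)]
print(moments)  # [1, 0, 1, 0, 3, 0, 15]
```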
The general normal distribution is the location-scale family associated with the standard normal distribution.
Suppose that \(\mu \in \R\) and \( \sigma \in (0, \infty) \) and that \(Z\) has the standard normal distribution. Then \(X = \mu + \sigma Z\) has the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\).
Suppose that \( X \) has the normal distribution with location parameter \( \mu \in \R \) and scale parameter \( \sigma \in (0, \infty) \). The basic properties of the density function and distribution function of \( X \) follow from general results for location-scale families.
The probability density function \(f\) of \( X \) is given by \[ f(x) = \frac{1}{\sigma} \phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2 \, \pi} \, \sigma} \exp \left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \quad x \in \R \]
This follows from the change of variables formula corresponding to the transformation \( x = \mu + \sigma z \).
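A minimal numerical sketch of this location-scale relationship (the parameter values are arbitrary; assumes SciPy):

```python
# Check that the general normal density is the rescaled standard
# normal density: f(x) = phi((x - mu) / sigma) / sigma.
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 1.5
x = np.linspace(-3, 7, 5)
lhs = norm.pdf(x, loc=mu, scale=sigma)
rhs = norm.pdf((x - mu) / sigma) / sigma
print(np.allclose(lhs, rhs))  # True
```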
The probability density function \(f\) satisfies the following properties:
(a) \(f\) is symmetric about \(x = \mu\).
(b) \(f\) increases and then decreases, with mode \(x = \mu\).
(c) \(f\) is concave upward, then downward, then upward again, with inflection points at \(x = \mu \pm \sigma\).
(d) \(f(x) \to 0\) as \(x \to \infty\) and as \(x \to -\infty\).
In the Special Distribution Simulator, select the normal distribution. Vary the parameters and note the shape and location of the probability density function. With your choice of parameter settings, run the simulation 1000 times and compare the empirical density function to the true probability density function.
Let \(F\) denote the distribution function of \( X \), and as above, let \(\Phi\) denote the standard normal distribution function.
The distribution function \(F\) and quantile function \( F^{-1} \) satisfy the following properties:
(a) \(F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)\) for \(x \in \R\).
(b) \(F^{-1}(p) = \mu + \sigma \Phi^{-1}(p)\) for \(p \in (0, 1)\).
(c) \(F(\mu) = \frac{1}{2}\), so the median is \(\mu\).
Part (a) follows since \( X = \mu + \sigma Z \). Parts (b) and (c) follow from (a).
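The following Python sketch illustrates parts (a) and (b), with arbitrary parameter values (assumes SciPy):

```python
# F(x) = Phi((x - mu) / sigma) and F^{-1}(p) = mu + sigma * Phi^{-1}(p).
import numpy as np
from scipy.stats import norm

mu, sigma = -1.0, 2.0
x, p = 0.5, 0.9
print(np.isclose(norm.cdf(x, mu, sigma), norm.cdf((x - mu) / sigma)))  # True
print(np.isclose(norm.ppf(p, mu, sigma), mu + sigma * norm.ppf(p)))    # True
```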
In the quantile app, select the normal distribution. Vary the parameters and note the shape of the density function and the distribution function.
Suppose again that \( X \) has the normal distribution with location parameter \( \mu \in \R \) and scale parameter \( \sigma \in (0, \infty) \). As the notation suggests, the location and scale parameters are also the mean and standard deviation, respectively.
The mean and variance of \( X \) are
(a) \(\E(X) = \mu\)
(b) \(\var(X) = \sigma^2\)
This follows from the representation \( X = \mu + \sigma Z \) and basic properties of expected value and variance.
So the parameters of the normal distribution are usually referred to as the mean and standard deviation rather than location and scale. The central moments of \(X\) can be computed easily from the moments of the standard normal distribution. The ordinary (raw) moments of \(X\) can be computed from the central moments, but the formulas are a bit messy.
For \(n \in \N\),
(a) \(\E\left[(X - \mu)^{2n}\right] = 1 \cdot 3 \cdots (2n - 1) \, \sigma^{2n} = \frac{(2n)!}{n! \, 2^n} \sigma^{2n}\)
(b) \(\E\left[(X - \mu)^{2n+1}\right] = 0\)
All of the odd central moments of \(X\) are 0, a fact that also follows from the symmetry of the probability density function.
In the Special Distribution Simulator, select the normal distribution. Vary the mean and standard deviation and note the size and location of the mean \(\pm\) standard deviation bar. With your choice of parameter settings, run the simulation 1000 times and compare the empirical mean and standard deviation to the true mean and standard deviation.
The following result gives the skewness and kurtosis.
The skewness and kurtosis of \( X \) are
(a) \(\skw(X) = 0\)
(b) \(\kur(X) = 3\)
The moment generating function \( M \) and characteristic function \( \chi \) of \( X \) are given by
(a) \(M(t) = \exp\left(\mu t + \frac{1}{2} \sigma^2 t^2\right)\) for \(t \in \R\)
(b) \(\chi(t) = \exp\left(i \mu t - \frac{1}{2} \sigma^2 t^2\right)\) for \(t \in \R\)
The normal family of distributions satisfies two very important properties: invariance under linear transformations of the variable and invariance with respect to sums of independent variables. The first property is essentially a restatement of the fact that the normal distribution is a location-scale family.
Suppose that \(X\) is normally distributed with mean \(\mu\) and variance \(\sigma^2\). If \(a \in \R\) and \(b \in \R \setminus \{0\}\), then \(a + b X\) is normally distributed with mean \(a + b \mu\) and variance \(b^2 \sigma^2\).
The MGF of \(a + b X\) is \[ \E\left[e^{t (a + b X)}\right] = e^{ta} \E\left[e^{(t b) X}\right] = e^{ta} e^{\mu (t b) + \sigma^2 (t b)^2 / 2} = e^{(a + b \mu)t + b^2 \sigma^2 t^2 / 2} \] which we recognize as the MGF of the normal distribution with mean \(a + b \mu\) and variance \(b^2 \sigma^2\).
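A simulation sketch of this property (the values of \(a\), \(b\), \(\mu\), \(\sigma\) are arbitrary; assumes NumPy):

```python
# Sample moments of a + b*X should be close to a + b*mu and b^2 * sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 1.0, 2.0, 3.0, -0.5
x = rng.normal(mu, sigma, size=100_000)
y = a + b * x
print(y.mean(), a + b * mu)      # both ~ 2.5
print(y.var(), b**2 * sigma**2)  # both ~ 1.0
```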
Recall that in general, if \(X\) is a random variable with mean \(\mu\) and standard deviation \(\sigma \gt 0\), then \(Z = (X - \mu) / \sigma\) is the standard score of \(X\). A corollary of the last result is that if \(X\) has a normal distribution then the standard score \(Z\) has a standard normal distribution. Conversely, any normally distributed variable can be constructed from a standard normal variable.
Standard score. If \(X\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then \(Z = \frac{X - \mu}{\sigma}\) has the standard normal distribution. Conversely, if \(Z\) has the standard normal distribution, then \(X = \mu + \sigma Z\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
Suppose that \(X_1\) and \(X_2\) are independent random variables, and that \(X_i\) is normally distributed with mean \(\mu_i\) and variance \(\sigma_i^2\) for \(i \in \{1, 2\}\). Then \(X_1 + X_2\) is normally distributed with
(a) \(\E(X_1 + X_2) = \mu_1 + \mu_2\)
(b) \(\var(X_1 + X_2) = \sigma_1^2 + \sigma_2^2\)
The MGF of \(X_1 + X_2\) is the product of the MGFs, so \[ \E\left(\exp\left[t (X_1 + X_2)\right]\right) = \exp\left(\mu_1 t + \sigma_1^2 t^2 / 2\right) \exp\left(\mu_2 t + \sigma_2^2 t^2 / 2\right) = \exp\left[\left(\mu_1 + \mu_2\right)t + \left(\sigma_1^2 + \sigma_2^2\right) t^2 / 2\right] \] which we recognize as the MGF of the normal distribution with mean \(\mu_1 + \mu_2\) and variance \(\sigma_1^2 + \sigma_2^2\).
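A simulation sketch of the sum property (arbitrary parameters, assuming NumPy):

```python
# X1 + X2 should have mean mu1 + mu2 and variance sigma1^2 + sigma2^2.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(1.0, 2.0, size=100_000)
x2 = rng.normal(-3.0, 1.5, size=100_000)
s = x1 + x2
print(s.mean(), 1.0 + (-3.0))    # both ~ -2.0
print(s.var(), 2.0**2 + 1.5**2)  # both ~ 6.25
```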
This theorem generalizes to a sum of \(n\) independent, normal variables. The important part is that the sum is still normal; the expressions for the mean and variance are standard results that hold for sums of independent variables generally. As a consequence of this result and the one for linear transformations above, it follows that the normal distribution is stable.
The normal distribution is stable. Specifically, suppose that \( X \) has the normal distribution with mean \( \mu \in \R \) and variance \( \sigma^2 \in (0, \infty)\). If \( (X_1, X_2, \ldots, X_n) \) are independent copies of \( X \), then \( X_1 + X_2 + \cdots + X_n \) has the same distribution as \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} X \), namely normal with mean \( n \mu \) and variance \( n \sigma^2 \).
By the result for sums, \( X_1 + X_2 + \cdots + X_n \) has the normal distribution with mean \( n \mu \) and variance \( n \sigma^2 \). By the result for linear transformations, \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} X \) has the normal distribution with mean \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} \mu = n \mu \) and variance \( \left(\sqrt{n}\right)^2 \sigma^2 = n \sigma^2 \).
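A simulation sketch of stability via a Kolmogorov-Smirnov test (arbitrary parameters, assuming SciPy):

```python
# The n-fold sum of independent copies of X should match the normal
# distribution with mean n*mu and standard deviation sqrt(n)*sigma.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
mu, sigma, n = 0.5, 1.2, 4
sums = rng.normal(mu, sigma, size=(50_000, n)).sum(axis=1)
result = kstest(sums, 'norm', args=(n * mu, np.sqrt(n) * sigma))
print(result.pvalue)  # a large p-value: no evidence against normality
```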
All stable distributions are infinitely divisible, so the normal distribution belongs to this family as well. For completeness, here is the explicit statement:
The normal distribution is infinitely divisible. Specifically, if \( X \) has the normal distribution with mean \( \mu \in \R \) and variance \( \sigma^2 \in (0, \infty) \), then for \( n \in \N_+ \), \( X \) has the same distribution as \( X_1 + X_2 + \cdots + X_n\) where \( (X_1, X_2, \ldots, X_n) \) are independent, and each has the normal distribution with mean \( \mu / n \) and variance \( \sigma^2 / n \).
Finally, the normal distribution belongs to the family of general exponential distributions.
The normal distribution with mean \(\mu\) and variance \(\sigma^2\) is a two-parameter exponential family with natural parameters \(\left( \frac{\mu}{\sigma^2}, -\frac{1}{2 \, \sigma^2} \right)\), and natural statistics \(\left(X, X^2\right)\).
Expanding the square, the normal PDF can be written in the form \[ f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left(-\frac{\mu^2}{2 \sigma^2}\right) \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2 \sigma^2} x^2 \right), \quad x \in \R\] so the result follows from the definition of the general exponential family.
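A numerical check of this factorization, with arbitrary \(\mu\) and \(\sigma\) (assumes SciPy):

```python
# The exponential-family form of the normal PDF should match SciPy's pdf.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 0.8
x = np.linspace(-2, 4, 7)
base = np.exp(-mu**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
f = base * np.exp((mu / sigma**2) * x - x**2 / (2 * sigma**2))
print(np.allclose(f, norm.pdf(x, mu, sigma)))  # True
```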
A number of other special distributions studied in this chapter are constructed from normally distributed variables. These include
Also, as mentioned at the beginning of this section, the importance of the normal distribution stems in large part from the central limit theorem, one of the fundamental theorems of probability. By virtue of this theorem, the normal distribution is connected to many other distributions, by means of limits and approximations, including the special distributions in the following list. Details are given in the individual sections.
Suppose that the volume of beer in a bottle of a certain brand is normally distributed with mean 0.5 liter and standard deviation 0.01 liter.
Let \(X\) denote the volume of beer in liters.
A metal rod is designed to fit into a circular hole on a certain assembly. The radius of the rod is normally distributed with mean 1 cm and standard deviation 0.002 cm. The radius of the hole is normally distributed with mean 1.01 cm and standard deviation 0.003 cm. The machining processes that produce the rod and the hole are independent. Find the probability that the rod is too big for the hole.
Let \(X\) denote the radius of the rod and \(Y\) the radius of the hole. \(\P(Y - X \lt 0) = 0.0028\)
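A computation sketch for this exercise (assumes SciPy): by the results for linear transformations and sums, \(Y - X\) is normal with mean \(0.01\) and standard deviation \(\sqrt{0.002^2 + 0.003^2}\).

```python
# P(Y - X < 0) where Y - X is normal with mean 0.01 and
# standard deviation sqrt(0.002^2 + 0.003^2).
import numpy as np
from scipy.stats import norm

sd = np.sqrt(0.002**2 + 0.003**2)
print(norm.cdf(0, loc=0.01, scale=sd))  # ~ 0.0028
```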
The weight of a peach from a certain orchard is normally distributed with mean 8 ounces and standard deviation 1 ounce. Find the probability that the combined weight of 5 peaches exceeds 45 ounces.
Let \(X\) denote the combined weight of the 5 peaches, in ounces. \(\P(X \gt 45) = 0.0127\)
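A computation sketch (assumes SciPy): by the sum property, \(X\) is normal with mean \(5 \cdot 8 = 40\) and standard deviation \(\sqrt{5}\).

```python
# P(X > 45) where X is normal with mean 40 and standard deviation sqrt(5).
import numpy as np
from scipy.stats import norm

print(norm.sf(45, loc=40, scale=np.sqrt(5)))  # ~ 0.0127
```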
In some settings, it's convenient to consider a constant as having a normal distribution (with mean being the constant and variance 0, of course). This convention simplifies the statements of theorems and definitions in these settings. Of course, the formulas for the probability density function and the distribution function do not hold for a constant, but the other results involving the moment generating function, linear transformations, and sums are still valid. Moreover, the result for linear transformations would hold for all \(a\) and \(b\).