Recall the basic model of statistics: we have a population of objects of interest, and we have various measurements (variables) that we make on these objects. We select objects from the population and record the variables for the objects in the sample; these become our data. Once again, our first discussion is from a descriptive point of view. That is, we do not assume that the data are generated by an underlying probability distribution. Remember, however, that the data themselves define a probability distribution.
Suppose that \(\bs{x} = (x_1, x_2, \ldots, x_n)\) is a sample of size \(n\) from a real-valued variable \(x\). Recall that the sample mean is \[ m = \frac{1}{n} \sum_{i=1}^n x_i \] and is the most important measure of the center of the data set.
The sample variance is defined to be \[ s^2 = \frac{1}{n - 1} \sum_{i=1}^n (x_i - m)^2 \] The square root \(s\) of the sample variance \(s^2\) is the sample standard deviation.
If we need to indicate the dependence on the data vector \(\bs{x}\), we write \(s^2(\bs{x})\). The difference \(x_i - m\) is the deviation of \(x_i\) from the mean \(m\) of the data set. So the variance is the mean square deviation and is a measure of the spread of the data set with respect to the mean. The reason for dividing by \(n - 1\) rather than \(n\) is best understood in terms of the inferential point of view that we discuss below; this definition makes the sample variance an unbiased estimator of the distribution variance. However, the reason for the averaging can also be understood in terms of a related concept.
\(\sum_{i=1}^n (x_i - m) = 0\).
\(\sum_{i=1}^n (x_i - m) = \sum_{i=1}^n x_i - \sum_{i=1}^n m = n m - n m = 0\).
So if we know \(n - 1\) of the deviations, we can compute the last one. This means that there are only \(n - 1\) freely varying deviations, that is to say, \(n - 1\) degrees of freedom in the set of deviations. In the definition of sample variance, we average the squared deviations, not by dividing by the number of terms, but rather by dividing by the number of degrees of freedom in those terms. However, this argument notwithstanding, it would be reasonable, from a purely descriptive point of view, to divide by \(n\) in the definition of the sample variance. Moreover, when \(n\) is sufficiently large, it hardly matters whether we divide by \(n\) or by \(n - 1\).
The standard deviation \(s\) is the root mean square deviation and is also a measure of the spread of the data with respect to the mean. Both measures of spread are important. Variance has nicer mathematical properties, but its physical unit is the square of the unit of \(x\). For example, if the underlying variable \(x\) is the height of a person in inches, the variance is in square inches. On the other hand, the standard deviation has the same physical unit as the original variable, but its mathematical properties are not as nice.
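These computations are easy to carry out with software. The following is a minimal sketch in Python, using only the standard library; the data vector is an illustrative example, not one of the data sets in this section.
```python
# Sketch: sample mean, sample variance (dividing by n - 1), and sample
# standard deviation, computed directly from the definitions.
import math
import statistics

x = [2.3, 1.7, 3.1, 2.8, 2.0, 2.6]
n = len(x)

m = sum(x) / n                                   # sample mean
s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)    # sample variance
s = math.sqrt(s2)                                # sample standard deviation

# statistics.variance and statistics.stdev also divide by n - 1,
# so they should agree with the direct computation.
assert math.isclose(s2, statistics.variance(x))
assert math.isclose(s, statistics.stdev(x))
print(m, s2, s)
```
Note that `statistics.pvariance` divides by \(n\) instead, giving the alternate version of the sample variance discussed above.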
Recall that the data set \(\bs{x}\) naturally gives rise to a probability distribution, namely the empirical distribution that places probability \(\frac{1}{n}\) at \(x_i\) for each \(i\). Thus, if the data are distinct, this is the uniform distribution on \(\{x_1, x_2, \ldots, x_n\}\). The sample mean \(m\) is simply the expected value of the empirical distribution. Similarly, if we were to divide by \(n\) rather than \(n - 1\), the sample variance would be the variance of the empirical distribution. Most of the properties and results in this section follow from much more general properties and results for the variance of a probability distribution (although for the most part, we give independent proofs).
Measures of center and measures of spread are best thought of together, in the context of an error function. The error function measures how well a single number \(a\) represents the entire data set \(\bs{x}\). The values of \(a\) (if they exist) that minimize the error functions are our measures of center; the minimum value of the error function is the corresponding measure of spread. Of course, we hope for a single value of \(a\) that minimizes the error function, so that we have a unique measure of center.
Let's apply this procedure to the mean square error function defined by \[ \mse(a) = \frac{1}{n - 1} \sum_{i=1}^n (x_i - a)^2, \quad a \in \R \] Minimizing \(\mse\) is a standard problem in calculus.
The graph of \(\mse\) is a parabola opening upward.
We can tell from the form of \(\mse\) that the graph is a parabola opening upward. Taking the derivative gives \[ \frac{d}{da} \mse(a) = -\frac{2}{n - 1}\sum_{i=1}^n (x_i - a) = -\frac{2}{n - 1}(n m - n a) \] Hence \(a = m\) is the unique value that minimizes \(\mse\). Of course, \(\mse(m) = s^2\).
Trivially, if we defined the mean square error function by dividing by \(n\) rather than \(n - 1\), then the minimum value would still occur at \(m\), the sample mean, but the minimum value would be the alternate version of the sample variance in which we divide by \(n\). On the other hand, if we were to use the root mean square deviation function \(\text{rmse}(a) = \sqrt{\mse(a)}\), then because the square root function is strictly increasing on \([0, \infty)\), the minimum value would again occur at \(m\), the sample mean, but the minimum value would be \(s\), the sample standard deviation. The important point is that with all of these error functions, the unique measure of center is the sample mean, and the corresponding measures of spread are the various ones that we are studying.
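As a quick numerical check of this discussion, the following Python sketch evaluates \(\mse\) on a fine grid and confirms that the minimizing value of \(a\) is (essentially) the sample mean, with minimum value \(s^2\). The data vector and grid are illustrative choices.
```python
# Sketch: the mean square error function is minimized at the sample mean,
# and the minimum value is the sample variance.
x = [2.3, 1.7, 3.1, 2.8, 2.0, 2.6]
n = len(x)
m = sum(x) / n

def mse(a):
    return sum((xi - a) ** 2 for xi in x) / (n - 1)

grid = [i / 1000 for i in range(5000)]    # candidate values of a in [0, 5)
a_best = min(grid, key=mse)
print(a_best, m)                          # a_best is the grid point nearest m
print(mse(m))                             # the minimum value, the sample variance
```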
Next, let's apply our procedure to the mean absolute error function defined by \[ \mae(a) = \frac{1}{n - 1} \sum_{i=1}^n \left|x_i - a\right|, \quad a \in \R \]
The mean absolute error function satisfies the following properties:
(a) \(\mae\) is a continuous function of \(a\).
(b) The graph of \(\mae\) is piecewise linear, with corners only at the distinct data values.
For parts (a) and (b), note that for each \(i\), \(\left|x_i - a\right|\) is a continuous function of \(a\) with the graph consisting of two lines (of slopes \(\pm 1\)) meeting at \(x_i\).
Mathematically, \(\mae\) has some problems as an error function. First, the function is not smooth (differentiable) at points where two lines of different slopes meet. More importantly, the values that minimize \(\mae\) may occupy an entire interval, leaving us without a unique measure of center. The exercises below show that these pathologies can really happen. It turns out that \(\mae\) is minimized at any point in the median interval of the data set \(\bs{x}\). The proof of this result follows from a much more general result for probability distributions. So the medians are the natural measures of center associated with \(\mae\) as a measure of error, in the same way that the sample mean is the measure of center associated with \(\mse\) as a measure of error.
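The behavior of \(\mae\) is also easy to see numerically. In the following Python sketch (with an illustrative data vector of even size), \(\mae\) takes the same minimum value at every point of the median interval, and larger values outside it.
```python
# Sketch: for a data set of even size, mae is minimized at every point of the
# median interval, not at a unique point. The data vector is illustrative.
x = [1, 2, 6, 9]
n = len(x)

def mae(a):
    return sum(abs(xi - a) for xi in x) / (n - 1)

# The sorted data are (1, 2, 6, 9), so the median interval is [2, 6].
for a in [2.0, 3.5, 5.0, 6.0]:
    print(a, mae(a))       # the same minimum value at each point of [2, 6]
print(1.0, mae(1.0))       # strictly larger outside the median interval
```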
In this section, we establish some essential properties of the sample variance and standard deviation. First, the following alternate formula for the sample variance is better for computational purposes, and for certain theoretical purposes as well.
The sample variance can be computed as \[ s^2 = \frac{1}{n - 1} \sum_{i=1}^n x_i^2 - \frac{n}{n - 1} m^2 \]
Note that \begin{align} \sum_{i=1}^n (x_i - m)^2 & = \sum_{i=1}^n \left(x_i^2 - 2 m x_i + m^2\right) = \sum_{i=1}^n x_i^2 - 2 m \sum_{i=1}^n x_i + \sum_{i=1}^n m^2\\ & = \sum_{i=1}^n x_i^2 - 2 n m^2 + n m^2 = \sum_{i=1}^n x_i^2 - n m^2 \end{align} Dividing by \(n - 1\) gives the result.
If we let \(\bs{x}^2 = (x_1^2, x_2^2, \ldots, x_n^2)\) denote the sample from the variable \(x^2\), then the computational formula above can be written succinctly as \[ s^2(\bs{x}) = \frac{n}{n - 1} \left[m(\bs{x}^2) - m^2(\bs{x})\right] \] The following theorem gives another computational formula for the sample variance, directly in terms of the variables and thus without the computation of an intermediate statistic.
The sample variance can be computed as \[ s^2 = \frac{1}{2 n (n - 1)} \sum_{i=1}^n \sum_{j=1}^n (x_i - x_j)^2 \]
Note that \begin{align} \frac{1}{2 n} \sum_{i=1}^n \sum_{j=1}^n (x_i - x_j)^2 & = \frac{1}{2 n} \sum_{i=1}^n \sum_{j=1}^n (x_i - m + m - x_j)^2 \\ & = \frac{1}{2 n} \sum_{i=1}^n \sum_{j=1}^n \left[(x_i - m)^2 + 2 (x_i - m)(m - x_j) + (m - x_j)^2\right] \\ & = \frac{1}{2 n} \sum_{i=1}^n \sum_{j=1}^n (x_i - m)^2 + \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n (x_i - m)(m - x_j) + \frac{1}{2 n} \sum_{i=1}^n \sum_{j=1}^n (m - x_j)^2 \\ & = \frac{1}{2} \sum_{i=1}^n (x_i - m)^2 + 0 + \frac{1}{2} \sum_{j=1}^n (m - x_j)^2 \\ & = \sum_{i=1}^n (x_i - m)^2 \end{align} Dividing by \(n - 1\) gives the result.
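Both computational formulas are easy to check numerically. The following Python sketch compares them with the defining formula on an illustrative data vector.
```python
# Sketch: the defining formula for s^2 agrees with the "mean of squares" form
# and with the pairwise-differences form. The data vector is illustrative.
import math

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(x)
m = sum(x) / n

s2_def = sum((xi - m) ** 2 for xi in x) / (n - 1)
s2_alt = n / (n - 1) * (sum(xi ** 2 for xi in x) / n - m ** 2)
s2_pair = sum((xi - xj) ** 2 for xi in x for xj in x) / (2 * n * (n - 1))

assert math.isclose(s2_def, s2_alt) and math.isclose(s2_def, s2_pair)
print(s2_def, s2_alt, s2_pair)
```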
The sample variance satisfies the following properties:
(a) \(s^2 \ge 0\)
(b) \(s^2 = 0\) if and only if \(x_i = x_j\) for all \(i, j \in \{1, 2, \ldots, n\}\)
Part (a) is obvious. For part (b) note that if \(s^2 = 0\) then \(x_i = m\) for each \(i\). Conversely, if \(\bs{x}\) is a constant vector, then \(m\) is that same constant, so each deviation is 0 and hence \(s^2 = 0\).
So \(s^2 = 0\) if and only if the data set is constant (and then, of course, the mean is the common value).
If \(c\) is a constant then
(a) \(s^2(c \bs{x}) = c^2 s^2(\bs{x})\)
(b) \(s(c \bs{x}) = \left|c\right| s(\bs{x})\)
For part (a), recall that \(m(c \bs{x}) = c m(\bs{x})\). Hence \[ s^2(c \bs{x}) = \frac{1}{n - 1}\sum_{i=1}^n \left[c x_i - c m(\bs{x})\right]^2 = \frac{1}{n - 1} \sum_{i=1}^n c^2 \left[x_i - m(\bs{x})\right]^2 = c^2 s^2(\bs{x}) \]
If \(\bs{c}\) is a sample of size \(n\) from a constant \(c\) then
(a) \(s^2(\bs{x} + \bs{c}) = s^2(\bs{x})\)
(b) \(s(\bs{x} + \bs{c}) = s(\bs{x})\)
Recall that \(m(\bs{x} + \bs{c}) = m(\bs{x}) + c\). Hence \[ s^2(\bs{x} + \bs{c}) = \frac{1}{n - 1} \sum_{i=1}^n \left\{(x_i + c) - \left[m(\bs{x}) + c\right]\right\}^2 = \frac{1}{n - 1} \sum_{i=1}^n \left[x_i - m(\bs{x})\right]^2 = s^2(\bs{x})\]
As a special case of these results, suppose that \(\bs{x} = (x_1, x_2, \ldots, x_n)\) is a sample of size \(n\) corresponding to a real variable \(x\), and that \(a\) and \(b\) are constants. The sample corresponding to the variable \(y = a + b x\), in our vector notation, is \(\bs{a} + b \bs{x}\). Then \(m(\bs{a} + b \bs{x}) = a + b m(\bs{x})\) and \(s(\bs{a} + b \bs{x}) = \left|b\right| s(\bs{x})\). Linear transformations of this type, when \(b \gt 0\), arise frequently when physical units are changed. In this case, the transformation is often called a location-scale transformation; \(a\) is the location parameter and \(b\) is the scale parameter. For example, if \(x\) is the length of an object in inches, then \(y = 2.54 x\) is the length of the object in centimeters. If \(x\) is the temperature of an object in degrees Fahrenheit, then \(y = \frac{5}{9}(x - 32)\) is the temperature of the object in degrees Celsius.
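The effect of a location-scale transformation on the sample mean and standard deviation can be checked directly. Here is a Python sketch using the Fahrenheit-to-Celsius conversion; the temperature sample is illustrative.
```python
# Sketch: under y = a + b x, the sample mean becomes a + b m and the sample
# standard deviation becomes |b| s. Illustrated with y = (5/9)(x - 32),
# i.e. a = -160/9 and b = 5/9. The temperature data are illustrative.
import math
import statistics

x = [68.0, 72.5, 75.0, 70.2, 66.8]            # temperatures in degrees Fahrenheit
y = [5 / 9 * (xi - 32) for xi in x]           # temperatures in degrees Celsius

assert math.isclose(statistics.mean(y), 5 / 9 * (statistics.mean(x) - 32))
assert math.isclose(statistics.stdev(y), 5 / 9 * statistics.stdev(x))
print(statistics.mean(y), statistics.stdev(y))
```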
Now, for \(i \in \{1, 2, \ldots, n\}\), let \( z_i = (x_i - m) / s\). The number \(z_i\) is the standard score associated with \(x_i\). Note that since \(x_i\), \(m\), and \(s\) have the same physical units, the standard score \(z_i\) is dimensionless (that is, has no physical units); it measures the directed distance from the mean \(m\) to the data value \(x_i\) in standard deviations.
The sample of standard scores \(\bs{z} = (z_1, z_2, \ldots, z_n)\) has mean 0 and variance 1. That is,
(a) \(m(\bs{z}) = 0\)
(b) \(s^2(\bs{z}) = 1\)
These results follow from the scaling and location properties above. In vector notation, note that \(\bs{z} = (\bs{x} - \bs{m})/s\). Hence \(m(\bs{z}) = (m - m) / s = 0\) and \(s(\bs{z}) = s / s = 1\).
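A numerical check of the standard scores, with an illustrative data vector:
```python
# Sketch: the standard scores z_i = (x_i - m) / s have sample mean 0 and
# sample standard deviation 1. The data vector is illustrative.
import statistics

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
m = statistics.mean(x)
s = statistics.stdev(x)

z = [(xi - m) / s for xi in x]
print(statistics.mean(z))      # 0, up to floating-point rounding
print(statistics.stdev(z))     # 1, up to floating-point rounding
```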
Suppose that instead of the actual data \(\bs{x}\), we have a frequency distribution corresponding to a partition with classes (intervals) \((A_1, A_2, \ldots, A_k)\), class marks (midpoints of the intervals) \((t_1, t_2, \ldots, t_k)\), and frequencies \((n_1, n_2, \ldots, n_k)\). Recall that the relative frequency of class \(A_j\) is \(p_j = n_j / n\). In this case, approximate values of the sample mean and variance are, respectively, \begin{align} m & = \frac{1}{n} \sum_{j=1}^k n_j \, t_j = \sum_{j = 1}^k p_j \, t_j \\ s^2 & = \frac{1}{n - 1} \sum_{j=1}^k n_j (t_j - m)^2 = \frac{n}{n - 1} \sum_{j=1}^k p_j (t_j - m)^2 \end{align} These approximations are based on the hope that the data values in each class are well represented by the class mark. In fact, these are the standard definitions of sample mean and variance for the data set in which \(t_j\) occurs \(n_j\) times for each \(j\).
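The following Python sketch carries out this approximation for an illustrative frequency distribution; the class marks and frequencies are made up, not taken from the exercises below.
```python
# Sketch: approximate sample mean and variance from a frequency distribution,
# using the class marks t_j and frequencies n_j. The classes are illustrative.
marks = [2.5, 7.5, 12.5, 17.5]      # class marks t_j
freqs = [4, 10, 7, 3]               # frequencies n_j

n = sum(freqs)
m = sum(nj * tj for nj, tj in zip(freqs, marks)) / n
s2 = sum(nj * (tj - m) ** 2 for nj, tj in zip(freqs, marks)) / (n - 1)
print(m, s2)
```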
We continue our discussion of the sample variance, but now we assume that the variables are random. Thus, suppose that we have a basic random experiment, and that \(X\) is a real-valued random variable for the experiment with mean \(\mu\) and standard deviation \(\sigma\). We will need some higher order moments as well. Let \(\sigma_3 = \E\left[(X - \mu)^3\right]\) and \(\sigma_4 = \E\left[(X - \mu)^4\right]\) denote the 3rd and 4th moments about the mean. Recall that \(\sigma_3 \big/ \sigma^3 = \skw(X)\), the skewness of \(X\), and \(\sigma_4 \big/ \sigma^4 = \kur(X)\), the kurtosis of \(X\). We assume that \(\sigma_4 \lt \infty\).
We repeat the basic experiment \(n\) times to form a new, compound experiment, with a sequence of independent random variables \(\bs{X} = (X_1, X_2, \ldots, X_n)\), each with the same distribution as \(X\). In statistical terms, \(\bs{X}\) is a random sample of size \(n\) from the distribution of \(X\). All of the statistics above make sense for \(\bs{X}\), of course, but now these statistics are random variables. We will use the same notation, except for the usual convention of denoting random variables by capital letters. Finally, note that the deterministic properties and relations established above still hold.
In addition to being a measure of the center of the data \(\bs{X}\), the sample mean \[ M = \frac{1}{n} \sum_{i=1}^n X_i \] is a natural estimator of the distribution mean \(\mu\). In this section, we will derive statistics that are natural estimators of the distribution variance \(\sigma^2\). The statistics that we will derive are different, depending on whether \(\mu\) is known or unknown; for this reason, \(\mu\) is referred to as a nuisance parameter for the problem of estimating \(\sigma^2\).
First we will assume that \(\mu\) is known. Although this is almost always an artificial assumption, it is a nice place to start because the analysis is relatively easy and will give us insight for the standard case. A natural estimator of \(\sigma^2\) is the following statistic, which we will refer to as the special sample variance. \[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \]
\(W^2\) is the sample mean for a random sample of size \(n\) from the distribution of \((X - \mu)^2\), and satisfies the following properties:
(a) \(\E\left(W^2\right) = \sigma^2\)
(b) \(\var\left(W^2\right) = \frac{1}{n}\left(\sigma_4 - \sigma^4\right)\)
(c) \(W^2 \to \sigma^2\) as \(n \to \infty\) with probability 1
(d) The distribution of \(\sqrt{n}\left(W^2 - \sigma^2\right) \big/ \sqrt{\sigma_4 - \sigma^4}\) converges to the standard normal distribution as \(n \to \infty\)
These results follow immediately from standard results in the section on the Law of Large Numbers and the section on the Central Limit Theorem. For part (b), note that \[\var\left[(X - \mu)^2\right] = \E\left[(X - \mu)^4\right] -\left(\E\left[(X - \mu)^2\right]\right)^2 = \sigma_4 - \sigma^4\]
In particular part (a) means that \(W^2\) is an unbiased estimator of \(\sigma^2\). From part (b), note that \(\var(W^2) \to 0\) as \(n \to \infty\); this means that \(W^2\) is a consistent estimator of \(\sigma^2\). The square root of the special sample variance is a special version of the sample standard deviation, denoted \(W\).
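A small simulation illustrates the unbiasedness and consistency of \(W^2\). The sketch below uses a normal sampling distribution and illustrative parameter values; any distribution with \(\sigma_4 \lt \infty\) would do.
```python
# Simulation sketch: with mu known, W^2 = (1/n) sum (X_i - mu)^2 has mean
# sigma^2. The normal sampling distribution and parameters are illustrative.
import random

mu, sigma, n, reps = 10.0, 3.0, 20, 5000
random.seed(1)

w2_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    w2_values.append(sum((xi - mu) ** 2 for xi in sample) / n)

print(sum(w2_values) / reps, sigma ** 2)   # the average of W^2 is close to sigma^2 = 9
```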
\(\E(W) \le \sigma\). So \(W\) is a negatively biased estimator that tends to underestimate \(\sigma\).
This follows from the unbiased property and Jensen's inequality. Since \(w \mapsto \sqrt{w}\) is concave downward on \([0, \infty)\), we have \(\E(W) = \E\left(\sqrt{W^2}\right) \le \sqrt{\E\left(W^2\right)} = \sqrt{\sigma^2} = \sigma\).
Next we compute the covariance and correlation between the sample mean and the special sample variance.
The covariance and correlation of \(M\) and \(W^2\) are
(a) \(\cov\left(M, W^2\right) = \sigma_3 / n\)
(b) \(\cor\left(M, W^2\right) = \sigma_3 \big/ \sqrt{\sigma^2 \left(\sigma_4 - \sigma^4\right)}\)
Note that the correlation does not depend on the sample size, and that the sample mean and the special sample variance are uncorrelated if \(\sigma_3 = 0\) (equivalently \(\skw(X) = 0\)).
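A simulation sketch of the covariance, using an exponential sampling distribution (which is skewed, so \(\sigma_3 \ne 0\)); the rate parameter and sample sizes are illustrative choices.
```python
# Simulation sketch: cov(M, W^2) = sigma_3 / n. For the exponential
# distribution with rate 1, mu = 1, sigma^2 = 1, and sigma_3 = 2.
import random
import statistics

n, reps = 10, 50000
random.seed(2)

ms, w2s = [], []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    ms.append(sum(sample) / n)
    w2s.append(sum((xi - 1.0) ** 2 for xi in sample) / n)

mbar, wbar = statistics.mean(ms), statistics.mean(w2s)
cov = sum((a - mbar) * (b - wbar) for a, b in zip(ms, w2s)) / (reps - 1)
print(cov, 2.0 / n)    # both close to sigma_3 / n = 0.2
```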
Consider now the more realistic case in which \(\mu\) is unknown. In this case, a natural approach is to average, in some sense, the squared deviations \((X_i - M)^2\) over \(i \in \{1, 2, \ldots, n\}\). It might seem that we should average by dividing by \(n\). However, another approach is to divide by whatever constant would give us an unbiased estimator of \(\sigma^2\). This constant turns out to be \(n - 1\), leading to the standard sample variance: \[ S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2 \]
\(\E\left(S^2\right) = \sigma^2\).
By expanding (as was shown in the last section), \[ \sum_{i=1}^n (X_i - M)^2 = \sum_{i=1}^n X_i^2 - n M^2 \] Recall that \(\E(M) = \mu\) and \(\var(M) = \sigma^2 / n\). Taking expected values in the displayed equation gives \[ \E\left(\sum_{i=1}^n (X_i - M)^2\right) = \sum_{i=1}^n (\sigma^2 + \mu^2) - n \left(\frac{\sigma^2}{n} + \mu^2\right) = n (\sigma^2 + \mu^2) -n \left(\frac{\sigma^2}{n} + \mu^2\right) = (n - 1) \sigma^2 \]
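A simulation makes the role of the divisor \(n - 1\) concrete: averaging the squared deviations with divisor \(n - 1\) gives values centered at \(\sigma^2\), while the divisor \(n\) underestimates \(\sigma^2\) on average. The sampling distribution and parameters below are illustrative.
```python
# Simulation sketch: E(S^2) = sigma^2 when dividing by n - 1, while dividing
# by n gives mean (n - 1)/n * sigma^2 instead.
import random

mu, sigma, n, reps = 10.0, 3.0, 5, 20000
random.seed(3)

s2_unbiased, s2_biased = 0.0, 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((xi - m) ** 2 for xi in sample)
    s2_unbiased += ss / (n - 1)
    s2_biased += ss / n

print(s2_unbiased / reps)   # close to sigma^2 = 9
print(s2_biased / reps)     # close to (n - 1)/n * sigma^2 = 7.2
```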
Of course, the square root of the sample variance is the sample standard deviation, denoted \(S\).
\(\E(S) \le \sigma\). So \(S\) is a negatively biased estimator that tends to underestimate \(\sigma\).
\(S^2 \to \sigma^2\) as \(n \to \infty\) with probability 1.
This follows from the strong law of large numbers. Recall again that \[ S^2 = \frac{1}{n - 1} \sum_{i=1}^n X_i^2 - \frac{n}{n - 1} M^2 = \frac{n}{n - 1}[M(\bs{X}^2) - M^2(\bs{X})] \] But with probability 1, \(M(\bs{X}^2) \to \sigma^2 + \mu^2\) as \(n \to \infty\) and \(M^2(\bs{X}) \to \mu^2\) as \(n \to \infty\).
Since \(S^2\) is an unbiased estimator of \(\sigma^2\), the variance of \(S^2\) is the mean square error, a measure of the quality of the estimator.
\(\var\left(S^2\right) = \frac{1}{n} \left( \sigma_4 - \frac{n - 3}{n - 1} \sigma^4 \right)\).
Recall from the result above that \[ S^2 = \frac{1}{2 n (n - 1)} \sum_{i=1}^n \sum_{j=1}^n (X_i - X_j)^2 \] Hence, using the bilinear property of covariance we have \[ \var(S^2) = \cov(S^2, S^2) = \frac{1}{4 n^2 (n - 1)^2} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \sum_{l=1}^n \cov[(X_i - X_j)^2, (X_k - X_l)^2] \] We compute the covariances in this sum by considering disjoint cases:
(a) \(\cov[(X_i - X_j)^2, (X_k - X_l)^2] = 0\) if \(i = j\) or \(k = l\), and there are \(2 n^3 - n^2\) such terms.
(b) \(\cov[(X_i - X_j)^2, (X_k - X_l)^2] = 0\) if \(i, j, k, l\) are distinct, and there are \(n (n - 1)(n - 2)(n - 3)\) such terms.
(c) \(\cov[(X_i - X_j)^2, (X_k - X_l)^2] = 2 \sigma_4 + 2 \sigma^4\) if \(i \ne j\) and \(\{k, l\} = \{i, j\}\), and there are \(2 n (n - 1)\) such terms.
(d) \(\cov[(X_i - X_j)^2, (X_k - X_l)^2] = \sigma_4 - \sigma^4\) if \(i \ne j\), \(k \ne l\), and the index sets \(\{i, j\}\) and \(\{k, l\}\) have exactly one element in common, and there are \(4 n (n - 1)(n - 2)\) such terms.
Substituting gives the result.
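The formula for \(\var(S^2)\) can also be checked by simulation. For a normal sampling distribution \(\sigma_4 = 3 \sigma^4\), so the formula reduces to \(2 \sigma^4 / (n - 1)\); the parameters below are illustrative.
```python
# Simulation sketch: var(S^2) = (1/n)(sigma_4 - (n - 3)/(n - 1) sigma^4).
# For a normal distribution sigma_4 = 3 sigma^4. Parameters are illustrative.
import random
import statistics

sigma, n, reps = 2.0, 10, 20000
random.seed(4)

s2_values = [statistics.variance([random.gauss(0.0, sigma) for _ in range(n)])
             for _ in range(reps)]

empirical = statistics.pvariance(s2_values)
theoretical = (3 * sigma ** 4 - (n - 3) / (n - 1) * sigma ** 4) / n
print(empirical, theoretical)   # both close to 2 * sigma^4 / (n - 1)
```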
Note that \(\var(S^2) \to 0\) as \(n \to \infty\), and hence \(S^2\) is a consistent estimator of \(\sigma^2\). On the other hand, it's not surprising that the variance of the standard sample variance (where we assume that \(\mu\) is unknown) is greater than the variance of the special sample variance (in which we assume that \(\mu\) is known).
\(\var\left(S^2\right) \gt \var\left(W^2\right)\).
Next we compute the covariance and correlation between the sample mean and the sample variance.
The covariance and correlation between the sample mean and sample variance are
(a) \(\cov\left(M, S^2\right) = \sigma_3 / n\)
(b) \(\cor\left(M, S^2\right) = \sigma_3 \big/ \sqrt{\sigma^2 \left(\sigma_4 - \frac{n - 3}{n - 1} \sigma^4\right)}\)
In particular, note that \(\cov(M, S^2) = \cov(M, W^2)\). Again, the sample mean and variance are uncorrelated if \(\sigma_3 = 0\) so that \(\skw(X) = 0\). Our last result gives the covariance and correlation between the special sample variance and the standard one. Curiously, the covariance is the same as the variance of the special sample variance.
The covariance and correlation between \(W^2\) and \(S^2\) are
(a) \(\cov\left(W^2, S^2\right) = \frac{1}{n}\left(\sigma_4 - \sigma^4\right)\)
(b) \(\cor\left(W^2, S^2\right) = \sqrt{\frac{\sigma_4 - \sigma^4}{\sigma_4 - \frac{n - 3}{n - 1} \sigma^4}}\)
Note that \(\cor\left(W^2, S^2\right) \to 1\) as \(n \to \infty\), not surprising since with probability 1, \(S^2 \to \sigma^2\) and \(W^2 \to \sigma^2\) as \(n \to \infty\).
A particularly important special case occurs when the sampling distribution is normal. This case is explored in the section on Special Properties of Normal Samples.
Suppose that \(x\) is the temperature (in degrees Fahrenheit) for a certain type of electronic component after 10 hours of operation. A sample of 30 components has mean 113° and standard deviation 18°.
Suppose that \(x\) is the length (in inches) of a machined part in a manufacturing process. A sample of 50 parts has mean 10.0 and standard deviation 2.0.
Professor Moriarity has a class of 25 students in her section of Stat 101 at Enormous State University (ESU). The mean grade on the first midterm exam was 64 (out of a possible 100 points) and the standard deviation was 16. Professor Moriarity thinks the grades are a bit low and is considering various transformations for increasing the grades. In each case below give the mean and standard deviation of the transformed grades, or state that there is not enough information.
One of the students did not study at all, and received a 10 on the midterm. Professor Moriarity considers this score to be an outlier.
All statistical software packages will compute means, variances and standard deviations, draw dotplots and histograms, and in general perform the numerical and graphical procedures discussed in this section. For real statistical experiments, particularly those with large data sets, the use of statistical software is essential. On the other hand, there is some value in performing the computations by hand, with small, artificial data sets, in order to master the concepts and definitions. In this subsection, do the computations and draw the graphs with minimal technological aids.
Suppose that \(x\) is the number of math courses completed by an ESU student. A sample of 10 ESU students gives the data \(\bs{x} = (3, 1, 2, 0, 2, 4, 3, 2, 1, 2)\).
\(i\) | \(x_i\) | \(x_i - m\) | \((x_i - m)^2\) |
---|---|---|---|
\(1\) | \(3\) | \(1\) | \(1\) |
\(2\) | \(1\) | \(-1\) | \(1\) |
\(3\) | \(2\) | \(0\) | \(0\) |
\(4\) | \(0\) | \(-2\) | \(4\) |
\(5\) | \(2\) | \(0\) | \(0\) |
\(6\) | \(4\) | \(2\) | \(4\) |
\(7\) | \(3\) | \(1\) | \(1\) |
\(8\) | \(2\) | \(0\) | \(0\) |
\(9\) | \(1\) | \(-1\) | \(1\) |
\(10\) | \(2\) | \(0\) | \(0\) |
Total | 20 | 0 | 12 |
Mean | 2 | 0 | \(12/9\) |
Suppose that a sample of size 12 from a discrete variable \(x\) has empirical density function given by \(f(-2) = 1/12\), \(f(-1) = 1/4\), \(f(0) = 1/3\), \(f(1) = 1/6\), \(f(2) = 1/6\).
The following table gives a frequency distribution for the commuting distance to the math/stat building (in miles) for a sample of ESU students.
Class | Freq | Rel Freq | Density | Cum Freq | Cum Rel Freq | Midpoint |
---|---|---|---|---|---|---|
\((0, 2]\) | 6 | |||||
\((2, 6]\) | 16 | |||||
\((6, 10]\) | 18 | |||||
\((10, 20]\) | 10 |||||
Total |
Class | Freq | Rel Freq | Density | Cum Freq | Cum Rel Freq | Midpoint |
---|---|---|---|---|---|---|
\((0, 2]\) | 6 | 0.12 | 0.06 | 6 | 0.12 | 1 |
\((2, 6]\) | 16 | 0.32 | 0.08 | 22 | 0.44 | 4 |
\((6, 10]\) | 18 | 0.36 | 0.09 | 40 | 0.80 | 8 |
\((10, 20]\) | 10 | 0.20 | 0.02 | 50 | 1 | 15 |
Total | 50 | 1 |
In the error function app, select root mean square error. As you add points, note the shape of the graph of the error function, the value that minimizes the function, and the minimum value of the function.
In the error function app, select mean absolute error. As you add points, note the shape of the graph of the error function, the values that minimize the function, and the minimum value of the function.
Suppose that our data vector is \((2, 1, 5, 7)\). Explicitly give \(\mae\) as a piecewise function and sketch its graph. Note that \(\mae\) is not differentiable at the data values, and is minimized at every point of the median interval \([2, 5]\).
Suppose that our data vector is \((3, 5, 1)\). Explicitly give \(\mae\) as a piecewise function and sketch its graph. Note that \(\mae\) is not differentiable at the data values; the minimum occurs at the unique median \(3\).
Many of the apps in this project are simulations of experiments with a basic random variable of interest. When you run the simulation, you are performing independent replications of the experiment. In most cases, the app displays the standard deviation of the distribution, both numerically in a table and graphically as the radius of the blue, horizontal bar in the graph box. When you run the simulation, the sample standard deviation is also displayed numerically in the table and graphically as the radius of the red horizontal bar in the graph box.
In the binomial coin experiment, the random variable is the number of heads. For various values of the parameters \(n\) (the number of coins) and \(p\) (the probability of heads), run the simulation 1000 times and compare the sample standard deviation to the distribution standard deviation.
In the simulation of the matching experiment, the random variable is the number of matches. For selected values of \(n\) (the number of balls), run the simulation 1000 times and compare the sample standard deviation to the distribution standard deviation.
Run the simulation of the gamma experiment 1000 times for various values of the rate parameter \(r\) and the shape parameter \(k\). Compare the sample standard deviation to the distribution standard deviation.
Suppose that \(X\) has probability density function \(f(x) = 12 \, x^2 \, (1 - x)\) for \(0 \le x \le 1\). The distribution of \(X\) is a member of the beta family. Compute each of the following
Suppose now that \((X_1, X_2, \ldots, X_{10})\) is a random sample of size 10 from the beta distribution in the previous exercise. Find each of the following:
Suppose that \(X\) has probability density function \(f(x) = \lambda e^{-\lambda x}\) for \(0 \le x \lt \infty\), where \(\lambda \gt 0\) is a parameter. Thus \(X\) has the exponential distribution with rate parameter \(\lambda\). Compute each of the following
Suppose now that \((X_1, X_2, \ldots, X_5)\) is a random sample of size 5 from the exponential distribution in the previous exercise. Find each of the following:
Recall that for an ace-six flat die, faces 1 and 6 have probability \(\frac{1}{4}\) each, while faces 2, 3, 4, and 5 have probability \(\frac{1}{8}\) each. Let \(X\) denote the score when an ace-six flat die is thrown. Compute each of the following:
Suppose now that an ace-six flat die is tossed 8 times. Find each of the following:
Statistical software should be used for the problems in this subsection.
Consider the petal length and species variables in Fisher's iris data.
Consider the erosion variable in the Challenger data set.
Consider Michelson's velocity of light data.
Consider Short's parallax of the sun data.
Consider Cavendish's density of the earth data.
Consider the M&M data.
Consider the body weight, species, and gender variables in the Cicada data.
Consider Pearson's height data.