2. Tests in the Normal Model

Basic Theory

The Normal Model

The normal distribution is perhaps the most important distribution in the study of mathematical statistics, in part because of the central limit theorem. As a consequence of this theorem, a measured quantity that is subject to numerous small, random errors will have, at least approximately, a normal distribution. Such variables are ubiquitous in statistical experiments, in subjects varying from the physical and biological sciences to the social sciences.

So in this section, we assume that \( \boldsymbol{X} = (X_1, X_2, \ldots, X_n) \) is a random sample from the normal distribution with mean \( \mu \) and standard deviation \( \sigma \). Our goal in this section is to construct hypothesis tests for \( \mu \) and \( \sigma \); these are among the most important special cases of hypothesis testing. This section parallels the section on estimation in the normal model in the chapter on set estimation, and in particular, the duality between interval estimation and hypothesis testing will play an important role. But first we need to review some basic facts that will be critical for our analysis.

Recall that the sample mean \( M \) and sample variance \( S^2 \) are \[ M = \frac{1}{n} \sum_{i=1}^n X_i, \quad S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2 \]

From our study of point estimation, recall that \( M \) is an unbiased and consistent estimator of \( \mu \) while \( S^2 \) is an unbiased and consistent estimator of \( \sigma^2 \). From these basic statistics we can construct the test statistics that will be used to construct our hypothesis tests. The following results are special properties of samples from the normal distribution.

Define \[ Z = \frac{M - \mu}{\sigma / \sqrt{n}}, \quad T = \frac{M - \mu}{S / \sqrt{n}}, \quad V = \frac{n - 1}{\sigma^2} S^2 \]

  1. Z has the standard normal distribution.
  2. \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom.
  3. \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom.
  4. Z and V are independent.
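The distributional facts above can be checked informally by simulation. The following is a minimal sketch using only the Python standard library; the function name `pivots`, the seed, and the parameter values are illustrative assumptions, not part of the text.

```python
import math
import random
import statistics

def pivots(sample, mu, sigma):
    """Return (Z, T, V) for one sample, using the definitions in the text."""
    n = len(sample)
    m = statistics.fmean(sample)        # sample mean M
    s2 = statistics.variance(sample)    # sample variance S^2 (divisor n - 1)
    z = (m - mu) / (sigma / math.sqrt(n))
    t = (m - mu) / (math.sqrt(s2) / math.sqrt(n))
    v = (n - 1) * s2 / sigma ** 2
    return z, t, v

rng = random.Random(2024)                  # fixed seed so the run is repeatable
mu, sigma, n, runs = 5.0, 2.0, 10, 20000   # illustrative values only
results = [pivots([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
           for _ in range(runs)]
z_values = [r[0] for r in results]
v_values = [r[2] for r in results]

# Z should have mean near 0 and variance near 1 (standard normal).
print("mean of Z:", statistics.fmean(z_values))
print("variance of Z:", statistics.variance(z_values))
# V should have mean near n - 1 (chi-square with n - 1 degrees of freedom).
print("mean of V:", statistics.fmean(v_values))
```

The simulation only compares a few moments, so it is a sanity check rather than a proof of the stated distributions.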

It follows that each of these random variables is a pivot variable for (μ,σ) since the distributions do not depend on the parameters, but the variables themselves functionally depend on one or both parameters. The pivot variables will lead to natural test statistics that can then be used to perform the hypothesis tests of the parameters. To construct our tests, we will need quantiles of these standard distributions. The quantiles can be computed using the quantile app or from most mathematical and statistical software packages. Here is the notation we will use:

Let \( p \in (0, 1) \) and \( k \in \mathbb{N}_+ \).

  1. \( z(p) \) denotes the quantile of order \( p \) for the standard normal distribution.
  2. \( t_k(p) \) denotes the quantile of order \( p \) for the student \( t \) distribution with \( k \) degrees of freedom.
  3. \( \chi^2_k(p) \) denotes the quantile of order \( p \) for the chi-square distribution with \( k \) degrees of freedom.

Since the standard normal and student \( t \) distributions are symmetric about 0, it follows that \( z(1 - p) = -z(p) \) and \( t_k(1 - p) = -t_k(p) \) for \( p \in (0, 1) \) and \( k \in \mathbb{N}_+ \). On the other hand, the chi-square distribution is not symmetric.
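As an illustration of the notation, standard normal quantiles can be computed with `statistics.NormalDist` from the Python standard library. The helper name `z` simply mirrors the notation of the text. The standard library has no student t or chi-square quantile functions, so those would need a statistical package or the quantile app.

```python
from statistics import NormalDist

def z(p):
    """Quantile of order p for the standard normal distribution."""
    return NormalDist().inv_cdf(p)

# Print z(p) next to z(1 - p) to see the symmetry z(1 - p) = -z(p).
for p in (0.9, 0.95, 0.975):
    print(p, round(z(p), 4), round(z(1 - p), 4))
```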

Tests for the Mean with Known Standard Deviation

For our first discussion, we assume that the distribution mean μ is unknown but the standard deviation σ is known. This is not always an artificial assumption. There are often situations where σ is stable over time, and hence is at least approximately known, while μ changes because of different treatments. Examples are given in the computational exercises below.

For a conjectured \( \mu_0 \in \mathbb{R} \), define the test statistic \[ Z = \frac{M - \mu_0}{\sigma / \sqrt{n}} \]

  1. If \( \mu = \mu_0 \) then \( Z \) has the standard normal distribution.
  2. If \( \mu \ne \mu_0 \) then \( Z \) has the normal distribution with mean \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \) and variance 1.

So in case (b), \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \) can be viewed as a non-centrality parameter. The graph of the probability density function of \( Z \) is like that of the standard normal probability density function, but shifted to the right or left by the non-centrality parameter, depending on whether \( \mu > \mu_0 \) or \( \mu < \mu_0 \).

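For \( \alpha \in (0, 1) \), each of the following tests has significance level \( \alpha \):

  1. Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( Z < -z(1 - \alpha/2) \) or \( Z > z(1 - \alpha/2) \) if and only if \( M < \mu_0 - z(1 - \alpha/2) \frac{\sigma}{\sqrt{n}} \) or \( M > \mu_0 + z(1 - \alpha/2) \frac{\sigma}{\sqrt{n}} \).
  2. Reject \( H_0: \mu \le \mu_0 \) versus \( H_1: \mu > \mu_0 \) if and only if \( Z > z(1 - \alpha) \) if and only if \( M > \mu_0 + z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \).
  3. Reject \( H_0: \mu \ge \mu_0 \) versus \( H_1: \mu < \mu_0 \) if and only if \( Z < -z(1 - \alpha) \) if and only if \( M < \mu_0 - z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \).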
Details:

In part (a), \( H_0 \) is a simple hypothesis, and under \( H_0 \), \( Z \) has the standard normal distribution. So \( \alpha \) is the probability of falsely rejecting \( H_0 \) by definition of the quantiles. In parts (b) and (c), \( Z \) has a non-central normal distribution under \( H_0 \) as discussed in [4]. So if \( H_0 \) is true, the maximum type 1 error probability \( \alpha \) occurs when \( \mu = \mu_0 \). The decision rules in terms of \( M \) are equivalent to the corresponding ones in terms of \( Z \) by simple algebra.

Part (a) is the standard two-sided test, while (b) is the right-tailed test and (c) is the left-tailed test. Note that in each case, the hypothesis test is the dual of the corresponding interval estimate constructed in the section on estimation in the normal model.
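The three decision rules can be sketched in a few lines of Python. The function name `z_test` and the labels `"two-sided"`, `"greater"`, and `"less"` are hypothetical choices made for this illustration; the numbers in the usage lines are taken from the computational exercises in this section.

```python
import math
from statistics import NormalDist

def z_test(m, mu0, sigma, n, alpha, alternative="two-sided"):
    """Return (Z, reject) for a test of the mean with known sigma.

    m is the observed sample mean, mu0 the conjectured mean, and
    alternative selects the form of H1: mu != mu0, mu > mu0, or mu < mu0.
    """
    z_stat = (m - mu0) / (sigma / math.sqrt(n))
    q = NormalDist().inv_cdf          # standard normal quantile function z(p)
    if alternative == "two-sided":
        reject = abs(z_stat) > q(1 - alpha / 2)
    elif alternative == "greater":    # right-tailed test
        reject = z_stat > q(1 - alpha)
    elif alternative == "less":       # left-tailed test
        reject = z_stat < -q(1 - alpha)
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z_stat, reject

# Machined part exercise: n = 100, M = 10.1, sigma = 0.3, two-sided, alpha = 0.1.
print(z_test(10.1, 10.0, 0.3, 100, 0.1))
# Hourly wage exercise: n = 25, M = 6.75, sigma = 1.25, left-tailed, alpha = 0.01.
print(z_test(6.75, 7.0, 1.25, 25, 0.01, "less"))
```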

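For each of the tests in [7], we fail to reject \( H_0 \) at significance level \( \alpha \) if and only if \( \mu_0 \) is in the corresponding \( 1 - \alpha \) confidence interval, that is

  1. \( M - z(1 - \alpha/2) \frac{\sigma}{\sqrt{n}} \le \mu_0 \le M + z(1 - \alpha/2) \frac{\sigma}{\sqrt{n}} \)
  2. \( \mu_0 \ge M - z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \)
  3. \( \mu_0 \le M + z(1 - \alpha) \frac{\sigma}{\sqrt{n}} \)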
Details:

This follows from [5]. In each case, we start with the inequality that corresponds to not rejecting H0 and solve for μ0.
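The duality for the two-sided case can be checked numerically. This is a small sketch under assumed values (the machined part numbers from the exercises); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_interval(m, sigma, n, alpha):
    """Two-sided 1 - alpha confidence interval for mu when sigma is known."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    return m - half, m + half

def rejects_two_sided(m, mu0, sigma, n, alpha):
    """True if the two-sided test rejects H0: mu = mu0."""
    z_stat = (m - mu0) / (sigma / math.sqrt(n))
    return abs(z_stat) > NormalDist().inv_cdf(1 - alpha / 2)

m, sigma, n, alpha = 10.1, 0.3, 100, 0.1
lo, hi = two_sided_interval(m, sigma, n, alpha)
for mu0 in (10.0, 10.04, 10.06, 10.1, 10.16):
    in_interval = lo <= mu0 <= hi
    rejected = rejects_two_sided(m, mu0, sigma, n, alpha)
    # Failing to reject should coincide with mu0 lying in the interval.
    print(mu0, in_interval, rejected)
```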

The two-sided test in (a) corresponds to \( \alpha/2 \) in each tail of the distribution of the test statistic \( Z \), under \( H_0 \). This test is said to be unbiased. But of course we can construct other biased tests by partitioning the significance level \( \alpha \) between the left and right tails in a non-symmetric way.

For every \( \alpha, \, p \in (0, 1) \), the following test has significance level \( \alpha \): Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( Z < z(\alpha - p \alpha) \) or \( Z \ge z(1 - p \alpha) \).

  1. \( p = \frac{1}{2} \) gives the symmetric, unbiased test.
  2. \( p \downarrow 0 \) gives the left-tailed test.
  3. \( p \uparrow 1 \) gives the right-tailed test.
Details:

As before H0 is a simple hypothesis, and if H0 is true, Z has the standard normal distribution. So the probability of falsely rejecting H0 is α by definition of the quantiles. Parts (a)–(c) follow from properties of the standard normal quantile function.

The P-values of these tests can be computed in terms of the standard normal distribution function \( \Phi \).

The P-values of the standard tests in [20] are respectively

  1. \( 2[1 - \Phi(|Z|)] \)
  2. \( 1 - \Phi(Z) \)
  3. \( \Phi(Z) \)
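These three P-values are straightforward to compute once the observed value of the test statistic is known. A minimal sketch follows; the function name `p_values` and the dictionary keys are illustrative choices.

```python
from statistics import NormalDist

def p_values(z_stat):
    """P-values of the two-sided, right-tailed, and left-tailed normal tests."""
    phi = NormalDist().cdf            # standard normal distribution function
    return {
        "two-sided": 2 * (1 - phi(abs(z_stat))),
        "right-tailed": 1 - phi(z_stat),
        "left-tailed": phi(z_stat),
    }

# Example: an observed test statistic of 1.2.
print(p_values(1.2))
```

Note that the right-tailed and left-tailed P-values always sum to 1, so a value of the statistic that is strong evidence for one one-sided alternative is no evidence for the other.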

Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of results will explore the power functions of the tests in [7].

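The power function of the general two-sided test in [7] is given by \[ Q(\mu) = \Phi\left( z(\alpha - p \alpha) - \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) \right) + \Phi\left( \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) - z(1 - p \alpha) \right), \quad \mu \in \mathbb{R} \]

  1. \( Q \) is decreasing on \( (-\infty, m_0) \) and increasing on \( (m_0, \infty) \) where \( m_0 = \mu_0 + [z(\alpha - p \alpha) + z(1 - p \alpha)] \frac{\sigma}{2 \sqrt{n}} \).
  2. \( Q(\mu_0) = \alpha \).
  3. \( Q(\mu) \to 1 \) as \( \mu \uparrow \infty \) and \( Q(\mu) \to 1 \) as \( \mu \downarrow -\infty \).
  4. If \( p = \frac{1}{2} \) then \( Q \) is symmetric about \( \mu_0 \) (and \( m_0 = \mu_0 \)).
  5. As \( p \) increases, \( Q(\mu) \) increases if \( \mu > \mu_0 \) and decreases if \( \mu < \mu_0 \).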

So by varying p, we can make the test more powerful for some values of μ, but only at the expense of making the test less powerful for other values of μ.
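The trade-off can be seen by evaluating the power function for a few values of p. The sketch below codes the formula for the general two-sided test directly; the function name `power_two_sided` and the numerical values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

_N = NormalDist()   # standard normal distribution

def power_two_sided(mu, mu0, sigma, n, alpha, p=0.5):
    """Power of the general two-sided test at the true mean mu.

    The left tail gets probability alpha - p * alpha and the right tail
    gets p * alpha under H0, as in the text.
    """
    delta = math.sqrt(n) * (mu - mu0) / sigma   # non-centrality parameter
    left = _N.cdf(_N.inv_cdf(alpha - p * alpha) - delta)
    right = _N.cdf(delta - _N.inv_cdf(1 - p * alpha))
    return left + right

# Power at an alternative above and below mu0 for several values of p.
for p in (0.2, 0.5, 0.8):
    above = power_two_sided(10.05, 10.0, 0.3, 100, 0.1, p)
    below = power_two_sided(9.95, 10.0, 0.3, 100, 0.1, p)
    print(p, round(above, 4), round(below, 4))
```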

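The power function of the left-tailed test in [7] is given by \[ Q(\mu) = \Phi\left( z(\alpha) - \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) \right), \quad \mu \in \mathbb{R} \]

  1. \( Q \) is decreasing on \( \mathbb{R} \).
  2. \( Q(\mu_0) = \alpha \).
  3. \( Q(\mu) \to 0 \) as \( \mu \uparrow \infty \) and \( Q(\mu) \to 1 \) as \( \mu \downarrow -\infty \).

The power function of the right-tailed test in [7] is given by \[ Q(\mu) = \Phi\left( z(\alpha) + \frac{\sqrt{n}}{\sigma} (\mu - \mu_0) \right), \quad \mu \in \mathbb{R} \]

  1. \( Q \) is increasing on \( \mathbb{R} \).
  2. \( Q(\mu_0) = \alpha \).
  3. \( Q(\mu) \to 1 \) as \( \mu \uparrow \infty \) and \( Q(\mu) \to 0 \) as \( \mu \downarrow -\infty \).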

For any of the three tests in [7], increasing the sample size \( n \) or decreasing the standard deviation \( \sigma \) results in a uniformly more powerful test.

In the mean test experiment, select the normal test statistic and select the normal sampling distribution with standard deviation \( \sigma = 2 \), sample size \( n = 20 \), and \( \mu_0 = 0 \). Run the experiment 1000 times for several values of the true distribution mean \( \mu \). For each value of \( \mu \), note the distribution of the P-value.

In the mean estimate experiment, select the normal pivot variable and select the normal distribution with \( \mu = 0 \) and standard deviation \( \sigma = 2 \), confidence level \( 1 - \alpha = 0.90 \), and sample size \( n = 10 \). For each of the three types of confidence intervals, run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of \( \mu_0 \) for which the null hypothesis would be rejected.

In many cases, the first step is to design the experiment so that the significance level is α and so that the test has a given power β for a given alternative μ1.

For either of the one-sided tests in [7], the sample size \( n \) needed for a test with significance level \( \alpha \) and power \( \beta \) for the alternative \( \mu_1 \) is \[ n = \left( \frac{\sigma [z(\beta) - z(\alpha)]}{\mu_1 - \mu_0} \right)^2 \]

Details:

This follows from setting the power function equal to \( \beta \) and solving for \( n \).

For the unbiased, two-sided test, the sample size \( n \) needed for a test with significance level \( \alpha \) and power \( \beta \) for the alternative \( \mu_1 \) is approximately \[ n = \left( \frac{\sigma [z(\beta) - z(\alpha/2)]}{\mu_1 - \mu_0} \right)^2 \]

Details:

In the power function for the two-sided test given in [9], we can neglect the first term if \( \mu_1 > \mu_0 \) and neglect the second term if \( \mu_1 < \mu_0 \).
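The sample size formulas translate directly into code. In this sketch the function name `sample_size` and the `two_sided` flag are hypothetical; rounding up to the next integer is an assumption made so that the achieved power is at least the target.

```python
import math
from statistics import NormalDist

def sample_size(mu0, mu1, sigma, alpha, beta, two_sided=False):
    """Sample size for significance level alpha and power beta at mu1.

    Uses z(alpha) for the one-sided tests and z(alpha / 2) for the
    (approximate) unbiased two-sided formula.
    """
    z = NormalDist().inv_cdf
    tail = alpha / 2 if two_sided else alpha
    n = (sigma * (z(beta) - z(tail)) / (mu1 - mu0)) ** 2
    return math.ceil(n)     # round up to a whole number of observations

# Machined part exercise: sigma = 0.3, alpha = 0.1, power 0.8 at mu = 10.05.
print(sample_size(10.0, 10.05, 0.3, 0.1, 0.8, two_sided=True))
print(sample_size(10.0, 10.05, 0.3, 0.1, 0.8))
```

As expected, the one-sided test needs fewer observations than the two-sided test for the same significance level, power, and alternative.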

Tests of the Mean with Unknown Standard Deviation

For our next discussion, we construct tests of μ without requiring the assumption that σ is known. And in applications of course, σ is usually unknown.

For a conjectured \( \mu_0 \in \mathbb{R} \), define the test statistic \[ T = \frac{M - \mu_0}{S / \sqrt{n}} \]

  1. If \( \mu = \mu_0 \), the statistic \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom.
  2. If \( \mu \ne \mu_0 \) then \( T \) has a non-central \( t \) distribution with \( n - 1 \) degrees of freedom and non-centrality parameter \( \frac{\mu - \mu_0}{\sigma / \sqrt{n}} \).

In case (b), the graph of the probability density function of \( T \) is much (but not exactly) the same as that of the ordinary \( t \) distribution with \( n - 1 \) degrees of freedom, but shifted to the right or left by the non-centrality parameter, depending on whether \( \mu > \mu_0 \) or \( \mu < \mu_0 \).

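For \( \alpha \in (0, 1) \), each of the following tests has significance level \( \alpha \):

  1. Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( T < -t_{n-1}(1 - \alpha/2) \) or \( T > t_{n-1}(1 - \alpha/2) \) if and only if \( M < \mu_0 - t_{n-1}(1 - \alpha/2) \frac{S}{\sqrt{n}} \) or \( M > \mu_0 + t_{n-1}(1 - \alpha/2) \frac{S}{\sqrt{n}} \).
  2. Reject \( H_0: \mu \le \mu_0 \) versus \( H_1: \mu > \mu_0 \) if and only if \( T > t_{n-1}(1 - \alpha) \) if and only if \( M > \mu_0 + t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \).
  3. Reject \( H_0: \mu \ge \mu_0 \) versus \( H_1: \mu < \mu_0 \) if and only if \( T < -t_{n-1}(1 - \alpha) \) if and only if \( M < \mu_0 - t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \).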
Details:

In part (a), \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom under \( H_0 \). So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( T \) has a non-central \( t \) distribution with \( n - 1 \) degrees of freedom under \( H_0 \), as in [17]. Hence if \( H_0 \) is true, the maximum type 1 error probability \( \alpha \) occurs when \( \mu = \mu_0 \). The decision rules in terms of \( M \) are equivalent to the corresponding ones in terms of \( T \) by simple algebra.

Part (a) is the standard two-sided test, while (b) is the right-tailed test and (c) is the left-tailed test. Note that in each case, the hypothesis test is the dual of the corresponding interval estimate constructed in the section on estimation in the normal model.

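For each of the tests in [18], we fail to reject \( H_0 \) at significance level \( \alpha \) if and only if \( \mu_0 \) is in the corresponding \( 1 - \alpha \) confidence interval.

  1. \( M - t_{n-1}(1 - \alpha/2) \frac{S}{\sqrt{n}} \le \mu_0 \le M + t_{n-1}(1 - \alpha/2) \frac{S}{\sqrt{n}} \)
  2. \( \mu_0 \ge M - t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \)
  3. \( \mu_0 \le M + t_{n-1}(1 - \alpha) \frac{S}{\sqrt{n}} \)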
Details:

This follows from [18]. In each case, we start with the inequality that corresponds to not rejecting H0 and then solve for μ0.

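The two-sided test in (a) corresponds to \( \alpha/2 \) in each tail of the distribution of the test statistic \( T \), under \( H_0 \). This test is said to be unbiased. But of course we can construct other biased tests by partitioning the significance level \( \alpha \) between the left and right tails in a non-symmetric way.

For every \( \alpha, \, p \in (0, 1) \), the following test has significance level \( \alpha \): Reject \( H_0: \mu = \mu_0 \) versus \( H_1: \mu \ne \mu_0 \) if and only if \( T < t_{n-1}(\alpha - p \alpha) \) or \( T \ge t_{n-1}(1 - p \alpha) \) if and only if \( M < \mu_0 + t_{n-1}(\alpha - p \alpha) \frac{S}{\sqrt{n}} \) or \( M \ge \mu_0 + t_{n-1}(1 - p \alpha) \frac{S}{\sqrt{n}} \).

  1. \( p = \frac{1}{2} \) gives the symmetric, unbiased test.
  2. \( p \downarrow 0 \) gives the left-tailed test.
  3. \( p \uparrow 1 \) gives the right-tailed test.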
Details:

Once again, \( H_0 \) is a simple hypothesis, and under \( H_0 \) the test statistic \( T \) has the student \( t \) distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. Parts (a)–(c) follow from properties of the quantile function.

The P-values of these tests can be computed in terms of the distribution function \( \Phi_{n-1} \) of the \( t \) distribution with \( n - 1 \) degrees of freedom.

The P-values of the standard tests in [20] are respectively

  1. \( 2[1 - \Phi_{n-1}(|T|)] \)
  2. \( 1 - \Phi_{n-1}(T) \)
  3. \( \Phi_{n-1}(T) \)
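The Python standard library has no student t distribution function, so the sketch below approximates \( \Phi_{n-1} \) by integrating the t density numerically with Simpson's rule. This numerical approach, the function names, and the step count are assumptions made for illustration; in practice a statistical package would normally be used. The usage line takes its numbers from the telemarketing exercise in this section.

```python
import math

def t_pdf(x, k):
    """Probability density function of the student t distribution, k degrees of freedom."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """Approximate distribution function: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3                # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area

def t_test_p_values(m, s, n, mu0):
    """Return T and the P-values of the three standard t tests."""
    t_stat = (m - mu0) / (s / math.sqrt(n))
    cdf = t_cdf(t_stat, n - 1)
    return t_stat, {
        "two-sided": 2 * (1 - t_cdf(abs(t_stat), n - 1)),
        "right-tailed": 1 - cdf,
        "left-tailed": cdf,
    }

# Telemarketing exercise: n = 50, M = 310, S = 25, conjectured mean 300.
print(t_test_p_values(310, 25, 50, 300))
```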

In the mean test experiment, select the student test statistic and select the normal sampling distribution with standard deviation σ=2, sample size n=20, and μ0=1. Run the experiment 1000 times for several values of the true distribution mean μ. For each value of μ, note the empirical distribution of P.

In the mean estimate experiment, select the student pivot variable and select the normal sampling distribution with mean 0 and standard deviation 2. Select confidence level 0.90 and sample size 10. For each of the three types of intervals, run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of μ0 for which the null hypothesis would be rejected.

The power function for the t tests in [20] can be computed explicitly in terms of the non-central t distribution function. Qualitatively, the graphs of the power functions are similar to the case when σ is known, given in [9] (two-sided), [10] (left-tailed), and [11] (right-tailed).

If an upper bound σ0 on the standard deviation σ is known, then conservative estimates on the sample size needed for a given confidence level and a given margin of error can be obtained using the methods for the normal pivot variable, in [15] for the two-sided case and [16] for the one-sided cases.

Tests of the Standard Deviation

For our next discussion, we will construct hypothesis tests for the distribution standard deviation σ. So our assumption is that σ is unknown, and of course almost always, μ would be unknown as well.

For a conjectured value \( \sigma_0 \in (0, \infty) \), define the test statistic \[ V = \frac{n - 1}{\sigma_0^2} S^2 \]

  1. If \( \sigma = \sigma_0 \), then \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom.
  2. If \( \sigma \ne \sigma_0 \) then \( V \) has the gamma distribution with shape parameter \( (n - 1)/2 \) and scale parameter \( 2 \sigma^2 / \sigma_0^2 \).

Recall that the ordinary chi-square distribution with \( n - 1 \) degrees of freedom is the gamma distribution with shape parameter \( (n - 1)/2 \) and scale parameter 2. So in case (b), the ordinary chi-square distribution is scaled by \( \sigma^2 / \sigma_0^2 \). In particular, the scale factor is greater than 1 if \( \sigma > \sigma_0 \) and less than 1 if \( \sigma < \sigma_0 \).

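For every \( \alpha \in (0, 1) \), each of the following tests has significance level \( \alpha \):

  1. Reject \( H_0: \sigma = \sigma_0 \) versus \( H_1: \sigma \ne \sigma_0 \) if and only if \( V < \chi^2_{n-1}(\alpha/2) \) or \( V > \chi^2_{n-1}(1 - \alpha/2) \) if and only if \( S^2 < \chi^2_{n-1}(\alpha/2) \frac{\sigma_0^2}{n - 1} \) or \( S^2 > \chi^2_{n-1}(1 - \alpha/2) \frac{\sigma_0^2}{n - 1} \).
  2. Reject \( H_0: \sigma \ge \sigma_0 \) versus \( H_1: \sigma < \sigma_0 \) if and only if \( V < \chi^2_{n-1}(\alpha) \) if and only if \( S^2 < \chi^2_{n-1}(\alpha) \frac{\sigma_0^2}{n - 1} \).
  3. Reject \( H_0: \sigma \le \sigma_0 \) versus \( H_1: \sigma > \sigma_0 \) if and only if \( V > \chi^2_{n-1}(1 - \alpha) \) if and only if \( S^2 > \chi^2_{n-1}(1 - \alpha) \frac{\sigma_0^2}{n - 1} \).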
Details:

The logic is largely the same as with our other hypothesis tests. In part (a), \( H_0 \) is a simple hypothesis, and under \( H_0 \), the test statistic \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. In parts (b) and (c), \( V \) has the more general gamma distribution under \( H_0 \), as discussed in [24]. If \( H_0 \) is true, the maximum type 1 error probability is \( \alpha \) and occurs when \( \sigma = \sigma_0 \).

Part (a) is the equal-tail, two-sided test that corresponds to \( \alpha/2 \) in each tail of the chi-square distribution of the test statistic \( V \), under \( H_0 \). Part (b) is the left-tailed test and part (c) is the right-tailed test. Once again, we have a duality between the hypothesis tests and the interval estimates constructed in the section on estimation in the normal model.
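The one-sided tests can also be carried out through tail probabilities of the chi-square distribution, which is how the answers to the computational exercises below are reported. The standard library has no chi-square distribution function, so this sketch evaluates it with the power series for the regularized lower incomplete gamma function. The function names and the choice of series are assumptions for illustration; the usage line takes its numbers from the telemarketing exercise.

```python
import math

def chi2_cdf(x, k):
    """Chi-square distribution function with k degrees of freedom.

    Uses the series gamma(a, y) = y^a e^(-y) * sum_{i>=0} y^i / (a (a+1) ... (a+i))
    with a = k / 2 and y = x / 2, divided by Gamma(a).
    """
    if x <= 0:
        return 0.0
    a, y = k / 2, x / 2
    term = 1 / a
    total = term
    for i in range(1, 10000):
        term *= y / (a + i)
        total += term
        if term < 1e-17 * total:   # remaining terms are negligible
            break
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))

def variance_test(s, n, sigma0):
    """Return V and the left-tail and right-tail probabilities at V."""
    v = (n - 1) * s ** 2 / sigma0 ** 2
    g = chi2_cdf(v, n - 1)
    return v, {"left-tailed": g, "right-tailed": 1 - g}

# Telemarketing exercise: n = 50, S = 25, conjectured standard deviation 20.
print(variance_test(25, 50, 20))
```

The left-tailed test rejects when the left-tail probability is less than \( \alpha \), and the right-tailed test rejects when the right-tail probability is less than \( \alpha \).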

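For each of the tests in [25], we fail to reject \( H_0 \) at significance level \( \alpha \) if and only if \( \sigma_0^2 \) is in the corresponding \( 1 - \alpha \) confidence interval. That is

  1. \( \frac{n - 1}{\chi^2_{n-1}(1 - \alpha/2)} S^2 \le \sigma_0^2 \le \frac{n - 1}{\chi^2_{n-1}(\alpha/2)} S^2 \)
  2. \( \sigma_0^2 \le \frac{n - 1}{\chi^2_{n-1}(\alpha)} S^2 \)
  3. \( \sigma_0^2 \ge \frac{n - 1}{\chi^2_{n-1}(1 - \alpha)} S^2 \)
Details:

This follows from [25]. In each case, we start with the inequality that corresponds to not rejecting \( H_0 \) and then solve for \( \sigma_0^2 \).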

As before, we can construct more general two-sided tests by partitioning the significance level α between the left and right tails of the chi-square distribution in an arbitrary way.

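For every \( \alpha, \, p \in (0, 1) \), the following test has significance level \( \alpha \): Reject \( H_0: \sigma = \sigma_0 \) versus \( H_1: \sigma \ne \sigma_0 \) if and only if \( V < \chi^2_{n-1}(\alpha - p \alpha) \) or \( V > \chi^2_{n-1}(1 - p \alpha) \) if and only if \( S^2 < \chi^2_{n-1}(\alpha - p \alpha) \frac{\sigma_0^2}{n - 1} \) or \( S^2 > \chi^2_{n-1}(1 - p \alpha) \frac{\sigma_0^2}{n - 1} \).

  1. \( p = \frac{1}{2} \) gives the equal-tail test.
  2. \( p \downarrow 0 \) gives the left-tail test.
  3. \( p \uparrow 1 \) gives the right-tail test.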
Details:

As before, \( H_0 \) is a simple hypothesis, and under \( H_0 \) the test statistic \( V \) has the chi-square distribution with \( n - 1 \) degrees of freedom. So if \( H_0 \) is true, the probability of falsely rejecting \( H_0 \) is \( \alpha \) by definition of the quantiles. Parts (a)–(c) follow from properties of the quantile function.

Recall again that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. The power functions of the tests for \( \sigma \) can be expressed in terms of the distribution function \( G_{n-1} \) of the chi-square distribution with \( n - 1 \) degrees of freedom.

The power function of the general two-sided test in [27] is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = 1 - G_{n-1}\left( \frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - p \alpha) \right) + G_{n-1}\left( \frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha - p \alpha) \right), \quad \sigma \in (0, \infty) \]

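  1. \( Q \) is decreasing on \( (0, s_0) \) and increasing on \( (s_0, \infty) \), where \( s_0^2 = \sigma_0^2 \frac{\chi^2_{n-1}(1 - p \alpha) - \chi^2_{n-1}(\alpha - p \alpha)}{(n - 1) \ln[\chi^2_{n-1}(1 - p \alpha) / \chi^2_{n-1}(\alpha - p \alpha)]} \). In general \( s_0 \ne \sigma_0 \), so the minimum value of \( Q \) can be slightly less than \( \alpha \).
  2. \( Q(\sigma_0) = \alpha \).
  3. \( Q(\sigma) \to 1 \) as \( \sigma \uparrow \infty \) and \( Q(\sigma) \to 1 \) as \( \sigma \downarrow 0 \).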
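The power function of the two-sided test can be evaluated numerically. The sketch below is one way to do this with only the standard library: the chi-square distribution function is computed from the incomplete gamma series, and the quantiles are found by bisection. Both numerical methods and all function names are assumptions made for illustration. The parameter values (sample size 10, significance level 0.1, test standard deviation 1) match the variance test experiment described in this section.

```python
import math

def chi2_cdf(x, k):
    """Chi-square distribution function via the lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, y = k / 2, x / 2
    term = 1 / a
    total = term
    for i in range(1, 10000):
        term *= y / (a + i)
        total += term
        if term < 1e-17 * total:   # remaining terms are negligible
            break
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))

def chi2_quantile(p, k):
    """Quantile of order p, found by bisection on the distribution function."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:     # expand until the quantile is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_two_sided(sigma, sigma0, n, alpha, p=0.5):
    """Power of the general two-sided test at the true standard deviation sigma."""
    k = n - 1
    c = sigma0 ** 2 / sigma ** 2
    lower = chi2_quantile(alpha - p * alpha, k)
    upper = chi2_quantile(1 - p * alpha, k)
    return 1 - chi2_cdf(c * upper, k) + chi2_cdf(c * lower, k)

for sigma in (0.5, 0.8, 1.0, 1.5, 2.0):
    print(sigma, round(power_two_sided(sigma, 1.0, 10, 0.1), 4))
```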

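The power function of the left-tailed test in [25] is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = G_{n-1}\left( \frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha) \right), \quad \sigma \in (0, \infty) \]

  1. \( Q \) is decreasing on \( (0, \infty) \).
  2. \( Q(\sigma_0) = \alpha \).
  3. \( Q(\sigma) \to 0 \) as \( \sigma \uparrow \infty \) and \( Q(\sigma) \to 1 \) as \( \sigma \downarrow 0 \).

The power function for the right-tailed test in [25] is given by the following formula, and satisfies the given properties: \[ Q(\sigma) = 1 - G_{n-1}\left( \frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - \alpha) \right), \quad \sigma \in (0, \infty) \]

  1. \( Q \) is increasing on \( (0, \infty) \).
  2. \( Q(\sigma_0) = \alpha \).
  3. \( Q(\sigma) \to 1 \) as \( \sigma \uparrow \infty \) and \( Q(\sigma) \to 0 \) as \( \sigma \downarrow 0 \).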

In the variance test experiment, select the normal distribution with mean 0, and select significance level 0.1, sample size 10, and test standard deviation 1.0. For various values of the true standard deviation, run the simulation 1000 times. Record the relative frequency of rejecting the null hypothesis and plot the empirical power curve.

  1. Two-sided test
  2. Left-tailed test
  3. Right-tailed test

In the variance estimate experiment, select the normal distribution with mean 0 and standard deviation 2, and select confidence level 0.90 and sample size 10. Run the experiment 20 times. State the corresponding hypotheses and significance level, and for each run, give the set of test standard deviations for which the null hypothesis would be rejected.

  1. Two-sided confidence interval
  2. Confidence lower bound
  3. Confidence upper bound

Exercises

Robustness

The primary assumption that we made is that the underlying sampling distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the sampling distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When the sample size \( n \) is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our tests of the mean \( \mu \) should still be approximately valid. On the other hand, tests of the variance \( \sigma^2 \) are less robust to deviations from the assumption of normality. The following exercises explore these ideas.

In the mean test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various sample sizes and values of \( \mu_0 \), run the experiment 1000 times. For each configuration, note the empirical distribution of P.

In the mean test experiment, select the uniform distribution on [0,4]. For the three different tests and for various sample sizes and values of μ0, run the experiment 1000 times. For each configuration, note the empirical distribution of P.

How large n needs to be for the testing procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger n must be. Fortunately, convergence to normality in the central limit theorem is rapid and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.
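The robustness of the test of the mean can be explored with a short simulation. The sketch below draws samples from the gamma distribution with shape parameter 1 and scale parameter 1 (the exponential distribution with mean 1 and standard deviation 1), applies the two-sided normal test with the true standard deviation treated as known, and records how often a true null hypothesis is rejected. The function name, seed, sample size, and number of runs are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(draw, mu0, sigma, n, alpha, runs, rng):
    """Relative frequency of rejecting H0: mu = mu0 with the two-sided normal test.

    draw(rng) returns one observation from the sampling distribution.
    """
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        m = fmean([draw(rng) for _ in range(n)])      # sample mean M
        z_stat = (m - mu0) / (sigma / math.sqrt(n))
        if abs(z_stat) > crit:
            rejections += 1
    return rejections / runs

rng = random.Random(7)   # fixed seed so the run is repeatable
rate = rejection_rate(lambda r: r.expovariate(1.0), mu0=1.0, sigma=1.0,
                      n=30, alpha=0.1, runs=4000, rng=rng)
# H0 is true here, so the rate should be close to alpha = 0.1 if the test is robust.
print(rate)
```

Replacing the `draw` function with another sampler, or reducing `n`, gives a quick way to see how the empirical type 1 error rate departs from the nominal level.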

In the variance test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of σ0, run the experiment 1000 times. For each configuration, note the relative frequency of rejecting H0. When H0 is true, compare the relative frequency with the significance level.

In the variance test experiment, select the uniform distribution on [0,4]. For the three different tests and for various significance levels, sample sizes, and values of \( \sigma_0 \), run the experiment 1000 times. For each configuration, note the relative frequency of rejecting \( H_0 \). When \( H_0 \) is true, compare the relative frequency with the significance level.

Computational Exercises

The length of a certain machined part is supposed to be 10 centimeters. In fact, due to imperfections in the manufacturing process, the actual length is a random variable. The standard deviation is due to inherent factors in the process, which remain fairly stable over time. From historical data, the standard deviation is known with a high degree of accuracy to be 0.3. The mean, on the other hand, may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. We are interested in testing \( H_0: \mu = 10 \) versus \( H_1: \mu \ne 10 \).

  1. Suppose that a sample of 100 parts has mean 10.1. Perform the test at the 0.1 level of significance.
  2. Compute the P-value for the data in (a).
  3. Compute the power of the test in (a) at μ=10.05.
  4. Compute the approximate sample size needed for significance level 0.1 and power 0.8 when μ=10.05.
Details:
  1. Test statistic 3.33, critical values ±1.645. Reject H0.
  2. \( P \approx 0.0009 \)
  3. The power of the test at 10.05 is approximately 0.509.
  4. Sample size 223

A bag of potato chips of a certain brand has an advertised weight of 250 grams. Actually, the weight (in grams) is a random variable. Suppose that a sample of 75 bags has mean 248 and standard deviation 5. At the 0.05 significance level, perform the following tests:

  1. \( H_0: \mu \ge 250 \) versus \( H_1: \mu < 250 \)
  2. \( H_0: \sigma \ge 7 \) versus \( H_1: \sigma < 7 \)
Details:
  1. Test statistic \( -3.464 \), critical value \( -1.665 \). Reject \( H_0 \).
  2. \( P < 0.0001 \) so reject \( H_0 \).

At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable. A sample of 50 calls has mean 310 and standard deviation 25. At the 0.1 level of significance, can we conclude that

  1. μ>300?
  2. σ>20?
Details:
  1. Test statistic 2.828, critical value 1.2988. Reject H0.
  2. P=0.0071 so reject H0.

At a certain farm the weight of a peach (in ounces) at harvest time is a random variable. A sample of 100 peaches has mean 8.2 and standard deviation 1.0. At the 0.01 level of significance, can we conclude that

  1. μ>8?
  2. σ<1.5?
Details:
  1. Test statistic 2.0, critical value 2.363. Fail to reject H0.
  2. P<0.0001 so reject H0.

The hourly wage for a certain type of construction work is a random variable with standard deviation 1.25. For a sample of 25 workers, the mean wage was $6.75. At the 0.01 level of significance, can we conclude that \( \mu < 7.00 \)?

Details:

Test statistic \( -1 \), critical value \( -2.326 \). Fail to reject \( H_0 \).

Data Analysis Exercises

Using Michelson's data, test to see if the velocity of light is greater than 730 (+299000) km/sec, at the 0.005 significance level.

Details:

Test statistic 15.49, critical value 2.6270. Reject H0.

Using Cavendish's data, test to see if the density of the earth is less than 5.5 times the density of water, at the 0.05 significance level.

Details:

Test statistic \( -1.269 \), critical value \( -1.7017 \). Fail to reject \( H_0 \).

Using Short's data, test to see if the parallax of the sun differs from 9 seconds of a degree, at the 0.1 significance level.

Details:

Test statistic 3.730, critical value ±1.6749. Reject H0.

Using Fisher's iris data, perform the following tests, at the 0.1 level:

  1. The mean petal length of Setosa irises differs from 15 mm.
  2. The mean petal length of Virginica irises is greater than 52 mm.
  3. The mean petal length of Versicolor irises is less than 44 mm.
Details:
  1. Test statistic 1.563, critical values ±1.672. Fail to reject H0.
  2. Test statistic 4.556, critical value 1.2988. Reject H0.
  3. Test statistic 1.028, critical value \( -1.2988 \). Fail to reject H0.