SIX-STEP PROCEDURE FOR TESTING A HYPOTHESIS
There is a six-step procedure that systematizes hypothesis testing; by the time we reach step 5, we are ready to reject or not reject the hypothesis. However, hypothesis testing as used by statisticians does not provide proof that something is true in the manner in which a mathematician “proves” a statement. It does provide a kind of “proof beyond a reasonable doubt,” in the manner of the court system. Hence, there are specific rules of evidence, or procedures, that are followed. The steps are shown in the following diagram. We discuss each of the steps in detail.
Step 1: State the Null Hypothesis (H0) and the Alternate Hypothesis (H1)
The first step is to state the hypothesis being tested. It is called the null hypothesis, designated H0, and read “H sub zero.” The capital letter H stands for hypothesis, and the subscript zero implies “no difference.” There is usually a “not” or a “no” term in the null hypothesis, meaning that there is “no change.” For example, the null hypothesis is that the mean number of miles driven on the steel-belted tire is not different from 60,000. The null hypothesis would be written H0: μ = 60,000. Generally speaking, the null hypothesis is developed for the purpose of testing. We either reject or fail to reject the null hypothesis. The null hypothesis is a statement that is not rejected unless our sample data provide convincing evidence that it is false.
We should emphasize that, if the null hypothesis is not rejected on the basis of the sample data, we cannot say that the null hypothesis is true. To put it another way, failing to reject the null hypothesis does not prove that H0 is true; it means we have failed to disprove H0. To prove without any doubt that the null hypothesis is true, we would have to know the population parameter. To actually determine it, we would have to test, survey, or count every item in the population. This is usually not feasible. The alternative is to take a sample from the population.
It should also be noted that we often begin the null hypothesis by stating, “There is no significant difference between . . .” or “The mean impact strength of the glass is not significantly different from. . . .” When we select a sample from a population, the sample statistic is usually numerically different from the hypothesized population parameter. As an illustration, suppose the hypothesized impact strength of a glass plate is 70 psi, and the mean impact strength of a sample of 12 glass plates is 69.5 psi. We must make a decision about the difference of 0.5 psi. Is it a true difference, that is, a significant difference, or is the difference between the sample statistic (69.5) and the hypothesized population parameter (70.0) due to chance (sampling)? To answer this question, we conduct a test of significance, commonly referred to as a test of hypothesis. To define what is meant by a null hypothesis:
NULL HYPOTHESIS A statement about the value of a population parameter developed for the purpose of testing numerical evidence.
The alternate hypothesis describes what you will conclude if you reject the null hypothesis. It is written H1 and is read “H sub one.” It is also referred to as the research hypothesis. The alternate hypothesis is accepted if the sample data provide us with enough statistical evidence that the null hypothesis is false.
ALTERNATE HYPOTHESIS A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false.
The following example will help clarify what is meant by the null hypothesis and the alternate hypothesis. A recent article indicated the mean age of U.S. commercial aircraft is 15 years. To conduct a statistical test regarding this statement, the first step is to determine the null and the alternate hypotheses. The null hypothesis represents the current or reported condition. It is written H0: μ = 15. The alternate hypothesis is that the statement is not true, that is, H1: μ ≠ 15. It is important to remember that no matter how the problem is stated, the null hypothesis will always contain the equal sign. The equal sign (=) will never appear in the alternate hypothesis. Why? Because the null hypothesis is the statement being tested, and we need a specific value to include in our calculations. We turn to the alternate hypothesis only if the data suggest the null hypothesis is untrue.
Step 2: Select a Level of Significance
After setting up the null hypothesis and alternate hypothesis, the next step is to state the level of significance.
LEVEL OF SIGNIFICANCE The probability of rejecting the null hypothesis when it is true.
The level of significance is designated α, the Greek letter alpha. It is also sometimes called the level of risk. This may be a more appropriate term because it is the risk you take of rejecting the null hypothesis when it is really true.
There is no one level of significance that is applied to all tests. A decision is made to use the .05 level (often stated as the 5% level), the .01 level, the .10 level, or any other level between 0 and 1. Traditionally, the .05 level is selected for consumer research projects, .01 for quality assurance, and .10 for political polling. You, the researcher, must decide on the level of significance before formulating a decision rule and collecting sample data.
To illustrate how it is possible to reject a true hypothesis, suppose a firm manufacturing personal computers uses a large number of printed circuit boards. Suppliers bid on the boards, and the one with the lowest bid is awarded a sizable contract. Suppose the contract specifies that the computer manufacturer’s quality-assurance department will sample all incoming shipments of circuit boards. If more than 6% of the boards sampled are substandard, the shipment will be rejected. The null hypothesis is that the incoming shipment of boards contains 6% or fewer substandard boards. The alternate hypothesis is that more than 6% of the boards are defective.
A shipment of 4,000 circuit boards was received from Allied Electronics, and the quality assurance department selected a random sample of 50 circuit boards for testing. Of the 50 circuit boards sampled, 4 boards, or 8%, were substandard. The shipment was rejected because it exceeded the maximum of 6% substandard printed circuit boards. If the shipment was actually substandard, then the decision to return the boards to the supplier was correct. However, suppose the 4 substandard printed circuit boards selected in the sample of 50 were the only substandard boards in the shipment of 4,000 boards. Then only one-tenth of 1% were defective (4/4,000 = .001). In that case, less than 6% of the entire shipment was substandard, and rejecting the shipment was an error. In terms of hypothesis testing, we rejected the null hypothesis when we should have failed to reject it. By rejecting a true null hypothesis, we committed a Type I error. The probability of committing a Type I error is α.
TYPE I ERROR Rejecting the null hypothesis, H0, when it is true.
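The probability of a Type I error for this acceptance plan can be computed directly. The sketch below is a minimal illustration under two assumptions that are not part of the original example: the shipment is exactly 6% substandard (the boundary of H0), and the 50 sampled boards behave like independent draws, so a binomial model approximates sampling without replacement from 4,000 boards. Because 3 of 50 is exactly 6%, the stated rule rejects the shipment when 4 or more sampled boards are substandard. The helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed from the complement."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


n = 50             # boards sampled from the shipment
p_boundary = 0.06  # assumed true substandard rate: the boundary of H0
reject_from = 4    # 4/50 = 8% > 6%, so 4 or more substandard boards means reject

# Type I error: rejecting the shipment although it is (just) acceptable
alpha = prob_at_least(reject_from, n, p_boundary)
print(f"alpha = {alpha:.3f}")
```

Under these assumptions α is roughly .35, which shows that a rule stated as a simple percentage cutoff can carry a much larger risk of a Type I error than the .05 or .01 levels discussed earlier.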
The probability of committing another type of error, called a Type II error, is designated by the Greek letter beta (β).
TYPE II ERROR Not rejecting the null hypothesis when it is false.
The firm manufacturing personal computers would commit a Type II error if, unknown to the manufacturer, an incoming shipment of printed circuit boards from Allied Electronics contained 15% substandard boards, yet the shipment was accepted. How could this happen? Suppose 2 of the 50 boards in the sample (4%) tested were substandard, and 48 of the 50 were good boards. According to the stated procedure, because the sample contained less than 6% substandard boards, the shipment was accepted. It could be that by chance the 48 good boards selected in the sample were the only acceptable ones in the entire shipment consisting of thousands of boards!
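Under a binomial approximation (an assumption, not part of the original example), the probability of this Type II error can be computed for a shipment that is actually 15% substandard. It is the probability that 3 or fewer of the 50 sampled boards are substandard, because 3/50 = 6% does not exceed the cutoff and the shipment is accepted. The helper name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n = 50            # boards sampled from the shipment
p_true = 0.15     # assumed actual substandard rate, so H0 is false
accept_up_to = 3  # 3/50 = 6% is not "more than 6%", so the shipment is accepted

# Type II error: accepting the shipment although it is substandard
beta = prob_at_most(accept_up_to, n, p_true)
print(f"beta = {beta:.3f}")
```

The result is roughly .046. Note that β depends on the true substandard rate; assuming a different rate gives a different β.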
In retrospect, the researcher cannot study every item or individual in the population. Thus, there is a possibility of two types of error—a Type I error, wherein the null hypothesis is rejected when it should not be rejected, and a Type II error, wherein the null hypothesis is not rejected when it should have been rejected.
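We often refer to the probability of these two possible errors as alpha, α, and beta, β. Alpha (α) is the probability of making a Type I error, and beta (β) is the probability of making a Type II error. The following table summarizes the decisions the researcher could make and the possible consequences.

Null Hypothesis | Researcher Does Not Reject H0 | Researcher Rejects H0
H0 is true | Correct decision | Type I error
H0 is false | Type II error | Correct decision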
Step 3: Select the Test Statistic
There are many test statistics. In this chapter, we use both z and t as the test statistics. In later chapters, we will use such test statistics as F and χ², called chi-square.
TEST STATISTIC A value, determined from sample information, used to determine whether to reject the null hypothesis.
In hypothesis testing for the mean (μ) when σ is known, the test statistic z is computed by:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (10\text{–}1)$$
The z value is based on the sampling distribution of $\bar{x}$, which follows the normal distribution with a mean ($\mu_{\bar{x}}$) equal to μ and a standard deviation $\sigma_{\bar{x}}$, which is equal to $\sigma/\sqrt{n}$. We can thus determine whether the difference between $\bar{x}$ and μ is statistically significant by finding the number of standard deviations $\bar{x}$ is from μ, using formula (10–1).
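Formula (10–1) translates directly into code. This is a minimal Python sketch; the function name `z_statistic` and the input values in the usage line are hypothetical, chosen so that the standard error works out to exactly 2.

```python
from math import sqrt


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Formula (10-1): how many standard errors the sample mean lies from mu0."""
    standard_error = sigma / sqrt(n)  # sigma_xbar = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical inputs: xbar = 102, mu0 = 100, sigma = 10, n = 25
print(z_statistic(102, 100, 10, 25))  # standard error is 2, so z = 1.0
```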
Step 4: Formulate the Decision Rule
A decision rule is a statement of the specific conditions under which the null hypothesis is rejected and the conditions under which it is not rejected. The region or area of rejection defines the location of all those values that are so large or so small that the probability of their occurrence under a true null hypothesis is rather remote.
Chart 10–1 portrays the rejection region for a test of significance that will be conducted later in the chapter.
During World War II, Allied military planners needed estimates of the number of German tanks. The information provided by traditional spying methods was not reliable, but statistical methods proved to be valuable. For example, espionage and reconnaissance led analysts to estimate that 1,550 tanks were produced during June 1941. However, using the serial numbers of captured tanks and statistical analysis, military planners estimated that only 244 tanks were produced. The actual number produced, as determined from German production records, was 271. The estimate using statistical analysis turned out to be much more accurate. A similar type of analysis was used to estimate the number of Iraqi tanks destroyed during Desert Storm.
CHART 10–1 Sampling Distribution of the Statistic z, a Right-Tailed Test, .05 Level of Significance
Note in the chart that:
The area where the null hypothesis is not rejected is to the left of 1.645. We will explain how to get the 1.645 value shortly.
The area of rejection is to the right of 1.645.
A one-tailed test is being applied. (This will also be explained later.)
The .05 level of significance was chosen.
The sampling distribution of the statistic z follows the normal probability distribution.
The value 1.645 separates the regions where the null hypothesis is rejected and where it is not rejected.
The value 1.645 is the critical value.
CRITICAL VALUE The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
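The critical value 1.645 can be reproduced from the standard normal distribution rather than read from a table. This sketch uses Python's standard-library `statistics.NormalDist`; for a right-tailed test at the .05 level, the critical value is the z value that has .95 of the area to its left.

```python
from statistics import NormalDist

alpha = 0.05
# Right-tailed test: all of alpha sits in the upper tail, so the critical
# value is the z with 1 - alpha of the area to its left.
critical = NormalDist().inv_cdf(1 - alpha)
print(round(critical, 3))  # 1.645
```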
Step 5: Make a Decision
The fifth step in hypothesis testing is to compute the value of the test statistic, compare its value to the critical value, and make a decision to reject or not to reject the null hypothesis. Referring to Chart 10–1, if, based on sample information, z is computed to be 2.34, the null hypothesis is rejected at the .05 level of significance. The decision to reject H0 was made because 2.34 lies in the region of rejection, that is, beyond 1.645. We reject the null hypothesis, reasoning that it is highly improbable that a computed z value this large is due to sampling error (chance).
Had the computed value been 1.645 or less, say 0.71, the null hypothesis would not have been rejected. Such a small computed value could reasonably be attributed to chance, that is, sampling error. As we have emphasized, only one of two decisions is possible in hypothesis testing: either reject or do not reject the null hypothesis.
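The decision rule of Steps 4 and 5 for this right-tailed test can be sketched as a small function. The name `right_tailed_decision` is hypothetical; following the text, a computed z of 1.645 or less leads to not rejecting H0.

```python
from statistics import NormalDist


def right_tailed_decision(z: float, alpha: float = 0.05) -> str:
    """Compare a computed z with the right-tail critical value."""
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = .05
    return "reject H0" if z > critical else "do not reject H0"


for z in (2.34, 0.71):
    print(z, right_tailed_decision(z))
```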
However, because the decision is based on a sample, it is always possible to make either of two decision errors. We make a Type I error if we reject the null hypothesis when it should not be rejected, and a Type II error if we do not reject the null hypothesis when it should have been rejected. Fortunately, we can choose the probability of making a Type I error, α (alpha), and we can compute the probabilities associated with a Type II error, β (beta).
Step 6: Interpret the Result
The final step in the hypothesis testing procedure is to interpret the results. The process does not end with the value of a sample statistic or the decision to reject or not reject the null hypothesis. What can we say or report based on the results of the statistical test? Here are two examples:
An investigative reporter for a Colorado newspaper reports that the mean monthly income of convenience stores in the state is $130,000. You decide to conduct a test of hypothesis to verify the report. The null hypothesis and the alternate hypothesis are:
H0: μ = $130,000
H1: μ ≠ $130,000
A sample of convenience stores provides a sample mean and standard deviation, and you compute a z statistic. The hypothesis test leads to a decision not to reject the null hypothesis. How do you interpret the result? Be cautious with your interpretation: by not rejecting the null hypothesis, you did not prove it to be true. Based on the sample data, the difference between the sample mean and the hypothesized population mean was not large enough to reject the null hypothesis.
In a recent speech to students, the dean of the College of Business reported that the mean credit card debt for college students is $3,000. You decide to conduct a test of hypothesis to investigate the statement’s truth. The null hypothesis and the alternate hypothesis are:
H0: μ = $3,000
H1: μ ≠ $3,000
A sample of college students provides a sample mean and standard deviation, and you compute a z statistic. The hypothesis test leads to a decision to reject the null hypothesis. How do you interpret the result? The evidence does not support the dean’s statement. You have rejected the null hypothesis with a stated probability of a Type I error, α. Based on the sample data, the mean amount of student credit card debt is different from $3,000.
ONE-TAILED AND TWO-TAILED TESTS OF SIGNIFICANCE
Refer to Chart 10–1. It shows a one-tailed test. It is called a one-tailed test because the rejection region is only in one tail of the curve. In this case, it is in the right, or upper, tail of the curve. To illustrate, suppose that the packaging department at General Foods Corporation is concerned that some boxes of Grape Nuts are significantly overweight. The cereal is packaged in 453-gram boxes, so the null hypothesis is H0: μ ≤ 453. This is read, “the population mean (μ) is equal to or less than 453.” The alternate hypothesis is, therefore, H1: μ > 453. This is read, “μ is greater than 453.” Note that the inequality sign in the alternate hypothesis (>) points to the region of rejection in the upper tail. (See Chart 10–1.) Also observe that the null hypothesis includes the equal sign. That is, H0: μ ≤ 453. The equality condition always appears in H0, never in H1.
Chart 10–2 portrays a situation where the rejection region is in the left (lower) tail of the standard normal distribution. As an illustration, consider the problem of automobile manufacturers, large automobile leasing companies, and other organizations that purchase large quantities of tires. They want the tires to average, say, 60,000 miles of wear under normal usage. They will, therefore, reject a shipment of tires if tests reveal that the mean life of the tires is significantly below 60,000 miles. They gladly accept a shipment if the mean life is greater than 60,000 miles! They are not concerned with this possibility, however. They are concerned only if they have sample evidence to conclude that the tires will average less than 60,000 miles of useful life. Thus, the test is set up to satisfy the concern of the automobile manufacturers that the mean life of the tires is not less than 60,000 miles. This statement appears in the null hypothesis. The null and alternate hypotheses in this case are written H0: μ ≥ 60,000 and H1: μ < 60,000.
Chart 10–2 Sampling Distribution for the Statistic z, Left-Tailed Test, .05 Level of Significance
One way to determine the location of the rejection region is to look at the direction in which the inequality sign in the alternate hypothesis is pointing (either < or >). In the tire wear problem, it is pointing to the left, and the rejection region is therefore in the left tail.
In summary, a test is one-tailed when the alternate hypothesis, H1, states a direction, such as:
H0: The mean income of female stockbrokers is less than or equal to $65,000 per year.
H1: The mean income of female stockbrokers is greater than $65,000 per year.
If no direction is specified in the alternate hypothesis, we use a two-tailed test. Changing the previous problem to illustrate, we can say:
H0: The mean income of female stockbrokers is $65,000 per year.
H1: The mean income of female stockbrokers is not equal to $65,000 per year.
If the null hypothesis is rejected and H1 accepted in the two-tailed case, the mean income could be significantly greater than $65,000 per year or it could be significantly less than $65,000 per year. To accommodate these two possibilities, the 5% area of rejection is divided equally into the two tails of the sampling distribution (2.5% each). Chart 10–3 shows the two areas and the critical values. Note that the total area in the normal distribution is 1.0000, found by .9500 + .0250 + .0250.
CHART 10–3 Regions of Nonrejection and Rejection for a Two-Tailed Test, .05 Level of Significance
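The location of the rejection region follows from the direction of H1, and the critical values can be generated the same way for each case. In this sketch, the labels "less", "greater", and "two-sided" are hypothetical names for H1: μ < μ0, H1: μ > μ0, and H1: μ ≠ μ0, where μ0 stands for the hypothesized value (60,000 miles, 453 grams, $65,000, and so on). In the two-tailed case, α is split equally between the two tails.

```python
from statistics import NormalDist


def critical_values(alpha: float, alternative: str) -> tuple:
    """Critical z value(s) for the given direction of the alternate hypothesis."""
    std_normal = NormalDist()
    if alternative == "less":       # H1: mu < mu0, rejection region in left tail
        return (std_normal.inv_cdf(alpha),)
    if alternative == "greater":    # H1: mu > mu0, rejection region in right tail
        return (std_normal.inv_cdf(1 - alpha),)
    if alternative == "two-sided":  # H1: mu != mu0, alpha/2 in each tail
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


for alt in ("less", "greater", "two-sided"):
    print(alt, [round(c, 3) for c in critical_values(0.05, alt)])
```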
LO10-5
Conduct a test of a hypothesis about a population mean.
TESTING FOR A POPULATION MEAN: KNOWN POPULATION STANDARD DEVIATION
A Two-Tailed Test
An example will show the details of the six-step hypothesis testing procedure and the subsequent decision-making process. We also wish to use a two-tailed test. That is, we are not concerned whether the sample results are larger or smaller than the proposed population mean. Rather, we are interested in whether it is different from the proposed value for the population mean. We begin, as we did in the previous chapter, with a situation in which we have historical information about the population and in fact know its standard deviation.
EXAMPLE
Jamestown Steel Company manufactures and assembles desks and other office equipment at several plants in western New York State. The weekly production of the Model A325 desk at the Fredonia Plant follows a normal probability distribution with a mean of 200 and a standard deviation of 16. Recently, because of market expansion, new production methods have been introduced and new employees hired. The vice president of manufacturing would like to investigate whether there has been a change in the weekly production of the Model A325 desk. Is the mean number of desks produced at the Fredonia Plant different from 200 at the .01 significance level?
SOLUTION
In this example, we know two important pieces of information: (1) the population of weekly production follows the normal distribution and (2) the standard deviation of this normal distribution is 16 desks per week. So it is appropriate to use the z statistic. We use the statistical hypothesis testing procedure to investigate whether the production rate has changed from 200 per week.
Step 1: State the null hypothesis and the alternate hypothesis. The null hypothesis is “The population mean is 200.” The alternate hypothesis is “The mean is different from 200” or “The mean is not 200.” These two hypotheses are written:
H0: μ = 200
H1: μ ≠ 200
This is a two-tailed test because the alternate hypothesis does not state a direction. In other words, it does not state whether the mean production is greater than 200 or less than 200. The vice president wants only to find out whether the production rate is different from 200.
Before moving to Step 2, we wish to emphasize two points.
The null hypothesis has the equal sign. Why? Because the value we are testing is always in the null hypothesis. Logically, the alternate hypothesis never contains the equal sign.
Both the null hypothesis and the alternate hypothesis contain Greek letters, in this case μ, which is the symbol for the population mean. Tests of hypothesis always refer to population parameters, never to sample statistics. To put it another way, you will never see the symbol $\bar{x}$ as part of the null hypothesis or the alternate hypothesis.
Step 2: Select the level of significance. In the example description, the significance level selected is .01. This is α, the probability of committing a Type I error, and it is the probability of rejecting a true null hypothesis.
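Step 3: Select the test statistic. The test statistic is z when the population standard deviation is known. Transforming the production data to standard units (z values) permits their use not only in this problem but also in other hypothesis-testing problems. Formula (10–1) for z is repeated next with the various letters identified.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (10\text{–}1)$$

where $\bar{x}$ is the sample mean, μ is the hypothesized population mean, σ is the population standard deviation, and n is the number of observations in the sample.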
Step 4: Formulate the decision rule. We formulate the decision rule by first determining the critical values of z. Because this is a two-tailed test, half of .01, or .005, is placed in each tail. The area where H0 is not rejected, located between the two tails, is therefore .99. Using the Student’s t distribution table in Appendix B.5, move to the top margin called “Level of Significance for Two-Tailed Tests, α,” select the column with α = .01, and move to the last row, which is labeled ∞, or infinite degrees of freedom. The z value in this cell is 2.576. For your convenience, Appendix B.5, Student’s t Distribution, is repeated on the inside back cover. All the facets of this problem are shown in the diagram in Chart 10–4.
CHART 10–4 Decision Rule for the .01 Significance Level
The decision rule is: if the computed value of z is not between −2.576 and 2.576, reject the null hypothesis. If z falls between −2.576 and 2.576, do not reject the null hypothesis.
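The critical values ±2.576 and the resulting two-tailed decision rule can be sketched the same way, again with the standard-library `statistics.NormalDist`. The function name `two_tailed_decision` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.01
upper = NormalDist().inv_cdf(1 - alpha / 2)  # .005 in each tail
print(round(upper, 3))  # 2.576


def two_tailed_decision(z: float) -> str:
    """Do not reject H0 when z lies between -upper and upper; otherwise reject."""
    return "do not reject H0" if -upper <= z <= upper else "reject H0"
```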
Step 5: Make a decision. Take a sample from the population (weekly production), compute z, apply the decision rule, and arrive at a decision to reject H0 or not to reject H0. The mean number of desks produced last year (50 weeks because the plant was shut down 2 weeks for vacation) is 203.5. The standard deviation of the population is 16 desks per week. Computing the z value from formula (10–1):

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{203.5 - 200}{16/\sqrt{50}} = 1.547$$
Because 1.547 does not fall in the rejection region, we decide not to reject H0.
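The arithmetic of Steps 3 through 5 for the Jamestown Steel example can be checked in a few lines. This sketch uses only the figures given in the example (μ = 200, σ = 16, n = 50, sample mean 203.5, α = .01).

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 200, 16, 50  # hypothesized mean, known sigma, weeks sampled
sample_mean = 203.5
alpha = 0.01

z = (sample_mean - mu0) / (sigma / sqrt(n))     # formula (10-1)
critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576
decision = "reject H0" if abs(z) > critical else "do not reject H0"
print(f"z = {z:.3f}; {decision}")  # z = 1.547; do not reject H0
```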
Step 6: Interpret the result. We did not reject the null hypothesis, so we have failed to show that the population mean has changed from 200 per week. To put it another way, the difference between the hypothesized population mean of 200 per week and the sample mean of 203.5 could simply be due to chance. What should we tell the vice president? The sample information fails to indicate that the new production methods resulted in a change in the 200-desks-per-week production rate.
Did we prove that the assembly rate is still 200 per week? Not really. What we did, technically, was fail to disprove the null hypothesis. Failing to disprove the hypothesis that the population mean is 200 is not the same thing as proving it to be true. As we suggested in the chapter introduction, the conclusion is analogous to the American judicial system. To explain, suppose a person is accused of a crime but is acquitted by a jury. The conclusion is that there was not enough evidence to prove the person guilty. The trial did not prove that the individual was innocent, only that there was not enough evidence to prove the defendant guilty. That is what we do in statistical hypothesis testing when we do not reject the null hypothesis. The correct interpretation is that we have failed to disprove the null hypothesis.
We selected the significance level, .01 in this case, before setting up the decision rule and sampling the population. This is the appropriate strategy. The significance level should be set by the investigator, but it should be determined before gathering the sample evidence and not changed based on the sample evidence.
How does the hypothesis testing procedure just described compare with that of confidence intervals discussed in the previous chapter? When we conducted the test of hypothesis regarding the production of desks, we changed the units from desks per week to a z value. Then we compared the computed value of the test statistic (1.547) to the critical values (−2.576 and 2.576). Because the computed value was in the region where the null hypothesis was not rejected, we concluded that the population mean could be 200. To use the confidence interval approach, on the other hand, we would develop a confidence interval based on formula (9–1). See page 284. The interval would be from 197.671 to 209.329, found by $203.5 \pm 2.576(16/\sqrt{50})$. Note that the proposed population value, 200, is within this interval. Hence, we would conclude that the population mean could reasonably be 200.
In general, H0 is rejected if the confidence interval does not include the hypothesized value. If the confidence interval includes the hypothesized value, then H0 is not rejected. So the “do not reject region” for a test of hypothesis is equivalent to the proposed population value occurring in the confidence interval.
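The equivalence between the two approaches can be checked numerically for the desk example. The sketch builds the 99% confidence interval for μ from the sample figures and tests whether the hypothesized value 200 falls inside it. The function name `z_interval` is hypothetical, and it uses the unrounded critical value from `statistics.NormalDist` rather than the table value 2.576, so the endpoints can differ from 197.671 and 209.329 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean: float, sigma: float, n: int, confidence: float) -> tuple:
    """Confidence interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = z_interval(203.5, 16, 50, 0.99)
contains = low <= 200 <= high  # True here, matching "do not reject H0" at alpha = .01
print(f"{low:.3f} to {high:.3f}; 200 inside interval: {contains}")
```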