Stat Minitab
Chapter 10 Comparing Means of Two Populations
Note: The choice between the Z test and the t test depends on knowledge of the population variance σ². If the population variance σ² is known, the Z test may be used; if σ² is not known, the t test must be used.
Comparing Two Means: Independent Samples, Population Variances Known
When the population variances are known for both populations, see Page 441, Equation 10.3. The following data show the average monthly utility bills for a random sample of households in Baltimore and for a random sample of households in Houston. (The bills include phone, television, Internet, electricity and natural gas.)
                                 Baltimore   Houston
Sample mean                      $390.44     $359.52
Sample size                      33          36
Population standard deviation    $64         $58
For practice: Using the data in the table above test at α=.05 to determine if there is any significant difference in the average monthly utility bills for the two cities. Work Space:
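To check a hand calculation for the practice problem, the usual two-sample Z statistic for known population standard deviations, z = ((x̄1 - x̄2) - 0) / sqrt(σ1²/n1 + σ2²/n2), can be computed with Python's standard library. This is a minimal sketch that assumes this is the form of Equation 10.3 in your text; `two_sample_z` is a hypothetical helper name, not a Minitab command.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference of two means when both population SDs are known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Utility-bill table: Baltimore is sample 1, Houston is sample 2
z = two_sample_z(390.44, 359.52, 64, 58, 33, 36)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

print(f"z = {z:.3f}, critical values = ±{z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
```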
2 Revised for Minitab 17
Use the t distribution if the population variances are unknown. There are two tests available when the population variances are unknown. Selection of the proper test (exact or approximate) depends on whether or not the variances (spreads) of the two populations under study are equal.
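For the approximate test the degrees of freedom are not n1 + n2 - 2; they come from the Welch/Satterthwaite formula. The sketch below uses the standard form of that formula, assuming it matches the one your text gives on Page 456 (Equation 10.11); `welch_t_and_df` is a hypothetical helper name. The inputs are the Zocor summary statistics (means computed from the dataset, variances as reported in the Minitab F test output) and are used only for illustration, since that example itself uses the pooled test.

```python
from math import sqrt


def welch_t_and_df(mean1, mean2, var1, var2, n1, n2):
    """Unpooled t statistic and Welch/Satterthwaite degrees of freedom."""
    a = var1 / n1                       # estimated variance of sample mean 1
    b = var2 / n2                       # estimated variance of sample mean 2
    t = (mean1 - mean2) / sqrt(a + b)   # the variances are NOT pooled
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df


# Illustration only: Zocor summary statistics, Colorado (sample 1) vs. Texas (sample 2)
t, df = welch_t_and_df(133.99375, 138.01846, 121.329, 160.354, 16, 13)
print(f"t = {t:.3f}, Welch df = {df:.2f}")
print(f"Conservative table df (rounded down): {int(df)}")
```

The fractional df is normal for this method; rounding down when reading a printed t table is the conservative choice.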
RULE
If the population variances are equal, you will use the exact test and pool the two sample variances (Page 450, Equation 10.6). If the population variances are unequal, you will not pool the variances and must use the approximate technique, with the Welch/Satterthwaite method for computing degrees of freedom (Page 456, Equation 10.11). The F test is covered on Pages 642-651 of Chapter 13 in the Donnelly text, First Edition, and on Pages 635-641 in the Second Edition. Yes, this means leaping ahead a bunch. Begin by testing the assumption of equal variances (F test): Pages 634-641 of Chapter 13 in the Donnelly text.
Example for population variances unknown where variances are assumed equal
Testing for equal variances
Dataset: Zocor prices in Colorado and Texas will be used in both the F test and the t test. The TSA and 401k Minitab worksheet has been created for you and is located in Canvas>Files>Chp10. Columns 1 and 3 are text columns showing the city in which each pharmacy is located. Columns 2 and 4 contain the prices.
Determining whether there are differences in the average prices of Zocor in Colorado vs. Texas: prices of Zocor at randomly selected pharmacies, by city.

Column 1            Column 2      Column 3      Column 4
City Colorado       Prices Colo   City Texas    Prices Texas
Alamosa             125.05        Austin        145.32
Avon                137.56        Austin        131.19
Broomfield          142.50        Austin        151.65
Buena Vista         145.95        Austin        141.55
Colorado Springs    117.49        Austin        125.99
Colorado Springs    142.75        Dallas        126.29
Denver              121.99        Dallas        139.19
Denver              117.49        Dallas        156.00
Eaton               141.64        Dallas        137.56
Fort Collins        128.69        Houston       154.10
Gunnison            130.29        Houston       126.41
Pueblo              142.39        Houston       114.00
Pueblo              121.99        Houston       144.99
Pueblo              141.30
Sterling            153.43
Walsenbert          133.39
Problem Definition: Which t-calc formula do we use? In real-world situations we test the variances first to see which form of the t test to use (assume equal variances or not). Note: It is easier to use a one-tailed F test to determine whether the variances are equal. To do so, construct the alternative hypothesis to show that you are testing whether one variance is greater than the other, and state the assumed greater variance in the numerator when calculating the critical ratio. The F test assumes both populations are normally distributed, and the test is not robust to departures from normality. For a two-tailed test no modification of the hypotheses is needed, but you must split alpha over both tails (compare the ratio with the upper α/2 critical value) even when the larger variance is placed in the numerator (the folded technique). Adjusting the F test to make it two-tailed will be demonstrated by your instructor. Here the hypothesis test is set up as one-tailed to the right by placing the larger variance in the numerator. Note that Texas is stated as the first variance in each hypothesis.
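$H_0: \sigma_T^2 = \sigma_C^2$ (T = Texas, C = Colorado). Fail to reject the null: you can use the pooled-variance t test.
$H_1: \sigma_T^2 > \sigma_C^2$. Accept the alternative: you must use the Welch/Satterthwaite approximation.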
Decision Rule: If the F critical ratio exceeds 2.48 (α = .05, df1 = 12, df2 = 15), reject the null hypothesis.
**If the degrees of freedom for your problem are not listed in the F table, round down to the next closest df, or use Minitab to compute the precise critical value. Remember, this is a folded test, so the larger variance must be placed in the numerator when computing the F critical ratio. Note: You may wish to use Stat > Basic Statistics > Display Descriptive Statistics in Minitab to develop the variances. This way you will know which is larger and which is smaller. (Optional, for your benefit only.)
Stat > Basic Statistics > Display Descriptive Statistics > select the columns for which you want the statistics.
Select the Statistics button at the bottom left of the dialog and check only the statistics you want. Uncheck all other statistics boxes.
Minitab 17: Test for Equal Variances (hypothesis test for equal variances). Stat > Basic Statistics > 2 Variances > using the pull-down menu, select Each sample is in its own column > for Sample 1, select Texas > place the cursor in the second box and select Sample 2: Colorado (larger variance over smaller) > select the Options button at the bottom of the dialog > Options > Hypothesized ratio: using the pull-down arrow, select Sample 1 variance / Sample 2 variance > set the confidence level > leave the hypothesized ratio at 1.0 > Alternative hypothesis: Greater than > click OK > OK.
Minitab 17 Output from F test (Session Window)
Test and CI for Two Variances: Prices Tex, Prices Col

Method
Null hypothesis          Variance(Prices Tex) / Variance(Prices Col) = 1
Alternative hypothesis   Variance(Prices Tex) / Variance(Prices Col) > 1
Significance level       α = 0.05
F method was used. This method is accurate for normal data only.

Statistics
                                         95% Lower Bound
Variable     N   StDev   Variance        for Variances
Prices Tex  13  12.663    160.354        91.517
Prices Col  16  11.015    121.329        72.810

Ratio of standard deviations = 1.150
Ratio of variances = 1.322

95% One-Sided Confidence Intervals
         Lower Bound      Lower Bound
         for StDev        for Variance
Method   Ratio            Ratio
F        0.731            0.534
***Tests
                              Test
Method   DF1   DF2       Statistic   P-Value
F         12    15            1.32     0.301
Conclusion: The F critical ratio of 1.32 does not exceed the F critical value of 2.48 from the table, and the p-value of 0.301 is greater than α = .05. FTR the null; a T2 error is possible. Interpretation: The variances for Texas and Colorado are not significantly different, so we may use the pooled-variance t test.
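The F critical ratio in the session output can be reproduced from the raw prices. This minimal sketch uses Python's `statistics.variance` (sample variance, n - 1 denominator) on lists transcribed from the dataset table. The 2.48 cutoff is the table value from the decision rule, not something the code computes, because the standard library has no F distribution.

```python
from statistics import variance

# Zocor prices transcribed from the dataset table (Column 2 and Column 4)
colorado = [125.05, 137.56, 142.50, 145.95, 117.49, 142.75, 121.99, 117.49,
            141.64, 128.69, 130.29, 142.39, 121.99, 141.30, 153.43, 133.39]
texas = [145.32, 131.19, 151.65, 141.55, 125.99, 126.29, 139.19, 156.00,
         137.56, 154.10, 126.41, 114.00, 144.99]

var_tx = variance(texas)        # sample variance, n - 1 denominator
var_col = variance(colorado)

# Folded test: the larger sample variance always goes in the numerator
if var_tx >= var_col:
    f_ratio, df1, df2 = var_tx / var_col, len(texas) - 1, len(colorado) - 1
else:
    f_ratio, df1, df2 = var_col / var_tx, len(colorado) - 1, len(texas) - 1

# Table value from the decision rule; valid only for alpha = .05, df1 = 12, df2 = 15
F_CRITICAL = 2.48

print(f"Variance Tex = {var_tx:.3f}, Variance Col = {var_col:.3f}")
print(f"F = {f_ratio:.2f} with df = ({df1}, {df2})")
print("Reject H0: do not pool" if f_ratio > F_CRITICAL else "Fail to reject H0: pool the variances")
```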
Do not include the graph set from the F test in your homework.
Minitab was asked only to present data on the assumption of normality, and thus only the F test p-value is presented in the graph legend. Brief Summary: Now that you have determined whether or not the variances are equal (F test), you can determine which technique to use to decide whether there is a significant difference in the average costs of Zocor in the two states. In this case, the variances of Zocor prices in Texas and in Colorado do not appear to be significantly different, so we will use the exact test and pool the variances.
There is quite a bit of information in the Session window. For this test you need only the data I have bolded for your homework. Copy it from your Session window and paste it into your conclusion.
Problem Definition: Is there any significant difference in the average price of Zocor sold in Texas as compared to Colorado?
$H_0: \mu_{Col} = \mu_{TX}$
$H_1: \mu_{Col} \neq \mu_{TX}$
Minitab 17: Stat > Basic Statistics > 2-Sample t. The data are in separate columns within Minitab, so we must direct Minitab to the columns > using the pull-down arrow, select Each sample is in its own column > enter the columns in which your data are contained. For a two-tailed test, the order of the columns does not matter. Select Options and see the next screen capture.
Options > set the confidence level > leave the hypothesized difference at 0.0 > Alternative: set the correct direction for the test > check the Assume equal variances box if you failed to reject (FTR) the null of equal variances when you did the F test in the previous step. This instructs Minitab to pool the variances for the t test. Click OK and select the Graphs button.
Minitab 17 Graphs > select the Boxplot option as shown in the second dialog box to the right. There is an assortment of tests for normality; for our purposes we will use a boxplot of each distribution. The boxplots will be cut and pasted into the assumption section of your report.
Session window output is the same for both programs and is shown below. Decision Rule: If the t critical ratio is < -2.052 or > 2.052, reject the null hypothesis (α = .05, two-tailed, df = 27).
Test: Two-Sample T-Test and CI: Prices Col, Prices Tex
Two-sample T for Prices Col vs Prices Tex
(DF = n1 + n2 - 2 = 16 + 13 - 2 = 27. Note: Variances assumed equal.)
N Mean StDev SE Mean
Prices Col 16 134.0 11.0 2.8
Prices Tex 13 138.0 12.7 3.5
Difference = mu (Prices Col) - mu (Prices Tex)
Estimate for difference: -4.02471
95% CI for difference: (-13.04678, 4.99735)
T-Test of difference = 0 (vs not =): T-Value = -0.92 P-Value = 0.368 DF = 27
Both use Pooled StDev = 11.7760
Conclusion: The critical ratio of -0.92 is not less than the critical value of -2.052. FTR the null hypothesis; a T2 error is possible.
Confidence Interval and P-value Interpretations: The confidence interval (-13.04678, 4.99735) contains zero, so there is no significant difference in average Zocor prices between the two states. Also, the p-value of .368 is greater than the test alpha of .05. Interpretation: There is no significant difference in the average prices of Zocor between Colorado and Texas.
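The pooled-variance t test can be reproduced from the same prices. This minimal sketch pools the two sample variances (each weighted by n - 1), computes the t critical ratio and builds the 95% confidence interval. It assumes the table value 2.052 from the decision rule in place of Minitab's exact t quantile, which the standard library cannot compute.

```python
from math import sqrt
from statistics import mean, variance

# Zocor prices transcribed from the dataset table
colorado = [125.05, 137.56, 142.50, 145.95, 117.49, 142.75, 121.99, 117.49,
            141.64, 128.69, 130.29, 142.39, 121.99, 141.30, 153.43, 133.39]
texas = [145.32, 131.19, 151.65, 141.55, 125.99, 126.29, 139.19, 156.00,
         137.56, 154.10, 126.41, 114.00, 144.99]

n1, n2 = len(colorado), len(texas)
df = n1 + n2 - 2                                   # 16 + 13 - 2 = 27

# Pool the sample variances, weighting each by its degrees of freedom
pooled_sd = sqrt(((n1 - 1) * variance(colorado) + (n2 - 1) * variance(texas)) / df)
standard_error = pooled_sd * sqrt(1 / n1 + 1 / n2)

diff = mean(colorado) - mean(texas)                # mu(Prices Col) - mu(Prices Tex)
t_ratio = diff / standard_error                    # hypothesized difference = 0

T_CRITICAL = 2.052                                 # two-tailed table value, alpha = .05, df = 27
ci = (diff - T_CRITICAL * standard_error, diff + T_CRITICAL * standard_error)

print(f"Pooled StDev = {pooled_sd:.4f}, T-Value = {t_ratio:.2f}, DF = {df}")
print(f"95% CI for difference: ({ci[0]:.3f}, {ci[1]:.3f})")
print("Reject H0" if abs(t_ratio) > T_CRITICAL else "Fail to reject H0")
```

Because the sketch uses the rounded table value, its interval endpoints can differ from the session output in the second or third decimal place.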
Assumptions for t test of two means (know these)
1. The samples are random and independent.
2. If n < 30, the populations from which the samples were taken are approximately normally distributed. A boxplot is useful for checking this assumption.
3. If the variances are pooled, the populations must have equal variances (exact test). The F test is used to satisfy the assumption of equal variances.
4. If the variances cannot be assumed equal, do not pool them; use the Welch/Satterthwaite approximation procedure for the DF (approximate formula) when computing the t critical ratio.
Satisfying the assumption of normality for the t test (samples are smaller than 30):
1) The populations from which the samples are taken are normally distributed. Since the sample sizes are less than 30, we must satisfy the assumption that the samples were taken from normally distributed populations.
Figure 4: Graphs Options, produced from the Graphs command in Boxplot.
Boxplot Graph Interpretation
The graphed medians fall near the center of each distribution within the IQR, and the arithmetic means are close to the medians for each distribution. In regard to equal variability, compare the whisker lengths on each graph to one another. The whiskers are approximately equal in length, indicating equal variances for both populations. Small samples that show these characteristics of normality are usually treated as coming from normal populations, because these characteristics tend to become more pronounced as the sample sizes increase.
Figure: Boxplot of Prices Col, Prices Tex (y-axis: Data)
**Assigned: You and some of your friends have decided to test the validity of an advertisement by a local pizza restaurant, which says it delivers to the dormitories faster than a local branch of a national chain. Both the local pizza restaurant and the national chain are located across the street from your college campus. The variable of interest is the delivery time in minutes, from the time the pizza is ordered to when it is delivered. You collect the data by ordering 10 pizzas from the local pizza restaurant and 10 from the national chain at different times. The following table is the record of the delivery times. At the .05 level of significance, is there any evidence that the mean delivery time for the local pizza restaurant is less than the mean delivery time for the national chain?
Local (Delivery time in minutes) National Chain (Delivery time in minutes)
16.8 18.1 15.6 16.7 17.5
11.7 14.1 21.8 13.9 20.8
22.0 15.2 18.7 15.6 20.8
19.5 17.0 19.5 16.5 24.0
Complete testing in Minitab and prepare report in Word. Enter the data presented above into a Minitab worksheet. Conduct the test with alpha of .05 and complete the following.
Using the six-step process for each hypothesis test, complete the following:
Set up the six-step process and, using an F test, determine whether the data have equal variances. Include only the critical value and p-value techniques in your conclusion. Present the Minitab output in your conclusion. For your interpretation, discuss whether the distributions have equal variances. Do not include graphs from the F test.
In your conclusion, present the output from Minitab for:
o Critical value/critical ratio technique
o p-value
o Confidence intervals
Based on the characteristics of each of the above techniques, include statements regarding failing to reject or rejecting the null and the type of error that may have been made.
Interpretation: State whether the variances are significantly different and which type of t test you will undertake given the outcome of the F test you just completed.
Use a boxplot to satisfy the assumption of normality, as the sample sizes are < 30. Provide a brief description with the graph of how normality is or is not supported.
F test and test with boxplot: worth 12 points.
Paired t test (also called paired t, matched pairs, or t test for dependent samples)
Matched Pairs Testing, Page 472: Inferences for Two Related Populations
Problem Definition: An insurance company wants to determine whether there is any significant difference, on average, between the repair costs of two contractors on the same jobs.
Row Claim Contractor A Contractor B
1 Jones, C. 5500 6000
2 Smith, R. 1000 900
3 Xia, Y. 2500 2500
4 Gallo, J. 7800 8300
5 Carson, R. 6400 6200
6 Petty, M. 8800 9400
7 Tracy, L. 600 500
8 Barnes, J. 3300 3500
9 Rodriguez, J. 4500 5200
10 Van Dyke, P. 6500 6800
Hypotheses:
H0: µd = 0 There is no significant difference in average costs for Contractors A and B.
H1: µd ≠ 0 There is a significant difference in average costs for Contractors A and B.
Decision Rule: If the critical ratio for t is < -2.262 or > 2.262, reject the null hypothesis of no difference (df = n - 1 = 9).
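The paired test reduces to a one-sample t test on the differences. This minimal sketch computes the differences (Contractor A minus Contractor B), their mean and standard deviation, the t critical ratio with df = n - 1, and a 95% confidence interval. It assumes the table value 2.262 from the decision rule rather than an exact t quantile.

```python
from math import sqrt
from statistics import mean, stdev

# Repair costs from the claims table (same jobs, so the samples are dependent)
contractor_a = [5500, 1000, 2500, 7800, 6400, 8800, 600, 3300, 4500, 6500]
contractor_b = [6000, 900, 2500, 8300, 6200, 9400, 500, 3500, 5200, 6800]

differences = [a - b for a, b in zip(contractor_a, contractor_b)]  # A - B, as Minitab reports
n = len(differences)
df = n - 1
d_bar = mean(differences)
s_d = stdev(differences)            # sample standard deviation of the differences
se = s_d / sqrt(n)
t_ratio = d_bar / se                # hypothesized mean difference = 0

T_CRITICAL = 2.262                  # two-tailed table value, alpha = .05, df = 9
ci = (d_bar - T_CRITICAL * se, d_bar + T_CRITICAL * se)

print(f"mean difference = {d_bar}, StDev = {s_d:.3f}, SE Mean = {se:.3f}")
print(f"T-Value = {t_ratio:.2f}, DF = {df}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print("Reject H0" if abs(t_ratio) > T_CRITICAL else "Fail to reject H0")
```

The results should agree with the Minitab session output for this example, apart from small rounding differences in the interval caused by the table value.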
Minitab 17: After you have entered each set of data into its own column, go to Stat > Basic Statistics > Paired t > select Each sample is in a column (you can put them in the same column, but you will need subscripts) > place the cursor in Sample 1: and click to activate it, then click on the column (Contractor A). Do the same for Sample 2. Now click Options.
Minitab 17 Options: Set the confidence level, leave the hypothesized difference at 0, and select the correct alternative hypothesis from the pull-down menu. Click OK and select the Graphs button on the Paired t for the mean dialog (the first screen shown above).
From Graphs, select Boxplot of differences as shown, and the graph on the right will appear.
Place the boxplot for your problem in the assumption section, with a description of how you assessed normality.
Figure: Boxplot of Differences (with Ho and 95% t-confidence interval for the mean); x-axis: Differences
Test:
Paired T-Test and CI: Contractor A, Contractor B
Paired T for Contractor A - Contractor B
N Mean StDev SE Mean
Contractor A 10 4690.00 2799.38 885.24
Contractor B 10 4930.00 3008.89 951.50
Difference 10 -240.000 327.278 103.494
95% CI for mean difference: (-474.121, -5.879)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.32 P-Value = 0.046
Conclusion: Reject the null. The t critical ratio of -2.32 is less than the -2.262 critical value. The confidence interval does not contain zero, and the p-value of 0.046 is less than the alpha level of .05. Reject the null. A type 1 error may have been made at alpha level .05. Interpretation: On average, repair costs for Contractor B are significantly higher than those for Contractor A.
Boxplot of Differences: Satisfying the normality assumption using a boxplot of the distribution of differences for Contractors A and B. Boxplots are placed in the assumption section after the Interpretation.
Figure: Boxplot of Differences (with Ho and 95% t-confidence interval for the mean); x-axis: Differences
Graph Interpretation: The graphed median falls almost at the center of the midspread, along with the sample mean and the hypothesized mean. The whiskers at either end of the plot indicate that the distribution is skewed to the left, but the sample is small, so we give the distribution the benefit of the doubt. Remember that the t test is robust: it is not very sensitive to moderate departures from normality.
Assumptions for matched pairs testing (know these)
1. Random and dependent samples
2. The population of differences is normally distributed
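The visual reading of the boxplot can be supported with a five-number summary of the contractor differences. This minimal sketch uses `statistics.quantiles` with its default 'exclusive' method, which places quartiles at (n + 1)p positions; we assume this is close to Minitab's quartile convention, but other software may report slightly different quartiles. The 1.5 × IQR fences are the usual boxplot outlier rule.

```python
from statistics import quantiles

# Differences for the contractor example: Contractor A minus Contractor B
differences = [-500, 100, 0, -500, 200, -600, 100, -200, -700, -300]

q1, q2, q3 = quantiles(differences, n=4)   # default method='exclusive'
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [d for d in differences if d < lower_fence or d > upper_fence]

print(f"min = {min(differences)}, Q1 = {q1}, median = {q2}, Q3 = {q3}, max = {max(differences)}")
print(f"median - Q1 = {q2 - q1}, Q3 - median = {q3 - q2}")
# Whiskers reach the min and max only when no point lies beyond the fences
print(f"lower whisker span = {q1 - min(differences)}, upper whisker span = {max(differences) - q3}")
print("points beyond 1.5 x IQR fences:", outliers)
```

A longer lower whisker than upper whisker is the numeric counterpart of the left skew described in the graph interpretation.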
**Assigned: Complete Problem 10.29, Page 481, using the six-step process. Problem Definition: Determine whether the average score after the MCAT review service is greater than the average score before. Alpha is .05.
Student 1 2 3 4 5 6 7 8 9
Before 26 21 20 31 18 33 25 23 30
After 28 26 17 34 20 31 26 22 32
Utilize the critical value technique, the confidence interval and the p-value in your conclusion.
Add a section labeled Assumption in which you will place your boxplot. Below the boxplot, discuss how you know whether or not normality may be assumed. 7 points possible. *Note: You will have to enter the data from the problem into a worksheet in Minitab. Save it with File > Save Project As, and enter the name and location in which you wish to save it. It is a good idea to save all your homework files in this fashion.
_______________________________________________________________________
Hypothesis Testing for the Difference between Two Proportions
DES and Breast Cancer: Summarized Data
Problem Definition: Testing to determine whether mothers who used DES develop breast cancer at a significantly higher rate (proportion) than mothers who did not use DES.
$H_0: p_{DES} = p_{\text{did not use}}$
$H_1: p_{DES} > p_{\text{did not use}}$
Decision Rule: If the Z critical ratio is greater than 2.33, reject the null hypothesis.
Sample   X     N
DES      118   3033
No DES   80    3033
Minitab 17: Stat > Basic Statistics > 2 Proportions > from the pull-down menu, select Summarized data. Note: For each sample, enter the number of events (x, the successes) and the number of trials (n). In this case, Sample 1 is x = 118 breast cancer cases among n = 3033 women who used DES, and Sample 2 is x = 80 breast cancer cases among n = 3033 women who did not use DES. Is there significantly more cancer in DES users than in non-DES users?
Select Options > set the confidence level > leave the hypothesized difference at 0.0. Since Sample 1 is tested to see whether it is greater than Sample 2, set the alternative as one-tailed to the right. Because the hypothesized difference is 0 (we are not testing against a nonzero difference), we may set the Test method to use the pooled estimate of the proportion, which gives a more accurate estimate of the combined variability.
$H_0: p_{DES} = p_{\text{did not use}}$
$H_1: p_{DES} > p_{\text{did not use}}$
I have pasted the null and alternative below so you can see that the alternative is set to match the hypothesis established at the beginning of the test.
Output from the Session window, cut and pasted for the Test step:
Test and CI for Two Proportions
Sample X N Sample p
1 118 3033 0.038905
2 80 3033 0.026377
Difference = p (1) - p (2)
Estimate for difference: 0.0125288
99% lower bound for difference: 0.00192024
Test for difference = 0 (vs > 0): Z = 2.75 P-Value = 0.003
Conclusion: The Z critical ratio of 2.75 is greater than the 2.33 critical value, so we reject the null hypothesis. A T1 error may have been made (α = .01). Zero does not lie within the confidence interval (the 99% lower bound is 0.00192), and the p-value of 0.003 is less than alpha, which provides the same conclusion as the critical value/critical ratio technique. Interpretation: A significantly greater proportion of mothers developed breast cancer when DES was prescribed.
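The Z critical ratio for two proportions can be reproduced from the summarized counts. This minimal sketch implements the usual large-sample Z statistic with either the pooled estimate of p (used when the hypothesized difference is 0) or separate estimates; `two_proportion_z` is a hypothetical helper name, and the critical value and p-value come from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, pooled=True):
    """Z statistic for H0: p1 - p2 = 0 from summarized data (x events in n trials)."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)                          # pooled estimate of the common p
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # separate estimates
    return (p1 - p2) / se


z = two_proportion_z(118, 3033, 80, 3033)       # Sample 1 = DES, Sample 2 = no DES
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)         # one-tailed (right) critical value
p_value = 1 - NormalDist().cdf(z)                # one-tailed p-value

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```

With samples this large, the pooled and separate standard errors both give a Z of about 2.75 for these data.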
**Assigned: Test of two proportions Summarized Data: Use the six step process to determine if the proportions are equal. Do not assume normality. The Problem: People are considered obese when they are approximately 30 pounds over their healthy weight. Obesity can increase the risk of heart disease and diabetes, which could add to the cost of the healthcare system. Two random samples of adults were selected. The individuals in one of the samples lacked high school diplomas. The individuals in the other sample held college degrees. The number of people sampled and the obese individuals from each sample are as follows: Alpha: 0.01
No High School Diplomas College Degree
x=75 x =41
n=213 n=190
Test to determine whether there is any significant difference in the proportion of individuals who are obese between those who lack high school diplomas and those with college degrees.
See the note below about satisfying the assumption that the normal distribution can approximate the binomial distribution.
Assumptions: You will need to show that the normal distribution may be used to approximate the binomial distribution. Using the conditions np ≥ 5 and n(1 - p) ≥ 5, compute the values for both samples and type them neatly on your work after the interpretation.
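The np ≥ 5 and n(1 - p) ≥ 5 check is simple arithmetic once p is estimated by the sample proportion x / n. This minimal sketch assumes that estimate (so np is just x and n(1 - p) is n - x); `normal_approx_ok` is a hypothetical helper name. It is illustrated with the DES example data rather than the assigned problem.

```python
def normal_approx_ok(x, n, threshold=5):
    """Check np >= 5 and n(1 - p) >= 5, estimating p by the sample proportion x / n."""
    p = x / n
    return n * p >= threshold and n * (1 - p) >= threshold


# Illustration with the DES example data (not the assigned problem)
for label, x, n in [("DES", 118, 3033), ("No DES", 80, 3033)]:
    p = x / n
    print(f"{label}: np = {n * p:.1f}, n(1 - p) = {n * (1 - p):.1f}, "
          f"approximation OK: {normal_approx_ok(x, n)}")
```

Both conditions must hold for each sample; a single failing sample means the normal approximation should not be used.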