Introduction to the Practice of Statistics
NINTH EDITION
David S. Moore
George P. McCabe
Bruce A. Craig
Purdue University
Vice President, STEM: Ben Roberts
Publisher: Terri Ward
Senior Acquisitions Editor: Karen Carson
Marketing Manager: Tom DeMarco
Marketing Assistant: Cate McCaffery
Development Editor: Jorge Amaral
Senior Media Editor: Catriona Kaplan
Assistant Media Editor: Emily Tenenbaum
Director of Digital Production: Keri deManigold
Senior Media Producer: Alison Lorber
Associate Editor: Victoria Garvey
Editorial Assistant: Katharine Munz
Photo Editor: Cecilia Varas
Photo Researcher: Candice Cheesman
Director of Design, Content Management: Diana Blume
Text and Cover Designer: Blake Logan
Project Editor: Edward Dionne, MPS North America LLC
Illustrations: MPS North America LLC
Production Manager: Susan Wein
Composition: MPS North America LLC
Printing and Binding: LSC Communications
Cover Illustration: Drawing Water: Spring 2011 detail (Midwest) by David Wicks
“Look Back” Arrow: NewCorner/Shutterstock
Library of Congress Control Number: 2016946039
Student Edition Hardcover: ISBN-13: 978-1-319-01338-7 ISBN-10: 1-319-01338-4
Student Edition Loose-leaf: ISBN-13: 978-1-319-01362-2 ISBN-10: 1-319-01362-7
Instructor Complimentary Copy: ISBN-13: 978-1-319-01428-5 ISBN-10: 1-319-01428-3
© 2017, 2014, 2012, 2009 by W. H. Freeman and Company All rights reserved Printed in the United States of America First printing
W. H. Freeman and Company One New York Plaza Suite 4500 New York, NY 10004-1562 www.macmillanlearning.com
Brief Contents
To Teachers: About This Book To Students: What Is Statistics? About the Authors Data Table Index Beyond the Basics Index
PART I Looking at Data CHAPTER 1 Looking at Data—Distributions
CHAPTER 2 Looking at Data—Relationships
CHAPTER 3 Producing Data
PART II Probability and Inference CHAPTER 4 Probability: The Study of Randomness
CHAPTER 5 Sampling Distributions
CHAPTER 6 Introduction to Inference
CHAPTER 7 Inference for Means
CHAPTER 8 Inference for Proportions
PART III Topics in Inference CHAPTER 9 Inference for Categorical Data
CHAPTER 10 Inference for Regression
CHAPTER 11 Multiple Regression
CHAPTER 12 One-Way Analysis of Variance
CHAPTER 13 Two-Way Analysis of Variance Tables Answers to Odd-Numbered Exercises Notes and Data Sources Index
Contents
To Teachers: About This Book To Students: What Is Statistics? About the Authors Data Table Index Beyond the Basics Index
PART I Looking at Data CHAPTER 1 Looking at Data—Distributions Introduction
1.1 Data Key characteristics of a data set
Section 1.1 Summary Section 1.1 Exercises 1.2 Displaying Distributions with Graphs
Categorical variables: Bar graphs and pie charts Quantitative variables: Stemplots and histograms Histograms Data analysis in action: Don’t hang up on me Examining distributions Dealing with outliers Time plots
Section 1.2 Summary Section 1.2 Exercises 1.3 Describing Distributions with Numbers
Measuring center: The mean Measuring center: The median Mean versus median Measuring spread: The quartiles The five-number summary and boxplots The 1.5 × IQR rule for suspected outliers Measuring spread: The standard deviation Properties of the standard deviation Choosing measures of center and spread Changing the unit of measurement
Section 1.3 Summary Section 1.3 Exercises 1.4 Density Curves and Normal Distributions
Density curves
Measuring center and spread for density curves Normal distributions The 68–95–99.7 rule Standardizing observations Normal distribution calculations Using the standard Normal table Inverse Normal calculations Normal quantile plots
Beyond the Basics: Density estimation Section 1.4 Summary Section 1.4 Exercises Chapter 1 Exercises
CHAPTER 2 Looking at Data—Relationships Introduction
2.1 Relationships Examining relationships
Section 2.1 Summary Section 2.1 Exercises 2.2 Scatterplots
Interpreting scatterplots The log transformation Adding categorical variables to scatterplots Scatterplot smoothers Categorical explanatory variables
Section 2.2 Summary Section 2.2 Exercises 2.3 Correlation
The correlation r Properties of correlation
Section 2.3 Summary Section 2.3 Exercises 2.4 Least-Squares Regression
Fitting a line to data Prediction Least-squares regression Interpreting the regression line Facts about least-squares regression Correlation and regression Another view of r²
Section 2.4 Summary Section 2.4 Exercises 2.5 Cautions about Correlation and Regression
Residuals Outliers and influential observations
Beware of the lurking variable Beware of correlations based on averaged data Beware of restricted ranges
Beyond the Basics: Data mining Section 2.5 Summary Section 2.5 Exercises 2.6 Data Analysis for Two-Way Tables
The two-way table Joint distribution Marginal distributions Describing relations in two-way tables Conditional distributions Simpson’s paradox
Section 2.6 Summary Section 2.6 Exercises 2.7 The Question of Causation
Explaining association Establishing causation
Section 2.7 Summary Section 2.7 Exercises Chapter 2 Exercises
CHAPTER 3 Producing Data Introduction
3.1 Sources of Data Anecdotal data Available data Sample surveys and experiments
Section 3.1 Summary Section 3.1 Exercises 3.2 Design of Experiments
Comparative experiments Randomization Randomized comparative experiments How to randomize Randomization using software Randomization using random digits Cautions about experimentation Matched pairs designs Block designs
Section 3.2 Summary Section 3.2 Exercises 3.3 Sampling Design
Simple random samples How to select a simple random sample
Stratified random samples Multistage random samples Cautions about sample surveys
Beyond the Basics: Capture-recapture sampling Section 3.3 Summary Section 3.3 Exercises 3.4 Ethics
Institutional review boards Informed consent Confidentiality Clinical trials Behavioral and social science experiments
Section 3.4 Summary Section 3.4 Exercises Chapter 3 Exercises
PART II Probability and Inference CHAPTER 4 Probability: The Study of Randomness Introduction
4.1 Randomness The language of probability Thinking about randomness The uses of probability
Section 4.1 Summary Section 4.1 Exercises 4.2 Probability Models
Sample spaces Probability rules Assigning probabilities: Finite number of outcomes Assigning probabilities: Equally likely outcomes Independence and the multiplication rule Applying the probability rules
Section 4.2 Summary Section 4.2 Exercises 4.3 Random Variables
Discrete random variables Continuous random variables Normal distributions as probability distributions
Section 4.3 Summary Section 4.3 Exercises 4.4 Means and Variances of Random Variables
The mean of a random variable Statistical estimation and the law of large numbers
Thinking about the law of large numbers Beyond the Basics: More laws of large numbers
Rules for means The variance of a random variable Rules for variances and standard deviations
Section 4.4 Summary Section 4.4 Exercises 4.5 General Probability Rules
General addition rules Conditional probability General multiplication rules Tree diagrams Bayes’s rule Independence again
Section 4.5 Summary Section 4.5 Exercises Chapter 4 Exercises
CHAPTER 5 Sampling Distributions Introduction
5.1 Toward Statistical Inference Sampling variability Sampling distributions Bias and variability Sampling from large populations Why randomize?
Section 5.1 Summary Section 5.1 Exercises 5.2 The Sampling Distribution of a Sample Mean
The mean and standard deviation of x̅ The central limit theorem A few more facts
Beyond the Basics: Weibull distributions Section 5.2 Summary Section 5.2 Exercises 5.3 Sampling Distributions for Counts and Proportions
The binomial distributions for sample counts Binomial distributions in statistical sampling Finding binomial probabilities Binomial mean and standard deviation Sample proportions Normal approximation for counts and proportions The continuity correction Binomial formula The Poisson distributions
Section 5.3 Summary Section 5.3 Exercises Chapter 5 Exercises
CHAPTER 6 Introduction to Inference Introduction Overview of inference 6.1 Estimating with Confidence
Statistical confidence Confidence intervals Confidence interval for a population mean How confidence intervals behave Choosing the sample size Some cautions
Section 6.1 Summary Section 6.1 Exercises 6.2 Tests of Significance
The reasoning of significance tests Stating hypotheses Test statistics P-values Statistical significance Tests for a population mean Two-sided significance tests and confidence intervals The P-value versus a statement of significance
Section 6.2 Summary Section 6.2 Exercises 6.3 Use and Abuse of Tests
Choosing a level of significance What statistical significance does not mean Don’t ignore lack of significance Statistical inference is not valid for all sets of data Beware of searching for significance
Section 6.3 Summary Section 6.3 Exercises 6.4 Power and Inference as a Decision
Power Increasing the power Inference as decision Two types of error Error probabilities The common practice of testing hypotheses
Section 6.4 Summary Section 6.4 Exercises Chapter 6 Exercises
CHAPTER 7 Inference for Means Introduction
7.1 Inference for the Mean of a Population The t distributions The one-sample t confidence interval The one-sample t test Matched pairs t procedures Robustness of the t procedures
Beyond the Basics: The bootstrap Section 7.1 Summary Section 7.1 Exercises 7.2 Comparing Two Means
The two-sample z statistic The two-sample t procedures The two-sample t confidence interval The two-sample t significance test Robustness of the two-sample procedures Inference for small samples Software approximation for the degrees of freedom The pooled two-sample t procedures
Section 7.2 Summary Section 7.2 Exercises 7.3 Additional Topics on Inference
Choosing the sample size Inference for non-Normal populations
Section 7.3 Summary Section 7.3 Exercises Chapter 7 Exercises
CHAPTER 8 Inference for Proportions Introduction
8.1 Inference for a Single Proportion Large-sample confidence interval for a single proportion
Beyond the Basics: The plus four confidence interval for a single proportion Significance test for a single proportion Choosing a sample size for a confidence interval Choosing a sample size for a significance test
Section 8.1 Summary Section 8.1 Exercises 8.2 Comparing Two Proportions
Large-sample confidence interval for a difference in proportions Beyond the Basics: The plus four confidence interval for a difference in proportions
Significance test for a difference in proportions Choosing a sample size for two sample proportions
Beyond the Basics: Relative risk Section 8.2 Summary
Section 8.2 Exercises Chapter 8 Exercises
PART III Topics in Inference CHAPTER 9 Inference for Categorical Data Introduction
9.1 Inference for Two-Way Tables The hypothesis: No association Expected cell counts The chi-square test Computations Computing conditional distributions The chi-square test and the z test
Beyond the Basics: Meta-analysis Section 9.1 Summary Section 9.1 Exercises 9.2 Goodness of Fit Section 9.2 Summary Section 9.2 Exercises Chapter 9 Exercises
CHAPTER 10 Inference for Regression Introduction
10.1 Simple Linear Regression Statistical model for linear regression Preliminary data analysis and inference considerations Estimating the regression parameters Checking model assumptions Confidence intervals and significance tests Confidence intervals for mean response Prediction intervals Transforming variables
Beyond the Basics: Nonlinear regression Section 10.1 Summary Section 10.1 Exercises 10.2 More Detail about Simple Linear Regression
Analysis of variance for regression The ANOVA F test Calculations for regression inference Inference for correlation
Section 10.2 Summary Section 10.2 Exercises Chapter 10 Exercises
CHAPTER 11 Multiple Regression Introduction
11.1 Inference for Multiple Regression Population multiple regression equation Data for multiple regression Multiple linear regression model Estimation of the multiple regression parameters Confidence intervals and significance tests for regression coefficients ANOVA table for multiple regression Squared multiple correlation R²
Section 11.1 Summary Section 11.1 Exercises 11.2 A Case Study
Preliminary analysis Relationships between pairs of variables Regression on high school grades Interpretation of results Examining the residuals Refining the model Regression on SAT scores Regression using all variables Test for a collection of regression coefficients
Beyond the Basics: Multiple logistic regression Section 11.2 Summary Section 11.2 Exercises Chapter 11 Exercises
CHAPTER 12 One-Way Analysis of Variance Introduction
12.1 Inference for One-Way Analysis of Variance Data for one-way ANOVA Comparing means The two-sample t statistic An overview of ANOVA The ANOVA model Estimates of population parameters Testing hypotheses in one-way ANOVA The ANOVA table The F test Software
Beyond the Basics: Testing the equality of spread Section 12.1 Summary Section 12.1 Exercises 12.2 Comparing the Means
Contrasts
Multiple comparisons Power
Section 12.2 Summary Section 12.2 Exercises Chapter 12 Exercises
CHAPTER 13 Two-Way Analysis of Variance Introduction
13.1 The Two-Way ANOVA Model Advantages of two-way ANOVA The two-way ANOVA model Main effects and interactions
13.2 Inference for Two-Way ANOVA The ANOVA table for two-way ANOVA
Chapter 13 Summary Chapter 13 Exercises Tables Answers to Odd-Numbered Exercises Notes and Data Sources Index
To Teachers: About This Book
Statistics is the science of data. Introduction to the Practice of Statistics (IPS) is an introductory text based on this principle. We present methods of basic statistics in a way that emphasizes working with data and mastering statistical reasoning. IPS is elementary in mathematical level but conceptually rich in statistical ideas. We would like students who complete a course based on our text to be able to think objectively about conclusions drawn from data and to use statistical methods in their own work.
In IPS, we combine attention to basic statistical concepts with a comprehensive presentation of the elementary statistical methods that students will find useful in their work. IPS has been successful for several reasons:
1. IPS examines the nature of modern statistical practice at a level suitable for beginners. We focus on the production and analysis of data as well as the traditional topics of probability and inference.
2. IPS has a logical overall progression, so data production and data analysis are a major focus, while inference is treated as a tool that helps us draw conclusions from data in an appropriate way.
3. IPS presents data analysis as more than a collection of techniques for exploring data. We emphasize systematic ways of thinking about data. Simple principles guide the analysis: always plot your data; look for overall patterns and deviations from them; when looking at the overall pattern of a distribution for one variable, consider shape, center, and spread; for relations between two variables, consider form, direction, and strength; always ask whether a relationship between variables is influenced by other variables lurking in the background. We warn students about pitfalls in clear cautionary discussions.
4. IPS uses real examples to drive the exposition. Students learn the technique of least-squares regression and how to interpret the regression slope. But they also learn the conceptual ties between regression and correlation and the importance of looking for influential observations.
5. IPS is aware of current developments both in statistical science and in teaching statistics. Brief, optional Beyond the Basics sections give quick overviews of topics such as density estimation, scatterplot smoothers, data mining, nonlinear regression, and meta-analysis. Chapter 16 gives an elementary introduction to the bootstrap and other computer-intensive statistical methods.
The title of the book expresses our intent to introduce readers to statistics as it is used in practice. Statistics in practice is concerned with drawing conclusions from data. We focus on problem solving rather than on methods that may be useful in specific settings.
GAISE
The College Report of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project (www.amstat.org/education/gaise/) was funded by the American Statistical Association to make recommendations for how introductory statistics courses should be taught. This report and its update contain many interesting teaching suggestions, and we strongly recommend that you read them. The philosophy and approach of IPS closely reflect the GAISE recommendations. Let’s examine each of the latest recommendations in the context of IPS.
1. Teach statistical thinking. Through our experiences as applied statisticians, we are very familiar with the components that are needed for the appropriate use of statistical methods. We focus on formulating questions, collecting and finding data, evaluating the quality of data, exploring the relationships among variables, performing statistical analyses, and drawing conclusions. In examples and exercises throughout the text, we emphasize putting the analysis in the proper context and translating numerical and graphical summaries into conclusions.
2. Focus on conceptual understanding. With the software available today, it is very easy for almost anyone to apply a wide variety of statistical procedures, both simple and complex, to a set of data. Without a firm grasp of the concepts, such applications are frequently meaningless. By using the methods that we present on real sets of data, we believe that students will gain an excellent understanding of these concepts. Our emphasis is on the input (questions of interest, collecting or finding data, examining data) and the output (conclusions) for a statistical analysis. Formulas are given only where they will provide some insight into concepts.
3. Integrate real data with a context and a purpose. Many of the examples and exercises in IPS include data that we have obtained from collaborators or consulting clients. Other data sets have come from research related to these activities. We have also used the Internet as a data source, particularly for data related to social media and other topics of interest to undergraduates. Our emphasis on real data, rather than artificial data chosen to illustrate a calculation, serves to motivate students and help them see the usefulness of statistics in everyday life. We also frequently encounter interesting statistical issues that we explore. These include outliers and nonlinear relationships. All data sets are available from the text website.