Introduction to the Practice of Statistics
NINTH EDITION
David S. Moore George P. McCabe Bruce A. Craig Purdue University
Vice President, STEM: Ben Roberts
Publisher: Terri Ward
Senior Acquisitions Editor: Karen Carson
Marketing Manager: Tom DeMarco
Marketing Assistant: Cate McCaffery
Development Editor: Jorge Amaral
Senior Media Editor: Catriona Kaplan
Assistant Media Editor: Emily Tenenbaum
Director of Digital Production: Keri deManigold
Senior Media Producer: Alison Lorber
Associate Editor: Victoria Garvey
Editorial Assistant: Katharine Munz
Photo Editor: Cecilia Varas
Photo Researcher: Candice Cheesman
Director of Design, Content Management: Diana Blume
Text and Cover Designer: Blake Logan
Project Editor: Edward Dionne, MPS North America LLC
Illustrations: MPS North America LLC
Production Manager: Susan Wein
Composition: MPS North America LLC
Printing and Binding: LSC Communications
Cover Illustration: Drawing Water: Spring 2011 detail (Midwest) by David Wicks
“Look Back” Arrow: NewCorner/Shutterstock
Library of Congress Control Number: 2016946039
Student Edition Hardcover: ISBN-13: 978-1-319-01338-7 ISBN-10: 1-319-01338-4
Student Edition Loose-leaf: ISBN-13: 978-1-319-01362-2 ISBN-10: 1-319-01362-7
Instructor Complimentary Copy: ISBN-13: 978-1-319-01428-5 ISBN-10: 1-319-01428-3
© 2017, 2014, 2012, 2009 by W. H. Freeman and Company All rights reserved Printed in the United States of America First printing
W. H. Freeman and Company One New York Plaza Suite 4500 New York, NY 10004-1562 www.macmillanlearning.com
Brief Contents
To Teachers: About This Book To Students: What Is Statistics? About the Authors Data Table Index Beyond the Basics Index
PART I Looking at Data CHAPTER 1 Looking at Data—Distributions
CHAPTER 2 Looking at Data—Relationships
CHAPTER 3 Producing Data
PART II Probability and Inference CHAPTER 4 Probability: The Study of Randomness
CHAPTER 5 Sampling Distributions
CHAPTER 6 Introduction to Inference
CHAPTER 7 Inference for Means
CHAPTER 8 Inference for Proportions
PART III Topics in Inference CHAPTER 9 Inference for Categorical Data
CHAPTER 10 Inference for Regression
CHAPTER 11 Multiple Regression
CHAPTER 12 One-Way Analysis of Variance
CHAPTER 13 Two-Way Analysis of Variance Tables Answers to Odd-Numbered Exercises Notes and Data Sources Index
Contents
To Teachers: About This Book To Students: What Is Statistics? About the Authors Data Table Index Beyond the Basics Index
PART I Looking at Data CHAPTER 1 Looking at Data—Distributions Introduction
1.1 Data Key characteristics of a data set
Section 1.1 Summary Section 1.1 Exercises 1.2 Displaying Distributions with Graphs
Categorical variables: Bar graphs and pie charts Quantitative variables: Stemplots and histograms Histograms Data analysis in action: Don’t hang up on me Examining distributions Dealing with outliers Time plots
Section 1.2 Summary Section 1.2 Exercises 1.3 Describing Distributions with Numbers
Measuring center: The mean Measuring center: The median Mean versus median Measuring spread: The quartiles The five-number summary and boxplots The 1.5 × IQR rule for suspected outliers Measuring spread: The standard deviation Properties of the standard deviation Choosing measures of center and spread Changing the unit of measurement
Section 1.3 Summary Section 1.3 Exercises 1.4 Density Curves and Normal Distributions
Density curves
Measuring center and spread for density curves Normal distributions The 68–95–99.7 rule Standardizing observations Normal distribution calculations Using the standard Normal table Inverse Normal calculations Normal quantile plots
Beyond the Basics: Density estimation Section 1.4 Summary Section 1.4 Exercises Chapter 1 Exercises
CHAPTER 2 Looking at Data—Relationships Introduction
2.1 Relationships Examining relationships
Section 2.1 Summary Section 2.1 Exercises 2.2 Scatterplots
Interpreting scatterplots The log transformation Adding categorical variables to scatterplots Scatterplot smoothers Categorical explanatory variables
Section 2.2 Summary Section 2.2 Exercises 2.3 Correlation
The correlation r Properties of correlation
Section 2.3 Summary Section 2.3 Exercises 2.4 Least-Squares Regression
Fitting a line to data Prediction Least-squares regression Interpreting the regression line Facts about least-squares regression Correlation and regression Another view of r²
Section 2.4 Summary Section 2.4 Exercises 2.5 Cautions about Correlation and Regression
Residuals Outliers and influential observations
Beware of the lurking variable Beware of correlations based on averaged data Beware of restricted ranges
Beyond the Basics: Data mining Section 2.5 Summary Section 2.5 Exercises 2.6 Data Analysis for Two-Way Tables
The two-way table Joint distribution Marginal distributions Describing relations in two-way tables Conditional distributions Simpson’s paradox
Section 2.6 Summary Section 2.6 Exercises 2.7 The Question of Causation
Explaining association Establishing causation
Section 2.7 Summary Section 2.7 Exercises Chapter 2 Exercises
CHAPTER 3 Producing Data Introduction
3.1 Sources of Data Anecdotal data Available data Sample surveys and experiments
Section 3.1 Summary Section 3.1 Exercises 3.2 Design of Experiments
Comparative experiments Randomization Randomized comparative experiments How to randomize Randomization using software Randomization using random digits Cautions about experimentation Matched pairs designs Block designs
Section 3.2 Summary Section 3.2 Exercises 3.3 Sampling Design
Simple random samples How to select a simple random sample
Stratified random samples Multistage random samples Cautions about sample surveys
Beyond the Basics: Capture-recapture sampling Section 3.3 Summary Section 3.3 Exercises 3.4 Ethics
Institutional review boards Informed consent Confidentiality Clinical trials Behavioral and social science experiments
Section 3.4 Summary Section 3.4 Exercises Chapter 3 Exercises
PART II Probability and Inference CHAPTER 4 Probability: The Study of Randomness Introduction
4.1 Randomness The language of probability Thinking about randomness The uses of probability
Section 4.1 Summary Section 4.1 Exercises 4.2 Probability Models
Sample spaces Probability rules Assigning probabilities: Finite number of outcomes Assigning probabilities: Equally likely outcomes Independence and the multiplication rule Applying the probability rules
Section 4.2 Summary Section 4.2 Exercises 4.3 Random Variables
Discrete random variables Continuous random variables Normal distributions as probability distributions
Section 4.3 Summary Section 4.3 Exercises 4.4 Means and Variances of Random Variables
The mean of a random variable Statistical estimation and the law of large numbers
Thinking about the law of large numbers Beyond the Basics: More laws of large numbers
Rules for means The variance of a random variable Rules for variances and standard deviations
Section 4.4 Summary Section 4.4 Exercises 4.5 General Probability Rules
General addition rules Conditional probability General multiplication rules Tree diagrams Bayes’s rule Independence again
Section 4.5 Summary Section 4.5 Exercises Chapter 4 Exercises
CHAPTER 5 Sampling Distributions Introduction
5.1 Toward Statistical Inference Sampling variability Sampling distributions Bias and variability Sampling from large populations Why randomize?
Section 5.1 Summary Section 5.1 Exercises 5.2 The Sampling Distribution of a Sample Mean
The mean and standard deviation of x̅ The central limit theorem A few more facts
Beyond the Basics: Weibull distributions Section 5.2 Summary Section 5.2 Exercises 5.3 Sampling Distributions for Counts and Proportions
The binomial distributions for sample counts Binomial distributions in statistical sampling Finding binomial probabilities Binomial mean and standard deviation Sample proportions Normal approximation for counts and proportions The continuity correction Binomial formula The Poisson distributions
Section 5.3 Summary
Section 5.3 Exercises Chapter 5 Exercises
CHAPTER 6 Introduction to Inference Introduction Overview of inference 6.1 Estimating with Confidence
Statistical confidence Confidence intervals Confidence interval for a population mean How confidence intervals behave Choosing the sample size Some cautions
Section 6.1 Summary Section 6.1 Exercises 6.2 Tests of Significance
The reasoning of significance tests Stating hypotheses Test statistics P-values Statistical significance Tests for a population mean Two-sided significance tests and confidence intervals The P-value versus a statement of significance
Section 6.2 Summary Section 6.2 Exercises 6.3 Use and Abuse of Tests
Choosing a level of significance What statistical significance does not mean Don’t ignore lack of significance Statistical inference is not valid for all sets of data Beware of searching for significance
Section 6.3 Summary Section 6.3 Exercises 6.4 Power and Inference as a Decision
Power Increasing the power Inference as decision Two types of error Error probabilities The common practice of testing hypotheses
Section 6.4 Summary Section 6.4 Exercises Chapter 6 Exercises
CHAPTER 7 Inference for Means
Introduction
7.1 Inference for the Mean of a Population The t distributions The one-sample t confidence interval The one-sample t test Matched pairs t procedures Robustness of the t procedures
Beyond the Basics: The bootstrap Section 7.1 Summary Section 7.1 Exercises 7.2 Comparing Two Means
The two-sample z statistic The two-sample t procedures The two-sample t confidence interval The two-sample t significance test Robustness of the two-sample procedures Inference for small samples Software approximation for the degrees of freedom The pooled two-sample t procedures
Section 7.2 Summary Section 7.2 Exercises 7.3 Additional Topics on Inference
Choosing the sample size Inference for non-Normal populations
Section 7.3 Summary Section 7.3 Exercises Chapter 7 Exercises
CHAPTER 8 Inference for Proportions Introduction
8.1 Inference for a Single Proportion Large-sample confidence interval for a single proportion
Beyond the Basics: The plus four confidence interval for a single proportion Significance test for a single proportion Choosing a sample size for a confidence interval Choosing a sample size for a significance test
Section 8.1 Summary Section 8.1 Exercises 8.2 Comparing Two Proportions
Large-sample confidence interval for a difference in proportions Beyond the Basics: The plus four confidence interval for a difference in proportions
Significance test for a difference in proportions Choosing a sample size for two sample proportions
Beyond the Basics: Relative risk Section 8.2 Summary
Section 8.2 Exercises Chapter 8 Exercises
PART III Topics in Inference CHAPTER 9 Inference for Categorical Data Introduction
9.1 Inference for Two-Way Tables The hypothesis: No association Expected cell counts The chi-square test Computations Computing conditional distributions The chi-square test and the z test
Beyond the Basics: Meta-analysis Section 9.1 Summary Section 9.1 Exercises 9.2 Goodness of Fit Section 9.2 Summary Section 9.2 Exercises Chapter 9 Exercises
CHAPTER 10 Inference for Regression Introduction
10.1 Simple Linear Regression Statistical model for linear regression Preliminary data analysis and inference considerations Estimating the regression parameters Checking model assumptions Confidence intervals and significance tests Confidence intervals for mean response Prediction intervals Transforming variables
Beyond the Basics: Nonlinear regression Section 10.1 Summary Section 10.1 Exercises 10.2 More Detail about Simple Linear Regression
Analysis of variance for regression The ANOVA F test Calculations for regression inference Inference for correlation
Section 10.2 Summary Section 10.2 Exercises Chapter 10 Exercises
CHAPTER 11 Multiple Regression Introduction
11.1 Inference for Multiple Regression Population multiple regression equation Data for multiple regression Multiple linear regression model Estimation of the multiple regression parameters Confidence intervals and significance tests for regression coefficients ANOVA table for multiple regression Squared multiple correlation R²
Section 11.1 Summary Section 11.1 Exercises 11.2 A Case Study
Preliminary analysis Relationships between pairs of variables Regression on high school grades Interpretation of results Examining the residuals Refining the model Regression on SAT scores Regression using all variables Test for a collection of regression coefficients
Beyond the Basics: Multiple logistic regression Section 11.2 Summary Section 11.2 Exercises Chapter 11 Exercises
CHAPTER 12 One-Way Analysis of Variance Introduction
12.1 Inference for One-Way Analysis of Variance Data for one-way ANOVA Comparing means The two-sample t statistic An overview of ANOVA The ANOVA model Estimates of population parameters Testing hypotheses in one-way ANOVA The ANOVA table The F test Software
Beyond the Basics: Testing the equality of spread Section 12.1 Summary Section 12.1 Exercises 12.2 Comparing the Means
Contrasts
Multiple comparisons Power
Section 12.2 Summary Section 12.2 Exercises Chapter 12 Exercises
CHAPTER 13 Two-Way Analysis of Variance Introduction
13.1 The Two-Way ANOVA Model Advantages of two-way ANOVA The two-way ANOVA model Main effects and interactions
13.2 Inference for Two-Way ANOVA The ANOVA table for two-way ANOVA
Chapter 13 Summary Chapter 13 Exercises Tables Answers to Odd-Numbered Exercises Notes and Data Sources Index
To Teachers: About This Book
Statistics is the science of data. Introduction to the Practice of Statistics (IPS) is an introductory text based on this principle. We present methods of basic statistics in a way that emphasizes working with data and mastering statistical reasoning. IPS is elementary in mathematical level but conceptually rich in statistical ideas. After completing a course based on our text, we would like students to be able to think objectively about conclusions drawn from data and use statistical methods in their own work.
In IPS, we combine attention to basic statistical concepts with a comprehensive presentation of the elementary statistical methods that students will find useful in their work. IPS has been successful for several reasons:
1. IPS examines the nature of modern statistical practice at a level suitable for beginners. We focus on the production and analysis of data as well as the traditional topics of probability and inference.
2. IPS has a logical overall progression, so data production and data analysis are a major focus, while inference is treated as a tool that helps us draw conclusions from data in an appropriate way.
3. IPS presents data analysis as more than a collection of techniques for exploring data. We emphasize systematic ways of thinking about data. Simple principles guide the analysis: always plot your data; look for overall patterns and deviations from them; when looking at the overall pattern of a distribution for one variable, consider shape, center, and spread; for relations between two variables, consider form, direction, and strength; always ask whether a relationship between variables is influenced by other variables lurking in the background. We warn students about pitfalls in clear cautionary discussions.
4. IPS uses real examples to drive the exposition. Students learn the technique of least-squares regression and how to interpret the regression slope. But they also learn the conceptual ties between regression and correlation and the importance of looking for influential observations.
5. IPS is aware of current developments both in statistical science and in teaching statistics. Brief, optional Beyond the Basics sections give quick overviews of topics such as density estimation, scatterplot smoothers, data mining, nonlinear regression, and meta-analysis. Chapter 16 gives an elementary introduction to the bootstrap and other computer-intensive statistical methods.
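As a small illustration of the conceptual tie between regression and correlation mentioned in item 4, the following sketch checks the standard fact that the least-squares slope equals r times the ratio of the standard deviations (slope = r · s_y / s_x). The data values and function names are made up for illustration and do not come from the text; this is a minimal sketch, not the book's own presentation.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Correlation r: average product of standardized values, using n - 1."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    products = (((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return sum(products) / (len(x) - 1)


def least_squares(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data, invented purely for this illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
slope, intercept = least_squares(x, y)
slope_from_r = r * sample_sd(y) / sample_sd(x)

print(f"r = {r:.4f}")
print(f"least-squares slope = {slope:.4f}")
print(f"r * s_y / s_x = {slope_from_r:.4f}")
```

The two slope values agree up to rounding, which is one way to see that a change of one standard deviation in x corresponds to a predicted change of r standard deviations in y.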
The title of the book expresses our intent to introduce readers to statistics as it is used in practice. Statistics in practice is concerned with drawing conclusions from data. We focus on problem solving rather than on methods that may be useful in specific settings.
GAISE The College Report of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project (www.amstat.org/education/gaise/) was funded by the American Statistical Association to make recommendations for how introductory statistics courses should be taught. This report and its update contain many interesting teaching suggestions, and we strongly recommend that you read them. The philosophy and approach of IPS closely reflect the GAISE recommendations. Let’s examine each of the latest recommendations in the context of IPS.
1. Teach statistical thinking. Through our experiences as applied statisticians, we are very familiar with the components that are needed for the appropriate use of statistical methods. We focus on formulating questions, collecting and finding data, evaluating the quality of data, exploring the relationships among variables, performing statistical analyses, and drawing conclusions. In examples and exercises throughout the text, we emphasize putting the analysis in the proper context and translating numerical and graphical summaries into conclusions.
2. Focus on conceptual understanding. With the software available today, it is very easy for almost anyone to apply a wide variety of statistical procedures, both simple and complex, to a set of data. Without a firm grasp of the concepts, such applications are frequently meaningless. By using the methods that we present on real sets of data, we believe that students will gain an excellent understanding of these concepts. Our emphasis is on the input (questions of interest, collecting or finding data, examining data) and the output (conclusions) for a statistical analysis. Formulas are given only where they will provide some insight into concepts.
3. Integrate real data with a context and a purpose. Many of the examples and exercises in IPS include data that we have obtained from collaborators or consulting clients. Other data sets have come from research related to these activities. We have also used the Internet as a data source, particularly for data related to social media and other topics of interest to undergraduates. Our emphasis on real data, rather than artificial data chosen to illustrate a calculation, serves to motivate students and help them see the usefulness of statistics in everyday life. We also frequently encounter interesting statistical issues that we explore. These include outliers and nonlinear relationships. All data sets are available from the text website.
4. Foster active learning in the classroom. As we mentioned earlier, we believe that statistics is exciting as something to do rather than something to talk about. Throughout the text, we provide exercises in Use Your Knowledge sections that ask the students to perform some relatively simple tasks that reinforce the material just presented. Other exercises are particularly suited to being worked on and discussed within a classroom setting.
5. Use technology for developing concepts and analyzing data. Technology has altered statistical practice in a fundamental way. In the past, some of the calculations that we performed were particularly difficult and tedious. In other words, they were not fun. Today, freed from the burden of computation by software, we can concentrate our efforts on the big picture: what questions are we trying to address with a study and what can we conclude from our analysis?
6. Use assessments to improve and evaluate student learning. Our goal for students who complete a course based on IPS is that they are able to design and carry out a statistical study for a project in their capstone course or other setting. Our exercises are oriented toward this goal. Many ask about the design of a statistical study and the collection of data. Others ask for a paragraph summarizing the results of an analysis. This recommendation includes the use of projects, oral presentations, article critiques, and written reports. We believe that students using this text will be well prepared to undertake these kinds of activities. Furthermore, we view these activities not only as assessments but also as valuable tools for learning statistics.
Teaching Recommendations We have used IPS in courses taught to a variety of student audiences. For general undergraduates from mixed disciplines, we recommend covering Chapters 1 through 8 and Chapters 9, 10, or 12. For a quantitatively strong audience—sophomores planning to major in actuarial science or statistics—we recommend moving more quickly. Add Chapters 10 and 11 to the core material in Chapters 1 through 8. In general, we recommend deemphasizing the material on probability because these students will take a probability course later in their program. For beginning graduate students in such fields as education, family studies, and retailing, we recommend that the students read the entire text (Chapters 11 and 13 lightly), again with reduced emphasis on Chapter 4 and some parts of Chapter 5. In all cases, beginning with data analysis and data production (Part I) helps students overcome their fear of statistics and builds a sound base for studying inference. We believe that IPS can easily be adapted to a wide variety of audiences.
The Ninth Edition: What’s New?
Chapter 1 now begins with a short section giving an overview of data.
“Toward Statistical Inference” (previously Section 3.3), which introduces the concepts of statistical inference and sampling distributions, has been moved to Section 5.1 to better assist with the transition from a single data set to sampling distributions.
Coverage of mosaic plots as a visual tool for relationships between two categorical variables has been added to Chapters 2 and 9.
Chapter 3 now begins with a short section giving a basic overview of data sources.
Coverage of equivalence testing has been added to Chapter 7.
There is a greater emphasis on sample size determination using software in Chapters 7 and 8.
Resampling and bootstrapping are now introduced in Chapter 7 rather than Chapter 6.
“Inference for Categorical Data” is the new title for Chapter 9, which includes goodness of fit as well as inference for two-way tables.
There are more JMP screenshots and updated screenshots of Minitab, Excel, and SPSS outputs.
Design A new design incorporates colorful, revised figures throughout to aid the students’ understanding of text material. Photographs related to chapter examples and exercises make connections to real-life applications and provide a visual context for topics. More figures with software output have been included.
Exercises and Examples More than 30% of the exercises are new or revised, and there are more than 1700 exercises total. Exercise sets have been added at the end of sections in Chapters 9 through 12. To maintain the attractiveness of the examples to students, we have replaced or updated a large number of them. More than 30% of the 430 examples are new or revised. A list of exercises and examples categorized by application area is provided on the inside of the front cover.
In addition to the new ninth edition enhancements, IPS has retained the successful pedagogical features from previous editions:
Look Back At key points in the text, Look Back margin notes direct the reader to the first explanation of a topic, providing page numbers for easy reference.
Caution Warnings in the text, signaled by a caution icon, help students avoid common errors and misconceptions.
Challenge Exercises More challenging exercises are signaled with an icon. Challenge exercises are varied: some are mathematical, some require open-ended investigation, and others require deeper thought about the basic concepts.
Applets Applet icons are used throughout the text to signal where related interactive statistical applets can be found on the IPS website and in LaunchPad.
Use Your Knowledge Exercises We have found these exercises to be a very useful learning tool. They appear throughout each section and are listed, with page numbers, before the section-ending exercises.
Technology output screenshots Most statistical analyses rely heavily on statistical software. In this book, we discuss the use of Excel 2013, JMP 12, Minitab 17, SPSS 23, CrunchIt, R, and a TI-83/-84 calculator for conducting statistical analysis. As specialized statistical packages, JMP, Minitab, and SPSS are the most popular software choices both in industry and in colleges and schools of business. R is an extremely powerful statistical environment that is free to anyone; it relies heavily on members of the academic and general statistical communities for support. As an all-purpose spreadsheet program, Excel provides a limited set of statistical analysis options in comparison. However, given its pervasiveness and wide acceptance in industry and the computer world at large, we believe it is important to give Excel proper attention. It should be noted that for users who want more statistical capabilities but want to work in an Excel environment, there are a number of commercially available add-on packages (if you have JMP, for instance, it can be invoked from within Excel). Finally, instructions are provided for the TI-83/-84 calculators.
Even though basic guidance is provided in the book, it should be emphasized that IPS is not bound to any of these programs. Computer output from statistical packages is very similar, so you can feel quite comfortable using any one of these packages.
Acknowledgments We are pleased that the first eight editions of Introduction to the Practice of Statistics have helped to move the teaching of introductory statistics in a direction supported by most statisticians. We are grateful to the many colleagues and students who have provided helpful comments, and we hope that they will find this new edition another step forward. In particular, we would like to thank the following colleagues who offered specific comments on the new edition: Ali Arab, Georgetown University Tessema Astatkie, Dalhousie University Fouzia Baki, McMaster University Lynda Ballou, New Mexico Institute of Mining and Technology Sanjib Basu, Northern Illinois University David Bosworth, Hutchinson Community College
Max Buot, Xavier University Nadjib Bouzar, University of Indianapolis Matt Carlton, California Polytechnic State University–San Luis Obispo Gustavo Cepparo, Austin Community College Pinyuen Chen, Syracuse University Dennis L. Clason, University of Cincinnati–Blue Ash College Tadd Colver, Purdue University Chris Edwards, University of Wisconsin–Oshkosh Irina Gaynanova, Texas A&M University Brian T. Gill, Seattle Pacific University Mary Gray, American University Gary E. Haefner, University of Cincinnati Susan Herring, Sonoma State University Lifang Hsu, Le Moyne College Tiffany Kolba, Valparaiso University Lia Liu, University of Illinois at Chicago Xuewen Lu, University of Calgary Antoinette Marquard, Cleveland State University Frederick G. Schmitt, College of Marin James D. Stamey, Baylor University Engin Sungur, University of Minnesota–Morris Anatoliy Swishchuk, University of Calgary Richard Tardanico, Florida International University Melanee Thomas, University of Calgary Terri Torres, Oregon Institute of Technology Mahbobeh Vezvaei, Kent State University Yishi Wang, University of North Carolina–Wilmington John Ward, Jefferson Community and Technical College Debra Wiens, Rocky Mountain College Victor Williams, Paine College Christopher Wilson, Butler University Anne Yust, Birmingham-Southern College Biao Zhang, The University of Toledo Michael L. Zwilling, University of Mount Union
The professionals at Macmillan, in particular, Terri Ward, Karen Carson, Jorge Amaral, Emily Tenenbaum, Ed Dionne, Blake Logan, and Susan Wein, have contributed greatly to the success of IPS. In addition, we would like to thank Tadd Colver at Purdue University for his valuable contributions to the ninth edition, including authoring the back-of-book answers, solutions, and Instructor’s Guide. We’d also like to thank Monica Jackson at American University for accuracy reviewing the back-of-book answers and solutions and for authoring the test bank. Thanks also to Michael Zwilling at University of Mount Union for accuracy reviewing the test bank, Christopher Edwards at University of Wisconsin Oshkosh for authoring the lecture slides, and James Stamey at Baylor University for authoring the Clicker slides.
Most of all, we are grateful to the many friends and collaborators whose data and research questions have enabled us to gain a deeper understanding of the science of data. Finally, we would like to acknowledge John W. Tukey, whose contributions to data analysis have had such a great influence on us as well as on a whole generation of applied statisticians.
Media and Supplements