Probability, Sampling Distributions, And Confidence Intervals
Social Statistics for a Diverse Society
Eighth Edition
Chava Frankfort-Nachmias
University of Wisconsin
Anna Leon-Guerrero
Pacific Lutheran University
FOR INFORMATION:
SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: order@sagepub.com
SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London, EC1Y 1SP
United Kingdom
SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India
SAGE Publications Asia-Pacific Pte. Ltd.
3 Church Street
#10-04 Samsung Hub
Singapore 049483
Copyright © 2018 by SAGE Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or other image are included solely for the purpose of illustration and are the property of their respective holders. The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said trademarks. SPSS is a registered trademark of International Business Machines Corporation.
Printed in the United States of America
Library of Congress Cataloging-in-Publication Data
Names: Frankfort-Nachmias, Chava, author. | Leon-Guerrero, Anna, author.
Title: Social statistics for a diverse society / Chava Frankfort-Nachmias, University of Wisconsin, Anna Leon-Guerrero, Pacific Lutheran University.
Description: Eighth edition. | Los Angeles : SAGE, [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2016039109 | ISBN 978-1-5063-4720-2 (pbk. : alk. paper)
Subjects: LCSH: Social sciences—Statistical methods. | Statistics.
Classification: LCC HA29 .N25 2016 | DDC 519.5—dc23 LC record available at https://lccn.loc.gov/2016039109
This book is printed on acid-free paper.
Acquisitions Editor: Jeff Lasser
Development Editor: Jessica Miller
Editorial Assistant: Adeline Wilson
eLearning Editor: Gabrielle Piccininni
Production Editor: Kelly DeRosa
Copy Editor: QuADS Prepress (P) Ltd.
Typesetter: C&M Digitals (P) Ltd.
Proofreader: Jennifer Grubba
Indexer: Sheila Bodell
Cover Designer: Candice Harman
Marketing Manager: Kara Kindstrom
Brief Contents
Preface
About the Authors
CHAPTER 1 • The What and the Why of Statistics
CHAPTER 2 • The Organization and Graphic Presentation of Data
CHAPTER 3 • Measures of Central Tendency
CHAPTER 4 • Measures of Variability
CHAPTER 5 • The Normal Distribution
CHAPTER 6 • Sampling and Sampling Distributions
CHAPTER 7 • Estimation
CHAPTER 8 • Testing Hypotheses
CHAPTER 9 • Bivariate Tables
CHAPTER 10 • The Chi-Square Test and Measures of Association
CHAPTER 11 • Analysis of Variance
CHAPTER 12 • Regression and Correlation
Appendix A. Table of Random Numbers
Appendix B. The Standard Normal Table
Appendix C. Distribution of t
Appendix D. Distribution of Chi-Square
Appendix E. Distribution of F
Appendix F. A Basic Math Review
Learning Check Solutions
Answers to Odd-Numbered Exercises
Glossary
Notes
Index
Detailed Contents
Preface About the Authors CHAPTER 1 • The What and the Why of Statistics
The Research Process Asking Research Questions The Role of Theory Formulating the Hypotheses
Independent and Dependent Variables: Causality Independent and Dependent Variables: Guidelines
Collecting Data Levels of Measurement
Nominal Level of Measurement Ordinal Level of Measurement Interval-Ratio Level of Measurement Cumulative Property of Levels of Measurement Levels of Measurement of Dichotomous Variables
Discrete and Continuous Variables A Closer Look 1.1: A Cautionary Note: Measurement Error
Analyzing Data and Evaluating the Hypotheses Descriptive and Inferential Statistics Evaluating the Hypotheses
Examining a Diverse Society A Closer Look 1.2: A Tale of Simple Arithmetic: How Culture May Influence How We Count
Learning Statistics Data at Work
CHAPTER 2 • The Organization and Graphic Presentation of Data Frequency Distributions Proportions and Percentages Percentage Distributions The Construction of Frequency Distributions
Frequency Distributions for Nominal Variables Frequency Distributions for Ordinal Variables Frequency Distributions for Interval-Ratio Variables
Cumulative Distributions A Closer Look 2.1: Real Limits, Stated Limits, and Midpoints of Class Intervals
Rates Reading the Research Literature: Access to Public Benefits
Graphic Presentation of Data The Pie Chart The Bar Graph The Histogram The Statistical Map The Line Graph The Time-Series Chart Statistics in Practice: Foreign-Born Population 65 Years and Over
A Closer Look 2.2: A Cautionary Note: Distortions in Graphs Data at Work: Kurt Taylor Gaubatz: Graduate Program in International Studies
CHAPTER 3 • Measures of Central Tendency The Mode The Median
Finding the Median in Sorted Data An Odd Number of Cases An Even Number of Cases
Finding the Median in Frequency Distributions Locating Percentiles in a Frequency Distribution
The Mean A Closer Look 3.1: Finding the Mean in a Frequency Distribution Understanding Some Important Properties of the Arithmetic Mean
Interval-Ratio Level of Measurement Center of Gravity Sensitivity to Extremes
Reading the Research Literature: The Case of Reporting Income Statistics in Practice: The Shape of the Distribution
The Symmetrical Distribution The Positively Skewed Distribution The Negatively Skewed Distribution Guidelines for Identifying the Shape of a Distribution A Closer Look 3.2: A Cautionary Note: Representing Income
Considerations for Choosing a Measure of Central Tendency Level of Measurement Skewed Distribution Data at Work: Ben Anderstone: Political Consultant Symmetrical Distribution
CHAPTER 4 • Measures of Variability The Importance of Measuring Variability The Index of Qualitative Variation
Steps for Calculating the IQV Expressing the IQV as a Percentage
Statistics in Practice: Diversity in U.S. Society The Range The Interquartile Range The Box Plot The Variance and the Standard Deviation
Calculating the Deviation From the Mean Calculating the Variance and the Standard Deviation
Considerations for Choosing a Measure of Variation A Closer Look 4.1: More on Interpreting the Standard Deviation
Reading the Research Literature: Community College Mentoring Data at Work: Sruthi Chandrasekaran: Senior Research Associate
CHAPTER 5 • The Normal Distribution Properties of the Normal Distribution
Empirical Distributions Approximating the Normal Distribution Areas Under the Normal Curve Interpreting the Standard Deviation
An Application of the Normal Curve Transforming a Raw Score Into a Z Score
The Standard Normal Distribution The Standard Normal Table
1. Finding the Area Between the Mean and a Positive or Negative Z Score 2. Finding the Area Above a Positive Z Score or Below a Negative Z Score 3. Transforming Proportions and Percentages Into Z Scores
Finding a Z Score Which Bounds an Area Above It Finding a Z Score Which Bounds an Area Below It
4. Working With Percentiles in a Normal Distribution Finding the Percentile Rank of a Score Higher Than the Mean Finding the Percentile Rank of a Score Lower Than the Mean Finding the Raw Score Associated With a Percentile Higher Than 50 Finding the Raw Score Associated With a Percentile Lower Than 50
Reading the Research Literature: Child Health and Academic Achievement A Closer Look 5.1: Percentages, Proportions, and Probabilities Data at Work: Claire Wulf Winiarek: Director of Collaborative Policy Engagement
CHAPTER 6 • Sampling and Sampling Distributions Aims of Sampling Basic Probability Principles Probability Sampling
The Simple Random Sample The Systematic Random Sample The Stratified Random Sample
The Concept of the Sampling Distribution The Population A Closer Look 6.1: Disproportionate Stratified Samples and Diversity The Sample The Dilemma The Sampling Distribution
The Sampling Distribution of the Mean An Illustration Review The Mean of the Sampling Distribution The Standard Error of the Mean
The Central Limit Theorem The Size of the Sample The Significance of the Sampling Distribution and the Central Limit Theorem
Statistics in Practice: A Sampling Lesson Data at Work: Emily Treichler: Postdoctoral Fellow
CHAPTER 7 • Estimation Point and Interval Estimation Confidence Intervals for Means
A Closer Look 7.1: Estimation as a Type of Inference Determining the Confidence Interval
Calculating the Standard Error of the Mean Deciding on the Level of Confidence and Finding the Corresponding Z Value Calculating the Confidence Interval Interpreting the Results
Reducing Risk Estimating Sigma
Calculating the Estimated Standard Error of the Mean Deciding on the Level of Confidence and Finding the Corresponding Z Value Calculating the Confidence Interval Interpreting the Results
Sample Size and Confidence Intervals Statistics in Practice: Hispanic Migration and Earnings
A Closer Look 7.2: What Affects Confidence Interval Width? Summary Confidence Intervals for Proportions
Determining the Confidence Interval
Calculating the Estimated Standard Error of the Proportion Deciding on the Desired Level of Confidence and Finding the Corresponding Z Value Calculating the Confidence Interval Interpreting the Results
Reading the Research Literature: Women Victims of Intimate Violence Data at Work: Laurel Person Mecca: Research Specialist
CHAPTER 8 • Testing Hypotheses Assumptions of Statistical Hypothesis Testing Stating the Research and Null Hypotheses
The Research Hypothesis (H1) The Null Hypothesis (H0)
Probability Values and Alpha A Closer Look 8.1: More About Significance
The Five Steps in Hypothesis Testing: A Summary Errors in Hypothesis Testing
The t Statistic and Estimating the Standard Error The t Distribution and Degrees of Freedom Comparing the t and Z Statistics
Hypothesis Testing With One Sample and Population Variance Unknown Hypothesis Testing With Two Sample Means
The Assumption of Independent Samples Stating the Research and Null Hypotheses
The Sampling Distribution of the Difference Between Means Estimating the Standard Error Calculating the Estimated Standard Error The t Statistic Calculating the Degrees of Freedom for a Difference Between Means Test
The Five Steps in Hypothesis Testing About Difference Between Means: A Summary
A Closer Look 8.2: Calculating the Estimated Standard Error and the Degrees of Freedom (df) When the Population Variances Are Assumed to Be Unequal
Statistics in Practice: Cigarette Use Among Teens Hypothesis Testing With Two Sample Proportions Reading the Research Literature: Reporting the Results of Hypothesis Testing
Data at Work: Stephanie Wood: Campus Visit Coordinator CHAPTER 9 • Bivariate Tables
How to Construct a Bivariate Table How to Compute Percentages in a Bivariate Table
Calculating Percentages Within Each Category of the Independent
Variable Comparing the Percentages Across Different Categories of the Independent Variable
Reading the Research Literature: Hispanic and Non-Hispanic Homeless Populations
A Closer Look 9.1: How to Deal With Ambiguous Relationships Between Variables
The Properties of a Bivariate Relationship The Existence of the Relationship The Strength of the Relationship The Direction of the Relationship
Elaboration Testing for Nonspuriousness: Firefighters and Property Damage An Intervening Relationship: Religion and Attitude Toward Abortion Conditional Relationships: More on Abortion The Limitations of Elaboration
Reading the Research Literature: The Digital Divide Data at Work: Spencer Westby: Senior Editorial Analyst
CHAPTER 10 • The Chi-Square Test and Measures of Association The Concept of Chi-Square as a Statistical Test The Concept of Statistical Independence The Structure of Hypothesis Testing With Chi-Square
The Assumptions Stating the Research and the Null Hypotheses The Concept of Expected Frequencies Calculating the Expected Frequencies Calculating the Obtained Chi-Square The Sampling Distribution of Chi-Square Determining the Degrees of Freedom Making a Final Decision Review
Statistics in Practice: Respondent and Father Education A Closer Look 10.1: A Cautionary Note: Sample Size and Statistical Significance for Chi-Square
Proportional Reduction of Error A Closer Look 10.2: What Is Strong? What Is Weak? A Guide to Interpretation
Lambda: A Measure of Association for Nominal Variables Cramer’s V: A Chi-Square–Related Measure of Association for Nominal Variables Gamma and Kendall’s Tau-b: Symmetrical Measures of Association for Ordinal Variables
Reading the Research Literature: India’s Internet-Using Population Data at Work: Patricio Cumsille: Professor
CHAPTER 11 • Analysis of Variance Understanding Analysis of Variance The Structure of Hypothesis Testing With ANOVA
The Assumptions Stating the Research and the Null Hypotheses and Setting Alpha The Concepts of Between and Within Total Variance The F Statistic A Closer Look 11.1: Decomposition of SST Making a Decision
The Five Steps in Hypothesis Testing: A Summary Statistics in Practice: The Ethical Consumer
A Closer Look 11.2: Assessing the Relationship Between Variables Reading the Research Literature: Emerging Adulthood
Data at Work: Kevin Hemminger: Sales Support Manager/Graduate Program in Research Methods and Statistics
CHAPTER 12 • Regression and Correlation The Scatter Diagram Linear Relationships and Prediction Rules
Finding the Best-Fitting Line A Closer Look 12.1: Other Regression Techniques
Defining Error The Residual Sum of Squares (∑e2) The Least Squares Line
Computing a and b A Closer Look 12.2: Understanding the Covariance Interpreting a and b
A negative relationship: Age and Internet Hours per Week Methods for Assessing the Accuracy of Predictions
Calculating Prediction Errors Calculating r2
Testing the Significance of r2 Using ANOVA Making a Decision Pearson’s Correlation Coefficient (r)
Characteristics of Pearson’s r Statistics in Practice: Multiple Regression
A Closer Look 12.3: Spurious Correlations and Confounding Effects ANOVA for Multiple Linear Regression Reading the Research Literature: Academic Intentions and Support
Data at Work: Shinichi Mizokami: Professor Appendix A. Table of Random Numbers
Appendix B. The Standard Normal Table Appendix C. Distribution of t Appendix D. Distribution of Chi-Square Appendix E. Distribution of F Appendix F. A Basic Math Review Learning Check Solutions Answers to Odd-Numbered Exercises Glossary Notes Index
Preface
You may be reading this introduction on your first day of class. We know you have some questions and concerns about what your course will be like. Math, formulas, and calculations? Yes, those will be part of your learning experience. But there is more.
Throughout our text we highlight the relevance of statistics in our daily and professional lives. Data are used to predict public opinion, consumer spending, and even a presidential election. How Americans feel about a variety of political and social topics, such as race relations, gun control, immigration, the economy, health care reform, and terrorism, is measured by surveys and polls and reported daily by the news media. Your recent Amazon purchase didn’t go unnoticed. The study of consumer trends, specifically focusing on young adults, helps determine commercial programming, product advertising and placement, and, ultimately, consumer spending. And as we prepare this text, just months before the 2016 presidential election, weekly polls have begun predicting the outcome of the historic election between Hillary Clinton and Donald Trump.
Statistics are not just a part of our lives in the form of news bits or information, and they aren’t just numbers either. As social scientists we rely on statistics to help us understand our social world. We use statistical methods and techniques to track demographic trends, to assess social differences, and to better inform social policy. We encourage you to move beyond just being a consumer of statistics and determine how you can use statistics to gain insight into important social issues that affect you and others.
Teaching and Learning Goals
Three teaching and learning goals continue to be the guiding principles of our book, as they were in previous editions.
Our first goal is to introduce you to social statistics and demonstrate its value. Although most of you will not use statistics in your own student research, you will be expected to read and interpret statistical information presented by others in professional and scholarly publications, in the workplace, and in the popular media. This book will help you understand the concepts behind the statistics so that you will be able to assess the circumstances in which certain statistics should and should not be used.
A special characteristic of this book is its integration of statistical techniques with substantive issues of particular relevance in the social sciences. Our second goal is to demonstrate that substance and statistical techniques are truly related in social science research. Your learning will not be limited to statistical calculations and formulas. Rather, you will become proficient in statistical techniques while learning about social differences and inequality through numerous substantive examples and real-world data applications. Because the world we live in is characterized by a growing diversity—where personal and social realities are increasingly shaped by race, class, gender, and other categories of experience—this book teaches you basic statistics while incorporating social science research related to the dynamic interplay of our social worlds.
Our third goal is to enhance your learning by using straightforward prose to explain statistical concepts and by emphasizing intuition, logic, and common sense over rote memorization and derivation of formulas.
Distinctive and Updated Features of Our Book
Our learning goals are accomplished through a variety of specific and distinctive features throughout this book.
A Close Link Between the Practice of Statistics, Important Social Issues, and Real-World Examples.
This book is distinct for its integration of statistical techniques with pressing social issues of particular concern to society and social science. We emphasize how the conduct of social science is the constant interplay between social concerns and methods of inquiry. In addition, the examples throughout the book—mostly taken from news stories, government reports, public opinion polls, scholarly research, and the National Opinion Research Center’s General Social Survey—are formulated to emphasize to students like you that we live in a world in which statistical arguments are common. Statistical concepts and procedures are illustrated with real data and research, providing a clear sense of how questions about important social issues can be studied with various statistical techniques.
A Focus on Diversity: The United States and International.
A strong emphasis on race, class, and gender as central substantive concepts is mindful of a trend in the social sciences toward integrating issues of diversity in the curriculum. This focus on the richness of social differences within our society and among our global neighbors is manifested in the application of statistical tools to examine how race, class, gender, and other categories of experience shape our social world and explain social behavior.
Chapter Reorganization and Content.
Each revision presents many opportunities to polish and expand the content of our text. In this edition, we have made a number of changes in response to feedback from reviewers and fellow instructors. We merged frequency distributions and graphic presentation into one chapter. We expanded the discussion of probability in Chapters 6 and 7. We refined the discussion on the interpretation and application of descriptive statistics (variance and standard deviation) and inferential tests (t, Z, F ratio, and regression and correlation). End-of-chapter exercises have been organized into calculation and interpretation problems.
Reading the Research Literature, Statistics in Practice, A Closer Look, and Data at Work.
In your student career and in the workplace, you may be expected to read and interpret statistical information presented by others in professional and scholarly publications. These statistical analyses are a good deal more complex than most class and textbook presentations. To guide you in reading and interpreting research reports written by social scientists, most of our chapters include a Reading the Research Literature and a Statistics in Practice feature, presenting excerpts of published research reports or specific SPSS calculations using the statistical concepts under discussion. Being statistically literate involves more than just completing a calculation; it also includes learning how to apply and interpret statistical information and being able to say what it means. We include A Closer Look discussions in each chapter, advising students about the common errors and limitations in quantitative data collection and analysis. A new chapter feature for this eighth edition is Data at Work, profiling men and women who use data in their work settings and professions.
SPSS and GSS 2014.
IBM® SPSS® Statistics* is used throughout this book, although the use of computers is not required to learn from the text. Real data are used to motivate and make concrete the coverage of statistical topics. As a companion to the eighth edition’s SPSS demonstrations and exercises, we provide two GSS 2014 data sets on the study site at http://edge.sagepub.com/frankfort8e. SPSS exercises at the end of each chapter rely on variables from both data modules. There is ample opportunity for instructors to develop their own exercises using these data.
*SPSS is a registered trademark of International Business Machines Corporation.
Tools to Promote Effective Study.
Each chapter concludes with a list of Main Points and Key Terms discussed in that chapter. Boxed definitions of the Key Terms also appear in the body of the chapter, as do Learning Checks keyed to the most important points. Key Terms are also clearly defined and explained in the Glossary, another special feature in our book. Answers to all the Odd-Numbered Exercises and Learning Checks in the text are included at the end of the book, as well as on the study site at http://edge.sagepub.com/frankfort8e. Complete step-by-step solutions are provided in the instructor’s manual, available on the study site.
A Note About Rounding
Throughout this text and in ancillary materials, we followed these rounding rules: If the number you are rounding is followed by 5, 6, 7, 8, or 9, round the number up. If the number you are rounding is followed by 0, 1, 2, 3, or 4, do not change the number. For rounding long decimals, look only at the number in the place you are rounding to and the number that follows it.
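For readers who check their hand calculations on a computer, the rounding rule above can be sketched in Python. This is only an illustration under stated assumptions, not part of the text's SPSS materials: the function name `round_half_up` is hypothetical, values are assumed to be non-negative and supplied as decimal strings (so that binary floating-point storage cannot alter the digit that follows the rounding place), and the standard library's `decimal` module is used. Note that Python's built-in `round` applies a different tie rule (round half to even), so it may not match the convention described here.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: int) -> Decimal:
    """Round a non-negative decimal string to `places` decimal places.

    A following digit of 5-9 rounds the kept digit up; a following
    digit of 0-4 leaves the kept digit unchanged.
    """
    # Build the rounding unit, e.g. places=2 gives Decimal('0.01').
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Illustrative values only; they do not come from the text.
    for raw in ("2.345", "2.344", "0.125", "3.14159"):
        print(raw, "->", round_half_up(raw, 2))
```

Passing the value as a string is a deliberate choice in this sketch: a number such as 2.345 cannot be stored exactly as a binary float, so converting from a float could change which way the value rounds.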
edge.sagepub.com/frankfort8e
SAGE edge offers a robust online environment featuring an impressive array of tools and resources for review, study, and further exploration, keeping both instructors and students on the cutting edge of teaching and learning. SAGE edge content is open access and available on demand. Learning and teaching have never been easier!
SAGE edge for students provides a personalized approach to help students accomplish their coursework goals in an easy-to-use learning environment.
Mobile-friendly eFlashcards strengthen understanding of key terms and concepts.
Mobile-friendly practice quizzes allow for independent assessment by students of their mastery of course material.
A customized online action plan includes tips and feedback on progress through the course and materials, which allows students to individualize their learning experience.
Web exercises and meaningful web links facilitate student use of Internet resources, further exploration of topics, and responses to critical thinking questions.
EXCLUSIVE! Access to full-text SAGE journal articles that have been carefully selected to support and expand on the concepts presented in each chapter.
Access to GSS 2014 data sets.
SAGE edge for instructors supports teaching by making it easy to integrate quality content and create a rich learning environment for students.
Test banks provide a diverse range of pre-written options as well as the opportunity to edit any question and/or insert personalized questions to effectively assess students’ progress and understanding.
Sample syllabus provides a suggested model for instructors to use when creating the syllabi for their courses.
Editable, chapter-specific PowerPoint® slides offer complete flexibility for creating a multimedia presentation for the course.
EXCLUSIVE! Access to full-text SAGE journal articles that have been carefully selected to support and expand on the concepts presented in each chapter to encourage students to think critically.
Multimedia content includes web resources and web exercises that appeal to students with different learning styles.
Lecture notes summarize key concepts by chapter to ease preparation for lectures and class discussions.
Lively and stimulating ideas for class activities that can be used in class to reinforce active learning.
Chapter-specific discussion questions help launch classroom interaction by prompting students to engage with the material and by reinforcing important content.
A course cartridge provides easy LMS (Learning Management System) integration.
Acknowledgments
We are both grateful to Jerry Westby, Series Editor for SAGE Publications, for his commitment to our book and for his invaluable assistance through the production process.
Many manuscript reviewers recruited by SAGE provided invaluable feedback. For their thoughtful comments on the eighth edition, we thank
Andrew S. Fullerton, Oklahoma State University
David A. Gay, University of Central Florida
Dr. Lindsey Peterson, Mississippi State University
Heather Macpherson Parrott, Long Island University-Post
Christopher Donoghue, Montclair State University
S. Michael Gaddis, The Pennsylvania State University
Jann W. MacInnes, University of Florida
Laura Sullivan, Brandeis University
Warren Waren, Texas A&M University
Joe Weinberg, University of Southern Mississippi
For their comments on the seventh edition, we thank
Walter F. Carroll, Bridgewater State University
Andrew S. Fullerton, Oklahoma State University
David A. Gay, University of Central Florida
Judith G. Gonyea, Boston University
Megan Henly, University of New Hampshire
Patricia A. Jaramillo, The University of Texas at San Antonio
Brett Lehman, Louisiana State University
James W. Love, California State University, Fullerton
Kay Kei-Ho Pih, California State University, Northridge
For their comments on the sixth edition, we thank
Diane Balduzy, Massachusetts College of Liberal Arts
Ellen Berg, California State University–Sacramento
Robert Carini, University of Louisville
Melissa Evans-Andris, University of Louisville
Meredith Greif, Georgia State University
Kristen Kenneavy, Ramapo College
Dave Rausch, West Texas A&M University
Billy Wagner, California State University–Channel Islands
Kevin Yoder, University of North Texas
For their comments on the fifth edition, we thank
Anna A. Amirkhanyan, The American University
Robert Carini, University of Louisville
Patricia Case, University of Toledo
Stanley DeViney, University of Maryland Eastern Shore
David Gay, University of Central Florida
Dusten R. Hollist, University of Montana
Ross Koppel, University of Pennsylvania
Benny Marcus, Temple University
Matt G. Mutchler, California State University Dominguez Hills
Mahasin C. Owens-Sabir, Jackson State University
Dave Rausch, West Texas A&M University
Kevin Yoder, University of North Texas
We are grateful to Jessica Miller and Kelly DeRosa for guiding the book through the production process. We would also like to acknowledge Laura Kirkhuff, Krishna Pradeep Joghee, and the rest of the SAGE staff for their assistance on this edition.
We extend our deepest appreciation to Michael Clark for his fine editing and data work. Among his many contributions, Michael would relate our revision goals to his student experience, reminding us of how students can learn and successfully master this material.
Chava Frankfort-Nachmias would like to thank and acknowledge her friends and colleagues for their unending support; she also would like to thank her students:
I am grateful to my students at the University of Wisconsin–Milwaukee, who taught me that even the most complex statistical ideas can be simplified. The ideas presented in this book are the products of many years of classroom testing. I thank my students for their patience and contributions.
Finally, I thank my partner, Marlene Stern, for her love and support.
Anna Leon-Guerrero would like to thank her Pacific Lutheran University students for inspiring her to be a better teacher. My love and thanks to my husband, Brian Sullivan.
Chava Frankfort-Nachmias
University of Wisconsin–Milwaukee
Anna Leon-Guerrero
Pacific Lutheran University
About the Authors
Chava Frankfort-Nachmias is an Emeritus Professor of Sociology at the University of Wisconsin–Milwaukee. She is the coauthor of Research Methods in the Social Sciences (with David Nachmias), coeditor of Sappho in the Holy Land (with Erella Shadmi), and author of numerous publications on ethnicity and development, urban revitalization, science and gender, and women in Israel. She was the recipient of the University of Wisconsin System teaching improvement grant on integrating race, ethnicity, and gender into the social statistics and research methods curriculum. She is also the coauthor (with Anna Leon-Guerrero) of Essentials of Social Statistics.
Anna Leon-Guerrero is Professor of Sociology at Pacific Lutheran University in Washington. She received her Ph.D. in sociology from the University of California–Los Angeles. A recipient of the university’s Faculty Excellence Award and the K.T. Tang Award for Excellence in Research, she teaches courses in statistics, social theory, and social problems. She is also the author of Social Problems: Community, Policy, and Social Action.
1 The What and the Why of Statistics
Chapter Learning Objectives
1. Describe the five stages of the research process
2. Define independent and dependent variables
3. Distinguish between the three levels of measurement
4. Apply descriptive and inferential statistical procedures
Are you taking statistics because it is required in your major—not because you find it interesting? If so, you may be feeling intimidated because you associate statistics with numbers, formulas, and abstract notations that seem inaccessible and complicated. Perhaps you feel intimidated not only because you’re uncomfortable with math but also because you suspect that numbers and math don’t leave room for human judgment or have any relevance to your own personal experience. In fact, you may even question the relevance of statistics to understanding people, social behavior, or society.
In this book, we will show you that statistics can be a lot more interesting and easy to understand than you may have been led to believe. In fact, as we draw on your previous knowledge and experience and relate statistics to interesting and important social issues, you’ll begin to see that statistics is not just a course you have to take but a useful tool as well.
There are two reasons why learning statistics may be of value to you. First, you are exposed to statistics every day of your life. Marketing surveys, voting polls, and social research findings appear daily in the news media. By learning statistics, you will become a sharper consumer of statistical material. Second, as a major in the social sciences, you may be expected to read and interpret statistical information related to your occupation or work. Even if conducting research is not a part of your work, you may still be expected to understand and learn from other people’s research or to be able to write reports based on statistical analyses.
Just what is statistics anyway? You may associate the word with numbers that indicate birthrates, conviction rates, per capita income, marriage and divorce rates, and so on. But the word statistics also refers to a set of procedures used by social scientists to organize, summarize, and communicate numerical information. Only information represented by numbers can be the subject of statistical analysis. Such information is called data; researchers use statistical procedures to analyze data to answer research questions and test theories. It is the latter usage—answering research questions and testing theories—that this textbook explores.
Statistics A set of procedures used by social scientists to organize, summarize, and communicate numerical information.
Data Information represented by numbers, which can be the subject of statistical analysis.
The Research Process
To give you a better idea of the role of statistics in social research, let’s start by looking at the research process. We can think of the research process as a set of activities in which social scientists engage so that they can answer questions, examine ideas, or test theories.
Research process A set of activities in which social scientists engage to answer questions, examine ideas, or test theories.
As illustrated in Figure 1.1, the research process consists of five stages:
1. Asking the research question
2. Formulating the hypotheses
3. Collecting data
4. Analyzing data
5. Evaluating the hypotheses
Each stage affects the theory and is affected by it as well. Statistics is most closely tied to the data analysis stage of the research process. As we will see in later chapters, statistical analysis of the data helps researchers test the validity and accuracy of their hypotheses.
Asking Research Questions
The starting point for most research is asking a research question. Consider the following research questions taken from a number of social science journals:
How will the Affordable Care Act influence the quality of health care?
Has support for same-sex marriage increased during the past decade?
Does race or ethnicity predict voting behavior?
What factors affect the economic mobility of female workers?
Figure 1.1 The Research Process
These are all questions that can be answered by conducting empirical research—research based on information that can be verified by using our direct experience. To answer research questions, we cannot rely on reasoning, speculation, moral judgment, or subjective preference. For example, the questions “Is racial equality good for society?” and “Is an urban lifestyle better than a rural lifestyle?” cannot be answered empirically because the terms good and better are concerned with values, beliefs, or subjective preference and, therefore, cannot be independently verified. One way to study these questions is by defining good and better in terms that can be verified empirically. For example, we can define good in terms of economic growth and better in terms of psychological well-being. These questions could then be answered by conducting empirical research.
Empirical research Research based on evidence that can be verified by using our direct experience.
You may wonder how to come up with a research question. The first step is to pick a question that interests you. If you are not sure, look around! Ideas for research problems are all around you, from media sources to personal experience or your own intuition. Talk to other people, write down your own observations and ideas, or learn what other social scientists have written about.
Take, for instance, the relationship between gender and work. As a college student about to enter the labor force, you may wonder about the similarities and differences between women’s and men’s work experiences and about job opportunities when you graduate. Here are some facts and observations based on research reports: In 2015, women who were employed full time earned about $726 (in current dollars) per week on average; men who were employed full time earned $895 (in current dollars) per week on average.1 Women’s and men’s work are also very different. Women continue to be the minority in many of the higher ranking and higher salaried positions in professional and managerial occupations. For example, in 2014, women made up 25.3% of architects, 16.5% of civil engineers, 12.4% of police and sheriff’s patrol officers, and 2.4% of electricians. In comparison, among all those employed as preschool and kindergarten teachers, 98% were women. Among all receptionists and information clerks in 2014, 91% were women.2 Another noteworthy development in the history of labor in the United States took place in January 2010: Women outnumbered men for the first time in the labor force by holding 50.3% of all nonfarm payroll jobs.3 These observations may prompt us to ask research questions such as the following: How much change has there been in women’s work over time? Are women paid, on average, less than men for the same type of work?
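To make the earnings comparison concrete, the short sketch below works out the weekly gap and women's earnings as a share of men's, using only the two 2015 figures quoted above. The variable names are our own and are chosen for illustration.

```python
# Weekly earnings of full-time workers in 2015, as quoted in the text (current dollars).
women_weekly = 726
men_weekly = 895

# Absolute difference in weekly earnings.
gap_dollars = men_weekly - women_weekly

# Women's earnings expressed as a share of men's earnings.
ratio = women_weekly / men_weekly

print(f"Weekly gap: ${gap_dollars}")
print(f"Women's earnings as a percentage of men's: {ratio:.1%}")
```

A single ratio like this describes the size of the difference. It does not tell us why the difference exists, which is the job of the research questions that follow.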
Learning Check 1.1
Identify one or two social science questions amenable to empirical research. You can almost bet that you will be required to do a research project sometime in your college career.
The Role of Theory
You may have noticed that each preceding research question was expressed in terms of a relationship. This relationship may be between two or more attributes of individuals or groups, such as gender and income or gender segregation in the workplace and income disparity. The relationship between attributes or characteristics of individuals and groups lies at the heart of social scientific inquiry.
Most of us use the term theory quite casually to explain events and experiences in our daily life. You may have a theory about why your roommate has been so nice to you lately or why you didn’t do so well on your last exam. In a somewhat similar manner, social scientists attempt to explain the nature of social reality. Whereas our theories about events in our lives are commonsense explanations based on educated guesses and personal experience, to the social scientist, a theory is a more precise explanation that is frequently tested by conducting research.
A theory is a set of assumptions and propositions used by social scientists to explain, predict, and understand the phenomena they study.4 The theory attempts to establish a link between what we observe (the data) and our conceptual understanding of why certain phenomena are related to each other in a particular way.
Theory A set of assumptions and propositions used to explain, predict, and understand social phenomena.
For instance, suppose we wanted to understand the reasons for the income disparity between men and women; we may wonder whether the types of jobs men and women have and the organizations in which they work have something to do with their wages. One explanation for gender wage inequality is gender segregation in the workplace—the fact that American men and women are concentrated in different kinds of jobs and occupations. What is the significance of gender segregation in the workplace? In our society, people’s occupations and jobs are closely associated with their level of prestige, authority, and income. The jobs in which women and men are segregated are not only different but also unequal. Although the proportion of women in the labor force has markedly increased, women are still concentrated in occupations with low pay, low prestige, and few opportunities for promotion. Thus, gender segregation in the workplace is associated with unequal earnings, authority, and status. In particular, women’s segregation into different jobs and occupations from those of men is the most immediate cause of the pay gap. Women receive lower pay than men do even when they have the same level of education, skill, and experience as men in comparable occupations.
Formulating the Hypotheses
So far, we have come up with a number of research questions about the income disparity between men and women in the workplace. We have also discussed a possible explanation (a theory) that helps us make sense of gender inequality in wages. Is that enough? Where do we go from here?
Our next step is to test some of the ideas suggested by the gender segregation theory. But this theory, even if it sounds reasonable and logical to us, is too general and does not contain enough specific information to be tested. Instead, theories suggest specific concrete predictions or hypotheses about the way that observable attributes of people or groups are interrelated in real life. Hypotheses are tentative because they can be verified only after they have been tested empirically.5 For example, one hypothesis we can derive from the gender segregation theory is that wages in occupations in which the majority of workers are female are lower than the wages in occupations in which the majority of workers are male.
Hypothesis A statement predicting the relationship between two or more observable attributes.
Not all hypotheses are derived directly from theories. We can generate hypotheses in many ways—from theories, directly from observations, or from intuition. Probably, the greatest source of hypotheses is the professional or scholarly literature. A critical review of the scholarly literature will familiarize you with the current state of knowledge and with hypotheses that others have studied.
Let’s restate our hypothesis:
Wages in occupations in which the majority of workers are female are lower than the wages in occupations in which the majority of workers are male.
Note that this hypothesis is a statement of a relationship between two characteristics that vary: wages and gender composition of occupations. Such characteristics are called variables. A variable is a property of people or objects that takes on two or more values. For example, people can be classified into a number of social class categories, such as upper class, middle class, or working class. Family income is a variable; it can take on values from zero to hundreds of thousands of dollars or more. Similarly, gender composition is a variable. The percentage of females (or males) in an occupation can vary from 0 to 100. Wages is a variable, with values from zero to thousands of dollars or more. See Table 1.1 for examples of some variables and their possible values.
Variable A property of people or objects that takes on two or more values.
Social scientists must also select a unit of analysis; that is, they must select the object of their research. We often focus on individual characteristics or behavior, but we could also examine groups of people such as families, formal organizations like elementary schools or corporations, or social artifacts such as children’s books or advertisements. For example, we may be interested in the relationship between an individual’s educational degree and annual income. In this case, the unit of analysis is the individual. On the other hand, in a study of how corporation profits are associated with employee benefits, corporations are the unit of analysis. If we examine how often women are featured in prescription drug advertisements, the advertisements are the unit of analysis. Figure 1.2 illustrates different units of analysis frequently employed by social scientists.
Unit of analysis The object of research, such as individuals, groups, organizations, or social artifacts.
Learning Check 1.2
Remember that research question you came up with? Formulate a testable hypothesis based on your research question. Remember that your variables must take on two or more values and you must determine the unit of analysis. What is your unit of analysis?
Figure 1.2 Examples of Units of Analysis
Independent and Dependent Variables: Causality
Hypotheses are usually stated in terms of a relationship between an independent and a dependent variable. The distinction between an independent and a dependent variable is important in the language of research. Social theories often intend to provide an explanation for social patterns or causal relations between variables. For example, according to the gender segregation theory, gender segregation in the workplace is the primary explanation (although certainly not the only one) of the male-female earning gap. Why should jobs where the majority of workers are women pay less than jobs that employ mostly men? One explanation is that
societies undervalue the work women do, regardless of what those tasks are, because women do them. . . . For example, our culture tends to devalue caring or nurturant work at least partly because it is done by women. This tendency accounts for child care workers’ low rank in the pay hierarchy.6
In the language of research, the variable the researcher wants to explain (the “effect”) is called the dependent variable. The variable that is expected to “cause” or account for the dependent variable is called the independent variable. Therefore, in our example, gender composition of occupations is the independent variable, and wages is the dependent variable.
Dependent variable The variable to be explained (the effect).
Independent variable The variable expected to account for (the cause of) the dependent variable.
Cause-and-effect relationships between variables are not easy to infer in the social sciences. To establish that two variables are causally related, your analysis must meet three conditions: (1) The cause has to precede the effect in time, (2) there has to be an empirical relationship between the cause and the effect, and (3) this relationship cannot be explained by other factors.
Let’s consider the decades-old debate about controlling crime through the use of prevention versus punishment. Some people argue that special counseling for youths at the first sign of trouble and strict controls on access to firearms would help reduce crime. Others argue that overhauling federal and state sentencing laws to stop early prison releases is the solution. In the early 1990s, Washington and California adopted “three strikes and you’re out” legislation, imposing life prison terms on three-time felony offenders. Such laws are also referred to as habitual or persistent offender laws. Twenty-six other states and the federal government adopted similar measures, all advocating a “get tough” policy on crime; the most recent legislation was in 2012 in the state of Massachusetts. In 2012, California voters supported a revision to the original law, imposing a life sentence only when the new felony conviction is serious or violent. Let’s suppose that years after the measure was introduced, the crime rate declined in some of these states (in fact, advocates of the measure have identified declining crime rates as evidence of its success). Does the observation that the incidence of crime declined mean that the new measure caused this reduction? Not necessarily! Perhaps the rate of crime had been going down for other reasons, such as improvement in the economy, and the new measure had nothing to do with it. To demonstrate a cause-and-effect relationship, we would need to show three things: (1) The reduction of crime actually occurred after the enactment of this measure, (2) the enactment of the “three strikes and you’re out” measure was empirically associated with a decrease in crime, and (3) the relationship between the reduction in crime and the “three strikes and you’re out” policy is not due to the influence of another variable (e.g., the improvement of overall economic conditions).
Independent and Dependent Variables: Guidelines
Because it is difficult to infer cause-and-effect relationships in the social sciences, be cautious about using the terms cause and effect when examining relationships between variables. However, using the terms independent variable and dependent variable is still appropriate even when this relationship is not articulated in terms of direct cause and effect. Here are a few guidelines that may help you identify the independent and dependent variables:
1. The dependent variable is always the property that you are trying to explain; it is always the object of the research.
2. The independent variable usually occurs earlier in time than the dependent variable.
3. The independent variable is often seen as influencing, directly or indirectly, the dependent variable.
The purpose of the research should help determine which is the independent variable and which is the dependent variable. In the real world, variables are neither dependent nor independent; they can be switched around depending on the research problem. A variable defined as independent in one research investigation may be a dependent variable in another.7 For instance, educational attainment may be an independent variable in a study attempting to explain how education influences political attitudes. However, in an investigation of whether a person’s level of education is influenced by the social status of his or her family of origin, educational attainment is the dependent variable. Some variables, such as race, age, and ethnicity, because they are primordial characteristics that cannot be explained by social scientists, are never considered dependent variables in a social science analysis.
Learning Check 1.3
Identify the independent and dependent variables in the following hypotheses:
Older Americans are more likely to support stricter immigration laws than younger Americans.
People who attend church regularly are more likely to oppose abortion than people who do not attend church regularly.
Elderly women are more likely to live alone than elderly men.
Individuals with postgraduate education are likely to have fewer children than those with less education.
What are the independent and dependent variables in your hypothesis?
Collecting Data
Once we have decided on the research question, the hypothesis, and the variables to be included in the study, we proceed to the next stage in the research cycle. This step includes measuring our variables and collecting the data. As researchers, we must decide how to measure the variables of interest to us, how to select the cases for our research, and what kind of data collection techniques we will be using. A wide variety of data collection techniques are available to us, from direct observations to survey research, experiments, or secondary sources. Similarly, we can construct numerous measuring instruments. These instruments can be as simple as a single question included in a questionnaire or as complex as a composite measure constructed through the combination of two or more questionnaire items. The choice of a particular data collection method or instrument to measure our variables depends on the study objective. For instance, suppose we decide to study how one’s social class is related to attitudes about women in the labor force. Since attitudes about working women are not directly observable, we need to collect data by asking a group of people questions about their attitudes and opinions. A suitable method of data collection for this project would be a survey that uses some kind of questionnaire or interview guide to elicit verbal reports from respondents. The questionnaire could include numerous questions designed to measure attitudes toward working women, social class, and other variables relevant to the study.
How would we go about collecting data to test the hypothesis relating the gender composition of occupations to wages? We want to gather information on the proportion of men and women in different occupations and the average earnings for these occupations. This kind of information is routinely collected and disseminated by the U.S. Department of Labor, the Bureau of Labor Statistics, and the U.S. Census Bureau. We could use these data to test our hypothesis.
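Once such data are in hand, a first look at the hypothesis might resemble the sketch below. The occupations, percentages, and wages are invented for illustration; they are not Department of Labor or Census Bureau figures. Note that the unit of analysis here is the occupation, not the individual worker.

```python
from statistics import mean

# Hypothetical occupations: (name, percent female, average weekly wage in dollars).
# All values are made up for illustration only.
occupations = [
    ("Occupation A", 92, 610),
    ("Occupation B", 75, 700),
    ("Occupation C", 58, 820),
    ("Occupation D", 35, 900),
    ("Occupation E", 12, 980),
    ("Occupation F", 5, 1040),
]

# Independent variable: gender composition (majority female vs. majority male).
# Dependent variable: average weekly wage.
majority_female = [wage for _, pct_female, wage in occupations if pct_female > 50]
majority_male = [wage for _, pct_female, wage in occupations if pct_female < 50]

mean_female_majority = mean(majority_female)
mean_male_majority = mean(majority_male)

# The hypothesis predicts lower wages in majority-female occupations.
hypothesis_supported = mean_female_majority < mean_male_majority

print(f"Majority-female occupations: ${mean_female_majority:.2f}")
print(f"Majority-male occupations:   ${mean_male_majority:.2f}")
print(f"Consistent with the hypothesis: {hypothesis_supported}")
```

A difference in the predicted direction is consistent with the hypothesis but does not by itself establish causality. The three conditions discussed earlier would still need to be met.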
Levels of Measurement
The statistical analysis of data involves many mathematical operations, from simple counting to addition and multiplication. However, not every operation can be used with every variable. The type of statistical operation we employ depends on how our variables are measured. For example, for the variable gender, we can use the number 1 to represent females and the number 2 to represent males. Similarly, 1 can also be used as a numerical code for the category “one child” in the variable number of children. Clearly, in the first example, the number is an arbitrary symbol that does not correspond to the property “female,” whereas in the second example the number 1 has a distinct numerical meaning that does correspond to the property “one child.” The correspondence between the properties we measure and the numbers representing these properties determines the type of statistical operations we can use. The degree of correspondence also leads to different ways of measuring—that is, to distinct levels of measurement. In this section, we will discuss three levels of measurement: (1) nominal, (2) ordinal, and (3) interval-ratio.
Nominal Level of Measurement
At the nominal level of measurement, numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations. Gender is an example of a nominal-level variable (Table 1.2). Using the numbers 1 and 2, for instance, we can classify our observations into the categories “females” and “males,” with 1 representing females and 2 representing males. We could use any of a variety of symbols to represent the different categories of a nominal variable; however, when numbers are used to represent the different categories, we do not imply anything about the magnitude or quantitative difference between the categories. Nominal categories cannot be rank-ordered. Because the different categories (e.g., males vs. females) vary in the quality inherent in each but not in quantity, nominal variables are often called qualitative. Other examples of nominal-level variables are political party, religion, and race.
Nominal measurement Numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations. Nominal categories cannot be rank-ordered.
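The following sketch illustrates why numeric codes for a nominal variable are only labels. It uses the coding from the text (1 for females, 2 for males) on a small set of hypothetical observations. Counting categories is meaningful, whereas averaging the codes produces a number with no interpretation.

```python
from collections import Counter
from statistics import mean, mode

# Coding from the text: 1 = female, 2 = male. The observations are hypothetical.
gender_codes = [1, 2, 1, 1, 2, 1]
labels = {1: "female", 2: "male"}

# Counting how many observations fall in each category is meaningful.
counts = Counter(labels[code] for code in gender_codes)

# So is reporting the most frequent category.
most_common = labels[mode(gender_codes)]

# The arithmetic runs, but the result means nothing: there is no "1.33" gender.
meaningless_mean = mean(gender_codes)

print(counts)
print(most_common)
print(round(meaningless_mean, 2))
```

If we swapped the codes so that 1 stood for males and 2 for females, the counts would be unchanged but the mean would differ, which shows that the numbers carry no quantitative information.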
Nominal variables should include categories that are both exhaustive and mutually exclusive. Exhaustiveness means that there should be enough categories composing the variables to classify every observation. For example, the common classification of the variable marital status into the categories “married,” “single,” and “widowed” violates the requirement of exhaustiveness. As defined, it does not allow us to classify same-sex couples or heterosexual couples who are not legally married. We can make every variable exhaustive by adding the category “other” to the list of categories. However, this practice is not recommended if it leads to the exclusion of categories that have theoretical significance or a substantial number of observations.
Mutual exclusiveness means that there is only one category suitable for each observation. For example, we need to define religion in such a way that no one would be classified into more than one category. For instance, the categories Protestant and Methodist are not mutually exclusive because Methodists are also considered Protestant and, therefore, could be classified into both categories.
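One way to check a classification scheme against both requirements is sketched below. The function name check_categories, the respondents, and the set of denominations treated as Protestant are all hypothetical and chosen only to mirror the religion example above.

```python
def check_categories(observations, categories):
    """Return observations that fit no category and those that fit more than one.

    `categories` maps a category name to a rule: a function that returns
    True when an observation belongs in that category.
    """
    unclassified = []         # violations of exhaustiveness
    multiply_classified = []  # violations of mutual exclusiveness
    for obs in observations:
        matches = [name for name, rule in categories.items() if rule(obs)]
        if not matches:
            unclassified.append(obs)
        elif len(matches) > 1:
            multiply_classified.append(obs)
    return unclassified, multiply_classified


# Hypothetical self-reported religious affiliations.
respondents = ["Methodist", "Catholic", "Baptist", "Jewish"]

# A flawed scheme: "Protestant" and "Methodist" overlap, and some
# respondents fit no category at all.
protestant_denominations = {"Methodist", "Baptist"}  # illustrative, not a full list
scheme = {
    "Protestant": lambda r: r in protestant_denominations,
    "Methodist": lambda r: r == "Methodist",
    "Catholic": lambda r: r == "Catholic",
}

unclassified, overlapping = check_categories(respondents, scheme)
print("Fits no category:", unclassified)
print("Fits more than one category:", overlapping)
```

An empty first list indicates that the scheme is exhaustive for these observations, and an empty second list indicates that it is mutually exclusive.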
Learning Check 1.4
Review the definitions of exhaustive and mutually exclusive. Now look at Table 1.2. What other categories could be added to each variable to be exhaustive and mutually exclusive?
Ordinal measurement Numbers are assigned to rank-ordered categories ranging from low to high.
Ordinal Level of Measurement
Whenever we assign numbers to rank-ordered categories ranging from low to high, we have an ordinal level of measurement. Social class is an example of an ordinal variable. We might classify individuals with respect to their social class status as “upper class,” “middle class,” or “working class.” We can say that a person in the category “upper class” has a higher class position than a person in a “middle-class” category (or that a “middle-class” position is higher than a “working-class” position), but we do not know the magnitude of the differences between the categories—that is, we don’t know how much higher “upper class” is compared with the “middle class.”
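The sketch below shows what an ordinal variable does and does not allow, using the three social class categories from the text. The numeric codes and the list of respondents are assumptions made for illustration. Only the order of the codes carries meaning.

```python
from enum import IntEnum


class SocialClass(IntEnum):
    # The codes 1, 2, 3 record rank order only; the spacing is arbitrary.
    WORKING = 1
    MIDDLE = 2
    UPPER = 3


# Hypothetical respondents.
respondents = [
    SocialClass.MIDDLE,
    SocialClass.UPPER,
    SocialClass.WORKING,
    SocialClass.MIDDLE,
    SocialClass.WORKING,
]

# Rank comparisons are meaningful at the ordinal level.
upper_is_higher = SocialClass.UPPER > SocialClass.MIDDLE

# Because categories can be ordered, we can sort and find the middle case.
ranked = sorted(respondents)
median_class = ranked[len(ranked) // 2]

# Subtraction runs (UPPER - MIDDLE gives 1), but the "1" does not measure
# how much higher the upper class is than the middle class.
print(upper_is_higher)
print(median_class.name)
```

Recoding the categories as 1, 5, and 20 would preserve every comparison and the median category, which is why differences between ordinal codes should not be interpreted as magnitudes.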