
20.4 run ons practice 2 answers

18/10/2021 Client: muhammad11 Deadline: 2 Day

Statistics Econometric.

Introduction to Econometrics

Abel/Bernanke/Croushore Macroeconomics*

Bade/Parkin Foundations of Economics*

Berck/Helfand The Economics of the Environment

Bierman/Fernandez Game Theory with Economic Applications

Blanchard Macroeconomics*

Blau/Ferber/Winkler The Economics of Women, Men, and Work

Boardman/Greenberg/Vining/Weimer Cost-Benefit Analysis

Boyer Principles of Transportation Economics

Branson Macroeconomic Theory and Policy

Bruce Public Finance and the American Economy

Carlton/Perloff Modern Industrial Organization

Case/Fair/Oster Principles of Economics*

Chapman Environmental Economics: Theory, Application, and Policy

Cooter/Ulen Law & Economics

Daniels/VanHoose International Monetary & Financial Economics

Downs An Economic Theory of Democracy

Ehrenberg/Smith Modern Labor Economics

Farnham Economics for Managers

Folland/Goodman/Stano The Economics of Health and Health Care

Fort Sports Economics

Froyen Macroeconomics

Fusfeld The Age of the Economist

Gerber International Economics*

González-Rivera Forecasting for Economics and Business

Gordon Macroeconomics*

Greene Econometric Analysis

Gregory Essentials of Economics

Gregory/Stuart Russian and Soviet Economic Performance and Structure

Hartwick/Olewiler The Economics of Natural Resource Use

Heilbroner/Milberg The Making of the Economic Society

Heyne/Boettke/Prychitko The Economic Way of Thinking

Holt Markets, Games, and Strategic Behavior

Hubbard/O’Brien Economics*; Money, Banking, and the Financial System*

Hubbard/O’Brien/Rafferty Macroeconomics*

Hughes/Cain American Economic History

Husted/Melvin International Economics

Jehle/Reny Advanced Microeconomic Theory

Johnson-Lans A Health Economics Primer

Keat/Young/Erfle Managerial Economics

Klein Mathematical Methods for Economics

Krugman/Obstfeld/Melitz International Economics: Theory & Policy*

Laidler The Demand for Money

Leeds/von Allmen The Economics of Sports

Leeds/von Allmen/Schiming Economics*

Lynn Economic Development: Theory and Practice for a Divided World

Miller Economics Today*; Understanding Modern Economics

Miller/Benjamin The Economics of Macro Issues

Miller/Benjamin/North The Economics of Public Issues

Mills/Hamilton Urban Economics

Mishkin The Economics of Money, Banking, and Financial Markets*; The Economics of Money, Banking, and Financial Markets, Business School Edition*; Macroeconomics: Policy and Practice*

Murray Econometrics: A Modern Introduction

O’Sullivan/Sheffrin/Perez Economics: Principles, Applications, and Tools*

Parkin Economics*

Perloff Microeconomics*; Microeconomics: Theory and Applications with Calculus*

Perloff/Brander Managerial Economics and Strategy*

Phelps Health Economics

Pindyck/Rubinfeld Microeconomics*

Riddell/Shackelford/Stamos/Schneider Economics: A Tool for Critically Understanding Society

Roberts The Choice: A Fable of Free Trade and Protection

Rohlf Introduction to Economic Reasoning

Roland Development Economics

Scherer Industry Structure, Strategy, and Public Policy

Schiller The Economics of Poverty and Discrimination

Sherman Market Regulation

Stock/Watson Introduction to Econometrics

Studenmund Using Econometrics: A Practical Guide

Tietenberg/Lewis Environmental and Natural Resource Economics; Environmental Economics and Policy

Todaro/Smith Economic Development

Waldman/Jensen Industrial Organization: Theory and Practice

Walters/Walters/Appel/Callahan/Centanni/Maex/O’Neill Econversations: Today’s Students Discuss Today’s Issues

Weil Economic Growth

Williamson Macroeconomics

The Pearson Series in Economics

*denotes MyEconLab titles. Visit www.myeconlab.com to learn more.

Introduction to Econometrics

James H. Stock Harvard University

Mark W. Watson Princeton University

Boston Columbus Indianapolis New York San Francisco Hoboken Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto

Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Third Edition Update

Vice President, Product Management: Donna Battista
Acquisitions Editor: Christina Masturzo
Editorial Assistant: Christine Mallon
Vice President, Marketing: Maggie Moylan
Director, Strategy and Marketing: Scott Dustan
Manager, Field Marketing: Leigh Ann Sims
Product Marketing Manager: Alison Haskins
Executive Field Marketing Manager: Lori DeShazo
Senior Strategic Marketing Manager: Erin Gardner
Team Lead, Program Management: Ashley Santora
Program Manager: Carolyn Philips
Team Lead, Project Management: Jeff Holcomb
Project Manager: Liz Napolitano
Operations Specialist: Carol Melville
Cover Designer: Jon Boylan
Cover Art: Courtesy of Carolin Pflueger and the authors.
Full-Service Project Management, Design, and Electronic Composition: Cenveo® Publisher Services
Printer/Binder: Edwards Brothers Malloy
Cover Printer: Lehigh-Phoenix Color/Hagerstown
Text Font: 10/14 Times Ten Roman

About the cover: The cover shows a heat chart of 270 monthly variables measuring different aspects of employment, production, income, and sales for the United States, 1974–2010. Each horizontal line depicts a different variable, and the horizontal axis is the date. Strong monthly increases in a variable are blue and sharp monthly declines are red. The simultaneous declines in many of these measures during recessions appear in the figure as vertical red bands.

Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text.

Photo Credits: page 410 left: Henrik Montgomery/Pressens Bild/AP Photo; page 410 right: Paul Sakuma/AP Photo; page 428 left: Courtesy of Allison Harris; page 428 right: Courtesy of Allison Harris; page 669 top left: John McCombe/AP Photo; bottom left: New York University/AFP/Newscom; top right: Denise Applewhite/Princeton University/AP Photo; bottom right: Courtesy of the University of Chicago/AP Photo.

Copyright © 2015, 2011, 2007 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 221 River Street, Hoboken, New Jersey 07030.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data Stock, James H. Introduction to econometrics/James H. Stock, Harvard University, Mark W. Watson, Princeton University.— Third edition update. pages cm.—(The Pearson series in economics) Includes bibliographical references and index. ISBN 978-0-13-348687-2—ISBN 0-13-348687-7 1. Econometrics. I. Watson, Mark W. II. Title. HB139.S765 2015 330.01’5195––dc23 2014018465

ISBN-10: 0-13-348687-7
ISBN-13: 978-0-13-348687-2

www.pearsonhighered.com

Brief Contents

PART ONE Introduction and Review

CHAPTER 1 Economic Questions and Data 1

CHAPTER 2 Review of Probability 14

CHAPTER 3 Review of Statistics 65

PART TWO Fundamentals of Regression Analysis

CHAPTER 4 Linear Regression with One Regressor 109

CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 146

CHAPTER 6 Linear Regression with Multiple Regressors 182

CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 217

CHAPTER 8 Nonlinear Regression Functions 256

CHAPTER 9 Assessing Studies Based on Multiple Regression 315

PART THREE Further Topics in Regression Analysis

CHAPTER 10 Regression with Panel Data 350

CHAPTER 11 Regression with a Binary Dependent Variable 385

CHAPTER 12 Instrumental Variables Regression 424

CHAPTER 13 Experiments and Quasi-Experiments 475

PART FOUR Regression Analysis of Economic Time Series Data

CHAPTER 14 Introduction to Time Series Regression and Forecasting 522

CHAPTER 15 Estimation of Dynamic Causal Effects 589

CHAPTER 16 Additional Topics in Time Series Regression 638

PART FIVE The Econometric Theory of Regression Analysis

CHAPTER 17 The Theory of Linear Regression with One Regressor 676

CHAPTER 18 The Theory of Multiple Regression 705


Contents

Preface xxix

PART ONE Introduction and Review

CHAPTER 1 Economic Questions and Data 1

1.1 Economic Questions We Examine 1
    Question #1: Does Reducing Class Size Improve Elementary School Education? 2
    Question #2: Is There Racial Discrimination in the Market for Home Loans? 3
    Question #3: How Much Do Cigarette Taxes Reduce Smoking? 3
    Question #4: By How Much Will U.S. GDP Grow Next Year? 4
    Quantitative Questions, Quantitative Answers 5

1.2 Causal Effects and Idealized Experiments 5
    Estimation of Causal Effects 6
    Forecasting and Causality 7

1.3 Data: Sources and Types 7
    Experimental Versus Observational Data 7
    Cross-Sectional Data 8
    Time Series Data 9
    Panel Data 11

CHAPTER 2 Review of Probability 14

2.1 Random Variables and Probability Distributions 15
    Probabilities, the Sample Space, and Random Variables 15
    Probability Distribution of a Discrete Random Variable 16
    Probability Distribution of a Continuous Random Variable 19

2.2 Expected Values, Mean, and Variance 19
    The Expected Value of a Random Variable 19
    The Standard Deviation and Variance 21
    Mean and Variance of a Linear Function of a Random Variable 22
    Other Measures of the Shape of a Distribution 23

2.3 Two Random Variables 26
    Joint and Marginal Distributions 26
    Conditional Distributions 27
    Independence 31
    Covariance and Correlation 31
    The Mean and Variance of Sums of Random Variables 32

2.4 The Normal, Chi-Squared, Student t, and F Distributions 36
    The Normal Distribution 36
    The Chi-Squared Distribution 41
    The Student t Distribution 41
    The F Distribution 42

2.5 Random Sampling and the Distribution of the Sample Average 43
    Random Sampling 43
    The Sampling Distribution of the Sample Average 44

2.6 Large-Sample Approximations to Sampling Distributions 47
    The Law of Large Numbers and Consistency 48
    The Central Limit Theorem 50

APPENDIX 2.1 Derivation of Results in Key Concept 2.3 63

CHAPTER 3 Review of Statistics 65

3.1 Estimation of the Population Mean 66
    Estimators and Their Properties 66
    Properties of Ȳ 68
    The Importance of Random Sampling 70

3.2 Hypothesis Tests Concerning the Population Mean 71
    Null and Alternative Hypotheses 71
    The p-Value 72
    Calculating the p-Value When σY Is Known 73
    The Sample Variance, Sample Standard Deviation, and Standard Error 74
    Calculating the p-Value When σY Is Unknown 76
    The t-Statistic 76
    Hypothesis Testing with a Prespecified Significance Level 77
    One-Sided Alternatives 79

3.3 Confidence Intervals for the Population Mean 80

3.4 Comparing Means from Different Populations 82
    Hypothesis Tests for the Difference Between Two Means 82
    Confidence Intervals for the Difference Between Two Population Means 84

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data 84
    The Causal Effect as a Difference of Conditional Expectations 85
    Estimation of the Causal Effect Using Differences of Means 85

3.6 Using the t-Statistic When the Sample Size Is Small 87
    The t-Statistic and the Student t Distribution 87
    Use of the Student t Distribution in Practice 89

3.7 Scatterplots, the Sample Covariance, and the Sample Correlation 91
    Scatterplots 91
    Sample Covariance and Correlation 92

APPENDIX 3.1 The U.S. Current Population Survey 106

APPENDIX 3.2 Two Proofs That Ȳ Is the Least Squares Estimator of μY 107

APPENDIX 3.3 A Proof That the Sample Variance Is Consistent 108

PART TWO Fundamentals of Regression Analysis

CHAPTER 4 Linear Regression with One Regressor 109

4.1 The Linear Regression Model 109

4.2 Estimating the Coefficients of the Linear Regression Model 114
    The Ordinary Least Squares Estimator 116
    OLS Estimates of the Relationship Between Test Scores and the Student–Teacher Ratio 118
    Why Use the OLS Estimator? 119

4.3 Measures of Fit 121
    The R² 121
    The Standard Error of the Regression 122
    Application to the Test Score Data 123

4.4 The Least Squares Assumptions 124
    Assumption #1: The Conditional Distribution of ui Given Xi Has a Mean of Zero 124
    Assumption #2: (Xi, Yi), i = 1, …, n, Are Independently and Identically Distributed 126
    Assumption #3: Large Outliers Are Unlikely 127
    Use of the Least Squares Assumptions 128

4.5 Sampling Distribution of the OLS Estimators 129
    The Sampling Distribution of the OLS Estimators 130

4.6 Conclusion 133

APPENDIX 4.1 The California Test Score Data Set 141

APPENDIX 4.2 Derivation of the OLS Estimators 141

APPENDIX 4.3 Sampling Distribution of the OLS Estimator 142

CHAPTER 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 146

5.1 Testing Hypotheses About One of the Regression Coefficients 146
    Two-Sided Hypotheses Concerning β1 147
    One-Sided Hypotheses Concerning β1 150
    Testing Hypotheses About the Intercept β0 152

5.2 Confidence Intervals for a Regression Coefficient 153

5.3 Regression When X Is a Binary Variable 155
    Interpretation of the Regression Coefficients 155

5.4 Heteroskedasticity and Homoskedasticity 157
    What Are Heteroskedasticity and Homoskedasticity? 158
    Mathematical Implications of Homoskedasticity 160
    What Does This Mean in Practice? 161

5.5 The Theoretical Foundations of Ordinary Least Squares 163
    Linear Conditionally Unbiased Estimators and the Gauss–Markov Theorem 164
    Regression Estimators Other Than OLS 165

5.6 Using the t-Statistic in Regression When the Sample Size Is Small 166
    The t-Statistic and the Student t Distribution 166
    Use of the Student t Distribution in Practice 167

5.7 Conclusion 168

APPENDIX 5.1 Formulas for OLS Standard Errors 177

APPENDIX 5.2 The Gauss–Markov Conditions and a Proof of the Gauss–Markov Theorem 178

CHAPTER 6 Linear Regression with Multiple Regressors 182

6.1 Omitted Variable Bias 182
    Definition of Omitted Variable Bias 183
    A Formula for Omitted Variable Bias 185
    Addressing Omitted Variable Bias by Dividing the Data into Groups 187

6.2 The Multiple Regression Model 189
    The Population Regression Line 189
    The Population Multiple Regression Model 190

6.3 The OLS Estimator in Multiple Regression 192
    The OLS Estimator 193
    Application to Test Scores and the Student–Teacher Ratio 194

6.4 Measures of Fit in Multiple Regression 196
    The Standard Error of the Regression (SER) 196
    The R² 196
    The “Adjusted R²” 197
    Application to Test Scores 198

6.5 The Least Squares Assumptions in Multiple Regression 199
    Assumption #1: The Conditional Distribution of ui Given X1i, X2i, …, Xki Has a Mean of Zero 199
    Assumption #2: (X1i, X2i, …, Xki, Yi), i = 1, …, n, Are i.i.d. 199
    Assumption #3: Large Outliers Are Unlikely 199
    Assumption #4: No Perfect Multicollinearity 200

6.6 The Distribution of the OLS Estimators in Multiple Regression 201

6.7 Multicollinearity 202
    Examples of Perfect Multicollinearity 203
    Imperfect Multicollinearity 205

6.8 Conclusion 206

APPENDIX 6.1 Derivation of Equation (6.1) 214

APPENDIX 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors 214

APPENDIX 6.3 The Frisch–Waugh Theorem 215

CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 217

7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient 217
    Standard Errors for the OLS Estimators 217
    Hypothesis Tests for a Single Coefficient 218
    Confidence Intervals for a Single Coefficient 219
    Application to Test Scores and the Student–Teacher Ratio 220

7.2 Tests of Joint Hypotheses 222
    Testing Hypotheses on Two or More Coefficients 222
    The F-Statistic 224
    Application to Test Scores and the Student–Teacher Ratio 226
    The Homoskedasticity-Only F-Statistic 227

7.3 Testing Single Restrictions Involving Multiple Coefficients 229

7.4 Confidence Sets for Multiple Coefficients 231

7.5 Model Specification for Multiple Regression 232
    Omitted Variable Bias in Multiple Regression 233
    The Role of Control Variables in Multiple Regression 234
    Model Specification in Theory and in Practice 236
    Interpreting the R² and the Adjusted R² in Practice 237

7.6 Analysis of the Test Score Data Set 238

7.7 Conclusion 243

APPENDIX 7.1 The Bonferroni Test of a Joint Hypothesis 251

APPENDIX 7.2 Conditional Mean Independence 253

CHAPTER 8 Nonlinear Regression Functions 256

8.1 A General Strategy for Modeling Nonlinear Regression Functions 258
    Test Scores and District Income 258
    The Effect on Y of a Change in X in Nonlinear Specifications 261
    A General Approach to Modeling Nonlinearities Using Multiple Regression 266

8.2 Nonlinear Functions of a Single Independent Variable 266
    Polynomials 267
    Logarithms 269
    Polynomial and Logarithmic Models of Test Scores and District Income 277

8.3 Interactions Between Independent Variables 278
    Interactions Between Two Binary Variables 279
    Interactions Between a Continuous and a Binary Variable 282
    Interactions Between Two Continuous Variables 286

8.4 Nonlinear Effects on Test Scores of the Student–Teacher Ratio 293
    Discussion of Regression Results 293
    Summary of Findings 297

8.5 Conclusion 298

APPENDIX 8.1 Regression Functions That Are Nonlinear in the Parameters 309

APPENDIX 8.2 Slopes and Elasticities for Nonlinear Regression Functions 313

CHAPTER 9 Assessing Studies Based on Multiple Regression 315

9.1 Internal and External Validity 315
    Threats to Internal Validity 316
    Threats to External Validity 317

9.2 Threats to Internal Validity of Multiple Regression Analysis 319
    Omitted Variable Bias 319
    Misspecification of the Functional Form of the Regression Function 321
    Measurement Error and Errors-in-Variables Bias 322
    Missing Data and Sample Selection 325
    Simultaneous Causality 326
    Sources of Inconsistency of OLS Standard Errors 329

9.3 Internal and External Validity When the Regression Is Used for Forecasting 331
    Using Regression Models for Forecasting 331
    Assessing the Validity of Regression Models for Forecasting 332

9.4 Example: Test Scores and Class Size 332
    External Validity 332
    Internal Validity 339
    Discussion and Implications 341

9.5 Conclusion 342

APPENDIX 9.1 The Massachusetts Elementary School Testing Data 349

PART THREE Further Topics in Regression Analysis

CHAPTER 10 Regression with Panel Data 350

10.1 Panel Data 351
    Example: Traffic Deaths and Alcohol Taxes 352

10.2 Panel Data with Two Time Periods: “Before and After” Comparisons 354

10.3 Fixed Effects Regression 357
    The Fixed Effects Regression Model 357
    Estimation and Inference 359
    Application to Traffic Deaths 361

10.4 Regression with Time Fixed Effects 361
    Time Effects Only 362
    Both Entity and Time Fixed Effects 363

10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression 365
    The Fixed Effects Regression Assumptions 365
    Standard Errors for Fixed Effects Regression 367

10.6 Drunk Driving Laws and Traffic Deaths 368

10.7 Conclusion 372

APPENDIX 10.1 The State Traffic Fatality Data Set 380

APPENDIX 10.2 Standard Errors for Fixed Effects Regression 380

CHAPTER 11 Regression with a Binary Dependent Variable 385

11.1 Binary Dependent Variables and the Linear Probability Model 386
    Binary Dependent Variables 386
    The Linear Probability Model 388

11.2 Probit and Logit Regression 391
    Probit Regression 391
    Logit Regression 396
    Comparing the Linear Probability, Probit, and Logit Models 398

11.3 Estimation and Inference in the Logit and Probit Models 398
    Nonlinear Least Squares Estimation 399
    Maximum Likelihood Estimation 400
    Measures of Fit 401

11.4 Application to the Boston HMDA Data 402

11.5 Conclusion 409

APPENDIX 11.1 The Boston HMDA Data Set 418

APPENDIX 11.2 Maximum Likelihood Estimation 418

APPENDIX 11.3 Other Limited Dependent Variable Models 421

CHAPTER 12 Instrumental Variables Regression 424

12.1 The IV Estimator with a Single Regressor and a Single Instrument 425
    The IV Model and Assumptions 425
    The Two Stage Least Squares Estimator 426
    Why Does IV Regression Work? 427
    The Sampling Distribution of the TSLS Estimator 431
    Application to the Demand for Cigarettes 433

12.2 The General IV Regression Model 435
    TSLS in the General IV Model 437
    Instrument Relevance and Exogeneity in the General IV Model 438
    The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator 439
    Inference Using the TSLS Estimator 440
    Application to the Demand for Cigarettes 441

12.3 Checking Instrument Validity 442
    Assumption #1: Instrument Relevance 443
    Assumption #2: Instrument Exogeneity 445

12.4 Application to the Demand for Cigarettes 448

12.5 Where Do Valid Instruments Come From? 453
    Three Examples 454

12.6 Conclusion 458

APPENDIX 12.1 The Cigarette Consumption Panel Data Set 467

APPENDIX 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4) 467

APPENDIX 12.3 Large-Sample Distribution of the TSLS Estimator 468

APPENDIX 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid 469

APPENDIX 12.5 Instrumental Variables Analysis with Weak Instruments 471

APPENDIX 12.6 TSLS with Control Variables 473

CHAPTER 13 Experiments and Quasi-Experiments 475

13.1 Potential Outcomes, Causal Effects, and Idealized Experiments 476
    Potential Outcomes and the Average Causal Effect 476
    Econometric Methods for Analyzing Experimental Data 478

13.2 Threats to Validity of Experiments 479
    Threats to Internal Validity 479
    Threats to External Validity 483

13.3 Experimental Estimates of the Effect of Class Size Reductions 484
    Experimental Design 485
    Analysis of the STAR Data 486
    Comparison of the Observational and Experimental Estimates of Class Size Effects 491

13.4 Quasi-Experiments 493
    Examples 494
    The Differences-in-Differences Estimator 496
    Instrumental Variables Estimators 499
    Regression Discontinuity Estimators 500

13.5 Potential Problems with Quasi-Experiments 502
    Threats to Internal Validity 502
    Threats to External Validity 504

13.6 Experimental and Quasi-Experimental Estimates in Heterogeneous Populations 504
    OLS with Heterogeneous Causal Effects 505
    IV Regression with Heterogeneous Causal Effects 506

13.7 Conclusion 509

APPENDIX 13.1 The Project STAR Data Set 518

APPENDIX 13.2 IV Estimation When the Causal Effect Varies Across Individuals 518

APPENDIX 13.3 The Potential Outcomes Framework for Analyzing Data from Experiments 520

PART FOUR Regression Analysis of Economic Time Series Data

CHAPTER 14 Introduction to Time Series Regression and Forecasting 522

14.1 Using Regression Models for Forecasting 523

14.2 Introduction to Time Series Data and Serial Correlation 524
    Real GDP in the United States 524
    Lags, First Differences, Logarithms, and Growth Rates 525
    Autocorrelation 528
    Other Examples of Economic Time Series 529

14.3 Autoregressions 531
    The First-Order Autoregressive Model 531
    The pth-Order Autoregressive Model 534

14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model 537
    Forecasting GDP Growth Using the Term Spread 537
    Stationarity 540
    Time Series Regression with Multiple Predictors 541
    Forecast Uncertainty and Forecast Intervals 544

14.5 Lag Length Selection Using Information Criteria 547
    Determining the Order of an Autoregression 547
    Lag Length Selection in Time Series Regression with Multiple Predictors 550

14.6 Nonstationarity I: Trends 551
    What Is a Trend? 551
    Problems Caused by Stochastic Trends 554
    Detecting Stochastic Trends: Testing for a Unit AR Root 556
    Avoiding the Problems Caused by Stochastic Trends 561

14.7 Nonstationarity II: Breaks 561
    What Is a Break? 562
    Testing for Breaks 562
    Pseudo Out-of-Sample Forecasting 567
    Avoiding the Problems Caused by Breaks 573

14.8 Conclusion 573

APPENDIX 14.1 Time Series Data Used in Chapter 14 583

APPENDIX 14.2 Stationarity in the AR(1) Model 584

APPENDIX 14.3 Lag Operator Notation 585

APPENDIX 14.4 ARMA Models 586

APPENDIX 14.5 Consistency of the BIC Lag Length Estimator 587

CHAPTER 15 Estimation of Dynamic Causal Effects 589

15.1 An Initial Taste of the Orange Juice Data 590

15.2 Dynamic Causal Effects 593
    Causal Effects and Time Series Data 593
    Two Types of Exogeneity 596

15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors 597
    The Distributed Lag Model Assumptions 598
    Autocorrelated ut, Standard Errors, and Inference 599
    Dynamic Multipliers and Cumulative Dynamic Multipliers 600

15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors 601
    Distribution of the OLS Estimator with Autocorrelated Errors 602
    HAC Standard Errors 604

15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors 606
    The Distributed Lag Model with AR(1) Errors 607
    OLS Estimation of the ADL Model 610
    GLS Estimation 611
    The Distributed Lag Model with Additional Lags and AR(p) Errors 613

15.6 Orange Juice Prices and Cold Weather 616

15.7 Is Exogeneity Plausible? Some Examples 624
    U.S. Income and Australian Exports 624
    Oil Prices and Inflation 625
    Monetary Policy and Inflation 626
    The Growth Rate of GDP and the Term Spread 626

15.8 Conclusion 627

APPENDIX 15.1 The Orange Juice Data Set 634

APPENDIX 15.2 The ADL Model and Generalized Least Squares in Lag Operator Notation 634

CHAPTER 16 Additional Topics in Time Series Regression 638

16.1 Vector Autoregressions 638
    The VAR Model 639
    A VAR Model of the Growth Rate of GDP and the Term Spread 642

16.2 Multiperiod Forecasts 643
    Iterated Multiperiod Forecasts 643
    Direct Multiperiod Forecasts 645
    Which Method Should You Use? 648

16.3 Orders of Integration and the DF-GLS Unit Root Test 649
    Other Models of Trends and Orders of Integration 649
    The DF-GLS Test for a Unit Root 651
    Why Do Unit Root Tests Have Nonnormal Distributions? 654

16.4 Cointegration 656
    Cointegration and Error Correction 656
    How Can You Tell Whether Two Variables Are Cointegrated? 658
    Estimation of Cointegrating Coefficients 659
    Extension to Multiple Cointegrated Variables 661
    Application to Interest Rates 662

16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity 664
    Volatility Clustering 664
    Autoregressive Conditional Heteroskedasticity 666
    Application to Stock Price Volatility 667

16.6 Conclusion 670

PART FIVE The Econometric Theory of Regression Analysis

CHAPTER 17 The Theory of Linear Regression with One Regressor 676

17.1 The Extended Least Squares Assumptions and the OLS Estimator 677
    The Extended Least Squares Assumptions 677
    The OLS Estimator 679

17.2 Fundamentals of Asymptotic Distribution Theory 679
    Convergence in Probability and the Law of Large Numbers 680
    The Central Limit Theorem and Convergence in Distribution 682
    Slutsky’s Theorem and the Continuous Mapping Theorem 683
    Application to the t-Statistic Based on the Sample Mean 684

17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic 685
    Consistency and Asymptotic Normality of the OLS Estimators 685
    Consistency of Heteroskedasticity-Robust Standard Errors 685
    Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic 687

17.4 Exact Sampling Distributions When the Errors Are Normally Distributed 687
    Distribution of β1 with Normal Errors 687
    Distribution of the Homoskedasticity-Only t-Statistic 689

17.5 Weighted Least Squares 690
    WLS with Known Heteroskedasticity 690
    WLS with Heteroskedasticity of Known Functional Form 691
    Heteroskedasticity-Robust Standard Errors or WLS? 694

APPENDIX 17.1 The Normal and Related Distributions and Moments of Continuous Random Variables 700

APPENDIX 17.2 Two Inequalities 703

CHAPTER 18 The Theory of Multiple Regression 705

18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form 706
    The Multiple Regression Model in Matrix Notation 706
    The Extended Least Squares Assumptions 708
    The OLS Estimator 709

18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic 710
    The Multivariate Central Limit Theorem 710
    Asymptotic Normality of β̂ 711
    Heteroskedasticity-Robust Standard Errors 712
    Confidence Intervals for Predicted Effects 713
    Asymptotic Distribution of the t-Statistic 713

18.3 Tests of Joint Hypotheses 713
    Joint Hypotheses in Matrix Notation 714
    Asymptotic Distribution of the F-Statistic 714
    Confidence Sets for Multiple Coefficients 715

18.4 Distribution of Regression Statistics with Normal Errors 716
    Matrix Representations of OLS Regression Statistics 716
    Distribution of β̂ with Normal Errors 717
    Distribution of s_û² 718
    Homoskedasticity-Only Standard Errors 718
    Distribution of the t-Statistic 719
    Distribution of the F-Statistic 719

18.5 Efficiency of the OLS Estimator with Homoskedastic Errors 720
    The Gauss–Markov Conditions for Multiple Regression 720
    Linear Conditionally Unbiased Estimators 720
    The Gauss–Markov Theorem for Multiple Regression 721

18.6 Generalized Least Squares 722
    The GLS Assumptions 723
    GLS When Ω Is Known 725
    GLS When Ω Contains Unknown Parameters 726
    The Zero Conditional Mean Assumption and GLS 726

18.7 Instrumental Variables and Generalized Method of Moments Estimation 728
    The IV Estimator in Matrix Form 729
    Asymptotic Distribution of the TSLS Estimator 730
    Properties of TSLS When the Errors Are Homoskedastic 731
    Generalized Method of Moments Estimation in Linear Models 734

APPENDIX 18.1 Summary of Matrix Algebra 746

APPENDIX 18.2 Multivariate Distributions 749

APPENDIX 18.3 Derivation of the Asymptotic Distribution of β̂ 751

APPENDIX 18.4 Derivations of Exact Distributions of OLS Test Statistics with Normal Errors 752

APPENDIX 18.5 Proof of the Gauss–Markov Theorem for Multiple Regression 753

APPENDIX 18.6 Proof of Selected Results for IV and GMM Estimation 754

Appendix 757
References 765
Glossary 771
Index 779

Key Concepts

PART ONE Introduction and Review

1.1 Cross-Sectional, Time Series, and Panel Data 12
2.1 Expected Value and the Mean 20
2.2 Variance and Standard Deviation 21
2.3 Means, Variances, and Covariances of Sums of Random Variables 35
2.4 Computing Probabilities Involving Normal Random Variables 37
2.5 Simple Random Sampling and i.i.d. Random Variables 44
2.6 Convergence in Probability, Consistency, and the Law of Large Numbers 48
2.7 The Central Limit Theorem 52
3.1 Estimators and Estimates 67
3.2 Bias, Consistency, and Efficiency 68
3.3 Efficiency of Ȳ: Ȳ Is BLUE 69
3.4 The Standard Error of Ȳ 75
3.5 The Terminology of Hypothesis Testing 78
3.6 Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) ≠ μY,0 79
3.7 Confidence Intervals for the Population Mean 81

PART TWO Fundamentals of Regression Analysis

4.1 Terminology for the Linear Regression Model with a Single Regressor 113
4.2 The OLS Estimator, Predicted Values, and Residuals 117
4.3 The Least Squares Assumptions 129
4.4 Large-Sample Distributions of β̂0 and β̂1 131
5.1 General Form of the t-Statistic 147
5.2 Testing the Hypothesis β1 = β1,0 Against the Alternative β1 ≠ β1,0 149
5.3 Confidence Interval for β1 154
5.4 Heteroskedasticity and Homoskedasticity 159
5.5 The Gauss–Markov Theorem for β̂1 165
6.1 Omitted Variable Bias in Regression with a Single Regressor 185
6.2 The Multiple Regression Model 192
6.3 The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model 194
6.4 The Least Squares Assumptions in the Multiple Regression Model 201
6.5 Large-Sample Distribution of β̂0, β̂1, …, β̂k 202
7.1 Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0 219
7.2 Confidence Intervals for a Single Coefficient in Multiple Regression 220
7.3 Omitted Variable Bias in Multiple Regression 233
7.4 R² and R̄²: What They Tell You—and What They Don’t 238
8.1 The Expected Change on Y of a Change in X1 in the Nonlinear Regression Model (8.3) 263
8.2 Logarithms in Regression: Three Cases 276
8.3 A Method for Interpreting Coefficients in Regressions with Binary Variables 281
8.4 Interactions Between Binary and Continuous Variables 284
8.5 Interactions in Multiple Regression 289
9.1 Internal and External Validity 316
9.2 Omitted Variable Bias: Should I Include More Variables in My Regression? 321
9.3 Functional Form Misspecification 322
9.4 Errors-in-Variables Bias 324
9.5 Sample Selection Bias 326
9.6 Simultaneous Causality Bias 329
9.7 Threats to the Internal Validity of a Multiple Regression Study 330

PART THREE Further Topics in Regression Analysis 10.1 Notation for Panel Data 351 10.2 the Fixed effects regression Model 359 10.3 the Fixed effects regression assumptions 366 11.1 the Linear Probability Model 389 11.2 the Probit Model, Predicted Probabilities, and estimated effects 394 11.3 Logit regression 396 12.1 the General Instrumental Variables regression Model and

terminology 436 12.2 two Stage Least Squares 438 12.3 the two Conditions for Valid Instruments 439 12.4 the IV regression assumptions 440 12.5 a rule of thumb for Checking for Weak Instruments 444 12.6 the Overidentifying restrictions test (the J-Statistic) 448

PART FOuR Regression Analysis of Economic Time Series Data 14.1 Lags, First Differences, Logarithms, and Growth rates 527 14.2 autocorrelation (Serial Correlation) and autocovariance 528 14.3 autoregressions 535 14.4 the autoregressive Distributed Lag Model 540

Key Concepts xxv

14.5 Stationarity 541 14.6 time Series regression with Multiple Predictors 542 14.7 Granger Causality tests (tests of Predictive Content) 543 14.8 the augmented Dickey–Fuller test for a Unit autoregressive root 559 14.9 the QLr test for Coefficient Stability 566 14.10 Pseudo Out-of-Sample Forecasts 568 15.1 the Distributed Lag Model and exogeneity 598 15.2 the Distributed Lag Model assumptions 599 15.3 HaC Standard errors 607 15.4 estimation of Dynamic Multipliers Under Strict exogeneity 616 16.1 Vector autoregressions 639 16.2 Iterated Multiperiod Forecasts 646 16.3 Direct Multiperiod Forecasts 648 16.4 Orders of Integration, Differencing, and Stationarity 650 16.5 Cointegration 657

PART FIvE Regression Analysis of Economic Time Series Data 17.1 the extended Least Squares assumptions for regression with a

Single regressor 678 18.1 the extended Least Squares assumptions in the Multiple regression

Model 707 18.2 the Multivariate Central Limit theorem 711 18.3 Gauss–Markov theorem for Multiple regression 722 18.4 the GLS assumptions 724


General Interest Boxes

The Distribution of Earnings in the United States in 2012
A Bad Day on Wall Street
Financial Diversification and Portfolios
Landon Wins!
The Gender Gap of Earnings of College Graduates in the United States
A Novel Way to Boost Retirement Savings
The “Beta” of a Stock
The Economic Value of a Year of Education: Homoskedasticity or Heteroskedasticity?
The Mozart Effect: Omitted Variable Bias?
The Return to Education and the Gender Gap
The Demand for Economics Journals
Do Stock Mutual Funds Outperform the Market?
James Heckman and Daniel McFadden, Nobel Laureates
Who Invented Instrumental Variables Regression?
A Scary Regression
The Externalities of Smoking
The Hawthorne Effect
What Is the Effect on Employment of the Minimum Wage?
Can You Beat the Market? Part I
The River of Blood
Can You Beat the Market? Part II
Orange Trees on the March
NEWS FLASH: Commodity Traders Send Shivers Through Disney World
Nobel Laureates in Time Series Econometrics


Preface

Econometrics can be a fun course for both teacher and student. The real world of economics, business, and government is a complicated and messy place, full of competing ideas and questions that demand answers. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can you make money in the stock market by buying when prices are historically low, relative to earnings, or should you just sit tight, as the random walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes, or should we simply have our children listen to Mozart for 10 minutes a day? Econometrics helps us sort out sound ideas from crazy ones and find quantitative answers to important quantitative questions. Econometrics opens a window on our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions.

Introduction to Econometrics is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend much of their time learning assumptions that they subsequently realize are unrealistic so that they must then learn “solutions” to “problems” that arise when the applications do not match the assumptions. We believe that it is far better to motivate the need for tools with a concrete application and then to provide a few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come alive.

New to the Third Edition

• Updated treatment of standard errors for panel data regression

• Discussion of when and why missing data can present a problem for regression analysis

• The use of regression discontinuity design as a method for analyzing quasi-experiments

• Updated discussion of weak instruments

• Discussion of the use and interpretation of control variables integrated into the core development of regression analysis

• Introduction of the “potential outcomes” framework for experimental data

• Additional general interest boxes

• Additional exercises, both pencil-and-paper and empirical

This third edition builds on the philosophy of the first and second editions that applications should drive the theory, not the other way around.

One substantial change in this edition concerns inference in regression with panel data (Chapter 10). In panel data, the data within an entity typically are correlated over time. For inference to be valid, standard errors must be computed using a method that is robust to this correlation. The chapter on panel data now uses one such method, clustered standard errors, from the outset. Clustered standard errors are the natural extension to panel data of the heteroskedasticity-robust standard errors introduced in the initial treatment of regression analysis in Part II. Recent research has shown that clustered standard errors have a number of desirable properties, which are now discussed in Chapter 10 and in a revised appendix to Chapter 10.

Another substantial set of changes concerns the treatment of experiments and quasi-experiments in Chapter 13. The discussion of differences-in-differences regression has been streamlined and draws directly on the multiple regression principles introduced in Part II. Chapter 13 now discusses regression discontinuity design, which is an intuitive and important framework for the analysis of quasi-experimental data. In addition, Chapter 13 now introduces the potential outcomes framework and relates this increasingly commonplace terminology to concepts that were introduced in Parts I and II.

This edition has a number of other significant changes. One is that it incorporates a precise but accessible treatment of control variables into the initial discussion of multiple regression. Chapter 7 now discusses conditions for control variables being successful in the sense that the coefficient on the variable of interest is unbiased even though the coefficients on the control variables generally are not. Other changes include a new discussion of missing data in Chapter 9, a new optional calculus-based appendix to Chapter 8 on slopes and elasticities of nonlinear regression functions, and an updated discussion in Chapter 12 of what to do if you have weak instruments. This edition also includes new general interest boxes, updated empirical examples, and additional exercises.


The Updated Third Edition

• The time series data used in Chapters 14–16 have been extended through the beginning of 2013 and now include the Great Recession.

• The empirical analysis in Chapter 14 now focuses on forecasting the growth rate of real GDP using the term spread, replacing the Phillips curve forecasts from earlier editions.

• Several new empirical exercises have been added to each chapter. Rather than include all of the empirical exercises in the text, we have moved many of them to the Companion Website, www.pearsonhighered.com/stock_watson. This has two main advantages: first, we can offer more and more in-depth exercises, and second, we can add and update exercises between editions. We encourage you to browse the empirical exercises available on the Companion Website.

Features of This Book

Introduction to Econometrics differs from other textbooks in three main ways. First, we integrate real-world questions and data into the development of the theory, and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of topics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.

Real-World Questions and Data

We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression, and functional form analysis in the context of estimating the effect of school inputs on school outputs. (Do smaller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.

We treat all our empirical applications seriously and in a way that shows students how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications and thereby to assess whether their substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook’s Companion Website (www.pearsonhighered.com/stock_watson).

Contemporary Choice of Topics

Econometrics has come a long way since the 1980s. The topics we cover reflect the best of contemporary applied econometrics. One can only do so much in an introductory course, so we focus on procedures and tests that are commonly used in practice. For example:

• Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument—exogeneity and relevance—are given equal billing. We follow that presentation with an extended discussion of where instruments come from and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.

• Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.

• Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.

• Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.

Theory That Matches Applications

Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and limitations of those tools. We provide a modern treatment in which the fit between theory and applications is as tight as possible, while keeping the mathematics at a level that requires only algebra.

Modern empirical applications share some common characteristics: The data sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are collected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic).

These observations lead to important differences between the theoretical development in this textbook and other textbooks:

• Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. In our experience, it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.


• Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data, it extends readily to panel and time series data, and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.

• Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a “problem” to be “solved”; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.
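To make the large-sample and heteroskedasticity-robust ideas above concrete, the following is a minimal sketch in Python for a regression with a single regressor. It is an illustration under stated assumptions, not the book's software: the function names (ols_fit, slope_standard_errors, large_sample_ci) are hypothetical, the data are simulated with made-up parameters, and the robust formula uses an n − 2 degrees-of-freedom adjustment, which is one common convention among several.

```python
import math
import random


def ols_fit(x, y):
    """Return the OLS intercept and slope for y = b0 + b1 * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_standard_errors(x, y):
    """Return (robust SE, homoskedasticity-only SE) of the OLS slope."""
    n = len(x)
    b0, b1 = ols_fit(x, y)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sxx = sum(d ** 2 for d in dev)

    # Robust variance: each squared residual is weighted by its own squared
    # deviation of x, so no common error variance is assumed.
    numerator = sum((d ** 2) * (u ** 2) for d, u in zip(dev, resid)) / (n - 2)
    denominator = (sxx / n) ** 2
    se_robust = math.sqrt(numerator / denominator / n)

    # Homoskedasticity-only variance: one pooled error variance for all i.
    s2_u = sum(u ** 2 for u in resid) / (n - 2)
    se_homosk = math.sqrt(s2_u / sxx)
    return se_robust, se_homosk


def large_sample_ci(estimate, se, z=1.96):
    """95% confidence interval from the large-sample normal approximation."""
    return estimate - z * se, estimate + z * se


# Simulated data (assumed values): the error spread grows with x,
# so the errors are heteroskedastic by construction.
random.seed(0)
x = [random.uniform(1.0, 10.0) for _ in range(500)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

b0_hat, b1_hat = ols_fit(x, y)
se_robust, se_homosk = slope_standard_errors(x, y)
print("slope estimate:", round(b1_hat, 3))
print("robust SE:", round(se_robust, 4))
print("homoskedasticity-only SE:", round(se_homosk, 4))
print("95% CI (robust):", large_sample_ci(b1_hat, se_robust))
```

The two standard errors coincide in large samples only when the errors are homoskedastic; when they differ, the robust version is the one the large-sample confidence interval should use.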

Skilled Producers, Sophisticated Consumers

We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis but also how to assess the validity of empirical analyses presented to them.

Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity—and ways to recognize these threats in practice.

Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity of the analyses presented in the book.

Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason, the textbook website features data sets, software, and suggestions for empirical exercises of different scopes.

Approach to Mathematics and Level of Rigor

Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a “high” or a “low” level of mathematics. Parts I through IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I through IV have fewer equations and more applications than many introductory econometrics books and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more sophisticated treatment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students.

That said, different students learn differently, and for mathematically well-prepared students, learning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that is appropriate for students with a stronger mathematical background. When the mathematical chapters in Part V are used in conjunction with the material in Parts I through IV, this book is suitable for advanced undergraduate or master’s level econometrics courses.

Contents and Organization

There are five parts to Introduction to Econometrics. This textbook assumes that the student has had a course in probability and statistics, although we review that material in Part I. We cover the core material of regression analysis in Part II. Parts III, IV, and V present additional topics that build on the core treatment in Part II.

Part I

Chapter 1 introduces econometrics and stresses the importance of providing quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in econometrics. Material from probability and statistics is reviewed in Chapters 2 and 3, respectively; whether these chapters are taught in a given course or are simply provided as a reference depends on the background of the students.

Part II

Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.

Part III

Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as “program evaluation.”

Part IV

Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions, such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.

Part V

Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text. Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.

Prerequisites Within the Book

Because different instructors like to emphasize different material, we wrote this book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV, and V are “stand-alone” in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table I. Although we have found that the sequence of topics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instructors to present topics in a different order if they so desire.

Sample Courses

This book accommodates several different course structures.

TABLE I Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

Prerequisite parts or chapters, by column:

Part I: 1–3
Part II: 4–7, 9 | 8
Part III: 10.1, 10.2 | 12.1, 12.2
Part IV: 14.1–14.4 | 14.5–14.8 | 15
Part V: 17

Chapter

10 Xa Xa X

11 Xa Xa X

12.1, 12.2 Xa Xa X

12.3–12.6 Xa Xa X X X

13 Xa Xa X X X

14 Xa Xa b

15 Xa Xa b X

16 Xa Xa b X X X

17 X X X

18 X X X X X

This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1 through 14.4.

a. Chapters 10 through 16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.

b. Chapters 14 through 16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
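The logarithmic approximation mentioned in footnote b can be illustrated with a short Python sketch. The function names and the example values below are hypothetical and chosen only for illustration; the point is that 100 times the change in the natural logarithm is close to the exact percentage change when the change is small and drifts away from it as the change grows.

```python
import math


def exact_pct_change(old, new):
    """Exact percentage change from old to new."""
    return 100.0 * (new - old) / old


def log_pct_change(old, new):
    """Approximate percentage change: 100 times the change in the natural log."""
    return 100.0 * (math.log(new) - math.log(old))


# Illustrative (made-up) values: small, moderate, and large changes from 100.
for new_value in (101.0, 105.0, 150.0):
    exact = exact_pct_change(100.0, new_value)
    approx = log_pct_change(100.0, new_value)
    print(f"100 -> {new_value}: exact {exact:.2f}%, log approximation {approx:.2f}%")
```

For growth rates of a few percent per period, which are typical of quarterly macroeconomic series, the two measures are nearly interchangeable.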


Standard Introductory Econometrics

This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional form analysis, and the evaluation of regression studies (all Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable (Chapter 11), and instrumental variables regression (Chapter 12), as time permits. The course concludes with experiments and quasi-experiments in Chapter 13, topics that provide an opportunity to return to the questions of estimating causal effects raised at the beginning of the semester and to recapitulate core regression methods. Prerequisites: Algebra II and introductory statistics.

Introductory Econometrics with Time Series and Forecasting Applications

Like a standard introductory course, this course covers all of Part I (as needed) and Part II. Optionally, the course next provides a brief introduction to panel data (Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If time permits, the course can include some advanced topics in time series analysis such as volatility clustering and conditional heteroskedasticity (Section 16.5). Prerequisites: Algebra II and introductory statistics.

Applied Time Series Analysis and Forecasting

This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent reviewing the tools of basic regression analysis in Part II, depending on student preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced topics in time series analysis (Chapter 16), including vector autoregressions and conditional heteroskedasticity. An important component of this course is hands-on forecasting exercises, available to instructors on the book’s accompanying website. Prerequisites: Algebra II and basic introductory econometrics or the equivalent.

Introduction to Econometric Theory

This book is also suitable for an advanced undergraduate course in which the students have a strong mathematical preparation or for a master’s level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through Section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and generalized method of moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), and the estimation of causal effects using time series data and generalized least squares (Chapter 15 and Section 18.6). Prerequisites: Calculus and introductory statistics. Chapter 18 assumes previous exposure to matrix algebra.

Pedagogical Features

This textbook has a variety of pedagogical features aimed at helping students understand, retain, and apply the essential ideas. Chapter introductions provide real-world grounding and motivation, as well as brief road maps highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage. The questions in the Review the Concepts section check students’ understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow students to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the Appendix provides statistical tables, the References section lists sources for further reading, and a Glossary conveniently defines many key terms in the book.

Supplements to Accompany the Textbook

The online supplements accompanying the third edition update of Introduction to Econometrics include the Instructor’s Resource Manual, Test Bank, and PowerPoint® slides with text figures, tables, and Key Concepts. The Instructor’s Resource Manual includes solutions to all the end-of-chapter exercises, while the Test Bank, offered in Testgen, provides a rich supply of easily edited test problems and questions of various types to meet specific course needs. These resources are available for download from the Instructor’s Resource Center at www.pearsonhighered.com/stock_watson.

Companion Website

The Companion Website, found at www.pearsonhighered.com/stock_watson, provides a wide range of additional resources for students and faculty. These resources include more and more in-depth empirical exercises, data sets for the empirical exercises, replication files for empirical results reported in the text, practice quizzes, answers to end-of-chapter Review the Concepts questions and Exercises, and EViews tutorials.

MyEconLab

The third edition update is accompanied by a robust MyEconLab course. The MyEconLab course includes all the Review the Concepts questions as well as some Exercises and Empirical Exercises. In addition, the enhanced eText available in MyEconLab for the third edition update includes URL links from the Exercises and Empirical Exercises to questions in the MyEconLab course and to the data that accompanies them. To register for MyEconLab and to learn more, log on to www.myeconlab.com.

Acknowledgments

A great many people contributed to the first edition of this book. Our biggest debts of gratitude are to our colleagues at Harvard and Princeton who used early drafts of this book in their classrooms. At Harvard’s Kennedy School of Government, Suzanne Cooper provided invaluable suggestions and detailed comments on multiple drafts. As a coteacher with one of the authors (Stock), she also helped vet much of the material in this book while it was being developed for a required course for master’s students at the Kennedy School. We are also indebted to two other Kennedy School colleagues, Alberto Abadie and Sue Dynarski, for their patient explanations of quasi-experiments and the field of program evaluation and for their detailed comments on early drafts of the text. At Princeton, Eli Tamer taught from an early draft and also provided helpful comments on the penultimate draft of the book.

We also owe much to many of our friends and colleagues in econometrics who spent time talking with us about the substance of this book and who collectively made so many helpful suggestions. Bruce Hansen (University of Wisconsin–Madison) and Bo Honore (Princeton) provided helpful feedback on very early outlines and preliminary versions of the core material in Part II. Joshua Angrist (MIT) and Guido Imbens (University of California, Berkeley) provided thoughtful suggestions about our treatment of materials on program evaluation. Our presentation of the material on time series has benefited from discussions with Yacine Ait-Sahalia (Princeton), Graham Elliott (University of California, San Diego), Andrew Harvey (Cambridge University), and Christopher Sims (Princeton). Finally, many people made helpful suggestions on parts of the manuscript close to their area of expertise: Don Andrews (Yale), John Bound (University of Michigan), Gregory Chow (Princeton), Thomas Downes (Tufts), David Drukker (StataCorp.), Jean Baldwin Grossman (Princeton), Eric Hanushek (Hoover Institution), James Heckman (University of Chicago), Han Hong (Princeton), Caroline Hoxby (Harvard), Alan Krueger (Princeton), Steven Levitt (University of Chicago), Richard Light (Harvard), David Neumark (Michigan State University), Joseph Newhouse (Harvard), Pierre Perron (Boston University), Kenneth Warner (University of Michigan), and Richard Zeckhauser (Harvard).

Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the Standards and Assessments Division, California Department of Education. We are grateful to Charlie DePascale, Student Assessment Services, Massachusetts Department of Education, for his help with aspects of the Massachusetts test score data set. Christopher Ruhm (University of North Carolina, Greensboro) graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves thanks for putting together its data on racial discrimination in mortgage lending; we particularly thank Geoffrey Tootell for providing us with the updated version of the data set we use in Chapter 9 and Lynn Browne for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales, which we analyze in Chapter 12, and Alan Krueger (Princeton) for his help with the Tennessee STAR data that we analyze in Chapter 13.

We thank several people for carefully checking the page proof for errors. Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker, Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson worked through several chapters.

In the first edition, we benefited from the help of an exceptional development editor, Jane Tufts, whose creativity, hard work, and attention to detail improved the book in many ways, large and small. Pearson provided us with first-rate support, starting with our excellent editor, Sylvia Mallory, and extending through the entire publishing team. Jane and Sylvia patiently taught us a lot about writing, organization, and presentation, and their efforts are evident on every page of this book. We extend our thanks to the superb Pearson team, who worked with us on the second edition: Adrienne D’Ambrosio (senior acquisitions editor), Bridget Page (associate media producer), Charles Spaulding (senior designer), Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc., who handled the entire production process, Heather McNally (supplements coordinator), and Denise Clinton (editor-in-chief). Finally, we had the benefit of Kay Ueno’s skilled editing in the second edition. We are also grateful to the excellent third edition Pearson team of Adrienne D’Ambrosio, Nancy Fenton, and Jill Kolongowski, as well as Mary Sanger, the project manager with Nesbitt Graphics. We also wish to thank the Pearson team who worked on the third edition update: Christina Masturzo, Carolyn Philips, Liz Napolitano, and Heidi Allgair, project manager with Cenveo® Publisher Services.

We also received a great deal of help and suggestions from faculty, students, and researchers as we prepared the third edition and its update. The changes made in the third edition incorporate or reflect suggestions, corrections, comments, data, and help provided by a number of researchers and instructors: Donald Andrews (Yale University), Jushan Bai (Columbia), James Cobbe (Florida State University), Susan Dynarski (University of Michigan), Nicole Eichelberger (Texas Tech University), Boyd Fjeldsted (University of Utah), Martina Grunow, Daniel Hamermesh (University of Texas–Austin), Keisuke Hirano (University of Arizona), Bo Honore (Princeton University), Guido Imbens (Harvard University), Manfred Keil (Claremont McKenna College), David Laibson (Harvard University), David Lee (Princeton University), Brigitte Madrian (Harvard University), Jorge Marquez (University of Maryland), Karen Bennett Mathis (Florida Department of Citrus), Alan Mehlenbacher (University of Victoria), Ulrich Müller (Princeton University), Serena Ng (Columbia University), Harry Patrinos (World Bank), Zhuan Pei (Brandeis University), Peter Summers (Texas Tech University), Andrey Vasnov (University of Sydney), and Douglas Young (Montana State University). We also benefited from student input from F. Hoces dela Guardia and Carrie Wilson.

Thoughtful reviews for the third edition were prepared for Addison-Wesley by Steve DeLoach (Elon University), Jeffrey DeSimone (University of Texas at Arlington), Gary V. Engelhardt (Syracuse University), Luca Flabbi (Georgetown University), Steffen Habermalz (Northwestern University), Carolyn J. Heinrich (University of Wisconsin–Madison), Emma M. Iglesias-Vazquez (Michigan State University), Carlos Lamarche (University of Oklahoma), Vicki A. McCracken (Washington State University), Claudiney M. Pereira (Tulane University), and John T. Warner (Clemson University). We also received very helpful input on draft revisions of Chapters 7 and 10 from John Berdell (DePaul University), Janet Kohlhase (University of Houston), Aprajit Mahajan (Stanford University), Xia Meng (Brandeis University), and Chan Shen (Georgetown University).

Above all, we are indebted to our families for their endurance throughout this project. Writing this book took a long time, and for them, the project must have seemed endless. They, more than anyone else, bore the burden of this commitment, and for their help and support we are deeply grateful.

Introduction to Econometrics

Ask a half dozen econometricians what econometrics is, and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables, such as a firm’s sales, the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data. A fourth might tell you that it is the science and art of using historical data to make numerical, or quantitative, policy recommendations in government and business.

In fact, all these answers are right. At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology.

This book introduces you to the core set of methods used by econometricians. We will use these methods to answer a variety of specific, quantitative questions from the worlds of business and government policy. This chapter poses four of those questions and discusses, in general terms, the econometric approach to answering them. The chapter concludes with a survey of the main types of data available to econometricians for answering these and other quantitative economic questions.

1.1 Economic Questions We Examine

Many decisions in economics, business, and government hinge on understanding relationships among variables in the world around us. These decisions require quantitative answers to quantitative questions.

This book examines several quantitative questions taken from current issues in economics. Four of these questions concern education policy, racial bias in mortgage lending, cigarette consumption, and macroeconomic forecasting.

Chapter 1 Economic Questions and Data

Question #1: Does Reducing Class Size Improve Elementary School Education?

Proposals for reform of the U.S. public education system generate heated debate. Many of the proposals concern the youngest students, those in elementary schools. Elementary school education has various objectives, such as developing social skills, but for many parents and educators, the most important objective is basic academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools. With fewer students in the classroom, the argument goes, each student gets more of the teacher’s attention, there are fewer class disruptions, learning is enhanced, and grades improve.

But what, precisely, is the effect on elementary school education of reducing class size? Reducing class size costs money: It requires hiring more teachers and, if the school is already at capacity, building more classrooms. A decision maker contemplating hiring more teachers must weigh these costs against the benefits. To weigh costs and benefits, however, the decision maker must have a precise quantitative understanding of the likely benefits. Is the beneficial effect on basic learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning?

Although common sense and everyday experience may suggest that more learning occurs when there are fewer students, common sense cannot provide a quantitative answer to the question of what exactly is the effect on basic learning of reducing class size. To provide such an answer, we must examine empirical evidence—that is, evidence based on data—relating class size to basic learning in elementary schools.

In this book, we examine the relationship between class size and basic learning, using data gathered from 420 California school districts in 1999. In the California data, students in districts with small class sizes tend to perform better on standardized tests than students in districts with larger classes. While this fact is consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.

Question #2: Is There Racial Discrimination in the Market for Home Loans?

Most people buy their homes with the help of a mortgage, a large loan secured by the value of the home. By law, U.S. lending institutions cannot take race into account when deciding to grant or deny a request for a mortgage: Applicants who are identical in all ways except their race should be equally likely to have their mortgage applications approved. In theory, then, there should be no racial bias in mortgage lending.

In contrast to this theoretical conclusion, researchers at the Federal Reserve Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do these data indicate that, in practice, there is racial bias in mortgage lending? If so, how large is it?

The fact that more black than white applicants are denied in the Boston Fed data does not by itself provide evidence of discrimination by mortgage lenders because the black and white applicants differ in many ways other than their race. Before concluding that there is bias in the mortgage market, these data must be examined more closely to see if there is a difference in the probability of being denied for otherwise identical applicants and, if so, whether this difference is large or small. To do so, in Chapter 11 we introduce econometric methods that make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics, notably their ability to repay the loan.

Question #3: How Much Do Cigarette Taxes Reduce Smoking?

Cigarette smoking is a major public health concern worldwide. Many of the costs of smoking, such as the medical expenses of caring for those made sick by smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.

Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand.

If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity of demand to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes?
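The arithmetic behind this calculation can be sketched in a few lines of Python. This is only an illustration: the elasticity value of -0.5 is an assumption chosen for the example, not an estimate from the text, and the function name is hypothetical. The sketch relies on the approximation that the percentage change in quantity equals the elasticity times the percentage change in price.

```python
def required_price_increase(target_reduction_pct, elasticity):
    """Percentage price increase needed for a target percentage drop in quantity.

    Uses the approximation
        % change in quantity = elasticity * % change in price,
    which is most accurate for small changes.
    """
    if elasticity >= 0:
        raise ValueError("expected a negative price elasticity of demand")
    return -target_reduction_pct / elasticity


# Hypothetical elasticity, chosen only for illustration.
assumed_elasticity = -0.5
increase = required_price_increase(20.0, assumed_elasticity)
print(f"Price increase needed for a 20% reduction: {increase:.0f}%")
```

Under this assumed elasticity, a 20% reduction would call for a price increase of about 40%. A more elastic demand (an elasticity further from zero) would call for a smaller increase, which is why the numerical value matters for policy.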

Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity, we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices.

The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state, then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12, we study methods for handling this “simultaneous causality” and use those methods to estimate the price elasticity of cigarette demand.

Question #4: By How Much Will U.S. GDP Grow Next Year?

It seems that people always want a sneak preview of the future. What will sales be next year at a firm that is considering investing in new equipment? Will the stock market go up next month, and, if it does, by how much? Will city tax receipts next year cover planned expenditures on city services? Will your microeconomics exam next week focus on externalities or monopolies? Will Saturday be a nice day to go to the beach?

One aspect of the future in which macroeconomists are particularly interested is the growth of real economic activity, as measured by real gross domestic product (GDP), during the next year. A management consulting firm might advise a manufacturing client to expand its capacity based on an upbeat forecast of economic growth. Economists at the Federal Reserve Board in Washington, D.C., are mandated to set policy to keep real GDP near its potential in order to maximize employment. If they forecast anemic GDP growth over the next year, they might expand liquidity in the economy by reducing interest rates or other measures, in an attempt to boost economic activity.

Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster’s job is to predict the future by using the past, and econometricians do this by using economic theory and statistical techniques to quantify relationships in historical data.

The data we use to forecast the growth rate of GDP are past values of GDP and the “term spread” in the United States. The term spread is the difference between long-term and short-term interest rates. It measures, among other things, whether investors expect short-term interest rates to rise or fall in the future. The term spread is usually positive, but it tends to fall sharply before the onset of a recession. One of the GDP growth rate forecasts we develop and evaluate in Chapter 14 is based on the term spread.

Quantitative Questions, Quantitative Answers

Each of these four questions requires a numerical answer. Economic theory provides clues about that answer (for example, cigarette consumption ought to go down when the price goes up), but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.
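The point that a different set of data would produce a different numerical answer can be illustrated with a small simulation. The sketch below is not taken from the text: the simulated population (mean 650, standard deviation 20), the sample size of 400, and the helper name are all assumptions made for the example. It draws two samples from the same population and reports each sample mean together with its standard error, one common measure of how precise an estimate is.

```python
import random
import statistics


def mean_and_standard_error(sample):
    """Return the sample mean and the standard error of that mean."""
    n = len(sample)
    return statistics.mean(sample), statistics.stdev(sample) / n ** 0.5


rng = random.Random(42)

# Hypothetical population of test scores, simulated only for illustration.
population = [rng.gauss(650.0, 20.0) for _ in range(100_000)]

# Two different data sets drawn from the same population.
sample_a = rng.sample(population, 400)
sample_b = rng.sample(population, 400)

mean_a, se_a = mean_and_standard_error(sample_a)
mean_b, se_b = mean_and_standard_error(sample_b)
print(f"Sample A: mean = {mean_a:.2f}, standard error = {se_a:.2f}")
print(f"Sample B: mean = {mean_b:.2f}, standard error = {se_b:.2f}")
```

The two means differ even though both samples come from the same population, and the standard errors indicate roughly how large such differences are expected to be.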

The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant. For example, what effect does a change in class size have on test scores, holding constant or controlling for student characteristics (such as family income) that a school district administrator cannot control? What effect does your race have on your chances of having a mortgage application granted, holding constant other factors such as your ability to repay the loan? What effect does a 1% increase in the price of cigarettes have on cigarette consumption, holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions using data and for quantifying the uncertainty associated with those answers.

1.2 Causal Effects and Idealized Experiments

Like many other questions encountered in econometrics, the first three questions in Section 1.1 concern causal relationships among variables. In common usage, an action is said to cause an outcome if the outcome is the direct result, or consequence, of that action. Touching a hot stove causes you to get burned; drinking water causes you to be less thirsty; putting air in your tires causes them to inflate; putting fertilizer on your tomato plants causes them to produce more tomatoes. Causality means that a specific action (applying fertilizer) leads to a specific, measurable consequence (more tomatoes).

Estimation of Causal Effects

How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per square meter?

One way to measure this causal effect is to conduct an experiment. In that experiment, a horticultural researcher plants many plots of tomatoes. Each plot is tended identically, with one exception: Some plots get 100 grams of fertilizer per square meter, while the rest get none. Moreover, whether a plot is fertilized or not is determined randomly by a computer, ensuring that any other differences between the plots are unrelated to whether they receive fertilizer. At the end of the growing season, the horticulturalist weighs the harvest from each plot. The difference between the average yield per square meter of the treated and untreated plots is the effect on tomato production of the fertilizer treatment.

This is an example of a randomized controlled experiment. It is controlled in the sense that there are both a control group that receives no treatment (no fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It is randomized in the sense that the treatment is assigned randomly. This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment. If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).
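The logic of the tomato experiment can be sketched as a simulation. Everything numerical here is an assumption made for illustration (the baseline yield, the size of the sunniness effect, the noise, and the true fertilizer effect of 1.5 kg per square meter), and the function name is hypothetical. Treatment is assigned by a random draw that is unrelated to how sunny each plot is, and the effect is estimated as the difference between the average yields of treated and untreated plots.

```python
import random
import statistics


def simulate_experiment(n_plots, true_effect, seed=0):
    """Randomly assign fertilizer to plots and return the difference in mean yields."""
    rng = random.Random(seed)
    treated_yields, control_yields = [], []
    for _ in range(n_plots):
        sunniness = rng.gauss(0.0, 1.0)  # plot characteristic that also affects yield
        treated = rng.random() < 0.5     # random assignment, unrelated to sunniness
        plot_yield = (
            5.0                          # assumed baseline yield (kg per square meter)
            + 0.8 * sunniness
            + (true_effect if treated else 0.0)
            + rng.gauss(0.0, 0.5)        # other unsystematic influences
        )
        (treated_yields if treated else control_yields).append(plot_yield)
    return statistics.mean(treated_yields) - statistics.mean(control_yields)


estimate = simulate_experiment(n_plots=10_000, true_effect=1.5)
print(f"Estimated effect: {estimate:.2f} kg per square meter (assumed true effect: 1.50)")
```

Because assignment is random, sunny and shady plots end up in both groups in similar proportions, so with many plots the difference in means lands close to the assumed true effect.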

In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.

It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size, one can imagine randomly assigning “treatments” of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.

The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In practice, however, it is not possible to perform ideal experiments. In fact, experiments are relatively rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.

Forecasting and Causality

Although the first three questions in Section 1.1 concern causal effects, the fourth, forecasting the growth rate of GDP, does not. You do not need to know a causal relationship to make a good forecast. A good way to “forecast” whether it is raining is to observe whether pedestrians are using umbrellas, but the act of using an umbrella does not cause it to rain.

Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting. As we see in Chapter 14, multiple regression analysis allows us to quantify historical relationships suggested by economic theory, to check whether those relationships have been stable over time, to make quantitative forecasts about the future, and to assess the accuracy of those forecasts.

1.3 Data: Sources and Types

In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world. This book examines both experimental and nonexperimental data sets.

Experimental Versus Observational Data

Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect. For example, the state of Tennessee financed a large randomized controlled experiment examining class size in the 1980s. In that experiment, which we examine in Chapter 13, thousands of students were randomly assigned to classes of different sizes for several years and were given standardized tests annually.

The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances, experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are relatively rare. Instead, most economic data are obtained by observing real-world behavior.

Data obtained by observing actual behavior outside an experimental setting are called observational data. Observational data are collected using surveys, such as telephone surveys of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions.

Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these challenges. In the real world, levels of “treatment” (the amount of fertilizer in the tomato example, the student–teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the “treatment” from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for meeting the challenges encountered when real-world data are used to estimate causal effects.
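Why nonrandom “treatment” makes causal effects hard to isolate can be illustrated with simulated data. The sketch below is a stylized example, not the California data: the true effect of the student–teacher ratio (-1.0 points per student), the income effect, and the noise levels are all assumptions, and the function names are hypothetical. In the observational version, wealthier districts tend to have smaller classes and also higher scores for other reasons; in the randomized version, the ratio is unrelated to income.

```python
import random

TRUE_EFFECT = -1.0  # assumed change in test score per extra student per teacher


def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(n_districts, randomized, seed=1):
    """Simulate districts and return the slope of test score on student-teacher ratio."""
    rng = random.Random(seed)
    ratios, scores = [], []
    for _ in range(n_districts):
        income = rng.gauss(0.0, 1.0)  # standardized district income
        if randomized:
            # Idealized experiment: the ratio is unrelated to income.
            ratio = 20.0 + rng.gauss(0.0, 1.4)
        else:
            # Observational data: wealthier districts tend to have smaller classes.
            ratio = 20.0 - 1.0 * income + rng.gauss(0.0, 1.0)
        score = 650.0 + TRUE_EFFECT * ratio + 10.0 * income + rng.gauss(0.0, 5.0)
        ratios.append(ratio)
        scores.append(score)
    return ols_slope(ratios, scores)


observational = simulate_slope(20_000, randomized=False)
experimental = simulate_slope(20_000, randomized=True)
print(f"Assumed true effect:         {TRUE_EFFECT:.2f}")
print(f"Observational data slope:    {observational:.2f}")
print(f"Randomized assignment slope: {experimental:.2f}")
```

In the observational version, the simple slope mixes the effect of class size with the effect of income and greatly overstates the benefit of smaller classes. With random assignment, the slope is close to the assumed true effect.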

Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book, you will encounter all three types.

Cross-Sectional Data

Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross sectional. Those data are for 420 entities (school districts) for a single time period (1999). In general, the number of entities on which we have observations is denoted n; so, for example, in the California data set, n = 420.

The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district. For example, the average test score for the first district (“district #1”) is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1999 on a standardized test (the Stanford Achievement Test). The average student–teacher ratio in that district is 17.89; that is, the number of students in district #1 divided by the number of classroom teachers in district #1 is 17.89. Average expenditure per pupil in district #1 is $6385. The percentage of students in that district still learning English (that is, the percentage of students for whom English is a second language and who are not yet proficient in English) is 0%.

Table 1.1 Selected Observations on Test Scores and Other Variables for California School Districts in 1999

Observation   District Average   Student–Teacher   Expenditure     Percentage of
(District)    Test Score         Ratio             per Pupil ($)   Students Learning
Number        (Fifth Grade)                                        English
1             690.8              17.89             $6385           0.0%
2             661.2              21.52             5099            4.6
3             643.6              18.70             5502            30.0
4             647.7              17.36             7102            0.0
5             640.8              18.67             5236            13.9
.             .                  .                 .               .
418           645.0              21.89             4403            24.3
419           672.2              20.20             4776            3.0
420           655.8              19.04             5993            5.0

Note: The California test score data set is described in Appendix 4.1.

The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.

With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.
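One way to picture a cross-sectional data set is as a list of records, one per entity, all for the same time period. The sketch below stores the eight rows printed in Table 1.1; the class and field names are hypothetical choices made for this example, and the remaining 412 districts are not shown in the text, so they are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistrictObservation:
    """One row of a cross-sectional data set: one district, one time period (1999)."""
    number: int                    # arbitrarily assigned observation number
    test_score: float              # district average fifth-grade test score
    student_teacher_ratio: float
    expenditure_per_pupil: int     # dollars
    pct_learning_english: float    # percent


# The eight rows printed in Table 1.1; the full data set has n = 420 districts.
TABLE_1_1_ROWS = [
    DistrictObservation(1, 690.8, 17.89, 6385, 0.0),
    DistrictObservation(2, 661.2, 21.52, 5099, 4.6),
    DistrictObservation(3, 643.6, 18.70, 5502, 30.0),
    DistrictObservation(4, 647.7, 17.36, 7102, 0.0),
    DistrictObservation(5, 640.8, 18.67, 5236, 13.9),
    DistrictObservation(418, 645.0, 21.89, 4403, 24.3),
    DistrictObservation(419, 672.2, 20.20, 4776, 3.0),
    DistrictObservation(420, 655.8, 19.04, 5993, 5.0),
]

# Cross-sectional comparisons look at differences across entities at one point in time.
smallest_classes = min(TABLE_1_1_ROWS, key=lambda d: d.student_teacher_ratio)
largest_classes = max(TABLE_1_1_ROWS, key=lambda d: d.student_teacher_ratio)
print(f"Lowest ratio:  district #{smallest_classes.number}, score {smallest_classes.test_score}")
print(f"Highest ratio: district #{largest_classes.number}, score {largest_classes.test_score}")
```

Comparing two rows in this way only describes the data. As discussed in Section 1.1, districts differ in many respects besides class size, so such a comparison is not by itself an estimate of a causal effect.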

Time Series Data

Time series data are data for a single entity (person, firm, country) collected at multiple time periods. Our data set on the growth rate of GDP and the term spread in the United States is an example of a time series data set. The data set contains observations on two variables (the growth rate of GDP and the term spread) for a single entity (the United States) for 213 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the first quarter of 1960, which is denoted 1960:Q1, and end in the first quarter of 2013 (2013:Q1). The number of observations (that is, time periods) in a time series data set is denoted T. Because there are 213 quarters from 1960:Q1 to 2013:Q1, this data set contains T = 213 observations.

Table 1.2 Selected Observations on the Growth Rate of GDP and the Term Spread in the United States: Quarterly Data, 1960:Q1–2013:Q1

Observation   Date             GDP Growth Rate         Term Spread
Number        (year:quarter)   (% at an annual rate)   (% per year)
1             1960:Q1          8.8%                    0.6%
2             1960:Q2          −1.5                    1.3
3             1960:Q3          1.0                     1.5
4             1960:Q4          −4.9                    1.6
5             1961:Q1          2.7                     1.4
.             .                .                       .
211           2012:Q3          2.7                     1.5
212           2012:Q4          0.1                     1.6
213           2013:Q1          1.1                     1.9

Note: The United States GDP and term spread data set is described in Appendix 14.1.

Some observations in this data set are listed in Table 1.2. The data in each row correspond to a different time period (year and quarter). In the first quarter of 1960, for example, GDP grew 8.8% at an annual rate. In other words, if GDP had continued growing for four quarters at its rate during the first quarter of 1960, the level of GDP would have increased by 8.8%. In the first quarter of 1960, the long-term interest rate was 4.5%, the short-term interest rate was 3.9%, so their difference, the term spread, was 0.6%.
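The numbers in this paragraph can be checked with a short sketch. The function names are hypothetical. The conversion between quarterly and annual growth rates assumes that the annual rate is obtained by compounding the quarterly rate over four quarters, which matches the verbal description above but is an assumption about the exact convention; a logarithmic approximation is another common choice and would give a slightly different figure.

```python
def quarters_between(start_year, start_quarter, end_year, end_quarter):
    """Number of quarterly observations from the start date to the end date, inclusive."""
    return (end_year - start_year) * 4 + (end_quarter - start_quarter) + 1


def term_spread(long_rate_pct, short_rate_pct):
    """Term spread: long-term minus short-term interest rate, in percentage points."""
    return long_rate_pct - short_rate_pct


def quarterly_rate_from_annual(annual_rate_pct):
    """Quarterly growth rate that compounds to the given annual rate over four quarters."""
    return ((1.0 + annual_rate_pct / 100.0) ** 0.25 - 1.0) * 100.0


T = quarters_between(1960, 1, 2013, 1)
spread_1960q1 = term_spread(4.5, 3.9)
quarterly_growth_1960q1 = quarterly_rate_from_annual(8.8)

print(f"T = {T} observations")
print(f"Term spread in 1960:Q1: {spread_1960q1:.1f}%")
print(f"Implied growth within the quarter: {quarterly_growth_1960q1:.2f}%")
```

The count reproduces T = 213 and the spread reproduces the 0.6% in the first row of Table 1.2. Under the compounding assumption, growth of 8.8% at an annual rate corresponds to growth of a little over 2% within the quarter itself.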

By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.
