ECON 2311Q: Econometrics I
Instructor: Rui Sun
Due: 10/30/2019
Homework Assignment 2
1. The data file Growth contains data on average growth rates from 1960 through 1995 for 65 countries, along with variables that are potentially related to growth. In this exercise, you will investigate the relationship between growth and trade.
(a) Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare). Does there appear to be a relationship between the variables?
(b) One country, Malta, has a trade share much larger than those of the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?
(c) Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is the estimated intercept? Use the regression to predict the growth rate for a country with a trade share of 0.5 and with a trade share equal to 1.0.
(d) Estimate the same regression, excluding the data from Malta. Answer the same questions as in (c).
(e) Plot the estimated regression functions from (c) and (d). Using the scatterplot in (a), explain why the regression function that includes Malta is steeper than the regression function that excludes Malta.
(f) Where is Malta? Why is Malta's trade share so large? Should Malta be included in or excluded from the analysis?
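Parts (c) and (d) ask for the usual least-squares estimates. These are β̂1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and β̂0 = Ȳ - β̂1X̄, and the prediction at a trade share x is β̂0 + β̂1x. The assignment itself is to be done in Stata. The Python sketch below only illustrates the arithmetic, on a small made-up sample whose last point has a much larger X than the others. That point loosely mimics the role Malta plays in part (e). The data values and the helper names `ols_fit` and `predict` are hypothetical and are not taken from the Growth file.

```python
"""Hand-computed OLS for a single-regressor model Y = b0 + b1*X + u.

The numbers are made-up toy values, NOT the Growth data.
"""


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Fitted value of the regression line at x_new."""
    return intercept + slope * x_new


# Hypothetical toy sample: the last point has a much larger X than the rest.
x_all = [0.2, 0.4, 0.5, 0.6, 0.8, 2.0]
y_all = [1.0, 2.0, 1.5, 2.5, 2.0, 6.0]

b0_all, b1_all = ols_fit(x_all, y_all)              # all observations
b0_sub, b1_sub = ols_fit(x_all[:-1], y_all[:-1])    # extreme point dropped

for label, b0, b1 in [("all points", b0_all, b1_all),
                      ("extreme point dropped", b0_sub, b1_sub)]:
    preds = [round(predict(b0, b1, t), 3) for t in (0.5, 1.0)]
    print(f"{label}: slope={b1:.3f}, intercept={b0:.3f}, "
          f"predictions at 0.5 and 1.0: {preds}")
```

With the extreme point included, the slope is about 2.70; without it, the slope is 1.75. The OLS slope can be written as a weighted average of the slopes (Yi - Ȳ)/(Xi - X̄), with weights proportional to (Xi - X̄)². A single point far out on the X axis therefore gets a large weight and can pull the fitted line toward itself.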
2. The data file Earnings_and_Height contains data on earnings, height, and other characteristics of a random sample of U.S. workers. In this exercise, you will investigate the relationship between earnings and height.
(a) What is the median value of height in the sample?
(b) i. Estimate average earnings for workers whose height is at most 67 inches.
   ii. Estimate average earnings for workers whose height is greater than 67 inches.
   iii. On average, do taller workers earn more than shorter workers? How much more? What is a 95% confidence interval for the difference in average earnings?
(c) Construct a scatterplot of annual earnings (Earnings) on height (Height). Notice that the points on the plot fall along horizontal lines. (There are only 23 distinct values of Earnings.) Why? (Hint: Carefully read the detailed data description.)
(d) Run a regression of Earnings on Height.
   i. What is the estimated slope?
   ii. Use the estimated regression to predict earnings for a worker who is 67 inches tall, for a worker who is 70 inches tall, and for a worker who is 65 inches tall.
(e) Suppose height were measured in centimeters instead of inches. Answer the following questions about the Earnings on Height (in cm) regression.
   i. What is the estimated slope of the regression?
   ii. What is the estimated intercept?
   iii. What is the R²?
   iv. What is the standard error of the regression?
(f) Run a regression of Earnings on Height, using data for female workers only.
   i. What is the estimated slope?
   ii. A randomly selected woman is 1 inch taller than the average woman in the sample. Would you predict her earnings to be higher or lower than the average earnings for women in the sample? By how much?
(g) Repeat (f) for male workers.
(h) Do you think that height is uncorrelated with other factors that cause earnings? That is, do you think that the regression error term, say u_i, has a conditional mean of zero given Height (x_i)?
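Part (e) turns on what happens when the regressor is rescaled. Suppose every Xi is multiplied by a constant c (here c = 2.54, since 1 inch = 2.54 cm). The fitted values and residuals do not change, so the estimated slope is divided by c, while the intercept, R² and standard error of the regression (SER) stay the same. The Python sketch below checks this numerically on made-up heights and earnings, not the Earnings_and_Height data. The function name `ols_stats` is hypothetical, and the SER uses the n - 2 degrees-of-freedom correction.

```python
import math


def ols_stats(x, y):
    """Return intercept, slope, R^2 and SER for the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ssr = sum(e ** 2 for e in resid)           # sum of squared residuals
    tss = sum((b - y_bar) ** 2 for b in y)     # total sum of squares
    r2 = 1 - ssr / tss
    ser = math.sqrt(ssr / (n - 2))             # degrees-of-freedom correction n - 2
    return intercept, slope, r2, ser


# Made-up heights (inches) and annual earnings; NOT the Earnings_and_Height data.
height_in = [62, 64, 66, 67, 69, 71, 73]
earnings = [38000, 41000, 40000, 45000, 47000, 46000, 52000]
height_cm = [2.54 * h for h in height_in]      # 1 inch = 2.54 cm

b0_in, b1_in, r2_in, ser_in = ols_stats(height_in, earnings)
b0_cm, b1_cm, r2_cm, ser_cm = ols_stats(height_cm, earnings)

print(f"slope per inch {b1_in:.2f}, per cm {b1_cm:.2f}, ratio {b1_in / b1_cm:.4f}")
print(f"intercept {b0_in:.2f} vs {b0_cm:.2f}")
print(f"R^2 {r2_in:.4f} vs {r2_cm:.4f}; SER {ser_in:.2f} vs {ser_cm:.2f}")
```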
3. The data file Birthweight_Smoking contains data for a random sample of babies born in Pennsylvania in 1989. The data include the baby's birth weight together with various characteristics of the mother, including whether she smoked during the pregnancy. In this exercise, you will investigate the relationship between birth weight and smoking during pregnancy.
(a) In the sample:
   i. What is the average value of Birthweight for all mothers?
   ii. For mothers who smoke?
   iii. For mothers who do not smoke?
(b) i. Use the data in the sample to estimate the difference in average birth weight for smoking and nonsmoking mothers.
   ii. What is the standard error for the estimated difference in (i)?
   iii. Construct a 95% confidence interval for the difference in the average birth weight for smoking and nonsmoking mothers.
(c) Run a regression of Birthweight on the binary variable Smoker.
   i. Explain how the estimated slope and intercept are related to your answers in parts (a) and (b).
   ii. Explain how s.e.(β̂1) is related to your answer in (b)(ii).
   iii. Construct a 95% confidence interval for the effect of smoking on birth weight.
(d) Do you think smoking is uncorrelated with other factors that cause low birth weight? That is, do you think that the regression error term, say u_i, has a conditional mean of zero given Smoker (x_i)?
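Parts (b) and (c) measure the same gap in two ways. The difference in sample means, Ȳ1 - Ȳ0, has standard error √(s1²/n1 + s0²/n0), and a large-sample 95% confidence interval is the estimate ± 1.96 standard errors. Regressing Y on a single 0/1 regressor gives an intercept equal to the mean of the group coded 0 and a slope equal to that same difference. The Python sketch below illustrates both on a few made-up birth weights, not the Birthweight_Smoking data. The helper names `diff_in_means` and `ols_binary` are hypothetical.

```python
import math
import statistics


def diff_in_means(y1, y0):
    """Difference in sample means (group 1 minus group 0), its SE, and a 95% CI.

    SE uses the unequal-variance formula sqrt(s1^2/n1 + s0^2/n0); the CI uses
    the large-sample normal critical value 1.96.
    """
    d = statistics.mean(y1) - statistics.mean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1)
                   + statistics.variance(y0) / len(y0))
    return d, se, (d - 1.96 * se, d + 1.96 * se)


def ols_binary(x, y):
    """OLS of y on a 0/1 regressor x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


# Made-up birth weights in grams; NOT the Birthweight_Smoking data.
smokers = [3000, 3100, 2900, 3250, 3050]
nonsmokers = [3400, 3300, 3500, 3450, 3350, 3600]

diff, se, ci = diff_in_means(smokers, nonsmokers)
smoker = [1] * len(smokers) + [0] * len(nonsmokers)   # 1 = smoker, 0 = nonsmoker
b0, b1 = ols_binary(smoker, smokers + nonsmokers)

print(f"difference {diff:.2f}, SE {se:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"regression: intercept {b0:.2f}, slope {b1:.2f}")
```

The match between the slope and the difference in means is an algebraic property of OLS with one binary regressor. It therefore holds for any sample, not just these toy numbers.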
Instructions
(1) Please hand in a hardcopy of your .log file, which contains your Stata code and your answers to the questions, before class on 10/30/2019. Please make sure you number each question in your .do file.
(2) You can either include your answers to the questions in your .do file as comments (everything you include in the .do file is also recorded in the .log file) or answer them separately. If your answers are in your .log file, you only need to print out the .log file; otherwise, you need to submit both the .log file and your answers to the questions.
(3) To create the .log file, put the command that opens the log at the beginning of your .do file, for example “log using hw2_FirstLastName.log, replace”, and put “log close” at the end of your .do file.
Documentation for Growth Data
Growth contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, “Finance and the Sources of Growth,” Journal of Financial Economics, 2000, Vol. 58, pp. 261-300.
Variable Definitions
Variable        Description
Country_name    Name of country
growth          Average annual percentage growth of real Gross Domestic Product (GDP) from 1960 to 1995
rgdp60          The value of GDP per capita in 1960, converted to 1960 US dollars
tradeshare      The average share of trade in the economy from 1960 to 1995, measured as the sum of exports plus imports, divided by GDP; that is, the average value of (X + M)/GDP from 1960 to 1995, where X = exports and M = imports (both X and M are positive)
yearsschool     Average number of years of schooling of adult residents in that country in 1960
rev_coups       Average annual number of revolutions, insurrections (successful or not), and coups d'état in that country from 1960 to 1995
assasinations   Average annual number of political assassinations in that country from 1960 to 1995 (per million population)
oil             = 1 if oil accounted for at least half of exports in 1960; = 0 otherwise
Documentation for Earnings_and_Height
These data are taken from the US National Health Interview Survey for 1994. They are a subset of the data used in Anne Case and Christina Paxson’s paper “Stature and Status: Height, Ability, and Labor Market Outcomes,” Journal of Political Economy, 2008, 116(3): 499-532, and were graciously supplied by the authors for empirical exercises in the Stock-Watson textbook.
The dataset contains information on 17,870 workers. The following table describes the variables.
Variable     Description
age          Age, in years
cworker      Class of worker:
             1 = Private company employee
             2 = Federal government employee
             3 = State government employee
             4 = Local government employee
             5 = Incorporated business employee
             6 = Self-employed
earnings     Annual labor earnings, expressed in $2012 (see footnote 1)
educ         Years of education
height       Height without shoes (in inches)
mrd          Marital status:
             1 = Married, spouse in household
             2 = Married, spouse not in household
             3 = Widowed
             4 = Divorced
             5 = Separated
             6 = Never married
occupation   Occupation, in 15 categories:
             1 = Exec/Manager
             2 = Professionals
             3 = Technicians
             4 = Sales
             5 = Administrative
             6 = Household service
             7 = Protective service
             8 = Other service
             9 = Farming
             10 = Mechanics
             11 = Construction/Mining
             12 = Precision production
             13 = Machine operator
             14 = Transport
             15 = Laborer
race         Race/ethnicity:
             1 = non-Hispanic white
             2 = non-Hispanic black
             3 = Hispanic
             4 = other
region       Region of the U.S.:
             1 = Northeast
             2 = Midwest
             3 = South
             4 = West
sex          Sex: 1 = Male, 0 = Female
weight       Weight without shoes (in pounds)
1 In the survey, labor earnings are reported in 23 brackets (for example, $26,000-$30,000). For each of these brackets, Professors Case and Paxson estimated a value of average earnings based on information in the Current Population Survey, and these average values were assigned to all workers with incomes in the corresponding bracket. The earnings values for 1994 were converted to $2012 using the consumer price index.
Documentation for Birthweight_Smoking
The data file Birthweight_Smoking is from the 1989 linked National Natality-Mortality Detail files, which contain a census of infant births and deaths. The data in bw_smoking.data are for births in Pennsylvania in 1989.
These data were provided by Professors Douglas Almond, Kenneth Chay, and David Lee and are a subset of the data used in their paper “The Costs of Low Birth Weight,” Quarterly Journal of Economics, August 2005, 120(3): 1031-1083. The file contains 3,000 observations on the variables described below.
Variable     Description
birthweight  Birth weight of infant (in grams)
smoker       Indicator equal to 1 if the mother smoked during pregnancy and 0 otherwise
age          Age
educ         Years of educational attainment (more than 16 years coded as 17)
unmarried    Indicator = 1 if mother is unmarried
alcohol      Indicator = 1 if mother drank alcohol during pregnancy
drinks       Number of drinks per week
tripre1      Indicator = 1 if 1st prenatal care visit in 1st trimester
tripre2      Indicator = 1 if 1st prenatal care visit in 2nd trimester
tripre3      Indicator = 1 if 1st prenatal care visit in 3rd trimester
tripre0      Indicator = 1 if no prenatal visits
nprevist     Total number of prenatal visits