ECON 140A 2020 Summer Exam 3

Note: Please show your work. You will lose points if your work/reasoning is incomplete.

Problem 1

Suppose your colleague works for an NGO which is considering a program to provide remote supplemental tutoring to students in rural areas. To estimate the potential effectiveness of the program, your colleague plans to use information from a prior survey which includes student test scores and whether they had a tutor (this survey was conducted prior to any NGO intervention). He decides to use the model below to assess the impact of tutoring on test scores:

$$Y_i = \alpha_0 + \alpha_1 D_i,$$

where $Y_i$ denotes student $i$'s test score and $D_i$ is a binary variable indicating whether student $i$ had a tutor. For convenience, denote $y_i(1)$ as student $i$'s potential test score had they had a tutor and $y_i(0)$ as student $i$'s potential test score had they not had a tutor. Also, denote $n_1$ as the number of students who had a tutor and $n_0$ as the number of students who did not have a tutor.

He estimates the model and finds that $\hat{\alpha}_1 = 4.53$ and $\widehat{\mathrm{Var}}(\hat{\alpha}_1) = 1.32$.

(a) (2 points) Assume the requirements for the CLT are satisfied. Test the hypotheses

$$H_0: \alpha_1 = 0 \quad \text{vs.} \quad H_1: \alpha_1 \neq 0.$$

(b) (1 point) State the mathematical definition of the average treatment effect on the treated (ATT) using the notation provided.

(c) (1 point) For student $i$, can we observe both $y_i(0)$ and $y_i(1)$? Why or why not?

(d) (2 points) Show that

$$\hat{\alpha}_1 = ATT + \frac{1}{n_1} \sum_{i: D_i = 1} y_i(0) - \frac{1}{n_0} \sum_{i: D_i = 0} y_i(0).$$

You may take the following as given:

$$\hat{\alpha}_1 = \frac{1}{n_1} \sum_{i: D_i = 1} Y_i - \frac{1}{n_0} \sum_{i: D_i = 0} Y_i.$$

(e) (2 points) Can you claim that if the NGO randomly assigns a tutor to a student, the student's score will increase by 4.53 in expectation? Why or why not? Explain with one example.

(f) (3 points) Give one possible omitted variable which could result in $\hat{\alpha}_1$ being overestimated relative to the true causal effect. Explain your answer in words and justify your answer using the formula from (d).
(g) (3 points) Give one possible omitted variable which could result in $\hat{\alpha}_1$ being underestimated relative to the true causal effect. Explain your answer in words and justify your answer using the formula from (d).

Instead of using survey data, your colleague and the NGO decide to run a small experiment. He randomly selects $n$ students from two schools. Half of them (the treatment group) are randomly assigned to receive a one-hour private tutoring session every week, while the other half (the control group) receive nothing. To assess the effectiveness of remote tutoring, he uses a similar regression model:

$$Y_i = \beta_0 + \beta_1 D_i,$$

where $Y_i$ denotes student $i$'s test score and $D_i$ is a binary variable indicating whether student $i$ is assigned to the treatment group, which receives remote tutoring.

(g) (2 points) Describe the "attribution problem". Is the attribution problem likely to pose a threat to the results of this experiment? You may use the students' schools as an example in your argument.

(h) (2 points) Why can we use the OLS regression coefficient $\beta_1$ to identify the ATT? What problem does a randomized experiment solve? Use the result of part (d) to explain your answer.

Problem 2

Suppose you are conducting an experiment to study the effect of media on people's political preferences. You randomly assign people to two groups: group 1 receives a pro-Democrat newspaper every day, and group 2 receives a pro-Republican newspaper every day. Before the experiment, you survey each subject to determine which party, Democrat or Republican, they prefer (they are asked to choose one of the two). The initial preference is denoted as $X_i$, where $X_i = 0$ if $i$ prefers Republican and $X_i = 1$ if $i$ prefers Democrat. After 4 weeks, you survey all the subjects again and ask them to state their final preferences, $Y_i$, where $Y_i = 0$ if $i$ prefers Republican and $Y_i = 1$ if $i$ prefers Democrat. The final preferences ($Y_i$) are summarized by treatment group in the table below.
                 Treatment Group
Preference       Group 1    Group 2
Prefer Dem.        130         60
Prefer Rep.         70        140

You plan to use the following regression model to estimate the treatment effect of the newspaper:

$$Y_i = \beta_0 + \beta_1 D_i,$$

where $D_i$ is a binary variable that denotes whether subject $i$ is assigned to group 1.

(a) (2 points) Find $\hat{\beta}_1$ and $\widehat{\mathrm{Var}}(\hat{\beta}_1)$.

(b) (2 points) Assume the requirements for the CLT are satisfied. Test the hypotheses

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0.$$

(c) (2 points) Interpret $\hat{\beta}_1$. Carefully explain what the result you find represents.

(d) (2 points) One researcher sees your study and argues that your result might not be robust due to the attribution problem in the initial preferences prior to the experiment. How can you defend against this issue? Propose a possible analysis that can answer this question, and explain why this can help you defend your finding.
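
With a single binary regressor, the OLS slope in Problem 2 equals the difference in group means of $Y_i$, so parts (a) and (b) can be checked numerically from the table counts alone. The Python sketch below is only an illustration under stated assumptions, not the official solution. The function name `diff_in_means` is hypothetical, and the variance uses the plug-in Bernoulli formula $\hat{p}(1-\hat{p})/n$ for each group, allowing unequal variances. The course may instead use an $n-1$ denominator or a pooled (homoskedastic) variance, which would change the numbers slightly.

```python
import math


def diff_in_means(dem_g1, rep_g1, dem_g2, rep_g2):
    """Difference in the share preferring Democrat (Y = 1) between group 1 and group 2.

    Assumption: the variance of each group mean is estimated by the plug-in
    Bernoulli formula p * (1 - p) / n, and the two groups are independent.
    """
    n1 = dem_g1 + rep_g1          # size of group 1 (D = 1)
    n0 = dem_g2 + rep_g2          # size of group 2 (D = 0)
    p1 = dem_g1 / n1              # sample mean of Y in group 1
    p0 = dem_g2 / n0              # sample mean of Y in group 2

    beta1_hat = p1 - p0           # OLS slope on a binary regressor
    var_hat = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0
    t_stat = beta1_hat / math.sqrt(var_hat)
    return beta1_hat, var_hat, t_stat


# Counts taken from the table in Problem 2.
beta1_hat, var_hat, t_stat = diff_in_means(130, 70, 60, 140)

# Two-sided test at the 5% level using the normal critical value 1.96.
reject_h0 = abs(t_stat) > 1.96

print(f"beta1_hat = {beta1_hat:.4f}")
print(f"var_hat   = {var_hat:.7f}")
print(f"t_stat    = {t_stat:.3f}")
print(f"reject H0 at 5%: {reject_h0}")
```

The same t-statistic construction, the estimate divided by the square root of its estimated variance, applies to Problem 1 (a), where the estimate and its variance are given directly.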