Note: (1) Homework should be submitted in pdf/word format generated from RMarkdown. (2) Please include your answers, analysis, code, reasoning and the key steps, for instance the tables/plots produced by R. Simply writing down the solution earns 0 points. (3) Plagiarism is not accepted. Any similar homework will get zero points.

Q1 [15pt]
Suppose you want to estimate the seasonal effect on the revenue. There is a constant term included in the regression as usual. How many dummies are needed to perform such analysis?

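I need 3 dummies for the 4 seasons. Because the regression includes a constant, one season serves as the base category; including all 4 dummies would make them perfectly collinear with the constant (the dummy variable trap).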
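As a quick check of the count, the sketch below builds a hypothetical four-level season factor (the `season` variable and its level names are illustrative, not part of the assignment) and lets R construct the design matrix. With an intercept, R drops one level as the base category and keeps three dummy columns.

```r
# Hypothetical season variable with four levels (illustrative data only)
season <- factor(c("spring", "summer", "fall", "winter", "spring", "fall"),
                 levels = c("spring", "summer", "fall", "winter"))

# model.matrix() builds the regressors lm() would use for revenue ~ season
X <- model.matrix(~ season)

colnames(X)               # intercept plus one dummy per non-base season
n_dummies <- ncol(X) - 1  # dummies left after dropping the base category
n_dummies
```

Here spring is the base category only because it is listed first in `levels`; any season could play that role.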

Q2 [20pt]
Use the data in gpa2 and GPA2_description for this exercise.

library(readxl)
Q1_data <- read_excel(path = "gpa2.xls", sheet = 1, col_names = FALSE)

## New names:
## * `` -> `..1`
## * `` -> `..2`
## * `` -> `..3`
## * `` -> `..4`
## * `` -> `..5`
## * … and 7 more

colnames(Q1_data) <- c("sat", "tothrs", "colgpa", "athlete", "verbmath", "hsize", "hsrank", "hsperc", "female", "white", "black", "hsizesq")

1. Using all observations, regress colgpa on hsperc and sat.

library(stargazer)
Q1_model1 <- lm(colgpa ~ hsperc + sat, data = Q1_data)
stargazer(Q1_model1, type = "text")

##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                               colgpa
## -----------------------------------------------
## hsperc                       -0.014***
##                               (0.001)
##
## sat                          0.001***
##                              (0.0001)
##
## Constant                     1.392***
##                               (0.072)
##
## -----------------------------------------------
## Observations                   4,137
## R2                             0.273
## Adjusted R2                    0.273
## Residual Std. Error      0.562 (df = 4134)
## F Statistic          777.917*** (df = 2; 4134)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

summary(Q1_model1)

##
## Call:
## lm(formula = colgpa ~ hsperc + sat, data = Q1_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.6007 -0.3581  0.0329  0.3963  1.7599
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.392e+00  7.154e-02   19.45   <2e-16 ***
## hsperc      -1.352e-02  5.495e-04  -24.60   <2e-16 ***
## sat          1.476e-03  6.531e-05   22.60   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5615 on 4134 degrees of freedom
## Multiple R-squared:  0.2734, Adjusted R-squared:  0.2731
## F-statistic: 777.9 on 2 and 4134 DF,  p-value: < 2.2e-16

2. Reestimate the model using only the first 2,070 observations.

Q1_model2 <- lm(colgpa ~ hsperc + sat, data = Q1_data[1:2070, ])
stargazer(Q1_model2, type = "text")

##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                               colgpa
## -----------------------------------------------
## hsperc                       -0.013***
##                               (0.001)
##
## sat                          0.001***
##                              (0.0001)
##
## Constant                     1.436***
##                               (0.098)
##
## -----------------------------------------------
## Observations                   2,070
## R2                             0.283
## Adjusted R2                    0.282
## Residual Std. Error      0.539 (df = 2067)
## F Statistic          407.392*** (df = 2; 2067)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

summary(Q1_model2)

##
## Call:
## lm(formula = colgpa ~ hsperc + sat, data = Q1_data[1:2070, ])
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.28027 -0.34910  0.04051  0.38046  1.69464
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.436e+00  9.778e-02   14.69   <2e-16 ***
## hsperc      -1.275e-02  7.185e-04  -17.74   <2e-16 ***
## sat          1.468e-03  8.858e-05   16.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5395 on 2067 degrees of freedom
## Multiple R-squared:  0.2827, Adjusted R-squared:  0.282
## F-statistic: 407.4 on 2 and 2067 DF,  p-value: < 2.2e-16

3. Find the ratio of the standard errors on hsperc from 1. and 2. What do you find? Why?

summary(Q1_model1)$coefficients[2, 2]/summary(Q1_model2)$coefficients[2, 2]

## [1] 0.7647155

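The ratio is about 0.76: the standard error on hsperc from 1. is smaller because the sample is roughly twice as large. Standard errors shrink at about the rate 1/sqrt(n), so doubling the sample suggests a ratio near sqrt(2070/4137), which is about 0.71. The observed ratio differs somewhat because the residual variance and the sample variation in hsperc are not identical in the two samples.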
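The link between sample size and the standard error can be checked with a short calculation. This is only a rule-of-thumb sketch: it assumes the standard error scales like 1/sqrt(n) and ignores differences in residual variance between the two samples.

```r
# Sample sizes taken from the two regressions above
n_full <- 4137
n_half <- 2070

# If standard errors shrink like 1/sqrt(n), the ratio se_full / se_half
# should be close to sqrt(n_half / n_full)
expected_ratio <- sqrt(n_half / n_full)
observed_ratio <- 0.7647155   # value printed by the ratio computed above

round(expected_ratio, 3)
observed_ratio - expected_ratio   # gap left unexplained by sample size alone
```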

4. Add female, verbmath and their interaction terms into the regression using all observations.

Q1_model3 <- lm(colgpa ~ hsperc + sat + female + verbmath + female * verbmath, data = Q1_data)
stargazer(Q1_model3, type = "text")

##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                               colgpa
## -----------------------------------------------
## hsperc                       -0.013***
##                               (0.001)
##
## sat                          0.002***
##                              (0.0001)
##
## female                         0.143
##                               (0.106)
##
## verbmath                      -0.064
##                               (0.084)
##
## female:verbmath                0.011
##                               (0.118)
##
## Constant                     1.243***
##                               (0.102)
##
## -----------------------------------------------
## Observations                   4,137
## R2                             0.285
## Adjusted R2                    0.285
## Residual Std. Error      0.557 (df = 4131)
## F Statistic          330.074*** (df = 5; 4131)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

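None of the three added terms (being a female student, the ratio of verbal to math score on the SAT, and their interaction) is individually significant for GPA after fall semester. Combined SAT score and high school percentile do have a significant impact. The individual t-tests should be read with care, though: with the interaction included, the female coefficient is the female effect at verbmath = 0, and the three terms are correlated with one another. R2 rises from 0.273 to 0.285 with 4,137 observations, which suggests the added terms are jointly significant even though none is individually.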
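The t-tests in the table examine each added term separately. A joint test of the three added terms can be approximated from the R2 values of the two models. The sketch below uses the R-squared form of the F statistic with the rounded values from the stargazer tables, so the result is approximate.

```r
# Rounded values read off the stargazer tables above (assumption: rounding
# error in R2 is small enough not to change the conclusion)
r2_restricted   <- 0.273   # colgpa ~ hsperc + sat
r2_unrestricted <- 0.285   # adds female, verbmath, female:verbmath
q      <- 3                # number of added terms
df_res <- 4131             # residual degrees of freedom of the larger model

# R-squared form of the F statistic for q exclusion restrictions
f_stat <- ((r2_unrestricted - r2_restricted) / q) /
  ((1 - r2_unrestricted) / df_res)
p_val  <- pf(f_stat, q, df_res, lower.tail = FALSE)

f_stat
p_val
```

With the data loaded, `anova(Q1_model1, Q1_model3)` would give the exact version of this test.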

Q3 [20pt]
Load package ggplot2 and type data(diamonds) to load the data set. The definitions of table and depth can be found in the following picture.

diamond.jpg

1. A diamond’s quality can be measured by cut, ordered by Ideal, Premium, Very Good, Good, and Fair. Create a dummy D1 to represent Ideal and Premium, and a dummy D2 to represent Very Good and Good.

library(ggplot2)
library(dplyr)
dia <- diamonds
dia <- mutate(dia, D1 = as.numeric(cut == "Premium" | cut == "Ideal"), D2 = as.numeric(cut == "Very Good" | cut == "Good"))

2. Regress price on carat, depth, table, D1, D2, and all interaction terms between the dummies and the quantitative variables (carat, depth and table). Interpret your result.

Q3_model1 <- lm(price ~ carat + depth + table + D1 + D2 + carat * D1 + carat * D2 + depth * D1 + depth * D2 + table * D1 + table * D2, data = dia)
stargazer(Q3_model1, type = "text")

##
## ==================================================
##                          Dependent variable:
##                     ------------------------------
##                                 price
## --------------------------------------------------
## carat                       6,002.377***
##                               (73.086)
##
## depth                        -86.946***
##                               (12.186)
##
## table                          -2.707
##                               (11.185)
##
## D1                            738.068
##                             (1,434.572)
##
## D2                          5,020.093***
##                             (1,426.486)
##
## carat:D1                    1,997.748***
##                               (75.073)
##
## carat:D2                    1,854.563***
##                               (77.410)
##
## depth:D1                     54.620***
##                               (15.176)
##
## depth:D2                      -17.224
##                               (14.397)
##
## table:D1                     -82.709***
##                               (12.073)
##
## table:D2                     -80.461***
##                               (12.434)
##
## Constant                    3,807.443***
##                             (1,258.219)
##
## --------------------------------------------------
## Observations                   53,940
## R2                             0.858
## Adjusted R2                    0.858
## Residual Std. Error    1,501.182 (df = 53928)
## F Statistic        29,728.630*** (df = 11; 53928)
## ==================================================
## Note:                  *p<0.1; **p<0.05; ***p<0.01

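Fair diamonds are the base group. Three coefficients are not significant once the other characteristics and interactions are accounted for: the D1 dummy (Ideal or Premium), the table size for the base group, and the interaction of D2 with depth. The value of a diamond decreases with the size of the table for both D1 and D2 diamonds (slopes of about -85 and -83). Value decreases with depth in every group, but less steeply for D1 diamonds: the positive depth:D1 interaction reduces the slope from -86.9 to about -32.3. Carat has the largest impact on value, and the price per carat is roughly 2,000 higher for D1 diamonds and 1,850 higher for D2 diamonds than for Fair ones. The D2 dummy is large and significant, but because of the interactions it measures the gap to Fair diamonds only when carat, depth and table are all zero, so it should not be read in isolation.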
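With interactions in the model, the slope of each quantitative variable within a cut group is the base coefficient plus that group's interaction coefficient. The sketch below does this arithmetic on the point estimates copied from the table above; the object names are illustrative.

```r
# Coefficients copied from the stargazer table above
base  <- c(carat = 6002.377, depth = -86.946, table = -2.707)   # Fair (base group)
d1_ix <- c(carat = 1997.748, depth = 54.620,  table = -82.709)  # interactions with D1
d2_ix <- c(carat = 1854.563, depth = -17.224, table = -80.461)  # interactions with D2

# Slope within a group = base slope + that group's interaction coefficient
slopes <- rbind(Fair = base,
                D1   = base + d1_ix,
                D2   = base + d2_ix)
round(slopes, 3)
```

Each row gives the estimated change in price for a one-unit change in carat, depth or table within that group. Significance of the combined slopes would need a separate test.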

3. Create a random sample of size 1000 from the diamonds data. Draw the scatterplot of carat vs log(price), color coded by cut.

Q3_sample <- sample_n(diamonds, 1000)
ggplot(Q3_sample, aes(x = carat, y = log(price), colour = cut)) + geom_point()

[Figure: scatterplot of carat vs log(price), color coded by cut]

4. List the distinct categories of color. What is their ordering?

unique(Q3_sample$color)

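## [1] H G J D F E I
## Levels: D < E < F < G < H < I < J

There are seven color categories: D, E, F, G, H, I and J. color is an ordered factor with D < E < F < G < H < I < J.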

Q4 [45pt] (Just Answer the Question; No R Command)
According to past series of Bond films, the average number of people killed by Bond varies substantially across Bond actors, as shown in the following graph. In particular, Pierce Brosnan ranks #1 on this list. To study whether the revenue of a film is affected by the number of people that Bond killed, we performed regression analysis on the available data set.

The data are based on 23 past Bond films with all 6 Bond actors. For each film, we have information on the adjusted worldwide gross (in 1000 dollars), the average rating (on a 1-10 basis with 10 being the best), rating, film budget, the number of people Bond killed and the number others killed in each film, the Bond actor and the year of the film.

To start with, we build the following model to see whether the number of people that Bond killed in the film affects the worldwide gross,

$$\log(\text{gross}) = \beta_0 + \beta_1\,\text{Bond kills} + \beta_2\,\text{Pierce} + u,$$

where log(gross) is the logarithm of the worldwide gross, Bond kills is the number of people that Bond killed in the film, and Pierce is a dummy variable indicating whether the Bond actor is Pierce. The following table shows the estimation results.

===============================================
                        Dependent variable:
                    ---------------------------
                            log(Gross)
-----------------------------------------------
Bond kills                    0.02**
                          (0.002, 0.04)

Pierce                       -0.53**
                          (-1.03, -0.04)

Constant                     13.05***
                          (12.79, 13.31)

-----------------------------------------------
Observations                    23
R2                             0.21
Adjusted R2                    0.14
Residual Std. Error       0.32 (df = 20)
F Statistic             2.73* (df = 2; 20)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Answer Q1-Q4 using the regression results above.

1. Does the number of people Bond killed significantly affect the worldwide gross at 5% level? Interpret the estimated coefficient of Bond kills.

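The coefficient is positive: if Bond kills one more person in the movie, the worldwide gross is about 2% higher, holding the Pierce dummy fixed. It is significant at the 5% level (two stars; the reported interval (0.002, 0.04) excludes zero).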

2. Interpret the estimated coefficient of Pierce.

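The worldwide gross of movies with Pierce is lower than that of movies without Pierce by about 0.53 log points, holding Bond kills fixed. The usual approximation reads this as 53%; because the coefficient is large, the exact difference is closer to 41% (exp(-0.53) is about 0.59).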
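Reading a coefficient in a log(gross) regression as a percentage is an approximation that works well for small coefficients and loosely for large ones. A minimal sketch of the exact conversion, applied to the two point estimates from the table (the helper name `pct_change` is illustrative):

```r
# Convert a coefficient from a log-dependent-variable model into an
# exact percentage change: 100 * (exp(b) - 1)
pct_change <- function(b) 100 * (exp(b) - 1)

pct_bond_kills <- pct_change(0.02)    # one more Bond kill
pct_pierce     <- pct_change(-0.53)   # Pierce films vs. all other films

round(c(bond_kills = pct_bond_kills, pierce = pct_pierce), 1)
```

For the Bond kills coefficient the approximate and exact readings almost coincide. For the Pierce dummy they differ by more than ten percentage points.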

3. Is the regression overall significant at the 5% level?

The regression overall is not significant at the 5% level: the F statistic (2.73 on 2 and 20 degrees of freedom) carries only one star, so its p-value lies between 5% and 10%.
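The p-value behind the single star can be recovered from the F statistic and its degrees of freedom. A short sketch using the values in the table; it should give a p-value of roughly 0.09.

```r
# F statistic and degrees of freedom from the regression table
f_stat <- 2.73
df1 <- 2
df2 <- 20

# Upper-tail probability of the F distribution
p_val <- pf(f_stat, df1, df2, lower.tail = FALSE)
round(p_val, 3)

# 5% critical value for comparison with the observed F statistic
f_crit <- qf(0.95, df1, df2)
round(f_crit, 2)
```

The observed F statistic falls below the 5% critical value, which is the same conclusion stated in terms of the test statistic rather than the p-value.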

4. What does the adjusted R^2 measure?

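It measures the share of the variation in the log of gross revenue that is explained by the number of people Bond killed and the Pierce dummy, adjusted for the number of regressors. Unlike R^2, which never falls when a variable is added, adjusted R^2 penalizes additional variables, so it can fall when a regressor adds little explanatory power.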
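The penalty can be made concrete with the standard formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). The sketch below applies it to the rounded R^2 from the table; because the input is rounded to two decimals, the result can differ from the reported 0.14 in the second decimal.

```r
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1),
# where n is the sample size and k the number of regressors
# (excluding the constant)
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the table above: R2 = 0.21, 23 films, 2 regressors
adj_model1 <- adj_r2(0.21, n = 23, k = 2)
round(adj_model1, 2)
```

With only 23 observations, each extra regressor raises the factor (n - 1)/(n - k - 1) noticeably, so the penalty is substantial in this data set.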

Suppose you believe that the decade of 1990’s is the booming age for Bond films, so you include a time dummy variable decade90 into the model.

##
## ========================================================
##                             Dependent variable:
##                     ------------------------------------
##                                  log(Gross)
##                            (1)                (2)
## --------------------------------------------------------
## `Bond kills`             0.02**             0.02**
##                      (0.002, 0.04)      (0.002, 0.04)
##
## Pierce                  -0.53**             -0.42
##                      (-1.03, -0.04)     (-1.15, 0.30)
##
## decade90                                    -0.16
##                                         (-0.89, 0.58)
##
## Constant                13.05***           13.05***
##                      (12.79, 13.31)     (12.78, 13.31)
##
## --------------------------------------------------------
## Observations               23                 23
## R2                        0.21               0.22
## Adjusted R2               0.14               0.10
## Residual Std. Error  0.32 (df = 20)     0.32 (df = 19)
## F Statistic         2.73* (df = 2; 20) 1.80 (df = 3; 19)
## ========================================================
## Note:                        *p<0.1; **p<0.05; ***p<0.01

5. From the above results of model 2, why do you think the dummy Pierce becomes insignificant?

The revenue in the decade of the 1990’s was actually relatively low, and Pierce happened to be the Bond actor in that decade, so ‘Pierce’ and ‘decade90’ are highly correlated. The effect previously attributed to Pierce is now split between the two dummies. This collinearity widens both confidence intervals, and neither dummy is individually significant any more.

Now suppose that you run a new model with the interaction term Bond Kills:Pierce, which equals the product of the dummy Pierce and the variable Bond kills.

##
## ==========================================================================
##                                      Dependent variable:
##                     ------------------------------------------------------
##                                           log(Gross)
##                            (1)                (2)               (3)
## --------------------------------------------------------------------------
## `Bond kills`             0.02**             0.02**             0.02**
##                      (0.002, 0.04)      (0.002, 0.04)      (0.004, 0.04)
##
## Pierce                  -0.53**             -0.42               0.05
##                      (-1.03, -0.04)     (-1.15, 0.30)      (-1.37, 1.47)
##
## decade90                                    -0.16
##                                         (-0.89, 0.58)
##
## `Bond kills`:Pierce                                            -0.02
##                                                            (-0.06, 0.02)
##
## Constant                13.05***           13.05***           13.01***
##                      (12.79, 13.31)     (12.78, 13.31)     (12.72, 13.29)
##
## --------------------------------------------------------------------------
## Observations               23                 23                 23
## R2                        0.21               0.22               0.24
## Adjusted R2               0.14               0.10               0.12
## Residual Std. Error  0.32 (df = 20)     0.32 (df = 19)     0.32 (df = 19)
## F Statistic         2.73* (df = 2; 20) 1.80 (df = 3; 19)  2.04 (df = 3; 19)
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01

6. Interpret the estimated coefficient of Bond Kills:Pierce. Comparing model 1 and 3, do you think it is a good idea to include the interaction term? Why?

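The coefficient of -0.02 means that the effect of one more person killed by Bond on gross revenue is about 2 percentage points lower in movies with Pierce than in movies without him. Since the Bond kills coefficient is 0.02, the estimated effect of an extra kill in Pierce movies is close to zero. The interaction is not significant (its interval (-0.06, 0.02) contains zero).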

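I think it’s a good idea to include the interaction term, because model 3 suggests that Pierce himself is not associated with lower revenue (his coefficient is 0.05 and insignificant). Rather, audiences disliked seeing Pierce kill more people in the movie, which makes model 1 misleading. The evidence for this reading is weak, however: the interaction is not significant, and adjusted R2 falls from 0.14 in model 1 to 0.12 in model 3.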
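The interaction makes the Pierce gap depend on the number of kills. The sketch below works through the point estimates in column (3); the kill counts passed to `pierce_gap` are hypothetical values chosen only for illustration, and the intervals in the table show these estimates are imprecise.

```r
# Point estimates from column (3) of the table above
b_kills  <- 0.02    # Bond kills
b_pierce <- 0.05    # Pierce
b_inter  <- -0.02   # Bond kills:Pierce

# Effect of one more Bond kill on log(Gross), by actor group
effect_other  <- b_kills
effect_pierce <- b_kills + b_inter

# Pierce vs. non-Pierce gap in log(Gross) at a given number of kills
pierce_gap <- function(kills) b_pierce + b_inter * kills

c(other = effect_other, pierce = effect_pierce)
pierce_gap(c(0, 10, 20, 30))   # hypothetical kill counts
```

At zero kills the estimated gap is slightly positive, and it turns negative as the kill count grows. This is the pattern the answer above describes.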

Now we turn to study the effect of Bond kills on the average rating, which ranges from 1 to 10. Considering that each of the actors may appeal to specific group or specific generation of audiences, since each of them may represent different time and style, we include several dummy variables in the model for each actor. Moreover, we believe that not only does Bond kills matter, the number of people killed by others (for instance, the supporting actors) also matters.

##
## ======================================================
##                                Dependent variable:
##                            ---------------------------
##                                      Rating
## ------------------------------------------------------
## `Bond kills`                         0.05**
##                                      (0.02)
##
## `Others kills`                      -0.01**
##                                      (0.004)
##
## `Bond actor`George Lazenby            0.16
##                                      (0.66)
##
## `Bond actor`Pierce Brosnan          -1.82***
##                                      (0.48)
##
## `Bond actor`Roger Moore              -0.76*
##                                      (0.40)
##
## `Bond actor`Sean Connery              0.53
##                                      (0.45)
##
## `Bond actor`Timothy Dalton           -0.69
##                                      (0.48)
##
## Constant                            6.65***
##                                      (0.42)
##
## ------------------------------------------------------
## Observations                           23
## R2                                    0.68
## Adjusted R2                           0.53
## Residual Std. Error              0.51 (df = 15)
## F Statistic                   4.48*** (df = 7; 15)
## ======================================================
## Note:                      *p<0.1; **p<0.05; ***p<0.01

7. Interpret the estimated coefficients of Bond kills and Others kills. Compare the estimated coefficients for Bond kills and Others kills.

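Holding the actor and the kills by others fixed, each additional person killed by Bond is associated with a rating 0.05 points higher, while each additional person killed by others is associated with a rating 0.01 points lower. Both are significant at the 5% level. The two effects have opposite signs, and the Bond kills effect is about five times larger in magnitude.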

8. Which actor is the base category?

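Daniel Craig. He is the only one of the 6 actors without a dummy in the table, so the other actors’ coefficients are measured relative to him.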

9. According to the estimates (ignoring significance at this moment), who was the best and who was the worst at boosting the ratings among the 6 Bond actors?

Sean Connery was the best (0.53 points above Daniel Craig) and Pierce Brosnan was the worst (1.82 points below Daniel Craig).
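The ranking follows from sorting the actor coefficients, with the base category entered as zero. A small sketch using the point estimates from the rating table (the vector name `actor_effect` is illustrative):

```r
# Actor coefficients from the rating regression; the base category
# (Daniel Craig) is omitted from the table, so its effect is 0 by construction
actor_effect <- c("Daniel Craig"   = 0,
                  "George Lazenby" = 0.16,
                  "Pierce Brosnan" = -1.82,
                  "Roger Moore"    = -0.76,
                  "Sean Connery"   = 0.53,
                  "Timothy Dalton" = -0.69)

# Sort from the largest to the smallest estimated rating effect
ranking <- sort(actor_effect, decreasing = TRUE)
ranking
```

The ordering uses point estimates only. Several of these coefficients are not statistically distinguishable from zero.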
