Book name for the course: Stats: Data and Models (4th Edition) (Hardcover)
by Richard D. De Veaux, Paul F. Velleman
Question 1.
Re-expressing Data to Fit a Linear Model
Suppose that you have the following data for x (the independent variable) and y (the response variable):
independent.var = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
response.var = c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)
A) Using a linear model, fit a line to the above data without using a re-expression. Show the fitted line on a scatterplot of the data. Comment on the quality of the fit, and calculate and explain the meaning of R².
B) Re-express the data so that you obtain a better linear fit, and explain how and why you chose your re-expression. (NOTE: Only consider re-expressions that change the y-variable, which is the response variable for this problem.) Also, show the re-expressed fitted line on a scatterplot of the re-expressed data. Comment on the quality of the fit, and calculate and explain the meaning of R².
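One possible way to set up part A in R is sketched below. It assumes base R only (lm, plot, abline, summary); the names fit.raw and r.squared.raw are illustrative and not part of the assignment. The interpretation of the plot and of R² is still left to you.

```r
# Data from the problem statement
independent.var <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
response.var <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Least-squares line for the raw (not re-expressed) data
fit.raw <- lm(response.var ~ independent.var)

# Scatterplot with the fitted line drawn on top
plot(independent.var, response.var,
     xlab = "x (independent variable)", ylab = "y (response variable)",
     main = "Raw data with fitted line")
abline(fit.raw)

# R-squared: the fraction of the variation in y accounted for by the line
r.squared.raw <- summary(fit.raw)$r.squared
print(coef(fit.raw))
print(r.squared.raw)
```

A plot of residuals(fit.raw) against independent.var is a useful companion check: a curved pattern there suggests that a straight line is not an adequate model for the raw data.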
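As a hedged illustration of part B, the sketch below tries one candidate re-expression, the natural logarithm of y. This choice is an assumption made for the example; other re-expressions of y from the ladder of powers (for instance 1/y or the square root of y) can be tested the same way and compared. The names log.response, fit.log, and r.squared.log are illustrative.

```r
# Data from the problem statement
independent.var <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)
response.var <- c(16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 2.4, 2.3, 1.9, 1.7, 1.4, 1.3)

# Candidate re-expression of the response only (assumption: natural log)
log.response <- log(response.var)

# Fit lines to the raw and the re-expressed response for comparison
fit.raw <- lm(response.var ~ independent.var)
fit.log <- lm(log.response ~ independent.var)

# Scatterplot of the re-expressed data with its fitted line
plot(independent.var, log.response,
     xlab = "x (independent variable)", ylab = "log(y)",
     main = "Re-expressed data with fitted line")
abline(fit.log)

# Compare R-squared before and after the re-expression
r.squared.raw <- summary(fit.raw)$r.squared
r.squared.log <- summary(fit.log)$r.squared
print(c(raw = r.squared.raw, log = r.squared.log))
```

A higher R² alone does not justify a re-expression. The scatterplot of the re-expressed data and its residual plot should also look straighter and more evenly scattered than those of the raw data.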
Question 2.
Simulation – Washington Nationals Win The 2019 World Series.
As we all know by now, the Washington Nationals beat the Houston Astros to win the 2019 World Series. However, before the start of that World Series, the Washington Nationals were often cited as having only a 40% chance of winning any particular game against the Astros.
So, how rare is what the Nats accomplished by winning the 2019 World Series?
Let’s pretend that the World Series has not been played yet. Keep in mind that the World Series is a series of up to 7 games. The winner is whichever team wins 4 games first. So, the World Series may last only 4 games, or perhaps 5 games, or 6 games, or even 7 games. The World Series is over as soon as one of the teams wins 4 games.
Run a simulation using the above information to assess the chances that the Washington Nationals win the World Series. Make sure that you show all of your work, including anything you do in R (e.g., any random numbers you generate and how you use them). In doing so, specify how you are modeling the simulation using equally likely random digits, and explain what constitutes a trial and what its outcome is.
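The simulation described above could be sketched in R along the following lines. The modeling choices here are assumptions made for illustration: each game is one random digit from 0 to 9, digits 0 to 3 count as a Nationals win (4 of 10 equally likely digits, matching the 40% chance), and digits 4 to 9 count as an Astros win. One trial is one simulated series, played until either team reaches 4 wins, and its outcome is whether the Nationals won. The function name simulate.series, the seed, and the number of trials are arbitrary.

```r
set.seed(2019)  # arbitrary seed so the run can be reproduced

# One trial: draw digits until one team has 4 wins.
# Returns the digits used and whether the Nationals won the series.
simulate.series <- function() {
  nats.wins <- 0
  astros.wins <- 0
  digits <- integer(0)
  while (nats.wins < 4 && astros.wins < 4) {
    d <- sample(0:9, 1)          # one equally likely random digit per game
    digits <- c(digits, d)
    if (d <= 3) {
      nats.wins <- nats.wins + 1     # digits 0-3: Nationals win the game
    } else {
      astros.wins <- astros.wins + 1 # digits 4-9: Astros win the game
    }
  }
  list(digits = digits, nats.won = (nats.wins == 4))
}

# Show the work for a single trial
one.trial <- simulate.series()
print(one.trial$digits)
print(one.trial$nats.won)

# Repeat many trials and estimate the chance that the Nationals win
n.trials <- 10000
results <- replicate(n.trials, simulate.series()$nats.won)
estimate <- mean(results)
print(estimate)

# Exact probability under the same model, for comparison:
# the Nationals win their 4th game after losing k = 0, 1, 2, or 3 games
exact <- sum(choose(3 + 0:3, 0:3) * 0.4^4 * 0.6^(0:3))
print(exact)
```

The exact calculation is included only as a sanity check on the simulation; under this model it works out to about 0.29, so the simulated proportion should land close to that value.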