Statistics questions
True/False
1. The standard error of the estimate (standard error) is the estimated standard deviation of the distribution of the independent variable (X).
2. In a simple linear regression model, the coefficient of determination not only indicates the strength of the relationship between independent and dependent variable, but can also show whether the relationship is positive or negative.
3. When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, we cannot conclude that an increase in the value of the independent variable is associated with an increase in the value of the dependent variable.
4. The error term is the difference between an individual value of the dependent variable and the corresponding mean value of the dependent variable.
5. An assumption of the classical linear regression model is that the variance of the error term does not change from one observation to another.
6. A significant positive correlation between X and Y implies that changes in X cause Y to change.
7. The correlation coefficient is the ratio of explained variation to total variation.
8. If the null hypothesis is rejected when the F test is used to test the overall significance of a multiple regression model, it can be concluded that all of the independent variables X1, X2, ..., Xk are significantly related to the dependent variable Y.
9. An application of the multiple regression model generated the following results involving the F test of the overall regression model: p-value = 0.012, R2 = 0.60 and s = 0.176. Thus, the null hypothesis, which states that none of the independent variables is significantly related to the dependent variable, cannot be rejected at the .01 level of significance.
10. One of the assumptions of the multiple regression model is that there is a perfect linear relationship among the independent variables.
11. The assumption of independent error terms in regression analysis is often violated when using time series data.
Multiple Choice
1. The point estimate of the variance of the error term in a regression model is denoted as: A. MSE B. b0 C. SSE D. b1
2. The least squares regression line minimizes the A. Sum of Differences between actual and predicted Y values B. Sum of Squared differences between actual and predicted X values C. Sum of Absolute deviations between actual and predicted X values D. Sum of Absolute deviations between actual and predicted Y values E. Sum of Squared differences between actual and predicted Y values
3. The ___________ the s (standard error) and the __________ the R2, the stronger the relationship between the dependent variable and the independent variable. A. Lower, higher B. Higher, lower C. Lower, lower D. Higher, higher
4. In simple bivariate regression analysis, if the correlation coefficient is a positive value, then A. The Y intercept must also be a positive value B. The coefficient of determination can be either positive or negative, depending on the value of the slope C. The least squares regression equation could either have a positive or a negative slope D. The slope of the regression line must also be positive E. The standard error of estimate can either have a positive or a negative value
The slope coefficient and the correlation coefficient have the same sign in bivariate regression (this also follows from the interpretation of the slope in the Instructions), but note that the relationship could be weak or strong. A positive sign shows only the direction, not the magnitude.
5. The correlation coefficient may assume any value between A. 0 and 1 B. -1 and 0 C. -infinity and + infinity D. 0 and infinity E. -1 and 1
6. A simple bivariate regression analysis with 20 observations would yield ________ degrees of freedom for error and _________ degrees of freedom total. A. 1, 20 B. 18, 19 C. 19, 20 D. 1, 19 E. 18, 20
7. Which is not an assumption of a multiple regression model? A. Independence of error terms B. Normality of error terms C. Positive autocorrelation of error terms (see Instructions) D. Constant variation of error terms E. Independence of error terms with X variables
8. A multiple regression analysis with 23 observations on each of four independent variables and the dependent variable would yield ______ and _______ degrees of freedom respectively for regression (explained) and error. A. 3, 18 B. 4, 22 C. 4, 18 D. 3, 19 E. 4, 17
9. Consider the following partial computer output for a multiple regression model. What is R2? A. 31.308% B. 77.72% C. 76.95% D. 72.63% E. 23.1%
10. Consider the following partial computer output for a multiple regression model. What is adjusted R2? A. 31.308% B. 76.95% C. 87.72% D. 72.63% E. 23.1%
11. In multiple regression analysis, the mean square regression divided by mean square error yields the: A. Standard error B. F statistic C. R2 D. Adjusted R2 E. t statistic
12. A particular multiple regression model has 3 independent variables, the sum of the squared error is 7680 and the total number of observations is 34. What is the value of the standard error of estimate? A. 256 B. 232.72 C. 225.89 D. 15.03 E. 16
The df for error = 34 - 3 - 1 = 30, and the standard error of estimate is √MSE = √(7680/30) = 16.
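The arithmetic behind question 12 can be checked with a few lines of Python. This is only a sketch: the function name `standard_error_of_estimate` and its parameter names are illustrative choices, not part of the problem set.

```python
import math

def standard_error_of_estimate(sse, n, k):
    """Return (error df, MSE, s) for a regression with n observations and k predictors."""
    df_error = n - k - 1          # one df is used by the intercept, k by the slopes
    mse = sse / df_error          # point estimate of the error variance
    return df_error, mse, math.sqrt(mse)

df_error, mse, s = standard_error_of_estimate(sse=7680, n=34, k=3)
print(df_error, mse, s)  # 30 256.0 16.0
```

The same function covers the simple regression case by setting k = 1.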
Essay Type (please explain and show your working)
1. Use the following results obtained from a simple linear regression analysis with 15 observations.
Ŷ = 35.5 - 1.75X, R2 = 0.9345 and sb1 = 0.60. Interpret the regression results and the value of the coefficient of determination. Predict the value of Y when X is equal to 10. Calculate the correlation coefficient between Y and X. Test whether there is a significant relationship between the independent and dependent variables at α = 0.05. Perform a two-tailed test.
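One possible way to set up the calculations for question 1 is sketched below, taking the fitted line as Ŷ = 35.5 - 1.75X. The variable names are illustrative, and the critical value 2.160 is an assumption taken from a standard t table for 13 degrees of freedom at a two-tailed 0.05 level; it is not given in the question.

```python
import math

n = 15
b0, b1 = 35.5, -1.75      # intercept and slope from the fitted equation
r_squared = 0.9345
s_b1 = 0.60               # standard error of the slope

# Prediction at X = 10
y_hat_at_10 = b0 + b1 * 10

# The correlation coefficient takes the sign of the slope
r = math.copysign(math.sqrt(r_squared), b1)

# Two-tailed t test of H0: slope = 0, with n - 2 degrees of freedom
df = n - 2
t_stat = b1 / s_b1
t_crit = 2.160            # assumed table value for t(0.025, 13)
reject_h0 = abs(t_stat) > t_crit

print(y_hat_at_10)        # 18.0
print(round(r, 4))        # -0.9667
print(round(t_stat, 4))   # -2.9167
print(df, reject_h0)      # 13 True
```

Because the slope is negative, the square root of R2 has to be given a negative sign to obtain the correlation coefficient.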
2. A local tire dealer wants to predict the number of tires sold each month. He believes that the number of tires sold is a linear function of the amount of money invested in advertising. He randomly selects past months of data consisting of tire sales (in hundreds of tires) and advertising expenditures (in thousands of dollars). Based on the data set with 20 observations, the simple linear regression model yielded the following results. (X is advertising expenditure in thousand dollars and Y is tires sold in hundreds): ∑X = 50; ∑Y = 100; ∑X2 = 225; ∑Y2 = 720; ∑XY = 390.
Find the intercept and slope, and write the regression equation. Also predict the number of tires sold when the money invested in advertising is 5 thousand dollars. Calculate the correlation coefficient and the coefficient of determination. Check whether there is a relation between the correlation coefficient and the coefficient of determination. Calculate SSE and MSE, and the standard error and t-score of the slope coefficient.
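The quantities asked for in question 2 can all be derived from the five sums. The sketch below uses the usual computational formulas (SSxx = ∑X2 - (∑X)²/n, and so on); the variable names are illustrative only.

```python
import math

n = 20
sum_x, sum_y = 50, 100
sum_x2, sum_y2, sum_xy = 225, 720, 390

# Corrected sums of squares and cross-products
ss_xx = sum_x2 - sum_x ** 2 / n        # 100.0
ss_yy = sum_y2 - sum_y ** 2 / n        # 220.0
ss_xy = sum_xy - sum_x * sum_y / n     # 140.0

# Least squares slope and intercept
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n
y_hat_at_5 = b0 + b1 * 5               # in hundreds of tires

# Correlation and coefficient of determination
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

# Error sum of squares, MSE, and the slope's standard error and t-score
sse = ss_yy - b1 * ss_xy
mse = sse / (n - 2)
s = math.sqrt(mse)
s_b1 = s / math.sqrt(ss_xx)
t_b1 = b1 / s_b1

print(round(b1, 4), round(b0, 4), round(y_hat_at_5, 4))  # 1.4 1.5 8.5
print(round(r, 4), round(r_squared, 4))                  # 0.9439 0.8909
print(round(sse, 4), round(mse, 4), round(s, 4))         # 24.0 1.3333 1.1547
print(round(s_b1, 4), round(t_b1, 4))                    # 0.1155 12.1244
```

In simple regression the coefficient of determination is the square of the correlation coefficient, which is the relation the question asks to check.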
3. A member of the state legislature has expressed concern about the differences in the mathematics test scores of high school freshmen across the state. She asks her research assistant to conduct a study to investigate what factors could account for the differences. The research assistant looked at a random sample of school districts across the state and used the factors of percentage of mathematics teachers in each district with a degree in mathematics, the average age of mathematics teachers and the average salary of mathematics teachers:
Regression Output
Predictor          Coef.    SE Coef.
Constant           35.17    7.850
Math Degree (%)    0.30     0.080
Age                0.45     0.188
Salary             0.15     0.075
Analysis of Variance
Source            DF    SS
Regression        3     1120.5
Residual Error    28    530.8
Write the least squares prediction equation. What is the number of observations in the sample? Based on the multiple regression model given above, estimate the mathematics test score if the percentage of teachers with a mathematics degree is 50.0, the average age is 45 and the average salary is $48,000. If the actual mathematics test score for these factors is 68.50, what is the error (residual) for this observation? What is the total sum of squares? What is the explained variation? What is the mean square error and the standard error of estimate? (Note: salary is measured in thousands of dollars.)
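A minimal sketch of the question 3 calculations follows, using the coefficients and ANOVA figures from the output above. The dictionary keys and variable names are illustrative, and salary is entered as 48 because it is measured in thousands of dollars.

```python
import math

# Coefficients from the regression output
coef = {"const": 35.17, "math_degree": 0.30, "age": 0.45, "salary": 0.15}

# Prediction for 50% math degrees, average age 45, salary 48 (thousand dollars)
y_hat = (coef["const"] + coef["math_degree"] * 50.0
         + coef["age"] * 45 + coef["salary"] * 48)
residual = 68.50 - y_hat              # actual minus predicted

# ANOVA quantities
df_reg, df_err = 3, 28
ssr, sse = 1120.5, 530.8
n = df_reg + df_err + 1               # total df is n - 1
sst = ssr + sse                       # total sum of squares
mse = sse / df_err
s = math.sqrt(mse)                    # standard error of estimate

print(round(y_hat, 2), round(residual, 2))   # 77.62 -9.12
print(n, round(sst, 1))                      # 32 1651.3
print(round(mse, 4), round(s, 3))            # 18.9571 4.354
```

The explained variation is the regression sum of squares, 1120.5, read directly from the ANOVA table.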
4. For the results given in question 3 above, calculate the coefficient of determination and the adjusted coefficient of determination, and test the overall usefulness of the model using the F statistic at the 5% and 1% significance levels. Finally, test the usefulness (or significance) of the three independent variables using t-tests at the 5% and 1% significance levels.
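The statistics for question 4 can be sketched as below, restating the figures from the question 3 output. The critical values are assumptions taken from standard F and t tables for (3, 28) and 28 degrees of freedom; they do not appear in the question, and the variable names are illustrative.

```python
ssr, sse = 1120.5, 530.8
k, df_err = 3, 28
n = k + df_err + 1
sst = ssr + sse

r_squared = ssr / sst
adj_r_squared = 1 - (sse / df_err) / (sst / (n - 1))
f_stat = (ssr / k) / (sse / df_err)

# Assumed table values: F(3, 28) and two-tailed t(28)
f_crit = {"5%": 2.95, "1%": 4.57}
t_crit = {"5%": 2.048, "1%": 2.763}

# (coefficient, standard error) for each predictor
slopes = {"Math Degree (%)": (0.30, 0.080), "Age": (0.45, 0.188), "Salary": (0.15, 0.075)}
t_stats = {name: b / se for name, (b, se) in slopes.items()}

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2))  # 0.6786 0.6441 19.7
for level in ("5%", "1%"):
    print(level, "model useful:", f_stat > f_crit[level])
    for name, t in t_stats.items():
        print(" ", name, round(t, 3), "significant:", abs(t) > t_crit[level])
```

Under these table values the F test rejects the null hypothesis at both levels, Math Degree is significant at both levels, Age only at 5%, and Salary at neither.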
5. The following table gives the data for per capita income in thousands of US dollars with the percentage of the labor force in Agriculture and the average years of schooling of the population over 25 years of age for 15 developed countries in 2000 (data modified for educational purpose). Develop a multiple regression model for per capita income (dependent variable) using Excel or MegaStat and answer the questions below the table. You can use symbols Y, X1 and X2 for the variables in your calculation. Show your computer output.
Country number    Per capita income    % of labor in Agriculture    Average years of schooling
1                 20                   9                            7
2                 26                   10                           12
3                 24                   8                            11
4                 21                   7                            11
5                 22                   10                           12
6                 42                   4                            16
7                 27                   5                            11
8                 24                   5                            9
9                 28                   6                            12
10                32                   8                            14
11                30                   7                            12
12                40                   4                            16
13                34                   9                            14
14                30                   5                            10
15                35                   8                            13
Find the Y-intercept and the slopes for the two independent variables and interpret them. Predict the per capita income when the percentage of the labor force in Agriculture is only 3 and the average years of schooling is 15. Find the overall explanatory power (coefficient of determination) of the model and interpret it. Also find the adjusted coefficient of determination and interpret it. Find the standard error of estimate. From the ANOVA table find SSR, SSE and SST and the F-value. Perform the F-test and comment on the overall usefulness of the model. Perform t-tests for the statistical significance of the individual coefficients.
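The question asks for Excel or MegaStat output. As a cross-check only, the same fit can be sketched in plain Python by solving the normal equations (X'X)b = X'Y. The helper `solve` and all variable names are illustrative, and no output values are quoted here because they should be read from the actual run.

```python
import math

# (per capita income Y, % labor in agriculture X1, years of schooling X2)
data = [(20, 9, 7), (26, 10, 12), (24, 8, 11), (21, 7, 11), (22, 10, 12),
        (42, 4, 16), (27, 5, 11), (24, 5, 9), (28, 6, 12), (32, 8, 14),
        (30, 7, 12), (40, 4, 16), (34, 9, 14), (30, 5, 10), (35, 8, 13)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        x[i] = (m[i][size] - sum(m[i][j] * x[j] for j in range(i + 1, size))) / m[i][i]
    return x

ys = [row[0] for row in data]
rows = [[1.0, row[1], row[2]] for row in data]   # design matrix with intercept column
n, k = len(data), 2

# Normal equations
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
b = solve(xtx, xty)                              # [intercept, slope X1, slope X2]

fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
residuals = [y - f for y, f in zip(ys, fitted)]
y_bar = sum(ys) / n
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
ssr = sst - sse
mse = sse / (n - k - 1)

r_squared = ssr / sst
adj_r_squared = 1 - mse / (sst / (n - 1))
s = math.sqrt(mse)
f_stat = (ssr / k) / mse

# Standard errors use the diagonal of (X'X)^-1, obtained one column at a time
unit = lambda i: [1.0 if j == i else 0.0 for j in range(3)]
se = [math.sqrt(mse * solve(xtx, unit(i))[i]) for i in range(3)]
t_stats = [bi / sei for bi, sei in zip(b, se)]

prediction = b[0] + b[1] * 3 + b[2] * 15         # X1 = 3, X2 = 15

print("coefficients:", b)
print("R2, adjusted R2, s:", r_squared, adj_r_squared, s)
print("SSR, SSE, SST, F:", ssr, sse, sst, f_stat)
print("t statistics:", t_stats)
print("prediction:", prediction)
```

The F statistic has (2, 12) degrees of freedom and each t statistic has 12, so the critical values should be looked up for those degrees of freedom.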