HLTH 511
SPSS Assignment 4 Instructions
Part 1
Follow the steps below to complete your SPSS homework assignment:
Download the SPSS HW #4-1 data file from Blackboard. The data file is from a handwashing study that was conducted at University XYZ. Download the SPSS Assignment 4 Survey document to see the survey. The researcher, Dr. Z, conducted the study to determine the following:
What is the correlation between students’ age and how many seconds they were observed washing their hands?
What is the correlation between students’ score on the “when should you wash your hands” knowledge index and the “correct handwashing” self-report scale?
Now, we need to test whether our dependent variables are approximately normally distributed. Use your knowledge from our previous SPSS Assignments to test each dependent variable: calculate the skewness and kurtosis values, visually inspect the plots for normality, skewness, and kurtosis, and note whether or not each variable is approximately normally distributed.
To answer Dr. Z’s questions, you will need to determine whether to run a Pearson or Spearman correlation based on the level of measurement (e.g., nominal, ordinal, scale) of each variable of interest.
Once you figure out whether to run a Pearson or Spearman, you need to know which buttons to click in SPSS. To run a Pearson, you need to click on “Analyze,” “Correlate,” and “Bivariate.”
Then, move the 2 variables of interest into the “Variables” box.
Make sure you have “Two-tailed” selected and “Pearson” selected. Then, click “OK.”
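If you would rather work from syntax (or want a record of your clicks), the same Pearson correlation can be run with the short command below. This is only a sketch: the variable names age and wash_seconds are placeholders, so substitute the actual variable names from the HW #4-1 data file.

* Pearson correlation between two scale variables (placeholder names).
CORRELATIONS
  /VARIABLES=age wash_seconds
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

You can paste this into a Syntax window (“File”…“New”…“Syntax”) and run it; it produces the same “Correlations” table as the dialog.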
In the output, the “Pearson Correlation” value is the correlation coefficient (r), and the “Sig. (2-tailed)” value is the p value. If the Sig. value is < .05, then the correlation is statistically significant.
Decide whether the correlation is weak, moderate, or strong; positive or negative; and statistically significant or not.
Also, create a scatter plot by clicking on “Graphs,” “Legacy Dialogs,” and “Scatter/Dot.”
Select “Simple Scatter,” and then click “Define.”
Select the dependent variable and put it on the Y axis and the independent variable on the X axis. Then, click “OK.”
Then, double-click the scatter plot.
Click “Elements,” and click “Fit Line at Total.”
Click the “X” on each pop-up.
Then, you have a line through your scatter plot.
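If you want to reproduce the scatter plot from syntax, a minimal sketch (again with placeholder variable names) is below. Note that the legacy GRAPH command draws the plot only; the fit line is still added afterward in the Chart Editor, just as described above.

* Simple scatter plot: X-axis variable first, Y-axis variable after WITH (placeholder names).
GRAPH
  /SCATTERPLOT(BIVAR)=age WITH wash_seconds
  /MISSING=LISTWISE.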
Repeat this same process for the variables that apply to conducting a Spearman correlation. The output is interpreted the same way. Note whether the correlation is weak, moderate, or strong; positive or negative; and significant or not, and also include a scatter plot with a fit line.
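The Spearman version also has a one-command equivalent, sketched below with placeholder names for the two ordinal variables (substitute the knowledge index and self-report scale variables from the data file).

* Spearman correlation for ordinal variables (placeholder names).
NONPAR CORR
  /VARIABLES=knowledge_index selfreport_scale
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.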
Finally, export your output to Word and keep it handy. You will combine this Word output with the output that you will create later on multiple regression.
To export, click the export symbol, click “Browse,” and then save your document with a name and location that you will remember.
Part 2
Follow the steps below to complete your SPSS homework assignment:
From Blackboard, download the SPSS HW #4-2 data file. The data file is intentionally vague; it does not state what the variables are. The concepts and SPSS output will be easier to read if the variables are simply labeled DV (dependent variable) and IV (independent variable) for this activity.
Before we can begin multiple regression analysis, there are several assumptions we need to test for:
A. Measurement of variables (dependent must be scale; independent must be scale or dummy coded for categorical)
B. Observations are independent (all data must come from different people; people cannot fill out the survey twice)
C. Normal distribution of dependent variable (tested by Shapiro-Wilk, calculation of skewness/kurtosis, and visual inspection)
D. Independent variables do not have multicollinearity (tested via tolerance and VIF)
E. Error terms are not auto-correlated (tested via Durbin-Watson test)
F. Outliers are not in the data (tested via Mahalanobis and Cook’s distances)
G. Linear relationship between dependent and independent variables (tested via visual inspection of scatter plot)
H. Data reflect homoscedasticity (tested via visual inspection of scatter plot)
Since we know that the DV and each IV are on a scale level of measurement, we have met assumption A above.
Since we know that each person only filled out 1 survey, we have met assumption B above.
Now, we need to test if our dependent variable is approximately normally distributed. To do so, click “Analyze”…“Descriptive Statistics”…“Explore.”
Select the dependent variable and move it to the “Dependent List” box.
Click on “Plots.”
When the “Plots” pop-up appears, unselect “Stem-and-leaf,” select “Histogram,” select “Normality plots with tests,” and click “Continue.” Then click “OK.”
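For reference, the clicks above correspond to the EXAMINE command below. This is a sketch that assumes your dependent variable is named DV, as it is in this data file; if yours has a different name, substitute it.

* Explore: boxplot, histogram, and normality tests (incl. Shapiro-Wilk) for the DV.
EXAMINE VARIABLES=DV
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES
  /MISSING LISTWISE
  /NOTOTAL.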
On the line for “Skewness,” use a calculator to divide the “Statistic” by the “Std. Error,” which in this example is .137 / .637 ≈ .215. If the calculated value is within -1.96 to +1.96, then the variable is within the acceptable range for skewness.
Repeat this calculation for the statistic and std. error of the kurtosis. If the calculated value is within -1.96 to +1.96, then the variable is within the acceptable range for kurtosis.
Look to see if the dependent variable has a “Shapiro-Wilk” “Sig.” value that is > .05. If so, then the dependent variable is considered normally distributed, and it meets assumption C above.
Inspect the histogram to see if the dependent variable looks approximately normally distributed.
Inspect the “normal Q-Q plot” to see if the dots are close to the line.
Inspect the box plot to see if the whiskers are symmetrical.
Next, we need to check the remaining assumptions: that the independent variables do not have multicollinearity, that there is a linear relationship between the dependent and independent variables, that the error terms are not auto-correlated, that the data reflect homoscedasticity, and that there are no outliers in the data.
To do this, we first need to…
Click “Analyze”…“Regression”…“Linear.”
Select the dependent variable and move it to the “Dependent” box…select the independent variables and move them to the “Independent(s)” box
…make sure that “Enter” is selected as the method … and then click “Statistics.”
When the “Statistics” pop-up appears, select “Estimates,” “Confidence intervals,” “Model fit,” “Descriptives,” “Collinearity diagnostics,” “Durbin-Watson,” and “Casewise diagnostics.” Make sure “Outliers outside” and “3” are selected, and then click “Continue.”
Click on “Plots.”
Select “ZPRED” and move it to the “X” box, select “ZRESID” and move it to the “Y” box, select “Normal probability plot,” and click “Continue.”
Click on “Save.”
When the “Save” pop-up appears, select “Mahalanobis” and “Cook’s,” and then click “Continue.”
Click “OK.”
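All of the selections above correspond to a single REGRESSION command. The sketch below assumes the variables are named DV, IV1, IV2, and IV3, as in this data file; it is essentially the syntax SPSS itself would paste from these dialogs.

* Linear regression with collinearity diagnostics, Durbin-Watson, casewise outliers (3 SD),
* the ZRESID-by-ZPRED plot, a normal P-P plot, and saved Mahalanobis and Cook’s values.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT DV
  /METHOD=ENTER IV1 IV2 IV3
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(3)
  /SAVE MAHAL COOK.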
Multicollinearity occurs when the independent variables are highly correlated with each other. When this happens, the correlated independent variables carry very similar information, and one of them should be removed from the regression analysis.
To test the assumption that our independent variables do not have multicollinearity (assumption D above), we should first look at the correlations of the independent variables with each other. The general rule of thumb is that correlations at or beyond ±0.7 are too high, meaning that the variables are too much alike. If this is the case, just keep it in mind for now, and continue with the rest of the tests.
The authoritative test for multicollinearity is the “Tolerance” and “VIF” values for each independent variable. If Tolerance is less than 0.1, then that variable has too high of a correlation with another variable. If the VIF value is greater than 10, then that variable has too high of a correlation with another variable. You can find these values in the “Coefficients” table in your SPSS output.
If the Tolerance and VIF values are greater than 0.1 and less than 10, then you have met the assumption that the variables do not reflect multicollinearity. Note whether or not this assumption was met.
If a variable shows multicollinearity, we would want to exclude that variable from the regression analysis.
Next, we want to test the assumption that the error terms are not auto-correlated (assumption E above). Look at the Durbin-Watson statistic in the “Model Summary” table. Values close to 2 indicate no autocorrelation; if the value falls roughly between 1 and 3, then the assumption is met.
Next, we want to test the assumption that there are no outliers in the data. We will check this assumption by looking at the “Residual Statistics” table.
First, check that the Cook’s distance maximum value is less than 1. Then, check your maximum Mahalanobis distance.
Then, look at a Chi-square table to find the cut-off value for the Mahalanobis distance test. The number of independent variables in your regression analysis equates to the degrees of freedom in the Chi-square chart; look up the cut-off value at the .001 level. For example, with three independent variables, df = 3 and the .001 cut-off is 16.27. If the Mahalanobis maximum value is less than the Chi-square cut-off value, then you do not have any outliers and you have met the assumption of not having any outliers.
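If you do not have a Chi-square table handy, SPSS can compute the cut-off for you with its inverse Chi-square function. The sketch below assumes three independent variables (df = 3); change the second argument to match your own df.

* Chi-square critical value at the .001 level for df = 3 (about 16.27).
COMPUTE mahal_cutoff = IDF.CHISQ(.999, 3).
EXECUTE.

The new mahal_cutoff column will show the same value on every row; compare your maximum Mahalanobis distance against it.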
If you did have a Cook’s value that exceeded 1 or a Mahalanobis value that exceeded the Chi-square cut-off, then you would go to Data View and delete each case with a value greater than 1 in the COO column and each case with a value greater than the Chi-square cut-off in the MAH column (both columns were created automatically for you when you selected “Mahalanobis” and “Cook’s” under “Save”).
You can find these cases by right-clicking on the variable column and then selecting “Sort Descending.”
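A quick way to bring any offending cases to the top is to sort on the saved columns from syntax. A minimal sketch (SPSS names the saved columns MAH_1 and COO_1 on a first run; yours may have a different numeric suffix if you have saved them before):

* Sort cases so the largest Mahalanobis distances appear first.
SORT CASES BY MAH_1 (D).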
Next, we need to check for the assumption that there is a linear relationship between the dependent and independent variables. We will do this through a visual inspection of two figures. One of those figures is already in your SPSS output, but we need to create the other figure. To do this, click on “Graphs”…“Legacy Dialogs”…“Scatter/Dot.”
When this pop-up pops up, select “Matrix Scatter” and then click “Define.”
Then, select all of your independent variables and the dependent variable …
…and move them to the “Matrix Variables” box…and then click “OK.”
Then, double-click anywhere on the matrix plot…
…and when this pop-up pops up, click on “Elements”…“Fit Line at Total”
…and then close the pop-up.
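The matrix scatter plot can also be produced from syntax; the sketch below again assumes the variable names IV1, IV2, IV3, and DV. As with the simple scatter plot, the fit lines are still added in the Chart Editor.

* Matrix scatter plot of all IVs and the DV.
GRAPH
  /SCATTERPLOT(MATRIX)=IV1 IV2 IV3 DV
  /MISSING=LISTWISE.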
Then, inspect the plots that pair each independent variable with the dependent variable (in other words, the row or column of the matrix that corresponds to the dependent variable).
Look to see whether the dots fall roughly along a line rather than forming just a cloud. In our example, IV1 and IV2 both look linear, but IV3 does not look as linear. We may want to consider eliminating IV3 so that we meet the assumption of a linear relationship between the dependent variable and each independent variable.
The other figure we want to inspect is the scatterplot that has the “regression standardized residual” on the y axis and the “regression standardized predicted value” on the x axis. Double-click anywhere on the scatter plot…
…and when this pop-up pops up, select “Elements” … “Fit Line at Total” … and then close the pop-up.
If the dots are randomly scattered in the shape of a cloud, with roughly half of the dots above the horizontal line and half below it, then we have met the assumption that there is a linear relationship between the dependent variable and each independent variable. If we do not meet this assumption, then we would not want to run a regression analysis.
Last but not least, we need to look at the exact same graph above (the scatterplot that has the “regression standardized residual” on the y axis and the “regression standardized predicted value” on the x axis) to check the assumption of homoscedasticity. This time, inspect the figure to make sure that the dots are in the shape of a cloud and not fanning out. If the dots are not fanning out, then we have met the assumption of homoscedasticity.
If the dots were to fan out, the plot would look like a funnel (narrow at one end and spreading wider at the other) instead of an even cloud.
Finally, export your output to Word and keep it handy. You will combine this Word output with the output that you will create later on multiple regression.
To export, click the export symbol, click “Browse,” and then save your document with a name and location that you will remember.
Part 3
To run the multiple regression analysis, click on the same buttons that we did in Part 2 of this homework. Click “Analyze”…“Regression”…“Linear.”
Select the dependent variable and move it to the “Dependent” box…select the independent variables and move them to the “Independent(s)” box
…make sure that “Enter” is selected as the method…and then click “Statistics.”
When the “Statistics” pop-up appears, select “Estimates,” “Confidence intervals,” “Model fit,” “Descriptives,” “Collinearity diagnostics,” “Durbin-Watson,” and “Casewise diagnostics.” Make sure “Outliers outside” and “3” are selected, and then click “Continue.”
Click on “Plots.”
Select “ZPRED” and move it to the “X” box, select “ZRESID” and move it to the “Y” box, select “Normal probability plot,” and click “Continue.”
Click on “Save.”
When the “Save” pop-up appears, select “Mahalanobis” and “Cook’s,” and then click “Continue.”
Click “OK.”
In the SPSS output, go to the “Model Summary” table and look at the “R Square” value. This value tells us what percent of the variance of the dependent variable is explained by the regression model (all of the independent variables that we included in the regression analysis). In this example, our model explains 92% of the variance of the dependent variable.
The “Adjusted R Square” adjusts for the number of independent variables that you include in your analysis.
Next, we need to look at the “ANOVA” table of the output to see if our regression model is statistically significant in being able to explain the dependent variable…as opposed to the model explaining the dependent variable just by chance. If the “Sig.” value is ≤ .05, then the model is statistically significant in explaining the dependent variable. If it is > .05, then the model is not able to statistically explain the dependent variable.
Also, take note of the “df” values for “Regression” and “Residual” as well as the “F” value.
Remember the following format for reporting results and use this format when reporting results in the research project later in the course.
We performed a multiple linear regression in order to predict the [DV] based on [IV1], [IV2], and [IV3]. Using the enter method, the regression equation was statistically significant (F(3, 8) = 30.49, p < .05), with an R² of .920 and an adjusted R² of .889.
Next, let’s look at the “Coefficients” table. The “Sig.” values of ≤ .05 are considered statistically significant predictors.
Because IV3’s Sig. value is > .05, it is not a statistically significant predictor in our model. Statisticians debate whether non-significant variables should be excluded from the analysis, with the assumption tests and the multiple linear regression then re-run without that variable. In our SPSS HW, for the sake of time, we are not going to re-run the analysis; we are just going to accept that IV3 was not significant.
Finally, export your output to Word and submit it to Blackboard.
To export, click the export symbol, click “Browse,” and then save your document with a name and location that you will remember.