HLTH 511
SPSS Assignment 4 Instructions
Part 1
Follow the steps below to complete your SPSS homework assignment:
Download the SPSS HW #4-1 data file from Blackboard. The data file is from a handwashing study that was conducted at University XYZ. Download the SPSS Assignment 4 Survey document to see the survey. The researcher, Dr. Z, conducted the study to determine the following:
What is the correlation between students’ age and how many seconds they were observed washing their hands?
What is the correlation between students’ score on the “when should you wash your hands” knowledge index and the “correct handwashing” self-report scale?
Now, we need to test whether our dependent variables are approximately normally distributed, so we have to test each dependent variable. Use your knowledge from our previous SPSS Assignments to determine if each dependent variable is approximately normally distributed. Also, calculate skewness and kurtosis, and visually inspect the plots for normal distribution, skewness, and kurtosis. Note whether or not the data are normally distributed.
To answer Dr. Z’s questions, you will need to determine whether to run a Pearson or Spearman correlation based on the level of measurement (e.g., nominal, ordinal, scale) of each variable of interest.
Once you figure out whether to run a Pearson or Spearman, you need to know which buttons to click in SPSS. To run a Pearson, you need to click on “Analyze,” “Correlate,” and “Bivariate.”
Then, move over the 2 variables of interest into the “Variables” box.
Make sure you have “Two-tailed” selected and “Pearson” selected. Then, click “OK.”
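In the “Correlations” table of the output, the “Pearson Correlation” row shows the correlation value. If the “Sig. (2-tailed)” value is <.05, then the correlation is statistically significant.
Decide whether the correlation is weak, moderate, or strong; negative or positive; and significant or not significant.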
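As a rough cross-check on the SPSS output, the correlation value can also be computed by hand. The sketch below is only an illustration: the ages and seconds are made-up numbers (not the HW #4-1 data), the p value is a placeholder standing in for the “Sig. (2-tailed)” value read from SPSS, and the 0.3 and 0.7 strength cutoffs are assumptions, so use the cutoffs from your course materials instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def describe(r, p, moderate_cutoff=0.3, strong_cutoff=0.7, alpha=0.05):
    """Label a correlation. The strength cutoffs are assumed, not from the assignment."""
    size = abs(r)
    if size >= strong_cutoff:
        strength = "strong"
    elif size >= moderate_cutoff:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    significance = "significant" if p < alpha else "not significant"
    return f"{strength} {direction}, {significance}"


# Hypothetical example values, not the assignment data.
ages = [18, 19, 20, 21, 22]
seconds = [5, 7, 6, 9, 10]

r = pearson_r(ages, seconds)
print(round(r, 3))
# p=0.03 is a placeholder for the "Sig. (2-tailed)" value shown by SPSS.
print(describe(r, p=0.03))
```

The function only reproduces the correlation value; the significance value should still be read from the SPSS output.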
Also, create a scatter plot by clicking on “Graphs,” “Legacy Dialogs,” and “Scatter/Dot.”
Select “Simple Scatter,” and then click “Define.”
Select the dependent variable and put it on the Y axis and the independent variable on the X axis. Then, click “OK.”
Then, double-click the scatter plot.
Click “Elements,” and click “Fit Line at Total.”
Click the “X” on each pop-up.
Then, you have a line through your scatter plot.
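For intuition about the line SPSS draws, here is a minimal sketch of a least-squares straight line, assuming the fit method in the pop-up is left at a linear fit. The x and y values are hypothetical, and the function name is made up for this illustration.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical independent (X axis) and dependent (Y axis) values.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

slope, intercept = least_squares_line(x, y)
print(round(slope, 2), round(intercept, 2))
```

A line that rises from left to right goes with a positive correlation, and a line that falls goes with a negative correlation.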
Repeat this same process for the variables that call for a Spearman correlation, selecting “Spearman” instead of “Pearson” in the “Bivariate” window. The output is interpreted the same way. Note whether the correlation is weak, moderate, or strong; negative or positive; and significant or not significant. Also include a scatter plot with a fit line.
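One way to see how the two correlations differ: a Spearman correlation is a Pearson correlation computed on the ranks of the values, with tied values sharing the average of their ranks. The sketch below illustrates that idea with made-up scores; the function names are hypothetical and this is not a replacement for the SPSS output.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical index and scale scores, not the assignment data.
knowledge = [3, 5, 5, 7, 9]
self_report = [2, 4, 6, 6, 10]
print(average_ranks(knowledge))
print(round(spearman_rho(knowledge, self_report), 3))
```

Because only the rank order matters, any relationship that steadily increases gives a Spearman value of 1, even when the points do not fall on a straight line.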
Finally, export your output to Word and keep it handy. You will combine this Word output with the output that you will create later on multiple regression.
To export, click the export symbol
…click “Browse,” and then save your document with a name and location that you will remember.
Part 2
Follow the steps below to complete your SPSS homework assignment:
From Blackboard, download the SPSS HW #4-2 data file. The data file is vague on purpose; it does not state what the variables are. The concepts and SPSS output will be easier to read if the variables are simply listed as DV (dependent variable) and IV (independent variable) for this activity.
Before we can begin multiple regression analysis, there are several assumptions we need to test for:
A. Measurement of variables (dependent must be scale, independent must be scale…or dummy coded for categorical)
B. Observations are independent (all data must come from different people...people cannot fill out the survey twice)
C. Normal distribution of dependent variable (tested by Shapiro-Wilk, calculation of skewness/kurtosis, visual inspection)
D. Independent variables do not have multicollinearity (tested via tolerance and VIF)
E. Error terms are not auto-correlated (tested via Durbin-Watson test)
F. Outliers are not in the data (tested via Mahalanobis and Cook’s)
G. Linear relationship between dependent and independent variables (tested via visual inspection of scatter plot)
H. Data reflect homoscedasticity (tested via visual inspection of scatter plot)
Since we know that the DV and each IV are on a scale level of measurement, we have met assumption A above.
Since we know that each person only filled out 1 survey, we have met assumption B above.
Now, we need to test if our dependent variable is approximately normally distributed. To do so, click “Analyze”... “Descriptive Statistics” … “Explore.”
Select the dependent variable and move it to the “Dependent List” box.
Click on “Plots.”
In the pop-up window…unselect “Stem-and-leaf”…select “Histogram”
…select “Normality plots with tests”…click “Continue”
…and then click “OK.”
On the line for “Skewness”…use a calculator and divide the “Statistic” by the “Std. Error”…which in this example is .137 / .637 …
…if the calculated value is within -1.96 to +1.96, then the variable is within the acceptable range for skewness.
Repeat this calculation for the statistic and std. error of the kurtosis. If the calculated value is within -1.96 to +1.96, then the variable is within the acceptable range for kurtosis.
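The calculator step above can be sketched in a few lines. The skewness numbers (.137 and .637) are the ones from the example in these instructions; the kurtosis numbers are hypothetical placeholders, so substitute the “Statistic” and “Std. Error” values from your own output. The function name is made up for this illustration.

```python
def ratio_check(statistic, std_error, limit=1.96):
    """Divide the statistic by its standard error and test it against +/- limit."""
    ratio = statistic / std_error
    return ratio, -limit <= ratio <= limit


# Skewness values from the example in the instructions.
skew_ratio, skew_ok = ratio_check(0.137, 0.637)
print(round(skew_ratio, 3), skew_ok)

# Hypothetical kurtosis values, shown only to illustrate a result outside the range.
kurt_ratio, kurt_ok = ratio_check(-1.5, 0.5)
print(round(kurt_ratio, 3), kurt_ok)
```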
Look to see if the dependent variable has a “Shapiro-Wilk” “Sig.” value that is >.05. If so, then the dependent variable is approximately normally distributed and it meets assumption C above.
Inspect the histogram to see if the dependent variable looks approximately normally distributed.
Inspect the “normal Q-Q plot” to see if the dots are close to the line.
Inspect the box plot to see if the whiskers are symmetrical.
Next, we need to check the remaining assumptions: that the independent variables do not have multicollinearity, that the error terms are not auto-correlated, that outliers are not in the data, that there is a linear relationship between the dependent and independent variables, and that the data reflect homoscedasticity.
To do this, we first need to…
Click “Analyze”…“Regression”…“Linear.”
Select the dependent variable and move it to the “Dependent” box…select the independent variables and move them to the “Independent(s)” box
…make sure that “Enter” is selected as the method … and then click “Statistics.”
In the pop-up window, select…
“Estimates”…“Confidence intervals”…“Model fit”…“Descriptives”…“Collinearity diagnostics”…
… and select “Durbin-Watson”…“Casewise diagnostics”…make sure “Outliers outside” and “3” are selected
… click “Continue.”
Click on “Plots.”
Select “ZPRED” and move it to the “X” box…select “ZRESID” and move it to the “Y” box…
…select the “Normal probability plot”…click “Continue.”
Click on “Save.”
In the pop-up window, select “Mahalanobis”…select “Cook’s”…
…click “Continue.”
Click “OK.”
Multicollinearity is when the independent variables are highly correlated. When this happens, the correlating independent variables are very similar, and one of the variables should be removed from the regression analysis.
To test the assumption that our independent variables do not have multicollinearity (assumption D above), we should first look at the correlations of the independent variables with each other. The general rule of thumb is that correlations stronger than ±0.7 (above 0.7 or below -0.7) are too high…meaning that the variables are too much alike. If this is the case, just keep it in mind for now, and continue with the rest of the tests.
The authoritative test for multicollinearity is the “Tolerance” and “VIF” values for each independent variable. If Tolerance is less than 0.1, then that variable has too high of a correlation with another variable. If the VIF value is greater than 10, then that variable has too high of a correlation with another variable. You can find these values in the “Coefficients” table in your SPSS output.
If the Tolerance value is greater than 0.1 and the VIF value is less than 10 for every independent variable, then you have met the assumption that the variables do not reflect multicollinearity. Note whether or not this assumption was met.
If a variable shows multicollinearity, we would want to exclude that variable from the regression analysis.
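To see how Tolerance and VIF relate to each other and to the correlation rule of thumb, here is a minimal sketch for the special case of exactly two independent variables, where Tolerance equals 1 minus the squared correlation between them and VIF equals 1 divided by Tolerance. With more than two independent variables, SPSS bases Tolerance on each variable's relationship with all of the others, so this shortcut would not match the output. The correlation values below are hypothetical.

```python
def tolerance_and_vif(r_between_ivs):
    """Tolerance and VIF for a model with exactly two independent variables.

    r_between_ivs is the correlation between the two independent variables.
    A correlation of exactly +1 or -1 gives a Tolerance of 0, so VIF is undefined.
    """
    tolerance = 1 - r_between_ivs ** 2
    vif = 1 / tolerance
    return tolerance, vif


def meets_assumption(tolerance, vif):
    """Cutoffs from the instructions: Tolerance above 0.1 and VIF below 10."""
    return tolerance > 0.1 and vif < 10


# Hypothetical correlations between two independent variables.
for r in (0.30, 0.70, 0.95):
    tol, vif = tolerance_and_vif(r)
    print(r, round(tol, 3), round(vif, 3), meets_assumption(tol, vif))
```

In this two-variable case, a correlation of 0.7 trips the rule of thumb while the Tolerance and VIF values still pass, which is why the rule of thumb is only a first look.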
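Next, we want to test the assumption that the error terms are not auto-correlated. Look at the Durbin-Watson value in the “Model Summary” table. The Durbin-Watson statistic ranges from 0 to 4, and values near 2 indicate that the error terms are not auto-correlated. For this assignment, if the value is greater than 1.0, then the assumption was met (assumption E above).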
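For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below shows that arithmetic on made-up residuals listed in case order; the function name is hypothetical, and the real value should still be read from the SPSS output.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a list of residuals in case order."""
    # Squared change from each residual to the next one.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that switch sign from case to case.
alternating = [0.5, -0.3, 0.2, -0.4, 0.1]
# Hypothetical residuals that stay the same from case to case.
constant = [1.0, 1.0, 1.0, 1.0]

for label, resid in (("alternating", alternating), ("constant", constant)):
    dw = durbin_watson(resid)
    # Cutoff from the instructions: greater than 1.0 meets assumption E.
    print(label, round(dw, 3), dw > 1.0)
```

Residuals that barely change from one case to the next push the statistic toward 0, which is the pattern the cutoff in these instructions is meant to catch.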