© Charles T. Diebold, Ph.D., 7/30/13. All Rights Reserved. Page 1 of 21
Logistic Regression Tutorial:
RSCH-8250 Advanced Quantitative Reasoning
Charles T. Diebold, Ph.D.
July 30, 2013

How to cite this document: Diebold, C. T. (2013, July 30). Logistic regression tutorial: RSCH-8250 advanced quantitative reasoning. Available from tom.diebold@waldenu.edu.
Table of Contents

Assignment and Tutorial Introduction
    Specific Assignment Instructions & Expectations
        Example APA Table
SPSS Assignment Screenshots and Output
    Becoming Familiar with the Variables
    Descriptive Statistics: Dichotomous & Categorical Variables
    Descriptive Statistics: Metric Variables
    Binary Logistic Regression
        Sorting by Participant Number Prior to Analysis
        Screenshots Specifying the Logistic Regression
    Logistic Regression Assignment Output
References and Recommended Reading
Appendix: Example of Additional APA Tables for Logistic Regression
Logistic Regression Tutorial: RSCH-8250 Advanced Quantitative Reasoning
Assignment and Tutorial Introduction
This tutorial is intended to assist RSCH-8250 students in completing the Week 9 application assignment. I recommend that you use this tutorial as your first line of instruction on producing the needed output and on what to focus on from the output. You should also re-review my odds ratio tutorial from last week.

In the textbook's companion website, Field provides an elaborate interpretation of key parts of the output that I recommend you study, even though some of his output is inconsistent with the hierarchical logistic regression he said he conducted, contains some incorrect APA statistical reporting style, commits the common error of interpreting an odds ratio as "times more likely" (which I demonstrated as incorrect in last week's tutorial), uses the incorrect form of one of the predictors in checking for multicollinearity, and uses an incorrect value for the constant in a probability calculation (so if you use his probability answer of .9067, you will be incorrect; do the math yourself using the correct B coefficients). In addition, though I have not yet viewed it, you might find Dr. Morrow's video useful. Separate from this tutorial is a logistic regression demonstration that I use at residencies (along with its SPSS data file). You might find the information in that demonstration instructive.

3rd edition of the Field textbook: Chapter 8, Smart Alex's Task #3 on p. 314. 4th edition of the Field textbook: Chapter 19, Smart Alex's Tasks #6-9 on p. 813. The exercise uses the condom.sav SPSS data file. The objective of the exercise is to conduct and interpret a multiple-predictor binary logistic regression.

I personally think that anything other than a very rudimentary introduction to logistic regression is beyond the level of this course. Therefore, I depart from usual expectations and usual tutorial organization.
Instead of providing step-by-step SPSS screenshots to produce the assignment output and separate annotated example output, in this tutorial I provide the screenshots and the resulting assignment output with commentary that still leaves you room to demonstrate understanding of the output via your write-up (you also must still replicate the SPSS output and submit it as part of the assignment).
Specific Assignment Instructions & Expectations
1. Describe the criterion and predictor variables, providing minimum, maximum, mean, and standard deviation for metric variables, and percentages for each level of categorical variables.
2. State the general purpose or global research question that guides the analysis (specific hypotheses are not required).
3. Generally state assumptions or other considerations for logistic regression. Examination of assumptions and residuals is beyond the scope of this course (even Field's examination in the companion website of multicollinearity for the predictors in this week's assignment is incorrect: he used the three-level non-ordinal variable "previous" in the linear regression multicollinearity screening; instead, two indicator variables should have been created to represent this variable and used in the screening).
a. Because Field ties the reliability of the model to assessment of multicollinearity and residuals (and because these are complex for logistic regression and beyond the scope of this course), the part of the task that asks "How reliable is the model?" is not required to be answered; in fact, do not even attempt to answer it, because such an answer is likely to be incorrect.
4. Report and interpret the overall chi-square test of the statistical significance of the model (see output labeled “Omnibus Tests of Model Coefficients”).
a. Recall that APA format for a chi-square result is: χ²(3, N = 77) = 12.236, p = .028, where, in this example, 3 is the df value, 77 is the number of valid cases, 12.236 is the chi-square value, and .028 is the observed significance. Pay attention to proper spacing and italicization and, per APA, report the exact p value unless it is less than .001, in which case report p < .001. Do not report p = .000 (disregard Dr. Morrow's incorrect example and follow proper APA style).
5. Report and interpret the fit of the model; specifically, the statistical results of the Hosmer and Lemeshow Test.
a. For a good fit, we want this test to be nonsignificant (i.e., if p < .05, then the model is not a good fit). It is possible for the omnibus chi-square to be statistically significant but the model to be a poor fit (so do not confuse these two separate tests).
b. For a more nuanced understanding of the Hosmer and Lemeshow test, particularly the contingency table, see Dr. Diebold's residency logistic regression demonstration.
6. Report and interpret the classification results (i.e., the percentage correctly classified in each level of the dichotomous criterion, and the overall correct percentage; for a more nuanced understanding, including cut value, sensitivity, specificity, and other issues with the classification results, see Dr. Diebold’s residency’s logistic regression demonstration).
7. For any statistically significant predictor (i.e., p < .05), report and interpret in text the odds ratio, for example: Variable X had a statistically significant odds ratio of 1.72, B = .542, p = .013, indicating that a one point increase in X was associated with 72% greater odds of using a condom.
a. Do not attempt to rank order the importance of statistically significant predictors; such rank ordering is likely to be incorrect. For information on how to properly assess the relative importance of metric predictors see Dr. Diebold’s residency’s logistic regression demonstration.
8. For predictors that were not statistically significant, simply report them in text without odds ratio, B, or p value.
9. Provide an APA table of logistic regression results as demonstrated in the table on the next page (it already includes the information for the constant and gender). You should use the table function in Microsoft Word, not spaces and tabs, to construct the table. Values within a column should align at the decimal, which can be accomplished by setting decimal tab stops.
a. In the real world, though not required for this assignment, you would also include two additional tables: (a) means and frequencies as demonstrated in Table 18.1, and (b) intercorrelations as demonstrated in Table 18.2. These are taken from Nicol and Pexman (2010) and shown in the appendix of this tutorial.
10. A female who used a condom in her previous encounter scores 2 on all variables except perceived risk (for which she scores 6). Use the model to estimate the probability that she will use a condom in her next encounter. (Note. See comment in the assignment output below for how to do this).
a. The "scores 2 on all variables except perceived risk" is tricky, if not downright misleading. For gender, previous(1), and previous(2) you must use the values implied by "a female who used a condom in her previous encounter."
11. Using the final model of predictor coefficients of condom use, what are the probabilities that participants 12, 53, and 75 will use a condom? (Note. In the companion website, Field reports incorrect values for these participants for the analysis that he ran, so if you report his values you will be incorrect. There is a step in the screen shots below that will ensure you get this correct).
12. Have fun.
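As an aside, the odds-ratio arithmetic in the item 7 example (B = .542, OR = 1.72, interpreted as 72% greater odds) can be verified directly. A minimal Python sketch (Python is my illustration tool here, not part of the SPSS assignment):

```python
import math

# Item 7 example: B = .542. The odds ratio is e^B, and (OR - 1) * 100
# gives the percent change in odds per one-point increase in the predictor.
B = 0.542
OR = math.exp(B)
pct_greater_odds = (OR - 1) * 100

print(round(OR, 2))             # 1.72
print(round(pct_greater_odds))  # 72
```

This is also why an odds ratio of 1.72 is not "1.72 times more likely": it is a 72% increase in odds, not in probability.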
Example APA Table

Information for the constant and gender is already entered; obviously, you will need to complete the information for the other variables.

Table 1

Logistic Regression Predicting Condom Use (N = 100)
Predictor   B   SE B   p   OR   95% CI
Constant -4.960 1.147 < .001 0.007
Gender .003 .573 .996 1.003 [0.326, 3.081]
Predictor Name
Predictor Name
Predictor Name
Predictor Name
Predictor Name
Predictor Name
Predictor Name
Note. CI = confidence interval for odds ratio (OR).
SPSS Assignment Screenshots and Output
Becoming Familiar with the Variables
Open the condom.sav data file; the Variable View screen capture is shown at left. There are 8 variables in the data file. The first one, particip, is a sequential participant ID, which we will address later. The variable named "use" is the criterion, and the remaining 6 are predictors.
It is important to understand the nature of each variable, particularly to differentiate metric variables from categorical variables.
Go to File → Display Data File Information → Working File as shown at left. The output shown below is produced.
Variable Information
Variable Position Label
particip 1 Participant
safety 2 Relationship Safety
use 3 Condom Use
gender 4
sexexp 5 Sexual experience
previous 6 Previous Use with Partner
selfcon 7 Self-Control
perceive 8 Perceived Risk
Variable Values
Value Label
use 0 Unprotected
1 Condom Used
gender 0 Male
1 Female
previous
0 No Condom
1 Condom used
2 First Time with partner
I edited the Variable Information output to include only the first three columns (the other columns are not important). This allows you to see, and readily refer back to if saved or printed, the names and labels of the variables in the file.
The Variable Values output allows us to see how the dichotomous and categorical variables were coded. Here, the "use" variable was coded 1 = condom used. This is the criterion, the variable we want to predict. The logistic model predicts the group coded 1, so we will be predicting used a condom. For gender, female is coded 1, so the coefficient will be with respect to females. For the "previous" variable there are three levels. Because the levels are not ordinal, we will have to create two indicator (aka dummy) variables to represent this variable in the logistic model.
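The indicator (dummy) coding just described can also be sketched outside SPSS. A minimal pandas example using illustrative values (not the actual condom.sav data):

```python
import pandas as pd

# Illustrative three-level "previous" variable: 0 = No Condom,
# 1 = Condom used, 2 = First Time with partner (made-up values).
previous = pd.Series([0, 1, 2, 1, 0], name="previous")

# Two indicator variables; dropping the 0 = No Condom column makes
# "no condom" the reference category.
dummies = pd.get_dummies(previous, prefix="previous").drop(columns="previous_0")
print(dummies.columns.tolist())  # ['previous_1', 'previous_2']
```

A three-level variable always needs exactly two indicators; the level represented by all zeros is the reference group.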
Descriptive Statistics: Dichotomous & Categorical Variables

Go to Analyze → Descriptive Statistics → Frequencies.
The Frequencies dialogue box appears. Select and move the two dichotomous variables (use and gender) and the categorical variable (previous) into the Variable(s) box. Click OK to produce frequency tables output for each of these variables (shown on next page).
For each of these three variables you should report the percent in each level of the variable. For the "previous" variable, notice that there are only 3 cases in the 2 = first time with partner level. Such a small number of cases in a level of a variable produces an unreliable standard error, which is used to compute statistical significance. Moreover, a small number of cases in the last category of a variable will cause further issues if that category is used as the default reference group. I show how to solve this latter issue for purposes of this assignment, but in the real world you would want to conduct this logistic regression without these 3 cases (which means the "previous" variable would become a simple dichotomy).
use
        Frequency   Percent   Valid Percent   Cumulative Percent
Valid
0 Unprotected 57 57.0 57.0 57.0
1 Condom Used 43 43.0 43.0 100.0
Total 100 100.0 100.0
gender
Frequency Percent Valid Percent Cumulative Percent
Valid
0 Male 50 50.0 50.0 50.0
1 Female 50 50.0 50.0 100.0
Total 100 100.0 100.0
previous
Frequency Percent Valid Percent Cumulative Percent
Valid
0 No Condom 50 50.0 50.0 50.0
1 Condom used 47 47.0 47.0 97.0
2 First Time with partner 3 3.0 3.0 100.0
Total 100 100.0 100.0
Descriptive Statistics: Metric Variables

For the metric predictors we need some basic descriptive statistics. Go to Analyze → Descriptive Statistics → Descriptives, similar to the menu path shown earlier, but selecting Descriptives instead of Frequencies. Select and move the four metric variables into the Variable(s) box as shown below.
Click OK to produce the output shown below. Report in text the minimum, maximum, mean, and standard deviation of each of these metric variables. Note. In the 3rd edition, Field incorrectly described the measurement scale for each of these:

• Safety is actually on a 7-pt scale from 0 to 6, but Field reported it as a 5-pt scale.
• Sexexp is actually on an 11-pt scale from 0 to 10, but Field reported it as a 10-pt scale.
• Selfcon is actually on a 12-pt scale from 0 to 11, but Field reported it as a 9-pt scale.
• Perceive is actually on an 8-pt scale from 0 to 7, but Field reported it as a 6-pt scale.
Descriptive Statistics
        N   Minimum   Maximum   Mean   Std. Deviation
safety 100 0 6 2.29 1.622
sexexp 100 0 10 4.01 2.529
selfcon 100 0 11 3.94 2.282
perceive 100 0 7 3.11 1.769
Valid N (listwise) 100
Binary Logistic Regression

Sorting by Participant Number Prior to Analysis

As will become evident later, SPSS produces case summaries numbered by their order in the data file, not by participant number. This will cause confusion in interpreting certain output if participant numbers (often labeled ID, but in this data file labeled particip) are not sorted in ascending order to correspond with SPSS's internal numbering of cases. Go to Data → Sort Cases as shown below left, and in the pop-up below right select and move "particip" into the Sort by box. Leave the Sort Order as Ascending. Clicking OK will resort the file.
A partial before (left) and after (right) view of the order of cases is shown below. The fixed SPSS row numbers correspond to SPSS's internal numbering and identification of cases. It should be clear that before sorting (left), the particip numbering does not match the SPSS case numbering; sorting (below right) made them congruent. The case numbers in the logistic regression case summaries will now correspond to the participant numbers.
Screenshots Specifying the Logistic Regression

Go to Analyze → Regression → Binary Logistic. In the Logistic Regression dialogue, select and move the "use" variable to the Dependent box, and select and move the six predictors (safety, gender, sexexp, previous, selfcon, and perceive) to the Covariates box. Leave the Method as Enter. This will be a standard (aka direct) regression that forces all predictors to enter. In the companion website, Field stated that the correct analysis is a sequential (aka hierarchical¹) logistic regression. In the 3rd edition, the instructions in the Smart Alex task are "Previous research…has shown that gender, relationship safety, and perceived risk predict condom use. Carry out an appropriate analysis to verify these previous findings, and to test whether self-control, previous usage and sexual experience can predict any of the remaining variance in condom use" (emphasis added, p. 314). The 4th edition does not refer to "any of the remaining variance".
So, the instructions in the 3rd edition directly imply a sequential regression, but those in the 4th edition do not. However, a sequential regression requires a justification (not just being told to do it), and Field's justification rests on shaky ground. It is true that one reason for doing a sequential regression is to first enter theoretically grounded and well-established predictors of a criterion and then enter new predictors to determine if any remaining variance can be explained. One problem with Field's setup is that there is no well-established empirical evidence (one study does not "well-established" make) that gender, relationship safety, and perceived risk combine into an explanatory theoretical framework. Another problem is that a demographic, such as gender, is never the cause but merely a surrogate for some underlying variable. So, rarely would it make sense to give entry priority to a demographic. Instead, given that a specific demographic has been previously found related to a criterion, one would propose one or more theoretically driven predictors that might account for the demographic difference, enter those variables first in a sequential regression, then enter the demographic to see if any remaining variance is explained and, if so, propose additional predictors until a well-established theoretical framework explains away the demographic effect, wholly accounting for its surrogacy.

¹ I prefer the term sequential for a number of reasons, but mostly to avoid confusion with hierarchical linear modeling, which is a regression, but a different species.
Finally, if a sequential regression is justified, then it is the block effect that must be the focus of the analysis, not the effect of each individual predictor. Field briefly reports on the block effect, but then focuses on interpreting each of the six individual predictors. It is a common misunderstanding that the coefficients of individual predictors differ between the final model of a sequential regression and that of a standard regression; they are identical, and if you doubt this, prove it to yourself by doing both and comparing the final models.

Okay, off my soapbox and back to specifying the standard logistic regression. Because one of the predictors is categorical, SPSS must be told, otherwise it will be incorrectly treated as a metric variable. In the Logistic Regression dialogue (see previous screenshot), click the button labeled Categorical. In the dialogue that appears (below left), highlight and move "previous" to the Categorical Covariates box (below right). Notice that the default is Indicator and the reference category is last.
Recall that this variable was coded 0 = no condom, 1 = condom used, and 2 = first time with partner. So, first time with partner is the last category. There are at least three issues with using this as the reference category.
First, the reference group should serve as a useful comparison (e.g., a control group; the group expected to score highest or lowest on Y; a standard treatment). Second, for clarity of interpretation of the results, the reference group should be well defined and not a “wastebasket” category (e.g., “Other” for religion). Third, the reference group should not have a very small sample size relative to the other groups. (Cohen, Cohen, West, & Aiken, 2003, pp. 303-304)
So, click the radio button next to First, then click the Change button (bottom left). Notice that “previous(Indicator)” changes to “previous(Indicator(first))”. This is what we want. Click Continue.
After clicking Continue, you are returned to the Logistic Regression dialogue (below left). Notice that the “previous” is now shown as “previous(Cat)”, indicating successful specification of this variable as categorical. Click the Options button, which opens the dialogue below right. Check the following boxes as shown below:
• Classification plots (though not a required part of the assignment, inspection of the plot and comparison to comments in Dr. Diebold’s residency’s logistic regression demonstration might be instructive);
• Hosmer-Lemeshow goodness-of-fit, which will provide output to answer item 5 of the specific assignment instructions and expectations listed at the beginning of this tutorial;
• Casewise listing of residuals, which will provide the probabilities to answer item 11 for participants 12, 53, and 75;
  o So that the above provides information on all cases, click the radio button next to "All cases";
• Correlations of estimates (though not a required part of the assignment, this will produce a correlation matrix from which example Table 18.2 could be constructed);
• CI for exp(B) 95%, which will provide the confidence interval information needed as part of the required APA table;
• Include constant in model, which is standard practice and needed to answer item 11.
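Though not part of the assignment, the same Enter-method specification can be sketched outside SPSS. The following Python example assumes the statsmodels package is available and uses simulated data as a stand-in for condom.sav (so its coefficients will not match the output below); variable names and coding mirror the data file:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for condom.sav (illustrative only; coefficients
# will NOT match the tutorial's output).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "safety":   rng.integers(0, 7, n),    # 0-6
    "gender":   rng.integers(0, 2, n),    # 0 = male, 1 = female
    "sexexp":   rng.integers(0, 11, n),   # 0-10
    "previous": rng.integers(0, 3, n),    # 0, 1, 2
    "selfcon":  rng.integers(0, 12, n),   # 0-11
    "perceive": rng.integers(0, 8, n),    # 0-7
})
true_z = -2.5 + 0.3 * df["selfcon"] + 0.5 * df["perceive"]
df["use"] = rng.binomial(1, 1 / (1 + np.exp(-true_z)))

# Enter method: all predictors in a single block. Treatment(0) makes
# 0 = No Condom the reference category, like Indicator(first) in SPSS.
model = smf.logit(
    "use ~ safety + gender + sexexp + C(previous, Treatment(0))"
    " + selfcon + perceive",
    data=df,
).fit(disp=0)
print(model.params.index.tolist())  # intercept + 7 coefficients
```

Note that the formula expands "previous" into two treatment-coded terms automatically, matching the two indicator variables SPSS constructs.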
Click Continue to return to the Logistic Regression dialogue, then click OK to produce the following output:
Logistic Regression Assignment Output Case Processing Summary
Unweighted Casesa N Percent
Selected Cases
Included in Analysis 100 100.0
Missing Cases 0 .0
Total 100 100.0
Unselected Cases 0 .0
Total 100 100.0
a. If weight is in effect, see classification table for the total number of cases.
This indicates that all 100 cases had valid data on all 7 variables used in the analysis. If a case had missing data on even one of the variables, it would have been excluded.
Dependent Variable Encoding
Original Value Internal Value
0 Unprotected 0
1 Condom Used 1

The "previous" variable is categorical and we specified it as such. The output below indicates how the indicator (aka dummy) variables were constructed. The number of indicator variables needed is the number of levels of the variable minus 1; so, here, there are two indicator variables. Parameter coding (1) and (2) will appear in the output as previous(1) and previous(2). Previous(1) represents the level of the variable with the value of 1 in the Parameter coding (1) column, which is the condom used category. Similarly, previous(2) represents the first time with partner category. The reference category is the one with 0 values on both parameter codings (1) and (2), which is the no condom category. So, the previous(1) coefficient and odds ratio represent the condom used group with respect to the no condom group. Similarly, previous(2) compares the first time with partner group with the no condom group.
Categorical Variables Codings
        Frequency   Parameter coding (1)   Parameter coding (2)
previous
0 No Condom 50 .000 .000
1 Condom used 47 1.000 .000
2 First Time with partner 3 .000 1.000

Block 0 is the constant-only model against which the predictor model will be compared. All cases are predicted to be in the criterion group with the largest number of cases; thus, with 57% in the largest group, the correctly predicted overall percentage is 57.0.

Block 0: Beginning Block
Classification Table a,b
Observed                Predicted
            use: 0 Unprotected    1 Condom Used    Percentage Correct
Step 0 use
0 Unprotected 57 0 100.0
1 Condom Used 43 0 .0
Overall Percentage 57.0
a. Constant is included in the model.
b. The cut value is .500

With just the constant in the model, the odds of being in the condom used group are .754. Simple probability = odds ÷ (1 + odds) = .754 ÷ 1.754 = .43, which is the same as the proportion of cases in this level of the criterion; this math is not a coincidence.
Variables in the Equation
                  B      S.E.    Wald    df   Sig.   Exp(B)
Step 0  Constant  -.282  .202   1.947   1    .163   .754
(The Dependent Variable Encoding output above indicates how the dependent variable was coded. The model predicts the level of the dependent variable coded 1.)
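The odds-to-probability arithmetic for the Block 0 constant can be verified in a few lines of Python (again, an illustration, not part of the SPSS assignment):

```python
import math

# Block 0 constant-only model: B = -.282, so the odds of the group
# coded 1 (condom used) are e^B = .754, and the probability is
# odds / (1 + odds) = .43, the observed proportion in that group.
B_constant = -0.282
odds = math.exp(B_constant)
prob = odds / (1 + odds)

print(round(odds, 3))  # 0.754
print(round(prob, 2))  # 0.43
```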
The Variables Not in the Equation output at Block 0 is not required for interpretation in this assignment. It approximates the significance of each predictor if entered. For a statistical (aka stepwise) regression, the predictor with the largest Score value would enter first.
Variables not in the Equation
        Score   df   Sig.
Step 0 Variables
safety .003 1 .953
gender .041 1 .840
sexexp 1.368 1 .242
previous 5.493 2 .064
previous(1) 5.491 1 .019
previous(2) .118 1 .731
selfcon 17.168 1 .000
perceive 21.358 1 .000
Overall Statistics 39.387 7 .000

Block 1 contains the information needed to report and interpret the results of the logistic regression. The Omnibus Tests of Model Coefficients output is the statistical test of whether the set of predictors is significantly better than the constant-only model (you need to report in text, in proper APA statistical style, the results of this test). As an FYI: the omnibus chi-square is the amount by which the -2 log likelihood changed from the constant-only model to the predictor model. That is, the predictor model has a -2 log likelihood of 87.971, so the constant-only model had a -2 log likelihood of 48.692 + 87.971 = 136.663 (a reduction in -2 log likelihood is an improvement).

Block 1: Method = Enter
Omnibus Tests of Model Coefficients
        Chi-square   df   Sig.
Step 1
Step 48.692 7 .000
Block 48.692 7 .000
Model 48.692 7 .000
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1 87.971a .385 .517
a. Estimation terminated at iteration number 5 because parameter
estimates changed by less than .001.
FYI: The Cox & Snell R² and the Nagelkerke R² are but two, and not necessarily the best, of quite a few so-called pseudo-R² statistics for logistic regression. Two others that are fairly easy to calculate are (a) the likelihood ratio R²L, and (b) the traditional R².

R²L = χ²model ÷ (χ²model + (-2LL)) = 48.692 ÷ (48.692 + 87.971) = .356

To calculate the traditional R² you need to save the predicted probabilities, then square the correlation between the dichotomous criterion and the predicted probability. Here, you would find R = .64, so R² = .410.
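The likelihood ratio pseudo-R² is simple enough to check by hand; a short Python sketch using the chi-square and -2LL values from the output above:

```python
# Likelihood ratio pseudo-R2 from the Block 1 output:
# chi-square = 48.692, predictor-model -2LL = 87.971.
chi_square = 48.692
neg2LL = 87.971
R_L2 = chi_square / (chi_square + neg2LL)

print(round(R_L2, 3))  # 0.356
```

Note that the denominator, 48.692 + 87.971 = 136.663, is the -2LL of the constant-only model.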
The Hosmer and Lemeshow test assesses the fit of the model. A nonsignificant result (i.e., p > .05) indicates a good fit. To satisfy item 5 of the assignment you must report in text (in proper APA statistical style) and interpret the results of this test.
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 9.188 8 .327

Interpretation of the following contingency table is not required. For the inquisitive, see Dr. Diebold's residency logistic regression demonstration for an understanding of how to interpret this for a poor-fitting model.
Contingency Table for Hosmer and Lemeshow Test
        use = 0 Unprotected        use = 1 Condom Used        Total
        Observed   Expected        Observed   Expected
Step 1
1 10 9.816 0 .184 10
2 10 9.406 0 .594 10
3 7 8.672 3 1.328 10
4 7 7.912 3 2.088 10
5 9 6.791 1 3.209 10
6 4 5.526 6 4.474 10
7 6 4.099 4 5.901 10
8 2 2.793 8 7.207 10
9 1 1.403 9 8.597 10
10 1 .583 9 9.417 10

For the assignment you need to report the percentage correctly classified for each level of the dichotomous criterion variable, and the overall correct percentage. Not required, but for an understanding of how these relate to sensitivity, specificity, cut point effects, the somewhat arbitrary nature of the table, and examination of classification as a continuous variable (i.e., ROC curve analysis), see Dr. Diebold's residency logistic regression demonstration.
Classification Table a
Observed                Predicted
            use: 0 Unprotected    1 Condom Used    Percentage Correct
Step 1 use
0 Unprotected 47 10 82.5
1 Condom Used 12 31 72.1
Overall Percentage 78.0
a. The cut value is .500
The output below is needed for items 7-10 of the specific assignment instructions and expectations. For ease of reference, these 4 assignment items are reprinted below:
7. For any statistically significant predictor (i.e., p < .05), report and interpret in text the odds ratio, for example: Variable X had a statistically significant odds ratio of 1.72, B = .542, p = .013, indicating that a one point increase in X was associated with 72% greater odds of using a condom.
a. Do not attempt to rank order the importance of statistically significant predictors; such rank ordering is likely to be incorrect. For information on how to properly assess the relative importance of metric predictors see Dr. Diebold’s residency’s logistic regression demonstration.
8. For predictors that were not statistically significant, simply report them in text without odds ratio, B, or p value.
9. Provide an APA table of logistic regression results as demonstrated in the table below (it already includes the information for the constant and gender). You should use the table function in Microsoft Word, not spaces and tabs, to construct the table.
a. In the real world, though not required for this assignment, you would also include two additional tables: (a) means and frequencies as demonstrated in Table 18.1, and (b) intercorrelations as demonstrated in Table 18.2. These are taken from Nicol and Pexman (2010) and shown in the appendix of this tutorial.
10. A female who used a condom in her previous encounter scores 2 on all variables except perceived risk (for which she scores 6). Use the model to estimate the probability that she will use a condom in her next encounter. (Note. See comment in the assignment output below for how to do this).
a. The "scores 2 on all variables except perceived risk" is tricky, if not downright misleading. For gender, previous(1), and previous(2) you must use the values implied by "a female who used a condom in her previous encounter."
For item 10, you can create a table to aid the calculation as Field demonstrates in the companion website (Caution: Field's result is incorrect because he used an incorrect value for the constant. You will not get credit for this item if you report Field's result). Or, you can write out the equation, substitute in the specified variable values, and do the math. Log odds of condom use = Z = -4.960 - .482(safety) + .003(gender) + .180(sexexp) + 1.087(previous(1)) - .017(previous(2)) + .348(selfcon) + .949(perceive).
Probability of condom use = 1 ÷ (1 + e^(-Z)). To calculate e^(-Z) you can use a scientific calculator, or type the following into an Excel cell: =EXP(-Z), substituting, of course, the actual calculated value of Z (be sure to exponentiate -Z, not Z).
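The Z-to-probability mechanics can also be sketched in Python using the B coefficients from the equation above. The predictor values in the function call below are placeholders (substitute the values implied by item 10; I deliberately do not supply the item 10 answer):

```python
import math

# B coefficients from the Variables in the Equation output.
def prob_condom(safety, gender, sexexp, prev1, prev2, selfcon, perceive):
    z = (-4.960 - 0.482 * safety + 0.003 * gender + 0.180 * sexexp
         + 1.087 * prev1 - 0.017 * prev2 + 0.348 * selfcon
         + 0.949 * perceive)
    return 1 / (1 + math.exp(-z))  # probability = 1 / (1 + e^-Z)

# Sanity check: with every predictor at 0, the odds p / (1 - p) equal
# e^B for the constant, matching Exp(B) = .007 in the output.
p0 = prob_condom(0, 0, 0, 0, 0, 0, 0)
print(round(p0 / (1 - p0), 3))  # 0.007
```

The sanity check is a useful habit: if your hand-built equation does not reproduce Exp(B) for the constant, a coefficient was mistyped.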
Variables in the Equation
            B     S.E.    Wald    df   Sig.   Exp(B)   95% C.I. for EXP(B): Lower, Upper
Step 1a
safety -.482 .236 4.178 1 .041 .617 .389 .980
gender .003 .573 .000 1 .996 1.003 .326 3.081
sexexp .180 .112 2.614 1 .106 1.198 .962 1.491
previous 4.033 2 .133
previous(1) 1.087 .552 3.880 1 .049 2.966 1.005 8.750
previous(2) -.017 1.400 .000 1 .991 .984 .063 15.289
selfcon .348 .127 7.511 1 .006 1.416 1.104 1.815
perceive .949 .237 16.040 1 .000 2.583 1.624 4.111
Constant -4.960 1.147 18.714 1 .000 .007
a. Variable(s) entered on step 1: safety, gender, sexexp, previous, selfcon, perceive.
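As a quick sanity check on the table above, Exp(B) is simply e^B, and the 95% confidence limits are e^(B ± 1.96 × S.E.). The sketch below verifies this for the safety predictor; expect small third-decimal differences from SPSS because the printed B and S.E. are themselves rounded.

```python
import math

def odds_ratio_with_ci(b, se, z_crit=1.96):
    """Return Exp(B) and its 95% confidence limits from B and its standard error."""
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)

# safety: B = -.482, S.E. = .236 -> roughly .617 [.389, .980], as in the table
or_safety, lower, upper = odds_ratio_with_ci(-0.482, 0.236)
print(round(or_safety, 2), round(lower, 2), round(upper, 2))  # 0.62 0.39 0.98
```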
In the coefficients table on the previous page, take note of the entries for previous, previous(1), and previous(2). Explanations of what previous(1) and previous(2) represent have already been provided. Having “previous” in the model is often confusing to students: it is the overall effect of the 3-level grouping variable. It does not have B or odds ratio values, but it does have an omnibus significance value. The indicator variables, here previous(1) and previous(2), should not be interpreted, even if statistically significant, unless the omnibus test of the variable is, itself, statistically significant (Menard, 2002).

As an FYI, the correlation table below could be used to construct a table such as that demonstrated in Table 18.2 in the Appendix of this tutorial. Unfortunately, SPSS output does not provide the significance values for the correlations. You could run a regular correlation among the variables, which would include p values, but you would have to create the previous(1) and previous(2) dichotomies yourself. Or, you could look up critical r values at the .05, .01, and .001 levels in a table.
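For readers who want to see the indicator coding concretely, here is a sketch of how SPSS expands a 3-level grouping variable into two dichotomies. The level labels and the choice of reference category below are assumptions for illustration only; check the Categorical Variables Codings table in your own output to confirm the actual coding.

```python
def previous_indicators(level):
    """Expand a 3-level 'previous' variable into the two SPSS indicator
    variables. Labels and reference category here are illustrative
    assumptions -- verify against the Categorical Variables Codings output."""
    coding = {
        "no condom":   (0, 0),  # reference level: both indicators are 0
        "condom used": (1, 0),  # previous(1) = 1, previous(2) = 0
        "first time":  (0, 1),  # previous(1) = 0, previous(2) = 1
    }
    return coding[level]

print(previous_indicators("condom used"))  # (1, 0)
```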
Correlation Matrix (Step 1)
            Constant  safety  gender  sexexp  previous(1)  previous(2)  selfcon  perceive
Constant 1.000 -.036 -.246 -.473 -.370 -.142 -.434 -.563
safety -.036 1.000 .214 -.016 -.103 -.029 -.059 -.611
gender -.246 .214 1.000 -.150 .062 .051 -.174 .004
sexexp -.473 -.016 -.150 1.000 .026 .136 .005 .192
previous(1) -.370 -.103 .062 .026 1.000 .189 .087 .169
previous(2) -.142 -.029 .051 .136 .189 1.000 -.113 .074
selfcon -.434 -.059 -.174 .005 .087 -.113 1.000 .049
perceive -.563 -.611 .004 .192 .169 .074 .049 1.000

The casewise list below can be used to answer item 11 by finding the case number and reading from the Predicted column (recall we sorted the file by the particip variable so that participant number and SPSS case number would be the same). For example, for participant #1, the probability of using a condom was .217. (Note. I cut some of the rows in the output to conserve space.)
Casewise List
Case  Selected Status^a  Observed (use)  Predicted  Predicted Group  Resid   ZResid
1 S U .217 U -.217 -.527
2 S C** .142 U .858 2.455
3 S C .829 C .171 .455
4 S C .707 C .293 .644
5 S U** .508 C -.508 -1.016
6 S U .091 U -.091 -.315
7 S C .583 C .417 .845
8 S U .110 U -.110 -.352
9 S U .028 U -.028 -.169
10 S U .073 U -.073 -.280
11 S U** .640 C -.640 -1.332
12 S C .758 C .242 .565
13 S U .336 U -.336 -.712
14 S U .390 U -.390 -.799
15 S C** .466 U .534 1.070
50 S C .899 C .101 .335
51 S U .199 U -.199 -.499
52 S C .872 C .128 .383
53 S U .326 U -.326 -.696
54 S U .038 U -.038 -.199
55 S U** .624 C -.624 -1.287
56 S C** .441 U .559 1.125
57 S C** .164 U .836 2.258
58 S U .108 U -.108 -.349
59 S U .018 U -.018 -.133
70 S U .414 U -.414 -.840
71 S C .844 C .156 .430
72 S U .388 U -.388 -.797
73 S U** .603 C -.603 -1.232
74 S U .055 U -.055 -.240
75 S U .273 U -.273 -.613
76 S U .022 U -.022 -.151
77 S C .955 C .045 .216
78 S U .019 U -.019 -.139
79 S C .602 C .398 .813
80 S U .478 U -.478 -.957
81 S U .206 U -.206 -.509
82 S C .921 C .079 .292
83 S U** .916 C -.916 -3.294
84 S U .020 U -.020 -.143
89 S C .942 C .058 .248
90 S U .307 U -.307 -.666
91 S C .914 C .086 .306
92 S C .842 C .158 .433
93 S U** .663 C -.663 -1.403
94 S U .101 U -.101 -.335
95 S U .308 U -.308 -.667
96 S C .716 C .284 .629
97 S C .840 C .160 .436
98 S C .927 C .073 .281
99 S C** .150 U .850 2.380
100 S U .181 U -.181 -.471
a. S = Selected, U = Unselected cases, and ** = Misclassified cases.
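The Predicted Group, Resid, and ZResid columns in the list above can be reproduced from the predicted probability alone: the residual is observed minus predicted, and ZResid divides that residual by the square root of p(1 - p). The sketch below checks participant #1; tiny discrepancies from SPSS arise because the printed probabilities are rounded to three decimals.

```python
import math

def casewise_columns(observed, p, cut=0.50):
    """observed: 1 = condom used, 0 = unprotected; p: predicted probability.
    Returns (predicted group, residual, standardized residual)."""
    group = "C" if p >= cut else "U"
    resid = observed - p
    zresid = resid / math.sqrt(p * (1.0 - p))
    return group, round(resid, 3), round(zresid, 3)

# Participant #1: observed U (0), predicted probability .217
print(casewise_columns(0, 0.217))  # close to the table's U, -.217, -.527
```

Running the same function on a misclassified case (e.g., participant #2, observed C with predicted .142) shows why SPSS flags it: the predicted group is U even though a condom was used.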
[Step 1 classification plot omitted: a histogram of observed groups plotted against predicted probability of membership in the Condom Used group, from 0 to 1. The cut value is .50; symbols are U (Unprotected) and C (Condom Used), with each symbol representing .5 cases.]
The classification plot is not required for the assignment. If interested in understanding the plot, see Dr. Diebold’s residency’s logistic regression demonstration.
References and Recommended Reading

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York, NY: Wiley.
Menard, S. (2002). Applied logistic regression analysis (2nd ed.). Thousand Oaks, CA: Sage.
O’Connell, A. A., & Amico, K. R. (2010). Logistic regression. In G. R. Hancock & R. O. Mueller (Eds.), The reviewer’s guide to quantitative methods in the social sciences (pp. 221-239). New York, NY: Routledge.
Appendix: Example of Additional APA Tables for Logistic Regression

From: Nicol, A. A. M., & Pexman, P. M. (2010). Presenting your findings: A practical guide for creating tables (6th ed.). Washington, DC: American Psychological Association.