THE MULTIVARIATE ANALYSIS OF CATEGORICAL DATA
EXERCISE 4: LOGITS AND MORE LOGITS
DUE TUESDAY APRIL 25TH
We will discuss on the 11th and 18th
20 points
Susan Carol Losh
Department of Educational Psychology and Learning Systems
Florida State University
In this exercise, you will investigate a possible two-stage causal model for the boy or girl gene question (DADGENE, coded so that 1 = right answer and 2 = everything else), DEGLEV4 (4 levels of education: 1 = high school or less, 2 = some college, 3 = 4-year degree, 4 = graduate school), GENDER (1 = male, 2 = female), and MADEG2 (recoded mother's highest degree: 1 = high school or less; 2 = some college plus*).
*Why mother's highest degree? On general public USA samples, such as the General Social Survey, we can lose as many as 25% of our cases if we use father's highest degree instead (the person did not know their father well enough to answer--death, divorce, mom never married, etc.)
As you may have guessed, we're using the brand new 2016 General Social Survey data, gss2016.sav, which you will find in the Data and Output folder under COURSE DOCUMENTS.
To start SPSS, just click on the gss2016.sav link in the Blackboard Data and Output folder.
We'll use the Model selection and General programs and the multinomial logistic regression program.
This exercise moves us to logistic regression.
You can do either
binomial or
multinomial logistic regression through the multinomial logistic regression program
in SPSS under Regression in the Analyze menu.
After a little SPSS exploration, I recommend
the multinomial program as the first choice for either.
You will predict first DEGLEV4 and then DADGENE.
In the multinomial dependent variable distribution, DEGLEV4 has four categories or values to it, from high school or less to an advanced college degree. (Please use DEGLEV4 ONLY.)
The possible model is:
Gender and mother's
degree level influence the individual adult's degree level.
Gender, mother's
highest degree, and respondent's degree level influence answers to the
DADGENE science question.
This model is testable and can be falsified if the data fail to support it. If the data are consistent with the model, this does NOT mean the model is "true" but that it is suggestive and was not falsified by the analytic results.
The multinomial logistic regression package offers you choices about how you want to create the contrasts on your dependent variable when you use DEGLEV4 (or DADGENE, for that matter).
Remember: use SPSS 23.
(1) You'll use the SATURATED program run from MODEL SELECTION to examine the four variable model and to identify what appears to be the most likely final model.
THEN
Run the GENERAL program for all main, association and interaction effects for your "Best Model". If you use the saturated model in General, just select that (it's the default too) under Model.
In the GENERAL program: if you are not using the saturated model, then be sure to enter the main effects in the model FIRST. Very strange things happen to the effect estimate parameters if you don't.
The exception is a four variable interaction effect, which the program will run as a saturated model and generate all the lower order terms.
(2) You will use the Loglinear General... program to test your chosen model.
Examine the causal order here: Gender and MADEG2 → degree attainment → answer to the science "boy or girl" question.
Be sure to use MADEG2.
In addition, the
model could postulate a possible direct causal effect of either gender
or mother's degree (or both) on DADGENE.
All kinds of interaction
effects are also possible.
Can either knowing some basic genetics (X and Y chromosomes) or having a college degree change someone's gender?
(I guess one never knows...but see me for a brief biology lesson if you think either one is likely to occur.)
Is it more likely that gender influenced level of educational attainment or is it more likely that the highest degree level attained caused one's gender? Which PROBABLY came first in time (remember these are adults): educational level or question knowledge as an adult? Wouldn't mother's highest degree generally be more likely to affect one's own educational level than the reverse?
To say that gender has a DIRECT causal effect on the science question means that the gender by dadgene PARTIAL ASSOCIATION is nonzero, controlling for other variables in the equation (i.e., degree level or mother's degree in this example).
To say that gender
has an INDIRECT causal effect on science question knowledge statistically
means that both (1) the gender by degree
level PARTIAL
ASSOCIATION is nonzero AND (2) the degree
level by dadgene
PARTIAL ASSOCIATION
is nonzero.
The data presented below are from the
2016 General Social Survey and are the WEIGHTED data.
There are 1298 weighted cases with
scores on ALL FOUR variables.
In your SPSS runs, first under "data" do a SELECT IF DADGENE < 9. That way your multinomial run on degree level will use the same cases as the multinomial run on DADGENE. Double check the n on the DEGLEV4 run!
MOTHER'S EDUCATION: HIGH SCHOOL OR LESS
| DADGENE | MALE: HS OR LESS | MALE: SOME COLLEGE | MALE: BA | MALE: >BA | MALE: ROW TOTAL | FEMALE: HS OR LESS | FEMALE: SOME COLLEGE | FEMALE: BA | FEMALE: >BA | FEMALE: ROW TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|
| RIGHT ANSWER | | | | | 186 | | | | | 407 |
| EVERYTHING ELSE | | | | | 231 | | | | | 181 |
| COLUMN TOTAL (100%) | 278 | 37 | 67 | 35 | | 409 | 40 | 90 | 49 | |
MOTHER'S EDUCATION: SOME COLLEGE OR MORE
| DADGENE | MALE: HS OR LESS | MALE: SOME COLLEGE | MALE: BA | MALE: >BA | MALE: ROW TOTAL | FEMALE: HS OR LESS | FEMALE: SOME COLLEGE | FEMALE: BA | FEMALE: >BA | FEMALE: ROW TOTAL |
|---|---|---|---|---|---|---|---|---|---|---|
| RIGHT ANSWER | | | | | 66 | | | | | 118 |
| EVERYTHING ELSE | | | | | 61 | | | | | 48 |
| COLUMN TOTAL (100%) | 61 | 7 | 39 | 20 | | 62 | 21 | 46 | 37 | |
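As a quick cross-check on the two tables above, the following Python sketch (not part of the SPSS assignment) re-enters the printed row and column totals, confirms that they agree with each other and with the stated weighted n, and computes the odds and logit of a right answer for each gender within each level of mother's education. The dictionary and function names are illustrative only.

```python
import math

# Row totals copied from the tables above:
# (mother's education, gender) -> (right answer, everything else)
dadgene_counts = {
    ("HS or less", "male"): (186, 231),
    ("HS or less", "female"): (407, 181),
    ("some college+", "male"): (66, 61),
    ("some college+", "female"): (118, 48),
}

# Column totals copied from the tables above:
# respondent's degree level in the order HS or less, some college, BA, >BA
degree_counts = {
    ("HS or less", "male"): (278, 37, 67, 35),
    ("HS or less", "female"): (409, 40, 90, 49),
    ("some college+", "male"): (61, 7, 39, 20),
    ("some college+", "female"): (62, 21, 46, 37),
}


def check_margins():
    """Confirm row totals equal column totals in each panel; return the overall n."""
    for key, (right, other) in dadgene_counts.items():
        assert right + other == sum(degree_counts[key]), key
    return sum(right + other for right, other in dadgene_counts.values())


def logit_right(right, other):
    """Natural log of the odds of a right answer versus everything else."""
    return math.log(right / other)


total_n = check_margins()
print("weighted n:", total_n)
for key, (right, other) in dadgene_counts.items():
    print(key, "odds:", round(right / other, 3),
          "logit:", round(logit_right(right, other), 3))
```

These are marginal logits only (they ignore respondent's degree level), so they are a sanity check on your data entry rather than a substitute for the model estimates.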
Open the SPSS program and load the gss2016.sav file into the Data Editor.
This is the four variable model: gender, mother's highest degree recoded as high school graduate or less vs. some college or more (MADEG2), DEGLEV4, and DADGENE. Check this model through the Model Selection program for your best model and then estimate the model using the General program. For General, make sure to request the parameter estimates and to check the constant term box.
TIP (General, SPSS recommends): under the Options portion, check Estimates and leave the checks on Frequencies and Residuals. Uncheck any options under Plots. The adjusted residuals will come out in your Frequencies/Residuals table and you will save some paper when you print your results (plots take A LOT of paper).
If possible and appropriate, test for your direct and indirect effects by dropping the terms that correspond to those particular partial associations. If the G² goes up significantly when you drop terms, you must return those terms to the model. Partition the nested models using their G²s and associated degrees of freedom: the difference between two nested G²s is itself approximately distributed as chi-square, with degrees of freedom equal to the difference in the two models' degrees of freedom. Use a chi-square table (in the back of most texts, if needed) to see whether the difference in the G²s is statistically significant. Alternatively, you can run and then use the partial association tests from the MODEL SELECTION program run, which will deliver equivalent substantive results.
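If you would rather compute the p-value than look it up in a table, the nested-model comparison can be sketched in Python as below. This is a minimal illustration using only the standard library; the function names are my own, and the G² values in the example are HYPOTHETICAL placeholders, not results from the GSS data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def g2_difference_test(g2_reduced, df_reduced, g2_full, df_full):
    """Compare a reduced model (terms dropped) with the fuller model it is nested in."""
    delta_g2 = g2_reduced - g2_full
    delta_df = df_reduced - df_full
    if delta_df <= 0 or delta_g2 < 0:
        raise ValueError("the reduced model must have more df and a larger G2")
    return delta_g2, delta_df, chi2_sf(delta_g2, delta_df)


# HYPOTHETICAL values for illustration only
delta_g2, delta_df, p_value = g2_difference_test(12.40, 4, 3.10, 3)
print(round(delta_g2, 2), delta_df, p_value < 0.05)
```

A small p-value says that dropping the term made the fit significantly worse, so the term goes back into the model.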
Turn in your output with your exercise answers. (Remember the Discussion Board takes pdf and docx formats!)
In phase 1, in the
Dependent
box, put DEGLEV4
For the "reference
category," change "last" to "first".
This will make the
High School or Less group the reference group (similar to dummy variable
analysis in regression) for the other 3 education groups. In this case,
our reference category has the greatest number of cases (about 2/3 of the
total) and can serve as our comparison point.
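To see what using the first category as the reference means numerically, the short Python sketch below adds up the DEGLEV4 column totals from the tables above and forms the three marginal baseline-category logits: each remaining education group compared with High School or Less. The variable names are illustrative, and these are marginal figures that ignore gender and MADEG2.

```python
import math

# DEGLEV4 totals summed over gender and mother's education (tables above)
degree_totals = {
    "HS or less": 278 + 409 + 61 + 62,
    "some college": 37 + 40 + 7 + 21,
    "BA": 67 + 90 + 39 + 46,
    ">BA": 35 + 49 + 20 + 37,
}

reference = "HS or less"
n = sum(degree_totals.values())
reference_share = degree_totals[reference] / n

# One logit per non-reference category: ln(count in category / count in reference)
baseline_logits = {
    level: math.log(count / degree_totals[reference])
    for level, count in degree_totals.items()
    if level != reference
}

print("n:", n, "reference share:", round(reference_share, 3))
for level, value in baseline_logits.items():
    print(level, "vs", reference, round(value, 3))
```

Because the reference group is the largest, all three marginal logits come out negative: each of the other groups is less common than High School or Less.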
Enter these variables into the Factors box: gender madeg2
Check Custom/stepwise for the model
choice under Model.
Use the TOP
Build Terms box ("forced entry"), highlight "main
effects" and pull over gender and madeg2 to the forced entry box.
Then, highlight "interaction",
hold down the control key, highlight both gender and madeg2 and move to
the forced entry box.
It will appear as
gender*madeg2
Leave the check on the Intercept term.
Click "continue".
Under Statistics, check the "classification table" and "goodness of fit" options. You want to keep the checks on the other options.
Click "continue".
Then click OK.
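Because the phase 1 model contains gender, madeg2 and gender*madeg2, it is saturated with respect to the two factors, so its fitted logits should reproduce the observed ones. The Python sketch below (not SPSS output) rebuilds the parameters by hand from the DEGLEV4 column totals in the tables above. It ASSUMES indicator coding with the LAST category of each factor treated as the redundant (zero) category; check the footnotes of your own Parameter Estimates table. The counts are rounded weighted counts, so your SPSS estimates may differ slightly. All names are illustrative.

```python
import math

# (gender code, madeg2 code) -> DEGLEV4 counts: HS or less, some college, BA, >BA
cells = {
    (1, 1): (278, 37, 67, 35),   # male, mother HS or less
    (2, 1): (409, 40, 90, 49),   # female, mother HS or less
    (1, 2): (61, 7, 39, 20),     # male, mother some college+
    (2, 2): (62, 21, 46, 37),    # female, mother some college+
}
CONTRASTS = ("some college", "BA", ">BA")  # each compared with HS or less


def observed_logits(counts):
    """Logits of each non-reference category against the first category."""
    ref = counts[0]
    return [math.log(c / ref) for c in counts[1:]]


def saturated_parameters(cells):
    """Solve for intercept, main effects and interaction in each logit equation."""
    logit = {key: observed_logits(counts) for key, counts in cells.items()}
    params = {}
    for j, name in enumerate(CONTRASTS):
        intercept = logit[(2, 2)][j]              # assumed reference cell
        gender_male = logit[(1, 2)][j] - intercept
        madeg_low = logit[(2, 1)][j] - intercept
        interaction = (logit[(1, 1)][j] - logit[(1, 2)][j]
                       - logit[(2, 1)][j] + intercept)
        params[name] = {
            "intercept": intercept,
            "gender=1": gender_male,
            "madeg2=1": madeg_low,
            "gender=1*madeg2=1": interaction,
        }
    return params


params = saturated_parameters(cells)
for name, estimates in params.items():
    print(name, {k: round(v, 3) for k, v in estimates.items()})
```

Adding all four terms of an equation returns the observed logit for males whose mothers had high school or less, which is a handy check that you have read the SPSS coding correctly.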
Now, for predicting DADGENE through the multinomial program.
Make DADGENE your
Dependent variable.
Make the "ref" (reference) category
the FIRST category. (That's the "correct" answer, True, or "1".)
Make gender, deglev4 and madeg2 the Factor(s)
Check Custom/stepwise for the model
choice under Model.
Use the TOP
Build Terms box to make Gender, DEGLEV4 and MADEG2 the Forced Entry
Terms:
Keep the check on Include intercept
in
the model box.
For the 4 way interaction term:
Hold down the control key and select gender madeg2 and deglev4.
Load the gender*madeg2*deglev4 (implicitly the *dadgene variable too) interaction into the Forced Entry box.
Click on Continue.
Under Statistics..., keep the already
checked statistics.
In addition, check
the "classification table" and "goodness of fit" and click on Continue.
Click on OK.
You have just completed the program run for the predictive equation for dadgene.
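One point that is easy to trip over when reading this output: with the FIRST category (the right answer) as the reference, each logit is the log odds of "everything else" relative to the right answer, so a positive coefficient means MORE wrong answers. The Python sketch below converts a logit into odds and probabilities. The coefficient values are HYPOTHETICAL placeholders chosen only for illustration, and the function name is my own.

```python
import math


def describe_logit(logit):
    """Translate ln[P(everything else) / P(right answer)] into odds and probabilities."""
    odds_other = math.exp(logit)
    p_other = odds_other / (1.0 + odds_other)
    return {
        "odds_other_vs_right": odds_other,
        "p_everything_else": p_other,
        "p_right_answer": 1.0 - p_other,
    }


# HYPOTHETICAL estimates, not taken from any SPSS run
hypothetical_intercept = -0.50
hypothetical_gender_male = 0.90

reference_group = describe_logit(hypothetical_intercept)
males = describe_logit(hypothetical_intercept + hypothetical_gender_male)

print({k: round(v, 3) for k, v in reference_group.items()})
print({k: round(v, 3) for k, v in males.items()})
```

In this made-up case the positive gender coefficient lowers the predicted probability of a right answer for males relative to the reference group.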
THOUGHT: What
happens to your postulated causal model if gender has NO direct effect
on DADGENE?
ANOTHER THOUGHT:
Remember any interaction effects from your first logistic regression run.
What did these mean?
1. Your SPSS GENERAL and MULTINOMIAL LOGISTIC REGRESSION (2 runs) output (2 points)
Although your output does not have a large weight, you must turn it in. That way, if needed, I can compare your output and your exercise answers. (I'm assuming you had the frequencies runs from Exercise 3.)
PLUS YOUR ANSWERS
TO QUESTIONS 2 - 10
BELOW:
2. (1 point) Did gender have a causal effect on deglev4? How did you know?
3. (1 point) Did madeg2 have a causal effect on deglev4? How did you know?
4. (2 points) Did you have any moderator
or interaction effects of gender and madeg2 on deglev4?
How did you know?
5. (3 points) Now, for DADGENE:
Did gender have any causal effect
on DADGENE?
Was this effect direct or indirect or
moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?
6. (3 points)
Did madeg2 have any causal effect on DADGENE?
Was this effect direct or indirect or
moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?
7. (2 points) What kind of effect did DEGLEV4 have on DADGENE? Was it direct or indirect (or nonexistent)?
8. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimates for DEGLEV4. (Remember the constant term!)
Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.
9. (2 points) Write out the entire (i.e.,
use all the appropriate coefficients you were given) numeric logistic
regression estimate for DADGENE.
(Remember the constant term!)
Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.
(NOTE: How do you want to handle the terms that were statistically zero or no effect? Good idea to mention in questions 8 and 9.)
10. (2 points) Using all your output together, what do you think is the best causal model to describe how gender, mother's education and respondent's degree affect the science question? This means talking about the associations and possible interactions among the variables, not presenting numeric loglinear results or symbols. Imagine that you are describing the results in a non-technical fashion to a colleague at a conference who is not familiar with loglinear or logit analysis. (You are, of course, allowed to allude to raising and lowering effects and relative magnitude...)
Use BOTH words
and a diagram
to describe this model.
Susan Carol Losh
April 12 2017