Exercise 4: Logits and More Logits

IMPORTANT NOTE: The Multinomial program appears to be constantly under construction. I recommend using SPSS 23, which is what I used to test these runs. This is installed on the LRC computers. Remember to change the SPSS defaults under General so that variable lists show variable names in alphabetical order. Set the output labels (INCLUDING pivot tables) to show both variable names and labels, and both values and value labels.

READINGS

GUIDE 1: ISSUES IN MODELING
GUIDE 2: TERMINOLOGY
GUIDE 3: THE LOWLY 2 X 2 TABLE
GUIDE 4: BASICS ON FITTING MODELS
GUIDE 5: SOME REVIEW, EXTENSIONS, LOGITS
GUIDE 6: LOGLINEAR & LOGIT MODELS
GUIDE 7: LOG-ODDS AND MEASURES OF FIT
GUIDE 8: LOGITS, LAMBDAS & OTHER GENERAL THOUGHTS

OVERVIEW

EDF 6937-01 SPRING 2017
THE MULTIVARIATE ANALYSIS OF CATEGORICAL DATA
EXERCISE 4: LOGITS AND MORE LOGITS
DUE TUESDAY APRIL 25TH
We will discuss on the 11th and 18th
20 points
Susan Carol Losh
Department of Educational Psychology and Learning Systems
Florida State University

In this exercise, you will investigate a possible two-stage causal model for the boy or girl gene question (DADGENE, coded so that 1 = right answer and 2 = everything else), DEGLEV4 (4 levels of education: 1 = high school or less, 2 = some college, 3 = 4 year degree, 4 = graduate school), GENDER (1 = male, 2 = female), and MADEG2 (recoded mother's highest degree: 1 = high school or less, 2 = some college plus*).

*Why mother's highest degree? On general public USA samples, such as the General Social Survey, we can lose as many as 25% of our cases if we use father's highest degree instead (the person did not know their father well enough to answer--death, divorce, mom never married, etc.)

As you may have guessed, we're using the brand new 2016 General Social Survey data, gss2016.sav, which you will find in the Data and Output folder under COURSE DOCUMENTS.

To start SPSS, just click on the gss2016.sav link in the Blackboard Data and Output folder.

We'll use the Model selection and General programs and the multinomial logistic regression program.

This exercise moves us to logistic regression.

You can do either binomial or multinomial logistic regression through the multinomial regression program in SPSS, under Regression in the Analyze menu.
After a little SPSS exploration, I recommend the multinomial program as the first choice for either.

You will predict first DEGLEV4 and then DADGENE.

In the multinomial dependent variable distribution, DEGLEV4 has four categories or values to it, from high school or less to an advanced college degree. (Please use DEGLEV4 ONLY.)

The possible model is:

Gender and mother's degree level influence the individual adult's degree level.
Gender, mother's highest degree, and respondent's degree level influence answers to the DADGENE science question.

This model is testable and can be falsified if the data fail to support it. If the data are consistent with the model, this does NOT mean the model is "true" but that it is suggestive and was not falsified by the analytic results.

The multinomial logistic regression package offers you choices about how you want to create the contrasts on your dependent variable when you use DEGLEV4 (or DADGENE for that matter).

PROGRAM AND OTHER NUANCES	TABLES	PRELIMINARIES	PARAMETER ESTIMATES	ASSIGNMENT QUESTIONS

REVIEW: PROGRAM AND OTHER NOTES FOR THIS EXERCISE

Remember: use SPSS 23.

(1) You'll use the SATURATED program run from MODEL SELECTION to examine the four variable model and to identify what appears to be the most likely final model.

THEN

Run the GENERAL program for all main, association and interaction effects for your "Best Model". If you use the saturated model in General, just select that (it's the default too) under Model.

In the GENERAL program: if you are not using the saturated model, then be sure to enter the main effects in the model FIRST. Very strange things happen to the effect estimate parameters if you don't.

The exception is a four variable interaction effect, which the program will run as a saturated model and generate all the lower order terms.

(2) You will use the Loglinear General... program to test your chosen model.

Examine the causal order here: Gender and MADEG2 → degree attainment → answer to the science "boy or girl" question.
Be sure to use MADEG

In addition, the model could postulate a possible direct causal effect of either gender or mother's degree (or both) on DADGENE.
All kinds of interaction effects are also possible.

Can either knowing some basic genetics (X and Y chromosomes) or having a college degree change someone's gender?

(I guess one never knows...but see me for a brief biology lesson if you think either one is likely to occur.)

Is it more likely that gender influenced level of educational attainment or is it more likely that the highest degree level attained caused one's gender? Which PROBABLY came first in time (remember these are adults): educational level or question knowledge as an adult? Wouldn't mother's highest degree generally be more likely to affect one's own educational level than the reverse?

To say that gender has a DIRECT causal effect on the science question means that the gender by dadgene PARTIAL ASSOCIATION is nonzero, controlling for other variables in the equation (i.e., degree level or mother's degree in this example).

To say that gender has an INDIRECT causal effect on science question knowledge statistically means that both (1) the gender by degree level PARTIAL ASSOCIATION is nonzero AND (2) the degree level by dadgene PARTIAL ASSOCIATION is nonzero.
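
As a minimal sketch of the two definitions above (written in Python, which the course itself does not use), the logic of direct versus indirect effects can be expressed as a check on which two-variable partial associations remain nonzero in the final model. The function name `classify_effect` and the example list of terms are hypothetical and are not SPSS output.

```python
def classify_effect(source, mediator, outcome, nonzero_pairs):
    """Classify the effect of `source` on `outcome`, given the pairs of
    variables whose partial association was judged nonzero."""
    pairs = {frozenset(p) for p in nonzero_pairs}
    # Direct effect: the source-by-outcome partial association is nonzero.
    direct = frozenset((source, outcome)) in pairs
    # Indirect effect: BOTH links of the chain through the mediator are nonzero.
    indirect = (frozenset((source, mediator)) in pairs
                and frozenset((mediator, outcome)) in pairs)
    if direct and indirect:
        return "direct and indirect"
    if direct:
        return "direct only"
    if indirect:
        return "indirect only"
    return "none"


# Hypothetical final model: gender by deglev4 and deglev4 by dadgene are kept,
# gender by dadgene is not.
example = [("gender", "deglev4"), ("deglev4", "dadgene")]
print(classify_effect("gender", "deglev4", "dadgene", example))
```

Interaction (moderator) effects are outside this sketch; they correspond to three-variable terms such as gender by degree level by dadgene.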

TABLES

The data presented below are from the 2016 General Social Survey and are the WEIGHTED data.
There are 1298 weighted cases with scores on ALL FOUR variables.

In your SPSS runs, first under "data" do a SELECT IF DADGENE < 9. That way your multinomial run on degree level will use the same cases as the multinomial run on DADGENE. Double check for the n on the DEGLEV4 run!

MOTHER'S EDUCATION: HIGH SCHOOL OR LESS

GENDER: MALE
DEGREE LEVEL	HS OR LESS	SOME COLLEGE	BA	>BA	ROW N
RIGHT ANSWER BOYORGRL	38.5%	43.2%	59.7%	65.7%	186
EVERYTHING ELSE	61.5%	56.8%	40.3%	34.3%	231
TOTAL (COLUMN N)	100% (278)	100% (37)	100% (67)	100% (35)	417

GENDER: FEMALE
DEGREE LEVEL	HS OR LESS	SOME COLLEGE	BA	>BA	ROW N
RIGHT ANSWER BOYORGRL	66.3%	77.5%	76.7%	73.5%	407
EVERYTHING ELSE	33.8%	22.5%	23.3%	26.5%	181
TOTAL (COLUMN N)	100% (409)	100% (40)	100% (90)	100% (49)	588

MOTHER'S EDUCATION: SOME COLLEGE OR MORE

GENDER: MALE
DEGREE LEVEL	HS OR LESS	SOME COLLEGE	BA	>BA	ROW N
RIGHT ANSWER BOYORGRL	45.9%	57.1%	59.0%	55.0%	66
EVERYTHING ELSE	54.1%	42.9%	41.0%	45.0%	61
TOTAL (COLUMN N)	100% (61)	100% (7)	100% (39)	100% (20)	127

GENDER: FEMALE
DEGREE LEVEL	HS OR LESS	SOME COLLEGE	BA	>BA	ROW N
RIGHT ANSWER BOYORGRL	48.4%	85.7%	84.8%	83.8%	118
EVERYTHING ELSE	51.6%	14.3%	15.2%	16.2%	48
TOTAL (COLUMN N)	100% (62)	100% (21)	100% (46)	100% (37)	166
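
To see how the percentages and column totals in these tables relate to logits, here is a rough Python sketch (it is not part of the SPSS runs). It rebuilds approximate cell counts from the published percentages, so the counts are rounded estimates of the weighted frequencies. The names `TABLES`, `right_count` and `logit_right` are made up for illustration.

```python
import math

# (percent right, column n) for degree levels 1-4, copied from the tables.
TABLES = {
    ("HS or less", "male"): [(38.5, 278), (43.2, 37), (59.7, 67), (65.7, 35)],
    ("HS or less", "female"): [(66.3, 409), (77.5, 40), (76.7, 90), (73.5, 49)],
    ("some college+", "male"): [(45.9, 61), (57.1, 7), (59.0, 39), (55.0, 20)],
    ("some college+", "female"): [(48.4, 62), (85.7, 21), (84.8, 46), (83.8, 37)],
}


def right_count(pct, n):
    """Approximate number of right answers in one column (rounded)."""
    return round(pct / 100 * n)


def logit_right(pct, n):
    """Log-odds of a right answer versus everything else in one column."""
    right = right_count(pct, n)
    return math.log(right / (n - right))


total_n = sum(n for cells in TABLES.values() for _, n in cells)
print("cases with scores on all four variables:", total_n)

for (madeg2, gender), cells in TABLES.items():
    logits = [round(logit_right(pct, n), 2) for pct, n in cells]
    print(madeg2, gender, logits)
```

A positive logit means right answers outnumber everything else in that column; a negative logit means the reverse.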

PRELIMINARIES AND THE SATURATED MODEL: YOUR SPSS GENERAL PROGRAM RUN

Open the SPSS program and load the gss2016.sav file into the Data Editor.

This is the four variable model: gender, mother's highest degree (MADEG2: high school graduate or less vs. some college or more), deglev4 and DADGENE. Check this model through the Model Selection program for your best model and then estimate the model using the General program. For General, make sure to include the parameter estimates and that the constant term box is checked.

TIP (General, SPSS recommends): under the Options portion, check Estimates and leave the checks on Frequencies and Residuals. Uncheck any options under Plots. The adjusted residuals will come out in your Frequencies/Residuals table and you will save some paper when you print your results (plots take A LOT of paper).

If possible and appropriate, test for your direct and indirect effects by dropping the terms that correspond to those particular partial associations. If the G² goes up significantly when you drop terms, you must return those terms to the model. Use the partitioning of nested models with the G²s and their associated degrees of freedom: the difference between the two G²s is tested against the difference between their degrees of freedom. Use a χ² table (found in the back of most texts, if needed) to see if the difference in the G²s is statistically significant. Alternatively, you can run and then use the partial association tests from the MODEL SELECTION program run, which will deliver equivalent substantive results.
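
The nested-model comparison described above can be sketched in Python using only the standard library. The G² values and degrees of freedom below are invented placeholders, not results from the GSS data. `chi2_sf` is a helper written for this sketch and is valid only for whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)             # starting point: df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))  # starting point: df = 1
    # Step up two degrees of freedom at a time until df is reached.
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def g2_difference_test(g2_reduced, df_reduced, g2_full, df_full):
    """Compare a reduced model (terms dropped) with the fuller model it is nested in."""
    delta_g2 = g2_reduced - g2_full
    delta_df = df_reduced - df_full
    return delta_g2, delta_df, chi2_sf(delta_g2, delta_df)


# Placeholder numbers for illustration only.
delta_g2, delta_df, p = g2_difference_test(20.0, 12, 8.0, 9)
print(delta_g2, delta_df, round(p, 4))
if p < 0.05:
    print("Dropping the terms significantly worsens fit: put them back.")
else:
    print("No significant loss of fit: the simpler model is acceptable.")
```

The helper avoids a dependency on a statistics package; a printed χ² table or any statistics library should give the same decision.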

Turn in your output with your exercise answers. (Remember the Discussion Board takes pdf and docx formats!)

THE MULTINOMIAL LOGISTIC REGRESSION PROGRAM

Under Analyze and Regression, go to the Multinomial Logistic Regression program

In phase 1, in the Dependent box, put DEGLEV4
For the "reference category," change "last" to "first".
This will make the High School or Less group the reference group (similar to dummy variable analysis in regression) for the other 3 education groups. In this case, our reference category has the greatest number of cases (about 2/3 of the total) and can serve as our comparison point.
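
As a rough illustration of what choosing the first category as the reference means, the Python sketch below computes intercept-only baseline-category logits from the DEGLEV4 column totals in the TABLES section (summed over gender and mother's education). The variable names are made up for the sketch, and the numbers it prints are not the coefficients SPSS will report once gender and madeg2 are in the model.

```python
import math

# Column n's from the four panels in the TABLES section, summed by degree level.
deglev4_n = {
    1: 278 + 409 + 61 + 62,   # high school or less (reference)
    2: 37 + 40 + 7 + 21,      # some college
    3: 67 + 90 + 39 + 46,     # 4 year degree
    4: 35 + 49 + 20 + 37,     # graduate school
}

total = sum(deglev4_n.values())
reference = 1
print("share in reference category:", round(deglev4_n[reference] / total, 3))

# One logit per non-reference category: log of its count over the reference count.
baseline_logits = {
    level: math.log(n / deglev4_n[reference])
    for level, n in deglev4_n.items()
    if level != reference
}
for level, value in baseline_logits.items():
    print(f"log(n[{level}] / n[{reference}]) = {value:.3f}")
```

Each of the three logits compares one education group with the reference group, which is why the output has three sets of estimates rather than four.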

Enter these variables into the Factors box: gender madeg2

Check Custom/stepwise for the model choice under Model.
Use the TOP Build Terms box ("forced entry"), highlight "main effects" and pull over gender and madeg2 to the forced entry box.

Then, highlight "interaction", hold down the control key, highlight both gender and madeg2 and move to the forced entry box.
It will appear as gender*madeg2

Leave the check on the Intercept term.

Click "continue".

Under Statistics, check the "classification table" and "goodness of fit" options. You want to keep the checks on the other options.

Click "continue".

Then click OK.
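
In generic notation (the symbols below are chosen for this sketch and are not SPSS output labels), the model specified in the steps above has one equation for each non-reference category j = 2, 3, 4 of DEGLEV4, with category 1 as the reference:

```latex
\log\frac{P(\mathrm{DEGLEV4}=j)}{P(\mathrm{DEGLEV4}=1)}
  = \alpha_j + \beta_j^{G}\,x_{G} + \beta_j^{M}\,x_{M} + \beta_j^{GM}\,x_{G}\,x_{M},
  \qquad j = 2, 3, 4
```

Here x_G and x_M are assumed to be 0/1 indicator (dummy) variables for gender and madeg2, and the intercept is written as alpha. Which category of each factor is the omitted one should be read off the parameter estimates table rather than assumed.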

We're double checking to see if there's a three way interaction among gender, madeg2 and deglev4.

Now, we'll double check to see if there's a four way interaction among gender, madeg2, deglev4 and dadgene also.

Now, for predicting DADGENE through the multinomial program.

Make DADGENE your Dependent variable.
Set the "ref" (reference) category to the FIRST category. (That's the "correct" answer, True, or "1".)

Make gender, deglev4 and madeg2 the Factor(s)

Check Custom/stepwise for the model choice under Model.
Use the TOP Build Terms box to make Gender, DEGLEV4 and MADEG2 the Forced Entry Terms.
Keep the check on the "Include intercept in the model" box.

For the 4 way interaction term:

Hold down the control key and select gender madeg2 and deglev4.

Load the gender*madeg2*deglev4 (implicitly the *dadgene variable too) interaction into the Forced Entry box.

Click on Continue.

Under Statistics..., keep the already checked statistics.
In addition, check the "classification table" and "goodness of fit" and click on Continue.

Click on OK.

If you list interaction effects (e.g., gender by madeg2) as we are doing initially, this means you are really looking at the three-way interaction among gender, madeg2 and deglev4. In general, don't plan to include interactions unless they really are part of your best model. The program will automatically fit the interactions and associations among the independent variables, but that is in the background, out of sight. It will also fit all the univariate marginals for the independent variables and you won't see those on the logit output either.

You have just completed the program run for the predictive equation for dadgene.

THOUGHT: What happens to your postulated causal model if gender has NO direct effect on DADGENE?
ANOTHER THOUGHT: Remember any interaction effects from your first logistic regression run. What did these mean?

ASSIGNMENT QUESTIONS

1. Your SPSS GENERAL and MULTINOMIAL LOGISTIC REGRESSION (2 runs) output (2 points)

Although your output does not have a large weight, you must turn it in. That way, if needed, I can compare your output and your exercise answers. (I'm assuming you had the frequencies runs from Exercise 3.)

PLUS YOUR ANSWERS TO QUESTIONS 2 - 10 BELOW:

Questions 2-10 use the results of the GENERAL and LOGISTIC REGRESSION runs (consult them together)

2. (1 point) Did gender have a causal effect on deglev4? How did you know?

3. (1 point) Did madeg2 have a causal effect on deglev4? How did you know?

4. (2 points) Did you have any moderator or interaction effects of gender and madeg2 on deglev4?
How did you know?

5. (3 points) Now, for DADGENE:

Did gender have any causal effect on DADGENE?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?

6. (3 points)

Did madeg2 have any causal effect on DADGENE?
Was this effect direct or indirect or moderated? (NOTE: or, of course, nonexistent!)
Briefly, how did you know?

7. (2 points) What kind of effect did DEGLEV4 have on DADGENE? Was it direct or indirect (or nonexistent)?

8. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimates for DEGLEV4. (Remember the constant term!)

Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.

9. (2 points) Write out the entire (i.e., use all the appropriate coefficients you were given) numeric logistic regression estimate for DADGENE.
(Remember the constant term!)

Star (*) or bold (or otherwise indicate) the coefficients that were statistically significant.

(NOTE: How do you want to handle the terms that were statistically zero or no effect? Good idea to mention in questions 8 and 9.)

10. (2 points) Using all your output together, what do you think is the best causal model to describe how gender, mother's education and respondent's degree affect the science question? This means talking about the associations and possible interactions among the variables, not presenting numeric loglinear results or symbols. Imagine that you are describing the results in a non-technical fashion to a colleague at a conference who is not familiar with loglinear or logit analysis. (You are, of course, allowed to allude to raising and lowering effects and relative magnitude...)

Use BOTH words and a diagram to describe this model.

Susan Carol Losh
April 12 2017