EDF 5481 METHODS OF EDUCATIONAL RESEARCH
FALL 2017

GUIDE 2: VARIABLES AND HYPOTHESES
SUSAN CAROL LOSH

KEY TAKEAWAYS:

• Research begins with WHAT you want to find out, not how you plan to discover it.
• Conceptual variables are about abstract constructs; operational variables ("operational definitions") are the concrete operations, measures, or procedures used to measure the concept in practice.
• A confounded variable is multidimensional: it is a variable in which several variables are simultaneously embedded. (This interferes with establishing causality!)
• If one variable causes a second, they should correlate. Causation implies correlation, BUT correlation does NOT imply causation.
• Causes are called INDEPENDENT VARIABLES. If one variable truly causes a second, the cause is the independent variable. Independent variables may also be called explanatory variables or predictors.
• Effects are called DEPENDENT VARIABLES. We explain what has caused dependent variables. Dependent variables may also be called outcome, response, or criterion variables.
• Two variables may be associated even though we cannot designate cause and effect. These are symmetric relationships. In asymmetric relationships, we CAN designate cause and effect.
• A mediating (or mediator) variable links the independent and the dependent variable. Thus, it is part of a causal chain.
• Hypotheses link variables in causal assertions. A hypothesis may describe whether or not a relationship exists, the possible causal direction of the relationship ("null" hypotheses are directionless), the mechanics (how) of the relationship, and even the form of the relationship. Hypotheses should be falsifiable.
• Three basic levels of measurement are nominal (categories are different), ordinal (categories are ordered), and interval-ratio (categories are numbers). Even a two-category variable can be ordinal if we can rank the categories ("yes I smoked a cigarette" is more than "no I didn't").

Where are the data collection methods?

Before you design an experiment or a survey or an ethnography, you must consider basic issues in hypotheses, whether your variables can approximate numbers or are clearly just categories, and whether your variables are unidimensional or multidimensional. That's what we will do in this guide.

All too often, the student says "I want to do an experiment that will..." or "I want to do a survey" or "I want to do an ethnography of..."

 Research begins with WHAT you want to find out, not how you plan to discover it.

 CONCEPTUAL VARIABLES, OPERATIONAL DEFINITIONS

CONCEPTUAL VARIABLES are what you think the entity really is or what it means. Conceptual variables are about abstract constructs. YOU DO NOT DISCUSS MEASUREMENT AT THIS STAGE! Examples include "achievement motivation" or "career choice" or "second language". You are describing a concept.

On the other hand, OPERATIONAL VARIABLES  (sometimes called "operational definitions") are how you actually measure this entity, or the concrete operations, measures, or procedures that you use to measure the concept in practice. If you use a Stanford-Binet to measure intelligence or a bar code scan to assess the popularity of musical artists, those are operational variables.

Why should we care about the difference? A conceptual definition is broader. A particular concept or construct can be operationalized in several different ways. For example, disengagement among students or team members can be measured through absence records, rates of volunteerism, expressions of enthusiasm, and so on.

To complicate matters further, an operational construct may measure many things besides the original concept you are interested in. A Stanford-Binet IQ test may measure "native ability," but also disabilities, language facility, format response set, and other factors extraneous to "native ability." This makes it doubly important to carefully define your conceptual variable.

***For Assignment One (not yet available), you will address CONCEPTUAL VARIABLES, and the RELATIONSHIPS AMONG CONCEPTUAL VARIABLES.

EXAMPLES

CONCEPTUAL VARIABLE      OPERATIONAL VARIABLE
Letter recognition       Scores on a particular test
Culture                  Use of "Standard American English"
Bipolar Disorder         Score patterns on a diagnostic
Collective Efficacy      Formation of online study groups

 OUCH! CONFOUNDED VARIABLES

A confounded variable is a multidimensional variable: one in which several variables are simultaneously embedded. Because this variable is multidimensional, we do not know precisely what it means or measures. This causes tremendous problems. If a confounded variable is supposed to be a cause, we cannot isolate exactly what the specific cause of some phenomenon was.

Whenever possible, avoid confounded variables because they muddle and confuse any kind of causal assertions.

EXAMPLES:

Educational level is one of the worst confounded variables because it simultaneously taps:

• cognitive sophistication
• tolerance of diversity
• exposure to higher levels of math or science
• age (which is currently related to educational level in many countries)
• social class and other variables.

OPTIONAL: See a copy of my Skeptical Inquirer article about the ambiguities in the variable "level of formal education." (I have no idea who ssavage is; see the end of the article for the real authors: me and several students.)

Another common source of confounding is an experimental treatment that, either deliberately or inadvertently, includes too many variables in a single treatment.

• For example, suppose you designed a treatment to help people stop smoking. Because you are really dedicated, you assigned the same individuals simultaneously to (1) a "stop smoking" nicotine patch; (2) a "quit buddy"; and (3) a discussion support group. Compared with a group in which no intervention at all occurred, your experimental group now smokes 10 fewer cigarettes per day.

Now comes the hard part: which treatment caused the decrease in smoking? The patch? The buddy? The support group? Because the experimental group received all three treatments at once, you cannot precisely specify which causal variable or combination was the most important.

To solve the confounded variable problem, you must take care that each operational variable measures one and only one construct. This may mean more experimental groups (at least four groups in my example above, including a control group). It may mean that you must use a variety of question formats in your "standardized test" to control for question format effects.
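
As a minimal sketch of this design fix, the Python snippet below enumerates the groups for the smoking example. The treatment labels and function names are illustrative assumptions, not part of any real study. It lists the four-group minimum (a control group plus one group per single treatment) and, for comparison, a full factorial design, which is one possible way to also examine combinations of treatments.

```python
from itertools import product

# Hypothetical labels for the three treatments in the smoking example.
TREATMENTS = ("patch", "quit_buddy", "support_group")


def single_treatment_design(treatments):
    """Control group plus one group per treatment (each group gets at most one)."""
    groups = [frozenset()]  # the empty set stands for the control group
    groups += [frozenset([t]) for t in treatments]
    return groups


def full_factorial_design(treatments):
    """Every on/off combination of the treatments, including the control group."""
    groups = []
    for switches in product([False, True], repeat=len(treatments)):
        groups.append(frozenset(t for t, on in zip(treatments, switches) if on))
    return groups


minimal = single_treatment_design(TREATMENTS)
factorial = full_factorial_design(TREATMENTS)

for group in minimal:
    print(sorted(group) or ["control"])
print(len(minimal), "groups in the minimal design;", len(factorial), "in the full factorial")
```

In the minimal design every group differs from the control group in exactly one treatment, so a difference in smoking can be tied to that single treatment.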

 INDEPENDENT, DEPENDENT AND INTERVENING (mediating) VARIABLES: A PRIMER

If one variable causes a second variable, they should correlate (have a real relationship). Causation implies correlation.

However, two variables could be associated without having a causal relationship. For example, such a spurious relationship (apparently, but not truly causal) could occur because both the supposed independent variable and the supposed dependent variable are caused by a third variable.

DID YOU KNOW?

There is an apparent correlation between ice cream consumption and the number of bodily assaults.
However, this apparent correlation probably doesn't happen because some mystery ingredient in ice cream provokes violence. Rather, the correlation occurs statistically because the hot temperatures of summer cause both ice cream consumption and assaults to increase. Thus, correlation does NOT imply causation.
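
The ice cream example can be illustrated with a small simulation. The sketch below uses entirely made-up numbers (the temperature range, the slopes, and the noise levels are assumptions chosen for illustration, not real data). Temperature drives both simulated variables, and neither variable influences the other. The raw correlation is clearly positive, while the partial correlation that holds temperature constant falls close to zero.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(2017)  # fixed seed so the run is repeatable

# Simulated daily temperatures (the third variable).
temperature = [rng.uniform(0, 35) for _ in range(2000)]
# Both outcomes depend on temperature plus independent noise, NOT on each other.
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
assaults = [1.0 * t + rng.gauss(0, 10) for t in temperature]

raw_r = pearson(ice_cream, assaults)
controlled_r = partial_corr(ice_cream, assaults, temperature)

print("raw correlation:", round(raw_r, 2))
print("controlling for temperature:", round(controlled_r, 2))
```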

Recall that causes are called INDEPENDENT VARIABLES. If one variable truly causes a second, the cause is the independent variable.

Independent variables are often also called explanatory variables or predictors.

Effects are called DEPENDENT VARIABLES. We explain what has caused dependent variables.

Dependent variables are also sometimes called outcome, response or criterion variables.

Two variables may be associated but we cannot designate cause and effect. These are symmetric relationships.

In asymmetric relationships, we CAN designate cause and effect.

EXAMPLE: Married or cohabiting people average better mental health than unmarried people. However, we have evidence that marriage promotes mental health AND ALSO that mentally healthy people are more likely to marry. Thus, we can't clearly and unambiguously designate cause and effect without further information. This is a symmetric relationship.*

*Recent research indicates single people go to bars and drink more often, which may influence their mental health.

EXAMPLE: Someone's gender is linked to their level of basic science knowledge. While it is possible that being male or female might lead to differential interests, hence to sex-linked science scores, it is IMPOSSIBLE (in nearly all cases) for your basic science score to make you male or female, or to change your biological sex. Because cause and effect can unambiguously be designated, this is an asymmetric relationship.

MEDIATING VARIABLES

I define a mediating variable as one that links the independent and the dependent variable. Thus, a mediating or mediator variable is part of a causal chain:

INDEPENDENT VARIABLE -------> MEDIATOR VARIABLE ------> DEPENDENT VARIABLE

EXAMPLE: educational level is a cause of science attitudes because educational level influences the type of occupation someone has (mediator variable), and it is the occupational type that affects science attitudes.

Mediator variables inform us about causal sequences or chains, thus explaining the causal process of a phenomenon.

EXAMPLE: educational level -----> occupational type -----> income level

While I would love to say that employers will pay you just because you have a college degree, in fact, it is the job you obtain (often thanks to the degree) that pays the salary. The job is the mediating variable between educational level and income level.

Mediating variables certainly CAN be measured. They are critical to use in nonexperimental research designs. Often they can specify what it is about the independent variable that is important.

Disentangling independent, mediator and dependent variables can be critical if you are in a clinical occupation. If you want to help clients create changes, it is imperative to know which changes will really have an impact.

It's easy to confuse the term "mediator variable" with the term "moderator variable"; however, a moderator variable is something quite different.

 CONCEPTUAL HYPOTHESES, OPERATIONAL HYPOTHESES, NULL HYPOTHESES

Hypotheses link variables (typically independent, mediating, and dependent variables) in causal assertions. A hypothesis may state whether a relationship exists (or predict no relationship at all), the causal direction of the relationship, and the mechanics (how) of the relationship; it may even specify the form of the relationship.

Hypotheses should be falsifiable through logic or ultimately (for operational and null hypotheses) through empirical test.
This property is absolutely critical in scientific research.
If an article you read does not address falsifiable hypotheses in some way, its assertions aren't science.

Two of the things that make science "science" are (1) falsifiable hypotheses and (2) the self-corrective process of replication.

A CONCEPTUAL HYPOTHESIS links at least two conceptual variables. Typically, this is stated in some type of cause and effect manner.

EXAMPLE:

Aerobic exercise         will reduce            levels of "state anxiety."
Independent variable     direction of effect    Dependent variable

EXAMPLE:

Young chronological age  will increase          ease of second language learning
Independent variable     direction of effect    Dependent variable

EXAMPLE: An external threat raises team cohesiveness.

Notice that I have never stated how we will measure aerobic exercise, state anxiety, second language learning, external threat, or cohesion. At this stage I need to develop and define what these terms actually mean and how or why I expect them to be linked together.

For example, I could discuss how an external threat makes social identity salient and thus helps team members to work together better. Or I might show how the endorphins generated through aerobic exercise allay anxiety. (In these examples, "salience of social identity" or "endorphins" are mediator variables.)

AN OPERATIONAL HYPOTHESIS links at least two operational variables. Again, some type of cause and effect is usually present in the hypothesis.

EXAMPLE: Children with an encyclopedia in their home will achieve higher scores on the Stanford-Binet Intelligence Test.

EXAMPLE: Fast walking (a mile in 10 minutes or less) will lower Galvanic Skin Response scores.

NULL HYPOTHESES

In classical statistical inference testing, it is mathematically easiest to disprove a null hypothesis, which is sometimes written as H0.

A null hypothesis is also precisely stated.

A null hypothesis will assert that:

• There is no relationship among two or more variables (EXAMPLE: the correlation between educational level and income is zero)
• Or that two or more populations or subpopulations are essentially the same (EXAMPLE: women and men have the same average science knowledge scores.)
For example, to rewrite the conceptual and operational hypotheses above in null form, we have:

Having an encyclopedia in the home has no effect on children's scores on the Stanford-Binet Intelligence Test.
Fast walking has no effect on Galvanic Skin Response scores.
There is no relationship between an external threat and team cohesiveness.

As you can see, null hypotheses are basically "directionless."

If the null hypothesis is rejected, typically an alternative hypothesis (usually styled HA:) is accepted. Usually the alternative hypothesis will assert that a relationship among two or more variables exists or that two or more subpopulations differ in some respect. A direction to the relationship (e.g., external threat raises team cohesion) may be specified. Directional alternative hypotheses are specified in advance of data collection procedures.
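
To make the logic of testing a null hypothesis concrete, here is a minimal sketch of one possible test: an exact permutation test of the null hypothesis that two subpopulations have the same average score. The scores are invented for illustration, and the function name is an assumption. This is only one of many tests a researcher might choose. The idea is that if the null hypothesis were true, the group labels would be interchangeable, so we ask how often a relabeling of the same scores produces a difference in means at least as large as the one observed.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # Absolute difference between the two group means for a given split.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    splits = 0
    # Try every way of assigning n_a of the pooled scores to the first group.
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        sum_a = sum(pooled[i] for i in indices)
        if abs_mean_diff(sum_a) >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return extreme / splits


# Hypothetical science knowledge scores for two small groups (made-up numbers).
group_a = [12, 15, 14, 10, 13]
group_b = [16, 18, 17, 14, 19]

p_value = exact_permutation_p(group_a, group_b)
print("p-value under the null hypothesis:", round(p_value, 4))
```

A small p-value means the observed difference would be unusual if the null hypothesis were true, which is the usual basis for rejecting it in favor of the alternative.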

You may not believe your null hypothesis at the time you state it, because, in fact, you believe there is a relationship or that two groups differ. However, a null hypothesis is consistent with more tests of "statistical significance" which may make it a little easier to work with.

It used to be that students had to assert null hypotheses in a thesis, dissertation, conference presentation or article. Now, we are more comfortable with students creating directional hypotheses. Many articles do so now.

LEVELS OF MEASUREMENT: A SHORT STATISTICAL PRESENTATION TO HELP WITH READING AND TO USE IN CRITIQUES

When you read an article or a paper, the types of statistics used should be consistent with the level of measurement:

 NOMINAL ORDINAL INTERVAL-RATIO

A variable is a characteristic or factor that has values that vary. Thus, a variable has at least two different categories or values.

Variables consist of  sets or systems of categories with several properties. Examples of category systems include:

GENDER: Categories = Male and Female

PRIMARY/SECONDARY GRADES: Categories = Kindergarten, first, second, third...and so forth to grade twelve

AGE IN YEARS: Categories = 1, 2, 3, 4, 5, and so forth up to 90 years of age--or even higher.

At a minimum, category systems should be exhaustive (cover all cases). Each case must be able to fit into a category. Sometimes that means we must construct an all-inclusive "other" category.

Categories of a variable should also be mutually exclusive (each case fits into one and only ONE category).
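
These two requirements can be checked mechanically. The sketch below is a hypothetical helper (the function name, the age categories, and the sample ages are all assumptions made for illustration). It reports cases that fit no category, which means the system is not exhaustive, and cases that fit more than one category, which means the system is not mutually exclusive.

```python
def check_category_system(cases, categories):
    """Return (unclassified cases, cases that fall into more than one category).

    `categories` maps a category name to a function that says whether a case belongs.
    """
    unclassified, overlapping = [], []
    for case in cases:
        matches = [name for name, belongs in categories.items() if belongs(case)]
        if not matches:
            unclassified.append(case)  # violates exhaustiveness
        elif len(matches) > 1:
            overlapping.append((case, matches))  # violates mutual exclusivity
    return unclassified, overlapping


ages = [5, 17, 18, 30, 65, 90]

# A flawed system: age 30 fits two categories, and nobody over 64 fits any.
flawed = {
    "under 18": lambda a: a < 18,
    "18 to 30": lambda a: 18 <= a <= 30,
    "30 to 64": lambda a: 30 <= a <= 64,
}

# A repaired system: boundaries no longer overlap and a final category catches the rest.
repaired = {
    "under 18": lambda a: a < 18,
    "18 to 29": lambda a: 18 <= a <= 29,
    "30 to 64": lambda a: 30 <= a <= 64,
    "65 or older": lambda a: a >= 65,
}

print(check_category_system(ages, flawed))
print(check_category_system(ages, repaired))
```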

Other nice category properties--WHEN IT IS POSSIBLE-- include:

• a good spread of cases over categories (no category with too large or too small a percentage of cases). Possibilities IF the data allow include a normal ("bell-shaped" or Gaussian) distribution or an equiprobable distribution in which each category has the same number of cases.
• a limited number of categories, and
• equal intervals between categories (this applies only IF the category values are numeric).

TIP: Researchers should try to gather data as completely as possible (for example, get education in number of years rather than degree level) because one can collapse or move around categories later on with computer programs. If the researcher really meant degree level, then ask about degree level explicitly rather than years of education or  "how much" education.
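
The point about collapsing categories later can be sketched in a few lines of Python. The cut points below are a hypothetical mapping chosen only for illustration; as the tip notes, years of schooling and degree level are not the same thing, so a researcher who truly means degree level should ask about it directly. What the sketch shows is the direction of the operation: detailed data can be collapsed into broader categories, but broad categories cannot be expanded back into years.

```python
from collections import Counter


def degree_level(years):
    """Collapse years of education into a broad level (hypothetical cut points)."""
    if years < 12:
        return "less than high school"
    if years < 16:
        return "high school"
    if years < 18:
        return "bachelor's"
    return "graduate"


# Made-up responses recorded as completed years of education.
years_of_education = [8, 12, 12, 14, 16, 16, 18, 21]

levels = [degree_level(y) for y in years_of_education]
counts = Counter(levels)

for level, count in counts.items():
    print(level, count)
```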

Avoid "open-ended" categories that do not have fixed end points when possible (e.g., "graduate degree or more" or "$75,000 or more"). However, keep in mind that it may not be possible to use a final closed category with income.

Make questions and responses explicit enough that respondents or interviewers do not need to guess about the answer. "Guessing" can quickly turn a numeric variable into a non numeric variable.

Nominal, ordinal and interval-ratio variables are different types of category systems. These form a cumulative and hierarchical set of data properties: nominal properties also hold for ordinal and interval data, and ordinal properties also hold for interval data. The reverse does NOT hold.

 NOMINAL VARIABLES

With nominal  variables, you can tell whether two cases or instances fall into the same category or into different categories. Thus, you can sort all cases into mutually exclusive, exhaustive categories. That's it!

Examples of nominal variables include:

Zodiac sign

Gender

Birth country and

Religious affiliation (or denomination)

Nominal variables are also sometimes called categorical variables or qualitative variables. The categories are not only not numbers, they do not have any inherent order.

Try these examples:

Who is more? South Koreans or Turks? More WHAT? Country of origin is NOT a number or even a "relative judgment."

Who is "better"? Women or Men? Better at WHAT? If you suspect that ranking the categories (NOTE: NOT the cases within the categories) would start a war, you probably have nominal variables.

STATS & PRESENTATION ADVICE: You can only do very basic statistics or presentations with nominal data, such as: percents, ratios, rates, frequency distributions (thus charts and graphs), and modes. Of course, many nominal variables are very important, especially as explanatory variables.

 ORDINAL VARIABLES

With ordinal variables, the categories themselves can be rank-ordered from highest to lowest.

This means the scores must be rank-ordered from highest to lowest (or vice versa) first, before any ordinal statistics can be used. Like runners in a race, we can rank scores--and the categories themselves--from first to last, most to least, or highest to lowest.

In rank-ordered cases, we can literally rank order the finishers in a race or the students by their grade point average (first in class, second in class, and so on down to last in class). Notice that the intervals between cases probably are not the same (or equal). The class valedictorian may have a straight-A or 4.0 average, the salutatorian a 3.6, the third student a 3.5, and so on. The fastest runner might run a mile in 5 minutes, the second fastest in 5 minutes 10 seconds (10 seconds slower), the third runner in 6 minutes (50 seconds slower still). So the "distances" between the SCORE TIMES (not the ranks) are unequal.
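
The runner example can be worked through in a short sketch. The times come from the example above (5:00, 5:10, and 6:00, converted to seconds); the runner labels and variable names are assumptions. The ranks are evenly spaced by construction, while the underlying times are not.

```python
# Mile times in seconds, taken from the runner example.
mile_times = {"first runner": 300, "second runner": 310, "third runner": 360}

# Sort from fastest to slowest and assign ranks 1, 2, 3, ...
ordered = sorted(mile_times.items(), key=lambda item: item[1])
ranks = {name: position for position, (name, _) in enumerate(ordered, start=1)}

# Gaps between adjacent finishers, measured in ranks and in seconds.
times = [seconds for _, seconds in ordered]
rank_gaps = [1] * (len(times) - 1)
time_gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print("ranks:", ranks)
print("gaps in rank:", rank_gaps)
print("gaps in seconds:", time_gaps)
```

Each step in rank is exactly one place, but the corresponding steps in time are 10 seconds and 50 seconds, which is why ranks alone are ordinal data.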

We can also rank-order the categories of a variable in ordinal data. One example is a Likert scale. Respondents are given a statement, such as "I like the Big Bang Theory," and are then asked whether they

Strongly Agree        Agree        Disagree        or        Strongly Disagree

with that statement.

We can surmise that someone who "strongly agrees" supports that statement more intensely than someone who "agrees"--but we don't know how much more intensely.

Most Agree-Disagree (Likert) attitude scales are ordinal data.

***********************************************************************************************

 This is fairly obvious when there are 5-7 categories but it is also true when there are only two categories: someone who favors raising teacher salaries obviously is more in favor than someone who opposes the raise. Illustrative example: Someone who smokes cigarettes "at all" and answers "yes" smokes more than someone who smokes zero cigarettes (and answers "no"), even if it is only 1 cigarette more.

***********************************************************************************************

Other types of ordinal data include:

the order of finish (e.g., class rank or a horse race)

"yes-no" experiences (someone who answers "yes" to "Do you play the lottery?" clearly plays more than someone who answers "no"; see the cigarette example above), or

collapses of numeric data into categories with unequal widths or intervals (e.g., collapsing years of education into degree level).

STATS & PRESENTATION ADVICE: Everything that you can do with nominal data (graphs, modes, etc.) you can do with ordinal data too. In addition, with ordinal data, you can compute percentiles, quartiles, and medians (the category that includes the 50th percentile).

Most statistical processing computer programs, such as SPSS, assign numbers to all categories as a default, even to non numeric nominal and ordinal variables. This is for data processing ease and does not give you any clues as to the type of data you have. THE DATA ANALYST MUST MAKE THAT DECISION!

 INTERVAL-RATIO VARIABLES

You can count the number of books and you can't have less than zero. Number of books is a ratio variable.

In addition to the properties of nominal and ordinal category systems, interval or ratio variables possess a common and equal unit that separates adjacent or adjoining categories.

EXAMPLES: one year of age or one year of education or one dollar of income. Each of these examples is one equal unit.

These intervals are equal no matter how high up the scale you go.

EXAMPLE:

• the difference between two and three children = one child.
• the difference between eight and nine children also = one child.

EXAMPLE:

• the difference between completing ninth grade and tenth grade is one year of school
• the difference between completing junior and senior year of college is one year of school

It is the equal interval between adjacent categories, no matter how small or how large the score may be, that makes the data numeric.

In addition to all the properties of nominal, ordinal, and interval variables, ratio variables also have a fixed/non-arbitrary zero point. Non-arbitrary means that it is impossible to go below a score of zero for that variable. For example, any bottom score on IQ or aptitude tests is created by human beings and not nature. On the other hand, scientists believe they have isolated an "absolute zero." You can't get colder than that.

EXAMPLE: 0 children or 0 years of age. You cannot have fewer than zero children or be less than zero years of age. You cannot have less than zero dollars of income (net worth is another story) or less than zero years of formal education.

Most "count variables" (years of age or years of formal education, children, books, dollars) are ratio variables.

STATS & PRESENTATION ADVICE: With numeric data (interval or ratio variables), in addition to all the options that you have with nominal and ordinal variables, the analyst can perform arithmetic operations on the scores: add, subtract, divide and multiply them. Thus you can calculate arithmetic means on numeric data.

It is nonsense to perform arithmetic operations on clearly nominal data.

For example, suppose you have a group of three men and three women. Can you calculate a mean or arithmetic average score? What could it possibly be? It can't be a number because gender category value is a name or tag ("male" "female") that cannot be added or multiplied.
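
The cumulative advice in the STATS & PRESENTATION notes can be summarized in a short sketch. The data and helper names below are made up for illustration, and the sketch covers only measures of central tendency (mode, median, mean), not the full list of statistics mentioned in this guide. Each level of measurement keeps the options of the levels below it and adds one more.

```python
from statistics import mean, median_low, mode

# Levels in cumulative order, with the "average" each level adds.
LEVELS = ("nominal", "ordinal", "interval-ratio")
ADDED_AT_LEVEL = {"nominal": "mode", "ordinal": "median", "interval-ratio": "mean"}


def permissible_averages(level):
    """Averages that make sense at this level, including those from lower levels."""
    position = LEVELS.index(level)
    return [ADDED_AT_LEVEL[name] for name in LEVELS[: position + 1]]


# Nominal: only the most common category is meaningful.
zodiac_signs = ["Leo", "Aries", "Leo", "Virgo"]
most_common_sign = mode(zodiac_signs)

# Ordinal: rank the categories, then find the category holding the middle case.
likert_order = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]
responses = ["Agree", "Strongly Agree", "Disagree", "Agree", "Agree"]
response_ranks = [likert_order.index(r) for r in responses]
median_response = likert_order[median_low(response_ranks)]

# Interval-ratio: arithmetic on the scores themselves is meaningful.
books_owned = [0, 3, 5, 12]
mean_books = mean(books_owned)

print(permissible_averages("ordinal"))
print(most_common_sign, "|", median_response, "|", mean_books)
```

Note that the numeric ranks assigned to the Likert categories are used only to order them; averaging those ranks would treat the categories as if they were equally spaced, which ordinal data do not guarantee.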

LEVELS OF MEASUREMENT SUMMARY

TYPE OF    CATEGORIES      CATEGORIES      CASES CAN BE    CATEGORIES      CATEGORIES      FIXED OR
VARIABLE   EXHAUSTIVE      MUTUALLY        SEPARATED BY    CAN BE          SEPARATED BY    NON ARBITRARY
                           EXCLUSIVE       CATEGORY        RANK-ORDERED    EQUAL INTERVAL  ZERO

NOMINAL    X               X               X
ORDINAL    X               X               X               X
INTERVAL   X               X               X               X               X
RATIO      X               X               X               X               X               X

Susan Carol Losh
August 28 2017