METHODS READINGS AND ASSIGNMENTS
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
FALL 2017

OVERVIEW

 
GUIDE 1: INTRODUCTION
GUIDE 2: VARIABLES AND HYPOTHESES
GUIDE 3: RELIABILITY, VALIDITY, CAUSALITY, AND EXPERIMENTS
GUIDE 4: EXPERIMENTS & QUASI-EXPERIMENTS
GUIDE 5: A SURVEY RESEARCH PRIMER
GUIDE 6: FOCUS GROUP BASICS
GUIDE 7: LESS STRUCTURED METHODS
GUIDE 8: ARCHIVES AND DATABASES

SUSAN CAROL LOSH

GUIDE 4: QUASI-EXPERIMENTS, INTERNAL VALIDITY, AND EXPERIMENTS II

 
 
KEY TAKEAWAYS:
  • Did the materials you read use a variety of methods in their research--or discuss planning to use a variety of methods in followups?
  • Random assignment of participants to treatments is key in experiments and potentially gives experiments strong internal validity.
  • If a study has different levels of "experimental treatments", and people or groups are assigned to these WITHOUT random assignment, we have a quasi-experiment.
  • Two types of design used more often with quasi-experiments are the time series design (sometimes called a "natural experiment") and the case study.
  • Threats to internal validity are essentially threats to causal control. Below is a list of common threats to drawing causal inferences.
  • Did anyone notice? Did the researcher use a manipulation check? What was the result of that check?
  • In "double blind studies" neither participants nor the actual administrator (e.g., who hands out the pills) know which treatment participants receive, controlling for everyone's expectations.
  • Reactivity refers to changes in the study participants' behavior simply because they are [aware of] being studied. It can be surprisingly common.
  • More of a threat to external validity is the issue of the reality of the study setting: "mundane" (resembles "everyday life") versus "experimental."
  • However, "experimental reality" can be VERY engrossing!
  • Research is expected to be ethically conducted; university/college Human Subjects Committees check the procedures (the U.S. federal Office of Management and Budget does Human Subjects review for studies conducted by the federal government).

 
QUASI-EXPERIMENTS
INTERNAL VALIDITY
ISSUES IN EXPERIMENTS

At last! We now focus on different ways of conducting studies and gathering data. Each data collection technique has its own set of strengths and weaknesses. That is why it is advisable over the long run for a researcher to conduct a series of studies, all with the same independent and dependent variable(s) but using a mix of experiments, ethnographies, surveys, content analysis, focus groups, and so forth. (Check for these followups in the materials you read that are germane to your field.)

As we saw in Guide 3, a major strength of true experiments is causal control and strong internal validity. Various threats to internal validity are described in more detail below. In an experiment you can literally build your own independent variables by:

(1) Creating "factors" or levels of some kind of treatment then
(2) Randomly assigning participants or groups to different levels of the treatment.

It is RANDOMIZATION or "random assignment" that is the major contributor to making an experiment a true experiment. Randomization controls for everything that you can think of as an alternative causal explanation--and everything that you cannot.
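As a minimal sketch of what random assignment means in practice, the Python snippet below shuffles a list of participants and deals them into treatment levels. The function name `randomly_assign`, the participant labels, and the two level names are illustrative assumptions, not part of any particular study.

```python
import random

def randomly_assign(participants, levels, seed=None):
    """Shuffle participants, then deal them round-robin into treatment levels.

    Human judgment plays no role: only the shuffle decides who goes where.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, person in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(person)
    return groups

# Hypothetical participant IDs and treatment levels, for illustration only.
participants = [f"P{n:02d}" for n in range(1, 13)]
groups = randomly_assign(participants, ["treatment", "control"], seed=42)
for level, members in groups.items():
    print(level, members)
```

Dealing round-robin after the shuffle keeps the group sizes equal (or within one of each other), which a coin flip per person would not guarantee.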

However, a true experiment is simply not always possible, yet investigators still want to make causal statements. Finally, even if you have conducted a true experiment, not all experiments have equally strong causal control. Issues with reactivity, with poor measurement, and with the nature of control groups can all influence the degree of internal validity in an experimental design.

QUASI-EXPERIMENTS

WHAT MAKES FOR A QUASI-EXPERIMENT?

What makes a true experiment is random assignment of people or groups to treatments. Human judgment plays no role in who gets which experimental condition. The strength of randomization is that it creates two or more groups that are, on the average, approximately equivalent at the very beginning on just about any characteristic you can imagine (and just about any characteristic you can't imagine).

For example, that was one strength of the HRT experimental study I mentioned earlier, compared with the initial observations and clinical reports from individual physicians about the effects of hormones for post-menopausal women. A flip of a coin determined whether the woman received the active ingredient or a placebo. Neither the women nor their immediate contacts and observers knew which pill each woman received (we call that a "double blind").

Of course we are speaking long term and reasonable size samples. If you have two groups of five people each, I wouldn't count on them necessarily being very similar. However, even with as few as 10 people per group you will begin to see the beauty of randomization as a research design.
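The point about sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: each simulated person has one pre-existing trait drawn from a normal distribution (mean 100, standard deviation 15, both arbitrary), and we ask how far apart two randomly formed groups are on that trait, on average, for groups of 5, 10, and 100.

```python
import random
import statistics

def mean_gap_after_randomization(n_per_group, n_trials=2000, seed=0):
    """Average absolute difference in group means on a pre-existing trait."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Assumed trait distribution; the numbers are arbitrary.
        pool = [rng.gauss(100, 15) for _ in range(2 * n_per_group)]
        rng.shuffle(pool)  # random assignment: first half vs. second half
        group_a = pool[:n_per_group]
        group_b = pool[n_per_group:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)

gaps = {n: mean_gap_after_randomization(n) for n in (5, 10, 100)}
for n, gap in gaps.items():
    print(f"{n:>3} per group: typical gap between group means = {gap:.1f}")
```

The typical gap shrinks as the groups grow, which is the sense in which randomized groups are "approximately equivalent on the average" rather than identical.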

But randomization just isn't always possible. Some treatment groups are initially formed on the basis of performance (high, medium, low, for example), and some variables (e.g., bipolar depressive disorder or gender) cannot be (or should not be) experimentally induced.

If your study has different levels of treatments, and people or groups are assigned to those treatments WITHOUT random assignment, you probably have a quasi-experiment.

It's not just having intact groups that creates a quasi-experiment. Individuals who are not in intact groups could enter treatment levels through self-selection, because they are in a particular performance category (that bottom quartile in performance, for example), or because a researcher has "paired" individuals that she or he believes are somehow similar.

However, in cases such as these, self-selection or regression toward the mean effects, rather than the treatment, are alternative explanations for the results you found.
 
 

 
The difficulty, in quasi-experiments, is trying to find out just how similar the groups were at the very beginning, before any treatment at all began. Sometimes, in fact, if groups are created on the basis of dissimilarities, such as ability, we know the groups are different at the very beginning. If we have basic or prior information about those who comprise the individuals or groups in the different treatments we may, at least, try to institute statistical controls for those variables.

For example, often a school will introduce an experimental new technique, as yet not well evaluated. Scores on some student measure are taken at the beginning and the end of the study period. Was there any kind of comparison group? Was it a true control group? What did the comparison or control group do instead of the experimental treatment?

How might you find out about just how similar or different groups were at the beginning of a study?

Background information. You might have access to grades, test scores, "personality" tests or other standardized test results collected before the study ever began.

Some kind of pretest measures. These vary from requesting background or "demographic" information such as own or parental education, occupation, or income, to various standardized tests. Be careful, however! Remember that pretests can sensitize people that their behavior is under study and lead to pretest-treatment interaction biases.

Supplemental information from other people. Interviews with teachers, parents, physicians, therapists, or others who know the subjects of study well may provide supplemental information.
 
 
 
Here's the basic problem: even if we assign groups to treatments based on their differences, such as a high ability and low ability group, the groups may differ in other respects, on variables that we never measured at all. For example, the high ability group may be more motivated or more confident, on the average, than the low ability group. And it is perhaps those differences in motivation or confidence, instead of the differences in ability (that we originally thought was the true independent variable), that were the true causes of the treatment outcome differences that were observed.

THREAT! Even if you are able to obtain background, pretest, or supplemental information, you may have never measured the true differences between your groups on other variables. And those true differences that you never measured could be the real causes responsible for the outcome effects that you found in your study. (VERY FRUSTRATING)

Now you can begin to see why quasi-experimental designs pose threats to internal validity.

Be patient for a little bit! We will return to the issue of intact groups shortly. Meanwhile, remember that if people were initially assigned to intact groups in a random fashion, you may have a true experiment after all. If you want a review, see Guide 3.

TYPES OF QUASI-EXPERIMENTAL DESIGNS

Many of the types of quasi-experimental designs are very similar to true experimental designs except that randomization never takes place.

Just as we have in experiments, one group may be assigned a treatment. Then, following the treatment, we measure some type of observation or dependent variable for both the group that received a treatment and the group that did not. Here is just one comparison of a quasi experimental design with the corresponding "true" experimental design:

In the standard design notation, "X" is a particular treatment or intervention, "O" is a measured outcome, and "R" indicates that participants were randomly assigned to treatment groups.

Why is the control group "nonequivalent" for the quasi-experimental design? Because we did not use random assignment to place subjects in treatment groups, we cannot assume that on the average the groups are the same, or even approximately equivalent on the average, to begin with.

However, two types of design used more often in quasi-experimental situations are the time series design (sometimes called a "natural experiment") and the case study.

In the time series design, there are several observations over time. While there may be some type of experimental intervention, often "nature" does the experimenting for you:

In all these cases, we assume that there was a series of "pre-intervention" measures, or that a series of pre-intervention measures could be obtained, which the scholar then continues following the intervention. In all likelihood, there isn't a control group (let alone a randomized control group) so you can't tease out specifically what it was about the intervention, legislation, or therapy that caused the observed outcomes. If you have enough advance warning, you may be able to have more groups, although without randomization of treatments to groups, you still have many of the threats to internal validity listed below.

Case studies occur with some frequency in medical, educational and therapeutic fields. Practitioners who work one on one (such as counselors) or with very small groups (special education classes) are the most likely to use case studies. Subjects are not random, the case base is small, and there may be no control group. As you can guess, causal inference is much more difficult. For example, some people believe that they were abducted by Unidentified Flying Objects (UFOs) then returned to Earth. Much of the research on such individuals is conducted by clinicians. Their patients comprise the sample (sometimes a sample of one) and the research uses a case study approach wherein inferences are made about the person's proneness to fantasy construction. However, due to the lack of comparison groups, it can be difficult to ascertain what the true causal variables are.

What's the best that can be done under such circumstances? Impose a time series of observations if possible. If the intervention is under researcher control (dispensing a new medication, for example), impose the intervention, remove it, impose it again, remove it, and so forth. Try to use a double blind (see below) administration if possible and the most objective outcome measures that you can find.

ISSUES WITH USING INTACT GROUPS

Many research methods textbooks virtually define quasi-experiments as those using intact groups, i.e., groups that existed prior to any treatment or intervention. Normally (say, 75 to 80 percent of the time) this is true. What are important are:

(1) HOW participants entered the groups in the first place;
(2) What happens in the group; and
(3) The length of time groups pre-existed prior to interventions.

If participants are randomly assigned to groups in the first place (which often happens in schools and universities for classes where there are many equivalent sections), AND the tasks to be performed prior to the intervention are virtually identical in each group, AND the pre-intervention time is short (probably a few weeks at most), THEN if you randomly assign groups to conditions, this is probably a true experiment.

Consider some of the alternatives. Participants may be assigned to groups using pre-existing knowledge about them, and the groups consequently differ on variables related to the study. Sports teams grouped by ability, "tracking" systems in schools, and enlisted versus officers in the military are three examples.

Even if random assignment originally places participants in groups, their curricula and itineraries may be different, thus providing participants in different groups with different experiences.

Finally, bosses and teachers differ in their approaches, again providing subjects in different groups with different experiences which diverge further as time goes on.

So, if you use, for example, randomly selected sections of basic college math, random assignment to treatments, AND do at least much of your data collection at the very beginning of the academic year, you probably have a true experimental design. Does your study meet all the criteria in this example? If not, your design is probably quasi-experimental.
 

THREATS TO INTERNAL VALIDITY

Threats to internal validity are essentially threats to causal control. They mean that we do not know for sure what caused the effects that we observed. Naturally, we like to hope that our interventions (experimental treatments) or other known and measured independent variables caused the effects. Unfortunately this is often not the case. For example, because of their multidimensionality, confounded variables (which measure more than one entity) pose a threat to internal validity.

BIAS VERSUS RANDOM ERROR

If you have tight control over your experimental treatments (and, of course, used randomization), hopefully the only source of variance left in your dependent variables will be random error.

Random error is just that: It is the random variation that occurs on measurements across administrations, situations, or time periods. If random error is VERY large, it can pose a threat to the reliability (predictability, stability) of our measurements. Many political attitudes, for example, are highly unstable or volatile by their very nature. (Don't rely on this year's polls to pick the 2018 congressional candidates!)

On the other hand, because it is random, random error does not usually pose a threat to internal validity.

Bias is systematic error, such as the scale that always weighs you in at five pounds too light. Bias introduces a constant source of error into measurements or results. Bias can occur when test items are used that favor a particular ethnic, age, or gender group. For example, a "culture exam" that asked respondents to identify songs from the 1950s and the 1960s would discriminate against much younger people. Tests of "science knowledge" often favor younger people because they use the most recent definitions of science phenomena and thus favor those with a more recent education.

Bias in testing instruments is a threat to internal validity because it poses an alternative causal explanation for the results that we found.

Many of us have scales that weigh us as lighter than those at the doctor's office. Hmmm.
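The difference between random error and bias can be sketched with the bathroom scale example. The snippet below is an illustration under assumptions: a true weight of 150 pounds and a random error with a standard deviation of 2 pounds are arbitrary values, and the five-pound bias comes from the example above.

```python
import random
import statistics

TRUE_WEIGHT = 150.0  # pounds; an arbitrary illustrative value

def weigh(rng, bias=0.0, noise_sd=2.0):
    """One reading: true weight, plus a constant bias, plus random error."""
    return TRUE_WEIGHT + bias + rng.gauss(0, noise_sd)

rng = random.Random(7)
fair_scale = [weigh(rng) for _ in range(1000)]              # random error only
light_scale = [weigh(rng, bias=-5.0) for _ in range(1000)]  # five pounds too light

fair_error = statistics.mean(fair_scale) - TRUE_WEIGHT
light_error = statistics.mean(light_scale) - TRUE_WEIGHT
print(f"random error only: average error {fair_error:+.2f} lb")
print(f"biased scale:      average error {light_error:+.2f} lb")
```

Averaging many readings washes out the random error but leaves the bias untouched, which is why bias, unlike random error, offers an alternative explanation for a result.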

If we could either control bias experimentally (random assignment controls much of it by making experimental treatment groups roughly equivalent at the beginning of a study, thus controlling factors such as self-selection or regression toward the mean effects) or measure the variables we suspect cause bias and thus control them statistically, we would at least maximize internal validity to the best of our ability.

Unfortunately bias is often hidden, either in the variables you didn't measure--or the variables you didn't consider at all. Thus you didn't measure it and only discover your mistake after all your data are collected. Confounded variables also are a major threat to internal validity.
 

HERE ARE SOME WELL-KNOWN THREATS TO INTERNAL VALIDITY

Self-selection effects: When participants can select their own treatments (e.g., students who decide whether or not to respond to an online survey), we do not know whether the intervention or some pre-existing factor of the participant caused the outcomes we observed. Random assignment can cure this problem. The same problem can occur with differential selection, only in this case, the investigator (rather than the participant) uses human judgement to assign groups or participants to treatments. A common variation on this one is selecting extreme groups (see below).

Experimental mortality: When participants discontinue their participation in a study and this occurs more in certain conditions than others, we do not know how to causally interpret the results because we don't know how people who discontinued participation differed from those who completed it. A pretest questionnaire given to all subjects may help clarify this, but watch out for pretesting effects (a Solomon four group design can help here; see Guide 3).

History: Some kind of event occurred during the study period (such as the 9-11 assaults on New York City or the 2016 presidential election) and it is reactions to these events that caused the outcomes we observed. Sometimes this is a medical event (such as a flu outbreak) and sometimes an actual political or historical event. Random assignment and a control group help with this problem.

Maturation effects are especially important with children and youth (such as college freshmen) but could happen at any age. For example, young children's speech will normally become more complex, no matter what reading method you use. Some studies have reported that most college students pull out of a depression within six months, even if they receive no treatment whatsoever. A certain number of people will stop smoking, whether they receive treatment or not. Again, a randomized control group helps interpret the results.

Regression toward the mean effects ("statistical regression") are especially likely when extreme groups are studied. For example, students scoring at the bottom of a test typically improve their scores at least a little when they retake the test. Students with nearly perfect scores might miss an item the second time around. That is, people with extreme scores, or in extreme groups, will often fall back toward the average or "regress to the mean" on a second administration of the dependent variable.

Regression toward the mean effects are especially likely to occur among well-meaning investigators, who want to give a treatment that they believe is very beneficial to the group that appears to need it the most (the top scoring group is usually left alone.) When the scores of the worst group improve after the intervention (and the top group scores a little lower on the readministration if it occurs), misguided investigators are even more convinced that they have found a good treatment (instead of a methodological artifact.) How to avoid this threat to internal validity? Either avoid extreme groups, or if you do use them, randomly assign their members to treatment conditions, INCLUDING A CONTROL GROUP. Thus, among the lowest scoring students, one third would receive intervention #1, one third would receive intervention #2, and one third would receive no intervention at all.
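Regression toward the mean, and the way a randomized control group exposes it, can be seen in a small simulation. This is a sketch under assumptions: each simulated student has a stable "ability" plus independent random error on each test (the means and standard deviations are arbitrary), and the "intervention" given to half of the lowest scorers does nothing at all.

```python
import random
import statistics

rng = random.Random(1)
n = 10_000
# Assumed model: observed score = stable ability + random error on that day.
ability = [rng.gauss(70, 10) for _ in range(n)]
test1 = [a + rng.gauss(0, 8) for a in ability]
test2 = [a + rng.gauss(0, 8) for a in ability]  # no real treatment effect anywhere

by_first_score = sorted(range(n), key=lambda i: test1[i])
bottom = by_first_score[: n // 10]   # lowest 10 percent on the first test
top = by_first_score[-(n // 10):]    # highest 10 percent on the first test

def mean_change(indices):
    """Average retest score minus first score for the given students."""
    return statistics.mean(test2[i] - test1[i] for i in indices)

bottom_change = mean_change(bottom)
top_change = mean_change(top)

# Randomly split the lowest scorers into a sham "intervention" and a control group.
rng.shuffle(bottom)
half = len(bottom) // 2
sham_gain = mean_change(bottom[:half])
control_gain = mean_change(bottom[half:])

print(f"bottom group change: {bottom_change:+.1f}, top group change: {top_change:+.1f}")
print(f"sham intervention gain: {sham_gain:+.1f}, control gain: {control_gain:+.1f}")
```

The lowest scorers "improve" and the highest scorers slip even though nothing was done to anyone. Because the control half of the lowest group gains about as much as the sham half, the comparison shows the gain is an artifact rather than a treatment effect.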

Testing. Just taking a pretest can sensitize people and many people improve their performance with practice. Almost every classroom teacher knows that part of a student's performance on assessment tests depends on their familiarity with the format. Solution? A Solomon Four Group Design, wherein half the subjects do not receive a pretest, is a good way to control inferences in this case.
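The layout of a Solomon four group design can be written down as four cells that cross "pretest or not" with "treatment or not"; every cell receives the posttest. The sketch below randomly assigns participants to those cells. The function name `assign_solomon` and the subject IDs are illustrative assumptions.

```python
import random

# The four cells of a Solomon four group design. Every cell gets the posttest.
SOLOMON_CELLS = [
    {"pretest": True,  "treatment": True},
    {"pretest": True,  "treatment": False},
    {"pretest": False, "treatment": True},
    {"pretest": False, "treatment": False},
]

def assign_solomon(participants, seed=None):
    """Randomly assign participants to the four cells in equal numbers."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {person: SOLOMON_CELLS[i % 4] for i, person in enumerate(shuffled)}

# Hypothetical subject IDs, for illustration only.
plan = assign_solomon([f"S{n:02d}" for n in range(1, 41)], seed=3)
pretested = [p for p, cell in plan.items() if cell["pretest"]]
treated = [p for p, cell in plan.items() if cell["treatment"]]
print(len(pretested), "pretested;", len(treated), "treated")
```

Comparing posttest results between the pretested and unpretested cells, within each treatment condition, is what lets the investigator see whether the pretest itself changed the outcome.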

ISSUES IN EXPERIMENTS

While a true experiment can be higher on internal validity, by no means do all experiments have high internal validity. To enhance internal validity, the investigator must use control groups effectively, control reactivity, and scrutinize experimental reality. Further, you need to know if people noticed and comprehended your treatment or intervention in the first place.

EFFECTIVE USE OF CONTROL GROUPS

When a new pharmaceutical drug is tested, typically all experimental subjects receive a pill.

Some receive the new active ingredient, such as a brand new antihistamine or antibiotic.
Some receive an older medicine, such as Tavist (clemastine) or penicillin.
Yet others receive an inert "sugar pill" that has no active ingredients, or a placebo.

These control or comparison groups are an absolute necessity in any design, but certainly for an experiment.

The group receiving the older medication lets us know if the new drug (or intervention) is less effective, as effective, or more effective than treatments currently available (and probably cheaper).

The group receiving the "sugar pill" alerts us to changes that occur with the two active ingredient medication groups above and beyond a placebo effect. In a placebo effect, changes that occur are due to other factors besides the active treatment. For example, a patient might feel "safe" and "treated" if their doctor gives them a pill, even a sugar pill. Because of these psychological changes, their immune system might actually function better. This is very interesting but not what you set out to assess. So, anything that smacks of a placebo effect is a threat to internal validity and must be controlled for.

Notice that the "control group" GETS A PILL. The "nothing at all" control group is generally a very poor design. For example, if you were studying the effects of watching a violent film on aggression imitation among school children, the very act of watching any movie can be physiologically arousing. If your control group watched no movie at all, then you could not control for these effects. So, instead, your control group watches a generally unaggressive movie such as "The Adventures of Milo and Otis" (a puppy and kitten who are friends).

THE MORAL: Design the control group carefully. See that the control group has some features in common with the treatment groups if those features could affect study outcomes (it takes a pill, sees a film, or fills out a pretest questionnaire.)

Ideally your experimental situation will be "double blind." That means neither the participants nor the actual administrator (e.g., who hands out the pills) know which treatment participants receive. Otherwise the expectations of either the participants, the administrators, or both, can influence the results.
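One way to picture the bookkeeping behind a double blind is to separate what the administrator sees from the key that says which kit holds which treatment. The sketch below is a simplified illustration, not a description of any real trial system; the function name `blind_allocation`, the kit label format, and the arm names are assumptions.

```python
import random

def blind_allocation(participants, arms, seed=None):
    """Return (dispensing_list, sealed_key).

    dispensing_list: participant -> opaque kit label (all the administrator sees)
    sealed_key:      kit label -> treatment arm (held by someone outside the study)
    """
    rng = random.Random(seed)
    # Balanced schedule of arms, then shuffled so the order is unpredictable.
    schedule = [arms[i % len(arms)] for i in range(len(participants))]
    rng.shuffle(schedule)
    # Unique random kit numbers that carry no information about the arm.
    codes = rng.sample(range(1000, 10000), len(participants))
    dispensing_list, sealed_key = {}, {}
    for person, arm, code in zip(participants, schedule, codes):
        label = f"KIT-{code}"
        dispensing_list[person] = label
        sealed_key[label] = arm
    return dispensing_list, sealed_key

# Hypothetical participants and arms, for illustration only.
people = [f"P{n:02d}" for n in range(1, 13)]
arms = ["new drug", "older drug", "placebo"]
dispensing_list, sealed_key = blind_allocation(people, arms, seed=11)
print(dispensing_list["P01"])  # an opaque label; the arm is not visible here
```

The design choice is that the person handing out kits and recording outcomes works only from the dispensing list, so neither their expectations nor the participants' can track the treatment.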

REACTIVITY AND THREATS TO INTERNAL VALIDITY

Reactivity refers to changes in the study participants' behavior simply because they are [aware of] being studied.

For example, some people get nervous when a doctor or nurse takes their blood pressure, and their blood pressure goes up. (Sometimes called "white coat syndrome".)

Reactivity poses a distinct threat to internal validity because we don't know what caused the outcome: treatment effects or reactivity. The experimental laboratory is probably the most reactive because people have come for an experiment and they know their behavior is under scrutiny. That is why so many experimenters use deception. They are trying to divert subject attention so that the "true behavior under study" is not altered.

Demand effects occur when subjects or respondents "follow orders" or cooperate in ways that they almost never would in their routine daily lives.

Social Desirability effects take several forms. Most people and groups (who allow you to study them at all) try to cooperate with researchers. But some try to discover the purpose of the intervention and thwart it, or "wreck the study." Social Reactance effects refer to boomerang effects in which individuals or groups "fake bad," or deliberately deviate from study procedures. This happens more among college students, and others who suspect that their autonomy is being threatened.

ON REACTIVITY AND EXTERNAL VALIDITY. If demand effects are specific to a particular situation, reactivity problems may also influence generalizing, or external validity.

However, more often, I think reactivity introduces an alternative causal explanation for our results: the results occurred, not because of the intervention or treatment, but because people were so self-conscious that they changed their behavior. This is an internal validity problem. Reactivity may also statistically interact with the experimental manipulation. For example, if the treatment somehow impacts on self-esteem (say you are told that the stories you tell to the TAT pictures indicate your leadership ability), reactivity may be a greater internal validity problem.
 

MORE ON GENERALIZING: "EXPERIMENTAL" VERSUS "MUNDANE" REALITY

More of a threat to external validity is the issue of the reality of the study setting. In many cases, such as studies of classrooms or online environments, the setting of the study is identical to the "everyday reality" or mundane reality in which most subjects live their lives. High mundane reality makes it easier to generalize to people's typical settings and it facilitates external validity. Field studies of all kinds, and ethnographies, too, take place in typical, as opposed to unusual, settings.

However, laboratory experiments in particular may use unusual settings or tasks. For example, some sports experiments will have participants on a treadmill for hours. In other studies, subjects may be injected with substances (such as adrenaline) or take pills. Subjects may see specially constructed movies that are nothing like what they see on TV or at a theater. Or they may be called upon to perform tasks (watching a light "move" in a darkened room) that bear no resemblance to their normal environment. While these settings or tasks may be engrossing or compelling, thus high in experimental reality, they do not resemble the settings to which researchers may really want to generalize.


DID ANYBODY NOTICE?
I HOPE A MANIPULATION CHECK WAS USED.

YOU are certain that your intervention will make life healthier or enhance learning. But what if no one pays attention to the treatment or comprehends its message? Then it will appear as if there are no effects at all, whereas if you had simply used a stronger manipulation, your guesswork might have been confirmed.

Anyone doing experimental work needs to have a manipulation check, an inclusion to measure if participants even paid attention to factors in the treatment and understood their messages. For example, if you show different movies to different groups and your topic is filmed aggression, include a short questionnaire that has participants rate the violence of the movie. The group receiving the more aggressive film should rate it as more violent than those receiving an unaggressive movie. If you are trying a new reading technique, make sure that students understand the stories they are exposed to and remember something about them. If you try a new template in your online learning course, did students even pay attention?
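A manipulation check for the film example can be as simple as comparing the average violence ratings of the two groups. The sketch below uses made-up ratings on a 1 to 7 scale purely for illustration; the function name `manipulation_check` and the one-point minimum gap are assumptions, and a real study would normally follow up with a significance test.

```python
import statistics

def manipulation_check(ratings_strong, ratings_weak, min_gap=1.0):
    """Return the gap in mean ratings and whether it reaches min_gap.

    ratings_strong: ratings from the group meant to see the stronger manipulation
    ratings_weak:   ratings from the comparison group
    """
    gap = statistics.mean(ratings_strong) - statistics.mean(ratings_weak)
    return gap, gap >= min_gap

# Hypothetical violence ratings (1 = not violent, 7 = very violent).
aggressive_film = [6, 7, 5, 6, 7, 6, 5, 7]
neutral_film = [2, 3, 2, 1, 3, 2, 2, 3]

gap, noticed = manipulation_check(aggressive_film, neutral_film)
print(f"difference in mean violence rating: {gap:.2f}; manipulation noticed: {noticed}")
```

If the gap were near zero, a null result on the dependent variable would say little about the theory, because participants may never have registered the difference between the films.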

THE HUMAN FACTOR: USING DOUBLE BLIND (repeat in more detail)

When the medical and pharmacy professions test a new medicine, they don't just use a "sugar pill" placebo.

Participants in the study do not know if they are taking a new medication, an old medication, or a sugar pill.

The individuals who pass out the medication and assess the subjects' health and behavior also do not know whether the person is taking a new medication, an old medication, or a sugar pill.

Thus both those involved as subjects and those involved with collecting data are "blind:" blind to the purposes of the study, the condition that participants are in, and the results expected.

This means that

Almost no one who collects data "likes deception" but without at least a little bit of it, reactivity and bias may be introduced into the study. Do the minimum (I prefer "omission" rather than deliberate lies) and be sure to debrief participants after their participation in the study is completed.

Debriefing means that afterwards participants are told the true purpose of the study and any manipulations pertinent to their role in it. Debriefing is ethically mandatory, and is especially important if the manipulation involved lies about the student's performance ("no, you really didn't score in the 5th percentile on that test, all feedback was bogus") or any other aspect of the "real world."


Did participants know enough about the study procedures IN ADVANCE to give informed consent?

Was it clear to participants that nothing bad will happen to them if they REFUSE to participate in the study (they won't get a bad grade, get kicked out of the Boys and Girls Club, or lose their job)?

Is there a plan to keep the responses of participants confidential to the degree allowed by law? This means locking up surveys or experimental protocols in a cabinet or some other type of very limited access storage.

Were participants provided with contact information, so that they can contact the study supervisor if they have any questions?

For more on what we as researchers owe the participants in our studies, see the Human Subjects Committee information (the "IRB" or "Institutional Review Board").
 

Susan Carol Losh
September 18 2017