TYPES OF ERROR AND BASIC SAMPLING DESIGNS
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
INSTRUCTOR: DR. SUSAN CAROL LOSH
The goal: to estimate the true POPULATION VALUE. We want to minimize ANY deviation from whatever is the true population value.
Populations have PARAMETERS, samples provide ESTIMATES.
A POPULATION is the entire collection of elements that you wish to study, for example, ALL registered students at FSU, Spring 2017; ALL residential telephone numbers in Leon County, Florida.
A SAMPLE is some specified subpart or subset of your population.
In order to take a good sample, you must carefully define your population.
We use samples to generalize to populations and it is usually the well-defined populations we are interested in.
Somewhere along the line, you will need a good FRAME or list of all the elements in the population (even if it's "far down the line" for a multi-stage sample).
That's why most "national mail surveys" are suspect; any list of mailing addresses for the entire United States would be out of date before the list was completed.
Practitioners of social research usually distinguish between two main sources of error in measuring social phenomena:
(1) SYSTEMATIC ERROR or BIAS--often tricky to discover, often part of a flawed data collection design, and
(2) RANDOM ERROR which is often sampling error.
BIAS is typically hidden. The more sources of input we get before starting a study, the more likely we are to discover bias. In the meantime, we can take steps to minimize bias in survey research (or in other designs that utilize questionnaires).
We can control sampling error by the TYPES of samples we take and HOW LARGE a sample we take. Given enough funding, of course. Larger and/or more representative samples have less sampling error and give more precise estimates of the true POPULATION PARAMETERS.
(Oh dear, we KNEW she lied when she said there were no formulas. Relax! There are very few and it is your understanding of the concepts I am after.)
We make generalizations from SAMPLING DISTRIBUTIONS, hypothetical distributions of a sample statistic (such as an arithmetic mean or a percentage) taken from an infinite number of samples of the same size and the same type (say, n = 900 for each sample and each sample is a Random Digit Dial survey).
In the long run, we hope that the "mean of the means" (the grand average) will be the same as the true population average. If we do a good job, we can estimate the population mean or percentage from just one sample and put approximate limits (called "confidence intervals") around our estimate. Confidence intervals tell us how much on the average we can expect our results to vary from one sample to another (say, "plus or minus 3 percent").
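As an illustrative sketch (not part of the original course materials), the usual normal-approximation formula for a proportion shows where the "plus or minus 3 percent" comes from; the function name and numbers below are assumptions for the example:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion,
    assuming a simple random sample (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# One RDD-style sample of n = 900 with 50% giving an answer:
low, high = proportion_ci(0.50, 900)
print(f"95% CI: {low:.3f} to {high:.3f}")  # roughly plus or minus 3 percent
```

With n = 900 the margin works out to about 0.033, which is where survey reports of "plus or minus 3 percent" typically originate.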
The size of the confidence interval depends on two entities: how much variability there is in the population, and how large a sample we take.
ALL THIS ASSUMES THAT WE TAKE PROBABILITY SAMPLES THROUGHOUT THE ENTIRE SAMPLING PROCESS!
In probability samples, each element, person, or case has a KNOWN, NON-ZERO chance of selection.
(VERY IMPORTANT NOTE: this does not necessarily mean equal chances of selection. Probability samples can have elements selected with unequal probabilities. We call EQUAL probability samples EPSEM samples: Equal Probability of SElection Methods. You could have a sample with unequal but known probabilities of selection--this would STILL be a probability sample. It would not be an EPSEM sample.)
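A minimal sketch of the distinction (the frame and probabilities are invented for illustration): every element below has a known, non-zero chance of selection, but the chances are unequal, so this is a probability sample without being EPSEM. Because the probabilities are known, estimates can later be weighted by 1/p(selection):

```python
import random

random.seed(2017)  # fixed seed so the illustration is reproducible

# Hypothetical frame with KNOWN but UNEQUAL selection probabilities.
frame = ["A", "B", "C", "D"]
p_select = {"A": 0.8, "B": 0.4, "C": 0.4, "D": 0.4}

# Independent (Poisson-style) selection: each element enters the sample
# with its own known probability.
sample = [e for e in frame if random.random() < p_select[e]]

# Known probabilities let us weight by 1/p to compensate for the
# unequal chances when estimating population values.
weights = {e: 1 / p_select[e] for e in sample}
print(sample, weights)
```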
ISSUES IN SAMPLING
To do probability samples, you need a complete FRAME or list, which enumerates all the elements in your population. Random Digit Dialing--RDD--approximates a complete list of telephone numbers. We would not be able to list all the numbers. Specialized programs can create lists of random digit telephone numbers.
If your population is very large, such a list will either be impossible, unwieldy, or VERY expensive. If you break your sample into STAGES, all you need is a complete frame at each stage. For example, a Random Digit Dial telephone sample for the entire United States might first sample area codes, then three-digit exchanges within the selected area codes, then randomly generate the final four digits within the selected exchanges.
There are books and companies that do this. When doing a local telephone survey, I have found that talking with the local telephone companies is a MUST. They can tell you which banks of numbers (e.g., 576-9000) are empty and that can save a lot of sampling time.
If you do not have known probabilities of selection, then you have a NONPROBABILITY SAMPLE.
PROBABILITY SAMPLES | NONPROBABILITY SAMPLES
Simple random samples (srs) | Self-selected samples (e.g., call-in/mail-in "polls")
Systematic samples with a random start | Available respondents (e.g., "grab"/haphazard/"convenience" samples)
Stratified samples: (a) proportionate to size; (b) disproportionate to size | Purposive/judgment samples (including "snowball samples")
Cluster samples (usually based on geographical proximity) | Quota samples
A SAMPLE MUST BE A PROBABILITY SAMPLE AT ALL STAGES IN ORDER TO STRICTLY QUALIFY AS A PROBABILITY SAMPLE. For example, if you do a RDD telephone survey, you also must use some type of probability method to select the respondent within households.
THERE REALLY IS NO SUCH THING AS A "QUASI-PROBABILITY" SAMPLE ALTHOUGH SOME COMMERCIAL AGENCIES WOULD LOVE YOU TO BELIEVE THIS.
SIMPLE RANDOM SAMPLING (srs). There is a complete list at each stage. Assign numbers or names to each element. Use a random number table or slips of paper to select cases. RDD approximates srs. srs are Equal Probability of Selection Method (EPSEM) samples. In srs, EACH ELEMENT AND EACH COMBINATION OF ELEMENTS HAS AN EQUAL CHANCE OF SELECTION. Lotteries are usually srs too!
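A quick sketch of srs from an enumerated frame, using Python's standard library in place of a random number table (the frame of 100 numbered elements is hypothetical):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Complete frame: every element in the population, numbered 1..100.
frame = list(range(1, 101))

# random.sample draws without replacement; every element -- and every
# combination of elements -- has an equal chance of selection (EPSEM).
srs = random.sample(frame, k=10)
print(sorted(srs))
```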
SYSTEMATIC SAMPLING. Each element at a preselected interval is chosen (e.g., every 10th case or every 100th case). A RANDOM START must be used to select the first element chosen. Approximates srs. WATCH FOR CYCLIC VARIATIONS, e.g., only selecting corner apartments which are often more expensive or only selecting Sundays for days of the week. Cyclic variations are often important in studying hierarchical organizations (e.g., the military) or in content analysis of media. If you suspect cyclic repeats in your cases, DO NOT USE systematic samples!
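A sketch of a systematic sample with the required random start (again a hypothetical frame of 100 elements, interval k = 10):

```python
import random

random.seed(7)  # reproducible for illustration

frame = list(range(1, 101))  # hypothetical enumerated frame
k = 10                       # preselected interval: every 10th case

start = random.randrange(k)  # the RANDOM START, within the first interval
systematic = frame[start::k]
print(systematic)            # 10 cases, each exactly k apart
```

If the frame has cyclic repeats at the same interval (say, every 10th record is a Sunday), this procedure hits the same phase of the cycle every time -- which is exactly the warning above.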
STRATIFIED SAMPLES. Divide the population into mutually exclusive strata PRIOR TO SAMPLING. Often done on a size basis (e.g., large versus small). Use srs or systematic sampling to select cases within strata. Increases representativeness by ensuring cases from each stratum are selected. May be either proportionate or disproportionate to size. For example, you may stratify school classes into gifted, regular, or remedial. Disproportionate samples are UNEQUAL PROBABILITY SAMPLES. THEY ARE STILL PROBABILITY SAMPLES.
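A sketch of a proportionate-to-size stratified sample, using the gifted/regular/remedial example (stratum sizes and labels are invented):

```python
import random

random.seed(1)  # reproducible for illustration

# Strata defined PRIOR TO SAMPLING; sizes are hypothetical.
strata = {
    "gifted":   [f"g{i}" for i in range(20)],
    "regular":  [f"r{i}" for i in range(60)],
    "remedial": [f"m{i}" for i in range(20)],
}

# Proportionate to size: take the same fraction (10%) within each
# stratum, via srs inside every stratum.
fraction = 0.10
sample = []
for members in strata.values():
    sample += random.sample(members, k=round(fraction * len(members)))
print(sample)  # 2 gifted + 6 regular + 2 remedial = 10 cases
```

A disproportionate version would simply use different fractions per stratum -- still a probability sample, but the unequal probabilities must then be weighted for.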
CLUSTER SAMPLES. Take part or all of naturally occurring clusters such as classrooms or city blocks. Often the frame is not enumerated until the cluster is actually selected. Cluster samples can be less representative if only a very few large clusters are selected (e.g., at Florida State University, compare drawing just one classroom with 250 students versus 250 students selected via srs from all over campus).
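A sketch of a two-stage cluster sample (classroom rosters invented for illustration): clusters are sampled first, and only the chosen clusters ever need to be enumerated:

```python
import random

random.seed(3)  # reproducible for illustration

# Naturally occurring clusters: hypothetical classroom rosters.
classrooms = {
    "room_101": ["ann", "bob", "cal"],
    "room_102": ["dee", "eli"],
    "room_103": ["fay", "gus", "hal", "ida"],
}

# Stage 1: srs of clusters. Stage 2: take ALL elements within each
# selected cluster (one common variant of cluster sampling).
chosen = random.sample(list(classrooms), k=2)
sample = [student for room in chosen for student in classrooms[room]]
print(chosen, sample)
```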
SELF-SELECTED. Respondents decide for themselves whether to participate, often in response to solicitations, such as questionnaires printed in newspapers, magazines or via 800- or 900- (YOU pay) telephone numbers. Might (big maybe) represent people very motivated or on the extreme of a particular topic. But how would you ever know?
AVAILABILITY/GRAB/HAPHAZARD/CONVENIENCE SAMPLES. Literally "grabbed" from whoever is available (Publix? Landis Green? a teacher who lets you use his classroom? Ed psych undergraduates in a friend's classroom?). Who knows WHO these folks represent? (Not me. Not you.) The only small advantage (missing from self-selected cases) is that at least the interviewer or study director selects the cases.
PURPOSIVE SAMPLES (sometimes called judgment samples). The study director, the client, or the interviewer decides who is "typical" or "representative" using some type of stated criteria. This can lead to tremendous bias depending on the study director's prejudices. Examples: selecting "singles" from bars, apartment buildings, colleges, overlooking single adults who live with their families, go to church or synagogue or mosque, avoid bars or work at small businesses. Selecting "typical users" for a new computer training program.
NOTE: Depending on the situation, this may, in fact, be the best that we can do.
QUOTA SAMPLES. THEY ONLY LOOK GOOD. They are still often used. Interviewers are instructed to grab respondents with the "right" combination of characteristics. Example: married Black women with college degrees between ages 30-35 with at least one child. PROBLEMS: Leaves it to the client, researcher, or interviewer's discretion whom to select; only a few characteristics can be simultaneously considered; no attempts at call-back for not-at-homes or refusal-conversions to completed interviews.
In non-probability samples, we usually have very few ideas of how respondents differ from non-respondents.
THE BEST PROBABILITY SAMPLE WILL NOT HELP IF THE RESPONSE RATE IS POOR.
The researcher is better off with a smaller sample and a higher response rate so as not to worry about bias from non-response.
We often have no idea how refusals or absent households differ from those interviewed. Aim for a MINIMUM of 50 PERCENT (this is actually pretty bad); 65 percent is more acceptable; and at over 70 percent, you will rival the General Social Survey.
If your time in the field is very short, you will have a high non-response due to inability to locate and contact many selected respondents.
You may want to read a study on response rates done by the Pew Center for People and the Press. It's interesting but far from universally accepted. (Rumor now has it that Pew today gets only 9% response rates.)
The survey research industry right now is in a real dilemma because response rates have fallen so drastically. The industry is struggling to do "representative" samples for reasonable cost. Yet remember, neither Gallup nor the U.S. Census (nor any other entity) can mess with the laws of probability. If non-probability samples are used, measures of "survey error" cannot be legitimately used either.
PLEASE READ THIS SECTION CAREFULLY. THE DISTINCTION CONFUSES A LOT OF STUDENTS--and many RESEARCHERS.
Notice that researchers define their population and select the sample BEFORE they assign participants in experiments or quasi-experiments to treatment groups.
Sampling is where experimenters often get sloppy. They will take grab samples or even cluster grab samples and sometimes never define their population at all!
RANDOMIZATION AND SIMPLE RANDOM SAMPLING ARE TWO DIFFERENT THINGS.
IT IS VERY EASY TO CONFUSE THEM UNLESS YOU KNOW BETTER.
SIMPLE RANDOM SAMPLING: One way the total pool of subjects may be created before any intervention or treatment begins. However, many other sampling methods, such as cluster or convenience sampling, might be used instead.
The process of how participants were obtained affects external validity. If the researcher used a simple random sample to select elements into the study before any intervention began, then, other things equal, the study will have good external validity.
RANDOMIZATION OR RANDOM ASSIGNMENT: One way of assigning subjects to treatment or intervention groups. Other methods, such as experimenter judgment, might be used, but these are poor on internal validity and quite possibly external validity too.
The process of how subjects were assigned to treatment or intervention groups affects internal validity.
Randomization, or random assignment of participants to treatment groups DOES NOT CORRECT for sloppy sampling of groups or elements in the first place (external validity). What randomization means is that you can typically make strong causal statements about how the treatments influenced the outcomes (internal validity) but only for the participants who took part in your interventions.
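The distinction can be made concrete with a short sketch (participant labels are invented): randomization operates only on participants you already have, however they were obtained:

```python
import random

random.seed(5)  # reproducible for illustration

# Participants already obtained -- perhaps by srs, perhaps by a grab
# sample. The randomization below cannot repair a poor sample.
participants = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]

# RANDOM ASSIGNMENT: shuffle the pool, then split into groups.
# This supports internal validity (causal claims about the treatment),
# but says nothing about whom the results generalize to (external validity).
pool = participants[:]
random.shuffle(pool)
treatment, control = pool[:4], pool[4:]
print(treatment, control)
```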
Once that's done, whom can the researcher generalize to? If the sample of groups or elements is poor, he or she can't generalize to anyone!
Now, that is a strong statement. Most researchers in practice aren't that fussy. But, where do you draw the line? What if you have a grab sample of classes from the FSU University school for your dissertation? (You grabbed where the instructor was cooperative.) You might generalize to the University school. You might even try to generalize to Leon County public schools (although the Lab School is Whiter and higher in social class than Leon County in general). But what then? Your results don't represent Florida classrooms, Southeastern classrooms, and certainly not United States classrooms. Although you used random assignment of treatments, your sample of classes limits your external validity, or how much you can generalize.
Notice, too, that when classrooms are sampled, this is a CLUSTER SAMPLE. If the students within a classroom are similar (say, grouped by ability level), the researcher has artificially depressed the standard errors. That can lead him or her to believe there are statistically significant results when there are really NOT.
The TRUE standard errors in cluster samples will be larger than the typical statistical program calculations that you see on your computer output from programs such as SPSS, which use simple random sampling formulas to calculate standard errors. This is because the sample of classrooms underestimates the true heterogeneity present in the entire school.
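One common way to express how much the srs formulas understate cluster-sample error is the design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation (how alike members of a cluster are). The numbers below are purely illustrative, not from any real study:

```python
import math

def corrected_se(srs_se, cluster_size, rho):
    """Inflate a naive srs standard error by the design effect
    deff = 1 + (m - 1) * rho. Illustrative sketch only."""
    deff = 1 + (cluster_size - 1) * rho
    return srs_se * math.sqrt(deff)

# With classrooms of 25 students and even modest within-class similarity
# (rho = 0.10), the true standard error is about 1.8 times the naive one:
print(corrected_se(0.02, 25, 0.10))
```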
Susan Carol Losh
Much of this site provided the basis for my entry in the following: S.C. Losh (2010) “Sampling Error” in N. J. Salkind (Ed.), The Encyclopedia of Research Design. Thousand Oaks, CA: Sage Publications. If you cite this page, that's the citation.