TYPES OF ERROR AND BASIC SAMPLING DESIGNS
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
INSTRUCTOR: DR. SUSAN CAROL LOSH
The goal: to estimate the true POPULATION VALUE. We want to minimize ANY deviation from whatever is the true population value.
Populations have PARAMETERS, samples provide ESTIMATES.
A POPULATION is the entire collection of elements that you wish to study, for example, ALL registered students at FSU, Spring 2017; ALL residential telephone numbers in Leon County, Florida.
A SAMPLE is some specified subpart or subset of your population.
In order to take a good sample, you must carefully define your population.
We use samples to generalize to populations and it is usually the well-defined populations we are interested in.
Somewhere along the line, you will need a good FRAME or list of all the elements in the population (even if it's "far down the line" for a multi-stage sample).
That's why most "national mail surveys" are suspect; any list of mailing addresses for the entire United States would be out of date before the list was completed.
Practitioners of social research usually distinguish between two main sources of error in measuring social phenomena:
(1) SYSTEMATIC ERROR or BIAS--often tricky to discover, often part of a flawed data collection design, and
(2) RANDOM ERROR which is often sampling error.
BIAS is typically hidden. The more sources of input we get before starting a study, the more likely we are to discover bias. In the meantime, we can take steps to minimize bias in survey research (or in other designs that utilize questionnaires).
We can control sampling error by the TYPES of samples we take and HOW LARGE a sample we take. Given enough funding, of course. Larger and/or more representative samples have less sampling error and give more precise estimates of the true POPULATION PARAMETERS.
(Oh dear, we KNEW she lied when she said there were no formulas. Relax! There are very few and it is your understanding of the concepts I am after.)
We make generalizations from SAMPLING DISTRIBUTIONS, hypothetical distributions of a sample statistic (such as an arithmetic mean or a percentage) taken from an infinite number of samples of the same size and the same type (say, n = 900 for each sample and each sample is a Random Digit Dial survey).
In the long run, we hope that the "mean of the means" (the grand average) will be the same as the true population average. If we do a good job, we can estimate the population mean or percentage from just one sample and put approximate limits (called "confidence intervals") around our estimate. Confidence intervals tell us how much on the average we can expect our results to vary from one sample to another (say, "plus or minus 3 percent").
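As an illustrative sketch (not part of the original course materials), the usual normal-approximation formula for a proportion shows where the "plus or minus 3 percent" comes from; the function name and numbers below are assumptions for the example:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion,
    assuming a simple random sample (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# One RDD-style sample of n = 900 with 50% giving an answer:
low, high = proportion_ci(0.50, 900)
print(f"95% CI: {low:.3f} to {high:.3f}")  # roughly plus or minus 3 percent
```

With n = 900 the margin works out to about 0.033, which is where survey reports of "plus or minus 3 percent" typically originate.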
The size of the confidence interval depends on two entities: how much variability there is in the population, and how large a sample we take.
ALL THIS ASSUMES THAT WE TAKE PROBABILITY SAMPLES THROUGHOUT THE ENTIRE SAMPLING PROCESS!
In probability samples, each element, person, or case has a KNOWN, NON-ZERO chance of selection.
(VERY IMPORTANT NOTE: this does not necessarily mean equal chances of selection. Probability samples can have elements selected with unequal probabilities. We call EQUAL probability samples EPSEM samples: Equal Probability of SElection Methods. You could have a sample with unequal but known probabilities of selection--this would STILL be a probability sample. It would not be an EPSEM sample.)
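A minimal sketch of the distinction (the frame and probabilities are invented for illustration): every element below has a known, non-zero chance of selection, but the chances are unequal, so this is a probability sample without being EPSEM. Because the probabilities are known, estimates can later be weighted by 1/p(selection):

```python
import random

random.seed(2017)  # fixed seed so the illustration is reproducible

# Hypothetical frame with KNOWN but UNEQUAL selection probabilities.
frame = ["A", "B", "C", "D"]
p_select = {"A": 0.8, "B": 0.4, "C": 0.4, "D": 0.4}

# Independent (Poisson-style) selection: each element enters the sample
# with its own known probability.
sample = [e for e in frame if random.random() < p_select[e]]

# Known probabilities let us weight by 1/p to compensate for the
# unequal chances when estimating population values.
weights = {e: 1 / p_select[e] for e in sample}
print(sample, weights)
```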
ISSUES IN SAMPLING
To do probability samples, you need a complete FRAME or list, which enumerates all the elements in your population. Random Digit Dialing--RDD--approximates a complete list of telephone numbers. We would not be able to list all the numbers. Specialized programs can create lists of random digit telephone numbers.
If your population is very large, such a list will either be impossible, unwieldy, or VERY expensive. If you break your sample into STAGES, all you need is a complete frame at each stage. For example, a Random Digit Dial telephone sample for the entire United States might first sample area codes, then three-digit exchanges within the selected area codes, then randomly generate the final four digits within the selected exchanges.
There are books and companies that do this. When doing a local telephone survey, I have found that talking with the local telephone companies is a MUST. They can tell you which banks of numbers (e.g., 576-9000) are empty and that can save a lot of sampling time.
If you do not have known probabilities of selection, then you have a NONPROBABILITY SAMPLE.
PROBABILITY SAMPLES | NONPROBABILITY SAMPLES
Simple random samples (srs) | Self-selected samples (e.g., call-in/mail-in "polls")
Systematic samples with a random start | Available respondents (e.g., "grab"/haphazard/"convenience" samples)
Stratified samples: (a) proportionate to size; (b) disproportionate to size | Purposive/judgment samples (including "snowball samples")
Cluster samples (usually based on geographical proximity) | Quota samples
A SAMPLE MUST BE A PROBABILITY SAMPLE AT ALL STAGES IN ORDER TO STRICTLY QUALIFY AS A PROBABILITY SAMPLE. For example, if you do a RDD telephone survey, you also must use some type of probability method to select the respondent within households.
THERE REALLY IS NO SUCH THING AS A "QUASI-PROBABILITY" SAMPLE ALTHOUGH SOME COMMERCIAL AGENCIES WOULD LOVE YOU TO BELIEVE THIS.
SIMPLE RANDOM SAMPLING (srs). There is a complete list at each stage. Assign numbers or names to each element. Use a random number table or slips of paper to select cases. RDD approximates srs. srs are Equal Probability of Selection Method (EPSEM) samples. In srs, EACH ELEMENT AND EACH COMBINATION OF ELEMENTS HAS AN EQUAL CHANCE OF SELECTION. Lotteries are usually srs too!
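A quick sketch of srs from an enumerated frame, using Python's standard library in place of a random number table (the frame of 100 numbered elements is hypothetical):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Complete frame: every element in the population, numbered 1..100.
frame = list(range(1, 101))

# random.sample draws without replacement; every element -- and every
# combination of elements -- has an equal chance of selection (EPSEM).
srs = random.sample(frame, k=10)
print(sorted(srs))
```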
SYSTEMATIC SAMPLING. Each element at a preselected interval is chosen (e.g., every 10th case or every 100th case). A RANDOM START must be used to select the first element chosen. Approximates srs. WATCH FOR CYCLIC VARIATIONS, e.g., only selecting corner apartments which are often more expensive or only selecting Sundays for days of the week. Cyclic variations are often important in studying hierarchical organizations (e.g., the military) or in content analysis of media. If you suspect cyclic repeats in your cases, DO NOT USE systematic samples!
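A sketch of a systematic sample with the required random start (again a hypothetical frame of 100 elements, interval k = 10):

```python
import random

random.seed(7)  # reproducible for illustration

frame = list(range(1, 101))  # hypothetical enumerated frame
k = 10                       # preselected interval: every 10th case

start = random.randrange(k)  # the RANDOM START, within the first interval
systematic = frame[start::k]
print(systematic)            # 10 cases, each exactly k apart
```

If the frame has cyclic repeats at the same interval (say, every 10th record is a Sunday), this procedure hits the same phase of the cycle every time -- which is exactly the warning above.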
STRATIFIED SAMPLES. Divide the population into mutually exclusive strata PRIOR TO SAMPLING. Often done on a size basis (e.g., large versus small). Use srs or systematic sampling to select cases within strata. Increases representativeness by ensuring cases from each stratum are selected. May be either proportionate or disproportionate to size. For example, you may stratify school classes into gifted, regular, or remedial. Disproportionate samples are UNEQUAL PROBABILITY SAMPLES. THEY ARE STILL PROBABILITY SAMPLES.
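A sketch of a proportionate-to-size stratified sample, using the gifted/regular/remedial example (stratum sizes and labels are invented):

```python
import random

random.seed(1)  # reproducible for illustration

# Strata defined PRIOR TO SAMPLING; sizes are hypothetical.
strata = {
    "gifted":   [f"g{i}" for i in range(20)],
    "regular":  [f"r{i}" for i in range(60)],
    "remedial": [f"m{i}" for i in range(20)],
}

# Proportionate to size: take the same fraction (10%) within each
# stratum, via srs inside every stratum.
fraction = 0.10
sample = []
for members in strata.values():
    sample += random.sample(members, k=round(fraction * len(members)))
print(sample)  # 2 gifted + 6 regular + 2 remedial = 10 cases
```

A disproportionate version would simply use different fractions per stratum -- still a probability sample, but the unequal probabilities must then be weighted for.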
CLUSTER SAMPLES. Take part or all of naturally occurring clusters such as classrooms or city blocks. Often the frame is not enumerated until the cluster is actually selected. Cluster samples can be less representative if only a very few large clusters are selected (e.g., at Florida State University, compare drawing just one classroom with 250 students versus 250 students selected via srs from all over campus).
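A sketch of a two-stage cluster sample (classroom rosters invented for illustration): clusters are sampled first, and only the chosen clusters ever need to be enumerated:

```python
import random

random.seed(3)  # reproducible for illustration

# Naturally occurring clusters: hypothetical classroom rosters.
classrooms = {
    "room_101": ["ann", "bob", "cal"],
    "room_102": ["dee", "eli"],
    "room_103": ["fay", "gus", "hal", "ida"],
}

# Stage 1: srs of clusters. Stage 2: take ALL elements within each
# selected cluster (one common variant of cluster sampling).
chosen = random.sample(list(classrooms), k=2)
sample = [student for room in chosen for student in classrooms[room]]
print(chosen, sample)
```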
SELF-SELECTED. Respondents decide for themselves whether to participate, often in response to solicitations, such as questionnaires printed in newspapers, magazines or via 800- or 900- (YOU pay) telephone numbers. Might (big maybe) represent people very motivated or on the extreme of a particular topic. But how would you ever know?
AVAILABILITY/GRAB/HAPHAZARD/CONVENIENCE SAMPLES. Literally "grabbed" from whoever is available (Publix? Landis Green? a teacher who lets you use his classroom? Ed psych undergraduates in a friend's classroom?). Who knows WHO these folks represent? (Not me. Not you.) The only small advantage (missing from self-selected cases) is that at least the interviewer or study director selects the cases.
PURPOSIVE SAMPLES (sometimes called judgment samples). The study director, the client, or the interviewer decides who is "typical" or "representative" using some type of stated criteria. This can lead to tremendous bias depending on the study director's prejudices. Examples: selecting "singles" from bars, apartment buildings, colleges, overlooking single adults who live with their families, go to church or synagogue or mosque, avoid bars or work at small businesses. Selecting "typical users" for a new computer training program.
NOTE: Depending on the situation, this may, in fact, be the best that we can do.
QUOTA SAMPLES. THEY ONLY LOOK GOOD. They are still often used. Interviewers are instructed to grab respondents with the "right" combination of characteristics. Example: married Black women with college degrees between ages 30-35 with at least one child. PROBLEMS: Leaves it to the client, researcher, or interviewer's discretion whom to select; only a few characteristics can be simultaneously considered; no attempts at call-back for not-at-homes or refusal-conversions to completed interviews.
In non-probability samples, we usually have very few ideas of how respondents differ from non-respondents.
THE BEST PROBABILITY SAMPLE WILL NOT HELP IF THE RESPONSE RATE IS POOR.
The researcher is better off with a smaller sample and a higher response rate so as not to worry about bias from non-response.
We often have no idea how refusals or absent households differ from those interviewed. Aim for a MINIMUM of 50 PERCENT (this is actually pretty bad); 65 percent is more acceptable; and at over 70 percent, you will rival the General Social Survey.
If your time in the field is very short, you will have a high non-response due to inability to locate and contact many selected respondents.
You may want to read a study on response rates done by the Pew Center for People and the Press. It's interesting but far from universally accepted. (Rumor now has it that Pew today gets only 9% response rates.)
The survey research industry right now is in a real dilemma because response rates have fallen so drastically. The industry is struggling to do "representative" samples for reasonable cost. Yet remember, neither Gallup nor the U.S. Census (nor any other entity) can mess with the laws of probability. If non-probability samples are used, measures of "survey error" cannot be legitimately used either.
PLEASE READ THIS SECTION CAREFULLY. THE DISTINCTION CONFUSES A LOT OF STUDENTS--and many RESEARCHERS.
Notice that researchers define their population and select the sample BEFORE they assign participants in experiments or quasi-experiments to treatment groups.
Sampling is where experimenters often get sloppy. They will take grab samples or even cluster grab samples and sometimes never define their population at all!
RANDOMIZATION AND SIMPLE RANDOM SAMPLING ARE TWO DIFFERENT THINGS.
IT IS VERY EASY TO CONFUSE THEM UNLESS YOU KNOW BETTER.
SIMPLE RANDOM SAMPLING: One way the total pool of subjects may be created before any intervention or treatment begins. However, many other sampling methods, such as cluster or convenience sampling, might be used instead.
The process of how participants were obtained affects external validity. If the researcher used a simple random sample to select elements into the study before any intervention began, then, other things equal, the study will have good external validity.
RANDOMIZATION OR RANDOM ASSIGNMENT: One way of assigning subjects to treatment or intervention groups. Other methods, such as experimenter judgment, might be used, but these are poor on internal validity and quite possibly external validity too.
The process of how subjects were assigned to treatment or intervention groups affects internal validity.
Randomization, or random assignment of participants to treatment groups DOES NOT CORRECT for sloppy sampling of groups or elements in the first place (external validity). What randomization means is that you can typically make strong causal statements about how the treatments influenced the outcomes (internal validity) but only for the participants who took part in your interventions.
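The distinction can be made concrete with a short sketch (participant labels are invented): randomization operates only on participants you already have, however they were obtained:

```python
import random

random.seed(5)  # reproducible for illustration

# Participants already obtained -- perhaps by srs, perhaps by a grab
# sample. The randomization below cannot repair a poor sample.
participants = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]

# RANDOM ASSIGNMENT: shuffle the pool, then split into groups.
# This supports internal validity (causal claims about the treatment),
# but says nothing about whom the results generalize to (external validity).
pool = participants[:]
random.shuffle(pool)
treatment, control = pool[:4], pool[4:]
print(treatment, control)
```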
Once that's done, whom can the researcher generalize to? If the sample of groups or elements is poor, he or she can't generalize to anyone!
Now, that is a strong statement. Most researchers in practice aren't that fussy. But, where do you draw the line? What if you have a grab sample of classes from the FSU University school for your dissertation? (You grabbed where the instructor was cooperative.) You might generalize to the University school. You might even try to generalize to Leon County public schools (although the Lab School is Whiter and higher in social class than Leon County in general). But what then? Your results don't represent Florida classrooms, Southeastern classrooms, and certainly not United States classrooms. Although you used random assignment of treatments, your sample of classes limits your external validity, or how much you can generalize.
Notice, too, that when classrooms are sampled, this is a CLUSTER SAMPLE. If the students within a classroom are similar (say, grouped by ability level), the researcher has artificially depressed the standard errors. That can lead him or her to believe there are statistically significant results when there are really NOT.
The TRUE standard errors in cluster samples will be larger than the typical statistical program calculations that you see on your computer output from programs such as SPSS, which use simple random sampling formulas to calculate standard errors. This is because the sample of classrooms underestimates the true heterogeneity present in the entire school.
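One common way to express how much the srs formulas understate cluster-sample error is the design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation (how alike members of a cluster are). The numbers below are purely illustrative, not from any real study:

```python
import math

def corrected_se(srs_se, cluster_size, rho):
    """Inflate a naive srs standard error by the design effect
    deff = 1 + (m - 1) * rho. Illustrative sketch only."""
    deff = 1 + (cluster_size - 1) * rho
    return srs_se * math.sqrt(deff)

# With classrooms of 25 students and even modest within-class similarity
# (rho = 0.10), the true standard error is about 1.8 times the naive one:
print(corrected_se(0.02, 25, 0.10))
```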
Susan Carol Losh
Much of this site provided the basis for my entry in the following: S.C. Losh (2010) “Sampling Error” in N. J. Salkind (Ed.), The Encyclopedia of Research Design. Thousand Oaks, CA: Sage Publications. If you cite this page, that's the citation.