KEY TAKEAWAYS:
- It is important to recognize that using these databases is not necessarily "easier," and it can be time-consuming.
- What they do make possible is a scope and breadth an individual researcher could not attain on their own.
- Thus research with these databases may have greater construct and external validity.
- Archival databases may be useful reference points to find the most accurate or extensive treatment of a topic in your study area.
- They are also useful to new researchers who are still constrained for resources.
- The student must ask the same questions of an archive that they do when evaluating any study or data source, for example:
- What was the method of data collection (e.g., organizational records? surveys? what kind of surveys?)
- Who or what is the population? What is the estimated level of external validity?
- What possibilities for bias exist? Coding cause of death to spare the family? Biased questions or omitted participants (e.g., never-married mothers)?
- How complete is the description of the data (response rate? coverage, e.g., including cell phones)?
- Are the data available? If so, how (including cost)?
- Is online data analysis of the data possible?
- Have the data been used in books, chapters, or articles? (If so, we can learn more about the topic, too.)
EDF 5481 METHODS OF EDUCATIONAL RESEARCH
INSTRUCTOR: DR. SUSAN CAROL LOSH
FALL 2017
PLEASE NOTE: Your
texts do not give much information about online data and secondary analysis,
so this lecture will be the basis for the topic this term. "Big Data"
are increasingly used in original research! You are responsible for this
material on Quizzes and Assignments.
OPTIONAL: What do we know about "big data"? Check out my address to the AERA Advanced Studies of National Databases Special Interest Group (SIG): HERE
OPTIONAL: Here's a "Big Data" AERA SIG Newsletter (with thanks to Editor Jim Harvey): HERE
WHY EXAMINE WEB-BASED DATABASES?
As you have learned, it is expensive and
time-consuming to collect data, especially datasets that are sizable or
comprehensive. In the early 1970s, the United States Federal government
initiated a series of what have come to be called "Social Indicators."
The idea was to collect data from different domains (education, health,
the status of women and ethnic minorities, public opinion, etc.) and to
continue these series over time, thereby tracking change and continuity
among Americans. At the same time, other countries, particularly Canada,
Western Europe, and Japan, also began indicator series, thus making possible
international comparisons. By now, nearly all regions of the world collect
and store indicator information. In education, one example is the Trends
in International Mathematics and Science Study (TIMSS). Data were collected
in 42 countries in 1995 and in 38 countries in 1999. More recent additions
(2003, 2007, 2011, with 2015 shortly to come) address experience with computers
and the World Wide Web.
Considerable effort has been devoted
to making many of these indicator series compatible over time and many
of these efforts reference concepts we have utilized all semester, e.g.,
internal, construct and external validity; sampling and question format
issues:
- Questions are asked in the same way.
- Changes to questions are established via "split-ballot" testing, i.e., experiments to see whether the revised questions work the same way as the original questions. A good indicator series NEVER arbitrarily shifts question format (or open question codes).
- Variables are defined in the same way.
- Coding categories remain constant.
- If coding changes are made, care is taken to make new coding systems compatible with the old, such as the detailed United States Census three-digit occupational codes.
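The compatibility idea in the last point can be sketched in a few lines of code. This is a minimal illustration, not an actual Census scheme: the category codes and labels below are entirely hypothetical.

```python
# Sketch: harmonizing a revised coding scheme with an older one so a
# time series remains comparable. All codes here are hypothetical,
# not real Census occupational codes.

# Old scheme: 1 = professional, 2 = clerical, 3 = manual.
# Suppose a new scheme splits "professional" into two codes (1 and 4).
NEW_TO_OLD = {1: 1, 4: 1, 2: 2, 3: 3}

def harmonize(new_codes):
    """Map newly coded responses back to the old categories."""
    return [NEW_TO_OLD[code] for code in new_codes]

# A respondent coded 4 ("technical professional") in the new scheme
# still counts as 1 ("professional") in the harmonized series.
print(harmonize([1, 4, 2, 3]))  # -> [1, 1, 2, 3]
```

With a crosswalk like this, counts tabulated under the new scheme can still be compared with earlier years coded under the old one.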
A series may have an "oversight board."
These boards monitor the content and form of the indicator series. Thus,
principal investigators cannot capriciously change either content or form
without input from a panel of expert professionals.
The number of data archives is already
HUGE and it is growing by the minute. Some of the large archives, such
as ICPSR, The Roper Center or the Odum Institute for Research in Social
Science at the University of North Carolina, are simply staggering in the
amount of data that they hold.
As you look through some of the sample
pages, you will see that several times I have given the warning: "set aside
a day to explore this archive." Do take this warning seriously!
One of these archives may hold the answer to questions you may have about
research in your field, or your proposed dissertation or master's thesis,
or provide the basis for a nice conference paper or article. They are definitely
worth exploring.
These archives may be the source to consult
if a new study garners a lot of publicity and possibly "strange" findings.
With resources such as these, the novice--and
even the experienced--researcher should seriously reconsider whether they
really want to gather all of their own data from scratch.
Analyzing data from these archives is often
called SECONDARY ANALYSIS, partially because
the data were originally gathered for other research and information purposes.
WHY THESE ARCHIVES ARE IMPORTANT TO YOU
- There is no point in "reinventing the wheel." Why do a small local study when data already exist on regional, national, or even international levels? An example is using the "CIRP" (often called the "Freshman Surveys") to look at college student beliefs, attitudes, and accomplishments instead of convenience samples of your buddy's classes.
- "There is plenty of gold in them thar hills." Most of these databases are so huge that no one investigator could ever analyze everything in them. With each successive year, the possibilities for analysis grow. Furthermore, other researchers may have ideas for analysis that did not occur to the original Principal Investigator. In other words, there is plenty of data for you to do an original analysis--without all the backbreaking work of collecting the data too.
I practice what I preach! Since 2001
I have worked with the National Science Foundation Surveys of Public Understanding
of Science and Technology. These surveys now span 1979 to 2014, an unprecedented
look at public knowledge, reasoning and attitudes about science and technology.
I have built longitudinal files from these data now available at ICPSR
and The Roper Center.
OPTIONAL: One thing repeated studies make possible is comparing generational effects with chronological aging. For two examples of my examination of generational versus aging effects on science beliefs and attitudes (CLICK HERE) and information technology (CLICK HERE), see the Internet links. Currently I am examining general public perceptions of climatologists over time.
- Many of these archives offer an unprecedented opportunity to track trends over time. How did computer use change from the early 1980s to the early 2000s? What kind of educational preparation do students receive who rise to eminence later on? What are the average student characteristics in research universities as opposed to liberal arts colleges, and how did these characteristics change over time? What are gender differences in Internet use over time?
- YOUR time, resources, and energy. Many researchers, especially junior faculty and doctoral candidates, have limited resources. With one eye on the tenure clock, junior faculty have limited time too. It would be nearly impossible for most young researchers to collect international data or wait many years to collect repeated measures. If existing archives have variables that are directly pertinent to your research interests, it is often in your best professional interest to use--or at least to reference--these archives.
Obviously, using pre-existing archives is not for everyone. Many students in disciplines that lend themselves to experiments or surveys might be able to quickly collect hand-tailored data with relatively little financial investment. However, even these researchers may be interested in "triangulation" with survey data or historical records.
CLICK HERE TO ENTER THE ONLINE DATABASE MENU
QUESTIONS YOU SHOULD CONSIDER ABOUT ONLINE DATABASES
- What is the unit of analysis? Is it an individual? An organization, such as a college or university? A time point for a country or state series? Archives vary, and the unit is not always an individual.
- What kinds of variables does the archive cover? Degree attainment? Spirituality? Symptoms of stress? Health practices? Drug or alcohol usage? Water pollution?
- What is the time frame covered by the archive? Examples: the average school FCAT scores for 1998-2016, or the General Social Survey from 1972-2016.
- What is the geographic frame covered by the archive (state? local? United States? international?)
- Who were the sponsor(s) of the archive (e.g., NSF? NCES? United Faculty of Florida?)
- How did the archive come to be?
- Were the data collected especially for the archive (such as IPEDS or TIMSS)? Or were the data compiled from other sources (such as WebCASPAR)?
- Does the archive contain any tutorials that instruct how to use it (online or otherwise)?
- Are there codebooks that describe the data, the variables, and the file structure?
- How are the data available? Are they ready for online analysis? Are the data available to download onto one's computer? Are the data contained in .pdf format tables? Are there alternative ways to obtain the data (such as a CD)? If so, how can the data be obtained?
- Can you simply download the data, or must you obtain a CD or other device from the archive agency?
- How "clean" are the data? One good example is the U.S. government's famous "Falling Through the Net" data about the "Digital Divide" in computer and Internet usage. This is one of the most cited datasets about early Digital Divides, but the data are appallingly "dirty." Any household resident 14 years of age or older was asked to provide information about all other residents in the household. Considerable data are missing on racial identification. The information I could locate did not say how the data were gathered (in person? Random Digit Dial of landlines?). Apparently, the government was in such a rush to put up the dataset that the data contain a LOT of careless errors. As a result, I consider estimates from the early years of these data to be unreliable despite a usually trustworthy source.
- Is there a charge for the data? If so, what is the cost? Most archival costs are surprisingly reasonable when you consider the effort involved in the first place. For example, the cost of the ENTIRE General Social Survey archive, from 1972 to 2016, in SPSS-ready format is about $500. Compare this with the millions of dollars it cost to gather the data (about three million dollars every other year). Don't forget: a researcher can incur time and financial costs to gather and process their own data. It may, indeed, turn out to be cheaper to use the archive. And university dissertation grants may even cover the acquisition cost.
- What kinds of analyses can be done online? Frequency distributions? Cross-tabulations? Multiple regression or other multivariate analyses? See if the archive uses the California-Berkeley Survey Documentation and Analysis (SDA/DAS) system, which is simple to use, covers most basic statistics, and is unbelievably fast (including on the dial-up system where I first used it: it tore through over 131,000 cases in 7 seconds). Many online datasets are now directly linked to the SDA/DAS system.
- Is a questionnaire available, or some other original document describing each variable in detail? Maybe it is available as a separate link or as a .pdf document. EVERYDAY NOTE: The IRB will want to see your questionnaire(s) if you do a survey design yourself--and any questionnaires from an existing database if you conduct secondary analysis.
- What is mentioned about coverage or response rate? For example, data are missing from several states in early data series about abortion. Some surveys, especially longitudinal or panel studies, may have completed interviews with less than half of the originally contacted respondents. In other cases, such as the CIRP, response rates can vary considerably from college to college.
- Does the user need any kind of license from the data agency? Many data sets at the National Science Foundation, the National Center for Education Statistics, and other agencies require the user to have a license if s/he works with what is called "unit record" data. "Unit record data" is the "raw data" archive where each record or line in the datafile is an individual or an institution. This means the person or institution could plausibly be identified (although in many if not most cases, this is unlikely). Obtaining a license is typically not a problem for legitimate researchers, but it does necessitate some paperwork, so if you are the user: be prepared to check about this and budget some time accordingly.
- What was the mode of data collection? In-person surveys may give different results than telephone surveys. The top administrator of a university may access different data than a rank-and-file faculty member. And, remember, of course, the data in the archive might not be surveys. Instead they might be standardized tests, documents (birth or death records), economic records, or an institutional archive.
- How recently has the database been monitored or updated? See if you can find a date on the page, typically at the very top or the very bottom of the page. "Old pages" may have missing links, unfixed errors, omit the most recent updates to files, or simply may not work.
- Were the data gathered over time by different agencies or different principal investigators? If so, changes in variables, definitions, or coding may have occurred. The user may find differences attributable to these changes, rather than to changes in the concepts they are studying--thus threats to internal validity.
- How far back does the data series extend? The longer the series, the more likely you are to encounter strange alphabetic and non-alphanumeric computer codes, or inconsistencies in definitions or measures. And the more likely the original data are to be flat-out MISSING.
- Were data compiled from different agencies into a single archive? Again, check for consistency in definitions (even of the same variable!) across agencies.
- See if the description of the archive notes any problems or missing information.
- For prospective analysts: what are your computer skills? Some databases are in ASCII format, which you can probably download into a spreadsheet such as Excel. But the field delimiters vary widely: some use spaces, others use commas, still others rely on a format statement so that the data can be read. Do you know how to analyze data using a spreadsheet program? If not, do you know how to transfer spreadsheet data into a statistical program such as SPSS, SAS, M+, R, or other software? Do you have file management skills so that you can insert value labels, variable labels, and missing data codes? In other cases, you may have to save or print tabular displays and hand-enter the data into a spreadsheet (very carefully). As you can see, it is VERY helpful to have good computer skills--or to have some good friends who do.
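The delimiter issue in the last point is easy to see in a few lines of code. Here is a minimal Python sketch that reads the same toy dataset in two common ASCII layouts, comma-delimited and space-delimited; the data and variable names are made up for illustration.

```python
import csv
import io

# A tiny "archive extract" in two common ASCII layouts (hypothetical data).
comma_text = "id,score\n1,85\n2,92\n"
space_text = "id score\n1 85\n2 92\n"

# Comma-delimited: the csv module's default dialect handles it directly.
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# Space-delimited: split each line on whitespace instead.
space_rows = [line.split() for line in space_text.splitlines()]

print(comma_rows)  # [['id', 'score'], ['1', '85'], ['2', '92']]
print(space_rows)  # same structure
```

If instead the archive relies on a format statement (fixed column positions with no delimiter at all), you must slice each line by character position according to the codebook before any analysis is possible.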
Any original problems present when the data were first gathered will STILL be there when the data are archived. See what you can find out about issues with question format, sampling, coding categories, and other sources of bias and random error. Sometimes (for example, the General Social Survey) there will be considerable information about items such as response rate, but sometimes there is not.
Always remember this classic cliché: do the best you can with what you got. Despite any problems, online databases and archives are a terrific resource for us all.
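Once you have downloaded data from an archive, the basic analyses mentioned above (frequency distributions and cross-tabulations, the staples of systems like SDA) can also be run locally. A minimal sketch in Python using only the standard library; the survey variables and responses below are hypothetical.

```python
from collections import Counter

# Hypothetical downloaded extract: (gender, uses_internet) per respondent.
responses = [
    ("F", "yes"), ("M", "yes"), ("F", "no"), ("F", "yes"),
    ("M", "no"), ("M", "yes"), ("F", "yes"), ("M", "no"),
]

# Frequency distribution of one variable.
internet_freq = Counter(use for _, use in responses)
print(internet_freq)  # Counter({'yes': 5, 'no': 3})

# Cross-tabulation of gender by Internet use: counts per cell.
crosstab = {}
for gender, use in responses:
    crosstab.setdefault(gender, Counter())[use] += 1
print(crosstab["F"])  # Counter({'yes': 3, 'no': 1})
```

In practice you would use a statistical package (SPSS, SAS, R) for anything beyond simple tables, but the logic is the same: count cases within categories, then within combinations of categories.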
WHERE TO START HUNTING FOR ONLINE ARCHIVES
- Professional associations in your field
- The FSU on-line library system (schedule a meeting with a librarian; FSU is an ICPSR and a Roper Center member)
- Search engines using your topic of interest
- Major US government or state Web sites (if you are an international student, check out sites from your home country). The National Center for Education Statistics, the National Science Foundation, the Centers for Disease Control--and even the State of Florida website--all contain links to many, many databases. You will find several of them (but far from all of them!) in our course database menu.
- Major archives such as the Inter-university Consortium for Political and Social Research at the University of Michigan (ICPSR), the Pew Research Center for the People and the Press, or the Roper Center (now at Cornell University).
- One link leads to another. I found the International Social Survey Program link from the General Social Survey Web site.
- Check with faculty and graduate students in The College of Information
- Many recent textbooks have online supplements or Web sites that list archives
November 29, 2017
Susan Carol Losh
This page was built with Netscape Composer.
Always under construction as new databases are entered.