EDF 5481 METHODS OF EDUCATIONAL RESEARCH
DR. SUSAN CAROL LOSH


ISSUES WITH SAMPLING EMAIL (AND OTHER ISSUES) FOR ONLINE WEB SURVEYS

This site has been added as a resource for students who wish to pursue the topic of online surveys. 


 

SOME BASIC ISSUES
A PROBLEM
RESPONSES
CASRO STANDARDS


For several reasons, there is great interest in conducting online surveys. Many graduate students in the College of Education have already conducted online studies.
Because of growing problems with telephone survey response rates, many researchers and organizations are turning to online surveys.

In one common form, the survey is sent to an email address; the respondent fills it out and presses the "submit", "done", or "send" key.

In a second, increasingly used variation, respondents are invited to visit a linked Web site and complete the survey online. The procedure ends in much the same way: the respondent submits the completed survey.

Therefore, it is important to consider methodological issues about online surveys, in particular, sampling.

Very often, the principal investigator (PI) does not know who has completed the survey. This problem, of course, is true for mailed surveys in general (emailed or otherwise).

However, a somewhat different set of problems arises with surveys located on a Web site and also with sampling email addresses.

Although you will want to send reminders, be aware that some people will consider this an unwarranted invasion of privacy. When the Office of Research sent out our survey on undergraduate research experiences, we received some death threats in response to reminders ("send me one more email about this survey and I will KILL you!"). There is a first time for everything.

First, if the survey is on a Web site, it may be accessible to everyone who is online, whether they are in the study population or not, unless the site is secured in some way. Arming each respondent with a username and password may not be feasible for a one-time site visit (remember how much YOU hate memorizing or changing all the passwords that you absolutely must), although this should be considered.

Many online surveys draw a sample of email addresses instead and have respondents enter through a site address placed inside the email. (Is there any way to prevent respondents from passing such an unsecured site to their friends? Might the site be forwarded if the topic is important and/or controversial?)
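One possible middle ground between an open site and a username/password is a personal token placed in each invitation link. The sketch below is only an illustration: the function names, the example.edu addresses, and the survey URL are hypothetical, not part of any particular survey package.

```python
import secrets

def issue_tokens(email_addresses):
    """Map one unguessable, URL-safe token to each sampled address."""
    return {secrets.token_urlsafe(16): address for address in email_addresses}

def invitation_link(token, base_url="https://survey.example.edu/s"):
    """Build the personal link that goes inside the invitation email.
    The base URL is a made-up placeholder."""
    return f"{base_url}?t={token}"

def is_invited(token, issued):
    """Accept a visit only if its token was actually issued."""
    return token in issued

# Hypothetical sample of three addresses
sample = ["a@example.edu", "b@example.edu", "c@example.edu"]
issued = issue_tokens(sample)
links = [invitation_link(token) for token in issued]
```

A forwarded link would still open the survey for a friend, but the visit stays tied to the one invited address, so it cannot add an extra respondent if only one response per token is kept.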

Respondents with very strong points of view may return to take the survey several times.

Investigators should ensure that a given individual can "vote" only once, or that later visits to the survey site will overwrite earlier visits to the site.
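A minimal sketch of the "later visits overwrite earlier visits" rule, assuming each submission already carries some respondent identifier and a timestamp. The tuple layout and the names are assumptions made for illustration.

```python
def keep_latest(submissions):
    """Keep one response per respondent ID; a later submission overwrites an earlier one.

    `submissions` is a list of (respondent_id, timestamp, answers) tuples.
    """
    latest = {}
    # Process in time order so the most recent submission is stored last
    for respondent_id, timestamp, answers in sorted(submissions, key=lambda s: s[1]):
        latest[respondent_id] = (timestamp, answers)
    return latest

submissions = [
    ("r1", 1, {"q1": "agree"}),
    ("r2", 2, {"q1": "disagree"}),
    ("r1", 3, {"q1": "strongly agree"}),  # r1 returns and answers again
]
final = keep_latest(submissions)
```

Here `final` holds two respondents, and the stored answer for "r1" is the later one. Keeping the first submission instead would be an equally defensible rule; the important thing is to pick one rule before the data come in.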

The BIG problem is that there is no single comprehensive email list for ANY locality (country, state, city, etc.). Some professional organizations or associations may have comprehensive lists of their members, and many companies or organizations, such as FSU, have a comprehensive list of their own employees or students. Be assured that such a list will be out of date as soon as you receive it, no matter how recent it is. Plan for this eventuality (e.g., "bounced" email) when you plan your sample size so that you have a final sample close to the size you originally intended.
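As a rough illustration of planning for bounced email, the number of addresses to draw can be inflated by the expected share of deliverable addresses and the expected response rate. The rates used below (10 percent bounced, 25 percent responding) are hypothetical values chosen for the example, not estimates from any study.

```python
import math

def invitations_needed(target_completes, bounce_rate, response_rate):
    """Number of addresses to draw so that expected completes reach the target."""
    if not (0 <= bounce_rate < 1) or not (0 < response_rate <= 1):
        raise ValueError("bounce_rate must be in [0, 1) and response_rate in (0, 1]")
    delivered_share = 1 - bounce_rate
    # Expected completes = invitations * delivered_share * response_rate
    return math.ceil(target_completes / (delivered_share * response_rate))

# Hypothetical planning values: 400 completes wanted, 10% bounce, 25% respond
n = invitations_needed(target_completes=400, bounce_rate=0.10, response_rate=0.25)
```

With these assumed rates, 1778 addresses would have to be sampled to expect about 400 completed surveys.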

DON'T INCLUDE A SURVEY AS AN E-MAIL ATTACHMENT! Can I put this one strongly enough? I virtually no longer open ANYONE's attachments. I receive at least four virus or phishing attempts every day, and my experience is typical. Anti-virus or anti-phishing software is usually updated AFTER the latest virus or "phish" begins to hit. So when I see a large attached file, I delete the entire message without even opening it (some of the latest malware attempts try to open themselves). Seeing a .doc file won't help, as much malware has hidden .exe extensions. Depending on how badly a computer is infected, the owner may have to do a "wipe and reload," that is, the hard drive is totally erased and you must reload all your software, documents, etc. Even if you have backup copies of everything, take it from me, it can take hours or even days.

Many other online users are with me. Therefore, if a survey is sent as an attachment, again it will have a self-selected sample of very trusting online respondents, who may differ from the suspicious ones in unknown ways.

Although most of the population is online in many countries, obviously online users are not a representative sample of any particular country, state or locality. Online users are disproportionately well-educated, wealthier, and more likely to be young or middle-aged adults. The only way around this is to give everyone in your sample something like Web-TV or a smart phone/tablet and teach them how to use it. If the researcher plans a lengthy panel (repeated interviews) study, this may actually be the cheapest way to go.

You know what? Not everyone is online even at a university! FSU is considered one of the most "wired" campuses in the country. At many universities, not even a majority of students, faculty, or staff is online. Even at FSU, some people don't check their email or go online at all (few, I will admit, but are these folks systematically different from the majority who do?)

If the researcher simply makes an announcement and has people visit the Web site, this is a self-selected sample that represents no one but themselves.
They may not belong to your desired population. And you won't have many respondents either.

Keep in mind that online surveys are fundamentally still self-administered questionnaires, although typically with a well-educated population. Many problems with other types of self-administered questionnaires will apply here too (although currently you don't have to worry about postage).

Individuals often resent solicitations sent to their email boxes and see them as unmitigated SPAM. Some people (see an example below) become angry if a survey invitation is sent to their box (personally, I think this is over-reacting because you can just press the delete key, but it is true that many people become quite irritated.)

Each time you add a new telephone number line to your home or business, or add a new cell phone with a separate number, you pay extra money. But in many cases, adding a new online email address is free or very cheap. Furthermore, you might be able to have multiple email addresses with the same company. The average online user has five email addresses. Thus, it is easy to oversample particular respondents and this plays havoc with your sampling fraction.
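One conventional adjustment for this kind of oversampling is to weight each respondent by the inverse of the number of addresses through which he or she could have been selected. The sketch below assumes that respondents report how many of their addresses are on the list and that the sampling fraction is small; the names and counts are made up.

```python
def multiplicity_weights(address_counts):
    """Weight each respondent by 1 / (number of their addresses on the list),
    then rescale so the weights sum to the number of respondents."""
    raw = {person: 1 / count for person, count in address_counts.items()}
    scale = len(raw) / sum(raw.values())
    return {person: weight * scale for person, weight in raw.items()}

# Hypothetical respondents and the number of listed addresses each reports
counts = {"ann": 1, "bob": 2, "cy": 4}
weights = multiplicity_weights(counts)
```

In this example "cy", with four listed addresses, had roughly four times the chance of selection that "ann" had, so "cy" receives one quarter of her weight.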

Observe human subjects protections! If one samples from an organization, such as FSU, be sure to obtain permission from that organization to sample email addresses of their employees or students. Give respondents enough information about the survey so that they can make an informed choice about whether to participate. Tell them how the data will be stored and used, too.



Are YOU considering conducting an online survey?
Read the points and exchange of views below from a group of experts and the voices of experience. Sampling issues present very serious problems.

Further, there are very important issues of ethics that are also discussed below.

THE FOLLOWING SECTION IS OPTIONAL FOR CLASS BUT YOU MAY FIND IT INTERESTING READING.
 


 
AN AAPOR EXCHANGE ON EMAIL LIST GENERATION

The American Association for Public Opinion Research (AAPOR) is the leading professional organization for anyone who studies public opinion or who conducts surveys. Its members are an approximate 50-50 mix of academics and survey research full-time professionals who work in government and industry. AAPOR members belong to "aapornet," a very vigorous list_serv. One set of email exchanges turned to online sampling issues and the interchange of views is reproduced for you below.


THE INITIAL STATEMENT OF THE PROBLEM

From: Scholar 1
To: AAPORnet [aapor-net@groups.aapor.org]
Subject: Collecting email addresses from Usenet for academic survey research

We have discussed the appropriateness of using various methods of collecting email addresses and on using the Web or Usenet as a way to collect email addresses.  An acquaintance posted a question on a Usenet group (dealing with net-abuse) about the appropriateness of unsolicited bulk email as a method of getting people to go to websites and fill out a questionnaire.

His initial post asked (among other things):

Can someone give me a pointer to some documents that specifically say that it is not "OK" to do this sort of stuff. I'd like to have something more to respond with than "I've been using newsgroups for ten years, and it isn't OK."

I asked him if he could go into a little more detail about what happened and he sent me this via email:

...................................................................
(From Scholar 1's email):

On November 1st I received an e-mail from someone I did not know, which read, in part:

"I am conducting research on parents' ideas about substance abuse prevention.  The study is called XXXXXX.  If you are not a parent, please consider forwarding this letter to a recovering friend who has children.  As a social work researcher, I am limited in how I can collect email addresses for research purposes.  I apologize in advance if this letter is unwelcome.

"If you would like to participate in this study, the questionnaire will take about 15 minutes of your time.  All information is confidential and there will be no other use made of your information or your email address.

"If you click on the following website . . . ."

Scholar 1 continues:

I replied only by asking how this person obtained my address. Specifically my reply was "How on earth did you get my e-mail address?" S/he replied with the following, again in part:

"I would be happy to explain how I got your email address.  As a university researcher I am not allowed to go to listservs.  I have to get individual email addresses from the public domain. This would be anywhere that the email address is found where you wouldn't have to join or agree to anonymity to gain access.  I found yours at one of the alcohol or drug alt.recovery sites. Email addresses are attached to each posting to the newsgroup.Newsgroups don't require you to join, you just post there.  To satisfy university research review standards, I can't send out a general post, I have to contact each person individually.  Perhaps not the most efficient way, but its all I have available to me under current university review rules."

The reply indicates that either this person is lying, or has no idea how research, e-mail, and newsgroups work in the context of a university setting. In part I think this person is lying, because I have never posted to an alt.recovery* newsgroup. So s/he must have obtained my e-mail address from somewhere else. I also think that this person is quite ignorant of newsgroup etiquette, as it is considered inappropriate to cull e-mail addresses from newsgroups in order to send out spam. At the same time, it seems doubly inappropriate to cull e-mails from newsgroups devoted to drug and alcohol addiction recovery to ask them to participate in surveys about their addictions.

Strangely enough, I received another spam from a graduate student at another school the next day:

" My name is [XXX].  I am a Ph. D student at [XXX]. I am developing a survey about the impact of telecommuting on different ethnic groups (Hispanics, Blacks, Whites, etc.). If you are a telecommuter and you are willing to participate in this survey, please, visit:

[xxx]

"This research study has been reviewed and approved by the Institutional Review Board - Human Subjects in Research, [XXX] University. For research related problems or questions regarding subjects' rights, you may contact the Institutional Review Board through Dr. [XXX], at (xxx)xxx-xxxx."

I asked that person how s/he got my e-mail address, and got the following reply:

"In one discussion list.  Sorry if I bother you.  It is not my objective. Please, consider answering my survey if you are a telecommuter."

I have filed complaints against both people with their respective ethics boards, department chairs, and Computer Technology offices [for violation of Acceptable Use Policies]. In my complaints I have asked to know the outcome of my complaint.
.......................................................................

I have to confess it does worry me that at least two universities are allowing/encouraging/teaching researchers to collect email addresses and data in this way.

And while this may be old guy sour grapes (why, back when I was in grad school, we collected data in cuneiform on clay tablets we made ourselves . . . ) it seems to me that these are not the kinds of research where the only way you could get a sampling frame was on the internet/usenet.
 


RESPONSES TO THE STATED PROBLEM

Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 2
Subject: Re: Collecting email addresses from Usenet for academic survey research

Without intending to dismiss the issues you discuss, I wonder how long it will be before someone programs a computer to generate the electronic equivalent of RDD samples.  If a survey researcher wanted to conduct a study among users of, say, Time Warner Cable, he or she could study the constraints governing the prefix of the e-mail address -- technical, such as number of characters and which ones are disallowed, and "practical," such as presence of vowels or use of word components (morphemes) -- and just fire away.  The electronic equivalent of working number rate would be meaningless.  (This does assume that the server could not recognize and block such messages, which perhaps it can.)  But would the sender not be able to claim that this was the same as creating random number telephone samples?
 


Return-Path: AAPORnet [aapor-net@groups.aapor.org]
Reply-To: Scholar 3
Subject: Re: Collecting email addresses from Usenet for academic survey research

The big boys are way ahead of you on this:

CHICAGO --10/22/01 - SPSS Inc.  (Nasdaq:  SPSS), a worldwide provider of analytical technology, with their SPSS MR division, the leading strategic technology partner for market research; and America Online, Inc., through its Digital Marketing Services (DMS)  subsidiary, the largest source of online survey respondents for market research firms, today announced a strategic alliance under  which SPSS Inc. has acquired the exclusive rights to distribute survey sample drawn from the more than 31 million AOL members and tens of millions of users of America Online's other interactive properties.  America Online, DMS and SPSS MR will work closely to expand online industry survey and sample services through OpinionPlace.com, the online industry's largest portal for reliable survey research respondents.

NOTE: this particular response precedes the acquisition of SPSS by the IBM company.


Sender: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 1 (who returns)
Subject: RE: Collecting email addresses from Usenet for academic survey research

FOLKS: READ THIS ONE FOR AN EXCELLENT LIST OF POTENTIAL SAMPLING PROBLEMS!

While they could claim it was the same as using an RDD sample the norms of using email are dramatically different than the telephone.  As are the economics.

I suspect the biggest problem with this would be the reaction of the large ISPs such as AOL or TWC.  They have already taken several commercial spammers to court successfully for repeatedly spamming their customers.  I also assume that this is the type of thing you would just be able to do once before blocks would be placed on all incoming mail from whatever source the survey originated.  Look at the problems Harris Interactive had with an opt-in list.

This of course puts aside the problems of the science of such an endeavor:

1. Getting a list of all the internet registered domains that have associated email addresses.

or

1a.  Assuming that those people whose email addresses end in aol.com have the same views as those whose end in usc.edu.

2. Generating proportional Random Email Addresses for each domain.

3. Quotas or weighting for each domain (or subset of domains).

4. The problem of multiple email addresses (I have 4 email addresses).

5. The problems with response rates (as James points out).

And many more.

Of course many of these are dealt with successfully in telephone surveys so I am sure that it would be possible to deal with at least some of these.



Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 4
Subject: RE: Collecting email addresses from Usenet for academic survey research

I work in a company that has an ASP technology for doing online surveys and other data collection activities.  Not only do we make sure that we never use anything but an opt-in list when sending out email invitations, but it's part of our contractual agreement with our customers.  Also a while back, one respondent, forgetting that he had opted in, complained to our ISP, who, without contacting us, shut us down for alleged spamming. (we now have a different provider)

While traditional surveys -- snailmail, face-to-face, and telephone -- do not require a pre-existing agreement to even being approached for a survey, the history of the internet and its attitude re: spam means that only opt-in is acceptable.

A 'snowball' sample might work (and may be perfectly legitimate in qualitative research, as mentioned by one aapor member), but one of our clients recently wanted us (over our strenuous objections) to get their opt-in respondents to send the survey to friends, and we got 0 responses. Your mileage may vary, of course.

One can acquire an opt-in list for about $.15 - $.25/name.  It may not be a representative population of the universe (and such a sample frame may be too costly to obtain anyhow), but it oughta work better than sitting and gleaning names off listservs.


Reply-To: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 5
Subject: Re: Collecting email addresses from Usenet for academic survey research

Scholar 1 writes:

> I have to confess it does worry me that at least two universities are allowing/encouraging/teaching researchers to collect email addresses and data in this way.

What worries me as much, if not more, is if this sort of recruitment is meant to provide a sample from which any sound conclusions could be drawn.  If it is the moral equivalent of a qualitative study, then that is one thing.  But one would think it would be far more effective to recruit participants locally where one could meet with them and learn, in theory much more.  I wonder if universities think this is a reasonable substitute for science.



A RESPONSE FROM CASRO (Council of American Survey Research Organizations)

From: Scholar 6
To: AAPORnet [aapor-net@groups.aapor.org]
Subject: RE: CASRO Standards for Using E-mail addresses

The Council of American Survey Research Organizations (CASRO) has a Code of Internet Standards which specifically rejects the use of unsolicited bulk email broadcasts to elicit survey responses. Responding to the AAPORNET posting, the CASRO Standards say specifically and definitively that it is "not ok."  The Standards require research organizations to protect respondent confidentiality by verifying that "individuals contacted for research by email have a reasonable expectation that they will receive e-mail contact for research."

These CASRO Standards also prohibit research organizations "from using any subterfuge in obtaining email addresses of potential respondents, such as collecting email addresses from public domains, using technologies or techniques to collect email addresses without individuals' awareness, and collecting email addresses under the guise of some other activity."

These standards were developed because the Internet is a private network, unlike the U.S. Mail and telephone, which are public networks.  Because the Internet is a private network, Internet providers have the right to suspend or even terminate service of those who do mass emailing or spamming. Unsolicited email requests to participate in surveys may be considered spam.   Several research organizations have already had their service suspended for short periods because they were accused of unsolicited emails.

I have reproduced the appropriate Standards Section below.

*************************************************

Council of American Survey Research Organizations
Internet Standards and the Code of Standards and Ethics for Survey Research

The new language that addresses Internet research is inserted into the Responsibilities to Respondents section of the Code of Standards and Ethics for Survey Research.

I. Responsibilities to Respondents

Section 3.  Internet Research

a. The unique characteristics of internet research require specific notice that the principle of respondent privacy applies to this new technology and data collection methodology.  The general principle of this section of the Code is that survey research organizations will not use unsolicited emails to recruit respondents for surveys.

1. Research organizations are required to verify that individuals contacted for research by email have a reasonable expectation that they will receive e-mail contact for research.  Such agreement can be assumed when ALL of the following conditions exist:

            a. A substantive pre-existing relationship exists between the individuals contacted and the research organization, the client or the list owners contracting the research (the latter being so identified);
            b. Individuals have a reasonable expectation, based on the pre-existing relationship, that they may be contacted for research;
            c. Individuals are offered the choice to be removed from future email contact in each invitation; and,
            d. The invitation list excludes all individuals who have previously taken the appropriate and timely steps to request the list owner to remove them.

2. Research organizations are prohibited from using any subterfuge in obtaining email addresses of potential respondents, such as collecting email addresses from public domains, using technologies or techniques to collect email addresses without individuals' awareness, and collecting email addresses under the guise of some other activity.

3. Research organizations are prohibited from using false or misleading return email addresses when recruiting respondents over the Internet.

4. When receiving email lists from clients or list owners, research organizations are required to have the client or list provider verify that individuals listed have a reasonable expectation that they will receive e-mail contact, as defined in (1) above.



Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 2
Subject: Re: CASRO Standards for Using E-mail addresses

How does "verifying that individuals . . . have a reasonable expectation that they will receive e-mail contact for research" relate in any way to protecting their confidentiality?  Absent some very unusual definition of "confidentiality," this is a complete non-sequitur.

It also does not follow that mail and telephone being "public" precludes prohibition of specified acts, such as unsolicited contacts.  Public media specifically prohibit certain uses, such as sending pornography through the mail or using the telephone to plan a crime.  So it's a matter of whoever controls the medium deciding that certain things can or cannot be done, not whether the entity is public or private.

Upon careful reading, the logic of the CASRO statement doesn't hold up.  Why is what we routinely do in one realm (telephone RDD) acceptable but its equivalent on the internet wrong?  Certainly not because one is "public" and the other "private."

If CASRO felt that not issuing a "spam prohibition" would result in the public receiving unacceptably large volumes of survey solicitations and that this would be bad for the industry, why not just say so?  I think that would be easier to defend than the present statement.


Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 6
Subject: RE: CASRO Standards for Using E-mail addresses

Are you suggesting that it is OK to send unsolicited e-mail to people with whom you have no prior relationship? You seem to be missing the point that the Internet community is different than mail and telephone and that spamming is a serious issue that negatively impacts almost everyone who uses e-mail.

The term apparently originated from the famous Monty Python Spam sketch, wherein the Vikings, who were sitting in a restaurant whose menu only included dishes made with spam, would sing "Spam, Spam, Spam..." over and over, rising in volume until it was impossible for the other characters in the sketch to converse.

This is the effect that spam has on e-mail systems and users, especially when the number of junk e-mails exceeds the number of legitimate e-mails.

You wrote: "Why is what we routinely do in one realm (telephone RDD) acceptable but its equivalent on the internet wrong?" The phone allows you to make one call at a time and you get charged for every one.  With spamming you can send 1000s of e-mails at once with no cost to you, but with a cost to the recipient.

Clearly we do not want to go down that road.



Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 7
Subject: RE: CASRO Standards for Using E-mail addresses

Just for the sake of discussion, let me throw out this information.

1) There is an Internet technology known as geolocation which can determine the geographic location of the person who connects to a website by reading the access identifiers of the ISP server used.  At present, this has some degree of accuracy down to the city/county level.  To apply it to research of course we have to assume that most people use ISP servers located in their own geographic location of residence.  While mostly true for home access, it is not necessarily the case for office users.

2) There is also an initiative being explored by the USPS to assign all people who have postal addresses USPS e-mail addresses to use for various purposes.  While these addresses are certain not to be the only ones used by everyone to receive their e-mail, they may make possible mixed mode research amongst the general population much like that of last year's U.S. Census short form test.

Neither of these is yet a complete solution to the RDD dilemmas being discussed here and frankly I don't know much more about them than that which I have included.  However, each raises some interesting thoughts related to the subject of Norman's e-mail and might be of interest to this list.



From: Scholar 8
To: AAPORnet [aapor-net@groups.aapor.org]
Subject: Re: CASRO Standards for Using E-mail addresses

The CASRO Internet committee has indeed struggled with this issue for some time.  At the outset, the committee was pushed very hard to adopt a standard that would virtually eliminate all Internet surveys except for opt-in panels.  Fortunately, the standards were made more open to allow others to participate.

As a commercial provider of Internet survey services who does not manage a panel (opt-in or otherwise), our clients often come to us with their own e-mail sample that we have no control over.  We try our best to screen out bad lists, but it is not a cut and dried situation.  Other times we are required to negotiate with list suppliers for the sample our clients request.  The "nth" technology method of picking every nth person to hit a web page is another commonly used source of sample.

Another issue that needs to be reviewed is the fact that AOL recently signed an exclusive arrangement with another Internet S/W supplier to handle their OpinionPlace surveys.  It should always be a concern when the huge companies use their size to establish exclusive arrangements that restrict trade for their smaller competition.

Finally, I believe that standards for commercial marketing research may be very different from standards for public opinion research.  CASRO's standards may very well be a good reference point, but those working in the public sector must never lose sight of the fact that good research is heavily dependent on good sample, and not necessarily commercially correct sample.

Richard Rands
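The "nth" method mentioned in the message above is a form of systematic sampling. A minimal sketch, assuming the site can see an ordered stream of visits; the visitor labels and the function name are hypothetical.

```python
import random

def every_nth(visitors, n, start=None):
    """Systematic sample: pick a random start among the first n visits,
    then take every nth visit after it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if start is None:
        start = random.randrange(n)  # random start avoids always favoring the same position
    return visitors[start::n]

# Twenty hypothetical visits, sampling every 5th with a fixed start for illustration
visitors = [f"visit-{i}" for i in range(1, 21)]
invited = every_nth(visitors, n=5, start=2)
```

Note that this samples visits rather than people, so frequent visitors have a higher chance of being invited than occasional ones.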



Return-Path: AAPORnet [aapor-net@groups.aapor.org]
From: Scholar 2
Subject: Re: CASRO Standards for Using E-mail addresses

I was not advocating spam -- either dictionary spam, or legitimate survey inquiries that, apparently, some feel are the equivalent of spam.

My intended contribution was to note that computer programming is probably capable of generating the equivalent of RDD samples for e-mail, something which I think is interesting.

The other point was that the CASRO statement does not hold up to careful reading.  After having looked at the article in USA Today, I think that the issues of dictionary spamming technology and prior relationship are hopelessly conflated.  Making a survey request of someone with whom one does not have a "substantive prior relationship" does not mean that you are marshalling cyber technology to bombard thousands of people.

CASRO and others have been effective in making legislators aware of the differences between telemarketing and telephone survey research and in keeping that channel open for research.  Maybe it was felt that a second struggle in that area would not be successful, so just drop back to the opt-in/pseudo-panel approach.  I think we are opening the door to some real problems in data quality with all but the very best managed of these.



From: Scholar 1
To: AAPORnet [aapor-net@groups.aapor.org]
Subject: RE: CASRO Standards for Using E-mail addresses

Geolocation is not terribly accurate on the micro level:

"At the country level, most geolocation services guarantee 99 percent accuracy or better. Figuring out which city someone is connecting from gets fuzzier. Akamai says it can accurately identify a North American user's city at least 85 percent of the time, while NetGeo promises an 80 percent success rate for cities worldwide. "
 
 
Susan Carol Losh