Audience Dialogue

Internet surveys: new ways to go wrong

Case study 13

After years of experience with telephone surveys, face-to-face surveys, mail surveys, and every other sort of survey we could think of, we thought we'd seen it all. Then the internet came along, and our first few internet surveys revealed new problems we'd never encountered before.

1. Blatant lying

When an interviewer is present, a respondent can seldom get away with lying: any experienced interviewer knows straight away when a respondent is embroidering the truth, so it hardly ever happens. Respondents do tend to give answers that make them look good - more so in phone surveys and self-completed questionnaires - but bare-faced lying is both unusual and very obvious. Such questionnaires don't get used, but they amount to far less than 1% of responses.

Our first internet survey was for viewers of a current-affairs TV program aimed at high-school students and their teachers. The questions asked where the respondent lived, and for their sex, age group, and occupation.

One of the cases seemed odd: the respondent lived in the USA, though the survey was for an Australian program. Had this person just moved to America, we wondered? She was a university teacher, aged 65-plus. That in itself was unusual. And in the space for comments about the program, she had written some very bad language.

From curiosity, we checked the site log. We got the IP number of this respondent and traced it to a computer pool at Monash University, Australia. Some student had found the online questionnaire and was having us on. The solution was simple: we deleted that response. But then we wondered how many more responses could be like that. If they hadn't included such blatant contradictions, we'd never have found them. Luckily, our file had collected enough data to be able to track a response back to its source. We decided that in future all responses would be run through our typicality checker, and the most atypical ones would be closely scrutinized.
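Such a typicality check needn't be elaborate. The sketch below (in Python, with hypothetical field names - an illustration of the idea, not our actual checker) simply scores each response by how rare its answers are within the sample, so the oddest cases rise to the top for manual scrutiny:

```python
from collections import Counter

def atypicality_scores(responses, fields):
    """Score each response by how rare its answers are across the sample.

    A response whose answers are all common scores near 0; one full of
    rare values scores high and deserves a closer look.
    """
    freq = {f: Counter(r[f] for r in responses) for f in fields}
    n = len(responses)
    scored = []
    for r in responses:
        # Each answer adds more "surprise" the rarer it is in the sample.
        score = sum(1 - freq[f][r[f]] / n for f in fields)
        scored.append((score, r))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored

# Toy data echoing the case above (field names and categories invented):
responses = [
    {"country": "Australia", "age": "13-17", "occupation": "student"},
    {"country": "Australia", "age": "13-17", "occupation": "student"},
    {"country": "Australia", "age": "25-44", "occupation": "teacher"},
    {"country": "USA", "age": "65+", "occupation": "university teacher"},
]
ranked = atypicality_scores(responses, ["country", "age", "occupation"])
# The USA / 65+ case ranks first as the most atypical.
```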

2. Multiple responses

With printed self-completion questionnaires, once in a while you find that somebody has photocopied a questionnaire and returned the copy as well as the original. It seems that some recipients of questionnaires decide that a friend or relative should fill one in too. If the questionnaires are individually numbered and well printed, a photocopy is obvious, if only because two questionnaires come back with the same serial number.

With the internet, as with a mail survey, you never really know who has filled in a questionnaire. What never happens with mail surveys, though, is for a respondent to forget that he or she has returned a questionnaire, and to send an extra copy, just in case. With the internet, this is common. The respondent clicks on the Submit button, and nothing seems to happen. So he or she clicks again, and now two copies have been sent. Even when we have used a supposedly instant acknowledgement, this has happened. Sometimes the multiple responses arrive a few seconds apart (as shown in the log file), and sometimes the second - or third, or fourth - copy arrives days later, from a different IP number. (This can be a different modem at the same ISP.)

In some cases, when two questionnaires came back from the same respondent, the first one was half-completed and the second one fully completed. This happens when the respondent hits the Submit button by mistake, before answering all the questions. In this case, the second response is the "best" one. In other cases, you may not want respondents to change their minds about some answers - e.g. after speaking to other people.

How can you know whether two similar (or identical) responses are from the same respondent? The best way is to ensure that at least one field has a different answer for every respondent. You can pre-screen respondents, and issue each one with a different password. Or you can have each response come back as a separate, numbered file. A second response can then overwrite the first, or be refused.
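The last approach can be sketched very simply: one record per respondent, keyed by a pre-issued password or serial number, with a repeat submission either replacing the earlier one or being refused. A minimal illustration in Python (names hypothetical):

```python
def accept_response(store, respondent_id, answers, overwrite=True):
    """File a submission under a per-respondent key (e.g. a pre-issued
    password). A repeat submission either replaces the earlier one
    (overwrite=True, which keeps the later, usually fuller answers)
    or is refused outright (overwrite=False)."""
    if respondent_id in store and not overwrite:
        return False  # duplicate refused; first answers stand
    store[respondent_id] = answers
    return True

store = {}
accept_response(store, "R001", {"q1": "yes"})               # accidental early Submit
accept_response(store, "R001", {"q1": "yes", "q2": "no"})   # impatient second click
# store still holds one record for R001 - the later, fuller submission.
```

Whether to overwrite or refuse depends on the survey: overwrite when early submissions are likely accidental, refuse when you don't want respondents revising answers after talking to others.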

3. Over-strict validation

This drives respondents crazy. For example, recently I was completing an online questionnaire. One question was "Which state do you live in?" The answers were offered in a drop-down box: all 50 states of the U.S.

So what were you meant to answer if you didn't live in the U.S.? This is the World Wide Web, after all - surely even Americans aren't that insular.

As I wanted to see the rest of the questionnaire, I clicked on some U.S. state - I didn't even look at which one. If the survey organizers got some wrong answers, they would probably never know. (It's just possible they could have told from the IP number that I wasn't in the U.S., but there'd have been no reason for them to check my response - the rest of it would have looked fairly normal.)

The lessons from this: always include an "other" category - even when you think it's unnecessary. Also, give plenty of opportunities for open-ended comments.

4. Losing data

When you have a paper questionnaire, a computer problem isn't the end of the world. As long as you know there's a problem with the data entry, you always have the paper questionnaire, and you can re-enter that.

But with internet surveys, everything is electronic, and horrifying losses are possible. Of course, computer files should always be backed up, but with some systems backup is not at all easy. One job required the use of Microsoft FrontPage. This makes producing a web site easy, with a working version on a local computer, which is then "published" to the web site. However, when questionnaire responses come back, they are on the web site itself. If you change something on the site, then publish all the changes, it's very easy to overwrite the data file on the web site with its blank equivalent on the local computer. And you can't publish back in the other direction, nor can you access the data file from the Web. The only way to back up the file seems to be to copy it in FrontPage, then paste it back to the local computer. Not knowing this, I came within a hair's breadth of losing a complete data file by overwriting it with a blank one.
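Whatever tool moves the files about, the defensive habit is the same: take a dated copy of the live data file before any publish step can overwrite it. A minimal sketch in Python (file names are only illustrative):

```python
import os
import shutil
import tempfile
import time

def backup_before_publish(data_file, backup_dir):
    """Copy the live response file to a time-stamped name before anything
    can overwrite it. Returns the path of the backup copy."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, stamp + "-" + os.path.basename(data_file))
    shutil.copy2(data_file, dest)  # copy2 preserves the file's timestamps
    return dest

# Demonstration with a throwaway file:
work = tempfile.mkdtemp()
live = os.path.join(work, "responses.csv")
with open(live, "w") as f:
    f.write("id,q1\nR001,yes\n")
saved = backup_before_publish(live, os.path.join(work, "backups"))
# 'saved' now holds an identical, dated copy of the data file.
```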

Another strange problem with FrontPage is that (as far as I can figure out what happens) if the first case doesn't include all the fields - e.g. because not all respondents need to answer the last set of questions - those last fields will never be saved, for any case. So you need to put in a dummy case first, taking care to answer all questions. When you analyse the data, this case needs to be dropped.

It's a pity that FrontPage has these problems (not to mention numerous others), because this software greatly simplifies the otherwise messy process of getting questionnaire responses back. Plain old FTP, which works in both directions and creates no obstacles to backing up, has turned out to be a lot safer.

- Dennis List