Glossary of data management

This glossary covers the terms used in collecting research data (mainly through surveys) and managing it after collection.

Backchecking
Verifying or validating an interview by recontacting the respondent and asking some of the same questions again. This term is used mainly in Britain.

Callback
When a desired survey respondent is not interviewed on the first attempt, a callback is made: i.e. a second or later attempt at an interview.

Case
One "case" in a survey is usually one questionnaire, or one respondent. In computer terms, this is often one "record".

CATI
Computer-Assisted Telephone Interviewing: doing surveys - usually by phone - directly from a computer screen, with no printed questionnaire. There's also CAPI (computer-aided personal interviewing), and an emerging generic term CAI.

Central location
A type of research method where respondents are all interviewed at one venue - as opposed to interviewers going out and interviewing respondents in many different places. See intercept, hall testing.

Coding
Entering the answers to survey questions into a computer in abbreviated form. For example, M for male, F for female. Coding in qualitative research uses the same principle, on a larger scale.

Door to door
A door to door survey is one in which respondents are interviewed in their homes. A door-to-door survey is always face to face, but not vice versa.

Face to face
A type of survey where respondents are interviewed in person - not on the phone, by mail, etc. Sometimes abbreviated as F2F. See also door to door.

Data mining
A set of computer-based techniques for extracting useful meanings from huge databases.

Extrapolation
Used in forecasting, this means calculating the next number in a series by applying a mathematical formula. For example, if last year a magazine sold 1,000 copies, and this year 2,000 copies were sold, what will the sales be next year? A linear extrapolation would be 3,000 (adding 1,000 each year), while a geometric extrapolation would be 4,000 (doubling each year).
Projection is a more general term, including extrapolation.
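
The magazine example can be worked through in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the series has just two known values.

```python
def linear_extrapolation(previous, current):
    """Next value if the series keeps adding the same amount each step."""
    return current + (current - previous)


def geometric_extrapolation(previous, current):
    """Next value if the series keeps multiplying by the same ratio each step."""
    return current * (current / previous)


# Magazine sales: 1,000 copies last year, 2,000 this year.
print(linear_extrapolation(1000, 2000))     # 3000
print(geometric_extrapolation(1000, 2000))  # 4000.0
```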

Field
In a computer file, a field is a set of digits read as one number. For example, the digits 310599 could be divided into fields thus: 3,1,0,5,9,9 or thus: 31, 05, 99.
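
As a rough illustration, the two ways of dividing 310599 can be reproduced by slicing the string according to a list of field widths. The helper name split_fields is hypothetical, and fields are kept as text so that a leading zero (as in 05) is not lost.

```python
def split_fields(digits, widths):
    """Cut a string of digits into consecutive fields of the given widths."""
    fields = []
    start = 0
    for width in widths:
        fields.append(digits[start:start + width])
        start += width
    return fields


print(split_fields("310599", [1, 1, 1, 1, 1, 1]))  # ['3', '1', '0', '5', '9', '9']
print(split_fields("310599", [2, 2, 2]))           # ['31', '05', '99']
```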

Field (the)
Where the public are. An interviewer "in the field" is out interviewing people, which is known as Fieldwork. Don't confuse this field with the other type of field used in computing.

Filter
To include in an analysis only a certain category of respondents. Notice that this is the opposite in meaning to "filter out"; thus a table headed "filtered to age 35-plus" means that only respondents aged 35 and over are included.
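
A minimal sketch of "filtered to age 35-plus", assuming each case is stored as a dictionary with an age entry. The sample cases and the function name filter_to are invented for illustration.

```python
respondents = [
    {"id": 1, "age": 28},
    {"id": 2, "age": 35},
    {"id": 3, "age": 61},
]


def filter_to(cases, keep):
    """Keep only the cases for which the condition is true."""
    return [case for case in cases if keep(case)]


aged_35_plus = filter_to(respondents, lambda case: case["age"] >= 35)
print([case["id"] for case in aged_35_plus])  # [2, 3]
```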

Fusion
A method of combining results from two surveys, by matching each respondent in one survey with a similar person in the other, then treating the combined set of answers as if they were all given by that person. For example, a TV audience survey might be fused with a product-use survey to help decide which channels the product's users watch.
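
The matching step might look something like the sketch below. It assumes "similar" means the same sex and the closest age, which is a deliberate simplification; real matching would use more variables. All data and names here are hypothetical.

```python
tv_survey = [
    {"sex": "F", "age": 30, "channel": "One"},
    {"sex": "M", "age": 52, "channel": "Two"},
]
product_survey = [
    {"sex": "M", "age": 50, "uses_product": True},
    {"sex": "F", "age": 33, "uses_product": False},
    {"sex": "F", "age": 70, "uses_product": True},
]


def fuse(recipients, donors):
    """Attach to each recipient the answers of the most similar donor."""
    fused = []
    for person in recipients:
        same_sex = [d for d in donors if d["sex"] == person["sex"]]
        if not same_sex:
            continue  # no comparable donor, so this person is left out
        donor = min(same_sex, key=lambda d: abs(d["age"] - person["age"]))
        # The recipient's own sex and age are kept; the donor adds the rest.
        fused.append({**donor, **person})
    return fused


for row in fuse(tv_survey, product_survey):
    print(row["channel"], row["uses_product"])
```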

Hall test
Getting a group of people together, e.g. in a public hall, usually to see a product demonstration and to fill in questionnaires on the spot. A type of central location study. Often called a theatre test, especially in North America.

Imputation
Making an assumption about a missing value. E.g. if a survey respondent hasn't supplied his or her sex, and the person's occupation is Housewife, it's a reasonable imputation that the sex is female.

Intercept interviewing
A type of research where respondents are intercepted by interviewers in a public place (often a shopping area) and asked to take part in a survey.

Interview log
Record of attempts by an interviewer to contact and survey a respondent. Different researchers refer to this by many different names (e.g. Contact History Record, Call Sheet), but Interview Log seems clearest, and is commonly used.

Missing value
In a computer file for a survey, unanswered questions and those with answers like "Don't know" are declared to be missing values, and are usually excluded when calculating percentages etc. Sometimes, missing values can be guesstimated using imputation.
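
A small sketch of how percentages change once missing values are excluded. It assumes unanswered questions are stored as None and that "Don't know" is the only other answer treated as missing; the sample answers are invented.

```python
from collections import Counter

MISSING = {None, "Don't know"}


def percentages(answers, missing=MISSING):
    """Percentage giving each answer, based only on non-missing answers."""
    valid = [a for a in answers if a not in missing]
    counts = Counter(valid)
    return {answer: 100 * n / len(valid) for answer, n in counts.items()}


answers = ["Yes", "No", "Don't know", None, "Yes", "Yes"]
print(percentages(answers))  # {'Yes': 75.0, 'No': 25.0}
```

Six people were asked, but the percentages are based on the four who gave a usable answer.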

Non-response bias
If you try to survey 100 people, and 40 of them don't respond, those 40 could be different in some important way from the 60 who did respond. That's non-response bias - a problem often ignored in survey research. Non-response bias can be estimated by comparing data on the current sample with other data (e.g. from a Census) on the same population.

Personal interview
The type of interview where an interviewer questions a respondent face-to-face. Cf. telephone interviews and self-completion questionnaires.

Primary data
Data from primary research - that is, research on specific individuals, from a survey or other database. Much the same as raw data. Contrasts with secondary data.

Probing
A technique used by interviewers to get more information from respondents on particular questions. E.g. "Can you tell me more about that?" or "Is there any other reason you feel that way?"

Projection
A mathematical process similar to extrapolation: multiplying the number of respondents who gave a particular answer in a survey by a raising factor (as explained below) to estimate the corresponding number in the population. Psychological projection has a completely different meaning.

Raising factor
Most easily explained with an example. If a population has 1000 people and you interview 100 of them, and 20 of them (20%) answer Yes to a question, you can then assume (as long as the sample was random) that 20% of the population would also answer Yes. 20% of 1000 is 200. To project the 20 in your sample to 200 in the population, you need to multiply the sample number by 10. That figure of 10 is the raising factor. It's also the reciprocal of the sampling fraction.
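
The same arithmetic as a short sketch, assuming a simple random sample. The function names are illustrative only.

```python
def raising_factor(population_size, sample_size):
    """Reciprocal of the sampling fraction (sample_size / population_size)."""
    return population_size / sample_size


def project(sample_count, population_size, sample_size):
    """Estimate a population count from a count observed in the sample."""
    return sample_count * raising_factor(population_size, sample_size)


print(raising_factor(1000, 100))  # 10.0
print(project(20, 1000, 100))     # 200.0
```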

Raw data
Specific answers to survey questions from individuals. Also called primary data or (by Census bureaus) unit record data. The opposite is secondary data.

Recoding
Grouping survey answers together in a computer file. E.g. in a question on radio stations listened to, the commonest stations might be listed separately, and all stations rarely listened to could be recoded as "other".
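
One way the radio station example could be done, sketched under the assumption that "rarely listened to" means mentioned fewer than a chosen number of times. The station names and the min_count threshold are made up.

```python
from collections import Counter


def recode_rare(answers, min_count, other_label="other"):
    """Replace any answer given fewer than min_count times with other_label."""
    counts = Counter(answers)
    return [a if counts[a] >= min_count else other_label for a in answers]


stations = ["Radio A", "Radio A", "Radio B", "Radio C", "Radio B", "Radio D"]
print(recode_rare(stations, min_count=2))
# ['Radio A', 'Radio A', 'Radio B', 'other', 'Radio B', 'other']
```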

Record
In a computer data file, one record is equivalent to one questionnaire, or one interview. In a word-processing file, a record is one line or paragraph on the screen.

Recruiter
An agency which recruits people, e.g. for focus group discussions. See screener.

Respondent
Person who responds to questions in a survey; a person interviewed. If an experiment were being done instead of a survey, the person would be called a Subject. In a group discussion, they'd be called a Participant.

RFM
Stands for "Recency, Frequency, Monetary value." A term used in database marketing: an often-used criterion for deciding which people on the database should be targeted for an offer. Usually it's those who have bought something most Recently, or most Frequently, or who have spent the most Money. RFM formulas are also used to work out who should be dropped from a database, and when.
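
The three figures can be computed from a list of purchases roughly as follows. This sketch only produces the raw R, F and M values; how they are combined into a targeting rule varies and is not shown. The customers, dates and amounts are invented.

```python
from datetime import date


def rfm_summary(purchases, today):
    """Days since last purchase, number of purchases, and total spent, per customer."""
    summary = {}
    for customer, day, amount in purchases:
        entry = summary.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (today - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }


purchases = [
    ("ann", date(2024, 1, 5), 20.0),
    ("bob", date(2023, 6, 1), 200.0),
    ("ann", date(2024, 3, 1), 35.0),
]
scores = rfm_summary(purchases, today=date(2024, 4, 1))
print(scores["ann"])  # {'recency_days': 31, 'frequency': 2, 'monetary': 55.0}
```

Here ann has bought more recently and more often, while bob has spent more money.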

Screener
A screening questionnaire, as used by recruiters to determine who is eligible to attend a group discussion. There are also screener questions, asked early in a questionnaire to weed out those not eligible to answer the remaining questions.

Secondary data
Data derived from secondary research: that is, from the results of a survey, not using the raw data, and presented in summary form. Contrasts with primary data.

Survey
A whole exercise of measuring public opinion. Don't confuse a survey with a questionnaire: some people say "The interviewer did 50 surveys" when they mean 50 interviews, for one survey. As a verb, "to survey" is used much more loosely, and often means the same as "to interview."

Theatre test
Same as hall test. This term is used mainly in North America, where it is spelled theater test. Also called an auditorium test.

Triage
A military concept applied to marketing. Divide the population into three groups: those who will never use the product or service in question, those who will always use it, and the rest. Focus on the rest, because the behaviour of the other two groups can't be changed.

Value
A computer term for an answer to a question. On a questionnaire, a question has an answer; but in a computer record, a variable has a value.

Validation
The same as verification, in the context of checking interview data. Also referred to as backchecking.

Variable
Computer talk for a question which can have one answer. A question which allows multiple answers will have one variable for each possible answer.
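
For a multiple-answer question, the "one variable for each possible answer" layout might be built like this. The sketch assumes each variable holds 1 if the answer was chosen and 0 if not; the answer options are invented.

```python
def to_variables(options, chosen):
    """One 0/1 variable per possible answer to a multiple-answer question."""
    return {option: int(option in chosen) for option in options}


options = ["Radio", "TV", "Newspaper"]
print(to_variables(options, {"TV", "Radio"}))
# {'Radio': 1, 'TV': 1, 'Newspaper': 0}
```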

Verification
A quality control method used in fieldwork: reinterviewing a percentage of respondents (usually 10%) to check that the original interview had in fact taken place, and had recorded their answers accurately. Also known as validation or (especially in Britain) as backchecking.

Weighting
Giving some questionnaires more "votes" than others when tabulating survey results. Usually happens when one type of respondent is over-represented in the sample, but not in the population. For example, if in a survey the response rate among men was half that among women, the men's questionnaires could be given a weight of 2 (i.e. counted twice) to produce a balanced result. See also Propensity score.
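
A sketch of the men-counted-twice example. The six sample cases are invented: two men and four women, as might happen if men responded at half the rate of women.

```python
sample = [
    ("M", "Yes"), ("M", "No"),
    ("F", "Yes"), ("F", "Yes"), ("F", "Yes"), ("F", "No"),
]
weights = {"M": 2, "F": 1}  # each man's questionnaire counts twice


def weighted_percentage(cases, weights, answer):
    """Percentage giving an answer, with each case counted by its weight."""
    total = sum(weights[sex] for sex, _ in cases)
    matching = sum(weights[sex] for sex, given in cases if given == answer)
    return 100 * matching / total


print(weighted_percentage(sample, weights, "Yes"))             # 62.5
print(weighted_percentage(sample, {"M": 1, "F": 1}, "Yes"))    # unweighted, about 66.7
```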