This glossary contains terms used when planning and designing samples, for surveys and other quantitative research methods.

**Abduction**

A useful but little-known concept first used by the philosopher Peirce around 1900. Similar to induction, it has been described as "testing a theory by fitting it over a framework of facts." If the facts fit the theory, this is *some* evidence that the theory may be valid.

**Adaptive sampling**

A method used for sampling rare populations. When you find something that you're interested in, you start to look more closely at nearby areas. For example, if you're studying a rare disease and find somebody who has it, you sample their neighbours as well. This is often used in quality control - e.g. when checking every 100th item that comes off the production line, if one day's production has more errors than usual, a higher proportion of those products will be checked. Adaptive sampling is more efficient than random sampling, but calculating population estimates becomes more complex.

**Analysis**

Understanding something by dividing it into smaller parts and studying each part separately. The opposite of analysis is synthesis. Survey analysis involves considering the responses to each survey question and to pairs of questions.

**Axiom**

A statement that is so obvious that it's a starting point for any research. For example, the laws of arithmetic are axioms. To say that something is **axiomatic** is to imply that it must be true. But beware! Sometimes axioms are revealed to be assumptions that are not always true.

**Birthday rule**

When a researcher contacts a household and says "I'd like to interview the person who last had a birthday" this is not an advertising gimmick, but a way of ensuring randomness - it assumes that people's birthdays are spread evenly across the year.
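
The rule can be sketched in a few lines of Python. This is only an illustration: the household list, the function names, and the handling of 29 February (treated as 28 February in non-leap years) are assumptions, not part of any standard procedure.

```python
from datetime import date

def days_since_birthday(month, day, today):
    """Days elapsed since the most recent occurrence of (month, day)."""
    def occurrence(year):
        try:
            return date(year, month, day)
        except ValueError:
            # 29 Feb in a non-leap year: assume 28 Feb instead.
            return date(year, 2, 28)

    this_year = occurrence(today.year)
    last = this_year if this_year <= today else occurrence(today.year - 1)
    return (today - last).days

def last_birthday_person(members, today):
    """Return the name of the member whose birthday was most recent."""
    return min(members, key=lambda m: days_since_birthday(m[1], m[2], today))[0]

# Hypothetical household: (name, birth month, birth day)
household = [("Ana", 3, 14), ("Ben", 11, 2), ("Cy", 7, 30)]
print(last_birthday_person(household, date(2024, 8, 15)))  # Cy
```

Because the interview date is effectively arbitrary, which household member is chosen does not depend on who answers the door.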

**Census**

Survey of a whole population. Most countries have a Population Census (with a capital C) every 5 or 10 years, but a researched population can be much smaller. Thus a census (with a small c) of all staff of an organization would be a survey where everybody was sampled.

**Children**

In market research, children are defined internationally by ESOMAR (www.esomar.org) as people aged under 14. Their parents' permission is needed to interview them. Some countries have specific laws on this, and have higher age limits - up to 18.

**Cluster**

When you are surveying people in their homes, it's very expensive to travel all around a city, interviewing at one home in each suburb. For efficiency, most door-to-door surveys use cluster sampling, where a starting address is chosen at random, and interviewing is done at a number of nearby homes - often about 10 of them.

**Convenience sample**

Using a sample of people who happen to be handy or easy to survey. May be OK in preliminary research, but not guaranteed to be representative of the population. Cf. random and purposive samples.

**Deduction**

Deduction is what you do when you know the principles of something, and deduce a particular case. For example, if you know the principles of arithmetic, you can deduce that 23 + 161 = 184, even if you have never seen this example before. Induction is the opposite process. See also abduction.

**Dependent variable**

A statistical term for whatever measure you are trying to predict. See independent variable and regression.

**Dwelling**

The premises where a household lives. May be a house, flat, caravan, boat, etc. It is a place, not a group of people.

**Effective sample size**

If you interview two people at the same household, and ask them a question they give the same answer to - such as "how many TV sets are in your home?" - the effective sample size is not 2 people but 1. People in the same household tend to give identical answers, so in order not to waste interviews, it's often best to interview only one person per household. Cf. sample size.
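
The household example is the extreme case of a more general idea. One common approximation (the "design effect" associated with Kish, which this glossary does not itself define) divides the sample size by 1 + (m - 1) x rho, where m is the number of interviews per household and rho measures how alike the answers within a household are. The function name and figures below are illustrative assumptions.

```python
def effective_sample_size(n, per_household, rho):
    """Approximate effective sample size for interviews clustered in households.

    n             -- total number of interviews
    per_household -- interviews conducted in each household (m)
    rho           -- similarity of answers within a household, from
                     0 (unrelated answers) to 1 (identical answers)
    """
    design_effect = 1 + (per_household - 1) * rho
    return n / design_effect

# Two people in one household giving identical answers count as one.
print(effective_sample_size(2, 2, 1.0))  # 1.0

# 1000 interviews, two per household, moderately similar answers.
print(round(effective_sample_size(1000, 2, 0.5)))
```

When only one person is interviewed per household, m is 1 and the effective sample size equals the actual sample size, whatever the value of rho.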

**Empirical**

Based on actual data. You might believe that 50% of the population is male and 50% female, but empirical data for nearly all countries shows that the balance is closer to 49% men and 51% women. The opposite of empirical is theoretical.

**Error types**

Because a survey doesn't include all members of a population, or include all possible questions, various types of error can occur...

**Type I error** (error of the first kind): Finding a result statistically significant when in fact it is not, i.e. when a survey wrongly finds something that doesn't exist in the population.

**Type II error**: When a survey finds that a result is not significant, though in fact it is.

**Type III error**: Getting the right answer to the wrong problem. (Not so easily quantified.)

Another way of looking at error types depends on the source of the error...

**Coverage error** - when the population sampled wasn't quite the population wanted. (Maybe a survey only included people who lived in the area, not visitors.)

**Non-response error** - error that arises because some of the people chosen for the sample did not participate (e.g. if children avoided having their height measured, the average height measured in the survey would be too high).

**Measurement error** - when responses weren't measured accurately (e.g. it's not easy to measure people's height when they're moving).

**Sampling error** - because the survey used one sample of respondents rather than another. As long as the sample was chosen at random, the amount of sampling error is predictable - e.g. "there's a 95% chance that the average height of all people is within 3cm of the average of our sample".
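
A statement like the "within 3cm" example can be produced from the sample itself. The sketch below assumes a simple random sample and uses the usual normal approximation (1.96 standard errors for 95%); the height figures are made up for illustration, and with a sample this small the approximation is only rough.

```python
import math
import statistics

def margin_of_error_95(values):
    """Half-width of an approximate 95% confidence interval for the mean."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return 1.96 * standard_error  # 1.96 = normal critical value for 95%

# Made-up heights in cm, for illustration only.
heights_cm = [158, 162, 165, 169, 170, 171, 174, 176, 180, 185]
mean = statistics.mean(heights_cm)
moe = margin_of_error_95(heights_cm)
print(f"mean {mean:.1f} cm, 95% margin of error +/- {moe:.1f} cm")
```

Because the standard error divides by the square root of the sample size, halving the margin of error takes about four times as many respondents.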

**Generalization**

When you generalize from a particular case to a broad conclusion, you are making a generalization. For example, "All my friends agree with me on this question, so everybody else must agree with me too." On a more professional scale, when a survey takes a sample from a population, the results are based on the sample, but are generalized to the whole population that the sample was taken from. See sample and population.

**Household**

A group of people who live in the same dwelling, and usually eat together. A household is usually the same as a family, but sometimes several families share one household.

**Hypothesis**

A statement or proposition capable of being tested. It must be stated in enough detail that its truth can be confirmed, e.g. by a survey. For example, "TV news is more interesting than comedies" is not a hypothesis, but "Most Australians think that TV news is more interesting than comedies" *is* a hypothesis. A set of related hypotheses can be built into a theory.

**Independent variable**

One of a set of measures which is used to predict a dependent variable. See regression.

**Induction**

Inferring a general principle from a number of examples. The counterpart of deduction. See also abduction.

**Market**

An area of interest for a commercial organization, usually corresponding with the area where a survey is done. See population. A market can also be restricted to a type of product or service, e.g. the musical-instrument market in Vietnam.

**Maximum Variation sampling**

Also called **maximum diversity sampling**. A type of purposive sampling in which respondents are chosen to be as different as possible from one another. When sample sizes are small (fewer than about 30), maximum variation samples can be more representative than random samples.

**Mesh block**

The smallest census unit, known as a Collector's District in some countries.

**Mirror sample**

When you suspect that respondents may not tell the truth about their behaviour, you can use a mirror sample. Instead of asking about "you", ask each respondent to think of a close friend of the same sex and age group, and ask them to answer for that friend - without naming the friend. The theory is that, because they don't know all the details about their friends, they'll really describe their own behaviour. Mirror samples have been used in KAP surveys to ask about sexual behaviour in studies of AIDS. Similar to the third-person technique.

**n**

Shorthand for sample size, or number of respondents, as in n=500. Technically it should be lower-case n, but upper case N is often used.

**Panel**

A group of respondents who agree to be surveyed a number of times - for example, each month, for a year - in order to detect trends in their behaviour or opinions. For regular surveys, this is also cheaper than finding new respondents each time. However, there's a risk of **panel conditioning** - when members' behaviour is affected by their being on the panel, thus making them less representative.

**Paradata**

Information from a survey, apart from respondents' answers to questions. Includes interview times and dates, respondents' addresses, names of interviewers, and time taken on the questionnaire. In internet surveys, paradata includes the time the response was submitted and the IP address of the respondent's computer.

**Population**

Everybody (or thing) of a defined type, which could possibly be surveyed. Often the number of adults in a defined geographical area or market. Also known as **universe**.

**Probabilistic sample**

A more general term for a random sample.

**Probability**

Chance, expressed as a percentage or decimal. E.g. a 50-50 chance is a probability of 50%, or 0.50. **Odds** of 3 to 1 against correspond to a probability of 25% (1 chance in 4).
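
The arithmetic linking odds and probability can be sketched as follows. The sketch assumes the odds are quoted "against" the event, as in betting, so 3 to 1 means three ways of losing for each way of winning; the function names are illustrative.

```python
from fractions import Fraction

def odds_against_to_probability(against, in_favour=1):
    """Odds of `against` to `in_favour` against an event, as a probability."""
    return Fraction(in_favour, against + in_favour)

def probability_to_odds_against(p):
    """Probability p (0 < p <= 1) as odds against, expressed 'x to 1'."""
    p = Fraction(p)
    return (1 - p) / p

print(float(odds_against_to_probability(3)))         # 0.25
print(float(odds_against_to_probability(1)))         # 0.5
print(probability_to_odds_against(Fraction(1, 4)))   # 3
```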

**Purposive sample**

A type of non-random sample in which respondents are specifically sought out. For example, an industrial research project may use a purposive sample of organizations which are the largest buyers of a product, or a survey of poverty may be done only in the poorest localities in the area surveyed. Contrasts with random and convenience samples. A special type of purposive sample is the maximum variation sample.

**Quota sampling**

An alternative to random sampling, often used in street surveys. For example, if each sex makes up 50% of the population, 50% of interviews must be with men and 50% with women. A random sample will get 50% of each, on average, but a quota sample will get 50% every time. Nevertheless, other things being equal, a random sample is more accurate than a quota sample.

**Random**

A type of sample, selected in a way that gives each member of the population an equal chance of being included in the sample.
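
When a complete list of the population is available, an equal-chance selection takes only a few lines. This is a minimal sketch using Python's standard `random` module; the staff list and the function name are hypothetical, and the seed is there only to make the example repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: a list of 200 staff members.
staff = [f"employee_{i:03d}" for i in range(1, 201)]
chosen = simple_random_sample(staff, 20, seed=42)
print(len(chosen), "people selected, e.g.", chosen[:3])
```

In practice the hard part is usually obtaining a complete and accurate list to sample from, not the random selection itself.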

**Random digit dialling**

Selecting numbers at random to produce telephone numbers to be dialled - usually beginning with a prefix with numbers known to exist, and adding some random digits on the end. Often abbreviated as **RDD**.
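
The prefix-plus-random-digits approach might look like this in outline. The prefix, the number of digits, and the function name are all made up for illustration; real RDD designs also screen out non-working and business numbers.

```python
import random

def random_digit_numbers(prefix, n_random_digits, count, seed=None):
    """Generate `count` distinct numbers: a known prefix plus random digits."""
    if count > 10 ** n_random_digits:
        raise ValueError("not enough distinct numbers under this prefix")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < count:
        suffix = "".join(rng.choice("0123456789") for _ in range(n_random_digits))
        numbers.add(prefix + suffix)
    return sorted(numbers)

# Hypothetical prefix known to be in use, followed by four random digits.
sample_numbers = random_digit_numbers("029876", 4, 50, seed=1)
print(sample_numbers[:3])
```

Unlike sampling from a telephone directory, this gives unlisted numbers the same chance of selection as listed ones.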

**Random walk**

In door-to-door surveys, a technique for gathering a random sample of households after starting at a particular point. E.g. turning left after leaving the first house, walking anti-clockwise around the block and trying to interview somebody at every fourth house. Notice that, though it's called a random walk, the selection of households follows a clear rule.

**RDD** = random digit dialling.

**Reliability**

A statistical term used in assessing an instrument, meaning consistency or predictability. E.g. a survey question has 100% reliability if the survey is repeated and each respondent gives the same answer both times. See validity.
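
The "same answer both times" idea can be expressed as a simple percentage of agreement between two rounds of the same survey. This is only one of several ways of measuring reliability, chosen here because it matches the example; the answers and the function name are invented.

```python
def retest_agreement(first_wave, second_wave):
    """Percentage of respondents giving the same answer in both waves."""
    if len(first_wave) != len(second_wave) or not first_wave:
        raise ValueError("both waves need the same, non-zero number of answers")
    same = sum(a == b for a, b in zip(first_wave, second_wave))
    return 100 * same / len(first_wave)

# Invented answers from five respondents, in the same order each time.
first = ["yes", "no", "yes", "yes", "no"]
second = ["yes", "no", "no", "yes", "no"]
print(retest_agreement(first, second))  # 80.0
```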

**Representativeness**

A sample is **representative** when it accurately reflects the population it is drawn from. When drawn at random, a very large sample can be assumed to be representative, but a small sample may be unrepresentative. To guard against unrepresentativeness, a sample can be stratified, or a maximum variation sample can be drawn.

**Response rate**

The number of questionnaires completed, as a percentage of the number of people who were approached in the survey. For example, if 100 people are asked to participate, and 70 questionnaires are completed, the response rate is 70%. Also known as **Return rate**. See also strike rate.

**Sample**

Part of a population: everybody (or everything) from whom (or which) data was gathered.

**Sample size**

The number of questionnaires completed in a survey. Usually equals the number of people interviewed. Often shown in computer printouts as **N**. See also effective sample size.

**Sampling error**

The inaccuracy that arises because you interviewed one sample of the population rather than another equivalent sample. If the whole population is interviewed (see census), there can be no sampling error.

**Sampling fraction**

The fraction of the population that is sampled for a survey. Often a very small figure, like 1 in 1000. The sampling fraction is the reciprocal of the raising factor.
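
The relationship between the two figures is easy to check numerically. In this sketch the population and sample sizes are invented, and the raising factor is used in the usual way, to scale a count in the sample up to an estimate for the population.

```python
from fractions import Fraction

def sampling_fraction(sample_size, population_size):
    """Share of the population that was sampled."""
    return Fraction(sample_size, population_size)

def raising_factor(sample_size, population_size):
    """Reciprocal of the sampling fraction."""
    return 1 / sampling_fraction(sample_size, population_size)

# Invented figures: 500 respondents from a population of 500,000.
fraction = sampling_fraction(500, 500_000)
factor = raising_factor(500, 500_000)
print(fraction, factor)  # 1/1000 1000

# If 120 respondents own a bicycle, the estimate for the population is:
print(120 * factor)  # 120000
```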

**Stratification**

A **stratified** sample is one divided into a number of smaller samples. For example, in a survey covering a city, a stratified sample would divide the city into a number of smaller areas or **strata** (of known population) and sample a specific number of households in each **stratum**.
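
One way to decide how many households to sample in each stratum is to make each stratum's share of the sample match its share of the population. The sketch below assumes this proportional approach, which is only one of several allocation methods; the area names and populations are invented.

```python
def proportional_allocation(strata_populations, total_sample):
    """Split total_sample across strata in proportion to their populations."""
    total_pop = sum(strata_populations.values())
    exact = {name: total_sample * pop / total_pop
             for name, pop in strata_populations.items()}
    # Round down first, then hand out any leftover interviews to the
    # strata with the largest fractional parts so the total is preserved.
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Invented strata: households in four areas of a city.
city = {"north": 12_000, "south": 8_000, "east": 5_000, "west": 15_000}
print(proportional_allocation(city, 400))
# {'north': 120, 'south': 80, 'east': 50, 'west': 150}
```

The strata populations must be known in advance, which is why the entry specifies strata "of known population".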

**Strike rate**

A measure used in telephone surveys, similar to response rate. Usually expressed as the number of interviews made as a proportion of people contacted, thus higher than response rate.
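
The difference between the two rates comes down to the denominator. In the sketch below, the 100 people approached and 70 completed questionnaires come from the response rate entry; the figure of 80 people actually contacted is an assumption added for illustration.

```python
def response_rate(completed, approached):
    """Completed questionnaires as a percentage of people approached."""
    return 100 * completed / approached

def strike_rate(completed, contacted):
    """Completed interviews as a percentage of people actually contacted."""
    return 100 * completed / contacted

approached, contacted, completed = 100, 80, 70
print(response_rate(completed, approached))  # 70.0
print(strike_rate(completed, contacted))     # 87.5
```

Since the people contacted are a subset of those approached, the strike rate can never be lower than the response rate.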

**Synthesis**

An understanding that comes from combining separate data into a whole. Almost the opposite of analysis.

**Theory**

A theory is usually a set of hypotheses, suggesting a form of causal connection between sets of variables. A well known example is Darwin's theory of evolution. When knowledge is described as **theoretical**, it's based on a theory, but when knowledge is based on actual data, it's known as empirical.

**Universe** = population.

**Validity**

The extent to which an instrument is measuring what it's supposed to be measuring. For example, counting growth rings is a valid measure of a tree's age. If no measure is fully valid, indicators can be used. See also reliability.