In most countries of the developed world - and many of the developing countries as well - regular TV audience measurement surveys take place. In the past these were done by having random samples of TV viewers fill in diaries to show which channels they had been watching, at which times. These days, the diaries have generally been replaced by meters attached to the TV sets of a panel of randomly selected households.
Audience measurement has become an essential part of broadcasting, especially for commercial channels, because advertisers want to know how many people might be reached by a commercial placed in a particular timeslot.
However, the mere fact that so many thousands of people are watching a program does not necessarily mean they like it. For those who first decide to watch TV, and only then choose a channel, the program they watch may be considered mediocre - but they had to watch it to discover that. Alternatively, programs on the other channels may be even less attractive.
Conversely, when two channels deliberately schedule attractive programs at the same time, a small audience (as measured in a diary or meter survey) may conceal the fact that both programs were very popular with their viewers.
In the 1980s the Australian Broadcasting Corporation, bearing in mind that the audience size and viewer popularity of a program could be quite different, decided to set up a regular TV Appreciation survey. The goal was to introduce a system that was simple to carry out, relatively cheap, and quick to produce results. My task was to design that system.
The system used simple questionnaires which were completed by the respondents themselves. We did a separate survey every week, with this schedule:
|Wednesday||Prepare questionnaire for following weekend, including programs up to Friday|
|Thursday||Have questionnaires printed|
|Friday||Distribute questionnaires to interviewers (6 interviewers worked each week).|
|Saturday||Interviewers distribute questionnaires to 2 clusters of 10 households: 1 questionnaire for each person aged 10 and over.|
|Sat-Sun||Respondents complete questionnaires.|
|Sunday||Interviewers call back to collect completed questionnaires, leaving reply paid envelope for questionnaires not completed.|
|Sun-Mon||Interviewers return questionnaires to office.|
|Monday||Questionnaire data entered into computer. Summary program run, and tables produced.|
|Tuesday||Results distributed to TV managers.|
Running the entire survey took about one day a week of my time (excluding time spent discussing the results and their implications with TV managers) and two to three days a week for a junior researcher. Though the weekly sample size was only about 300, we produced amalgamated monthly reports with samples of over 1000 - enough for reliable breakdowns by age group and similar categories.
To minimize sampling error, we used what I thought was a very elegant sample design. The survey area was divided into six sub-areas, of almost equal population. Within each sub-area, one of several hundred possible Collector's Districts (the smallest unit of reporting Census data in Australia, averaging about 200 households) was chosen at random. To compensate for any peculiarities of that Collector's District, we used Census data to find another Collector's District which best compensated for those peculiarities, averaging across 30 or so key measures. For example, if the Collector's District chosen at random had a high proportion of children, we'd select a matching one with an appropriately low proportion of children.
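The matching step described above can be illustrated with a minimal Python sketch. It is not the original procedure: the scoring rule (mean absolute standardised deviation of the pair's average from the sub-area mean), the function name `pick_compensating_district`, and the two example measures are all assumptions made for illustration. The real survey averaged across 30 or so Census measures.

```python
from statistics import mean


def pick_compensating_district(chosen, candidates, area_means, area_sds):
    """Return the id of the candidate that best offsets the chosen district.

    chosen:      dict of measure -> value for the randomly selected district
    candidates:  dict of district id -> dict of measure -> value
    area_means:  dict of measure -> sub-area mean
    area_sds:    dict of measure -> sub-area standard deviation
    """
    def pair_imbalance(candidate):
        # How far the average of the two districts sits from the sub-area
        # mean, in standard-deviation units, averaged over all measures.
        deviations = []
        for measure in area_means:
            pair_avg = (chosen[measure] + candidate[measure]) / 2
            deviations.append(abs(pair_avg - area_means[measure]) / area_sds[measure])
        return mean(deviations)

    return min(candidates, key=lambda d: pair_imbalance(candidates[d]))


# Hypothetical figures: the random district has many children, few over-65s.
area_means = {"pct_children": 20.0, "pct_over_65": 12.0}
area_sds = {"pct_children": 5.0, "pct_over_65": 4.0}
chosen = {"pct_children": 30.0, "pct_over_65": 8.0}
candidates = {
    "A": {"pct_children": 28.0, "pct_over_65": 9.0},
    "B": {"pct_children": 11.0, "pct_over_65": 15.0},
    "C": {"pct_children": 20.0, "pct_over_65": 12.0},
}

match = pick_compensating_district(chosen, candidates, area_means, area_sds)
print(match)  # B
```

Note that district C, which is perfectly average on its own, is not the best match here: the aim is for the pair to average out, so a district that leans the opposite way (B) scores better.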
The result of this method of sampling - one I invented, and have never seen reported anywhere else, but similar to "dumb-bell sampling" - was that our effective sample sizes were increased. This compensated for the effective sample size reduction caused by the fact that our interviews took place in clusters of nearby households. Because neighbouring households tend to be similar, you in effect gain less information from the second neighbour than from the first. With our clusters of 10 households, even though alternate households were skipped to reduce the clustering effect, our effective sample size was only about 70% of what it would have been had we chosen households strictly at random. The compensatory sampling method brought the effective sample size back very close to the actual sample size.
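To put rough numbers on the clustering penalty, the sketch below uses the standard Kish approximation, deff = 1 + (m - 1) × rho, where m is the cluster size and rho the intra-cluster correlation. The formula is not taken from the original survey documentation, and treating the household as the unit (m = 10) is a simplifying assumption. The function names are illustrative.

```python
def design_effect(cluster_size, intra_cluster_corr):
    # Kish approximation: variance inflation caused by cluster sampling.
    return 1 + (cluster_size - 1) * intra_cluster_corr


def effective_sample_size(n, deff):
    # Size of a simple random sample that would give the same precision.
    return n / deff


def implied_correlation(cluster_size, efficiency):
    # Invert the Kish formula, given effective n as a fraction of actual n.
    return (1 / efficiency - 1) / (cluster_size - 1)


# Clusters of 10 households with an effective sample about 70% of the actual.
rho = implied_correlation(cluster_size=10, efficiency=0.70)
deff = design_effect(10, rho)
n_eff = effective_sample_size(300, deff)

print(round(rho, 3))   # 0.048
print(round(deff, 2))  # 1.43
print(round(n_eff))    # 210
```

On these assumptions, a weekly sample of about 300 behaves like a simple random sample of about 210 before the compensatory matching is taken into account.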
So why didn't we just choose households strictly at random? Because they would have been scattered over a wide area, and the costs of having interviewers travel between them would have been prohibitive. As it was, more than half the interviewers' time was taken up with travel, even though we deliberately employed interviewers who lived close to the areas where they were working.
Each week's questionnaire was a single sheet of paper. On the front of this was a list of 70 or so current TV programs. Respondents were asked to rate the quality of each program by writing in a number from 0 to 5, as follows:
|0||Have not seen|
This scale was based on the TvQ system, used by CBS TV in the USA in the 1950s. Though very simple, it produced useful results. We also experimented with other sets of wording, such as a Liking scale (ranging from 1="one of my favourites" to 5="I hate it") and an Action scale (ranging from 1="I always try to watch this" to 5="I always switch this off"). As the correlations between all these scales were between 0.8 and 0.9, we settled for our Quality scale (as above) because quality was what we were trying to achieve.
It may have struck you that the Quality scale is lopsided. Why not settle for Very good / good / average / poor / very poor? Answer: because people tended to be over-generous, and when we tried the latter scale about 80% of ratings were either "good" or "very good". Having "excellent" at one end and "poor" at the other spread the figures more evenly, making the results more useful.
For each program, we presented two key figures, which we called Familiarity and Appreciation. Familiarity was the percentage of all respondents who had seen a program (i.e. the percentage who did not rate it "have not seen"). Appreciation was the percentage of those who had seen a program who rated it as "excellent" or "very good". For many programs, this figure was around 60 to 70% of its viewers.
Notice that people who had not seen a program did not get to rate its quality. This was deliberate: nonviewers were treated as, in effect, not members of the target audience.
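The two figures are simple to compute from one program's column of ratings. The following Python sketch assumes that 0 means "have not seen" (as on the questionnaire) and, as a labelled assumption, that codes 5 and 4 stand for "excellent" and "very good". The function name and the sample ratings are made up for illustration.

```python
NOT_SEEN = 0
TOP_CODES = {4, 5}  # assumption: 5 = "excellent", 4 = "very good"


def familiarity_and_appreciation(ratings):
    """Return (familiarity %, appreciation %) for one program."""
    if not ratings:
        raise ValueError("no respondents")
    # Only people who have seen the program count towards Appreciation.
    viewers = [r for r in ratings if r != NOT_SEEN]
    familiarity = 100 * len(viewers) / len(ratings)
    if not viewers:
        return familiarity, None  # nobody saw it, so there is nothing to rate
    appreciation = 100 * sum(r in TOP_CODES for r in viewers) / len(viewers)
    return familiarity, appreciation


# Ten hypothetical respondents rating a single program.
ratings = [0, 0, 5, 4, 3, 4, 0, 2, 5, 1]
familiarity, appreciation = familiarity_and_appreciation(ratings)
print(familiarity)             # 70.0
print(round(appreciation, 1))  # 57.1
```

Because the denominator for Appreciation is viewers rather than all respondents, a little-watched program can still score highly.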
We found that changes in a series program's appreciation figure often occurred months before a change in its audience. It was as if, after a while, people suddenly said to themselves "Why am I watching this? I don't like it any more."
Another use for appreciation data was finding programs that appealed to more people than had been expected. The British comedy "The Young Ones" was originally shown very late at night in Australia, but it achieved such high appreciation figures that it was rescheduled to peak time, where it gathered an enormous following.
We found that by comparing the figures from this survey with those from the regular diary surveys, we could fairly accurately predict audiences. In its simplest version (in a market with only two channels) we found by trial and error that around 30% of a program's audience was related to the difference in Appreciation between the previous programs on the two channels, 5% to the difference in Appreciation between the following programs on the two channels, and 65% to the difference in Appreciation between the current programs on the two channels. If we knew the Appreciation figures for the six programs in this set, we could usually predict the ratings within about a 5% margin.
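The weighting can be written out as a small Python sketch. It computes only the weighted Appreciation gap between two channels for one timeslot. How that gap was converted into a predicted audience is not described here, so that step is left out. The function name and the example figures are hypothetical.

```python
# Approximate weights found by trial and error for a two-channel market.
WEIGHTS = {"previous": 0.30, "current": 0.65, "following": 0.05}


def weighted_appreciation_gap(channel_a, channel_b):
    """Weighted difference in Appreciation (percentage points), A minus B.

    Each argument maps "previous", "current" and "following" to the
    Appreciation figure of the program in that position on the channel.
    """
    return sum(
        weight * (channel_a[slot] - channel_b[slot])
        for slot, weight in WEIGHTS.items()
    )


# Hypothetical Appreciation figures for the six programs involved.
channel_a = {"previous": 70, "current": 60, "following": 50}
channel_b = {"previous": 50, "current": 65, "following": 60}

gap = weighted_appreciation_gap(channel_a, channel_b)
print(round(gap, 2))  # 2.25
```

In this made-up case channel A comes out slightly ahead even though its current program is less appreciated, because of the strong program that preceded it.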
TV appreciation surveys don't replace diary-based or meter-based surveys of audience size. They measure a different thing, which is why they are useful. Audience size is a far-from-accurate indicator of program popularity, because it is also affected by many other factors, including the unwillingness of TV viewers to change channels, personal dominance within households, and the lower awareness levels of recently introduced programs.
Our system worked well for six years, till a new generation of managers (more interested in "bums on seats" than audience appreciation) stopped funding it. With appropriate modifications, it would be equally effective now, especially for public broadcasters in countries where few channels are available.