On the outskirts of Brisbane, Australia, lies Ipswich: not quite a suburb, but no longer a separate town. Over the years, Brisbane has steadily grown, and the gap of farmland that used to separate Brisbane from Ipswich has all but vanished. In the late 1990s, a mood of self-doubt settled over Ipswich. Its future was uncertain: what should it try to be?
A local broadcaster commissioned a forum to discuss the future of Ipswich, trying out the practice of public journalism. This was a one-day public forum, with the proceedings and outcomes to be broadcast on local radio and TV. To give the participants some facts to focus on, my group carried out some preliminary research.
We surveyed 390 adults by telephone, using a questionnaire that was a mixture of multiple-response and open-ended questions. The multiple-response questions produced some interesting answers (e.g. 54% said they'd prefer to live somewhere other than Ipswich), but the open-ended questions produced far richer responses.
Why should this be, we wondered? We realized that the multiple-response questions we'd chosen to ask were based on our own knowledge of Ipswich. But as we were based in Adelaide, 1,500 km away, there was a lot we didn't know about Ipswich, and thus many questions we wouldn't have thought to ask.
Fortunately, the open-ended questions were broad enough to enable us to gather all sorts of perceptions about life in Ipswich. For example, we asked "As far as you're concerned, what are the main advantages of living in Ipswich, compared with other places?"
Of our 390 respondents, 290 answered this question, giving detailed comments. Transcribed, these comments came to 5,500 words. That's about 12 pages of closely printed text - and that was only one question. It seemed unlikely that most participants in the forum would read all that. We needed some way of distilling the comments, presenting only the most interesting and most representative ones.
We did this by writing a computer program. It produced a listing of all the words and phrases, showing the frequency of occurrence of each. The program compared the frequency of each word in the comments with its frequency in normal spoken English. Words occurring much more commonly in the comments than in normal English (apart from 180 very common stopwords, meaningless by themselves) formed the significant themes in the comments.
For example, the most significant terms were Brisbane and close/ closeness. These became a theme of "close to Brisbane".
The next most significant group of terms were: good, shops/ shopping, live/ living/ lived, city, Ipswich, cheaper, area/s, work, house/ houses/ housing, place/s, country.
Reading through that group of terms gives a clear idea of the advantages of living in Ipswich, as seen by the people who live there.
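The frequency comparison described above can be sketched roughly as follows. This is a minimal illustration in Python, not the program we actually used: the function names, the tiny stopword list, the `min_ratio` cut-off and the `floor` value are all assumptions made for the example, and `reference_freq` stands in for a table of relative word frequencies in normal spoken English, which would have to be supplied from some outside source.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stopword list (the real one had about 180).
STOPWORDS = {"the", "a", "to", "of", "and", "is", "it", "in", "for", "i"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def significant_terms(comments, reference_freq, min_ratio=5.0, floor=1e-5):
    """Rank words by how much more common they are in the comments
    than in the reference table.

    reference_freq maps a word to its relative frequency (0..1) in
    ordinary English. Words missing from the table are given the
    frequency `floor`, an assumed value that avoids division by zero.
    """
    words = [w for c in comments for w in tokenize(c) if w not in STOPWORDS]
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    scored = []
    for word, n in counts.items():
        expected = max(reference_freq.get(word, floor), floor)
        ratio = (n / total) / expected
        if ratio >= min_ratio:
            scored.append((word, ratio))
    # Most over-represented words first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Note that a word absent from the reference table gets a very high ratio under this scheme, so rare words and misspellings would need checking by eye. Grouping related forms (such as shops/ shopping) into a single theme is not attempted here.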
Having defined these main themes (as well as about 40 others), the computer program then analysed each comment in turn for its density of themes. The more of these themes in each comment, in proportion to its total length (ignoring stopwords), the more concentrated was that comment. The program then listed the comments in descending order of concentration.
We picked the 20 most concentrated comments, and included these in the report. Instead of 12 pages, we had just over 1. Reading through the selected comments, then all the others, it was clear that the computer selection had captured the essence of the comments.
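The ranking step might look something like the sketch below. Again this is only an illustration under stated assumptions: the names `concentration` and `most_concentrated` are invented for the example, and it counts every occurrence of a theme word, whereas a program could equally well count each distinct theme once per comment.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def concentration(comment, theme_words, stopwords):
    """Share of a comment's non-stopword words that are theme words."""
    content = [w for w in tokenize(comment) if w not in stopwords]
    if not content:
        return 0.0
    hits = sum(1 for w in content if w in theme_words)
    return hits / len(content)


def most_concentrated(comments, theme_words, stopwords, top_n=20):
    """Return the top_n comments in descending order of concentration."""
    ranked = sorted(
        comments,
        key=lambda c: concentration(c, theme_words, stopwords),
        reverse=True,
    )
    return ranked[:top_n]
```

Because the score is a proportion, a two-word comment such as "Close to Brisbane" can outrank a longer and more informative one, which is one reason the selected comments still need to be read by a person.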
The same process was repeated for each of the nine other comment questions. With only one day's work (not counting writing the computer program, which also took about a day) we produced a comprehensive report in time for the forum, with survey data less than a week old.
At first it seemed surprising that such a crude computer program could produce such reasonable results. But all it was doing was selecting the most relevant comments.
The normal way to report survey comments is for somebody to read through them all and write a summary. Though this is subjective (in that no two people would write the same summary), our computer program was also subjective, because of various decisions we made about how it would work.
The main disadvantage of using this program to select comments was that, because it analysed only one or two words at a time, it was unable to select well-argued comments which described a commonly occurring concept in different words. But this can be overcome by reading through the rejected comments, looking for any which are particularly coherent and well expressed. Sometimes there's no substitute for a human! But as a way of sorting out many pages of comments, this computer-based approach proved very efficient.