Audience Dialogue

Software for content analysis

Spreadsheet software, such as Excel, can work quite well for content analysis. I don't recommend using spreadsheets for analysis of survey data, but content analysis is generally simpler than survey analysis, and if you don't have any statistical software, a spreadsheet is the next-best choice.

Put each content unit on one row of the spreadsheet, and each "question" in one column. (See this technology diffusion example for a detailed description of how this can be done.)
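As a rough sketch of that layout, here is the same idea as a small CSV file read with Python instead of a spreadsheet. The column names and codes (source, topic, tone) are invented for illustration, not taken from the technology diffusion example.

```python
import csv
import io
from collections import Counter

# Hypothetical coded data: one row per content unit, one column per "question".
SHEET = """unit_id,source,topic,tone
1,newspaper,technology,positive
2,newspaper,politics,negative
3,magazine,technology,neutral
4,magazine,technology,positive
"""

def tally(sheet_text, column):
    """Count how often each code appears in one column of the sheet."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    return Counter(row[column] for row in rows)

topic_counts = tally(SHEET, "topic")
tone_counts = tally(SHEET, "tone")
print(topic_counts)
print(tone_counts)
```

In a spreadsheet, the equivalent of tally() is a pivot table or a COUNTIF formula on the column.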

Quantitative software for content analysis

Several software packages are available that have been written specifically for content analysis of documents, particularly when the units are small and clearly separated. These programs count:

  1. Word frequencies: how often each word occurs in a document. Often, the commonest words (stopwords) can be excluded: words like in and the, which add little meaning but get in the way of the analysis. Also commonly included is lemmatization, which means combining words with the same stem, such as intend, intended, intends, intending, intent, intention, intentions, etc.
  2. Category frequencies: synonyms are first grouped into categories, and the program shows how many times each category occurs in the document.
  3. Concordance, or KWIC (key word in context) - showing each word in the document, in alphabetical order - and its context in the document.
  4. Cluster analysis, which groups together words used in similar contexts.
  5. Co-word citation - looks at the occurrence of pairs of words. The US government's Echelon Project, which tries to spy on the entire world's email, uses this method. If you write "bomb" and "chocolate" in the same sentence, no problem - but writing "bomb" near "Moslem" may get you rendered by the US military. (Oops!)
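To make four of these counts concrete (cluster analysis is left out), here is a minimal Python sketch. It is not how any of the packages below work internally. The sample text, the stopword list, the tiny lemma lookup and the two categories are all invented for illustration, and real lemmatization needs far more than a hand-made lookup table.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical inputs, chosen only for illustration.
TEXT = ("The council intends to build a new library. "
        "Residents welcomed the intention, but the budget is small. "
        "The library budget was intended to cover books.")
STOPWORDS = {"the", "to", "a", "but", "is", "was", "in"}
# Hand-made lookup standing in for real lemmatization.
LEMMAS = {"intends": "intend", "intended": "intend", "intention": "intend"}
CATEGORIES = {"money": {"budget"}, "plans": {"intend", "build"}}

def tokenize(text):
    """Lower-case the text, split it into words, and map variants to one form."""
    return [LEMMAS.get(w, w) for w in re.findall(r"[a-z']+", text.lower())]

def word_frequencies(text):
    """Count each word, leaving out the stopwords."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)

def category_frequencies(text):
    """Add up the counts of all the words grouped into each category."""
    words = word_frequencies(text)
    return {cat: sum(words[w] for w in members)
            for cat, members in CATEGORIES.items()}

def kwic(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words on either side."""
    tokens = tokenize(text)
    return [" ".join(tokens[max(0, i - width): i + width + 1])
            for i, t in enumerate(tokens) if t == keyword]

def co_words(text):
    """Count pairs of distinct non-stopwords that share a sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]", text):
        words = sorted(set(tokenize(sentence)) - STOPWORDS)
        pairs.update(combinations(words, 2))
    return pairs

print(word_frequencies(TEXT).most_common(3))
print(category_frequencies(TEXT))
print(kwic(TEXT, "library"))
print(co_words(TEXT).most_common(2))
```

Note that this kwic() looks only at word positions, so the context can run across a sentence boundary; a real concordance program would let you choose.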

Among the widely used quantitative programs for content analysis are:

General Inquirer from Harvard - the first software of this type: very powerful, but perhaps not as user-friendly as some successors

VBPro - still a DOS program, but widely used. There are several other VB Pros (Visual Basic and Visitor's Book) but this is the text analysis one by Mark M Miller.

Wordsmith. This is Mike Scott's Wordsmith Tools, one of several software packages of this name.

Textpack - a system for computer-aided quantitative content analysis, originally designed for the analysis of open-ended questions in surveys, but extended to cover many aspects of content analysis. Windows only, in English and Spanish versions. From ZUMA in Cologne.

TACT - Text Analysis Computing Tools. A text analysis and retrieval system, with 16 separate programs, designed for literary studies - but also useful for social research. Good old DOS software.

Textstat - a simple text analysis tool, from the Dutch Linguistics group at the Free University of Berlin. A freeware program for the analysis of texts; it runs under Windows. It can read a website and put the pages into its own corpus. Then it can do word frequencies and concordance.

The key point about these programs is that, as long as your unit of content is very short (a word, or a small group of words) these programs do the content analysis for you. You do no coding (coding frames either don't apply or are built in), you need no judges, and the tedious drudgery of large-unit content analysis just doesn't apply. All you have to do is interpret the results.

So use these programs when possible - but the problem is that for any sort of subtlety in content analysis, there's no substitute for human judgement - and that takes time.

Qualitative software

When your content units are large chunks of text, and codes may overlap (as in transcripts of interviews, and magazine articles) you need quite a different type of software. This is not automatic, because part of the coding job is to decide where each theme begins and ends - "units" are not used in this context. This software displays the text (from a computer file) on screen. This can be coded in two ways:

(1) by coding chunks of text right on the screen

(2) by (a) printing out the text, (b) writing codes on the printout, then (c) going back and coding it on the computer.

Most people prefer the second method - partly because you usually can't see the whole context of a text on a small computer screen.

All of this software is fairly complex, and mostly not cheap. It will probably take you at least a full week to learn how to use it. You may need to take a course, lasting for several days. Though this software saves time in the long run, it's probably not worthwhile to buy it and learn it for a single content analysis project - unless that project is a very large one, with coding likely to take a month or more.

Commonly used software of this type includes

With most of these, you set up a hierarchical coding frame, then code chunks of text. EZ-Text is designed for semi-structured databases, such as surveys with open-ended questions. The others are designed to handle large text files with no particular structure. Nud*ist seems to be the most widely used, and the most powerful. It's not the easiest of these to use, but it's not very difficult either. If you're looking for help with this software, you could try contacting a local university, particularly the sociology or anthropology department or management school.
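To show what "a hierarchical coding frame, then code chunks of text" amounts to as data, here is a small Python sketch. It is my own illustration, not the internal format of any of these packages; the code names, the transcript and the helper functions are hypothetical. Each coded chunk records where the theme begins and ends, so chunks are free to overlap.

```python
from dataclasses import dataclass

# Hypothetical hierarchical coding frame: each code is a path, general to specific.
FRAME = {"work/pay", "work/hours", "family/childcare"}

@dataclass
class Segment:
    start: int  # character offset where the theme begins
    end: int    # character offset where the theme ends (exclusive)
    code: str   # a path from FRAME

TRANSCRIPT = "I like the pay, but the long hours make childcare hard to arrange."

def code_text(text, first_phrase, last_phrase, code):
    """Code the span running from first_phrase through last_phrase."""
    if code not in FRAME:
        raise ValueError(f"unknown code: {code}")
    start = text.index(first_phrase)
    end = text.index(last_phrase, start) + len(last_phrase)
    return Segment(start, end, code)

# The second and third segments overlap: both include "childcare hard to arrange".
segments = [
    code_text(TRANSCRIPT, "I like", "pay", "work/pay"),
    code_text(TRANSCRIPT, "the long hours", "arrange", "work/hours"),
    code_text(TRANSCRIPT, "childcare", "arrange", "family/childcare"),
]

def retrieve(text, coded, branch):
    """Return the text of every segment coded anywhere under one branch."""
    return [text[s.start:s.end] for s in coded
            if s.code == branch or s.code.startswith(branch + "/")]

print(retrieve(TRANSCRIPT, segments, "work"))
print(retrieve(TRANSCRIPT, segments, "family"))
```

Retrieving by branch is the payoff of the hierarchy: asking for "work" pulls out everything coded under pay and hours together.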

The CAQDAS (Computer Assisted Qualitative Data Analysis Software) project aims to provide practical support, training and information in the use of a range of software programs which have been designed to assist qualitative data analysis. Another useful page of links is from the Department of Social Sciences at Loughborough University in the UK.

For more structured text, such as open-ended questions in a survey, the main possibilities are EZ-Text (discussed above), Ask Sam, Info Select, and the Hypercard type of software.

Ask Sam is a database specially designed to handle text divided into fields, with the same fields in each record. Instead of the usual fields such as "name" and "address" you could have "answer to Q1" and "general comments" etc.

Info Select is software designed to collect lots of little bits and pieces of text that start off in no particular order, and gradually group them into order. Not so useful for questionnaire data, but if you were doing lots of short interviews or collecting short segments of text from many sources, Info Select could be very useful indeed.

Hypercard (for Macintosh) was excellent in many ways, but seems to have faded from public consciousness. A big advantage of Hypercard was that it had one of the easiest-ever scripting languages. Though Hypercard is no longer supported by Apple, the software is still around, supported by its user group. Also, there are Windows equivalents such as Supercard and Revolution. Far from cheap, though.

A few years ago I designed a content analysis program that would work on the Web. It was designed for classifying shortish comments (up to one computer screen full). Instead of using coding, it used a principle of grouping similar comments together, then grouping those groups. It would have done for comments what Microsoft Outlook does for email: each comment would be like one email, and comments would be grouped in the same way that emails can be grouped into directories and subdirectories. Unfortunately the software that the programmer developed was rather clumsy and unusable. If anybody else would like to have a go at this, you can read the full specifications here. You could possibly achieve a similar effect by using Notelens. It looks very promising, but I haven't yet tried it on a real project - which is when limitations become obvious.
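For anybody tempted to have a go, the grouping principle can be sketched in a few lines of Python. This is only an illustration of "groups of groups", not the specification itself; the comments and group names are invented.

```python
# Hypothetical comments and groupings, for illustration only.
# A group is either a list of comments or a dict of named subgroups.
TREE = {
    "Programs": {
        "Music": ["More jazz please", "Too much talk between songs"],
        "News": ["The news is too short"],
    },
    "Reception": ["Signal drops out at night"],
}

def count_comments(node):
    """Count the comments in a group, including all of its subgroups."""
    if isinstance(node, list):
        return len(node)
    return sum(count_comments(child) for child in node.values())

def outline(node, depth=0):
    """Return indented lines showing each group and its comment total."""
    lines = []
    if isinstance(node, dict):
        for name, child in node.items():
            lines.append("  " * depth + f"{name} ({count_comments(child)})")
            lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(TREE)))
```

The hard part, of course, is not storing the groups but making it quick and pleasant for a person to move comments between them.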

Software for automatic content analysis

There's software that's so smart it will do the content analysis for you. Just find a set of text files, already in computer format, start up this software, tell it where to find the coding frame, and relax for a few seconds while the content analysis is done for you. Two examples of this are KEDS/TABARI and CAMEO (Conflict And Mediation Events Observations). The catch here is the limited scope. CAMEO is designed to study political conflict and might not work very well in a different context.

For open-ended survey questions, two options are Verbastat and Statpac. If you have lots and lots of money, try Verbastat, which is owned by SPSS. This is heavy-duty software for huge market research companies, and costs many thousands of dollars. If you're analysing open-ended questions by the thousand, all day, every day, this could save a lot of time. For a more reasonable price, around 700 USD, you can buy the excellent statistical package Statpac, which includes an automatic coding module called Verbatim Blaster.
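The simplest form of automatic coding is keyword matching against a coding frame, which can be sketched in Python as below. I'm not claiming this is how Verbastat or Verbatim Blaster work inside; the coding frame, the keywords and the sample answers are invented. The point to notice is the list of uncoded answers, which still needs a human.

```python
import re

# Hypothetical coding frame: each code is triggered by any of its keywords.
CODING_FRAME = {
    "price": {"expensive", "cheap", "cost", "price"},
    "service": {"staff", "service", "rude", "helpful"},
}

def auto_code(answer, frame=CODING_FRAME):
    """Return the sorted codes whose keywords appear in the answer."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    return sorted(code for code, keywords in frame.items() if words & keywords)

answers = [
    "Too expensive, and the staff were rude",
    "Helpful service",
    "I could not find the entrance",
]
coded = [auto_code(a) for a in answers]

# Answers that matched nothing are left for a human coder.
uncoded = [a for a, codes in zip(answers, coded) if not codes]

print(coded)
print(uncoded)
```

One answer can receive several codes, and a keyword match says nothing about meaning: "not expensive" would still be coded as a price comment.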

This looks like the future for content analysis: semi-intelligent software that can do most of the coding, supervised by a human who defines the purpose of the study and teaches the software how to correct its mistakes. I predict that by about 2020 practically all content analysis will be done mostly by computer. And, because it will be much less labour-intensive, there will be a lot more of it. But because a lot of it will be done by people who don't understand the assumptions they are making, there won't be much more that's actually useful.

Heavy-duty database software

Heavy-duty database software can also handle simple content analysis, but this software can be very difficult to set up - e.g. Microsoft Access and MySQL. Filemaker Pro and Lotus Approach are the simplest general database programs that we've tried. A hint: don't let computer experts talk you into something that's difficult to use. They might think it's simple, because they've been using it for years. If anybody recommends that you use a program in this "heavy duty" group for content analysis, ask if they will promise to help you with all the problems you encounter. Unless the answer is a very clear Yes, I suggest you use simpler software, such as a spreadsheet. Excel works surprisingly well - see this example.
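For comparison, here is what a simple content analysis tally looks like in database terms. This sketch uses SQLite through Python's built-in sqlite3 module, which I've swapped in because it needs no setup; it is not one of the programs named above, and the table and column names are invented.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One row per content unit, one column per "question", as in a spreadsheet.
conn.execute(
    "CREATE TABLE units (unit_id INTEGER PRIMARY KEY, source TEXT, topic TEXT)"
)
conn.executemany(
    "INSERT INTO units (source, topic) VALUES (?, ?)",
    [
        ("newspaper", "technology"),
        ("newspaper", "politics"),
        ("magazine", "technology"),
        ("magazine", "technology"),
    ],
)

# Count the units coded with each topic, most frequent first.
rows = conn.execute(
    "SELECT topic, COUNT(*) FROM units GROUP BY topic ORDER BY COUNT(*) DESC"
).fetchall()
conn.close()

print(rows)
```

The result is the same count a spreadsheet pivot table would give, which is rather the point: for simple content analysis, the database adds setup work without adding much else.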


This section is short, because there are several pages on the Web that have many links on content analysis software. (Why reinvent the wheel - and have more links to keep up to date?) Try these...
