Audience Dialogue

Using Excel for survey analysis (4)
Nominal variables

For questions with answers that are not continuous numbers, and for which you don't want to calculate averages and the like, pivot tables work better than the Data Analysis command. If you have entered any non-numeric codes (like Y for yes and N for no) the Data Analysis command ignores them.

Pivot tables have a reputation for being difficult to use, but they can be quite simple. (They can also get fearsomely complicated, and produce apparently wrong results. You think they're wrong, but Excel in its weird logic is giving you what it believes you asked for.)

The first instructions apply to Excel 97; Excel 2000 is a bit different, and notes for that are given below.

Excel 97

To create a frequency distribution for one variable (i.e. one column):

1. Select the whole column by clicking on its letter at the top (e.g. C)

2. Go to the Data menu and click on Pivot Table Report

3. The Pivot Table Wizard comes up, with this screen:

So far, so good. Click on the Next> box.

4. Now this window comes up:

The range (in this case $C:$C, which means Column C) is automatically inserted for you - because you selected that row in Step 1 above. Now click Next> again.

5. Now the tricky part:

With the mouse, grab the box labelled Q1 - or whatever is the heading of the column you want to analyse. In this case, cell C1 was labelled Q1. Drag that box from the top right-hard corner to the space labelled ROW. The original box at the top right will stay put.

6. Now, drag the same box from the top right corner to the DATA area. When it reaches the data area, the label will change from Q1 to Count of Q1 (or whatever). The window will look like this:

  7. You could have clicked Finish at that point, but if you click Next> you see this window:

8. Choose New worksheet. Much safer - strange things can go wrong if you put the pivot table on the current worksheet, and later edit it. Also, if you rename each new worksheet, you can quickly find each table. (How do you rename a worksheet? It's easy, though not obvious: put the cursor over the worksheet label near the left foot of the screen, click the right mouse button, and you'll see a menu that includes Rename. If you need to add more worksheets, go to the Insert menu and click on Worksheet; this will add a worksheet before the one you're currently looking at. )

9. Click Finish, and your pivot table appears - like this...

Now you see the number of cells with each value in Q1. In this case there were 135 with n (No) and 298 with y (Yes). You can change the labels (e.g. change n to No and y to Yes) but you can't edit the figures.

Excel 2000 is different

The Pivot Table Wizard now has only 3 steps. These are steps 1, 2, and 4 of the Excel 97 version. What used to be step 3 now appears at the end. A new spreadsheet window opens, looking like this:

Microsoft seems to have tried to make the task easier, but (in its usual way) has ended up making it more confusing. As with Excel 97, you move the button to the table. With Excel 2000, you drag the button off the Pivot Table window, looking like this:

...onto the table above that. (In the screenshot above the possible "buttons" are labelled column 1 and column 2 and column 3.)

You need to do it twice: once into the area labelled "Drop Data Items Here", and again into the area labelled "Drop Row Fields Here". Depending on where you drag the button first, you get misleading initial results. Don't worry, you've done nothing wrong, but you won't see the correct frequency distribution till you drag that button twice.

It will then look like this, with the table of results appearing at the top left of the worksheet.

Fixing problems caused by missing data

Be aware that "blank" in pivot tables does not mean "empty" - it means cells that contain blank spaces. Cells that had nothing entered in them will be ignored in the pivot table.

If, during data entry, you skipped over cells when a question hadn't been answered, these cells won't be counted as (blank) in the pivot table. But you can calculate the number of missing cells by going back to the original spreadsheet. Look at the number of the last row of data, subtract one (because the top row has headings, not data), then subtract that from the grand total.

Only rows with non-empty cells for both questions are shown in the pivot table. If you want to find out what replies the people who didn't answer (say) Q1 gave to Q2, the only way you can do this is to change the empty cells to blanks. To do this, highlight the columns you want to change, and all the rows with survey data, then use the Replace command. Type nothing in the Find box, and a blank space in the Replace box. Both boxes look the same (empty), but the replacements will be made. Now the "grand total" shown in the pivot table will be the real grand total.

Let's take a previous screenshot as an example...

In the worksheet that produced this pivot table, the last row used was 501, so we have data from 500 respondents. (The difference between 500 and 501 is due to the top row, with the headings.) As the Grand Total in the above screenshot is only 433, there must be 500-433 blank cells: that's 67. So below the pivot table, in cell B7 (in the above example) enter the formula =501-B6.

So that people will know what it means after you print it out, you can enter the description  in A7 (e.g. "not answered"). You could even write "Total respondents" in A8, and "500" in B8.

You can also calculate percentages of the Grand Total, and put them in column C. It would be nice if you could automatically do percentages with pivot tables - it may be possible, but I haven't yet found a way to do it. (But I also haven't tried very hard - why would I want to, when I can use SPSS, Epi Info, etc? ... Maybe some day...)

Next » Cross-tabulation with pivot tables