Audience Dialogue

Using Excel for survey analysis (6)
Using the Frequencies function

Among Excel's statistical functions, there is a Frequency function. Until recently I'd never managed to make this work, but now I've discovered the way to do it. This page explains how.

Suppose you have asked 9 people a question, to which they give one of 5 numerical answers, coded as 1, 2, 3, 4, or 5. If they didn't answer the question, no number is ended. You want to know how many people gave an answer of 1, how many said 2, and so on. In other words, you want a frequency distribution of their answers.

A common example of such a question is "How much do you agree with the following statement... [Insert any statement here] ... Answer 5 if you agree strongly, 4 if you agree a little, 3 if you are neutral, 2 if you disagree a little, and 1 if you disagree strongly." Excel doesn't know or care about the wording: all that matters is that the answer is a number, or a blank cell.

Suppose you have a set of figures like this, showing the answers that the 9 respondents gave to Question 1.

 

A

B

1

person

Q1

2

1

3

3

2

1

4

3

1

5

4

4

6

5

4

7

6

3

8

7

3

9

8

4

10

9

4

The Frequencies command will show how many people gave each possible answer: i.e. how many answers were 1, how many were 2, etc, and so on up to 5.

First of all, you need to make a list of the possible answers. (You'd expect Excel to do this automatically, but it can't.) Type the set of possible answers vertically, above or below the raw data, as in A13 to A17 below...

 

A

B

1

person

Q1

2

1

3

3

2

1

4

3

1

5

4

4

6

5

4

7

6

3

8

7

3

9

8

4

10

9

4

11

12

answer

13

1

14

2

15

3

16

4

17

5

18

Now highlight the blank cells to the right of the codes by moving the mouse across them. You must also include the cell below the highest code (B18 in the above example) - in case there's a figure higher than you expect.

Now you are ready to type in the function. You'd expect to be able to choose this from a menu, e.g. Insert > Function > Statistical > Frequency. If you do this, it will bring up a box asking for a Data array and a Bin array. However that doesn't seem to work. I can make it work only by typing in the command. So, with the cells still highlighted as above, type in

=FREQUENCY(B2:B10,A13:A18)

...but when it's typed in, don't press Enter! (If you do that, it gives you only one answer, not the whole set.) Instead, press Enter and Control and Shift together.

 

A

B

1

person

Q1

2

1

3

3

2

1

4

3

1

5

4

4

6

5

4

7

6

3

8

7

3

9

8

4

10

9

4

11

12

answer

13

1

2

14

2

0

15

3

3

16

4

4

17

5

0

18

0

There are your frequency counts (shown with a yellow background in the above example).

Are you clear about the meaning of those figures? Taking the example of lines 13-18 above, read it like this:
the answer of 1 appeared 2 times,
the answer of 2 appeared 0 times,
the answer of 3 appeared 3 times,
the answer of 4 appeared 4 times,
the answer of 5 appeared 0 times, and
answers greater than 5 (the unlabelled cell in A18) appeared 0 times.

So what were those cell addresses you typed in? The first was the data array: the cells where the raw answers were stored. In the above example, that's B2 to B10, shown as B2:B10. The second part of the Frequency command was the bin array: where the results are to go. In the above example, that was B13 to B18.

Why B18? Why not stop at B17, because there were only 5 possible answers? This is because of the way Excel treats the "bins". Actually a bin code of 1 means "anything up to and including 1". Likewise, the last bin (row 18 above) means "anything over 5". You can type that label in A18 if you like. Excel will ignore anything that's not a number in the bottom cell of the bin array.

Now try changing one of the above raw values - e.g. change cell B10 from 4 to 4.5. See how this increases the number of 5s by 1, while decreasing the number of 4s. Change B10 to 6, and the 0 in B18 now changes to a 1. Also, because the lowest bin was 1, anything less than 1 gets tallied in cell B13. (So don't use 0 to mean "no answer" - or, if you do, insert a bin code (in column B, between 12 and 13) or 0.

Copying across to other questions

If the same set of responses applies to later questions, and you don't want to repeat the bin array (possible codes) put a $ sign before the bin column - e.g. instead of B13:B18 above, put $B13:$B18. (The dollar sign before referring to each column means "use that same column when copying across.") Then, when you fill to the right, the frequencies are correct. (If you don't do this, you'll get a nasty "circular error" message).

 

A

B

C

1

person

Q1

Q2

2

1

3

5

3

2

1

6

4

3

1

5

4

4

5

6

5

4

4

7

6

3

5

8

7

3

3

9

8

4

3

10

9

4

2

11

12

answer

13

1

2

0

14

2

0

1

15

3

3

2

16

4

4

1

17

5

0

3

18

0

1

Limitations

Excel's Frequency function is fussy, in at least three different ways.

1. It only works with numeric answers, not alphabetic ones. For example, if you code male respondents as M and females as F, Excel will ignore all of these. You will need to change the letters to numbers to make it work.

2. It only works for vertical arrays, not horizontal ones. If the numbers that you want a frequency distribution for are in a row, not a column, you'll have to use the Transpose function to flip the array on its side, before calculating frequencies.

3. The bin numbers have to be in numerical order, going down - otherwise Excel gets horribly confused.

However, once you've defined the arrays and the bin codes, you can change the raw data, and the frequencies will automatically change too.

Also, you can use the bins to group the values. If you have asked respondents their exact age, but want to report them in age groups (e.g. under 25, 25 to 44, and 45-plus), you can put actual ages in the upper part, and label the first two bins 25 and 45. The people aged 45 and over will be counted in the third bin, at the bottom.

All this applies to Excel 97, 98, and 2000 - perhaps Microsoft has made the function work properly with later versions. (But I suspect not, because it's embedded in the strange logic that Excel uses.)