Audience Dialogue

Learning statistics

For a lot of people who are new to research, statistics is an impenetrable forest of numbers. Here are some of the most accessible web-based guides to statistical thinking. (If you're a statistician, this site is not intended for you. Try somewhere more advanced, such as this list of statistical resources from York University in Toronto.)

Learning statistics

What's the best way to learn statistics? There are several possibilities:

The ideal method, I think, would be a kind of apprenticeship: working under an expert, who'd gradually introduce you to new concepts and show you how to use them. But I don't know anybody who learned stats this way.

There are plenty of stats courses available - if you're under about 40 and have tertiary education, you've probably done at least one compulsory course - and very likely not understood much of it. If you're choosing a course, I recommend one with a lot of hands-on manipulation of numbers, and a minimum of listening to a lecturer. Statistics is one of those things you don't learn unless you actually do it - lots and lots of it.

Some people take the approach that these days, with statistical software widely available, you don't need to know much about statistics. Though nobody does statistical calculations manually any more, if you're going to use statistics, you need to understand what you're doing. The unthinking use of a computer program is likely to produce a perfectly accurate (and perfectly irrelevant) result.

Statistical methods

When you're studying statistics for the first time, it doesn't seem too difficult at first. At the beginning, you learn what's known as "descriptive statistics." But after perhaps ten lectures, there's a sudden escalation in difficulty. Many students, when they encounter the "sampling distribution of the mean," seem to lose their grip on the subject. Breaking through this barrier requires strong motivation, patient teaching, a lot of time, and plenty of practice exercises. Textbooks by themselves are useless for all but the most determined students. Internet-based courses, such as HyperStat Online, could be a lot better, but a real person could be better still.

If you know absolutely nothing about statistics, and you want to organize a survey, at the very least you need to acquire a good grasp of descriptive statistics. As for inferential statistics (on the far side of that barrier), you can probably get away with a vague knowledge of what they're used for, and when you might need to consult an expert.

I've searched the Web, and found a few clear guides to basic statistics:

The above guides, obscenely oversimplified as they may seem to a statistician, are probably still too difficult for a lot of people. Years ago, I was tutoring psychology undergraduates in statistics. Many of them didn't have a strong mathematical background, and were having trouble understanding their textbook. So I wrote a set of disgustingly simple notes, which I called The Absolute Beginner's Guide to Statistics. These days it would probably be called Statistics for Dummies. But the students weren't dummies at all - they just hadn't learned to think statistically.

I planned to unearth my guide, and put it on this web site, but I can't find it. I must have thrown it out. Maybe I'll try to rewrite it some day. In the meantime, I'll attempt the impossible: to explain statistics in a single paragraph. Here goes.

The key to statistics is that it deals with probabilities rather than certainties. When a survey includes only a small part of a population, you can understand intuitively that if a different group of people had been sampled, the result would have been slightly different. Statistics is used to estimate just how different the result could have been.
Though that's far from the whole of statistics, at least it gives you an idea of how statistics can be used.
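
If you'd like to see that idea in action, here is a minimal sketch in Python. Everything in it is made up for illustration: a population in which 40% of people would answer "yes" to some question, a sample of 200 people per survey, and a function name (simulate_surveys) that I invented. It simply repeats the same survey 1,000 times and looks at how much the results differ from one another.

```python
import random
import statistics

def simulate_surveys(true_share=0.40, sample_size=200, n_surveys=1000, seed=1):
    """Repeat the same hypothetical survey many times and return each result.

    Each survey asks `sample_size` randomly chosen people a yes/no question,
    in a population where a fraction `true_share` would answer yes.
    All of these figures are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_surveys):
        # Each respondent says yes with probability true_share.
        yes_count = sum(rng.random() < true_share for _ in range(sample_size))
        results.append(yes_count / sample_size)
    return results

results = simulate_surveys()
print("lowest survey result: ", min(results))
print("highest survey result:", max(results))
print("typical spread (standard deviation):", round(statistics.stdev(results), 3))
```

Every one of those surveys was done "correctly," yet no two need give the same answer. The spread printed on the last line is the kind of quantity that statistics lets you estimate from a single survey, without having to repeat it a thousand times.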

Learning statistics from books

I don't recommend trying to learn statistics from a book. If you could have done that, you'd probably have done it already. But if you have a hazy knowledge, or you half-learned statistics years ago, perhaps a book could help you recall what you once knew well.

Bear in mind that statisticians place a high value on conciseness - or "elegance" as they like to call it. This means that you have to read most statistics books very slowly. Don't rush through a stats book as if you're reading a novel - you will get confused, disheartened, and dejected. Instead, read it like poetry; savour it, even. One of the first steps is to learn the notation for formulas, and how the sigma symbol translates into a whole set of calculations. Whenever a formula appears, try to understand it, by working out an example. If you don't fully understand one page, you won't have a chance of understanding the next one.
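
To show how the sigma symbol unpacks into "a whole set of calculations," here is a small sketch in Python, using five made-up numbers. The variable names are my own. Sigma just means "do this for every value, and add up the results," which in a program becomes a loop.

```python
import math

scores = [4, 8, 6, 5, 7]   # made-up data: five observations
n = len(scores)

# "Sigma x": add up every value in turn.
total = 0
for x in scores:
    total += x
mean = total / n            # 30 / 5 = 6.0

# "Sigma (x - mean) squared": subtract the mean from each value,
# square the difference, and add up all the squares.
sum_of_squares = 0
for x in scores:
    sum_of_squares += (x - mean) ** 2    # 4 + 4 + 0 + 1 + 1 = 10

# Sample standard deviation: divide by n - 1, then take the square root.
std_dev = math.sqrt(sum_of_squares / (n - 1))

print(mean, sum_of_squares, round(std_dev, 3))
```

Working through the loops by hand, with a pencil, is exactly the kind of slow reading I mean: each formula in the book is a compact recipe for steps like these.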

Back in the 1970s, when I first learned statistics, our official textbook was British, and about 100 pages. It was very "elegant," but most students found it forbidding: too many formulas, and not enough explanation.

The solution that several of us found best was to visit the university library (this was in Christchurch, New Zealand), which had several shelves of statistics textbooks: maybe 100 of them. When we found something in our textbook that we couldn't understand, we discussed it among ourselves (which often helped), or failing that, looked up that section in several other textbooks, until we found an explanation that we could understand. Different students preferred different books. One book was prized above all others. A mark of its value was that it was missing from most libraries. It was out of print, even then. I managed to get a secondhand copy, but lent it to a friend, who lent it to somebody else, and I never saw it again.

The authors were Wallis and Roberts, and the title was Statistics: A New Approach. A later version was entitled The Nature of Statistics. I've never seen such clear explanations of statistical concepts. As Wallis and Roberts stated: in most textbooks "the great ideas of statistics are lost in a sea of algebra." Not with theirs! I searched for this book on the Web, hoping it was back in print. No such luck - but I did unearth a few secondhand copies for sale, between 14 and 50 US dollars. This is a lot for a 40-year-old textbook: another indication of how highly it's prized. Why doesn't some enterprising publisher reprint it?

The problem with many statistics textbooks is the sudden jump in difficulty between descriptive and inferential statistics. Descriptive statistics is quite easy, really: it's about distributions, means, standard deviations, and so on. Learners don't usually have much trouble with that. But then, after a few chapters on descriptive statistics, a lot of the books make stratospheric leaps into the "sampling distribution of the mean," and other such recondite concepts. At this point, many learners give up. The strength of the Wallis and Roberts book was that it took students gently through the transition.
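
The "sampling distribution of the mean" is less forbidding if you build one yourself. The sketch below is not taken from any of the books mentioned here. It is my own illustration in Python, with an invented population of 10,000 values and a hypothetical helper called sample_means. It draws many random samples, works out the mean of each, and then treats those means as a set of numbers in their own right. That set is the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(42)

# A made-up population of 10,000 values, deliberately lopsided
# (think of hours per week spent listening to radio).
population = [rng.expovariate(1 / 10) for _ in range(10_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

def sample_means(sample_size, n_samples=2000):
    """Draw many random samples and return the mean of each one."""
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

print("population mean:", round(pop_mean, 2), " population sd:", round(pop_sd, 2))
for size in (10, 40, 160):
    means = sample_means(size)
    print(size,
          round(statistics.mean(means), 2),    # centre of the sample means
          round(statistics.stdev(means), 2),   # spread of the sample means
          round(pop_sd / size ** 0.5, 2))      # what theory predicts: sd / sqrt(n)
```

Two things should stand out in the printed columns. The sample means cluster around the population mean, and their spread shrinks as the samples get larger, roughly in line with the population standard deviation divided by the square root of the sample size. That shrinking spread is the bridge from descriptive to inferential statistics.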

The closest modern equivalent to the Wallis and Roberts book could be Statistics by David Freedman, Robert Pisani, and Roger Purves. As with the Wallis and Roberts book, the emphasis is on understanding what you're doing, and why. The authors have mostly tried hard to make themselves clear. However, it seems to me that in some places they've sacrificed clarity for "elegance." Making an explanation as short as possible doesn't always make it clearer. Different people understand things in different ways.

Some of the statistical software packages - particularly SPSS, Epi Info, and Statistica - have manuals or associated books that explain not only how the packages work, but what the statistics mean.