Confidence in testing
I doubt there is a person in the Western world over the age of 4 who hasn’t taken a psychological or educational test. Yet very few of us know one of the most important facts about these tests—their scores are always imprecise.
When you measure the height of a child, you can be pretty confident that the measurement you make is correct within a fraction of an inch on either side. And if you check the time on your mobile phone, you can be pretty certain that it is accurate within a fraction of a minute on either side. Rulers and clocks are well-calibrated measures that we can use with great confidence if we use them correctly. The same is true of measures of temperature, speed, frequency, and weight.
But even measurements made with these instruments are more or less precise. They’re correct within a range. These ranges are called confidence intervals. The confidence interval around the measurement of a child’s height would be expressed as something like “82 centimeters plus or minus 1/2 of a centimeter.” Statisticians would say that the child’s true height is likely to be somewhere in this range.
Scores on educational and psychological tests have confidence intervals too. But there is a difference between these confidence intervals and those for physical measurements. The confidence intervals around scores on psychological and educational tests are larger than the confidence intervals around measurements in the physical world. How much larger? Let’s look at an example.
The psychological and educational tests with the smallest confidence intervals are those made by high-stakes test developers like ETS. For their high-stakes tests, the ones used to make decisions like who gets to go to which college, they set the highest standard. If this standard were applied to measuring height, it would allow us to say something along the lines of, “We’re confident that this child is 82 centimeters tall, give or take 8 centimeters.”
Now, you may argue that 8 centimeters isn’t all that much, but if you’re buying a car seat or deciding who gets to ride a roller coaster, it could be the difference between life and death. Measurement precision matters.
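For readers who like to see the arithmetic, here is a minimal sketch of one common textbook way to put a range around a test score, using the standard error of measurement from classical test theory. It is only an illustration, not necessarily the method any particular test developer uses. The function name `score_interval` and the example numbers (a score of 500, a standard deviation of 100, a reliability of 0.91) are hypothetical and chosen purely to keep the math simple.

```python
import math


def score_interval(score, sd, reliability, z=1.96):
    """Return (low, high) for a range around an observed test score.

    Assumes the classical test theory formula for the standard error
    of measurement: SEM = sd * sqrt(1 - reliability).
    z = 1.96 corresponds to a roughly 95% interval.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)
    margin = z * sem
    return score - margin, score + margin


# Hypothetical test: scores have a standard deviation of 100 and the
# test's reliability is 0.91. These numbers are made up for illustration.
low, high = score_interval(score=500, sd=100, reliability=0.91)
print(f"Observed score 500, likely range {low:.1f} to {high:.1f}")
```

With these made-up numbers, a score of 500 comes with a range of roughly 59 points on either side. Lower reliability or a wider spread of scores makes the range larger still.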
The more imprecise our measurements are — the bigger the confidence intervals around them — the more careful we need to be about the kinds of decisions we make with them. When it comes to educational and psychological assessment, I think we’re far too careless. Too many people who buy and use assessments don’t know enough about statistics to make well-informed assessment decisions.
Fortunately, I believe we can remedy this! It seems to me that the best place to begin is with confidence, so in the next article in this series, I’m going to share a super-easy way to figure out how much confidence you can have in any test’s scores.