How much confidence should you have in your test score?
When you measure the height of a table, you can be pretty confident that the measurement you make is correct. Rulers are well-calibrated instruments that we can use with great confidence if we use them correctly. Scores on tests are not like points on a ruler. They always have bands around them called confidence intervals. A confidence interval is the range around your score in which your "true ability" is likely to reside. Usually, a confidence interval is built so that there is somewhere between a 70% and a 95% chance that your true score falls inside the band.
The overall level of confidence we can have in a test score is represented in the test's statistical reliability. (A reliability of 1 is perfect.) As a general rule, no test that is used to evaluate individuals (as opposed to group trends) should have a statistical reliability below .85. And the higher the stakes, the higher the reliability should be. For example, the SAT and GRE have reliabilities in the .95 range.
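The bridge from reliability to the width of those bands is the standard error of measurement, a standard psychometric quantity: SEM = SD × √(1 − reliability). Here is a minimal sketch; the formula is standard, but the score distribution (SD of 100, as on an SAT-like scale) is purely illustrative:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Standard psychometric formula: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test scaled with a standard deviation of 100.
print(round(standard_error_of_measurement(100, 0.85), 1))  # reliability at the usual floor -> 38.7
print(round(standard_error_of_measurement(100, 0.95), 1))  # high-stakes reliability -> 22.4
```

Notice that even at the .85 floor, an individual score carries an uncertainty of nearly 40 points on this hypothetical scale; pushing reliability to .95 cuts that roughly in half.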
There is a close relation between confidence intervals and reliability. If you place a series of 95% confidence intervals end-to-end along the scale of a really good standardized test—imagine putting pieces of string end to end along the length of a ruler—you won't be able to fit more than 4 to 6 of them on the scale without allowing them to overlap. This means that the test can distinguish only 4 to 6 truly different levels of performance.
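The string-along-the-ruler argument can be made concrete. A 95% confidence interval spans roughly ±1.96 standard errors of measurement, so the number of non-overlapping intervals that fit on the scale is just the scale range divided by one interval's full width. A sketch with illustrative numbers (a hypothetical 200-800 scale with an SD of 100, not data about any particular test):

```python
import math

def distinct_levels(scale_range: float, sd: float, reliability: float, z: float = 1.96) -> int:
    """Count the non-overlapping 95% confidence intervals that fit on the scale."""
    sem = sd * math.sqrt(1 - reliability)      # standard error of measurement
    interval_width = 2 * z * sem               # full width of one 95% interval
    return math.floor(scale_range / interval_width)

# Hypothetical 200-800 scale (range 600) with SD 100.
print(distinct_levels(600, 100, 0.90))  # -> 4
print(distinct_levels(600, 100, 0.95))  # -> 6
```

Under these assumptions, a test with reliability between .90 and .95 supports only 4 to 6 genuinely distinguishable levels, which is exactly the range described above.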
So, why do scores get reported on scales that span more than 4 to 6 levels?