Posts Tagged educational testing

The limitations of testing

It is important for those of us who use assessments to ensure that they (1) measure what we say they measure, (2) measure it reliably enough to justify claimed distinctions between and within persons, and (3) are used responsibly. It is relatively easy for testing experts to create assessments that are adequately reliable (2) for individual assessment, and although it is more difficult to show that these tests measure the construct of interest (1), there are reasonable methods for showing that an assessment meets this standard. However, it is more difficult to ensure that assessments are used responsibly (3).

Few consumers of tests are aware of their inherent limitations. Even the best tests, those that are highly reliable and measure what they are supposed to measure, provide only a limited amount of information. This is true of all measures. The more we home in on a measurable dimension (in other words, the greater our precision becomes), the narrower the construct becomes. Time, weight, height, and distance are all extremely narrow constructs. This means that they provide a very specific piece of information extremely well. When we use a ruler, we can have great confidence in the measurement we make, down to very small lengths (depending on the ruler, of course). No one doubts the great advantages of this kind of precision. But we can’t learn anything else about the measured object. Its length usually cannot tell us what the object is, how it is shaped, its color, its use, its weight, how it feels, how attractive it is, or how useful it is. We only know how long it is. To provide an accurate account of the thing that was measured, we need to know many more things about it, and we need to construct a narrative that brings these things together in a meaningful way.

A really good psychological measure is similar. The LAS (Lectical Assessment System), for example, is designed to go to the heart of development, stripping away everything that does not contribute to the pure developmental “height” of a given performance. Without knowledge of many other things—such as the ways of thinking that are generally associated with this “height” in a particular domain, the specific ideas that are associated with this particular performance, information from other performances on other measures, qualitative observations, and good clinical judgment—we cannot construct a terribly useful narrative.

And this brings me to my final point: A formal measure, no matter how great it is, should always be employed by a knowledgeable mentor, clinician, teacher, consultant, or coach as a single item of information about a given client that may or may not provide useful insights into relevant needs or capabilities. Consider this relatively simple example: a given 2-year-old may be tall for his age, but if he is somewhat underweight for his age, the latter measure may seem more important. However, if he has a broken arm, neither measure may loom large, at least until the bone is set. Once the arm is safely in a cast, all three pieces of information (weight, height, and broken arm) may contribute to a clinical diagnosis that would have been difficult to make if any one of them were missing.

It is my hope that the educational community will choose to adopt high standards for measurement, then put measurement in its place—alongside good clinical judgment, reflective life experience, qualitative observations, and honest feedback from trusted others.

Promoting development

There is a vast literature exploring ways to promote development. Much of this literature focuses on speeding up development; some of it focuses on optimizing development. Although both approaches are intended to support development, there is evidence that approaches focused on optimizing development are likely to do a better job. This is because development involves two intertwined processes, differentiation (broadening and deepening knowledge) and integration. In plain(er) English, you get more adequate integrations at each level if you accomplish rich differentiation at the prior level.

When we code an assessment, we pay close attention to the degree to which the test-taker elaborates each of the sub-skills it targets. In our personal feedback, we note areas of strength and areas that appear to require further growth. The basic idea is to bring all of the sub-skills up to an optimal level of elaboration to support the emergence of next-level integrations.

Most of the readings we suggest are targeted one to two phases (1/4 to 1/2 of a level) above the level of a given performance. This practice has been shown to provide the ideal level of challenge (scaffolding) for optimal growth. We also suggest activities like engaging in discourse with peers, journaling, cultivating a habit of reflection, and improving metacognitive skills, all of which provide support for growth.

We do not teach people to think at higher levels. Higher levels of performance emerge when knowledge is adequately elaborated and the environment supports higher levels of thinking and performance. We focus on helping people to think better at their current level and challenging them to elaborate their current knowledge and skills—including the not-so-sexy nuts-and-bolts knowledge required for success in any context.

What is a developmental assessment?

A developmental assessment is a test of knowledge and thinking that is based on extensive research into how students come to learn specific concepts and skills over time. All good developmental assessments require test-takers to show their thinking by making written or oral arguments in support of their judgments. Developmental assessments are less concerned with “right” answers and more concerned with how students use their knowledge and thinking skills to solve problems. A good developmental assessment should be educative in the sense that taking it is a learning experience in its own right, and each score is accompanied by feedback that tells students what they are most likely to benefit from learning next.

A good test

In this post, I explore a way of thinking about testing that would lead to the design of tests that are very different from most of the tests students take today.

Two propositions, an observation, and a third proposition:

Proposition 1. Because adults who do not enjoy learning are at a severe disadvantage in a rapidly changing world, an educational system should do everything possible to nurture children’s inborn love of learning.

Proposition 2. In K-12, the specific content of a curriculum is not as important as the development of broadly applicable skills for learning, reasoning, communicating, and participating in a civil society. (The content of the curriculum would be chosen to support the development of these skills and could—perhaps should—differ from classroom to classroom.)

Observation. Testing tends to drive instruction.

Proposition 3. Consequently, tests should evaluate relevant skills and be employed in ways that support students’ natural love of learning.

Given these propositions, here is my favorite definition of a “good test.”

A good test is part of the conversation between a “student” and a “teacher” that tells the teacher what the student is most likely to benefit from learning next.

I’ll unpack this definition and show how it relates to the propositions listed above.

Anyone who has carefully observed an infant in pursuit of knowledge will understand the conversational nature of learning. A parent holds out a shiny spoon and an infant’s arms wave wildly. Her hand makes contact with the spoon and a message is sent to her brain, “Something interesting happened!” The next day, her arm movements are a little less random. She makes contact several times, feeling the same sense of satisfaction. Her parents laugh with delight. She coos. In this way, her physical and social environments provide immediate feedback each time she succeeds (or fails). Over time, the infant uses this information to learn how to reach out and touch the spoon at will. Of course, she is not satisfied with merely touching the spoon, and, through the same kind of trial and error, supplemented with a little support from Mom and Dad, she soon learns to bring the spoon to her mouth. And the conversation goes on.

Every attempt to touch the spoon is a kind of test. Every success is an affirmation that the strategy just employed was an effective strategy, but the story does not end here. In her quest to master her environment, the infant keeps moving the bar. Once she can touch the spoon at will, doing so is no longer satisfying. She moves on to the next skill (holding the spoon), then the next (bringing it to her mouth), and so on. Having observed this process hundreds of times, I strongly suspect that a sense of mastery is the intrinsic reward that motivates learning, while conversation, including both social and physical interactions, acts as the fuel.

Conversation

A good educational test should have the same quality of conversation, in the form of performance and feedback, that is illustrated in the example above. In an ideal testing situation, the student shows a teacher how he or she understands new concepts and skills, then the teacher uses this information to determine what comes next.

Part of the conversation

However, a good test is part of the conversation—not the entire conversation. No single test (or kind of conversation) will do. For example, the infant reaches for the spoon because she finds it interesting, and she must be interested enough to reach out many dozens of times before she can grasp an object at will. Good parents recognize that she expresses more sustained interest if they provide her with a number of different objects—and don’t try to force her to manipulate objects when she would rather be nursing or sleeping. Each act is a test embedded in a long conversation that is further embedded in a broader context.

What comes next?

In the story, I suggest that the spoon must be both interesting and within an infant’s reach before it can become part of an ongoing conversation. In the same way, a good test should both be engaging and within a student’s reach in order to play its role in the conversation between student and teacher.

An engaging test of appropriate skills can tell us how a student understands what he or she is learning, but this knowledge, by itself, does not tell the teacher (or the student) what comes next. To find out, researchers must study how particular concepts and skills are learned over time. Only when we have done a good job describing how particular skills and concepts are learned can we predict what a student is most likely to benefit from learning next.

So, a good test must not only capture the nature of a particular student’s understanding; it must also be connected to knowledge about the pathways through which students come to understand the concepts and skills of the knowledge area it targets.

Back to conversation

I argue above that, in infancy, a sense of mastery is the intrinsic reward that motivates learning, while conversation is the fuel. If conversation is the fuel, tests that do a good job serving the conversational function I outline here are likely to fuel students’ natural pursuit of mastery and a lifelong love of learning.

Later: But what about accountability?
