Archive for category Lectical Assessment System
Maintaining inter-rater agreement
Posted by Theo in Lectical Assessment System, cognitive development, measurement on April 24, 2010
How we maintain inter-rater agreement and ensure high reliability at DTS/DiscoTest
First, we design assessments with 5-7 essay questions, partly because this many questions are needed to reach a level of reliability that lets us distinguish 4 phases per lectical level. This corresponds to a corrected alpha of .95 or greater.
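For readers who want to see how an alpha coefficient is computed, here is a minimal sketch of the standard (uncorrected) Cronbach's alpha. The post does not say which correction is applied, so this is not necessarily the exact statistic reported above. The function name and the sample data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Standard Cronbach's alpha.

    scores: one row per test-taker, each row holding that person's
    score on each of the k questions (hypothetical layout).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 questions and 2 test-takers")

    # Sample variance of each question across test-takers.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]

    # Sample variance of each test-taker's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up data: three test-takers who each score identically on all
# three questions, so the questions agree perfectly.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
```

With perfectly consistent questions, as in the made-up data, alpha comes out at 1. Real assessments fall below that, and the .95 target mentioned above would be checked against the resulting value.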
Second, we engage in continuous learning. Certified Analysts and trainees attend mandatory weekly scoring meetings (called scoring circles) where they discuss scoring and review challenging cases.
Third, when we begin working with data from a new subject area, the scoring circle always examines a diverse sample of protocols before starting to score in earnest. Then, when we begin scoring a new assessment, two Certified Analysts score every performance until agreement rates are consistently at or above 85% within 1/4 of a level.
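The 85% criterion can be checked with a few lines of code. This is a minimal sketch that assumes scores are recorded in units of lectical levels (so one quarter of a level is 0.25). The function names and sample scores are hypothetical.

```python
def agreement_rate(first_scores, second_scores, tolerance=0.25):
    """Fraction of performances on which two analysts' scores fall
    within `tolerance` levels of each other."""
    if len(first_scores) != len(second_scores):
        raise ValueError("each performance needs a score from both analysts")
    if not first_scores:
        raise ValueError("no scores to compare")

    # The small epsilon guards against floating-point rounding.
    within = sum(
        1
        for a, b in zip(first_scores, second_scores)
        if abs(a - b) <= tolerance + 1e-9
    )
    return within / len(first_scores)


def meets_agreement_target(rate, target=0.85):
    """True when the agreement rate is at or above the target."""
    return rate >= target


# Made-up scores for four performances.
analyst_a = [10.0, 10.25, 10.5, 11.0]
analyst_b = [10.25, 10.25, 11.0, 11.0]
rate = agreement_rate(analyst_a, analyst_b)
```

In this made-up sample the analysts agree on three of four performances, so the rate is 0.75 and double scoring would continue.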
Fourth, we second-score a percentage of all performances, some selected at random and some selected because the first analyst lacks confidence in his or her score.
- 5%-10% of all assessments, selected at random, are second-scored by a blind analyst (a higher percentage on newer assessments or when the rate of inter-rater agreement is unacceptable).
- A second, blind scorer is required to score an assessment any time the first scorer’s confidence level is below the level we call “confident”.
When the scores of the first and second scorers differ by more than 1 phase, the two scorers must reconcile through discussion. If they cannot reconcile, they must consult a third Certified Analyst.
Confidence levels
4 = very confident: exemplary, prototypical
3 = confident: no guesswork, not too much variation, no more than 2 responses where scorer wavers, no lack of coherence, no language problems, adequate explanation, no suspicion of plagiarism, not idiosyncratic
2 = less than confident: guesswork, too much variation, more than 2 responses where scorer wavers, lack of coherence, language problems, inadequate explanation, suspicion of plagiarism, idiosyncratic
1 = not confident at all: unscorable or almost unscorable, very idiosyncratic, very incoherent
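The second-scoring and reconciliation rules, together with the confidence scale above, can be sketched in code. This is an illustration under stated assumptions, not a description of the DTS software: scores are assumed to be stored as whole numbers of phases, the random sampling rate defaults to the lower bound of the 5%-10% range, and all names are hypothetical.

```python
import random
from enum import IntEnum


class Confidence(IntEnum):
    """The four confidence levels listed above."""
    NOT_CONFIDENT = 1
    LESS_THAN_CONFIDENT = 2
    CONFIDENT = 3
    VERY_CONFIDENT = 4


def needs_second_score(confidence, random_rate=0.05, rng=random):
    """Decide whether a performance goes to a second, blind scorer.

    Anything scored below "confident" is always second-scored;
    the rest are sampled at random with probability `random_rate`.
    """
    if confidence < Confidence.CONFIDENT:
        return True
    return rng.random() < random_rate


def needs_reconciliation(first_phase, second_phase):
    """True when two scores differ by more than 1 phase.

    Assumes each score is an integer count of phases.
    """
    return abs(first_phase - second_phase) > 1
```

For example, `needs_second_score(Confidence.LESS_THAN_CONFIDENT)` is always true, while `needs_reconciliation(41, 42)` is false because the two scores are only one phase apart.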
The limitations of testing
Posted by Theo in Lectical Assessment System, educational testing, measurement, testing in general on March 15, 2010
It is important for those of us who use assessments to ensure that they (1) measure what we say they measure, (2) measure it reliably enough to justify claimed distinctions between and within persons, and (3) are used responsibly. It is relatively easy for testing experts to create assessments that are adequately reliable (2) for individual assessment, and although it is more difficult to show that these tests measure the construct of interest (1), there are reasonable methods for showing that an assessment meets this standard. However, it is more difficult to ensure that assessments are used responsibly (3).
Few consumers of tests are aware of their inherent limitations. Even the best tests, those that are highly reliable and measure what they are supposed to measure, provide only a limited amount of information. This is true of all measures. The more we home in on a measurable dimension (in other words, the greater our precision becomes), the narrower the construct becomes. Time, weight, height, and distance are all extremely narrow constructs. This means that they provide a very specific piece of information extremely well. When we use a ruler, we can have great confidence in the measurement we make, down to very small lengths (depending on the ruler, of course). No one doubts the great advantages of this kind of precision. But we can’t learn anything else about the measured object. Its length usually cannot tell us what the object is, how it is shaped, its color, its use, its weight, how it feels, how attractive it is, or how useful it is. We only know how long it is. To provide an accurate account of the thing that was measured, we need to know many more things about it, and we need to construct a narrative that brings these things together in a meaningful way.
A really good psychological measure is similar. The LAS (Lectical Assessment System), for example, is designed to go to the heart of development, stripping away everything that does not contribute to the pure developmental “height” of a given performance. Without knowledge of many other things—such as the ways of thinking that are generally associated with this “height” in a particular domain, the specific ideas that are associated with this particular performance, information from other performances on other measures, qualitative observations, and good clinical judgment—we cannot construct a terribly useful narrative.
And this brings me to my final point: A formal measure, no matter how great it is, should always be employed by a knowledgeable mentor, clinician, teacher, consultant, or coach as a single item of information about a given client that may or may not provide useful insights into relevant needs or capabilities. Consider this relatively simple example: a given 2-year-old may be tall for his age, but if he is somewhat underweight for his age, the latter measure may seem more important. However, if he has a broken arm, neither measure may loom large, at least until the bone is set. Once the arm is safely in a cast, all three pieces of information (weight, height, and broken arm) may contribute to a clinical diagnosis that would have been difficult to make without any one of them.
It is my hope that the educational community will choose to adopt high standards for measurement, then put measurement in its place—alongside good clinical judgment, reflective life experience, qualitative observations, and honest feedback from trusted others.
About measurement
Posted by Theo in Lectical Assessment System, cognitive development, measurement on July 29, 2009
The story of how measurement permits scientific advance can be illustrated through any number of examples. One such example is the measurement of temperature and its effects on our understanding of the molecular structure of lead and other elemental substances.
The tale begins with an assortment of semi-mythical early scientists, who agreed in their observations that lead only melts when it is very hot—much hotter than the temperature at which ice melts, and quite a bit cooler than the temperature at which iron melts. These observations, made repeatedly, resulted in the hypothesis that lead melts at a particular temperature.
To test this theory it was necessary to develop a standard for measuring temperature. A variety of early thermometers were developed and implemented. Partly because these early temperature-measuring devices were poorly calibrated, and partly because different temperature-measuring devices employed different scales, the temperature at which lead melted seemed to vary from device to device and context to context.
Scientists divided into a number of ‘camps’. One group argued that there were multiple pathways toward melting, which explained why the melting seemed to occur at different temperatures. Another group argued that the melting of lead could not be understood apart from the context in which the melting occurs. Only when a measure of temperature had been adequately developed and widely accepted did it become possible to observe that lead consistently melts at about 327 °C.
Armed with this knowledge, scientists asked what it is about lead that causes it to melt at this particular temperature. They then developed hypotheses about the factors contributing to this phenomenon, observing that changes in altitude or air pressure seemed to result in small differences in its melting temperature. So, context did seem to play a role! In order to observe these differences more accurately, the measurement of temperature was further refined. The resulting observations provided information that ultimately contributed to an understanding of lead’s and other elements’ molecular structure.
While parts of this story are fictional, it is true that the thermometer has greatly contributed to our understanding of the properties of lead. Interestingly, the thermometer, like all other measures, emerged from what were originally qualitative observations about the effects of different amounts of heat that were quantified over time. The value of the thermometer, as we all know, extends far beyond its use as a measure of the melting temperature of lead. The thermometer is a measure of temperature in general, meaning that it can be employed to measure temperature in an almost limitless range of substances and contexts. It is this generality, in the end, that makes it possible to investigate the impact of context on the melting temperature of a substance, or to compare the relative melting temperatures of a range of elemental substances. This generality (or context-independence) is one of the primary features of a good measure.
Good measurement requires (1) the identification of a unidimensional, content- and context-independent trait (temperature, length, time); (2) a system for assessing the amount of the trait; (3) determinations of the reliability and validity of the assessments; and finally (4) the calibration of a measure. A good thermometer has all of the qualities of a good measure. It is a well-calibrated instrument that can be employed to accurately and reliably measure a general, unidimensional trait across a wide range of contexts.
It was this perspective on measurement that first inspired me to try to find a good general measure of the developmental dimension. To read more about how this way of thinking relates to the Lectical Assessment System (LAS), read About Measurement on the DTS site. Pay special attention to the list of things we can do with the LAS.
Integrative complexity and the LAS
Posted by Theo in Lectical Assessment System, cognitive development on July 15, 2009
Suedfeld and Tetlock’s Integrative Complexity Scale is one of a number of developmental scales, most of them informed by Jean Piaget’s cognitive developmental theory, that subscribe to the notion of hierarchical integration. Piagetian and neo-Piagetian theorists view development as a process of differentiation (increasing knowledge) and integration (organizing knowledge). Rather than viewing learning as an additive process in which we simply accumulate bits of knowledge over time, integrative theories propose that learning is an active process through which we organize our knowledge in particular ways, depending on where we are in our development. Moving from one developmental level to another involves a reorganization of our knowledge that translates into a new way of thinking.
For example, when most 6-year-olds think about lying, they are likely to think about it in terms of a single consequence: keeping out of trouble, getting into trouble, or making Dad sad. An 8-year-old can think about lying in terms of multiple possible consequences, such as getting in trouble and keeping out of trouble, which makes it possible to decide which outcome is more likely given past experience. You can view a more detailed description of this process in an online article, The Lectical Assessment System.
Suedfeld and Tetlock’s Integrative Complexity Scoring System (ICSS), like the Lectical Assessment System (LAS) and the General Hierarchical Complexity Scoring System (HCSS), is a content-independent scoring system that can be used to score the level of integrative complexity in a wide range of texts. What differs between these scoring systems is the scoring rules. Here, I discuss the difference between the scoring rules of the LAS and the ICSS.
The LAS goes to the heart of differentiation and integration by asking analysts to examine the way arguments are explicitly structured (single elements, linear arguments, or systems) and the way the meanings of their elements are implicitly structured (single elements, linear arguments, or systems). We call this core structure. The LAS has been subjected to a number of psychometric studies and has been shown to be a valid and reliable measure of the cognitive-developmental dimension, reliably (in the statistical sense) distinguishing 20 developmental phases between age 5 and the highest levels of adulthood.
Domain-based developmental assessment systems generally target conceptual content and aspects of surface structure. The ICSS relies primarily upon indicators of surface structure. In other words, instead of directly examining core structures, the developers of this system focus on a number of indicators that point to these core structures, including things like perspective, compartmentalization, setting up “straw men”, inclusion/exclusion rules, conflict avoidance, recognizing “exceptions to the rule”, probability statements, etc. The reliability of this assessment is generally too low to justify its clinical use (i.e., to provide a score for an individual), and some forms of the assessment do not appear to meet the reliability requirements for group studies (see Reliability 2: How high should it be?).
