Statistics for all: Estimating confidence

In the first post in this series, I promised to share a quick and dirty trick for determining how much confidence you can have in a test score. I will. But first, I want to show you a bit more about what estimating confidence means when it comes to educational and psychological tests.

Let’s start with a look at how test scores are usually reported. The figure below shows three scores, one at level 8, one at level 6, and one at level 4. Looking at this figure, most of us would be inclined to assume that these scores are what they seem to be—precise indicators of the level of a trait or skill.

How test scores are usually presented

But this is not the case. Test scores are fuzzy. They’re best understood as ranges rather than as points on a ruler. In other words, test scores are always surrounded by confidence intervals. A person’s true score is likely to fall somewhere in the range described by the confidence interval around a test score.

In order to figure out how fuzzy a test score actually is, you need one thing—an indicator of statistical reliability. Most of the time, this is something called Cronbach’s Alpha. All good test developers publish information about the statistical reliability of their measures, ideally in refereed academic journals with easy-to-find links on their websites! If a test developer won’t provide you with information about Alpha (or its equivalent) for each score reported on a test, it’s best to move on.

The higher the reliability (usually Alpha), the smaller the confidence interval. And the smaller the confidence interval, the more confidence you can have in a test score.

The table below will help to clarify why it is important to know Alpha (or its equivalent). It shows the relationship between Alpha (which can range from 0 to 1.0) and the number of distinct levels (strata) a test can be said to have. For example, an assessment with a reliability of .80 has 3 strata, whereas an assessment with a reliability of .94 has 5. (A short sketch after the table shows how to estimate strata for reliabilities that aren’t listed.)

Reliability Strata
.70 2
.80 3
.90 4
.94 5
.95 6
.96 7
.97 8
.98 9
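
For reliabilities that don’t appear in the table, a commonly used approximation from the Rasch measurement tradition (see the Wright reference at the end of this article) is: separation G = sqrt(Alpha / (1 − Alpha)), and strata = (4G + 1) / 3. The minimal Python sketch below assumes this formula; rounded results land close to, though not always exactly on, the values in the table, which uses its own rounding conventions.

```python
import math

def strata(alpha: float) -> float:
    """Approximate number of distinct strata for a given reliability (alpha).

    Uses the separation-based approximation from the Rasch measurement
    literature: separation G = sqrt(alpha / (1 - alpha)), and
    strata = (4 * G + 1) / 3.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be at least 0 and less than 1")
    separation = math.sqrt(alpha / (1.0 - alpha))
    return (4.0 * separation + 1.0) / 3.0

for a in (0.70, 0.80, 0.85, 0.90, 0.95, 0.98):
    print(f"alpha = {a:.2f} -> about {strata(a):.1f} strata")
```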

Strata have direct implications for the confidence we can have in a person’s score on a given assessment, because they tell us about the range within which a person’s true score would fall—its confidence interval—given the score awarded.

Imagine that you have just taken a test of emotional intelligence with a score range of 1 to 10 and a reliability of .95. The number of strata into which an assessment with a reliability of .95 can be divided is about 6, which means that each stratum spans about 1.7 points on the 10 point scale (10 divided by 6). If your score on this test was 8, your true score would likely be somewhere between about 7.2 and 8.8—your score’s confidence interval.
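
If you’d like to check this arithmetic yourself, here is a minimal Python sketch of the recipe used in this article: the scale width divided by the number of strata gives the width of one stratum, and the true-score range is the observed score plus or minus half a stratum. The function name and the clipping at the ends of the scale are my own simplifications; small differences from the ranges quoted in this article are just rounding, and (as noted in the footnote below) real intervals behave a little differently near the extremes of the scale.

```python
def true_score_range(score, n_strata, scale_min=1, scale_max=10):
    """Approximate true-score range (confidence interval) around an observed score.

    Follows the article's recipe: the scale is divided into n_strata bands,
    and the true score is taken to lie within half a band on either side of
    the observed score, clipped here to the ends of the scale for simplicity.
    """
    scale_width = scale_max - scale_min + 1   # the "10 point scale" in the article
    band = scale_width / n_strata             # width of one stratum
    low = max(scale_min, score - band / 2)
    high = min(scale_max, score + band / 2)
    return low, high

# The three worked examples from this article: Alphas of .95, .85, and .75.
for alpha, n in ((0.95, 6), (0.85, 3.4), (0.75, 2.2)):
    low, high = true_score_range(8, n)
    print(f"alpha {alpha}: score 8 -> true score roughly {low:.1f} to {high:.1f}")
```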

The figure below shows the true score ranges for three test takers, CB, RM, and PR. The fact that these ranges don’t overlap gives us confidence that the emotional intelligence of these test-takers is actually different**.

If these scores were closer together, their confidence intervals would overlap. And if that were the case—for example, if you were comparing two individuals with scores of 8 and 8.5—it would not be correct to say the scores were different from one another. In fact, it would be incorrect for a hiring manager to consider the difference between a score of 8 and a score of 8.5 in making a choice between two job candidates.

By the way, tests with Alphas in the range of .94 or higher are considered suitable for high-stakes use (assuming that they meet other essential validity requirements). What you see in the figure below is about as good as it gets in educational and psychological assessment.

estimating confidence when alpha is .95

Most assessments used in organizations do not have Alphas that are anywhere near .95. Some of the better assessments have Alphas as high as .85. Let’s take a look at what an Alpha at this level does to confidence intervals.

If the test you have taken has a score range of 1–10 and an Alpha (reliability) of .85, the number of strata into which this assessment can be divided is about 3.4, which means that each stratum spans about 2.9 points (10 divided by 3.4) on the 10 point scale. In this case, if you receive a score of 8, your true score is likely to fall within the range of 6.6 to 9.5*.

In the figure below, note that CB’s true score range now overlaps RM’s true score range and RM’s true score range overlaps PR’s true score range. This means we cannot say—with confidence—that CB’s score is different from RM’s score, or that RM’s score is different from PR’s score.

Assessments with Alphas in the .85 range are suitable for classroom use or low-stakes contexts. Yet, every day, schools and businesses use tests with reliabilities in the .85 range to make high stakes decisions—such as who will be selected for advancement or promotion. And this is often done in a way that would exclude RM (yellow circle) even though his confidence interval overlaps CB’s (teal circle) confidence interval.

estimating confidence when alpha is .85

Many tests used in organizations have Alphas in the .75 range. If the test you have taken has a score range of 1–10 and an Alpha of .75, the number of strata into which this assessment can be divided is about 2.2, which means that each stratum spans about 4.5 points on the 10 point scale. In this case, if you receive a score of 8, your true score is likely to fall within the range of 6–10*.

As shown in the figure below, scores would now have to differ by at least 4.5 points in order for us to distinguish between two people. CB’s and PR’s scores are different, but RM’s score is uninterpretable.

Tests or sub-scales with alphas in the .75 range are considered suitable for research purposes. Yet, sad to say, schools and businesses now use tests with scales or sub-scales that have Alphas in or below the .75 range, treating these scores as if they provide useful information, when in most cases the scores—like RM’s—are uninterpretable.

estimating confidence when alpha is .75

If your current test providers are not reporting true score ranges (confidence intervals), ask for them. If they only provide Alphas (reliability statistics) you can use the table and figures in this article to calculate true score ranges for yourself. If you don’t want to do the math, no problem. You can use the figures above to get a feel for how precise a score is.

Statistical reliability is only one of the ways in which assessments should be evaluated. Test developers should also ask how well an assessment measures what it is intended to measure. And those who use an assessment should ask whether or not what it measures is relevant or important. I’ll be sharing some tricks for looking at these forms of validity in future articles.

Related Articles

Statistics for all: What the heck is confidence?


*This range will be wider at the top and bottom of the scoring range and a bit narrower in the middle of the range.

**It doesn’t tell us if emotional intelligence is important. That is determined in other ways.


References

Guilford, J. P. (1965). Fundamental statistics in psychology and education (4th ed.). New York: McGraw-Hill.

Kubiszyn, T., & Borich, G. (1993). Educational testing and measurement. New York: Harper Collins.

Wright, B. D. (1996). Reliability and separation. Rasch Measurement Transactions, 9, 472.

 


Dear Sir Ken Robinson

 

This morning, I received a newsletter from Sir Ken Robinson, a popular motivational speaker who focuses on education. There was a return email address, so I wrote to him. Here's what I wrote:

Dear Sir Ken,

"I love your message. I'm one of the worker bees who's trying to leverage the kind of changes you envision.

After 20+ years of hard work, my colleagues and I have reinvented educational assessment. No multiple choice. No high stakes. Our focus is on assessment for learning—supporting students in learning joyfully and deeply in a way that facilitates skills for learning, thinking, inquiring, relating and otherwise navigating a complex world. Our assessments are scalable and standardized, but they do not homogenize. They are grounded in a deep study of the many pathways through which students learn key skills and concepts. We're documenting, in exquisite (some would say insane) detail, how concepts and skills develop over time so we can gain insight into learners' knowledge networks. We don't ask about correctness. We ask about understanding and competence and how they develop over time. And we help teachers meet students "where they're at."

We've accumulated a strong base of evidence to support these claims. But now that we're ready to scale, we're running up against hostility toward all standardized assessment. It's difficult to get to the point where we can even have a conversation with our pedagogical allies. Ouch!

Lectica is organized as a nonprofit so we can guarantee that the underprivileged are served first. We plan to offer subscriptions to our assessments (learning tools) without charge to individual teachers everywhere. 

We've kept our heads down as we've developed our methods and technology. Now we're scaling and want to be seen. We know we're part of the solution to today's educational crisis—perhaps a very big part of the solution. I'm hoping you'd like to learn more."

My email was returned with this message: "The email account that you tried to reach does not exist." How frustrating.

So, I thought I'd pen this post and ask my friends and colleagues to help me get access to Sir Ken's ear. If you know him, please forward this message. I'm certain he'll be interested in what we're doing for learning and development. Where are you, Sir Ken Robinson? Can you hear me? Are you out there?


From Piaget to Dawson: The evolution of adult developmental metrics

I've just added a new video about the evolution of adult developmental metrics to YouTube and LecticaLive. It traces the evolutionary history of Lectica's developmental model and metric.

If you are curious about the origins of our work, this video is a great place to start. If you'd like to see the reference list for this video, view it on LecticaLive.

 

 


Learning how to learn or learning how to pass tests?

I've been auditing a very popular 4.5-star Coursera course called "Learning how to learn." It uses all of the latest research to help people improve their learning skills. Yet, even though the lectures in the course are interesting and the research behind the course appears to be sound, I find it difficult to agree that it is a course that helps people learn how to learn.

First, the tests used to determine how well participants have built the learning skills described in this course are actually tests of how well they have learned vocabulary and definitions. As far as I can tell, no skills are involved other than the ability to recall course content. This is problematic. The assumption that learning vocabulary and definitions builds skill is unwarranted. I believe we all know this. Who has not had the experience of learning something well enough to pass a test only to forget most of what they had learned shortly thereafter?

Second, the content of the tests at the end of the videos isn't particularly relevant to the stated intention of the course. These tests require remembering (or scrolling back to) facts like "Many new synapses are formed on dendrites." We do not need to learn this to become effective learners. The test item for which this is the correct answer is focused on an aspect of how learning works rather than how to learn. And although understanding how learning works might be a step toward learning how to learn, answering this question correctly doesn't tell us anything about what the participant actually understands.

Third, if the course developers had used tests of skill—tests that asked participants to show off how effectively they could apply the described techniques—we would be able to ask about the extent to which the course helps participants learn how to learn. Instead, the only way we have to evaluate the effectiveness of the course is through participant ratings and comments—how much people like it. I'm not suggesting that liking a course is unimportant, but it's not a good way to evaluate its effectiveness.

Fourth, the course seems to be primarily concerned with fostering a kind of learning that helps people do better on tests of correctness. The underlying and unstated assumption seems to be that if you can do better on these tests, you have learned better. This assumption flies in the face of several decades of educational research, including our own [for example, 1, 2, 3]. Correctness is not adequate evidence of understanding or real-world skill. If we want to know how well people understand new knowledge, we must observe how they apply this knowledge in real-world contexts. If we want to evaluate their level of skill, we must observe how well they apply the skill in real-world contexts. In other words, any course—especially a course in learning how to learn—should build usable skills that have value beyond the act of passing a test of correctness.

Fifth, the research behind this course can help us understand how learning works. At Lectica, we've used the very same information as part of the basis for our learning model, VCoL+7. But instead of using this knowledge to support the status quo—an educational system that privileges correctness over understanding and skill—we're using it to build learning tools designed to ensure that learning in school goes beyond correctness to build deep understanding and robust skill.

For the vast majority of people, schooling is not an end in itself. It is preparation for life—and for tomorrow's skills. It's time we held our educational institutions accountable for ensuring that students know how to learn more than correct answers. Wherever their lives take them, they will do better if equipped with understanding and skill. Correctness is not enough.

 


[1] FairTest; Mulholland, Quinn  (2015). The case against standardized testing. Harvard Political Review, May 14.

[2] Schwartz, M. S., Sadler, P. M., Sonnert, G. & Tai, R. H. (2009). Depth versus breadth: How content coverage in high school science courses relates to later success in college science coursework. Science Education, 93, 5, 798-826.

[3] Kontra, C., Goldin-Meadow, S., & Beilock, S. L. (2012). Embodied learning across the lifespan. Topics in Cognitive Science, 4, 4, 731–739.

 


Lectica’s story: long, rewarding, & still unfolding


Lectica's story started in Toronto in 1976…

Identifying the problem

During the '70s and '80s, I practiced midwifery. It was a great honor to be present at the births of over 500 babies and, in many cases, to follow them into childhood. Every single one of those babies was a joyful, driven, and effective "every moment" learner. Regardless of difficulty and pain, they all learned to walk, talk, interact with others, and manipulate many aspects of their environment. They needed few external rewards to build these skills—the excitement and suspense of striving seemed to be reward enough. I felt like I was observing the "life force" in action.

Unfortunately, as many of these children approached the third grade (age 8), I noticed something else—something deeply troubling. Many of the same children seemed to have lost much of this intrinsic drive to learn. For them, learning had become a chore motivated primarily by extrinsic rewards and punishments. Because this was happening primarily to children attending conventional schools (children receiving alternative instruction seemed to be exempt), it appeared that something about schooling was depriving many children of the fundamental human drive required to support a lifetime of learning and development—a drive that looked to me like a key source of happiness and fulfillment.

Understanding the problem

Following my midwifery career, I flirted briefly with a career in advertising, but by the early '90s I was back in school—in a Ph.D. program in U.C. Berkeley's Graduate School of Education—where I found myself observing the same pattern I'd observed as a midwife. Both the research and my own lab experience exposed the early loss of students' natural love of learning. My concern was only heightened by the newly emerging trend toward high stakes multiple choice testing, which my colleagues and I saw as a further threat to children's natural drive to learn.

Most of the people I've spoken to about this problem have agreed that it's a shame, but few have seen it as a problem that can be solved, and many have seen it as an inevitable consequence of either mass schooling or simple maturation. But I knew it was not inevitable. Children educated in a range of alternative environments did not appear to lose their drive to learn. Additionally, above-average students in conventional schools appeared to be more likely to retain their love of learning.

I set out to find out why—and ended up on a long journey toward a solution.

How learning works

First, I needed to understand how learning works. At Berkeley, I studied a wide variety of learning theories in several disciplines, including developmental theories, behavioral theories, and brain-based theories. I collected a large database of longitudinal interviews and submitted them to in-depth analysis, looked closely at the relation between testing and learning, and studied psychological measurement, all in the interest of finding a way to support children's growth while reinforcing their love of learning.

My dissertation—which won awards from both U.C. Berkeley and the American Psychological Association—focused on the development of people's conceptions of learning from age 5 through 85, and how this kind of knowledge could be used to measure and support learning. In 1998, I received $500,000 from the Spencer Foundation to further develop the methods designed for this research. Some of my areas of expertise are human learning and development, psychometrics, metacognition, moral education, and research methods.

In the simplest possible terms, what I learned in 5 years of graduate school is that the human brain is designed to drive learning, and that preserving that natural drive requires 5 ingredients:

  1. a safe environment that is rich in learning opportunities and healthy human interaction,
  2. a teacher who understands each child's interests and level of tolerance for failure,
  3. a mechanism for determining "what comes next"—what is just challenging enough to allow for success most of the time (but not all of the time),
  4. instant actionable feedback, and 
  5. the opportunity to integrate new knowledge or skills into each learner's existing knowledge network well enough to make it useable before pushing instruction to the next level. (We call this building a "robust knowledge network"—the essential foundation for future learning.)*

Identifying the solution

Once we understood what learning should look like, we needed to decide where to intervene. The answer, when it came, was a complete surprise. Understanding what comes next—something that can only be learned by measuring what a student understands now—was an integral part of the recipe for learning. This meant that testing—which we originally saw as an obstacle to robust learning—was actually the solution—but only if we could build tests that would free students to learn the way their brains are designed to learn. These tests would have to help teachers determine "what comes next" (ingredient 3) and provide instant actionable feedback (ingredient 4), while rewarding them for helping students build robust knowledge networks (ingredient 5).

Unfortunately, conventional standardized tests were focused on "correctness" rather than robust learning, and none of them were based on the study of how targeted concepts and skills develop over time. Moreover, they were designed not to support learning, but rather to make decisions about advancement or placement, based on how many correct answers students were able to provide relative to other students. Because this form of testing did not meet the requirements of our learning recipe, we'd have to start from scratch.

Developing the solution

We knew that our solution—reinventing educational testing to serve robust learning—would require many years of research. In fact, we would be committing to possible decades of effort without a guaranteed result. It was the vision of a future educational system in which all children retained their inborn drive for learning that ultimately compelled us to move forward. 

To reinvent educational testing, we needed to:

  1. make a deep study of precisely how children build particular knowledge and skills over time in a wide range of subject areas (so these tests could accurately identify "what comes next");
  2. make tests that determine how deeply students understand what they have learned—how well they can use it to address real-world issues or problems (requires that students show how they are thinking, not just what they know—which means written responses with explanations); and
  3. produce formative feedback and resources designed to foster "robust learning" (build robust knowledge networks).

Here's what we had to invent:

  1. A learning ruler (building on Commons [1998] and Fischer [2006]);
  2. A method for studying how students learn tested concepts and skills (refining the methods developed for my dissertation);
  3. A human scoring system for determining the level of understanding exhibited in students' written explanations (building upon Commons' and Fischer's methods, refining them until measurements were precise enough for use in educational contexts); and 
  4. An electronic scoring system, so feedback and resources could be delivered in real time.

It took over 20 years (1996–2016), but we did it! And while we were doing it, we conducted research. In fact, our assessments have been used in dozens of research projects, including a $25 million study of literacy conducted at Harvard, and numerous Ph.D. dissertations—with more on the way.

What we've learned

We've learned many things from this research. Here are some that took us by surprise:

  1. Students in schools that focus on building deep understanding graduate seniors that are up to 5 years ahead (on our learning ruler) of students in schools that focus on correctness (2.5 to 3 years after taking socioeconomic status into account).
  2. Students in schools that foster robust learning develop faster and continue to develop longer (into adulthood) than students in schools that focus on correctness.
  3. On average, students in schools that foster robust learning produce more coherent and persuasive arguments than students in schools that focus on correctness.
  4. On average, students in our inner-city schools, which are the schools most focused on correctness, stop developing (on our learning ruler) in grade 10. 
  5. The average student who graduates from a school that strongly focuses on correctness is likely, in adulthood, to (1) be unable to grasp the complexity and ambiguity of many common situations and problems, (2) lack the mental agility to adapt to changes in society and the workplace, and (3) dislike learning. 

From our perspective, these results point to an educational crisis that can best be addressed by allowing students to learn as their brains were designed to learn. Practically speaking, this means providing learners, parents, teachers, and schools with metrics that reward and support teaching that fosters robust learning. 

Where we are today

Lectica has created the only metrics that meet all of these requirements. Our mission is to foster greater individual happiness and fulfillment while preparing students to meet 21st century challenges. We do this by creating and delivering learning tools that encourage students to learn the way their brains were designed to learn. And we ensure that students who need our learning tools the most get them first by providing free subscriptions to individual teachers everywhere.

To realize our mission, we organized as a nonprofit. We knew this choice would slow our progress (relative to organizing as a for-profit and welcoming investors), but it was the only way to guarantee that our true mission would not be derailed by other interests.

Thus far, we've funded ourselves with work in the for-profit sector and income from grants. Our background research is rich, our methods are well-established, and our technology works even better than we thought it would. Last fall, we completed a demonstration of our electronic scoring system, CLAS, a novel technology that learns from every single assessment taken in our system. 

The groundwork has been laid, and we're ready to scale. All we need is the platform that will deliver the assessments (called DiscoTests), several of which are already in production.

After 20 years of high stakes testing, students and teachers need our solution more than ever. We feel compelled to scale as quickly as possible, so we can begin the process of reinvigorating today's students' natural love of learning and ensure that the next generation of students never loses theirs. Lectica's story isn't finished. Instead, we find ourselves on the cusp of a new beginning!

Please consider making a donation today.

 


A final note: There are many benefits associated with our approach to assessment that were not mentioned here. For example, because the assessment scores are all calibrated to the same learning ruler, students, teachers, and parents can easily track student growth. Even better, our assessments are designed to be taken frequently and to be embedded in low-stakes contexts. For grading purposes, teachers are encouraged to focus on growth over time rather than specific test scores. This way of using assessments pretty much eliminates concerns about cheating. And finally, the electronic scoring system we developed is backed by the world's first "taxonomy of learning," which also serves many other educational and research functions. It's already spawned a developmentally sensitive spell-checker! One day, this taxonomy of learning will be robust enough to empower teachers to create their own formative assessments on the fly. 

 


*This is the ingredient that's missing from current adaptive learning technologies.

 


Adaptive learning. Are we there yet?

Adaptive learning technologies are touted as an advance in education and a harbinger of what's to come. But although we at Lectica agree that adaptive learning has a great deal to offer, we have some concerns about its current limitations. In an earlier article, I raised the question of how well one of these platforms, Knewton, serves "robust learning"—the kind of learning that leads to deep understanding and usable knowledge. Here are some more general observations.

The great strength of adaptive learning technologies is that they allow students to learn at their own pace. That's big. It's quite enough to be excited about, even if it changes nothing else about how people learn. But in our excitement about this advance, the educational community is in danger of ignoring important shortcomings of these technologies.

First, adaptive learning technologies are built on adaptive testing technologies. Today, these testing technologies are focused on "correctness." Students are moved to the next level of difficulty based on their ability to get correct answers. This is what today's testing technologies measure best. However, although being able to produce or select correct answers is important, it is not an adequate indication of understanding. And without real understanding, knowledge is not usable and can't be built upon effectively over the long term.

Second, today's adaptive learning technologies are focused on a narrow range of content—the kind of content psychometricians know how to build tests for—mostly math and science (with an awkward nod to literacy). In public education during the last 20 years, we've experienced a gradual narrowing of the curriculum, largely because of high stakes testing and its narrow focus. Today's adaptive learning technologies suffer from the same limitations and are likely to reinforce this trend.

Third, the success of adaptive learning technologies is measured with standardized tests of correctness. Higher scores will help more students get into college—after all, colleges use these tests to decide who will be admitted. But we have no idea how well higher scores on these tests translate into life success. Efforts to demonstrate the relevance of educational practices are few and far between. And notably, there are many examples of highly successful individuals who were poor players in the education game—including several of the world's most productive and influential people.

Fourth, some proponents of online adaptive learning believe that it can and should replace (or marginalize) teachers and classrooms. This is concerning. Education is more than a process of accumulating facts. For one thing, it plays an enormous role in socialization. Good teachers and classrooms offer students opportunities to build knowledge while learning how to engage and work with diverse others. Great teachers catalyze optimal learning and engagement by leveraging students' interests, knowledge, skills, and dispositions. They also encourage students to put what they're learning to work in everyday life—both on their own and in collaboration with others.

Lectica has a strong interest in adaptive learning and the technologies that deliver it. We anticipate that over the next few years, our assessment technology will be integrated into adaptive learning platforms to help expand their subject matter and ensure that students are building robust, usable knowledge. We will also be working hard to ensure that these platforms are part of a well-thought-out, evidence-based approach to education—one that fosters the development of tomorrow's skills—the full range of skills and knowledge required for success in a complex and rapidly changing world.


Proficiency vs. growth

We've been hearing quite a bit about the "proficiency vs. growth" debate since Betsy DeVos (Trump's candidate for Education Secretary) was asked to weigh in last week. This debate involves a disagreement about how high stakes tests should be used to evaluate educational programs. Advocates for proficiency want to reward schools when their students score higher on state tests. Advocates for growth want to reward schools when their students grow more on state tests. Readers who know about Lectica's work can guess where we'd land in this debate—we're outspokenly growth-minded. 

For us, however, the proficiency vs. growth debate is only a tiny piece of a broader issue about what counts as learning. Here's a sketch of the situation as we see it:

Getting a higher score on a state test means that you can get more correct answers on increasingly difficult questions, or that you can more accurately apply writing conventions or decode texts. But these aren't the things we really want to measure. They're "proxies"—approximations of our real learning objectives. Test developers measure proxies because they don't know how to measure what we really want to know.

What we really want to know is how well we're preparing students with the skills and knowledge they'll need to successfully navigate life and work.

Scores on conventional tests predict how well students are likely to perform, in the future, on conventional tests. But scores on these tests have not been shown to be good predictors of success in life.*  

In light of this glaring problem with conventional tests, the debate between proficiency and growth is a bit of a red herring. What we really need to be asking ourselves is a far more fundamental question:

What knowledge and skills will our children need to navigate the world of tomorrow, and how can we best nurture their development?

That's the question that frames our work here at Lectica.

 

*For information about the many problems with conventional tests, see FairTest.

 


Support from neuroscience for robust, embodied learning

Image: fluid intelligence connectome ("Human connector," by jgmarcelino from Newcastle upon Tyne, UK, via Wikimedia Commons)

For many years, we've been arguing that learning is best viewed as a process of creating networks of connections. We've defined robust learning as a process of building knowledge networks that are so well connected they allow us to put knowledge to work in a wide range of contexts. And we've described embodied learning—a way of learning that involves the whole person and is much more than the memorization of facts, terms, definitions, rules, or procedures.

New evidence from the neurosciences provides support for this way of thinking about learning. According to research recently published in Nature, people with more connected brains—specifically, those with more connections across different parts of the brain—demonstrate greater intelligence and better problem-solving skills than those with less connected brains. And this is only one of several research projects that report similar findings.

Lectica exists because we believe that if we really want to support robust, embodied learning, we need to measure it. Our assessments are the only standardized assessments that have been deliberately developed to measure and support this kind of learning. 


Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (use of evidence, argument, and psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already focus on it.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

          Lectical Level   Mechanics   Coherence   Persuasiveness   Correctness
Case 1    high             high        low         average          high
Case 2    high             high        high        low              low
Case 3    low              average     low         low              high

Case 1

This student has relatively high Lectical, mechanics, and correctness scores, but their performance is low in coherence and the persuasiveness of their answers is average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from participating in activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's Lectical, mechanics, and coherence scores are high relative to expectations. Their knowledge appears to be well integrated, but the combination of low persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to the targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

The scores received by this student are high for correctness, average for mechanics, and low for Lectical Level, coherence, and persuasiveness. This pattern suggests that the student has been memorizing content without integrating it effectively into their knowledge network, and has been doing this for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical scores catch up with their correctness scores.
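
To make the diagnostic logic of these three cases concrete, here is a small illustrative sketch in Python. The rules, labels, and function name are my own simplified paraphrase of the cases described above; this is not Lectica's actual diagnostic or scoring system.

```python
# Illustrative only: a simplified, rule-of-thumb version of the diagnostic
# reasoning in the three cases above. The profile labels and rules are a
# paraphrase, not Lectica's actual reporting logic.

def suggest_focus(profile: dict) -> str:
    """Map a profile of low/average/high ratings to a suggested emphasis."""
    if profile["lectical"] == "low" and profile["correctness"] == "high":
        return ("Memorizing without integrating: apply existing content knowledge "
                "in personally relevant contexts (VCoL) until the other scores "
                "catch up with correctness.")
    if profile["lectical"] == "high" and profile["correctness"] == "low":
        return ("Well-integrated knowledge with content gaps: fill in missing "
                "content in a way that ties it into the existing knowledge network.")
    if profile["lectical"] == "high" and (profile["coherence"] != "high"
                                          or profile["persuasiveness"] != "high"):
        return ("Knowledge not yet fully integrated: apply existing knowledge "
                "in relevant contexts (VCoL).")
    return "Profile not covered by the three example cases."

cases = {
    "Case 1": {"lectical": "high", "mechanics": "high", "coherence": "low",
               "persuasiveness": "average", "correctness": "high"},
    "Case 2": {"lectical": "high", "mechanics": "high", "coherence": "high",
               "persuasiveness": "low", "correctness": "low"},
    "Case 3": {"lectical": "low", "mechanics": "average", "coherence": "low",
               "persuasiveness": "low", "correctness": "high"},
}

for name, profile in cases.items():
    print(f"{name}: {suggest_focus(profile)}")
```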


Interpreting CLAS Demo reports

What the CLAS demo measures

The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems. 

CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale). 

It does not measure:

  • your use of particular vocabulary
  • writing mechanics (spelling, punctuation, capitalization)
  • coherence (quality of logic or argument)
  • relevance
  • correctness (measured by most standardized tests) 

These dimensions of performance are related to Lectical Level, but they are not the same thing. 

The reliability of the CLAS score

The Lectical Scores on CLAS demo assessments are awarded with our electronic scoring system, CLAS.

  • CLAS scores agree with human scores within 1/5 of a level about 90% of the time. That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high stakes decisions. (For a concrete sense of what agreement within 1/5 of a level means, see the sketch after this list.)
  • We don't recommend making high stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
  • CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
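
As a rough illustration of what "agreement within 1/5 of a level" means, here is a hypothetical Python sketch with invented data (this is not CLAS code, and the scores are made up for illustration): it simply counts how often two sets of scores fall within 0.2 of a level of each other.

```python
# Hypothetical illustration only (invented data, not CLAS code): what it means
# for two sets of scores to "agree within 1/5 of a level."

def agreement_rate(scores_a, scores_b, tolerance=0.2):
    """Fraction of paired scores whose absolute difference is within `tolerance`."""
    pairs = list(zip(scores_a, scores_b))
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return within / len(pairs)

# Invented example: five essays scored electronically and by a human rater.
machine_scores = [10.10, 10.45, 11.00, 9.80, 10.70]
human_scores   = [10.20, 10.30, 11.35, 9.80, 10.55]

rate = agreement_rate(machine_scores, human_scores)
print(f"{rate:.0%} of these scores agree within 0.2 of a level")
```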

Benchmarks

You can find benchmarks for childhood and adulthood in our article, Lectical levels, roles, and educational level.

The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top-performing school, "Rainbow," is the Rainbow Community School in North Carolina.)

 
