Interpreting CLAS Demo reports

What the CLAS demo measures

The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems. 

CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale). 

It does not measure:

  • your use of particular vocabulary
  • writing mechanics (spelling, punctuation, capitalization)
  • coherence (quality of logic or argument)
  • relevance
  • correctness (measured by most standardized tests) 

These dimensions of performance are related to Lectical Level, but they are not the same thing. 

The reliability of the CLAS score

The Lectical Scores on CLAS demo assessments are awarded with our electronic scoring system, CLAS.

  • CLAS scores agree with human scores within 1/5 of a level about 90% of the time. That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high-stakes decisions.
  • We don't recommend making high-stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
  • CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
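
To make "agree within 1/5 of a level" concrete, here is a minimal sketch of how such an agreement rate could be computed from paired scores. It is an illustration only, not the CLAS implementation: the function name `agreement_rate`, the `tolerance` parameter, and the sample scores are all made up for this example.

```python
def agreement_rate(clas_scores, human_scores, tolerance=0.20):
    """Fraction of paired scores that differ by no more than `tolerance` levels.

    Hypothetical helper for illustration; not part of CLAS.
    """
    if len(clas_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    if not clas_scores:
        raise ValueError("need at least one pair of scores")
    # Small epsilon so a difference of exactly 0.20 is not lost to float rounding.
    within = sum(
        1
        for c, h in zip(clas_scores, human_scores)
        if abs(c - h) <= tolerance + 1e-9
    )
    return within / len(clas_scores)


# Made-up scores for illustration only.
clas = [10.10, 10.45, 11.00, 9.80, 10.60]
human = [10.20, 10.40, 11.35, 9.75, 10.55]

rate = agreement_rate(clas, human)
print(f"{rate:.0%} of pairs agree within 0.20 of a level")
```

In this toy sample, four of the five pairs differ by 0.20 or less, so the rate is 80%. A real check would use many more paired scores.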


You can find benchmarks for childhood and adulthood in our article, Lectical levels, roles, and educational level.

The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top-performing school, "Rainbow," is the Rainbow Community School in North Carolina.)


4 thoughts on “Interpreting CLAS Demo reports”

  1. As a teacher of writing and critical thinking, and a published, qualitative researcher in writing in the disciplines, I found the questions in the activity impressive. However, the scoring and feedback seem to value certain kinds of answers more than others. I know my responses were well-reasoned and clearly explained with concrete details, but I focused my answers more on the politics of research funding, with briefer references to study design. Although I did address all of the key issues in the feedback, I didn't necessarily use its terminology, which comes directly from rhetoric and writing textbooks. I suspect the computers can only judge based on the language options they have been programmed to look for. Finally, you may want to have someone edit the "dilemma" to fix errors in subject-verb agreement.

    • Hi Mary,

      Thank you for your thoughtful remarks. They are much appreciated.

      CLAS scores are not based on the use of particular jargon or the way you frame your responses. Nor are they based on whether or not test-takers touch on certain themes. They are based on the complexity level of the performance. CLAS determines this by looking at the "developmental distribution" of hundreds of meanings in a given performance (generally 1.4 – 1.9 multiplied by the number of unique words in your response).

      You may have noticed that CLAS reported that it was uncertain of your score due to "too few unique words." This kind of flag leads to automatic human review.

      It's important to us that users understand that we do not claim to score the test-taker, just the performance. And we do not claim perfect accuracy. We aim for 85% agreement within .20 of a level, which is considered adequate for formative or research use. (In your case, human analysts agreed with CLAS within .08 of a level.)

      You are correct that CLAS is limited by what it has learned so far. This will always be true. All assessment is limited in this way. One difference between our approach and other approaches is that our methods and technology are built to support continuous improvement. Your performance has already been used to further educate CLAS.

      Thank you for pointing out the subject-verb snafu! We're still too poor to hire an editor, and owe a great deal to users who provide editorial feedback.

  2. The explanation above says that iCLAS measures the “complexity level of the performance” by looking at the “developmental distribution” of hundreds of meanings in a given performance (generally 1.4 – 1.9 multiplied by the number of unique words in your response). Could you clarify the method of determining the meanings of the words used in a response? Frequent words often have large inventories of meanings, and thus would seem to be highly complex. For example, “take” has 20+ meanings. On the other hand, a less frequent word such as “justice” has a much smaller number of meanings, yet it is a highly complex concept. Where could I go to understand the basis of this measure?