For many years, we’ve been arguing that learning is best viewed as a process of creating networks of connections. We’ve defined robust learning as a process of building knowledge networks that are so well connected they allow us to put knowledge to work in a wide range of contexts. And we’ve described embodied learning—a way of learning that involves the whole person and is much more than the memorization of facts, terms, definitions, rules, or procedures.
New evidence from the neurosciences supports this way of thinking about learning. According to research recently published in Nature, people with more connected brains, specifically those with more connections across different parts of the brain, demonstrate greater intelligence, including better problem-solving skills, than people with less connected brains. And this is only one of several research projects that report similar findings.
Lectica exists because we believe that if we really want to support robust, embodied learning, we need to measure it. Our assessments are the only standardized assessments that have been deliberately developed to measure and support this kind of learning.
An ideal educational assessment strategy—represented above in the assessment triangle—includes three indicators of learning—correctness (content knowledge), complexity (developmental level of understanding), and coherence (quality of argumentation). Lectical Assessments focus primarily on two areas of the triangle—complexity and coherence. Complexity is measured with the Lectical Assessment System, and coherence is measured with a set of argumentation rubrics focused on mechanics, logic, and persuasiveness. We do not focus on correctness, primarily because most assessments already target correctness.
At the center of the assessment triangle is a hazy area. This represents the Goldilocks Zone—the range in which the difficulty of learning tasks is just right for a particular student. To diagnose the Goldilocks Zone, educators evaluate correctness, coherence, and complexity, plus a given learner’s level of interest and tolerance for failure.
When educators work with Lectical Assessments, they use the assessment triangle to diagnose students’ learning needs. Here are some examples:
Level of skill (low, average, high) relative to expectations
This student has relatively high complexity and correctness scores, but his performance is low in coherence. Because lower coherence scores suggest that he has not yet fully integrated his existing knowledge, he is likely to benefit most from participating in interesting activities that require applying existing knowledge in relevant contexts (using VCoL).
This student’s scores are high relative to expectations. Her knowledge appears to be well integrated, but the low correctness suggests that there are gaps in her content knowledge relative to targeted content. Here, we would suggest filling in the missing content knowledge in a way that engages the learner and allows her to integrate it into her well-developed knowledge network.
This student's scores are high for correctness but low for complexity and coherence. This pattern suggests that the student is memorizing content without integrating it effectively into their knowledge network, and may have been doing so for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence and complexity scores catch up with their correctness scores.
This student's scores are high for correctness, complexity, and coherence. This pattern suggests that the student has a high level of proficiency. Here, we would suggest introducing new knowledge that's just challenging enough to keep her in her personal Goldilocks Zone.
The assessment triangle helps educators optimize learning by ensuring that students are always learning in the Goldilocks Zone. This is a good thing, because students who spend more time in the Goldilocks Zone not only enjoy learning more, they learn better and faster.
The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems.
CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale).
These dimensions of performance are related to Lectical Level, but they are not the same thing.
The reliability of the CLAS score
The Lectical Scores on CLAS demo assessments are awarded with our electronic scoring system, CLAS.
CLAS scores agree with human scores within 1/5 of a level about 90% of the time. That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high stakes decisions.
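To make the agreement statistic concrete, here is a minimal sketch (with invented scores, not Lectica's actual data or code) of how "agreement within 1/5 of a level" can be computed for paired human and CLAS ratings:

```python
def agreement_rate(scores_a, scores_b, tolerance=0.2):
    """Fraction of paired scores that fall within `tolerance` of each other.

    A tolerance of 0.2 corresponds to 1/5 of a level on the Lectical Scale.
    """
    pairs = list(zip(scores_a, scores_b))
    # Small epsilon guards against floating-point noise at the boundary.
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance + 1e-9)
    return within / len(pairs)

# Invented example ratings for illustration only.
human = [10.1, 10.4, 9.8, 11.0, 10.6]
clas = [10.2, 10.1, 9.9, 11.1, 10.5]
print(f"agreement within 0.2 of a level: {agreement_rate(human, clas):.0%}")
```

Run over a large scored sample, a rate of about 90% on this statistic is what the paragraph above reports, matching the agreement expected between two trained human raters.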
We don't recommend making high stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top-performing school, "Rainbow," is the Rainbow Community School in North Carolina.)
Ten years ago, Kirschner, Sweller, & Clark published an article entitled "Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching."
In this article, Kirschner and his colleagues contrast outcomes for what they call "guidance instruction" (lecture and demonstration) with those from constructivism-based instruction. They conclude that constructivist approaches produce inferior outcomes.
The article suffers from at least three serious flaws.
First, the authors, in making their distinction between guided instruction and constructivist approaches, have created a caricature of constructivist approaches. Very few experienced practitioners of constructivist, discovery, problem-based, experiential, or inquiry-based teaching would characterize their approach as minimally guided. "Differently guided" would be a more appropriate term. Moreover, most educators who use constructivist approaches include lecture and demonstration where these are appropriate.
Second, the research reviewed by the authors was fundamentally flawed. For the most part, the metrics employed to evaluate different styles of instruction were not reasonable measures of the kind of learning constructivist instruction aims to support—deep understanding (the ability to apply knowledge effectively in real-world contexts). They were measures of memory or attitude. Back in 2010, Stein, Fisher, and I argued that metrics can't produce valid results if they don't actually measure what we care about (Redesigning testing: Operationalizing the new science of learning). Why isn't this a no-brainer?
And finally, the longitudinal studies Kirschner and his colleagues reviewed had short time-spans. None of them examined the long-term impacts of different forms of instruction on deep understanding or long-term development. This is a big problem for learning research—one that is often acknowledged, but rarely addressed.
Since Kirschner's article was published in 2006, we've had an opportunity to examine the difference between schools that provide different kinds of instruction, using assessments that measure the depth and coherence of students' understanding. We've documented a 3 to 5 year advantage, by grade 12, for students who attend schools that emphasize constructivist methods vs. those that use more "guidance instruction".
We have spent over 20 years developing our methods and the technology required to deliver our learning tools—known as Lectical™ Assessments or DiscoTests®—at scale. These assessments are backed by a large body of research, including ongoing investigations of their validity and reliability. Here are some links to research reports:
The following video provides an overview of our research and mission:
In the fall of 2016, we introduced our first electronically scored Lectical Assessment—the LRJA (an assessment of reflective judgment/critical thinking). The LRJA can be used in research and program evaluation as a summative assessment, or in the classroom as a formative assessment—or both.
The best way to learn about the LRJA is to experience it first-hand at lecticalive. Just click on this link, then select the "go straight to the demo" button. On the next page, fill in the sign up form with the educational level of your choice. Click "submit", then, click on the "autofill" button (top right, under the header) to fill the responses form with an example.
If you're interested in working with the LRJA or would like to learn more about using Lectical Assessments to optimize thinking and learning, please contact us.
Knewton defines adaptive learning as "A teaching method premised on the idea that the curriculum should adapt to each user." In a recent blog post, Knewton's COO, David Liu, expanded on this definition. Here are some extracts:
You have to understand and have real data on content… Is the instructional content teaching what it was intended to teach? Is the assessment accurate in terms of what it’s supposed to assess? Can you calibrate that content at scale so you’re putting the right thing in front of a student, once you understand the state of that student?
On the other side of the equation, you really have to understand student proficiency… understanding and being able to predict how that student is going to perform, based upon what they’ve done and based upon that content that I talked about before. And if you understand how well the student is performing against that piece of content, then you can actually begin to understand what that student needs to be able to move forward.
The idea of putting the right thing in front of a student is very cool. That's part of what we do here at Lectica. But what does Knewton mean by learning?
Curiosity got the better of me, so I set out to do some investigating.
What does Knewton mean by learning?
In Knewton's white paper on adaptive learning, the authors do a great job of describing how their technology works.
To provide continuously adaptive learning, Knewton analyzes learning materials based on thousands of data points — including concepts, structure, difficulty level, and media format — and uses sophisticated algorithms to piece together the perfect bundle of content for each student, constantly. The system refines recommendations through network effects that harness the power of all the data collected for all students to optimize learning for each individual student.
They go on to discuss several impressive technological innovations. I have to admit, the technology is cool, but what is their learning model and how is Knewton's technology being used to improve learning and teaching?
Unfortunately, Knewton does not seem to operate with a clearly articulated learning model in mind. In any case, I couldn't find one. But based on the sample items and feedback examples shown in their white paper and on their site, what Knewton means by learning is the ability to consistently get right answers on tests and quizzes, and the way to learn (get more answers right) is to get more practice on the kind of items students are not yet consistently getting right.
In fact, Knewton appears to be a high tech application of the content-focused learning model that's dominated public education since No Child Left Behind—another example of what it looks like when we throw technology at a problem without engaging in a deep enough analysis of that problem.
We're in the middle of an education crisis, but it's not because children aren't getting enough answers right on tests and quizzes. It's because our efforts to improve education consistently fail to ask the most important questions, "Why do we educate our children?" and "What are the outcomes that would be genuine evidence of success?"
Don't get me wrong. We love technology, and we leverage it shamelessly. But we don't believe technology is the answer. The answer lies in a deep understanding of how learning works and what we need to do to support the kind of learning that produces outcomes we really care about.
In this post, I'll be describing and comparing three basic forms of assessment—surveys, tests of factual and procedural knowledge, and performative tests.
Surveys—measures of perception, preference, or opinion
What is a survey? A survey (a.k.a. inventory) is any assessment that asks the test-taker to choose from a set of options, such as "strongly agree" or "strongly disagree", based on opinion, preference, or perception. Surveys can be used by organizations in several ways. For example, opinion surveys can help maintain employee satisfaction by providing a "safe" way to express dissatisfaction before workplace problems have a chance to escalate.
Surveys have been used by organizations in a variety of ways. Just about everyone who's worked for a large organization has completed a personality inventory as part of a team-building exercise. The results stimulate lots of water cooler discussions about which "type" or "color" employees are, but their impact on employee performance is unclear. (Fair warning: I'm notorious for my discomfort with typologies!) Some personality inventories are even used in high stakes hiring and promotion decisions, a practice that continues despite evidence that they are very poor predictors of employee success (Morgeson et al., 2007).
Although most survey developers don't pretend their assessments measure competence, many do. The item on the left was used in a survey with the words "management skills" in its title.
Claims that surveys measure competence are most common when "malleable traits"—traits that are subject to change, learning or growth—are targeted. One example of a malleable trait is "EQ" or "emotional intelligence". EQ is viewed as a skill that can be developed, and there are several surveys that purport to measure its development. What they actually measure is attitude.
Another example of surveys masquerading as assessments of skill is in the measurement of "transformational learning". Transformational learning is defined as a learning experience that fundamentally changes the way a person understands something, yet the only way it appears to be measured is with surveys. Transformational learning surveys measure people's perceptions of their learning experience, not how much they are actually changed by it.
The only survey-type assessments that can be said to measure something like skill are assessments—such as 360s—that ask people about their perceptions. Although 360s inadvertently measure other things, like how much a person is liked or whether or not a respondent agrees with that person, they may also document evidence of behavior change. If what you are interested in is behavior change, a 360 may be appropriate in some cases, but it's important to keep in mind that while a 360 may measure change in a target's behavior, it's also likely to measure change in a respondent's attitude that's unrelated to the target's behavior.
360-type assessments may, to some extent, serve as tests of competence, because behavior change may be an indication that someone has learned new skills. When an assessment measures something that might be an indicator of something else, it is said to measure a proxy. A good 360 may measure a proxy (perceptions of behavior) for a skill (competence).
There are literally hundreds of research articles that document the limitations of surveys, but I'll mention only one more of them here: All of the survey types I've discussed are vulnerable to "gaming"—smart people can easily figure out what the most desirable answers are.
Surveys are extremely popular today because, relative to assessments of skill, they are inexpensive to develop and cost almost nothing to administer. Lectica gives away several high quality surveys for free because they are so inexpensive, yet organizations spend millions of dollars every year on surveys, many of which are falsely marketed as assessments of skill or competence.
Tests of factual and procedural knowledge
A test of competence is any test that asks the test taker to demonstrate a skill. Tests of factual and procedural knowledge can legitimately be thought of as tests of competence.
The classic multiple choice test examines factual knowledge, procedural knowledge, and basic comprehension. If you want to know if someone knows the rules, which formulas to apply, the steps in a process, or the vocabulary of a field, a multiple choice test may meet your needs. Often, the developers of multiple choice tests claim that their assessments measure understanding, reasoning, or critical thinking. This is because some multiple choice tests measure skills that are assumed to be proxies for skills like understanding, reasoning, and critical thinking. They are not direct tests of these skills.
Multiple choice tests are widely used because there is a large industry devoted to making them, but they are increasingly unpopular because of their (mis)use as high stakes assessments. They are often perceived as threatening and unfair because they are frequently used to rank or select people, and they are not helpful to the individual learner. Moreover, their relevance is often brought into question because they don't directly measure what we really care about—the ability to apply knowledge and skills in real-life contexts.
Performative tests
Tests that ask people to demonstrate their skills directly, either (1) in the real world, (2) in real-world simulations, or (3) as applied to real-world scenarios, are called performative tests. These tests usually do not have "right" answers. Instead, they employ objective criteria to evaluate performances for the level of skill demonstrated, and often play a formative role by providing feedback designed to improve performance or understanding. This is the kind of assessment you want if what you care about is deep understanding, reasoning skills, or performance in real-world contexts.
Performative tests are the most difficult tests to make, but they are the gold standard if what you want to know is the level of competence a person is likely to demonstrate in real-world conditions—and if you're interested in supporting development. Standardized performative tests are not yet widely used, because the methods and technology required to develop them are relatively new, and there is not yet a large industry devoted to making them. But they are increasingly popular because they support learning.
Unfortunately, performative tests may initially be perceived as threatening because people's attitudes toward tests of knowledge and skill have been shaped by their exposure to high stakes multiple choice tests. The idea of testing for learning is taking hold, but changing the way people think about something as ubiquitous as testing is an ongoing challenge.
Lectical Assessments are performative tests—tests for learning. They are designed to support robust learning—the kind of learning that optimizes the growth of essential real-world skills. We're the leader of the pack when it comes to the sophistication of our methods and technology, our evidence base, and the sheer number of assessments we've developed.
Frederick P. Morgeson et al. (2007). Are we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel selection. Personnel Psychology, 60, 1029-1033.
When I was a kid, the main way school performance was measured was with letter grades. We got letter grades on almost all of our work. Getting an A meant you knew it all, a B meant you didn't quite know it all, C meant you knew enough to pass, D meant you knew so little you were on the verge of failing, and F meant you failed. If you always got As you were one of the really smart kids, and if you always got Ds and Fs you were one of the dumb kids. Unfortunately, that's how we thought about it, plain and simple.
If I got a B, my teacher and parents told me I could do better and that I should work harder. If I got a C, I was in deep trouble, and was put on restriction until I brought my grade up. This meant more hours of homework. I suspect this was a common experience. It was certainly what happened on Father Knows Best and The Brady Bunch.
The best teachers also commented on our work, telling us where we could improve our arguments or where and how we had erred, and suggesting actions we could take to improve. In terms of feedback, this was the gold standard. It was the only way we got any real guidance about what we, as individuals, needed to work on next. Letter grades represented rank, punishment, and reward, but they weren't very useful indicators of where we were in our growth as learners. Report cards were for parents.
Usher in Lectica and DiscoTest
One of our goals here at Lectica has been to make possible a new kind of report card—one that:
delivers scores that have rich meaning for students, parents, and decision-makers,
provides the kind of personal feedback good teachers offer, and
gives students an opportunity to watch themselves grow.
This new report card—illustrated on the right—uses a single learning "ruler" for all subjects, so student growth in different subjects can be shown on the same scale. In the example shown here, each assessment is represented by a round button that links to an explanation of the student's learning edge at the time the assessment was taken.
This new report card also enables direct comparisons between growth trajectories in different subject areas.
An additional benefit of this new report card is that it delivers a rich portfolio-like account of student growth that can be employed to improve admissions and advancement decisions.
And finally, we're very curious about the potential psychological benefits of allowing students to watch how they grow. We think it's going to be a powerful motivator.
Recently, members of our team at Lectica have been discussing potential misuses of Lectical Assessments, and exploring the possibility that they could harm some students. These are serious concerns that require careful consideration and discussion, and I urge readers to pitch in.
One of the potential problems we've discussed is the possibility that students will compare their scores with one another, and that students with lower scores will suffer from these comparisons. Here's my current take on this issue.
Students receive scores all the time. By third grade they already know their position in the class hierarchy, and live every day with that reality. Moreover, despite the popular notion that all students can become above average if they work hard enough, average students don't often become above average students, which means that during their entire 12 years of schooling, they rarely receive top rewards (the best grades) for the hard work they do. In fact, they often feel like they're being punished even when they try their best. To make things worse, in our current system they're further punished by being forced to memorize content they haven't been prepared to understand, a problem that worsens year by year.
Lectica's approach to assessment can't prevent students from figuring out where their scores land in the class distribution, but we can give all students an opportunity to see themselves as successful learners, no matter where their scores are in that distribution. Average or below average students may still have to live with the reality that they grow at different rates than some of their peers, but they'll be rewarded for their efforts, just the same.
I've been told by some very good teachers that it is unacceptable to use the expression "average student." While I share the instinct to protect students from the harm that can come from labels, I don't share the belief that being an average student is a bad thing. Most of us were average students—or to be more precise, 68% of us were within one standard deviation of the mean. How did being a member of the majority become a bad thing? And what harm are we doing to students by creating the illusion that we are all capable of performing above the mean?
I don't think we hurt children by serving up reality. We hurt them when we mislead them by telling them they can all be above average, or when we make them feel hopeless by insisting that they all learn at the same pace, then punishing them when they can't keep up.
I'm not saying it's not possible to raise the average. We do it by meeting the specific learning needs of every student and making sure that learning time is spent learning robustly. But we can't change the fact that there's a distribution. And we shouldn't pretend otherwise.
Lectical Assessments are tests, and are subject to the same abuses as other tests. But they have three attributes that help mitigate these abuses. First, they allow all students without severe disabilities to see themselves as learners. Second, they help teachers customize instruction to meet the needs of each student, so more kids have a chance to achieve their full potential. And finally, they reward good pedagogy—even in cases in which the assessments are being misused. After all, testing drives instruction.
DiscoTests and conventional standardized tests can be thought of as complementary. They are designed to test different kinds of skills, and research confirms that they are successful in doing so. Correlations between scores on the kind of developmental assessments made by DTS and scores on conventional multiple choice assessments are in the .40-.60 range. That means that somewhere between 16% and 36% of the kind of learning that is captured by conventional assessments is likely to overlap with the kind of learning that is captured by DiscoTests.
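The percentage figures here come from squaring the correlation coefficient: r-squared gives the proportion of variance two measures share. A quick sketch of that arithmetic:

```python
# Shared variance between two measures is the squared correlation, r^2.
# For the correlation range reported above, 0.40^2 = 0.16 and
# 0.60^2 = 0.36, i.e. roughly a 16%-36% overlap in what is measured.
for r in (0.40, 0.60):
    print(f"r = {r:.2f} -> shared variance = {r ** 2:.0%}")
```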
The table below provides a comparison of DiscoTests with conventional standardized tests on a number of dimensions.
Theoretical basis
DiscoTests: Cognitive developmental theory, Dynamic Skill Theory
Conventional tests: Test theory

Scoring scale
DiscoTests: Fischer’s Dynamic Skill Scale, an exhaustively researched general developmental scale, which is a member of a family of similar scales that were developed during the 20th century
Conventional tests: Statistically generated scales, different for each test (though some tests are statistically linked)

Learning sequences
DiscoTests: Empirical, fine-grained & precise, calibrated to the dynamic skill scale
Conventional tests: Empirical, coarse-grained and general

Primary item type
Conventional tests: More or less sophisticated forms of multiple choice

Skills and knowledge assessed
DiscoTests: Reasoning with knowledge, knowledge application, making connections between new and existing knowledge, writing
Conventional tests: Content knowledge, procedural knowledge

Content coverage
DiscoTests: Carefully selected “big ideas” and the concepts and skills associated with them
Conventional tests: The full range of content specified in state standards for a given subject

Formative (designed to support learning)?
DiscoTests: Yes: (1) each DiscoTest focuses on ideas and skills central to K-12 curricula, (2) test questions require students to thoughtfully apply new knowledge and connect it with their existing knowledge, (3) students receive reports with targeted feedback and learning suggestions, and (4) teachers learn how student knowledge develops in general and on each targeted concept or skill
Conventional tests: Not really, though they increasingly claim to be

Embeddable in curricula?
DiscoTests: Yes, DiscoTests are designed to be part of the curriculum

Scores equated across assessments?
DiscoTests: Yes, statistically, calibrated to the skill scale
Conventional tests: Yes, statistically only

Stakes for students
DiscoTests: Low. Selection decisions are based on performance patterns over time on many individual assessments.
Conventional tests: High. Selection decisions are often based on single assessments.

What they measure
DiscoTests: Direct tests that focus on deepening and connecting knowledge about key concepts and ideas, while developing broad skills that are required in adult life, such as those required for reasoning, communicating, and problem-solving
Conventional tests: Tests of proxies, focused on the ability to detect correct answers

Reliability
DiscoTests: .91+ for a single age cohort (distinguishes 5-6 distinct levels of performance)
Conventional tests: For high stakes tests, usually .95+ for a single age cohort (distinguishes 6-7 distinct levels of performance)