Proficiency vs. growth

We've been hearing quite a bit about the "proficiency vs. growth" debate since Betsy DeVos (Trump's candidate for Education Secretary) was asked to weigh in last week. This debate involves a disagreement about how high stakes tests should be used to evaluate educational programs. Advocates for proficiency want to reward schools when their students score higher on state tests. Advocates for growth want to reward schools when their students grow more on state tests. Readers who know about Lectica's work can guess where we'd land in this debate—we're outspokenly growth-minded. 

For us, however, the proficiency vs. growth debate is only a tiny piece of a broader issue about what counts as learning. Here's a sketch of the situation as we see it:

Getting a higher score on a state test means that you can get more correct answers on increasingly difficult questions, or that you can more accurately apply writing conventions or decode texts. But these aren't the things we really want to measure. They're "proxies"—approximations of our real learning objectives. Test developers measure proxies because they don't know how to measure what we really want to know.

What we really want to know is how well we're preparing students with the skills and knowledge they'll need to successfully navigate life and work.

Scores on conventional tests predict how well students are likely to perform, in the future, on conventional tests. But scores on these tests have not been shown to be good predictors of success in life.*  

In light of this glaring problem with conventional tests, the debate between proficiency and growth is a bit of a red herring. What we really need to be asking ourselves is a far more fundamental question:

What knowledge and skills will our children need to navigate the world of tomorrow, and how can we best nurture their development?

That's the question that frames our work here at Lectica.

 

*For information about the many problems with conventional tests, see FairTest.

 

Support from neuroscience for robust, embodied learning

Image: Human connectome, by jgmarcelino from Newcastle upon Tyne, UK, via Wikimedia Commons

For many years, we've been arguing that learning is best viewed as a process of creating networks of connections. We've defined robust learning as a process of building knowledge networks that are so well connected they allow us to put knowledge to work in a wide range of contexts. And we've described embodied learning as a way of learning that involves the whole person and is much more than the memorization of facts, terms, definitions, rules, or procedures.

New evidence from the neurosciences provides support for this way of thinking about learning. According to research recently published in Nature, people with more connected brains, specifically those with more connections across different parts of the brain, demonstrate greater intelligence and better problem-solving skills than those with less connected brains. And this is only one of several research projects that report similar findings.

Lectica exists because we believe that if we really want to support robust, embodied learning, we need to measure it. Our assessments are the only standardized assessments that have been deliberately developed to measure and support this kind of learning. 

How to waste students’ time

During the last 20 years, children in our public schools have been required to learn important concepts earlier and earlier. This is supposed to speed up learning. But we, at Lectica, are finding that when students try to learn complex ideas too early, they don’t seem to find those ideas useful.

For example, let's look at the terms reliable, credible, and valid, which refer to different aspects of information quality. These terms used to be taught in high school, but are now taught as early as grade 3. We looked at how these terms were used by over 15,000 students in grades 4-12. These students were asked to write about what they would need to know in order to trust information from someone making a claim like, "Violent television is bad for children."

As you can see in the following graph, until grade 10, fewer than 10% of these students used the terms at all—even though they were taught them by grade 5. What is more, our research shows that when these terms are used before Lectical Level 10 (see video about Lectical Levels, below), they mean little more than “correct” or “true”, and it's not until well into Lectical Level 10 that people use these terms in a way that clearly shows they have distinct meanings.
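As a rough illustration of the kind of analysis behind this graph, here is a minimal sketch in Python. It assumes each student response is stored as plain text tagged with a grade; the file name, column names, and matching rule are hypothetical, not a description of Lectica's actual pipeline.

    import csv
    import re
    from collections import defaultdict

    TERMS = ("reliable", "credible", "valid")

    users = defaultdict(int)   # grade -> number of students who used any target term
    totals = defaultdict(int)  # grade -> number of students in that grade

    # Hypothetical input: one row per student response, with columns "grade" and "response".
    with open("responses.csv", newline="") as f:
        for row in csv.DictReader(f):
            grade = int(row["grade"])
            totals[grade] += 1
            text = row["response"].lower()
            if any(re.search(rf"\b{term}\b", text) for term in TERMS):
                users[grade] += 1

    for grade in sorted(totals):
        pct = 100 * users[grade] / totals[grade]
        print(f"grade {grade:2d}: {pct:4.1f}% used at least one target term")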

Children aren't likely to find the words reliable, valid, or credible useful until they understand why some information is better than other information. This means they need to understand concepts like motivation, bias, scientific method, and expertise. We can get 5th graders to remember that they should apply the word "valid" instead of "true" when presented with a specific stimulus, but this is not the same as understanding.

Reliable, valid, and credible aren't the only words taught in the early grades that students don't find useful. We have hundreds of examples in our database.

Learning in the zone

The pattern above is what we see when students are taught ideas they aren't yet prepared to understand. When children learn ideas they're ready for—ideas that are in "the zone"—the pattern looks very different. Under these conditions, the use of a new word quickly goes from zero to frequent (or even constant, as parents of 4-year-olds know only too well). If you're a parent you probably remember when your child first learned the words "why," "secret," or "favorite." Suddenly, questioning why, telling and keeping secrets, or having favorites became the focus of many conversations. Children "play hard" with ideas they're prepared to understand. This rapidly integrates these new ideas into their existing knowledge networks. But they can't do this with an idea they aren't ready for, because they don't yet have a knowledge network that's ready to receive it. 

 

The curve in the figure above shows what acquisition would look like if these terms were taught once students had knowledge networks ready to receive them. Acquisition would be relatively rapid, and students would find the terms more useful because they would be more likely to grasp aspects of their distinct meanings. For example, they might choose to use the term "reliable" rather than "factual" because they understand that these two terms mean different things.
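For readers who want to see the shape of such a curve concretely, here is a small sketch that models the proportion of students using a term as a logistic function of grade. The onset and steepness parameters are invented for illustration; they are not fitted to Lectica's data.

    import math

    def usage_rate(grade, onset=10.0, steepness=1.5):
        """Hypothetical logistic curve: proportion of students using a term,
        rising rapidly once instruction lands in 'the zone' (around `onset`)."""
        return 1.0 / (1.0 + math.exp(-steepness * (grade - onset)))

    for grade in range(4, 13):
        rate = usage_rate(grade)
        print(f"grade {grade:2d}: {rate:5.2f} " + "#" * int(40 * rate))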

If you're a parent, think about how many times your child is asked to learn something that isn’t yet useful. Consider the time invested, and ask yourself if that time was well spent.

 

 

Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (use of evidence, argument, and psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already focus on it.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

             Lectical Level   Mechanics   Coherence   Persuasiveness   Correctness
    Case 1   high             high        low         average          high
    Case 2   high             high        high        low              low
    Case 3   low              average     low         low              high

Case 1

This student has relatively high Lectical, mechanics, and correctness scores, but their performance is low in coherence and the persuasiveness of their answers is average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from participating in activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's scores, with the exception of their correctness score, are high relative to expectations. This student's knowledge appears to be well integrated, but the combination of average persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

The scores received by this student are high for correctness, average for mechanics, and low for Lectical Level, coherence, and persuasiveness. This pattern suggests that the student has been memorizing content without integrating it effectively into their knowledge network, and has been doing this for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical scores catch up with their correctness scores.
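To show how these patterns might be read side by side, here is a hypothetical sketch that encodes the rules of thumb from the three cases above as a simple lookup. It is an illustration only, not Lectica's diagnostic logic, and the function and profile names are made up.

    # A hypothetical encoding of the rules of thumb above; this is not Lectica's
    # diagnostic engine, just one way the score profile could be read.
    def suggest(profile):
        """profile maps each dimension to 'low', 'average', or 'high'."""
        integrated = profile["coherence"] == "high"  # crude proxy for well-integrated knowledge
        if profile["lectical"] == "high" and not integrated:
            return "Apply existing knowledge in relevant contexts (VCoL) to integrate it."              # cf. Case 1
        if integrated and profile["correctness"] == "low":
            return "Fill in missing targeted content, tying it into the existing knowledge network."    # cf. Case 2
        if profile["correctness"] == "high" and profile["lectical"] == "low":
            return "Likely memorizing without integrating; apply content in personally relevant contexts."  # cf. Case 3
        return "No single pattern stands out; continue regular virtuous cycles of learning."

    print(suggest({"lectical": "low", "mechanics": "average", "coherence": "low",
                   "persuasiveness": "low", "correctness": "high"}))  # Case 3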

Interpreting CLAS Demo reports

What the CLAS demo measures

The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems. 

CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale). 

It does not measure:

  • your use of particular vocabulary
  • writing mechanics (spelling, punctuation, capitalization)
  • coherence (quality of logic or argument)
  • relevance
  • correctness (measured by most standardized tests) 

These dimensions of performance are related to Lectical Level, but they are not the same thing. 

The reliability of the CLAS score

The Lectical Scores on CLAS demo assessments are awarded with our electronic scoring system, CLAS.

  • CLAS scores agree with human scores within 1/5 of a level about 90% of the time (a small sketch after this list shows how such an agreement rate can be computed). That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high stakes decisions.
  • We don't recommend making high stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
  • CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
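Here is a minimal sketch of how an agreement rate like the one above can be computed from paired scores. The score pairs are invented for illustration; they are not Lectica's calibration data.

    # Toy illustration: the fraction of paired scores (CLAS vs. human) that agree
    # within 1/5 of a Lectical Level (0.2 on the scale). The pairs are invented.
    pairs = [
        (10.15, 10.25), (11.40, 11.30), (9.80, 10.10),
        (12.05, 12.00), (10.70, 10.55), (11.10, 11.15),
    ]  # hypothetical (clas_score, human_score) pairs

    tolerance = 0.2
    agree = sum(1 for clas, human in pairs if abs(clas - human) <= tolerance)
    print(f"agreement within {tolerance} of a level: {100 * agree / len(pairs):.0f}%")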

Benchmarks

You can find benchmarks for childhood and adulthood in our article, Lectical levels, roles, and educational level.

The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top performing school, "Rainbow," is Rainbow Community School in North Carolina.)
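For a rough sense of what a growth trajectory is, here is a small sketch that fits a straight line to a handful of scores over time. The dates and scores are invented, and a real analysis would rely on Lectica's own reports rather than a hand-rolled fit.

    # Toy illustration: estimate growth per year from three or more assessment
    # scores using an ordinary least-squares line (invented dates and scores).
    observations = [(0.0, 10.05), (0.5, 10.12), (1.1, 10.24)]  # (years since first test, score)

    n = len(observations)
    mean_t = sum(t for t, _ in observations) / n
    mean_s = sum(s for _, s in observations) / n
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in observations)
             / sum((t - mean_t) ** 2 for t, _ in observations))
    print(f"estimated growth: {slope:.2f} levels per year")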

 

Straw men and flawed metrics

Ten years ago, Kirschner, Sweller, & Clark published an article entitled, Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching.

In this article, Kirschner and his colleagues contrast outcomes for what they call "guidance instruction" (lecture and demonstration) with those from constructivism-based instruction. They conclude that constructivist approaches produce inferior outcomes.

The article suffers from at least three serious flaws:

First, the authors, in making their distinction between guided instruction and constructivist approaches, have created a caricature of constructivist approaches. Very few experienced practitioners of constructivist, discovery, problem-based, experiential, or inquiry-based teaching would characterize their approach as minimally guided. "Differently guided" would be a more appropriate term. Moreover, most educators who use constructivist approaches include lecture and demonstration where these are appropriate.

Second, the research reviewed by the authors was fundamentally flawed. For the most part, the metrics employed to evaluate different styles of instruction were not reasonable measures of the kind of learning constructivist instruction aims to support—deep understanding (the ability to apply knowledge effectively in real-world contexts). They were measures of memory or attitude. Back in 2010, Stein, Fisher, and I argued that metrics can't produce valid results if they don't actually measure what we care about (Redesigning testing: Operationalizing the new science of learning. Why isn't this a no-brainer?).

And finally, the longitudinal studies Kirschner and his colleagues reviewed had short time-spans. None of them examined the long-term impacts of different forms of instruction on deep understanding or long-term development. This is a big problem for learning research—one that is often acknowledged, but rarely addressed.

Since Kirschner's article was published in 2006, we've had an opportunity to examine the difference between schools that provide different kinds of instruction, using assessments that measure the depth and coherence of students' understanding. We've documented a 3 to 5 year advantage, by grade 12, for students who attend schools that emphasize constructivist methods vs. those that use more "guidance instruction".

To learn more, see:

Are our children learning robustly?

Lectica rationale

 

Lectica basics for schools

If you are a school leader, this post is for you. Here, you'll find information about Lectica, its mission, and our first electronically scored Lectical Assessment—the LRJA.

Background

Lectica, Inc. is a 501(c)(3) charitable corporation. Its mission is to build and deliver learning tools that help students build skills for thinking and learning. These learning tools are backed by a strong learning model—the Virtuous Cycle of Learning (VCoL+7™)—and a comprehensive vision for educational testing and learning, which you can learn more about in our white paper, Virtuous cycles of learning: Redesigning testing during the digital revolution.

We have spent over 20 years developing our methods and the technology required to deliver our learning tools—known as Lectical™ Assessments or DiscoTests®—at scale. These assessments are backed by a large body of research, including ongoing investigations of their validity and reliability. Here are some links to research reports:

The following video provides an overview of our research and mission:

Current offerings

In the fall of 2016, we introduced our first electronically scored Lectical Assessment—the LRJA (an assessment of reflective judgment/critical thinking). The LRJA can be used in research and program evaluation as a summative assessment, or in the classroom as a formative assessment—or both.

The best way to learn about the LRJA is to experience it first-hand at lecticalive. Just click on this link, then select the "go straight to the demo" button. On the next page, fill in the sign-up form with the educational level of your choice. Click "submit", then click on the "autofill" button (top right, under the header) to fill the responses form with an example.

If you're interested in working with the LRJA or would like to learn more about using Lectical Assessments to optimize thinking and learning, please contact us.

Adaptive learning, big data, and the meaning of learning

Knewton defines adaptive learning as "A teaching method premised on the idea that the curriculum should adapt to each user." In a recent blog post, Knewton's COO, David Liu, expanded on this definition. Here are some extracts:

You have to understand and have real data on content… Is the instructional content teaching what it was intended to teach? Is the assessment accurate in terms of what it’s supposed to assess? Can you calibrate that content at scale so you’re putting the right thing in front of a student, once you understand the state of that student? 

On the other side of the equation, you really have to understand student proficiency… understanding and being able to predict how that student is going to perform, based upon what they’ve done and based upon that content that I talked about before. And if you understand how well the student is performing against that piece of content, then you can actually begin to understand what that student needs to be able to move forward.

The idea of putting the right thing in front of a student is very cool. That's part of what we do here at Lectica. But what does Knewton mean by learning?

Curiosity got the better of me, so I set out to do some investigating. 

What does Knewton mean by learning?

In Knewton's white paper on adaptive learning, the authors do a great job of describing how their technology works.

To provide continuously adaptive learning, Knewton analyzes learning materials based on thousands of data points — including concepts, structure, difficulty level, and media format — and uses sophisticated algorithms to piece together the perfect bundle of content for each student, constantly. The system refines recommendations through network effects that harness the power of all the data collected for all students to optimize learning for each individual student.

They go on to discuss several impressive technological innovations. I have to admit, the technology is cool, but what is their learning model and how is Knewton's technology being used to improve learning and teaching?

Unfortunately, Knewton does not seem to operate with a clearly articulated learning model in mind. In any case, I couldn't find one. But based on the sample items and feedback examples shown in their white paper and on their site, what Knewton means by learning is the ability to consistently get right answers on tests and quizzes, and the way to learn (get more answers right) is to get more practice on the kind of items students are not yet consistently getting right.
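To make the contrast concrete, here is a toy sketch of the content-focused loop described above: recommend more of whatever a student is not yet consistently getting right. It illustrates that general model only; it is not Knewton's actual algorithm, and all names and data are invented.

    # Toy content-focused recommender: practice the item types with the lowest
    # recent success rate. All names and data are invented; this illustrates the
    # general model under discussion, not any vendor's real system.
    history = {
        "fractions":     [1, 0, 0, 1, 0],  # 1 = correct, 0 = incorrect
        "decimals":      [1, 1, 1, 1, 0],
        "word_problems": [0, 0, 1, 0, 0],
    }

    def next_topic(history):
        return min(history, key=lambda topic: sum(history[topic]) / len(history[topic]))

    print(next_topic(history))  # -> "word_problems" (lowest success rate)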

In fact, Knewton appears to be a high tech application of the content-focused learning model that's dominated public education since No Child Left Behind—another example of what it looks like when we throw technology at a problem without engaging in a deep enough analysis of that problem.

We're in the middle of an education crisis, but it's not because children aren't getting enough answers right on tests and quizzes. It's because our efforts to improve education consistently fail to ask the most important questions, "Why do we educate our children?" and "What are the outcomes that would be genuine evidence of success?"

Don't get me wrong. We love technology, and we leverage it shamelessly. But we don't believe technology is the answer. The answer lies in a deep understanding of how learning works and what we need to do to support the kind of learning that produces outcomes we really care about. 

 

Lectical (CLAS) scores are subject to change

We incorporate feedback loops called virtuous cycles in everything we do. And I mean everything. Our governance structure is fundamentally iterative. (We're a Sociocracy.) Our project management approach is iterative. (We use Scrum.) We develop ideas iteratively. (We use Design Thinking.) We build our learning tools iteratively. (We use developmental maieutics.) And our learning model is iterative. (We use the virtuous cycle of learning.) One important reason for using all of these iterative processes is that we want every activity in our organization to reward learning. Conveniently, all of the virtuous cycles we iterate through do double duty as virtuous cycles of learning.

All of this virtuous cycling has an interesting (and unprecedented) side effect. The score you receive on one of our assessments is subject to change. Yes, because we learn from every single assessment taken in our system, what we learn could cause your score on any assessment you take here to change. Now, it's unlikely to change very much, probably not enough to affect the feedback you receive, but the fact that scores change from time to time can really shake people up. Some people might even think we've lost the plot!

But there is method in our madness. Allowing your score to fluctuate a bit as our knowledge base grows is our way of reminding everyone that there's uncertainty in any test score, and ourselves that there's always more to learn about how learning works. 

The dark? side of Lectical Assessment

Recently, members of our team at Lectica have been discussing potential misuses of Lectical Assessments, and exploring the possibility that they could harm some students. These are serious concerns that require careful consideration and discussion, and I urge readers to pitch in.

One of the potential problems we've discussed is the possibility that students will compare their scores with one another, and that students with lower scores will suffer from these comparisons. Here's my current take on this issue.

Students receive scores all the time. By third grade they already know their position in the class hierarchy, and live every day with that reality. Moreover, despite the popular notion that all students can become above average if they work hard enough, average students don't often become above-average students, which means that during their entire 12 years of schooling, they rarely receive top rewards (the best grades) for the hard work they do. In fact, they often feel like they're being punished even when they try their best. To make things worse, in our current system they're further punished by being forced to memorize content they haven't been prepared to understand, a problem that worsens year by year.

Lectica's approach to assessment can't prevent students from figuring out where their scores land in the class distribution, but we can give all students an opportunity to see themselves as successful learners, no matter where their scores are in that distribution. Average or below average students may still have to live with the reality that they grow at different rates than some of their peers, but they'll be rewarded for their efforts, just the same.

I've been told by some very good teachers that it is unacceptable to use the expression "average student." While I share the instinct to protect students from the harm that can come from labels, I don't share the belief that being an average student is a bad thing. Most of us were average students—or to be more precise, 68% of us were within one standard deviation of the mean. How did being a member of the majority become a bad thing?  And what harm are we doing to students by creating the illusion that we are all capable of performing above the mean?

I don't think we hurt children by serving up reality. We hurt them when we mislead them by telling them they can all be above average, or when we make them feel hopeless by insisting that they all learn at the same pace, then punishing them when they can't keep up.

I'm not saying it's not possible to raise the average. We do it by meeting the specific learning needs of every student and making sure that learning time is spent learning robustly. But we can't change the fact that there's a distribution, and we shouldn't pretend otherwise.

Lectical Assessments are tests, and are subject to the same abuses as other tests. But they have three attributes that help mitigate these abuses. First, they allow all students without severe disabilities to see themselves as learners. Second, they help teachers customize instruction to meet the needs of each student, so more kids have a chance to achieve their full potential. And finally, they reward good pedagogy—even in cases in which the assessments are being misused. After all, testing drives instruction.