Dr. Howard Drossman—leadership in environmental education

For several years now, one of our heroes, Professor Howard Drossman of Colorado College and the Catamount Center, has been working with Lectical Assessments and helping us build LESA, the Lectical Environmental Stewardship Assessment.

Dr. Drossman's areas of expertise include developmental pedagogy, environmental stewardship, and the development of reflective judgment. His teaching focuses on building knowledge, skill, and passion through deep study, hands-on experience, and reflection.

For example, Dr. Drossman and ACM (Associated Colleges of the Midwest) offered a 10-day faculty seminar on interdisciplinary learning called Contested Spaces. This physically and intellectually challenging expeditionary learning experience provided participants with multiple disciplinary perspectives on current issues of land stewardship in the Pikes Peak region of Colorado. 

A second, ongoing program, offered by the Catamount Center and Colorado College, is dedicated to inspiring the "next generation of ecological stewards." This program, called TREE (Teaching & Research in Environmental Education), is a 16-week residential program for undergraduate students who have an interest in teaching and the environment. Program participants live and learn in community at the Catamount Mountain Campus, which is located in a montane forest outside of Woodland Park, Colorado. Through study and practice, they cultivate their own conceptions of environmental stewardship and respect for the natural world, while building skills for creating virtuous cycles of learning and usable knowledge in K-12 classrooms.

Dr. Drossman embeds Lectical Assessments in both of these programs, using them to customize instruction, support individual development, and measure program outcomes. He also is working closely with us on the development of the LESA, which is one of the first assessments we plan to bring online after our new platform, LecticaLive, has been completed. 

 

Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (use of evidence, argument, and psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already focus on it.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

           Lectical Level   Mechanics   Coherence   Persuasiveness   Correctness
  Case 1   high             high        low         average          high
  Case 2   high             high        high        low              low
  Case 3   low              average     low         low              high

Case 1

This student has relatively high Lectical, mechanics, and correctness scores, but their performance is low in coherence and the persuasiveness of their answers is average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from participating in activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's scores, with the exception of their persuasiveness and correctness scores, are high relative to expectations. This student's knowledge appears to be well integrated, but the combination of low persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

The scores received by this student are high for correctness, while they are average for mechanics, and low for Lectical Level, coherence, and persuasiveness. This pattern suggests that the student is memorizing content without integrating it effectively into his or her knowledge network and has been doing this for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical scores catch up with their correctness scores.
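To make this diagnostic logic concrete, here is a minimal sketch, in Python, of how a profile of relative scores might be mapped onto the kinds of suggestions described in the cases above. The rules, names, and wording below are illustrative assumptions for this post, not part of the CLAS or Lectical scoring systems.

  # Illustrative only: a toy mapping from score profiles (relative to
  # expectations) to the kinds of suggestions described in the cases above.
  def suggest_focus(profile: dict) -> str:
      """profile maps each dimension to 'low', 'average', or 'high'."""
      coherence = profile["coherence"]
      persuasiveness = profile["persuasiveness"]
      correctness = profile["correctness"]

      if correctness == "high" and (coherence == "low" or persuasiveness == "low"):
          # Knowledge is present but not yet well integrated (Cases 1 and 3).
          return "Apply existing knowledge in personally relevant contexts (VCoL)."
      if correctness == "low" and coherence == "high":
          # Knowledge is well integrated, but targeted content is missing (Case 2).
          return "Fill content gaps in ways that connect them to the existing knowledge network."
      return "Continue regular virtuous cycles of learning."

  case_1 = {"lectical_level": "high", "mechanics": "high", "coherence": "low",
            "persuasiveness": "average", "correctness": "high"}
  print(suggest_focus(case_1))  # suggests applying existing knowledge in context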

Interpreting CLAS Demo reports

What the CLAS demo measures

The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems. 

CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale). 

It does not measure:

  • your use of particular vocabulary
  • writing mechanics (spelling, punctuation, capitalization)
  • coherence (quality of logic or argument)
  • relevance
  • correctness (measured by most standardized tests) 

These dimensions of performance are related to Lectical Level, but they are not the same thing. 

The reliability of the CLAS score

Lectical scores on CLAS demo assessments are awarded by our electronic scoring system, CLAS.

  • CLAS scores agree with human scores within 1/5 of a level about 90% of the time (see the sketch after this list). That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high-stakes decisions.
  • We don't recommend making high-stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
  • CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
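For readers who want to see what "agreement within 1/5 of a level" amounts to in practice, here is a minimal sketch (in Python, with invented scores) of how such an agreement rate could be computed from paired CLAS and human ratings. The data and the simple tolerance check are assumptions for illustration only.

  # Illustrative only: share of paired scores that agree within 1/5 of a level.
  # The scores below are invented examples, not real assessment data.
  clas_scores  = [10.05, 10.32, 11.48, 9.87, 10.90]
  human_scores = [10.10, 10.44, 11.40, 9.80, 11.15]

  tolerance = 0.20  # 1/5 of a level on the Lectical Scale

  agreements = [abs(c - h) <= tolerance
                for c, h in zip(clas_scores, human_scores)]
  agreement_rate = sum(agreements) / len(agreements)
  print(f"Agreement within {tolerance} of a level: {agreement_rate:.0%}")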

Benchmarks

You can find benchmarks for childhood and adulthood in our article, Lectical levels, roles, and educational level.

The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top-performing school, "Rainbow," is the Rainbow Community School in North Carolina.)

 

Adaptive learning, big data, and the meaning of learning

Knewton defines adaptive learning as "A teaching method premised on the idea that the curriculum should adapt to each user." In a recent blog post, Knewton's COO, David Liu, expanded on this definition. Here are some extracts:

You have to understand and have real data on content… Is the instructional content teaching what it was intended to teach? Is the assessment accurate in terms of what it’s supposed to assess? Can you calibrate that content at scale so you’re putting the right thing in front of a student, once you understand the state of that student? 

On the other side of the equation, you really have to understand student proficiency… understanding and being able to predict how that student is going to perform, based upon what they’ve done and based upon that content that I talked about before. And if you understand how well the student is performing against that piece of content, then you can actually begin to understand what that student needs to be able to move forward.

The idea of putting the right thing in front of a student is very cool. That's part of what we do here at Lectica. But what does Knewton mean by learning?

Curiosity got the better of me, so I set out to do some investigating. 

What does Knewton mean by learning?

In Knewton's white paper on adaptive learning, the authors do a great job of describing how their technology works.

To provide continuously adaptive learning, Knewton analyzes learning materials based on thousands of data points — including concepts, structure, difficulty level, and media format — and uses sophisticated algorithms to piece together the perfect bundle of content for each student, constantly. The system refines recommendations through network effects that harness the power of all the data collected for all students to optimize learning for each individual student.

They go on to discuss several impressive technological innovations. I have to admit, the technology is cool, but what is their learning model and how is Knewton's technology being used to improve learning and teaching?

Unfortunately, Knewton does not seem to operate with a clearly articulated learning model in mind; at least, I couldn't find one. Based on the sample items and feedback examples shown in their white paper and on their site, what Knewton means by learning is the ability to consistently get right answers on tests and quizzes, and the way to learn (get more answers right) is to get more practice on the kinds of items students are not yet consistently getting right.

In fact, Knewton appears to be a high tech application of the content-focused learning model that's dominated public education since No Child Left Behind—another example of what it looks like when we throw technology at a problem without engaging in a deep enough analysis of that problem.

We're in the middle of an education crisis, but it's not because children aren't getting enough answers right on tests and quizzes. It's because our efforts to improve education consistently fail to ask the most important questions, "Why do we educate our children?" and "What are the outcomes that would be genuine evidence of success?"

Don't get me wrong. We love technology, and we leverage it shamelessly. But we don't believe technology is the answer. The answer lies in a deep understanding of how learning works and what we need to do to support the kind of learning that produces outcomes we really care about. 

 

A new kind of report card

When I was a kid, the main way school performance was measured was with letter grades. We got letter grades on almost all of our work. Getting an A meant you knew it all, a B meant you didn't quite know it all, C meant you knew enough to pass, D meant you knew so little you were on the verge of failing, and F meant you failed. If you always got As you were one of the really smart kids, and if you always got Ds and Fs you were one of the dumb kids. Unfortunately, that's how we thought about it, plain and simple.

If I got a B, my teacher and parents told me I could do better and that I should work harder. If I got a C, I was in deep trouble, and was put on restriction until I brought my grade up. This meant more hours of homework. I suspect this was a common experience. It was certainly what happened on Father Knows Best and The Brady Bunch.

The best teachers also commented on our work, telling us where we could improve our arguments or where and how we had erred, and suggesting actions we could take to improve. In terms of feedback, this was the gold standard. It was the only way we got any real guidance about what we, as individuals, needed to work on next. Letter grades represented rank, punishment, and reward, but they weren't very useful indicators of where we were in our growth as learners. Report cards were for parents. 

Usher in Lectica and DiscoTest

One of our goals here at Lectica has been to make possible a new kind of report card—one that:

  1. delivers scores that have rich meaning for students, parents, and decision-makers,
  2. provides the kind of personal feedback good teachers offer, and
  3. gives students an opportunity to watch themselves grow.

This new report card—illustrated on the right—uses a single learning "ruler" for all subjects, so student growth in different subjects can be shown on the same scale. In the example shown here, each assessment is represented by a round button that links to an explanation of the student's learning edge at the time the assessment was taken.

This new report card also enables direct comparisons between growth trajectories in different subject areas. 

An additional benefit of this new report card is that it delivers a rich portfolio-like account of student growth that can be employed to improve admissions and advancement decisions. 

And finally, we're very curious about the potential psychological benefits of allowing students to watch how they grow. We think it's going to be a powerful motivator.

 

Lectical (CLAS) scores are subject to change

We incorporate feedback loops called virtuous cycles in everything we do. And I mean everything. Our governance structure is fundamentally iterative. (We're a Sociocracy.) Our project management approach is iterative. (We use Scrum.) We develop ideas iteratively. (We use Design Thinking.) We build our learning tools iteratively. (We use developmental maieutics.) And our learning model is iterative. (We use the virtuous cycle of learning.) One important reason for using all of these iterative processes is that we want every activity in our organization to reward learning. Conveniently, all of the virtuous cycles we iterate through do double duty as virtuous cycles of learning.

All of this virtuous cycling has an interesting (and unprecedented) side effect. The score you receive on one of our assessments is subject to change. Yes, because we learn from every single assessment taken in our system, what we learn could cause your score on any assessment you take here to change. Now, it's unlikely to change very much, probably not enough to affect the feedback you receive, but the fact that scores change from time to time can really shake people up. Some people might even think we've lost the plot!

But there is method in our madness. Allowing your score to fluctuate a bit as our knowledge base grows is our way of reminding everyone that there's uncertainty in any test score, and of reminding ourselves that there's always more to learn about how learning works.

The dark(?) side of Lectical Assessment

Recently, members of our team at Lectica have been discussing potential misuses of Lectical Assessments, and exploring the possibility that they could harm some students. These are serious concerns that require careful consideration and discussion, and I urge readers to pitch in.

One of the potential problems we've discussed is the possibility that students will compare their scores with one another, and that students with lower scores will suffer from these comparisons. Here's my current take on this issue.

Students receive scores all the time. By third grade they already know their position in the class hierarchy, and live every day with that reality. Moreover, despite the popular notion that all students can become above average if they work hard enough, average students don't often become above-average students, which means that during their entire 12 years of schooling, they rarely receive top rewards (the best grades) for the hard work they do. In fact, they often feel like they're being punished even when they try their best. To make things worse, in our current system they're further punished by being forced to memorize content they haven't been prepared to understand, a problem that worsens year by year.

Lectica's approach to assessment can't prevent students from figuring out where their scores land in the class distribution, but we can give all students an opportunity to see themselves as successful learners, no matter where their scores are in that distribution. Average or below average students may still have to live with the reality that they grow at different rates than some of their peers, but they'll be rewarded for their efforts, just the same.

I've been told by some very good teachers that it is unacceptable to use the expression "average student." While I share the instinct to protect students from the harm that can come from labels, I don't share the belief that being an average student is a bad thing. Most of us were average students—or to be more precise, 68% of us were within one standard deviation of the mean. How did being a member of the majority become a bad thing?  And what harm are we doing to students by creating the illusion that we are all capable of performing above the mean?

I don't think we hurt children by serving up reality. We hurt them when we mislead them by telling them they can all be above average, or when we make them feel hopeless by insisting that they all learn at the same pace, then punishing them when they can't keep up.

I'm not saying it's not possible to raise the average. We do it by meeting the specific learning needs of every student and making sure that learning time is spent learning robustly. But we can't change the fact that there's a distribution. And we shouldn't pretend otherwise.

Lectical Assessments are tests, and are subject to the same abuses as other tests. But they have three attributes that help mitigate these abuses. First, they allow all students without severe disabilities to see themselves as learners. Second, they help teachers customize instruction to meet the needs of each student, so more kids have a chance to achieve their full potential. And finally, they reward good pedagogy—even in cases in which the assessments are being misused. After all, testing drives instruction.

Are our children learning robustly?

There are at least four reasons why people should learn robustly:

  1. It's fun!
  2. They'll learn more quickly.
  3. They'll keep growing longer.
  4. They'll be better prepared to participate fully in adult life.

Truly, there are no downsides to learning robustly. Yet robust learning is not what's happening for most students in most American schools. We have mounting—and disturbing—evidence that this is the case. 

The data in the figure below are from our database of reflective judgment assessments. These are open-response formative assessments of how well people think about and address thorny real world problems like bullying, television violence, dietary practices, and global warming. We've been delivering these assessments for several years now and have a diverse sample of over 20,000 completed assessments to learn from. 

We wanted to know how well schools are supporting development and what kind of role learning robustly might play in their performance. (Watch the video above to learn more about what counts as evidence of robust learning.) In particular, we wanted to know why students in one school—the Rainbow Community School—are outperforming students in other schools. (To learn about the Rainbow curriculum, click here.) 

We first looked at one of the key sources of evidence for robust learning—the quality of students' arguments. In the figure below, the Y axis represents the quality or "coherence" of students' arguments and the X axis represents their Lectical phase (or developmental phase, 1/4 of a Lectical Level). The highest coherence score students can receive is a 10.

In this figure, the Rainbow Community School is the clear leader, especially when it comes to students performing in lower phases, with inner-city (primarily low socioeconomic status) public schools at the low end, and more conventional private schools and high socioeconomic status public schools in the middle. So, how does this relate to student development? Since we regard coherence of argumentation as strong evidence of robust learning, and assert that robust learning is required to support optimal development, we would expect Rainbow students to develop more rapidly than students in schools with lower coherence scores.

Coherence by phase and school type

The figure below tells the story. When it comes to students' development on the Lectical Scale, Rainbow Community School students are way ahead of the pack. And our inner city schools are way behind. In fact, the average senior in our large (over 10,000 assessments) inner city sample is 5 years behind the projected score for the average senior in the Rainbow sample. Or in other words, inner city seniors, on average, are performing at the same level as Rainbow 7th graders.   

We know socioeconomic status is a factor that contributes to this gap, but shouldn't our schools be closing it rather than allowing it to grow larger? Take a look at the figure below. This figure assumes that students in the Rainbow Community School, on average, start out at about the same developmental level as students in private and high SES public schools, yet Rainbow students grow faster. In fact, the data project that Rainbow 9th graders would perform as well as seniors in the other schools. That's a 3-year advantage! We believe this difference is due to differences in instructional practices. What if we used these same practices in our inner-city schools? If we could accelerate their learning as much as the Rainbow Community School has accelerated the learning of its students, inner-city students would be doing as well as students in private and high SES public schools!

Although socioeconomic status is a key factor, we think the differences seen here are at least partially due to fundamentally different ways of thinking about learning and teaching. Conventional schools tend to be primarily content focused. There is an emphasis on learning as remembering. The Rainbow Community School is skill focused. Its teachers use content as a vehicle for building core life skills, such as skills for learning, inquiry, evaluating information, making connections, communicating, conflict resolution, decision making, mindfulness, compassion, and building relationships. To build these skills students continuously engage in virtuous cycles of learning—cycles of information gathering, application, reflection, and goal setting—that exercise these skills while building robust connections between new and existing knowledge. Students not only learn content, they learn to use it effectively in their everyday lives. It becomes part of them. We call this embodied learning.

We're eager to study the impact of skill-focused curricula on the learning of less advantaged students. If you know of a school that's fostering robust learning AND serving disadvantaged students, we'd like to help them show off what they're accomplishing.

Note: Not only does Rainbow Community School ensure that its students are continuously engaged in VCoLs (virtuous cycles of learning), it uses a system of governance, Sociocracy, that supports virtuous cycling for everyone on staff as well as the continuous improvement of its curriculum. 

Appendix: Sample responses from 8th graders in different schools

Examples are taken from performances of students with average scores for their school. 

The question students answered: How is it possible that the two groups [pro and anti bullying] have such different ideas?

Rainbow Community School

It could be due to different experiences. Perhaps the ones going for the argument that a little bullying can be okay were disciplined more at home and have a tougher shell for things like this. [Parents] may base their initial ideas on their own experiences or their children's. It all really depends on the person and how they were raised.

High SES public School

This because they have different ideas and reasons for thinking what they believe and you can't change that. The parents are not the same and every one of them is different so they have a right to believe what they want to believe.

Low SES public school

Many people think different and many people look at things differently. So people get different ideas and opinions about things.

DiscoTests, the common core standards, and VCoL+7

According to the authors of the Common Core Standards, "Students need the ability to gather, comprehend, evaluate, synthesize, and report on information and ideas" (page 41).

DiscoTests, like all Lectical Assessments, are more than just tests. They are diagnostic and formative educational tools that are designed to support the development of students' reasoning and learning skills. DiscoTests help students build reasoning skills by fostering virtuous cycles of learning (VCoLs)—simple four-step learning cycles that include (a) setting learning goals that are tailored to the needs of the individual learner, (b) acquiring and evaluating information, (c) applying knowledge, and (d) reflecting on performance outcomes, then returning to (a) to recalibrate goals for the next cycle.

DiscoTests, when used as intended, support the development of +7 learning skills. These are: 

  1. reflectivity—a cultivated habit of reflecting on outcomes, information, emotions, or events 
  2. mindfulness, emotion regulation, and self-monitoring
  3. skills for seeking and evaluating information, evidence, and perspectives
  4. skills for making connections between ideas, information, perspectives, and evidence
  5. skills for applying what we know in real-world contexts
  6. skills for seeking and making use of feedback
  7. awareness of cognitive and behavioral biases, and skills for avoiding them

VCoLs and the +7 skills are components of Lectica's learning model, VCoL+7.

 


1. National Governors Association Center for Best Practices and Council of Chief State School Officers (2010). Common Core State Standards for English Language Arts and Literacy in History/Social Studies, Science, and Technical Subjects. Washington, D.C.

Comparison of DiscoTests with conventional tests

DiscoTests and conventional standardized tests can be thought of as complementary. They are designed to test different kinds of skills, and research confirms that they are successful in doing so. Correlations between scores on the kind of developmental assessments made by DTS and scores on conventional multiple-choice assessments are in the .40-.60 range. That means that somewhere between 16% and 36% of the kind of learning that is captured by conventional assessments is likely to overlap with the kind of learning that is captured by DiscoTests.
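The 16% and 36% figures come from the standard practice of squaring a correlation coefficient to estimate shared variance. The short computation below (in Python, using the correlations from the paragraph above) shows the arithmetic.

  # Shared variance between two measures is commonly estimated as r squared.
  for r in (0.40, 0.60):
      print(f"r = {r:.2f} -> estimated shared variance = {r ** 2:.0%}")
  # r = 0.40 -> estimated shared variance = 16%
  # r = 0.60 -> estimated shared variance = 36%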

The comparison below contrasts DiscoTests with conventional standardized tests on a number of dimensions.

Theoretical foundation
  • DiscoTests: cognitive developmental theory, Dynamic Skill Theory, and test theory
  • Conventional tests: test theory

Scale
  • DiscoTests: Fischer's Dynamic Skill Scale, an exhaustively researched general developmental scale and a member of a family of similar scales developed during the 20th century
  • Conventional tests: statistically generated scales, different for each test (though some tests are statistically linked)

Learning sequences
  • DiscoTests: empirical, fine-grained, and precise; calibrated to the Dynamic Skill Scale
  • Conventional tests: empirical, coarse-grained, and general

Primary item type
  • DiscoTests: open response
  • Conventional tests: more or less sophisticated forms of multiple choice

Targeted skills
  • DiscoTests: reasoning with knowledge, knowledge application, making connections between new and existing knowledge, and writing
  • Conventional tests: content knowledge and procedural knowledge

Content
  • DiscoTests: carefully selected "big ideas" and the concepts and skills associated with them
  • Conventional tests: the full range of content specified in state standards for a given subject

Educative/formative
  • DiscoTests: yes; (1) each DiscoTest focuses on ideas and skills central to K-12 curricula, (2) test questions require students to thoughtfully apply new knowledge and connect it with their existing knowledge, (3) students receive reports with targeted feedback and learning suggestions, and (4) teachers learn how student knowledge develops in general and on each targeted concept or skill
  • Conventional tests: not really, though they increasingly claim to be

Embeddable in curricula
  • DiscoTests: yes; DiscoTests are designed to be part of the curriculum
  • Conventional tests: no

Standardized
  • DiscoTests: yes, statistically, and calibrated to the skill scale
  • Conventional tests: yes, statistically only

Stakes
  • DiscoTests: low; selection decisions are based on performance patterns over time on many individual assessments
  • Conventional tests: high; selection decisions are often based on single assessments

Ecological validity
  • DiscoTests: direct tests that focus on deepening and connecting knowledge about key concepts and ideas while developing broad skills required in adult life, such as those needed for reasoning, communicating, and problem-solving
  • Conventional tests: tests of proxies, focused on the ability to detect correct answers

Statistical reliability
  • DiscoTests: .91+ for a single age cohort (distinguishes 5-6 distinct levels of performance)
  • Conventional tests: for high-stakes tests, usually .95+ for a single age cohort (distinguishes 6-7 distinct levels of performance)