Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (the use of evidence, argument, and psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already concern themselves with correctness.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

         Lectical Level  Mechanics  Coherence  Persuasiveness  Correctness
Case 1   high            high       low        average         high
Case 2   high            high       high       low             low
Case 3   low             average    low        low             high

Case 1

This student has relatively high Lectical, mechanics, and correctness scores, but their performance is low in coherence and the persuasiveness of their answers is average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from participating in activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's Lectical, mechanics, and coherence scores are high relative to expectations. Their knowledge appears to be well integrated, but the combination of low persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to the targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

This student's scores are high for correctness, average for mechanics, and low for Lectical Level, coherence, and persuasiveness. This pattern suggests that the student has been memorizing content without integrating it effectively into their knowledge network, and has been doing so for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical scores catch up with their correctness scores.
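For readers who like to see the diagnostic logic spelled out, here is a minimal sketch in Python of how score profiles like the three cases above might be mapped to learning suggestions. The profiles follow the table, but the rule set and function name are hypothetical simplifications for illustration—they are not Lectica's scoring or reporting logic.

```python
# Hypothetical sketch: mapping low/average/high score profiles to suggestions.
# The rules below are illustrative simplifications, not Lectica's actual logic.

def suggest_next_step(profile: dict) -> str:
    """Return a rough learning suggestion for a low/average/high score profile."""
    lectical = profile["lectical"]
    coherence = profile["coherence"]
    correctness = profile["correctness"]

    # Case 1 pattern: strong level and correctness, weaker integration.
    if lectical == "high" and correctness == "high" and coherence == "low":
        return "Apply existing knowledge in relevant contexts (VCoL) to deepen integration."

    # Case 2 pattern: well-integrated knowledge with content gaps.
    if coherence == "high" and correctness == "low":
        return "Fill targeted content gaps and integrate them into the existing knowledge network."

    # Case 3 pattern: memorized content that is neither integrated nor developed.
    if correctness == "high" and lectical == "low" and coherence == "low":
        return "Practice using memorized content in personally relevant contexts (VCoL)."

    return "No canned suggestion; review the full profile."


cases = {
    "Case 1": {"lectical": "high", "mechanics": "high", "coherence": "low",
               "persuasiveness": "average", "correctness": "high"},
    "Case 2": {"lectical": "high", "mechanics": "high", "coherence": "high",
               "persuasiveness": "low", "correctness": "low"},
    "Case 3": {"lectical": "low", "mechanics": "average", "coherence": "low",
               "persuasiveness": "low", "correctness": "high"},
}

for name, profile in cases.items():
    print(name, "->", suggest_next_step(profile))
```

The point is simply that the diagnostic value comes from the pattern across measures, not from any single score.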

Straw men and flawed metrics

Ten years ago, Kirschner, Sweller, & Clark published an article entitled, Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching.

In this article, Kirschner and his colleagues contrast outcomes for what they call "guidance instruction" (lecture and demonstration) with those from constructivism-based instruction. They conclude that constructivist approaches produce inferior outcomes.

The article suffers from at least three serious flaws:

First, the authors, in making their distinction between guided instruction and constructivist approaches, have created a caricature of constructivist approaches. Very few experienced practitioners of constructivist, discovery, problem-based, experiential, or inquiry-based teaching would characterize their approach as minimally guided. "Differently guided" would be a more appropriate term. Moreover, most educators who use constructivist approaches include lecture and demonstration where these are appropriate.

Second, the research reviewed by the authors was fundamentally flawed. For the most part, the metrics employed to evaluate different styles of instruction were not reasonable measures of the kind of learning constructivist instruction aims to support—deep understanding (the ability to apply knowledge effectively in real-world contexts). They were measures of memory or attitude. Back in 2010, Stein, Fisher, and I argued that metrics can't produce valid results if they don't actually measure what we care about (Redesigning testing: Operationalizing the new science of learning. Why isn't this a no-brainer?).

And finally, the longitudinal studies Kirschner and his colleagues reviewed had short time-spans. None of them examined the long-term impacts of different forms of instruction on deep understanding or long-term development. This is a big problem for learning research—one that is often acknowledged, but rarely addressed.

Since Kirschner's article was published in 2006, we've had an opportunity to examine the difference between schools that provide different kinds of instruction, using assessments that measure the depth and coherence of students' understanding. We've documented a 3- to 5-year advantage, by grade 12, for students who attend schools that emphasize constructivist methods vs. those that use more "guidance instruction".

To learn more, see:

Are our children learning robustly?

Lectica rationale

 

Lectica basics for schools

If you are a school leader, this post is for you. Here, you'll find information about Lectica, its mission, and our first electronically scored Lectical Assessment—the LRJA.

Background

Lectica, Inc. is a 501(c)(3) charitable corporation. Its mission is to build and deliver learning tools that help students build skills for thinking and learning. These learning tools are backed by a strong learning model—the Virtuous Cycle of Learning (VCoL+7™)—and a comprehensive vision for educational testing and learning, which you can learn more about in our white paper—Virtuous cycles of learning: Redesigning testing during the digital revolution.

We have spent over 20 years developing our methods and the technology required to deliver our learning tools—known as Lectical™ Assessments or DiscoTests®—at scale. These assessments are backed by a large body of research, including ongoing investigations of their validity and reliability. Here are some links to research reports:

The following video provides an overview of our research and mission:

Current offerings

In the fall of 2016, we introduced our first electronically scored Lectical Assessment—the LRJA (an assessment of reflective judgment/critical thinking). The LRJA can be used in research and program evaluation as a summative assessment, or in the classroom as a formative assessment—or both.

The best way to learn about the LRJA is to experience it first-hand at lecticalive. Just click on this link, then select the "go straight to the demo" button. On the next page, fill in the sign-up form with the educational level of your choice. Click "submit," then click the "autofill" button (top right, under the header) to fill the response form with an example.

If you're interested in working with the LRJA or would like to learn more about using Lectical Assessments to optimize thinking and learning, please contact us.

Second language learning predicts the growth of critical thinking

On November 20th, 2016, we presented a paper at the ACTFL conference in Boston. In this paper, we described the results of a 4-year research project designed to address the question, "Does second language learning support the development of critical thinking as measured by the LRJA?" To learn more, view the presentation below.



 

Are our children learning robustly?

There are at least four reasons why people should learn robustly:

  1. It's fun!
  2. They'll learn more quickly.
  3. They'll keep growing longer.
  4. They'll be better prepared to participate fully in adult life.

Truly, there are no downsides to learning robustly. Yet robust learning is not what's happening for most students in most American schools. We have mounting—and disturbing—evidence that this is the case. 

The data in the figure below are from our database of reflective judgment assessments. These are open-response formative assessments of how well people think about and address thorny real world problems like bullying, television violence, dietary practices, and global warming. We've been delivering these assessments for several years now and have a diverse sample of over 20,000 completed assessments to learn from. 

We wanted to know how well schools are supporting development and what kind of role learning robustly might play in their performance. (Watch the video above to learn more about what counts as evidence of robust learning.) In particular, we wanted to know why students in one school—the Rainbow Community School—are outperforming students in other schools. (To learn about the Rainbow curriculum, click here.) 

We first looked at one of the key sources of evidence for robust learning—the quality of students' arguments. In the figure below, the Y axis represents the quality or "coherence" of students' arguments and the X axis represents their Lectical phase (or developmental phase, 1/4 of a Lectical Level). The highest coherence score students can receive is a 10.

In this figure, the Rainbow Community School is the clear leader, especially when it comes to students performing in lower phases, with inner-city (primarily low socioeconomic status) public schools at the low end, and more conventional private schools and high socioeconomic status public schools in the middle. So, how does this relate to student development? Since we regard coherence of argumentation as strong evidence of robust learning, and assert that robust learning is required to support optimal development, we would expect Rainbow students to develop more rapidly than students in schools with lower coherence scores.

Coherence by phase and school type

The figure below tells the story. When it comes to students' development on the Lectical Scale, Rainbow Community School students are way ahead of the pack. And our inner-city schools are way behind. In fact, the average senior in our large (over 10,000 assessments) inner-city sample is 5 years behind the projected score for the average senior in the Rainbow sample. In other words, inner-city seniors, on average, are performing at the same level as Rainbow 7th graders.

We know socioeconomic status is a factor that contributes to this gap, but shouldn't our schools be closing it rather than allowing it to grow larger? Take a look at the figure below. This figure assumes that students in the Rainbow Community School, on average, start out at about the same developmental level as students in private and high SES public schools, yet Rainbow students grow faster. In fact, the data project that Rainbow 9th graders would perform as well as seniors in the other schools. That's a 3-year advantage! We believe this difference is due to differences in instructional practices. What if we used these same practices in our inner-city schools? If we could accelerate their learning as much as the Rainbow Community School has accelerated the learning of its students, inner-city students would be doing as well as students in private and high SES public schools!
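To make the projection arithmetic concrete, here is a minimal sketch of how a grade-equivalent advantage can be estimated from growth curves. Every number in it (the starting level and both growth rates) is a hypothetical placeholder, not data from our sample; the point is only to show how "9th graders performing as well as seniors elsewhere" translates into a 3-year figure under simple linear growth.

```python
# Hypothetical sketch of a grade-equivalent "advantage" calculation.
# The starting level and growth rates are placeholders, NOT study data.

START_GRADE = 6
START_LEVEL = 10.0   # assumed common starting level at grade 6
FAST_RATE = 0.25     # hypothetical Lectical growth per year (faster-growing group)
SLOW_RATE = 0.12     # hypothetical Lectical growth per year (comparison group)

def projected_level(grade: int, rate: float) -> float:
    """Project a group's mean Lectical score under simple linear growth."""
    return START_LEVEL + rate * (grade - START_GRADE)

# Level the comparison group is projected to reach by grade 12.
comparison_at_12 = projected_level(12, SLOW_RATE)

# Earliest grade at which the faster-growing group matches that level.
grade = START_GRADE
while projected_level(grade, FAST_RATE) < comparison_at_12:
    grade += 1

print(f"Comparison group, grade 12: {comparison_at_12:.2f}")
print(f"Faster group reaches that level by grade {grade}: "
      f"a {12 - grade}-year advantage under these assumptions")
```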

Although socioeconomic status is a key factor, we think the differences seen here are at least partially due to fundamentally different ways of thinking about learning and teaching. Conventional schools tend to be primarily content focused. There is an emphasis on learning as remembering. The Rainbow Community School is skill focused. Its teachers use content as a vehicle for building core life skills, such as skills for learning, inquiry, evaluating information, making connections, communicating, conflict resolution, decision making, mindfulness, compassion, and building relationships. To build these skills students continuously engage in virtuous cycles of learning—cycles of information gathering, application, reflection, and goal setting—that exercise these skills while building robust connections between new and existing knowledge. Students not only learn content, they learn to use it effectively in their everyday lives. It becomes part of them. We call this embodied learning.

We're eager to study the impact of skill-focused curricula on the learning of less advantaged students. If you know of a school that's fostering robust learning AND serving disadvantaged students, we'd like to help them show off what they're accomplishing.

Note: Not only does Rainbow Community School ensure that its students are continuously engaged in VCoLs (virtuous cycles of learning), it uses a system of governance, Sociocracy, that supports virtuous cycling for everyone on staff as well as the continuous improvement of its curriculum. 

Appendix: Sample responses from 8th graders in different schools

Examples are taken from performances of students with average scores for their school. 

The question students answered: How is it possible that the two groups [pro and anti bullying] have such different ideas?

Rainbow Community School

It could be due to different experiences. Perhaps the ones going for the argument that a little bullying can be okay were disciplined more at home and have a tougher shell for things like this. [Parents] may base their initial ideas on their own experiences or their children's. It all really depends on the person and how they were raised.

High SES public school

This because they have different ideas and reasons for thinking what they believe and you can't change that. The parents are not the same and every one of them is different so they have a right to believe what they want to believe.

Low SES public school

Many people think different and many people look at things differently. So people get different ideas and opinions about things.

What is a holistic assessment?

Thirty years ago, when I was a hippy midwife, the idea of holism began to slip into the counter-culture. A few years later, this much-misunderstood notion was all the rage on college campuses. By the time I was in graduate school in the nineties, there was an impassable division between the trendy postmodern holists and the rigidly old-fashioned modernists. You may detect a slight mocking tone, and rightly so. People with good ideas on both sides made themselves look pretty silly by refusing, for example, to use any of the tools associated with the other side. One of the more tragic outcomes of this silliness was the emergence of the holistic assessment.

Simply put, the holistic assessment is a multidimensional assessment that is designed to take a more nuanced, textured, or rich approach to assessment. Great idea. Love it.

It’s the next part that’s silly. Having collected rich information on multiple dimensions, the test designers sum up a person’s performance with a single number. Why is this silly? Because the so-called holistic score becomes pretty much meaningless. Two people with the same score can have very little in common. For example, let’s imagine that a holistic assessment examines emotional maturity, perspective taking, and leadership thinking. Two people each receive a score of 10, which may be accompanied by boilerplate descriptions of what emotional maturity, perspective taking, and leadership attitudes look like at level 10. However, person one was actually weak in perspective taking and strongest in leadership, and person two was weak in emotional maturity and strongest in perspective taking. The score of 10, it turns out, means something quite different for these two people. I would argue that it is relatively meaningless, because there is no way to know, based on the single “holistic” score, how best to support the development of these distinct individuals.
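The information loss is easy to demonstrate. In the sketch below (the dimensions follow the example above, but the numbers are invented for illustration), two very different subscore profiles collapse to the same “holistic” total, so the total alone cannot tell you where either person needs support.

```python
# Hypothetical illustration: two different profiles, one indistinguishable total.
person_one = {"emotional_maturity": 3, "perspective_taking": 2, "leadership": 5}
person_two = {"emotional_maturity": 2, "perspective_taking": 5, "leadership": 3}

holistic_one = sum(person_one.values())   # 10
holistic_two = sum(person_two.values())   # 10

print(holistic_one == holistic_two)       # True: identical "holistic" scores
print(person_one == person_two)           # False: very different learning needs
```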

Holism has its roots in system dynamics, where measurements are used to build rich models of systems. All of the measurements are unidimensional. They are never lumped together into “holistic” measures. That would be equivalent to talking about the temperaturelength of a day or the lengthweight of an object*. It’s essential to measure time, weight, and length with appropriate metrics and then to describe their interrelationships and the outcomes of these interrelationships. The language used to describe these is the language of probability, which is sensitive to differences in the measurement of different properties.

In psychological assessment, dimensionality is a challenging issue. What constitutes a single dimension is a matter for debate. For DTS, the primary consideration is how useful an assessment will be in helping people learn and grow. So, we tend to construct individual assessments, each of which represents a fairly tightly defined content space, and we use only one metric to determine the level of a performance. The meaning of a given score is both universal (it is an order of hierarchical complexity and phase on the skill scale) and contextual (it is assigned to a performance in a particular domain in a particular context, and is associated with particular content). We independently analyze the content of the performance to determine its strengths and weaknesses—relative to its level and the known range of content associated with that level—and provide feedback about these strengths and weaknesses as well as targeted learning suggestions. We use the level score to help us tell a useful story about a particular performance, without claiming to measure “lengthweight”. This is accomplished by the rigorous separation of structure (level) and content.

*If we described objects in terms of their lengthweight, an object that was 10 inches long and 2 lbs could have a lengthweight of 12, but so could an object that was 2 inches long and 10 lbs.
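One way to picture this separation of structure and content is a report that carries a single level score alongside independent content feedback, rather than folding everything into one number. The structure below is a hypothetical sketch for illustration; the field names and values are invented and are not Lectica's actual report format.

```python
# Hypothetical sketch of a report that keeps level (structure) and content apart.
# Field names are invented for illustration, not Lectica's actual report format.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PerformanceReport:
    domain: str                   # the assessed content space
    level: int                    # developmental level (structure), one metric
    phase: int                    # phase within the level (e.g., 1-4)
    strengths: List[str] = field(default_factory=list)     # content strengths
    weaknesses: List[str] = field(default_factory=list)    # content gaps
    suggestions: List[str] = field(default_factory=list)   # targeted next steps

report = PerformanceReport(
    domain="reflective judgment",
    level=10,
    phase=3,
    strengths=["considers multiple perspectives"],
    weaknesses=["rarely evaluates the quality of evidence"],
    suggestions=["practice comparing two sources on a current issue"],
)

# The level score tells one story; the content feedback tells another.
print(report.level, report.phase)
print(report.suggestions)
```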

Kegan’s Subject-Object Interview and the LSUA

Before I write about the relation between Kegan's Subject-Object Interview and the LSUA (the Lectical Self-Understanding Assessment), I'd like to explain some differences between these assessments. First, the SOI is both an interview and an assessment system. It was developed by studying the interviews of a small sample of respondents (Does anyone know how many?) who were interviewed on several occasions over the course of several years (Again, does anyone know how many or how often?). The level definitions and the scoring criteria in the SOI are tied to the subject matter of the interviews in the original sample (construction sample). For this reason, the SOI is called a domain-specific assessment. Researchers would say that the levels were defined by "bootstrapping" from the longitudinal data. Critiques of such assessments point to bias in their level definitions (due to their small and culturally narrow construction samples), the related conflation (confusion) of particular conceptual content with developmental levels, and a weak articulation of the lowest levels, which are not based on direct empirical evidence from appropriately aged respondents.

With respect to the LSUA, I want to clarify that it is scored with the Lectical Assessment System (LAS), a content-independent developmental scoring system that was created, in part, by identifying the dimension that underlies all longitudinally bootstrapped developmental assessment systems*. The SOI was one of the assessment systems I studied on the way to developing the LAS. Consequently, if the LAS does what it is supposed to do, it should capture the developmental dimension that underlies Kegan's system even better than his scoring system, because the LAS is a second-generation developmental scoring system that is not constrained by a content-driven scoring process (Dawson, 2002; Dawson, Xie, & Wilson, 2003; there is much more written about this in our published work, available on our web site).

What is the relation between the LSUA and the Subject-Object Interview?

This is a difficult question to answer, partly because there is no research that directly compares the SOI and the LSUA. However, because the LAS is a domain independent scoring system that can be used to score any text that includes judgments and justifications, I have used it to score the SOI scoring manual. The developmental sequence for SOI levels 3 to 5 corresponds well to the dimension captured by the LAS, and levels 3-5 correspond roughly with Lectical Levels 10-12. However, Kegan's lower levels do not match up as well, possibly because his construction sample (the sample used to define his levels), as far as we can determine, did not include young children. (Kegan's original research was never published in a form that would allow us to evaluate the approach he took to defining his levels or the reliability and validity of the SOI. All we can locate are a few very small studies of inter-rater reliability, most of which are unpublished [Kegan, 2002].)

Comparisons of the Subject-Object Interview with other developmental assessment systems

There is some research comparing the SOI with other developmental assessment systems. In general, this research finds that the SOI and these other systems are likely to tap the same developmental dimension (see Pratt et al., 1991).

Ideally, we would like to conduct a direct comparison of the LAS and the scoring system Kegan developed to score the SOI, as we have done with other developmental assessment systems. (We are working with a graduate student who is planning to do this kind of comparison.) In the meantime, we can point to comparisons between the LAS and several other developmental assessment systems (Kohlberg, Armon, Kitchener & King, Perry) that were developed using methods similar to those used by Kegan, and have routinely found strong correlations (above .85) between these scoring systems and the LAS, especially when they are used to score the same material (Dawson, 2000, 2001, 2002a, 2004; Dawson, Xie, & Wilson, 2003).

Finally, some of Kegan's level definitions are almost identical to those of Kohlberg and Selman. In fact, I would argue that they are primarily an extension of Selman's original work on socio-moral perspective, which has informed most domain-based developmental assessment systems (including all of the systems mentioned here) since it was introduced in the 1960s (and was a great help to me when I was developing the LAS).

*The claim that there is a single developmental dimension that underlies these systems is NOT the same thing as a claim that an individual will be at the same level in different knowledge/skill areas.

References

Commons, M. L., Armon, C., Richards, F. A., Schrader, D. E., Farrell, E. W., Tappan, M. B., et al. (1989). A multidomain study of adult development. In D. Sinnott, F. A. Richards, & C. Armon (Eds.), Adult development, Vol. 1: Comparisons and applications of developmental models (pp. 33-56). New York: Praeger Publishers.

Dawson, T. L. (2000). Moral reasoning and evaluative reasoning about the good life. Journal of Applied Measurement, 1(4), 372-397.

Dawson, T. L. (2001). Layers of structure: A comparison of two approaches to developmental assessment. Genetic Epistemologist, 29, 1-10.

Dawson, T. L. (2002a). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3, 146-189.

Dawson, T. L. (2002b). New tools, new insights: Kohlberg’s moral reasoning stages revisited. International Journal of Behavioral Development, 26, 154-166.

Dawson, T. L., Xie, Y., & Wilson, M. (2003). Domain-general and domain-specific developmental assessments: Do they measure the same thing? Cognitive Development, 18, 61-78.

Dawson, T. L. (2004). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11, 71-85.

Kegan, R. (2002). A guide to the subject-object interview. Unpublished scoring manual. Harvard Graduate School of Education.

King, P. M., Kitchener, K. S., Wood, P. K., & Davison, M. L. (1989). Relationships across developmental domains: A longitudinal study of intellectual, moral, and ego development. In M. L. Commons, J. D. Sinnot, F. A. Richards & C. Armon (Eds.), Adult development. Volume 1: Comparisons and applications of developmental models (pp. 57-71). New York: Praeger.

Lambert, H. V. (1972). A comparison of Jane Loevinger's theory of ego development and Lawrence Kohlberg's theory of moral development. University of Chicago, Chicago, IL.

Pratt, M. W., Diessner, R., Hunsberger, B., Pancer, S. M., & Savoy, K. (1991). Four pathways in the analysis of adult development and aging: Comparing analyses of reasoning about personal-life dilemmas. Psychology & Aging, 6, 666-675.

Sullivan, E. V., McCullough, G., & Stager, M. A. (1970). A developmental study of the relationship between conceptual, ego, and moral development. Child Development, 41, 399-411.