If a US President thought like a teenager…

In a series of Medium articles, my colleagues and I have been examining the complexity level of national leaders’ thinking — with a newly validated electronic developmental assessment system called CLAS. Since I posted the second article in this series, which focused on the thinking of recent U.S. presidents, I’ve been asked several times to say more about what complexity scores mean and why they matter.

So, here goes. To keep it simple (or as simple as a discussion of complexity can be), I’m going to limit myself to an exploration of the complexity scores of Presidents Trump (mean score = 1054) and Obama (mean score = 1163).

If you are unfamiliar with complexity levels, I recommend that you start by watching the short video below. It provides a general explanation of developmental levels that will help get you oriented.

Adult complexity zones

If you’ve read the previous articles in this series (recommended), you’ve already seen the figure below. It shows the four complexity “zones” that are most common in adulthood and describes them in terms of the kinds of perspectives people performing in each zone are likely to be able to work with effectively. The first zone, advanced linear thinking, is the most common among adults in the United States. It’s also fairly common in the later years of high school—though early linear thinking (not shown here) is more common in that age range.

As development progresses, knowledge and thought move through levels of increasing complexity. Each level builds upon the previous level, which means we have to pass through all of the levels in sequence. Skipping a level is impossible, because a level can’t be built unless there is an earlier level to build upon. As we move through these levels, the evidence of earlier levels does not disappear. It leaves traces in language that can be represented as a kind of history of a person’s development. We call this a developmental profile. To produce a score, CLAS’s algorithm compares an individual’s developmental profile to the typical profiles for each possible score on the complexity scale. Right now, the CLAS algorithm is based on 20 years of rigorous research involving over 45,000 scored interviews, observations, and assessments.
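The profile-matching step described above can be illustrated with a deliberately simplified sketch. The profile representation (relative frequency of evidence at each level) and the distance measure below are my own assumptions for illustration only; they are not the actual CLAS algorithm:

```python
# Hypothetical illustration of developmental profile matching.
# NOT the actual CLAS algorithm -- the representation and distance
# measure here are assumptions made purely for illustration.

def score_profile(profile, reference_profiles):
    """Return the scale score whose typical profile is closest
    (by summed absolute difference) to the individual's profile."""
    def distance(a, b):
        return sum(abs(a.get(level, 0.0) - b[level]) for level in b)
    return min(reference_profiles,
               key=lambda score: distance(profile, reference_profiles[score]))

# Toy reference profiles for two scale scores (evidence at levels 9-11):
reference_profiles = {
    1054: {9: 0.30, 10: 0.60, 11: 0.10},  # mostly linear thinking
    1163: {9: 0.05, 10: 0.35, 11: 0.60},  # mostly systems thinking
}

individual = {9: 0.25, 10: 0.65, 11: 0.10}
print(score_profile(individual, reference_profiles))  # -> 1054
```

The point of the sketch is only the general idea the article describes: a score emerges from comparing an individual's developmental profile against typical profiles for each possible score, not from counting correct answers.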

President Trump

In the second article of this series, I reported that President Trump’s average score (1054) was in the advanced linear thinking zone. Thinking in this zone is abstract and linear. People performing in this zone link ideas in chains of (more or less) logical relations. Reasoning has a “black and white” quality, in the sense that there is a strong preference for simple correct or incorrect answers. Although individuals performing at this level can often see that a situation or problem involves multiple factors, the only way they can organize their thinking about these factors is in chains of logical statements, usually with an “if, then” structure. President Trump, in his interview with The Wall Street Journal on July 25, 2017, provided a typical “if, then” argument when asked about trade with the UK. He argued:

…we’re going to have a very good relationship with the U.K. And we do have to talk to the European Union, because it’s not a reciprocal deal, you know. The word reciprocal, to me, is very important. For instance, we have countries that charge us 100 percent tax to sell a Harley-Davidson into that country. And yet, they sell their motorcycles to us, or their bikes, or anything comparable, and we charge them nothing. There has to be a reciprocal deal. I’m all about that.

The complexity level of an argument can be seen in its structure and the meanings embedded in that structure. This argument has an “if, then” structure, and points to the meaning of reciprocity, which for the President seems to mean an equal exchange—”If you tax at a certain level, then we should tax at that level too.” This kind of “tit for tat” thinking is common in level 10 and below. It’s also a form of thinking that disappears above level 10. For example, in level 11, an individual would be more likely to argue, “It’s more complex than that. There are other considerations that need to be taken into account, like the impact a decision like this is likely to have on international relations or our citizens’ buying power.” President Trump, in his response, does not even mention additional considerations. This is one of the patterns in his responses that contributed to the score awarded by CLAS.

In the results reported here, a Democrat scored higher than a Republican. We have no reason to believe that conservative thinking is inherently less complex than liberal thinking. In fact, in the past, we have identified highly complex thinking in both conservative and liberal leaders.


A couple of side notes

Upon reading President Trump’s statement above, you may have noticed that, without any framing or context, the President jumped to a discussion of reciprocity. This lack of framing is a ubiquitous feature of President Trump’s arguments. I did not mention it in my discussion of complexity because it is not a direct indicator of thinking complexity. It’s more strongly connected to logical coherence, which correlates with complexity but is not fully explained by complexity.

I’d also like to note that it was difficult to find a single argument in President Trump’s interviews that contained an actual explanation. When asked to explain a position, President Trump was far more likely to (1) tell a story, (2) deride someone, (3) point out his own fame or popularity, or (4) claim that another perspective was a lie or fake news. These were the main ways in which he “backed up” his opinions. Like the absence of framing, these behaviors are not direct indicators of thinking complexity, though they may be correlated with complexity. They are more strongly related to disposition, values, and personality.

These flaws in President Trump’s thinking, combined with the complexity level of his interview responses, should raise considerable alarm. If the President Trump we see is showing us his best thinking—and a casual examination of other examples of his thinking suggests that this is likely to be the case—he clearly lacks the thinking skills demanded by his role. In fact, mid-level management roles generally require better thinking skills than those demonstrated by President Trump.


President Obama

President Obama’s mean score (1163) was in the advanced systems thinking zone. Thinking in this zone is multivariate and non-linear. People performing in this zone link ideas in complex webs of relations, connecting these webs to one another through common elements. For example, they view individuals as complex webs of traits and behaviors, and groups of individuals as complex webs that include not only the intersections of their members’ webs, but also their own distinct properties. Thinking in this zone is very different from thinking in the advanced linear thinking zone. Where individuals performing in the advanced linear thinking zone are concerned with immediate outcomes and proximal causes, individuals performing in the advanced systems thinking zone concern themselves with long-term outcomes and systemic causes. Here is an example from President Obama’s interview with the New York Times on March 7th, 2009, in which he explains his approach to economic recovery following the onset of the Great Recession:

…people have been concerned, understandably, about the decline in the market. Well, the reason the market’s declining is because the economy’s declining and it’s generating a lot of bad news, not surprisingly. And so what I’m focused on is fixing the underlying economy. That’s ultimately what’s going to fix the markets. …in the interim you’ve got some folks who would love to see us artificially prop up the market by just putting in more taxpayer money, which in the short term could make bank balance sheets look better, make creditors and bondholders and shareholders of these financial institutions feel better and you could get a little blip. But we’d be in the exact same spot as we were six, eight, 10 months [ago]. So, what I’ve got to do is make sure that we’re focused on the underlying economy, and … if we do that well …we’re going to get this economy moving again. And I think over the long term we’re going to be much better off.

Rather than offering a pre-determined solution or focusing on a single element of the economic crisis, President Obama anchors on the economy as a system, advocating a comprehensive long-term solution rather than band-aid solutions that might offer some positive immediate results, but would be likely to backfire in the long term. Appreciating that the economic situation presents “a very complex set of problems,” he employs a decision-making process that is “constantly… guided by evidence, facts, talking through all the best arguments, drawing from all the best perspectives, and then taking the best course of action possible.”

The complexity level of President Obama’s thinking, as represented in the press interviews analyzed for our study, is a reasonable fit for high office. Of course, we were not able to determine if his scores in this context represent his full capabilities. An informal examination of some of his written work suggests that the “true” complexity level of his thinking may be even higher.

Discussion

Thinking complexity is not the only factor that plays a role in a president’s success. As president, Obama experienced both successes and failures, and as is usually the case, it’s difficult to say to what extent his solutions contributed to these successes or failures. But, even in the face of this uncertainty, isn’t it a no-brainer that a complex problem that’s adequately understood is more likely to be resolved than a complex problem that’s not even recognized?

In his interview with the Wall Street Journal, President Trump claimed that Barack Obama “didn’t know what the hell he was doing.” Our results suggest that it may be President Trump who doesn’t know what Obama was doing.

 


Other articles in this series

  1. The complexity of national leaders’ thinking: How does it measure up?
  2. The complexity of national leaders’ thinking: U.S. Presidents

The complexity of national leaders’ thinking: How does it measure up?

Special thanks to my Australian colleague, Aiden Thornton, for his editorial and research assistance.

This is the first in a series of articles on the complexity of national leaders’ thinking. These articles will report results from research conducted with CLAS, our newly validated electronic developmental scoring system. CLAS will be used to score these leaders’ responses to questions posed by prominent journalists.

In this first article, I’ll be providing some of the context for this project, including information about how my colleagues and I think about complexity and its role in leadership. I’ve embedded lots of links to additional material for readers who have questions about our 100+ year-old research tradition, Lectica’s (the nonprofit that owns me) assessments, and other research we’ve conducted with these assessments.

Context and research questions

Lectica creates diagnostic assessments for learning that support the development of mental skills required for working with complexity. We make these learning tools for both adults and children. Our K-12 initiative—the DiscoTest Initiative—is dedicated to bringing these tools to individual K-12 teachers everywhere, free of charge. Our adult assessments are used by organizations in recruitment and training, and by colleges and universities in admissions and program evaluation.

All Lectical Assessments measure the complexity level (aka, level of vertical development) of people’s thinking in particular knowledge areas. A complexity level score on a Lectical Assessment tells us the highest level of complexity—in a problem, issue, or task—an individual is likely to be able to work with effectively.

On several occasions over the last 20 years, my colleagues and I have been asked to evaluate the complexity of national leaders’ reasoning skills. Our response has been, “We will, but only when we can score electronically—without the risk of human bias.” That time has come. Now that our electronic developmental scoring system, CLAS, has demonstrated a level of reliability and precision that is acceptable for this purpose, we’re ready to take a look.

Evaluating the complexity of national leaders’ thinking is a challenging task for several reasons. First, it’s virtually impossible to find examples of many of these leaders’ “best work.” Their speeches are generally written for them, and speech writers usually try to keep the complexity level of these speeches low, aiming for a reading level in the 7th to 9th grade range. (Reading level is not the same thing as complexity level, but like most tests of capability, it correlates moderately with complexity level.) Second, even when national leaders respond to unscripted questions from journalists, they work hard to use language that is accessible to a wide audience. And finally, it’s difficult to identify a level playing field—one in which all leaders have the same opportunity to demonstrate the complexity of their thinking.

Given these obstacles, there’s no point in attempting to evaluate the actual thinking capabilities of national leaders. In other words, we won’t be claiming that the scores awarded by CLAS represent the true complexity level of leaders’ thinking. Instead, we will address the following questions:

  1. When asked by prominent journalists to explain their positions on complex issues, what is the average complexity level of national leaders’ responses?
  2. How does the complexity level of national leaders’ responses relate to the complexity of the issues they discuss?

Thinking complexity and leader success

At this point, you may be wondering, “What is thinking complexity and why is it important?” A comprehensive response to this question isn’t possible in a short article like this one, but I can provide a basic description of complexity as we see it at Lectica, and provide some examples that highlight its importance.

All issues faced by leaders are associated with a certain amount of built-in complexity. For example:

  1. The sheer number of factors/stakeholders that must be taken into account.
  2. Short and long-term implications/repercussions. (Will a quick fix cause problems downstream, such as global unrest or catastrophic weather?)
  3. The number and diversity of stakeholders/interest groups. (What is the best way to balance the needs of individuals, families, businesses, communities, states, nations, and the world?)
  4. The length of time it will take to implement a decision. (Will it take months, years, decades? Longer projects are inherently more complex because of changes over time.)
  5. Formal and informal rules/laws that place limits on the deliberative process. (For example, legislative and judicial processes are often designed to limit the decision making powers of presidents or prime ministers. This means that leaders must work across systems to develop decisions, which further increases the complexity of decision making.)

Over the course of childhood and adulthood, the complexity of our thinking develops through up to 13 skill levels (0–12). Each new level builds upon the previous level. The figure above shows four adult complexity “zones” — advanced linear thinking (second zone of level 10), early systems thinking (first zone of level 11), advanced systems thinking (second zone of level 11), and early principles thinking (first zone of level 12). In advanced linear thinking, reasoning is often characterized as “black and white.” Individuals performing in this zone cope best with problems that have clear right or wrong answers. It is only once individuals enter early systems thinking that they begin to work effectively with highly complex problems that do not have clear right or wrong answers.
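As a rough illustration of how scale scores relate to these zones, here is a sketch. The encoding it assumes — that the hundreds digit of a score gives the level and the remainder gives position within the level, with the second zone starting at the midpoint — is my own simplifying assumption for illustration, not Lectica's published scale definition:

```python
# Hypothetical score-to-zone mapper. The encoding assumed here
# (hundreds digit = level, within-level position split at the midpoint)
# is an illustrative assumption, not the official scale definition.

ZONE_NAMES = {
    (10, "advanced"): "advanced linear thinking",
    (11, "early"): "early systems thinking",
    (11, "advanced"): "advanced systems thinking",
    (12, "early"): "early principles thinking",
}

def zone(score):
    level = score // 100           # e.g., 1054 -> level 10
    position = score % 100         # e.g., 1054 -> 54 (within-level position)
    half = "early" if position < 50 else "advanced"
    return ZONE_NAMES.get((level, half), f"level {level}, {half} zone")

print(zone(1054))  # advanced linear thinking (Trump's reported mean)
print(zone(1163))  # advanced systems thinking (Obama's reported mean)
```

Under this assumption, the two mean scores reported earlier in the series land in exactly the zones the article names for each president.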

Leadership at the national level requires exceptional skills for managing complexity, including the ability to deal with the most complex problems faced by humanity (Helbing, 2013). Needless to say, a national leader regularly faces issues at or above early principles thinking.

Complexity level and leadership—the evidence

In the workplace, the hiring managers who decide which individuals will be put in leadership roles are likely to choose leaders whose thinking complexity is a good match for their roles. Even if they have never heard the term complexity level, hiring managers generally understand, at least implicitly, that leaders who can work with the complexity inherent in the issues associated with their roles are likely to make better decisions than leaders whose thinking is less complex.

There is a strong relation between the complexity of leadership roles and the complexity level of leaders’ reasoning. In general, more complex thinkers fill more complex roles. The figure below shows how lower-level and senior-level leaders’ complexity scores are distributed in Lectica’s database. Most senior leaders’ complexity scores are in or above advanced systems thinking, while those of lower-level leaders are primarily in early systems thinking.

The strong relation between the complexity of leaders’ thinking and the complexity of their roles can also be seen in the recruitment literature. To be clear, complexity is not the only aspect of leadership decision making that affects leaders’ ability to deal effectively with complex issues. However, a large body of research, spanning over 50 years, suggests that the top predictors of leadership recruitment success are those that most strongly relate to thinking skills, including complexity level.

The figure below shows the predictive power of several forms of assessment employed in making hiring and promotion decisions. The cognitive assessments have been shown to have the highest predictive power. In other words, assessments of thinking skills do a better job predicting which candidates will be successful in a given role than other forms of assessment.

Predictive power graph

The match between the complexity of national leaders’ thinking and the complexity level of the problems faced in their roles is important. While we will not be able to assess the actual complexity level of the thinking of national leaders, we will be able to examine the complexity of their responses to questions posed by prominent journalists. In upcoming articles, we’ll be sharing our findings and discussing their implications.

Coming next…

In the second article in this series, we begin our examination of the complexity of national leaders’ thinking by scoring interview responses from four US Presidents—Bill Clinton, George W. Bush, Barack Obama, and Donald Trump.

 


Appendix

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column “variance explained” is an indicator of how much of a role a particular form of assessment plays in predicting performance—its predictive power.

| Form of assessment | Source | Predictive validity | Variance explained | Variance explained (with GMA) |
|---|---|---|---|---|
| Complexity of workplace reasoning | Dawson & Stein, 2004; Stein, Dawson, Van Rossum, Hill, & Rothaizer, 2013 | .53 | 28% | n/a |
| Aptitude (General Mental Ability, GMA) | Hunter, 1980; Schmidt & Hunter, 1998 | .51 | 26% | n/a |
| Work sample tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .54 | 29% | 40% |
| Integrity | Ones, Viswesvaran, & Schmidt, 1993; Schmidt & Hunter, 1998 | .41 | 17% | 42% |
| Conscientiousness | Barrick & Mount, 1995; Schmidt & Hunter, 1998 | .31 | 10% | 36% |
| Employment interviews (structured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .51 | 26% | 39% |
| Employment interviews (unstructured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .38 | 14% | 30% |
| Job knowledge tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .48 | 23% | 33% |
| Job tryout procedure | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .44 | 19% | 33% |
| Peer ratings | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .49 | 24% | 33% |
| Training & experience: behavioral consistency method | McDaniel, Schmidt, & Hunter, 1988a, 1988b; Schmidt & Hunter, 1998; Schmidt, Ones, & Hunter, 1992 | .45 | 20% | 33% |
| Reference checks | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .26 | 7% | 32% |
| Job experience (years) | Hunter, 1980; McDaniel, Schmidt, & Hunter, 1988b; Schmidt & Hunter, 1998 | .18 | 3% | 29% |
| Biographical data measures (Supervisory Profile Record Biodata Scale) | Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990; Schmidt & Hunter, 1998 | .35 | 12% | 27% |
| Assessment centers | Gaugler, Rosenthal, Thornton, & Bentson, 1987; Schmidt & Hunter, 1998; Becker, Höft, Holzenkamp, & Spinath, 2011 | .37 | 14% | 28% |
| EQ | Zeidner, Matthews, & Roberts, 2004 | .24 | 6% | n/a |
| 360 assessments | Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski, 2001 | .24 | 6% | n/a |
| Training & experience: point method | McDaniel, Schmidt, & Hunter, 1988a; Schmidt & Hunter, 1998 | .11 | 1% | 27% |
| Years of education | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .10 | 1% | 27% |
| Interests | Schmidt & Hunter, 1998 | .10 | 1% | 27% |

Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments.
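The “variance explained” column is simply the square of the predictive validity (a correlation coefficient), expressed as a percentage. A few of the table’s rows, computed explicitly:

```python
# Variance explained is the squared correlation (r^2), shown as a percentage.
def variance_explained(validity):
    """Convert a predictive validity (r) to percent of variance explained."""
    return round(validity ** 2 * 100)

for form, r in [("Complexity of workplace reasoning", 0.53),
                ("Aptitude (GMA)", 0.51),
                ("Work sample tests", 0.54),
                ("Years of education", 0.10)]:
    print(f"{form}: r = {r:.2f} -> {variance_explained(r)}% of variance")
```

Running this reproduces the table's figures (28%, 26%, 29%, and 1%), which is a useful sanity check that a correlation of .53 versus .26 means roughly four times the predictive power, not twice.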

References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Dawson, T. L. (2017, October 20). Using technology to advance understanding: The calibration of CLAS, an electronic developmental scoring system. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Dawson, T. L., & Thornton, A. M. A. (2017, October 18). An examination of the relationship between argumentation quality and students’ growth trajectories. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Helbing, D. (2013). Globally networked risks and how to respond. Nature, 497, 51-59.

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. R. Barrick, Murray R. (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988a). A meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.


Lectica’s story: long, rewarding, & still unfolding


Lectica's story started in Toronto in 1976…

Identifying the problem

During the ’70s and ’80s, I practiced midwifery. It was a great honor to be present at the births of over 500 babies, and in many cases, follow them into childhood. Every single one of those babies was a joyful, driven, and effective "every moment" learner. Regardless of difficulty and pain, they all learned to walk, talk, interact with others, and manipulate many aspects of their environment. They needed few external rewards to build these skills—the excitement and suspense of striving seemed to be reward enough. I felt like I was observing the "life force" in action.

Unfortunately, as many of these children approached the third grade (age 8), I noticed something else—something deeply troubling. Many of the same children seemed to have lost much of this intrinsic drive to learn. For them, learning had become a chore motivated primarily by extrinsic rewards and punishments. Because this was happening primarily to children attending conventional schools (children receiving alternative instruction seemed to be exempt), it appeared that something about schooling was depriving many children of the fundamental human drive required to support a lifetime of learning and development—a drive that looked to me like a key source of happiness and fulfillment.

Understanding the problem

Following my midwifery career, I flirted briefly with a career in advertising, but by the early ’90s I was back in school—in a Ph.D. program in U.C. Berkeley's Graduate School of Education—where I found myself observing the same pattern I'd observed as a midwife. Both the research literature and my own lab experience exposed the early loss of students' natural love of learning. My concern was only increased by the newly emerging trend toward high-stakes multiple-choice testing, which my colleagues and I saw as a further threat to children's natural drive to learn.

Most of the people I've spoken to about this problem have agreed that it's a shame, but few have seen it as a problem that can be solved, and many have seen it as an inevitable consequence of either mass schooling or simple maturation. But I knew it was not inevitable. Children educated in a range of alternative environments did not appear to lose their drive to learn. Additionally, above-average students in conventional schools appeared to be more likely to retain their love of learning.

I set out to find out why—and ended up on a long journey toward a solution.

How learning works

First, I needed to understand how learning works. At Berkeley, I studied a wide variety of learning theories in several disciplines, including developmental theories, behavioral theories, and brain-based theories. I collected a large database of longitudinal interviews and submitted them to in-depth analysis, looked closely at the relation between testing and learning, and studied psychological measurement, all in the interest of finding a way to support children's growth while reinforcing their love of learning.

My dissertation—which won awards from both U.C. Berkeley and the American Psychological Association—focused on the development of people's conceptions of learning from age 5 through 85, and how this kind of knowledge could be used to measure and support learning. In 1998, I received $500,000 from the Spencer Foundation to further develop the methods designed for this research. Some of my areas of expertise are human learning and development, psychometrics, metacognition, moral education, and research methods.

In the simplest possible terms, what I learned in 5 years of graduate school is that the human brain is designed to drive learning, and that preserving that natural drive requires 5 ingredients:

  1. a safe environment that is rich in learning opportunities and healthy human interaction,
  2. a teacher who understands each child's interests and level of tolerance for failure,
  3. a mechanism for determining "what comes next"—what is just challenging enough to allow for success most of the time (but not all of the time),
  4. instant actionable feedback, and 
  5. the opportunity to integrate new knowledge or skills into each learner's existing knowledge network well enough to make it useable before pushing instruction to the next level. (We call this building a "robust knowledge network"—the essential foundation for future learning.)*

Identifying the solution

Once we understood what learning should look like, we needed to decide where to intervene. The answer, when it came, was a complete surprise. Understanding what comes next—something that can only be learned by measuring what a student understands now—was an integral part of the recipe for learning. This meant that testing—which we originally saw as an obstacle to robust learning—was actually the solution, but only if we could build tests that would free students to learn the way their brains are designed to learn. These tests would have to help teachers determine "what comes next" (ingredient 3) and provide instant actionable feedback (ingredient 4), while rewarding teachers for helping students build robust knowledge networks (ingredient 5).

Unfortunately, conventional standardized tests were focused on "correctness" rather than robust learning, and none of them were based on the study of how targeted concepts and skills develop over time. Moreover, they were designed not to support learning, but rather to make decisions about advancement or placement, based on how many correct answers students were able to provide relative to other students. Because this form of testing did not meet the requirements of our learning recipe, we'd have to start from scratch.

Developing the solution

We knew that our solution—reinventing educational testing to serve robust learning—would require many years of research. In fact, we would be committing to possible decades of effort without a guaranteed result. It was the vision of a future educational system in which all children retained their inborn drive for learning that ultimately compelled us to move forward. 

To reinvent educational testing, we needed to:

  1. make a deep study of precisely how children build particular knowledge and skills over time in a wide range of subject areas (so these tests could accurately identify "what comes next");
  2. make tests that determine how deeply students understand what they have learned—how well they can use it to address real-world issues or problems (requires that students show how they are thinking, not just what they know—which means written responses with explanations); and
  3. produce formative feedback and resources designed to foster "robust learning" (build robust knowledge networks).

Here's what we had to invent:

  1. A learning ruler (building on Commons [1998] and Fischer [2006]);
  2. A method for studying how students learn tested concepts and skills (refining the methods developed for my dissertation);
  3. A human scoring system for determining the level of understanding exhibited in students' written explanations (building upon Commons' and Fischer's methods, refining them until measurements were precise enough for use in educational contexts); and 
  4. An electronic scoring system, so feedback and resources could be delivered in real time.

It took over 20 years (1996–2016), but we did it! And while we were doing it, we conducted research. In fact, our assessments have been used in dozens of research projects, including a $25 million study of literacy conducted at Harvard, and numerous Ph.D. dissertations—with more on the way.

What we've learned

We've learned many things from this research. Here are some that took us by surprise:

  1. Students in schools that focus on building deep understanding graduate seniors who are up to 5 years ahead (on our learning ruler) of students in schools that focus on correctness (2.5 to 3 years after taking socioeconomic status into account).
  2. Students in schools that foster robust learning develop faster and continue to develop longer (into adulthood) than students in schools that focus on correctness.
  3. On average, students in schools that foster robust learning produce more coherent and persuasive arguments than students in schools that focus on correctness.
  4. On average, students in our inner-city schools, which are the schools most focused on correctness, stop developing (on our learning ruler) in grade 10. 
  5. The average student who graduates from a school that strongly focuses on correctness is likely, in adulthood, to (1) be unable to grasp the complexity and ambiguity of many common situations and problems, (2) lack the mental agility to adapt to changes in society and the workplace, and (3) dislike learning. 

From our perspective, these results point to an educational crisis that can best be addressed by allowing students to learn as their brains were designed to learn. Practically speaking, this means providing learners, parents, teachers, and schools with metrics that reward and support teaching that fosters robust learning. 

Where we are today

Lectica has created the only metrics that meet all of these requirements. Our mission is to foster greater individual happiness and fulfillment while preparing students to meet 21st century challenges. We do this by creating and delivering learning tools that encourage students to learn the way their brains were designed to learn. And we ensure that students who need our learning tools the most get them first by providing free subscriptions to individual teachers everywhere.

To realize our mission, we organized as a nonprofit. We knew this choice would slow our progress (relative to organizing as a for-profit and welcoming investors), but it was the only way to guarantee that our true mission would not be derailed by other interests.

Thus far, we've funded ourselves with work in the for-profit sector and income from grants. Our background research is rich, our methods are well-established, and our technology works even better than we thought it would. Last fall, we completed a demonstration of our electronic scoring system, CLAS, a novel technology that learns from every single assessment taken in our system. 

The groundwork has been laid, and we're ready to scale. All we need is the platform that will deliver the assessments (called DiscoTests), several of which are already in production.

After 20 years of high stakes testing, students and teachers need our solution more than ever. We feel compelled to scale as quickly as possible, so we can begin the process of reinvigorating today's students' natural love of learning and ensure that the next generation of students never loses theirs. Lectica's story isn't finished. Instead, we find ourselves on the cusp of a new beginning!

Please consider making a donation today.

 


A final note: There are many benefits associated with our approach to assessment that were not mentioned here. For example, because the assessment scores are all calibrated to the same learning ruler, students, teachers, and parents can easily track student growth. Even better, our assessments are designed to be taken frequently and to be embedded in low-stakes contexts. For grading purposes, teachers are encouraged to focus on growth over time rather than specific test scores. This way of using assessments pretty much eliminates concerns about cheating. And finally, the electronic scoring system we developed is backed by the world's first "taxonomy of learning," which also serves many other educational and research functions. It's already spawned a developmentally sensitive spell-checker! One day, this taxonomy of learning will be robust enough to empower teachers to create their own formative assessments on the fly. 

 


*This is the ingredient that's missing from current adaptive learning technologies.

 


What PISA measures. What we measure.

Like the items in Lectical Assessments, PISA items involve real-world problems. PISA developers also claim, as we do here at Lectica, that their items measure how knowledge is applied. So, why do we persist in claiming that Lectical Assessments and assessments like PISA measure different things?

Part of the answer lies in questions about what's actually being measured, and in the meaning of terms like "real world problems" and "how knowledge is applied." I'll illustrate with an example from Take the test: sample questions from OECD's PISA assessments.

One of the reading comprehension items in "Take the test" involves a short story about a woman who is trapped in her home during a flood. Early in the story, a hungry panther arrives on her porch. The woman has a gun, which she keeps at her side as long as the panther is present. At first, it seems that she will kill the panther, but in the end, she offers it a ham hock instead. 

What is being measured?

There are three sources of difficulty in the story. First, its Lectical phase is 10c—the third of four phases in level 10. Second, the story is challenging to interpret because it's written to be a bit ambiguous. I had to read it twice in order to appreciate the subtlety of the author's message. Third, it is set on the water in a rural setting, so it contains lots of language that would be new to many students. How well a student comprehends this story hinges on their level of understanding—where they are currently performing on the Lectical Scale—and how much they know about living on the water in a rural setting. Assuming they understand the content of the story, it also depends on how good they are at decoding the story's somewhat ambiguous message.

The first question that comes up for me is whether this is a good story selection for the average 15-year-old. The average phase of performance for most 15-year-olds is 10a. That's their productive level. When we prescribe learning recommendations to students performing in 10a, we choose texts that are about one phase higher than their current productive level. We refer to this as the "Goldilocks zone," because we've found it to be the range in which material is just difficult enough to be challenging, but not so difficult that the risk of failure is too high. Some failure is good. Constant failure is bad.
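The phase-stepping rule described above—recommend material about one phase above a learner's current productive level—can be sketched in a few lines of code. This is a minimal illustration only, assuming phases cycle a through d within each level as described in this article; the function name and representation are hypothetical, not part of Lectica's actual system.

```python
# Hypothetical sketch of the "Goldilocks zone" rule: recommend texts one
# phase above a learner's current productive level. Phase labels such as
# "10a" combine a level (10) with one of four phases (a-d).

PHASES = "abcd"

def goldilocks_phase(current: str) -> str:
    """Return the phase one step above the given productive level."""
    level, phase = int(current[:-1]), current[-1]
    i = PHASES.index(phase)
    if i < len(PHASES) - 1:
        return f"{level}{PHASES[i + 1]}"
    # Wrap from phase d to the next level's phase a.
    return f"{level + 1}{PHASES[0]}"

print(goldilocks_phase("10a"))  # → 10b
print(goldilocks_phase("10d"))  # → 11a
```

On this sketch, the average 15-year-old performing in 10a would be recommended 10b texts—one reason a 10c story is a questionable choice for that audience.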

But this PISA story is intended to test comprehension; it's not a learning recommendation or resource. Here, its difficulty level raises a different issue. In this context, the question that arises for me is, "What is reading comprehension, when the text students are asked to decode presents different challenges to students living in different environments and performing in different Lectical Levels?" Clearly, this story does not present the same challenge to students performing in phase 10a as it presents to students performing in 10c. Students performing in 10a or lower are struggling to understand the basic content of the story. Students performing in 10c are grappling with the subtlety of the message. And if the student lives in a city and knows nothing about living on the water, even a student performing at 10c is disadvantaged.

Real world problems

Now, let's consider what it means to present a real-world problem. When we at Lectica use this term, we usually mean that the problem is ill-structured (like the world), without a "correct" answer. (We don't even talk about correctness.) The challenges we present to learners reveal the current level of their understandings—there is always room for growth. One of our interns refers to development as a process of learning to make "better and better mistakes". This is a VERY different mindset from the "right or wrong" mindset nurtured by conventional standardized tests.

What do PISA developers mean by "real world problem"? They clearly don't mean without a "correct" answer. Their scoring rubrics show correct, partial (sometimes), and incorrect answers. And it doesn't get any more subtle than that. I think what they mean by "real world" is that their problems are contextualized; they are simply set in the real world. But this is not a fundamental change in the way PISA developers think about learning. Theirs is still a model that is primarily about the ability to get right answers.

How knowledge is applied

Let's go back to the story about the woman and the panther. After they read the story, test-takers are asked to respond to a series of multiple choice and written response questions. In one written response question they are asked, "What does the story suggest was the woman’s reason for feeding the panther?"

The scoring rubric presents a selection of potential correct answers and a set of wrong answers. (No partially correct answers here.) It's pretty clear that when PISA developers ask “how well” students' knowledge is applied, they're talking about whether or not students can provide a correct answer. That's not surprising, given what we've observed so far. What's new and troubling here is that all "correct" answers are treated as though they are equivalent. Take a look at the list of choices. Do they look equally sophisticated to you?

  • She felt sorry for it.
  • Because she knew what it felt like to be hungry.
  • Because she’s a compassionate person.
  • To help it live. (p. 77)

“She felt sorry for it.” is considered just as correct as “Because she’s a compassionate person.” But we know the ideas expressed in these two statements are not equivalent. The idea of feeling sorry for someone can be expressed by children as early as phase 08b (6- to 7-year-olds). The idea of compassion (as sympathy) does not appear until phase 10b. And the idea of being a compassionate person does not appear until 10c—even when the concept of compassion is being explicitly taught. Given that this is a test of comprehension—defined by PISA's developers in terms of understanding and interpretation—doesn't the student who writes, "She is a compassionate person," deserve credit for arriving at a more sophisticated interpretation?

I'm not claiming that students can't learn the word "compassion" earlier than phase 10b. And I'm certainly not claiming that there is enough evidence in students' responses to the prompt in this assessment to determine whether an individual who wrote "She felt sorry for it." meant something different from an individual who wrote "She's a compassionate person." What I am arguing is that what students mean is more important than whether or not they get a right answer. A student who has constructed the notion of compassion as sympathy is expressing a more sophisticated understanding of the story than a student who can't go further than saying the protagonist felt sorry for the panther. When we, at Lectica, talk about how well knowledge is applied, we mean, “At what level does this child appear to understand the concepts she’s working with and how they relate to one another?”

What is reading comprehension?

All of these observations lead me back to the question, "What is reading comprehension?" PISA developers define reading comprehension in terms of understanding and interpretation, and Lectical Assessments measure the sophistication of students' understanding and interpretation. It looks like our definitions are at least very similar.

We think the problem is not in the definition, but in the operationalization. PISA's items measure proxies for comprehension, not comprehension itself. Getting beyond proxies requires three ingredients.

  • First, we have to ask students to show us how they're thinking. This means asking for verbal responses that include both judgments and justifications for those judgments. 
  • Second, the questions we ask need to be more open-ended. Life is rarely about finding right answers. It's about finding increasingly adequate answers. We need to prepare students for that reality. 
  • Third, we need to engage in the careful, painstaking study of how students construct meanings over time.

This third requirement is such an ambitious undertaking that many scholars don't believe it's possible. But we've not only demonstrated that it's possible, we're doing it every day. We call the product of this work the Lectical™ Dictionary. It's the first curated developmental taxonomy of meanings. You can think of it as a developmental dictionary. Aside from making it possible to create direct tests of student understanding, the Lectical Dictionary makes it easy to describe how ideas evolve over time. We can not only tell people what their scores mean, but also what they're most likely to benefit from learning next. If you're wondering what that means in practice, check out our demo.


Interpreting CLAS Demo reports

What the CLAS demo measures

The CLAS demo assessment (the LRJA) is a measure of the developmental level of people's reasoning about knowledge, evidence, deliberation, and conflict. People who score higher on this scale are able to work effectively with increasingly complex information and solve increasingly complex problems. 

CLAS is the name of our scoring system—the Computerized Lectical Assessment System. It measures the developmental level (hierarchical complexity) of responses on a scale called the Lectical Scale (also called the skill scale). 

It does not measure:

  • your use of particular vocabulary
  • writing mechanics (spelling, punctuation, capitalization)
  • coherence (quality of logic or argument)
  • relevance
  • correctness (measured by most standardized tests) 

These dimensions of performance are related to Lectical Level, but they are not the same thing. 

The reliability of the CLAS score

The Lectical Scores on CLAS demo assessments are awarded by our electronic scoring system, CLAS.

  • CLAS scores agree with human scores within 1/5 of a level about 90% of the time. That's the same level of agreement we expect between human raters. This level of agreement is more than acceptable for formative classroom use and program evaluation. It is not good enough for making high-stakes decisions.
  • We don't recommend making high-stakes decisions based on the results of any one assessment. Performance over time (growth trajectory) is much more reliable than an individual score.
  • CLAS is not as well calibrated above 11.5 as it is at lower levels. This is because there are fewer people in our database who perform at the highest levels. As our database grows, CLAS will get better at scoring those performances.
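To make the agreement statistic above concrete, here is a minimal sketch of how "agreement within 1/5 of a level" can be computed for paired scores. The function and the score values are invented for illustration; this is not Lectica's actual code or data.

```python
# Illustrative computation of rater agreement: the fraction of paired
# scores (e.g., CLAS vs. a human rater) that differ by no more than
# 0.2 Lectical levels (1/5 of a level).

def agreement_rate(scores_a, scores_b, tolerance=0.2):
    """Fraction of score pairs within `tolerance` of each other."""
    pairs = list(zip(scores_a, scores_b))
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return within / len(pairs)

# Made-up example scores for five responses.
clas_scores  = [10.15, 10.40, 11.05, 10.80, 11.60]
human_scores = [10.20, 10.55, 11.00, 10.50, 11.65]

print(agreement_rate(clas_scores, human_scores))  # 4 of 5 pairs agree → 0.8
```

In practice, an agreement rate computed this way would be reported over a much larger sample of scored assessments.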

Benchmarks

You can find benchmarks for childhood and adulthood in our article, Lectical levels, roles, and educational level.

The figure below shows growth curves for four different kinds of K-12 schools in our database. If you want to see how an individual student's growth relates to this graph, we suggest taking at least three assessments over the course of a year or more. (The top-performing school, "Rainbow," is the Rainbow Community School in North Carolina.)

 


Lectica basics for schools

If you are a school leader, this post is for you. Here, you'll find information about Lectica, its mission, and our first electronically scored Lectical Assessment—the LRJA.

Background

Lectica, Inc. is a 501(c)(3) charitable corporation. Its mission is to build and deliver learning tools that help students build skills for thinking and learning. These learning tools are backed by a strong learning model—the Virtuous Cycle of Learning (VCoL+7™)—and a comprehensive vision for educational testing and learning, which you can learn more about in our white paper, Virtuous cycles of learning: Redesigning testing during the digital revolution.

We have spent over 20 years developing our methods and the technology required to deliver our learning tools—known as Lectical™ Assessments or DiscoTests®—at scale. These assessments are backed by a large body of research, including ongoing investigations of their validity and reliability. Here are some links to research reports:

The following video provides an overview of our research and mission:

Current offerings

In the fall of 2016, we introduced our first electronically scored Lectical Assessment—the LRJA (an assessment of reflective judgment/critical thinking). The LRJA can be used in research and program evaluation as a summative assessment, or in the classroom as a formative assessment—or both.

The best way to learn about the LRJA is to experience it first-hand at lecticalive. Just click on this link, then select the "go straight to the demo" button. On the next page, fill in the sign-up form with the educational level of your choice. Click "submit," then click the "autofill" button (top right, under the header) to fill the response form with an example.

If you're interested in working with the LRJA or would like to learn more about using Lectical Assessments to optimize thinking and learning, please contact us.
