The complexity of national leaders’ thinking: How does it measure up?

Special thanks to my Australian colleague, Aiden Thornton, for his editorial and research assistance.

This is the first in a series of articles on the complexity of national leaders’ thinking. These articles will report results from research conducted with CLAS, our newly validated electronic developmental scoring system. CLAS will be used to score these leaders’ responses to questions posed by prominent journalists.

In this first article, I’ll be providing some of the context for this project, including information about how my colleagues and I think about complexity and its role in leadership. I’ve embedded lots of links to additional material for readers who have questions about our 100+ year-old research tradition, Lectica’s (the nonprofit that owns me) assessments, and other research we’ve conducted with these assessments.

Context and research questions

Lectica creates diagnostic assessments for learning that support the development of mental skills required for working with complexity. We make these learning tools for both adults and children. Our K-12 initiative—the DiscoTest Initiative—is dedicated to bringing these tools to individual K-12 teachers everywhere, free of charge. Its adult assessments are used by organizations in recruitment and training, and by colleges and universities in admissions and program evaluation.

All Lectical Assessments measure the complexity level (aka, level of vertical development) of people’s thinking in particular knowledge areas. A complexity level score on a Lectical Assessment tells us the highest level of complexity—in a problem, issue, or task—an individual is likely to be able to work with effectively.

On several occasions over the last 20 years, my colleagues and I have been asked to evaluate the complexity of national leaders’ reasoning skills. Our response has been, “We will, but only when we can score electronically—without the risk of human bias.” That time has come. Now that our electronic developmental scoring system, CLAS, has demonstrated a level of reliability and precision that is acceptable for this purpose, we’re ready to take a look.​

Evaluating the complexity of national leaders’ thinking is a challenging task for several reasons. First, it’s virtually impossible to find examples of many of these leaders’ “best work.” Their speeches are generally written for them, and speech writers usually try to keep the complexity level of these speeches low, aiming for a reading level in the 7th to 9th grade range. (Reading level is not the same thing as complexity level, but like most tests of capability, it correlates moderately with complexity level.) Second, even when national leaders respond to unscripted questions from journalists, they work hard to use language that is accessible to a wide audience. And finally, it’s difficult to identify a level playing field—one in which all leaders have the same opportunity to demonstrate the complexity of their thinking.

Given these obstacles, there’s no point in attempting to evaluate the actual thinking capabilities of national leaders. In other words, we won’t be claiming that the scores awarded by CLAS represent the true complexity level of leaders’ thinking. Instead, we will address the following questions:

  1. When asked by prominent journalists to explain their positions on complex issues, what is the average complexity level of national leaders’ responses?
  2. How does the complexity level of national leaders’ responses relate to the complexity of the issues they discuss?

Thinking complexity and leader success

At this point, you may be wondering, “What is thinking complexity and why is it important?” A comprehensive response to this question isn’t possible in a short article like this one, but I can provide a basic description of complexity as we see it at Lectica, and provide some examples that highlight its importance.

All issues faced by leaders are associated with a certain amount of built-in complexity. For example:

  1. The sheer number of factors/stakeholders that must be taken into account.
  2. Short and long-term implications/repercussions. (Will a quick fix cause problems downstream, such as global unrest or catastrophic weather?)
  3. The number and diversity of stakeholders/interest groups. (What is the best way to balance the needs of individuals, families, businesses, communities, states, nations, and the world?)
  4. The length of time it will take to implement a decision. (Will it take months, years, decades? Longer projects are inherently more complex because of changes over time.)
  5. Formal and informal rules/laws that place limits on the deliberative process. (For example, legislative and judicial processes are often designed to limit the decision making powers of presidents or prime ministers. This means that leaders must work across systems to develop decisions, which further increases the complexity of decision making.)

Over the course of childhood and adulthood, the complexity of our thinking develops through up to 13 skill levels (0–12). Each new level builds upon the previous level. The figure above shows four adult complexity “zones” — advanced linear thinking (second zone of level 10), early systems thinking (first zone of level 11), advanced systems thinking (second zone of level 11), early principles thinking (first zone of level 12). In advanced linear thinking, reasoning is often characterized as “black and white.” Individuals performing in this zone cope best with problems that have clear right or wrong answers. It is only once individuals enter early systems thinking, that we begin to work effectively with highly complex problems that do not have clear right or wrong answers.

Leadership at the national level requires exceptional skills for managing complexity, including the ability to deal with the most complex problems faced by humanity (Helbing, 2013). Needless to say, a national leader regularly faces issues at or above early principles thinking.

Complexity level and leadership—the evidence

In the workplace, the hiring managers who decide which individuals will be put in leadership roles are likely to choose leaders whose thinking complexity is a good match for their roles. Even if they have never heard the term complexity level, hiring managers generally understand, at least implicitly, that leaders who can work with the complexity inherent in the issues associated with their roles are likely to make better decisions than leaders whose thinking is less complex.

There is a strong relation between the complexity of leadership roles and the complexity level of leaders’ reasoning. In general, more complex thinkers fill more complex roles. The figure below shows how lower and senior level leaders’ complexity scores are distributed in Lectica’s database. Most senior leaders’ complexity scores are in or above advanced systems thinking, while those of lower level leaders are primarily in early systems thinking.

The strong relation between the complexity of leaders’ thinking and the complexity of their roles can also be seen in the recruitment literature. To be clear, complexity is not the only aspect of leadership decision making that affects leaders’ ability to deal effectively with complex issues. However, a large body of research, spanning over 50 years, suggests that the top predictors of workplace leader recruitment success are those that most strongly relate to thinking skills, including complexity level.

The figure below shows the predictive power of several forms of assessment employed in making hiring and promotion decisions. The cognitive assessments have been shown to have the highest predictive power. In other words, assessments of thinking skills do a better job predicting which candidates will be successful in a given role than other forms of assessment.

Predictive power graph

The match between the complexity of national leaders’ thinking and the complexity level of the problems faced in their roles is important. While we will not be able to assess the actual complexity level of the thinking of national leaders, we will be able to examine the complexity of their responses to questions posed by prominent journalists. In upcoming articles, we’ll be sharing our findings and discussing their implications.

Coming next…

In the second article in this series, we begin our examination of the complexity of national leaders’ thinking by scoring interview responses from four US Presidents—Bill Clinton, George W. Bush, Barack Obama, and Donald Trump.

 


Appendix

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column “variance explained” is an indicator of how much of a role a particular form of assessment plays in predicting performance—it’s predictive power.

Form of assessment Source Predictive validity Variance explained  Variance explained (with GMA)
Complexity of workplace reasoning (Dawson & Stein, 2004; Stein, Dawson, Van Rossum, Hill, & Rothaizer, 2003) .53 28% n/a
Aptitude (General Mental Ability, GMA) (Hunter, 1980; Schmidt & Hunter, 1998) .51 26% n/a
Work sample tests (Hunter & Hunter, 1984; Schmidt & Hunter, 1998) .54 29% 40%
Integrity (Ones, Viswesvaran, and Schmidt, 1993; Schmidt & Hunter, 1998) .41 17% 42%
Conscientiousness (Barrick & Mount, 1995; Schmidt & Hunter, 1998). .31 10% 36%
Employment interviews (structured) (McDaniel, Whetzel, Schmidt, and Mauer, 1994; Schmidt & Hunter, 1998) .51 26% 39%
Employment interviews (unstructured) (McDaniel, Whetzel, Schmidt, and Mauer, 1994 Schmidt & Hunter, 1998) .38 14% 30%
Job knowledge tests (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .48 23% 33%
Job tryout procedure (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .44 19% 33%
Peer ratings (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .49 24% 33%
Training & experience: behavioral consistency method (McDaniel, Schmidt, and Hunter, 1988a, 1988b; Schmidt & Hunter, 1998; Schmidt, Ones, and Hunter, 1992) .45 20% 33%
Reference checks (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .26 7% 32%
Job experience (years) Hunter, 1980); McDaniel, Schmidt, and Hunter, 1988b; Schmidt & Hunter, 1998) .18 3% 29%
Biographical data measures Supervisory Profile Record Biodata Scale (Rothstein, Schmidt, Erwin, Owens, and Sparks, 1990; Schmidt & Hunter, 1998) .35 12% 27%
Assessment centers (Gaugler, Rosenthal, Thornton, and Benson, 1987; Schmidt & Hunter, 1998; Becker, Höft, Holzenkamp, & Spinath, 2011) Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments. .37 14% 28%
EQ (Zeidner, Matthews, & Roberts, 2004) .24 6% n/a
360 assessments Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski, 2001 .24 6% n/a
Training &  experience: point method (McDaniel, Schmidt, and Hunter, 1988a; Schmidt & Hunter, 1998) .11 1% 27%
Years of education (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .10 1% 27%
Interests (Schmidt & Hunter, 1998) .10 1% 27%

References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Dawson, T. L. (2017, October 20). Using technology to advance understanding: The calibration of CLAS, an electronic developmental scoring system. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Dawson, T. L., & Thornton, A. M. A. (2017, October 18). An examination of the relationship between argumentation quality and students’ growth trajectories. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Helbing, D. (2013). Globally networked risks and how to respond. Nature, 497, 51-59.

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. R. Barrick, Murray R. (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

McDaniel, M. A., Schmidt, F. L., & Hunter, J., E. (1988a). A Meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

McDaniel, M. A., Schmidt, F. L., & Hunter, J., E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

Please follow and like us:

Dear Sir Ken Robinson

 

This morning, I received a newsletter from Sir Ken Robinson, a popular motivational speaker who focuses on education. There was a return email address, so I wrote to him. Here's what I wrote:

Dear Sir Ken,

"I love your message. I'm one of the worker bees who's trying to leverage the kind of changes you envision.

After 20+ years of hard work, my colleagues and I have reinvented educational assessment. No multiple choice. No high stakes. Our focus is on assessment for learning—supporting students in learning joyfully and deeply in a way that facilitates skills for learning, thinking, inquiring, relating and otherwise navigating a complex world. Our assessments are scalable and standardized, but they do not homogenize. They are grounded in a deep study of the many pathways through which students learn key skills and concepts. We're documenting, in exquisite (some would say insane) detail, how concepts and skills develop over time so we can gain insight into learners' knowledge networks. We don't ask about correctness. We ask about understanding and competence and how they develop over time. And we help teachers meet students "where they're at."

We've accumulated a strong base of evidence to support these claims. But now that we're ready to scale, we're running up against hostility toward all standardized assessment. It's difficult to get to the point where we can even have a conversation with our pedagogical allies. Ouch!

Lectica is organized as a nonprofit so we can guarantee that the underprivileged are served first. We plan to offer subscriptions to our assessments (learning tools) without charge to individual teachers everywhere. 

We've kept our heads down as we've developed our methods and technology. Now we're scaling and want to be seen. We know we're part of the solution to today's educational crisis—perhaps a very big part of the solution. I'm hoping you'd like to learn more."

My email was returned with this message: "The email account that you tried to reach does not exist." How frustrating.

So, I thought I'd pen this post and ask my friends and colleagues to help me get access to Sir Ken's ear. If you know him, please forward this message. I'm certain he'll be interested in what we're doing for learning and development. Where are you Sir Ken Robinson? Can you hear me? Are you out there? 

Please follow and like us:

World Economic Forum—tomorrow’s skills

The top 10 workplace skills of the future.

Sources: Future of Jobs Report, WEF 2017

In a recent blog post—actually in several recent blog posts—I've been emphasizing the importance of building tomorrow's skills. These are the kinds of skills we all need to navigate our increasingly complex and changing world. While I may not agree that all of the top 10 skills listed in the World Economic Forum report (shown above) belong in a list of skills (Creativity is much more than a skill, and service orientation is more of a disposition than a skill.) the flavor of this list is generally in sync with the kinds of skills, dispositions, and behaviors required in a complex and rapidly changing world.

The "skills" in this list cannot be…

  • developed in learning environments focused primarily on correctness or in workplace environments that don't allow for mistakes; or
  • measured with ratings on surveys or on tests of people's ability to provide correct answers.

These "skills" are best developed through cycles of goal setting, information gathering, application, and reflection—what we call virtuous cycles of learning—or VCoLs. And they're best assessed with tests that focus on applications of skill in real-world contexts, like Lectical Assessments, which are based on a rich research tradition focused on the development of understanding and skill.

 

Please follow and like us:

From Piaget to Dawson: The evolution of adult developmental metrics

I've just added a new video about the evolution of adult developmental metrics to YouTube and LecticaLive. It traces the evolutionary history of Lectica's developmental model and metric.

If you are curious about the origins of our work, this video is a great place to start. If you'd like to see the reference list for this video, view it on LecticaLive.

 

 

Please follow and like us:

Why measure complexity?

“Why measure growth in complexity level?”* That’s the question a new acquaintance asked me recently, and it took me by surprise.

One lesson I’ve had to learn again and again is that the only way to escape the boundaries of my own perspective is to listen hard to the perspectives of others. These perspectives are often reflected in the questions people ask.

“Why measure growth?” I didn’t realize this question needed an answer. Like other questions that have surprised me, this one relates to something I take for granted — one of the fundamental assumptions that underlie my research.

Why we do it

We measure growth for three primary reasons — to make it visible, to learn how people learn, and to customize learning.

  1. By measuring growth, we make it visible. Being able to see evidence of our own growth motivates us to grow further. Contrast this with receiving grades or conventional test scores. They compare you with other people. If you get good grades, you may be motivated to strive for even better grades. But if you consistently get poor grades, you’re more likely to feel like you’re being punished for your learning efforts. By measuring growth, we provide positive motivation for every learner.
  2. A measure of growth helps us understand how people learn. At Lectica, we’re constantly asking how particular growth scores relate to specific skills and knowledge, and how current skills and knowledge relate to the skills and knowledge we observe at the next level. In other words, we’re systematically and continually documenting the development of knowledge and skills so we can answer questions like, “Is there an optimal way for people to learn this skill?”
  3. A measure of growth allows us to determine what a learner is most likely to benefit from learning next. This is important, because when we get the difficulty of the next learning challenge just right (not to easy and not too hard), we activate the brain’s inborn motivational system. This not only increases motivation in the moment, but supports a lifelong love of learning while increasing the rate of development.

These aren’t the only reasons for measuring growth in complexity level, but they are the reasons at the core of our work. Measures of complexity level can also be used to help match people to roles, as in recruitment, or deciding which political candidate is best equipped to handle the complexity of a particular office.

For a more detailed explanation of my reasons for focusing on growth, see, How I was seduced into trying to fix education.


*When we talk about measuring growth, we don't mean the ability to get more items right on a multiple choice test. We measure developmental growth—growth in the level skill with which people apply their knowledge in complex real-world contexts.

If you found this article helpful, you may also like: What PISA measures. What we measure.

 

Please follow and like us:

Lectica’s story: long, rewarding, & still unfolding


Lectica's story started in Toronto in 1976…

Identifying the problem

During the 70s and 80s I practiced midwifery. It was a great honor to be present at the births of over 500 babies, and in many cases, follow them into childhood. Every single one of those babies was a joyful, driven, and effective "every moment" learner. Regardless of difficulty and pain they all learned to walk, talk, interact with others, and manipulate many aspects of their environment. They needed few external rewards to build these skills—the excitement and suspense of striving seemed to be reward enough. I felt like I was observing the "life force" in action.

Unfortunately as many of these children approached the third grade (age 8), I noticed something else—something deeply troubling. Many of the same children seemed to have lost much of this intrinsic drive to learn. For them, learning had become a chore motivated primarily by extrinsic rewards and punishments. Because this was happening primarily to children attending conventional schools (Children receiving alternative instruction seemed to be exempt.) it appeared that something about schooling was depriving many children of the fundamental human drive required to support a lifetime of learning and development—a drive that looked to me like a key source of happiness and fulfillment.

Understanding the problem

Following upon my midwifery career, I flirted briefly with a career in advertising, but by the early 90's I was back in school—in a Ph.D. program in U. C. Berkeley's Graduate School of Education—where I found myself observing the same pattern I'd observed as a midwife. Both the research and my own lab experience exposed the early loss of students' natural love of learning. My concern was only increased by the newly emerging trend toward high stakes multiple choice testing, which my colleagues and I saw as a further threat to children's natural drive to learn.

Most of the people I've spoken to about this problem have agreed that it's a shame, but few have seen it as a problem that can be solved, and many have seen it as an inevitable consequence of either mass schooling or simple maturation. But I knew it was not inevitable. Children and those educated in a range of alternative environments did not appear to lose their drive to learn. Additionally, above average students in conventional schools appeared to be more likely to retain their love of learning.

I set out to find out why—and ended up on a long journey toward a solution.

How learning works

First, I needed to understand how learning works. At Berkeley, I studied a wide variety of learning theories in several disciplines, including developmental theories, behavioral theories, and brain-based theories. I collected a large database of longitudinal interviews and submitted them to in-depth analysis, looked closely at the relation between testing and learning, and studied psychological measurement, all in the interest of finding a way to support childrens' growth while reinforcing their love of learning.

My dissertation—which won awards from both U.C. Berkeley and the American Psychological Association—focused on the development of people's conceptions of learning from age 5 through 85, and how this kind of knowledge could be used to measure and support learning. In 1998, I received $500,000 from the Spencer Foundation to further develop the methods designed for this research. Some of my areas of expertise are human learning and development, psychometrics, metacognition, moral education, and research methods.

In the simplest possible terms, what I learned in 5 years of graduate school is that the human brain is designed to drive learning, and that preserving that natural drive requires 5 ingredients:

  1. a safe environment that is rich in learning opportunities and healthy human interaction,
  2. a teacher who understands each child's interests and level of tolerance for failure,
  3. a mechanism for determining "what comes next"—what is just challenging enough to allow for success most of the time (but not all of the time),
  4. instant actionable feedback, and 
  5. the opportunity to integrate new knowledge or skills into each learner's existing knowledge network well enough to make it useable before pushing instruction to the next level. (We call this building a "robust knowledge network"—the essential foundation for future learning.)*

Identifying the solution

Once we understood what learning should look like, we needed to decide where to intervene. The answer, when it came, was a complete surprise. Understanding what comes next—something that can only be learned by measuring what a student understands now—was an integral part of the recipe for learning. This meant that testing—which we originally saw as an obstacle to robust learning—was actually the solution—but only if we could build tests that would free students to learn the way their brains are designed to learn. These tests would have to help teachers determine "what comes next" (ingredient 3) and provide instant actionable feedback (ingredient 4), while rewarding them for helping students build robust knowledge networks (ingredient 5).

Unfortunately, conventional standardized tests were focused on "correctness" rather than robust learning, and none of them were based on the study of how targeted concepts and skills develop over time. Moreover, they were designed not to support learning, but rather to make decisions about advancement or placement, based on how many correct answers students were able to provide relative to other students. Because this form of testing did not meet the requirements of our learning recipe, we'd have to start from scratch.

Developing the solution

We knew that our solution—reinventing educational testing to serve robust learning—would require many years of research. In fact, we would be committing to possible decades of effort without a guaranteed result. It was the vision of a future educational system in which all children retained their inborn drive for learning that ultimately compelled us to move forward. 

To reinvent educational testing, we needed to:

  1. make a deep study of precisely how children build particular knowledge and skills over time in a wide range of subject areas (so these tests could accurately identify "what comes next");
  2. make tests that determine how deeply students understand what they have learned—how well they can use it to address real-world issues or problems (requires that students show how they are thinking, not just what they know—which means written responses with explanations); and
  3. produce formative feedback and resources designed to foster "robust learning" (build robust knowledge networks).

Here's what we had to invent:

  1. A learning ruler (building on Commons [1998] and Fischer [2006]);
  2. A method for studying how students learn tested concepts and skills (refining the methods developed for my dissertation);
  3. A human scoring system for determining the level of understanding exhibited in students' written explanations (building upon Commons' and Fischer's methods, refining them until measurements were precise enough for use in educational contexts); and 
  4. An electronic scoring system, so feedback and resources could be delivered in real time.

It took over 20 years (1996–2016), but we did it! And while we were doing it, we conducted research. In fact, our assessments have been used in dozens of research projects, including a 25 million dollar study of literacy conducted at Harvard, and numerous Ph.D. dissertations—with more on the way.

What we've learned

We've learned many things from this research. Here are some that took us by surprise:

  1. Students in schools that focus on building deep understanding graduate seniors that are up to 5 years ahead (on our learning ruler) of students in schools that focus on correctness (2.5 to 3 years after taking socioeconomic status into account).
  2. Students in schools that foster robust learning develop faster and continue to develop longer (into adulthood) than students in schools that focus on correctness.
  3. On average, students in schools that foster robust learning produce more coherent and persuasive arguments than students in schools that focus on correctness.
  4. On average, students in our inner-city schools, which are the schools most focused on correctness, stop developing (on our learning ruler) in grade 10. 
  5. The average student who graduates from a school that strongly focuses on correctness is likely, in adulthood, to (1) be unable to grasp the complexity and ambiguity of many common situations and problems, (2) lack the mental agility to adapt to changes in society and the workplace, and (3) dislike learning. 

From our perspective, these results point to an educational crisis that can best be addressed by allowing students to learn as their brains were designed to learn. Practically speaking, this means providing learners, parents, teachers, and schools with metrics that reward and support teaching that fosters robust learning. 

Where we are today

Lectica has created the only metrics that meet all of these requirements. Our mission is to foster greater individual happiness and fulfillment while preparing students to meet 21st century challenges. We do this by creating and delivering learning tools that encourage students to learn the way their brains were designed to learn. And we ensure that students who need our learning tools the most get them first by providing free subscriptions to individual teachers everywhere.

To realize our mission, we organized as a nonprofit. We knew this choice would slow our progress (relative to organizing as a for-profit and welcoming investors), but it was the only way to guarantee that our true mission would not be derailed by other interests.

Thus far, we've funded ourselves with work in the for-profit sector and income from grants. Our background research is rich, our methods are well-established, and our technology works even better than we thought it would. Last fall, we completed a demonstration of our electronic scoring system, CLAS, a novel technology that learns from every single assessment taken in our system. 

The groundwork has been laid, and we're ready to scale. All we need is the platform that will deliver the assessments (called DiscoTests), several of which are already in production.

After 20 years of high stakes testing, students and teachers need our solution more than ever. We feel compelled to scale a quickly as possible, so we can begin the process of reinvigorating today's students' natural love of learning, and ensure that the next generation of students never loses theirs. Lectica's story isn't finished. Instead, we find ourselves on the cusp of a new beginning! 

Please consider making a donation today.

 


A final note: There are many benefits associated with our approach to assessment that were not mentioned here. For example, because the assessment scores are all calibrated to the same learning ruler, students, teachers, and parents can easily track student growth. Even better, our assessments are designed to be taken frequently and to be embedded in low-stakes contexts. For grading purposes, teachers are encouraged to focus on growth over time rather than specific test scores. This way of using assessments pretty much eliminates concerns about cheating. And finally, the electronic scoring system we developed is backed by the world's first "taxonomy of learning," which also serves many other educational and research functions. It's already spawned a developmentally sensitive spell-checker! One day, this taxonomy of learning will be robust enough to empower teachers to create their own formative assessments on the fly. 

 


*This is the ingredient that's missing from current adaptive learning technologies.

 

Please follow and like us:

Adaptive learning. Are we there yet?

Adaptive learning technologies are touted as an advance in education and a harbinger of what's to come. But although we at Lectica agree that adaptive learning has a great deal to offer, we have some concerns about its current limitations. In an earlier article, I raised the question of how well one of these platforms, Knewton, serves "robust learning"—the kind of learning that leads to deep understanding and usable knowledge. Here are some more general observations.

The great strength of adaptive learning technologies is that they allow students to learn at their own pace. That's big. It's quite enough to be excited about, even if it changes nothing else about how people learn. But in our excitement about this advance, the educational community is in danger of ignoring important shortcomings of these technologies.

First, adaptive learning technologies are built on adaptive testing technologies. Today, these testing technologies are focused on "correctness." Students are moved to the next level of difficulty based on their ability to get correct answers. This is what today's testing technologies measure best. However, although being able to produce or select correct answers is important, it is not an adequate indication of understanding. And without real understanding, knowledge is not usable and can't be built upon effectively over the long term.

Second, today's adaptive learning technologies are focused on a narrow range of content—the kind of content psychometricians know how to build tests for—mostly math and science (with an awkward nod to literacy). In public education during the last 20 years, we've experienced a gradual narrowing of the curriculum, largely because of high stakes testing and its narrow focus. Today's adaptive learning technologies suffer from the same limitations and are likely to reinforce this trend.

Third, the success of adaptive learning technologies is measured with standardized tests of correctness. Higher scores will help more students get into college—after all, colleges use these tests to decide who will be admitted. But we have no idea how well higher scores on these tests translate into life success. Efforts to demonstrate the relevance of educational practices are few and far between. And notably, there are many examples of highly successful individuals who were poor players in the education game—including several of the worlds' most productive and influential people.

Fourth, some proponents of online adaptive learning believe that it can and should replace (or marginalize) teachers and classrooms. This is concerning. Education is more than a process of accumulating facts. For one thing, it plays an enormous role in socialization. Good teachers and classrooms offer students opportunities to build knowledge while learning how to engage and work with diverse others. Great teachers catalyze optimal learning and engagement by leveraging students' interests, knowledge, skills, and dispositions. They also encourage students to put what they're learning to work in everyday life—both on their own and in collaboration with others.

Lectica has a strong interest in adaptive learning and the technologies that deliver it. We anticipate that over the next few years, our assessment technology will be integrated into adaptive learning platforms to help expand their subject matter and ensure that students are building robust, usable knowledge. We will also be working hard to ensure that these platforms are part of a well-thought out, evidence-based approach to education—one that fosters the development of tomorrow's skills—the full range of skills and knowledge required for success in a complex and rapidly changing world.

Please follow and like us:

If you want students to develop faster, stop trying to speed up learning

During the last 20 years—since high stakes testing began to take hold—public school curricula have undergone a massive transformation. Standards have pushed material that was once taught in high school down into the 3rd and 4th grade, and the amount of content teachers are expected to cover each year has increased steadily. The theory behind this trend appears to be that learning more content and learning it earlier will help students develop faster.

But is this true? Is there any evidence at all that learning more content and learning it earlier produces more rapid development? If so, I haven't seen it.

In fact, our evidence points to the opposite conclusion. Learning more and learning it earlier may actually be interfering with the development of critical life skills—like those required for making good decisions in real-life contexts. As the graph below makes clear, students in schools that emphasize covering required content do not develop as rapidly as students in schools that focus on fostering deep understanding—even though learning for understanding generally takes more time than learning something well enough to "pass the test."

What is worse, we're finding that the average student in schools with the greatest emphasis on covering required content appears to stop developing by the end of grade 10, with an average score of 10.1. This is the same score received by the average 6th grader in schools with the greatest emphasis on fostering deep understanding.

The graphs in this post are based on data from 17,755 LRJA assessments. The LRJA asks test-takers to respond to a complex real-life dilemma. They are prompted to explore questions about:

  • finding, creating, and evaluating information and evidence,
  • perspectives, persuasion, and conflict resolution,
  • when and if it's possible to be certain, and
  • the nature of facts, truth, and reality.

Students were in grades 4-12, and attended one or more of 56 schools in the United States and Canada.

The graphs shown above represent two groups of schools—those with students who received the highest scores on the LRJA and those with students who received the lowest scores. These schools differed from one another in two other ways. First, the highest performing schools were all private schools*. Most students in these schools came from upper middle SES (socio-economic status) homes. The lowest performing schools were all public schools primarily serving low SES inner city students.

The second way in which these schools differed was in the design of their curricula. The highest performing schools featured integrated curricula with a great deal of practice-based learning and a heavy emphasis on fostering understanding and real-world competence. All of the lowest performing schools featured standards-focused curricula with a strong emphasis on learning the facts, formulas, procedures, vocabulary, and rules targeted by state tests.

Based on the results of conventional standardized tests, we expected most of the differences between student performances on the LRJA in these two groups of schools to be explained by SES. But this was not the case. Private schools with more conventional curricula and high performing public schools serving middle and upper middle SES families did indeed outperform the low SES schools, but as shown in the graph below, by grade 12, their students were still about 2.5 years behind students in the highest performing schools. At best, SES explains only about 1/2 of the difference between the best and worst schools in our database. (For more on this, see the post, "Does a focus on deep understanding accelerate growth?")

By the way, the conventional standardized test scores of students in this middle group, despite their greater emphasis on covering content, were no better than the conventional standardized test scores of students in the high performing group. Focusing on deep understanding appears to help students develop faster without interfering with their ability to learn required content.

This will not be our last word on the subject. As we scale our K-12 assessments, we'll be able to paint an increasingly clear picture of the developmental impact of a variety of curricula.


Lectica's nonprofit mission is to help educators foster deep understanding and lifelong growth. We can do it with your help! Please donate now. Your donation will help us deliver our learning tools—free—to K-12 teachers everywhere.


*None of these schools pre-selected their students based on test scores. 

See a version of this article on Medium.

Please follow and like us:

Introducing LecticaFirst: Front-line to mid-level recruitment assessment—on demand

LecticaLive logo

The world's best recruitment assessments—unlimited, auto-scored, affordable, relevant, and easy

Lectical Assessments have been used to support senior and executive recruitment for over 10 years, but the expense of human scoring has prohibited their use at scale. I'm DELIGHTED to report that this is no longer the case. Because of CLAS—our electronic developmental scoring system—this fall we plan to deliver customized assessments of workplace reasoning with real time scoring. We're calling this service LecticaFirst.

LecticaFirst is a subscription service.* It allows you to administer as many LecticaFirst assessments as you'd like, any time you'd like. It's priced to make it possible for your organization to pre-screen every candidate (up through mid-level management) before you look at a single resume or call a single reference. And we've built in several upgrade options, so you can easily obtain additional information about the candidates that capture your interest.

learn more about LecticaFirst subscriptions


The current state of recruitment assessment

"Use of hiring methods with increased predictive validity leads to substantial increases in employee performance as measured in percentage increases in output, increased monetary value of output, and increased learning of job-related skills" (Hunter, Schmidt, & Judiesch, 1990).

Most conventional workplace assessments focus on one of two broad constructs—aptitude or personality. These assessments examine factors like literacy, numeracy, role-specific competencies, leadership traits, and cultural fit, and are generally delivered through interviews, multiple choice tests, or likert-style surveys. Emotional intelligence is also sometimes measured, but thus far, is not producing results that can complete with aptitude tests (Zeidner, Matthews, & Roberts, 2004). 

Like Lectical Assessments, aptitude tests are tests of mental ability (or mental skill). High-quality tests of mental ability have the highest predictive validity for recruitment purposes, hands down. Hunter and Hunter (1984), in their systematic review of the literature, found an effective range of predictive validity for aptitude tests of .45 to .54. Translated, this means that about 20% to 29% of success on the job was predicted by mental ability. These numbers do not appear to have changed appreciably since Hunter and Hunter's 1984 review.

Personality tests come in a distant second. In their meta-analysis of the literature, Teft, Jackson, and Rothstein (1991) reported an overall relation between personality and job performance of .24 (with conscientiousness as the best predictor by a wide margin). Translated, this means that only about 6% of job performance is predicted by personality traits. These numbers do not appear to have been challenged in more recent research (Johnson, 2001).

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column "variance explained" is an indicator of how much of a role a particular form of assessment plays in predicting performance—it's predictive power. When deciding which assessments to use in recruitment, the goal is to achieve the greatest possible predictive power with the fewest assessments. That's why I've included the last column, "variance explained with GMA." It shows what happens to the variance explained when an assessment of General Mental Ability is combined with the form of assessment in a given row. The best combinations shown here are GMA and work sample tests, GMA and Integrity, and GMA and conscientiousness.

Form of assessment Source Predictive validity Variance explained  Variance explained (with GMA)
Complexity of workplace reasoning (Dawson & Stein, 2004; Stein, Dawson, Van Rossum, Hill, & Rothaizer, 2003) .53 28% n/a
Aptitude (General Mental Ability, GMA) (Hunter, 1980; Schmidt & Hunter, 1998) .51 26% n/a
Work sample tests (Hunter & Hunter, 1984; Schmidt & Hunter, 1998) .54 29% 40%
Integrity (Ones, Viswesvaran, and Schmidt, 1993; Schmidt & Hunter, 1998) .41 17% 42%
Conscientiousness (Barrick & Mount, 1995; Schmidt & Hunter, 1998). .31 10% 36%
Employment interviews (structured) (McDaniel, Whetzel, Schmidt, and Mauer, 1994; Schmidt & Hunter, 1998) .51 26% 39%
Employment interviews (unstructured) (McDaniel, Whetzel, Schmidt, and Mauer, 1994 Schmidt & Hunter, 1998) .38 14% 30%

Job knowledge tests

(Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .48 23% 33%

Job tryout procedure

(Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .44 19% 33%
Peer ratings (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .49 24% 33%

Training & experience: behavioral consistency method

(McDaniel, Schmidt, and Hunter, 1988a, 1988b; Schmidt & Hunter, 1998; Schmidt, Ones, and Hunter, 1992) .45 20% 33%
Reference checks (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .26 7% 32%
Job experience (years)

Hunter, 1980); McDaniel, Schmidt, and Hunter, 1988b; Schmidt & Hunter, 1998) 

.18 3% 29%  
Biographical data measures

Supervisory Profile Record Biodata Scale (Rothstein, Schmidt, Erwin, Owens, and Sparks, 1990; Schmidt & Hunter, 1998)

.35 12% 27%

Assessment centers

(Gaugler, Rosenthal, Thornton, and Benson, 1987; Schmidt & Hunter, 1998; Becker, Höft, Holzenkamp, & Spinath, 2011) Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments. 

.37 14% 28%
EQ (Zeidner, Matthews, & Roberts, 2004) .24 6% n/a
360 assessments Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski, 2001 .24 6% n/a
Training &  experience: point method (McDaniel, Schmidt, and Hunter, 1988a; Schmidt & Hunter, 1998) .11 1% 27%
Years of education (Hunter and Hunter, 1984; Schmidt & Hunter, 1998) .10 1% 27%
Interests (Schmidt & Hunter, 1998) .10 1% 27%

The figure below shows the predictive power information from this table in graphical form. Assessments are color coded to indicate which are focused on mental (cognitive) skills, behavior (past or present), or personality traits. It is clear that tests of mental skills stand out as the best predictors.

Predictive power graph

Why use Lectical Assessments for recruitment?

Lectical Assessments are "next generation" assessments, made possible through a novel synthesis of developmental theory, primary research, and technology. Until now multiple choice style aptitude tests have been the most affordable option for employers. But despite being more predictive than personality tests, aptitude tests still suffer from important limitations. Lectical Assessments address these limitations. For details, take a look at the side-by-side comparison of LecticaFirst tests with conventional tests, below.

Dimension LecticaFirst Aptitude
Accuracy Level of reliability (.95–.97) makes them accurate enough for high-stakes decision-making. (Interpreting reliability statistics) Varies greatly. The best aptitude tests have levels of reliability in the .95 range. Many recruitment tests have much lower levels.
Time investment Lectical Assessments are not timed. They usually take from 45–60 minutes, depending on the individual test-taker. Varies greatly. For acceptable accuracy, tests must have many items and may take hours to administer.
Objectivity Scores are objective (Computer scoring is blind to differences in sex, body weight, ethnicity, etc.) Scores on multiple choice tests are objective. Scores on interview-based tests are subject to several sources of bias.
Expense Highly competitive subscription. (From $6 – $10) per existing employee annually Varies greatly.
Fit to role: complexity Lectica employs sophisticated developmental tools and technologies to efficiently determine the relation between role requirements and the level of reasoning skill required to meet those requirements. Lectica's approach is not directly comparable to other available approaches.
Fit to role: relevance Lectical Assessments are readily customized to fit particular jobs, and are direct measures of what's most important—whether or not candidates' actual workplace reasoning skills are a good fit for a particular job. Aptitude tests measure people's ability to select correct answers to abstract problems. It is hoped that these answers will predict how good a candidate's workplace reasoning skills are likely to be.
Predictive validity In research so far: Predict advancement (R = .53**, R2 = .28), National Leadership Study. The aptitude (IQ) tests used in published research predict performance (R = .45 to .54, R2 = .20 to .29)
Cheating The written response format makes cheating virtually impossible when assessments are taken under observation, and very difficult when taken without observation. Cheating is relatively easy and rates can be quite high.
Formative value High. LecticaFirst assessments can be upgraded after hiring, then used to inform employee development plans. None. Aptitude is a fixed attribute, so there is no room for growth. 
Continuous improvement Our assessments are developed with a 21st century learning technology that allows us to continuously improve the predictive validity of Lecticafirst assessments. Conventional aptitude tests are built with a 20th century technology that does not easily lend itself to continuous improvement.

* CLAS is not yet fully calibrated for scores above 11.5 on our scale. Scores at this level are more often seen in upper- and senior-level managers and executives. For this reason, we do not recommend using lecticafirst for recruitment above mid-level management.

**The US Department of Labor’s highest category of validity, labeled “Very Beneficial” requires regression coefficients .35 or higher (R > .34).


References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alterna­tive predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. R. Barrick, Murray R. (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

Mcdaniel, M. A., Schmidt, F. L., & Hunter, J., E. (1988a). A Meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

Mcdaniel, M. A., Schmidt, F. L., & Hunter, J., E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

Zeidner, M., Matthews, G., & Roberts, R. D. (2004). Emotional intelligence in the workplace: A critical review. Applied psychology: An International Review, 53(3), 371-399.

Please follow and like us:

Decision making & the collaboration continuum

Lectical Scale (our developmental scale). The collaboration continuum has emerged from this research.

Many people seem to think of decision making as either top-down or collaborative, and tend to prefer one over the other. But several thousand decision-making leaders have taught us that this is a false dichotomy. We’ve learned two things. First, there is no clear-cut division between autocratic and collaborative decision making—it’s a continuum. And second, both more autocratic and more collaborative decision making processes have legitimate applications.

As it applies to decision making, the collaboration continuum is a scale that runs from fully autocratic to consensus-based. We find it helpful to divide the continuum into 7 relatively distinct levels, as shown below:


Level Basis for decision Applications Limitations

LESS COLLABORATION

Fully autocratic  personal knowledge or rules, no consideration of other perspectives everyday operational decisions where there are clear rules and no apparent conflicts quick and efficient
Autocratic personal knowledge, with some consideration of others' perspectives (no perspective seeking) operational decisions in which conflicts are already well-understood and trust is high quick and efficient, but spends trust, so should be used with care
Consulting personal knowledge, with perspective-seeking to help people feel heard operational decisions in which the perspectives of well-known stakeholders are in conflict and trust needs reinforcement time consuming, but can build trust if not abused
Inclusive personal knowledge, with perspective seeking to inform a decision operational or policy decisions in which the perspectives of stakeholders are required to formulate a decision time consuming, but improves decisions and builds engagement
Compromise-focused leverages stakeholder perspectives to develop a decision that gives everyone something they want making "deals" to which all stakeholders must agree time consuming, but necessary in deal-making situations
Consent-focused leverages stakeholder perspectives to develop a decision that everyone can consent to (even though there may be reservations) policy decisions in which the perspectives of stakeholders are required to formulate a decision can be efficient, but requires excellent facilitation skills and training for all parties
Consensus-focused leverages stakeholder perspectives to develop a decision that everyone can agree with. decisions in which complete agreement is required to formulate a decision requires strong relationships, useful primarily when decision-makers are equal partners

MORE COLLABORATION

As the table above shows, all 7 forms of decision making on the collaboration continuum have legitimate applications. And all can be learned in any adult developmental level. However, the most effective application of each successive form of decision making requires more developed skills. Inclusive, consent, and consensus decision making are particularly demanding, and generally require formal training for all participating parties.

The most developmentally advanced and accomplished leaders who have taken our assessments deftly employ all 7 forms of decision making, basing the form chosen for a particular situation on factors like timeline, decision purpose, and stakeholder characteristics.


(The feedback in our LDMA [leadership decision making] assessment report provides learning suggestions for building collaboration continuum skills. And our Certified Consultants can offer specific practices, tailored for your learning needs, that support the development of these skills.) 

 

Please follow and like us: