The complexity of national leaders’ thinking: How does it measure up?

Special thanks to my Australian colleague, Aiden Thornton, for his editorial and research assistance.

This is the first in a series of articles on the complexity of national leaders' thinking. These articles will report results from research conducted with CLAS, our newly validated electronic developmental scoring system. CLAS will be used to score these leaders’ responses to questions posed by prominent journalists. 

In this first article, I provide some context for this project, including information about how my colleagues and I think about complexity and its role in leadership. I’ve embedded links to additional material for readers who have questions about our 100+ year-old research tradition, the assessments made by Lectica (the nonprofit that owns me), and other research we’ve conducted with these assessments.

Context and research questions

Lectica creates diagnostic assessments for learning that support the development of the mental skills required for working with complexity. We make these learning tools for both adults and children. Our K-12 initiative—the DiscoTest Initiative—is dedicated to bringing these tools to individual K-12 teachers everywhere, free of charge. Our adult assessments are used by organizations in recruitment and training, and by colleges and universities in admissions and program evaluation.

All Lectical Assessments measure the complexity level (aka, level of vertical development) of people’s thinking in particular knowledge areas. A complexity level score on a Lectical Assessment tells us the highest level of complexity—in a problem, issue, or task—an individual is likely to be able to work with effectively.

On several occasions over the last 20 years, my colleagues and I have been asked to evaluate the complexity of national leaders’ reasoning skills. Our response has been, “We will, but only when we can score electronically—without the risk of human bias.” That time has come. Now that our electronic developmental scoring system, CLAS, has demonstrated a level of reliability and precision that is acceptable for this purpose, we’re ready to take a look.​

Evaluating the complexity of national leaders' thinking is a challenging task for several reasons. First, it’s virtually impossible to find examples of many of these leaders’ “best work.” Their speeches are generally written for them, and speech writers usually try to keep the complexity level of these speeches low, aiming for a reading level in the 7th to 9th grade range. (Reading level is not the same thing as complexity level, but like most tests of capability, it correlates moderately with complexity level.) Second, even when national leaders respond to unscripted questions from journalists, they work hard to use language that is accessible to a wide audience. And finally, it’s difficult to identify a level playing field—one in which all leaders have the same opportunity to demonstrate the complexity of their thinking.

Given these obstacles, there's no point in attempting to evaluate the actual thinking capabilities of national leaders. In other words, we won’t be claiming that the scores awarded by CLAS represent the true complexity level of leaders’ thinking. Instead, we will address the following questions:

  1. When asked by prominent journalists to explain their positions on complex issues, what is the average complexity level of national leaders’ responses?
  2. How does the complexity level of national leaders’ responses relate to the complexity of the issues they discuss?

Thinking complexity and leader success

At this point, you may be wondering, “What is thinking complexity and why is it important?” A comprehensive response to this question isn’t possible in a short article like this one, but I can provide a basic description of complexity as we see it at Lectica, and provide some examples that highlight its importance.

All issues faced by leaders are associated with a certain amount of built-in complexity. For example:

  1. The sheer number of factors that must be taken into account.
  2. Short and long-term implications/repercussions. (Will a quick fix cause problems downstream, such as global unrest or catastrophic weather?)
  3. The number and diversity of stakeholders/interest groups. (What is the best way to balance the needs of individuals, families, businesses, communities, states, nations, and the world?)
  4. The length of time it will take to implement a decision. (Will it take months, years, decades? Longer projects are inherently more complex because of changes over time.)
  5. Formal and informal rules/laws that place limits on the deliberative process. (For example, legislative and judicial processes are often designed to limit the decision making powers of presidents or prime ministers. This means that leaders must work across systems to develop decisions, which further increases the complexity of decision making.)

Over the course of childhood and adulthood, the complexity of our thinking develops through up to 13 skill levels, each new level building upon the previous one. The figure above shows four adult complexity “zones”: advanced linear thinking, early systems thinking, advanced systems thinking, and early principles thinking. In the first zone, advanced linear thinking, reasoning is often characterized as “black and white”; individuals performing in this zone cope best with problems that have clear right or wrong answers. Only once individuals achieve early systems thinking do they begin to be able to work effectively with highly complex problems that do not have clear right or wrong answers.

Leadership at the national level requires exceptional skills for managing complexity, including the ability to deal with the most complex problems faced by humanity (Helbing, 2013). Needless to say, a national leader regularly faces issues at or above early principles thinking.

Complexity level and leadership—the evidence

In the workplace, the hiring managers who decide which individuals will be put in leadership roles are likely to choose leaders whose thinking complexity is a good match for their roles. Even if they have never heard the term complexity level, hiring managers generally understand, at least implicitly, that leaders who can work with the complexity inherent in the issues associated with their roles are likely to make better decisions than leaders whose thinking is less complex.

There is a strong relation between the complexity of leadership roles and the complexity level of leaders’ reasoning. In general, more complex thinkers fill more complex roles. The figure below shows how lower-level and senior leaders' complexity scores are distributed in Lectica’s database. Most senior leaders’ complexity scores are in or above advanced systems thinking, while those of lower-level leaders are primarily in zone 2 (early systems thinking).

The strong relation between the complexity of leaders' thinking and the complexity of their roles can also be seen in the recruitment literature. To be clear, complexity is not the only aspect of leadership decision making that affects leaders' ability to deal effectively with complex issues. However, a large body of research spanning more than 50 years suggests that the recruitment assessments that best predict leaders' success are those most strongly related to thinking skills, including complexity level.

The figure below shows the predictive power of several forms of assessment employed in making hiring and promotion decisions. The cognitive assessments have been shown to have the highest predictive power. In other words, assessments of thinking skills do a better job predicting which candidates will be successful in a given role than other forms of assessment.

Predictive power graph

The match between the complexity of national leaders' thinking and the complexity level of the problems faced in their roles is important. While we will not be able to assess the actual complexity level of the thinking of national leaders, we will be able to examine the complexity of their responses to questions posed by prominent journalists. In upcoming articles, we’ll be sharing our findings and discussing their implications.

Coming next…

In the second article in this series, we will begin our examination of the complexity of national leaders' thinking by scoring interview responses from four US Presidents—Bill Clinton, George W. Bush, Barack Obama, and Donald Trump.  

 


Appendix

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column "variance explained" indicates how much of a role a particular form of assessment plays in predicting performance—its predictive power.

| Form of assessment | Source | Predictive validity | Variance explained | Variance explained (with GMA) |
| --- | --- | --- | --- | --- |
| Complexity of workplace reasoning | Dawson & Stein, 2004; Stein, Dawson, Van Rossum, Hill, & Rothaizer, 2003 | .53 | 28% | n/a |
| Aptitude (General Mental Ability, GMA) | Hunter, 1980; Schmidt & Hunter, 1998 | .51 | 26% | n/a |
| Work sample tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .54 | 29% | 40% |
| Integrity | Ones, Viswesvaran, & Schmidt, 1993; Schmidt & Hunter, 1998 | .41 | 17% | 42% |
| Conscientiousness | Barrick & Mount, 1995; Schmidt & Hunter, 1998 | .31 | 10% | 36% |
| Employment interviews (structured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .51 | 26% | 39% |
| Employment interviews (unstructured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .38 | 14% | 30% |
| Job knowledge tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .48 | 23% | 33% |
| Job tryout procedure | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .44 | 19% | 33% |
| Peer ratings | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .49 | 24% | 33% |
| Training & experience: behavioral consistency method | McDaniel, Schmidt, & Hunter, 1988a, 1988b; Schmidt & Hunter, 1998; Schmidt, Ones, & Hunter, 1992 | .45 | 20% | 33% |
| Reference checks | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .26 | 7% | 32% |
| Job experience (years) | Hunter, 1980; McDaniel, Schmidt, & Hunter, 1988b; Schmidt & Hunter, 1998 | .18 | 3% | 29% |
| Biographical data measures (Supervisory Profile Record Biodata Scale) | Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990; Schmidt & Hunter, 1998 | .35 | 12% | 27% |
| Assessment centers | Gaugler, Rosenthal, Thornton, & Bentson, 1987; Schmidt & Hunter, 1998; Becker, Höft, Holzenkamp, & Spinath, 2011 | .37 | 14% | 28% |
| EQ | Zeidner, Matthews, & Roberts, 2004 | .24 | 6% | n/a |
| 360 assessments | Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski, 2001 | .24 | 6% | n/a |
| Training & experience: point method | McDaniel, Schmidt, & Hunter, 1988a; Schmidt & Hunter, 1998 | .11 | 1% | 27% |
| Years of education | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .10 | 1% | 27% |
| Interests | Schmidt & Hunter, 1998 | .10 | 1% | 27% |

Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments.
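For readers who want to check the arithmetic, the "variance explained" column is simply the square of the predictive validity coefficient, expressed as a percentage. Here is a minimal sketch that reproduces a few of the entries above; the dictionary and variable names are mine, and the values are copied from the table.

```python
# Illustrative sketch: "variance explained" is the squared validity coefficient (r^2).
# The r values below are copied from the table above; rounding explains any small differences.

predictive_validity = {
    "Complexity of workplace reasoning": 0.53,
    "Aptitude (GMA)": 0.51,
    "Work sample tests": 0.54,
    "Conscientiousness": 0.31,
    "Years of education": 0.10,
}

for form, r in predictive_validity.items():
    variance_explained = r ** 2  # proportion of performance variance predicted
    print(f"{form}: r = {r:.2f}, variance explained = {variance_explained:.0%}")
```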

References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Dawson, T. L. (2017, October 20). Using technology to advance understanding: The calibration of CLAS, an electronic developmental scoring system. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Dawson, T. L., & Thornton, A. M. A. (2017, October 18). An examination of the relationship between argumentation quality and students’ growth trajectories. Proceedings from Annual Conference of the Northeastern Educational Research Association, Trumbull, CT.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Helbing, D. (2013). Globally networked risks and how to respond. Nature, 497, 51-59. 

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. Barrick (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988a). A meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

World Economic Forum—tomorrow’s skills

The top 10 workplace skills of the future.

Sources: Future of Jobs Report, WEF 2017

In a recent blog post—actually, in several recent blog posts—I've been emphasizing the importance of building tomorrow's skills: the kinds of skills we all need to navigate our increasingly complex and changing world. I may not agree that everything in the World Economic Forum's top 10 list (shown above) belongs in a list of skills (creativity is much more than a skill, and service orientation is more of a disposition than a skill), but the flavor of this list is generally in sync with the kinds of skills, dispositions, and behaviors required in a complex and rapidly changing world.

The "skills" in this list cannot be…

  • developed in learning environments focused primarily on correctness or in workplace environments that don't allow for mistakes; or
  • measured with ratings on surveys or on tests of people's ability to provide correct answers.

These "skills" are best developed through cycles of goal setting, information gathering, application, and reflection—what we call virtuous cycles of learning—or VCoLs. And they're best assessed with tests that focus on applications of skill in real-world contexts, like Lectical Assessments, which are based on a rich research tradition focused on the development of understanding and skill.

 

If you want students to develop faster, stop trying to speed up learning

During the last 20 years—since high stakes testing began to take hold—public school curricula have undergone a massive transformation. Standards have pushed material that was once taught in high school down into the 3rd and 4th grade, and the amount of content teachers are expected to cover each year has increased steadily. The theory behind this trend appears to be that learning more content and learning it earlier will help students develop faster.

But is this true? Is there any evidence at all that learning more content and learning it earlier produces more rapid development? If so, I haven't seen it.

In fact, our evidence points to the opposite conclusion. Learning more and learning it earlier may actually be interfering with the development of critical life skills—like those required for making good decisions in real-life contexts. As the graph below makes clear, students in schools that emphasize covering required content do not develop as rapidly as students in schools that focus on fostering deep understanding—even though learning for understanding generally takes more time than learning something well enough to "pass the test."

What is worse, we're finding that the average student in schools with the greatest emphasis on covering required content appears to stop developing by the end of grade 10, with an average score of 10.1. This is the same score received by the average 6th grader in schools with the greatest emphasis on fostering deep understanding.

The graphs in this post are based on data from 17,755 LRJA assessments. The LRJA asks test-takers to respond to a complex real-life dilemma. They are prompted to explore questions about:

  • finding, creating, and evaluating information and evidence,
  • perspectives, persuasion, and conflict resolution,
  • when and if it's possible to be certain, and
  • the nature of facts, truth, and reality.

Students were in grades 4-12, and attended one or more of 56 schools in the United States and Canada.
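For readers who like to see how this kind of summary is produced, here is a hypothetical sketch of the grade-by-group averaging behind growth graphs like the ones described in this post. The file and column names ("school_group", "grade", "lrja_score") are assumptions for illustration, not Lectica's actual data pipeline.

```python
# Hypothetical sketch of the kind of summary behind the growth graphs described above.
# The CSV file and column names are assumptions, not Lectica's actual data pipeline.
import pandas as pd

scores = pd.read_csv("lrja_assessments.csv")  # one row per assessment (17,755 in this post)

growth_curves = (
    scores
    .groupby(["school_group", "grade"])["lrja_score"]
    .mean()                     # average LRJA score per grade, per group of schools
    .unstack("school_group")    # one column per school group, rows are grades 4-12
)
print(growth_curves)
```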

The graphs shown above represent two groups of schools—those with students who received the highest scores on the LRJA and those with students who received the lowest scores. These schools differed from one another in two other ways. First, the highest performing schools were all private schools*. Most students in these schools came from upper middle SES (socio-economic status) homes. The lowest performing schools were all public schools primarily serving low SES inner city students.

The second way in which these schools differed was in the design of their curricula. The highest performing schools featured integrated curricula with a great deal of practice-based learning and a heavy emphasis on fostering understanding and real-world competence. All of the lowest performing schools featured standards-focused curricula with a strong emphasis on learning the facts, formulas, procedures, vocabulary, and rules targeted by state tests.

Based on the results of conventional standardized tests, we expected most of the differences between student performances on the LRJA in these two groups of schools to be explained by SES. But this was not the case. Private schools with more conventional curricula and high-performing public schools serving middle and upper middle SES families did indeed outperform the low SES schools, but as shown in the graph below, by grade 12 their students were still about 2.5 years behind students in the highest performing schools. At best, SES explains only about half of the difference between the best and worst schools in our database. (For more on this, see the post, "Does a focus on deep understanding accelerate growth?")

By the way, the conventional standardized test scores of students in this middle group, despite their greater emphasis on covering content, were no better than the conventional standardized test scores of students in the high performing group. Focusing on deep understanding appears to help students develop faster without interfering with their ability to learn required content.

This will not be our last word on the subject. As we scale our K-12 assessments, we'll be able to paint an increasingly clear picture of the developmental impact of a variety of curricula.


Lectica's nonprofit mission is to help educators foster deep understanding and lifelong growth. We can do it with your help! Please donate now. Your donation will help us deliver our learning tools—free—to K-12 teachers everywhere.


*None of these schools pre-selected their students based on test scores. 

See a version of this article on Medium.

Learning and metacognition

Metacognition is thinking about thinking. Metacognitive skills are an interrelated set of competencies for learning and thinking, and include many of the skills required for active learning, critical thinking, reflective judgment, problem solving, and decision-making. People whose metacognitive skills are well developed are better problem-solvers, decision makers and critical thinkers, are more able and more motivated to learn, and are more likely to be able to regulate their emotions (even in difficult situations), handle complexity, and cope with conflict. Although metacognitive skills, once they are well-learned, can become habits of mind that are applied unconsciously in a wide variety of contexts, it is important for even the most advanced learners to “flex their cognitive muscles” by consciously applying appropriate metacognitive skills to new knowledge and in new situations.

Lectica's learning model, VCoL+7 (the virtuous cycle of learning and +7 skills), leverages metacognitive skills in a number of ways. For example, the fourth step in VCoL is reflection & analysis, and the +7 skills include reflective disposition, self-monitoring and awareness, and awareness of cognitive and behavioral biases.

Learn more

 

Learning in the workplace occurs optimally when the learner has a reflective disposition and receives both institutional and educational support.

Introducing LecticaFirst: Front-line to mid-level recruitment assessment—on demand


The world's best recruitment assessments—unlimited, auto-scored, affordable, relevant, and easy

Lectical Assessments have been used to support senior and executive recruitment for over 10 years, but the expense of human scoring has prohibited their use at scale. I'm DELIGHTED to report that this is no longer the case. Because of CLAS—our electronic developmental scoring system—this fall we plan to deliver customized assessments of workplace reasoning with real-time scoring. We're calling this service LecticaFirst.

LecticaFirst is a subscription service.* It allows you to administer as many LecticaFirst assessments as you'd like, any time you'd like. It's priced to make it possible for your organization to pre-screen every candidate (up through mid-level management) before you look at a single resume or call a single reference. And we've built in several upgrade options, so you can easily obtain additional information about the candidates that capture your interest.

learn more about LecticaFirst subscriptions


The current state of recruitment assessment

"Use of hiring methods with increased predictive validity leads to substantial increases in employee performance as measured in percentage increases in output, increased monetary value of output, and increased learning of job-related skills" (Hunter, Schmidt, & Judiesch, 1990).

Most conventional workplace assessments focus on one of two broad constructs—aptitude or personality. These assessments examine factors like literacy, numeracy, role-specific competencies, leadership traits, and cultural fit, and are generally delivered through interviews, multiple choice tests, or Likert-style surveys. Emotional intelligence is also sometimes measured, but thus far it has not produced results that can compete with aptitude tests (Zeidner, Matthews, & Roberts, 2004).

Like Lectical Assessments, aptitude tests are tests of mental ability (or mental skill). High-quality tests of mental ability have the highest predictive validity for recruitment purposes, hands down. Hunter and Hunter (1984), in their systematic review of the literature, found an effective range of predictive validity for aptitude tests of .45 to .54. Translated, this means that about 20% to 29% of success on the job was predicted by mental ability. These numbers do not appear to have changed appreciably since Hunter and Hunter's 1984 review.

Personality tests come in a distant second. In their meta-analysis of the literature, Tett, Jackson, and Rothstein (1991) reported an overall relation between personality and job performance of .24 (with conscientiousness as the best predictor by a wide margin). Translated, this means that only about 6% of job performance is predicted by personality traits. These numbers do not appear to have been challenged in more recent research (Johnson, 2001).
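For readers who want the arithmetic behind these "translations": the share of performance variance a predictor explains is the square of its validity coefficient, so (with simple rounding)

$$0.45^2 \approx 0.20, \qquad 0.54^2 \approx 0.29, \qquad 0.24^2 \approx 0.06.$$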

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column "variance explained" indicates how much of a role a particular form of assessment plays in predicting performance—its predictive power. When deciding which assessments to use in recruitment, the goal is to achieve the greatest possible predictive power with the fewest assessments. That's why I've included the last column, "variance explained (with GMA)." It shows what happens to the variance explained when an assessment of General Mental Ability is combined with the form of assessment in a given row. The best combinations shown here are GMA and work sample tests, GMA and integrity, and GMA and conscientiousness.

| Form of assessment | Source | Predictive validity | Variance explained | Variance explained (with GMA) |
| --- | --- | --- | --- | --- |
| Complexity of workplace reasoning | Dawson & Stein, 2004; Stein, Dawson, Van Rossum, Hill, & Rothaizer, 2003 | .53 | 28% | n/a |
| Aptitude (General Mental Ability, GMA) | Hunter, 1980; Schmidt & Hunter, 1998 | .51 | 26% | n/a |
| Work sample tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .54 | 29% | 40% |
| Integrity | Ones, Viswesvaran, & Schmidt, 1993; Schmidt & Hunter, 1998 | .41 | 17% | 42% |
| Conscientiousness | Barrick & Mount, 1995; Schmidt & Hunter, 1998 | .31 | 10% | 36% |
| Employment interviews (structured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .51 | 26% | 39% |
| Employment interviews (unstructured) | McDaniel, Whetzel, Schmidt, & Maurer, 1994; Schmidt & Hunter, 1998 | .38 | 14% | 30% |
| Job knowledge tests | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .48 | 23% | 33% |
| Job tryout procedure | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .44 | 19% | 33% |
| Peer ratings | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .49 | 24% | 33% |
| Training & experience: behavioral consistency method | McDaniel, Schmidt, & Hunter, 1988a, 1988b; Schmidt & Hunter, 1998; Schmidt, Ones, & Hunter, 1992 | .45 | 20% | 33% |
| Reference checks | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .26 | 7% | 32% |
| Job experience (years) | Hunter, 1980; McDaniel, Schmidt, & Hunter, 1988b; Schmidt & Hunter, 1998 | .18 | 3% | 29% |
| Biographical data measures (Supervisory Profile Record Biodata Scale) | Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990; Schmidt & Hunter, 1998 | .35 | 12% | 27% |
| Assessment centers | Gaugler, Rosenthal, Thornton, & Bentson, 1987; Schmidt & Hunter, 1998; Becker, Höft, Holzenkamp, & Spinath, 2011 | .37 | 14% | 28% |
| EQ | Zeidner, Matthews, & Roberts, 2004 | .24 | 6% | n/a |
| 360 assessments | Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski, 2001 | .24 | 6% | n/a |
| Training & experience: point method | McDaniel, Schmidt, & Hunter, 1988a; Schmidt & Hunter, 1998 | .11 | 1% | 27% |
| Years of education | Hunter & Hunter, 1984; Schmidt & Hunter, 1998 | .10 | 1% | 27% |
| Interests | Schmidt & Hunter, 1998 | .10 | 1% | 27% |

Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments.

The figure below shows the predictive power information from this table in graphical form. Assessments are color coded to indicate which are focused on mental (cognitive) skills, behavior (past or present), or personality traits. It is clear that tests of mental skills stand out as the best predictors.

Predictive power graph

Why use Lectical Assessments for recruitment?

Lectical Assessments are "next generation" assessments, made possible through a novel synthesis of developmental theory, primary research, and technology. Until now, multiple-choice aptitude tests have been the most affordable option for employers. But despite being more predictive than personality tests, aptitude tests still suffer from important limitations. Lectical Assessments address these limitations. For details, take a look at the side-by-side comparison of LecticaFirst tests with conventional aptitude tests, below.

| Dimension | LecticaFirst | Aptitude |
| --- | --- | --- |
| Accuracy | Level of reliability (.95–.97) makes them accurate enough for high-stakes decision-making. (Interpreting reliability statistics) | Varies greatly. The best aptitude tests have levels of reliability in the .95 range; many recruitment tests have much lower levels. |
| Time investment | Lectical Assessments are not timed. They usually take 45–60 minutes, depending on the individual test-taker. | Varies greatly. For acceptable accuracy, tests must have many items and may take hours to administer. |
| Objectivity | Scores are objective (computer scoring is blind to differences in sex, body weight, ethnicity, etc.). | Scores on multiple choice tests are objective. Scores on interview-based tests are subject to several sources of bias. |
| Expense | Highly competitive subscription (from $6–$10 per existing employee annually). | Varies greatly. |
| Fit to role: complexity | Lectica employs sophisticated developmental tools and technologies to efficiently determine the relation between role requirements and the level of reasoning skill required to meet those requirements. | Lectica's approach is not directly comparable to other available approaches. |
| Fit to role: relevance | Lectical Assessments are readily customized to fit particular jobs, and are direct measures of what's most important: whether or not candidates' actual workplace reasoning skills are a good fit for a particular job. | Aptitude tests measure people's ability to select correct answers to abstract problems. It is hoped that these answers will predict how good a candidate's workplace reasoning skills are likely to be. |
| Predictive validity | In research so far: predicts advancement (R = .53**, R² = .28), National Leadership Study. | The aptitude (IQ) tests used in published research predict performance (R = .45 to .54, R² = .20 to .29). |
| Cheating | The written response format makes cheating virtually impossible when assessments are taken under observation, and very difficult when taken without observation. | Cheating is relatively easy and rates can be quite high. |
| Formative value | High. LecticaFirst assessments can be upgraded after hiring, then used to inform employee development plans. | None. Aptitude is a fixed attribute, so there is no room for growth. |
| Continuous improvement | Our assessments are developed with a 21st-century learning technology that allows us to continuously improve the predictive validity of LecticaFirst assessments. | Conventional aptitude tests are built with a 20th-century technology that does not easily lend itself to continuous improvement. |

* CLAS is not yet fully calibrated for scores above 11.5 on our scale. Scores in this range are most often seen in upper- and senior-level managers and executives. For this reason, we do not recommend using LecticaFirst for recruitment above mid-level management.

**The US Department of Labor’s highest category of validity, labeled “Very Beneficial,” requires validity coefficients of .35 or higher (R > .34).


References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. Barrick (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988a). A meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

Zeidner, M., Matthews, G., & Roberts, R. D. (2004). Emotional intelligence in the workplace: A critical review. Applied psychology: An International Review, 53(3), 371-399.

Why we need to LEARN to think

I'm not sure I buy the argument that reason developed to support social relationships, but the body of research described in this New Yorker article clearly exposes several built-in biases that get in the way of high quality reasoning. These biases are the reason why learning to think should be a much higher priority in our schools (and in the workplace). 

How to teach critical thinking: make it a regular practice

We've argued for years that you can't really learn critical thinking by taking a critical thinking course. Critical thinking is a skill that develops through reflective practice (VCoL). Recently, a group of Stanford scientists reported that a reflective practice approach not only works in the short term, but it produces "sticky" results. Students who are routinely prompted to evaluate data get better at evaluating data—and keep evaluating it even after the prompts are removed. 

Lectica is the only test developer that creates assessments that measure and support this kind of learning.

Support from neuroscience for robust, embodied learning

Human connector, by jgmarcelino from Newcastle upon Tyne, UK, via Wikimedia Commons

Fluid intelligence Connectome

For many years, we've been arguing that learning is best viewed as a process of creating networks of connections. We've defined robust learning as a process of building knowledge networks that are so well connected they allow us to put knowledge to work in a wide range of contexts. And we've described embodied learning as a way of learning that involves the whole person and is much more than the memorization of facts, terms, definitions, rules, or procedures.

New evidence from the neurosciences supports this way of thinking about learning. According to research recently published in Nature, people with more connected brains—specifically, those with more connections across different parts of the brain—demonstrate greater intelligence and better problem-solving skills than those with less connected brains. And this is only one of several research projects reporting similar findings.

Lectica exists because we believe that if we really want to support robust, embodied learning, we need to measure it. Our assessments are the only standardized assessments that have been deliberately developed to measure and support this kind of learning. 

Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (use of evidence, argument, & psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already concern themselves primarily with correctness.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

|  | Lectical Level | Mechanics | Coherence | Persuasiveness | Correctness |
| --- | --- | --- | --- | --- | --- |
| Case 1 | high | high | low | average | high |
| Case 2 | high | high | high | low | low |
| Case 3 | low | average | low | low | high |

Case 1

This student has relatively high Lectical, mechanics, and correctness scores, but their performance is low in coherence and the persuasiveness of their answers is average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from participating in activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's scores, with the exception of their persuasiveness and correctness scores, are high relative to expectations. This student's knowledge appears to be well integrated, but the combination of low persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

The scores received by this student are high for correctness, average for mechanics, and low for Lectical Level, coherence, and persuasiveness. This pattern suggests that the student has been memorizing content without integrating it effectively into their knowledge network, and has been doing so for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical scores catch up with their correctness scores.
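To make the diagnostic logic in these three cases concrete, here is a minimal sketch of how score patterns like those in the table above might be mapped to learning recommendations. The function, thresholds, and wording are illustrative assumptions on my part, not Lectica's actual scoring logic.

```python
# Illustrative only: a toy mapping from score patterns (as in the table above)
# to the kinds of recommendations described in Cases 1-3. Not Lectica's actual logic.

def recommend(lectical, mechanics, coherence, persuasiveness, correctness):
    """Each argument is 'low', 'average', or 'high' relative to expectations."""
    if (coherence == "low" or persuasiveness == "low") and correctness == "high":
        # Cases 1 and 3: strong correctness, weakly integrated knowledge
        return "Apply existing content knowledge in relevant real-world contexts (VCoL)."
    if correctness == "low" and coherence == "high":
        # Case 2: well-integrated knowledge, gaps in targeted content
        return "Fill in missing content knowledge, integrating it into the existing knowledge network."
    return "Continue regular cycles of goal setting, information gathering, application, and reflection."

print(recommend("high", "high", "low", "average", "high"))      # Case 1
print(recommend("high", "high", "high", "low", "low"))          # Case 2
print(recommend("low", "average", "low", "low", "high"))        # Case 3
```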

What PISA measures. What we measure.

Like the items in Lectical Assessments, PISA items involve real-world problems. PISA developers also claim, as we do here at Lectica, that their items measure how knowledge is applied. So, why do we persist in claiming that Lectical Assessments and assessments like PISA measure different things?

Part of the answer lies in questions about what's actually being measured, and in the meaning of terms like "real-world problems" and "how knowledge is applied." I'll illustrate with an example from Take the test: Sample questions from OECD's PISA assessments.

One of the reading comprehension items in "Take the test" involves a short story about a woman who is trapped in her home during a flood. Early in the story, a hungry panther arrives on her porch. The woman has a gun, which she keeps at her side as long as the panther is present. At first, it seems that she will kill the panther, but in the end, she offers it a ham hock instead. 

What is being measured?

There are three sources of difficulty in the story. First, its Lectical phase is 10c—the third phase of four in level 10. Second, the story is challenging to interpret because it's written to be a bit ambiguous; I had to read it twice in order to appreciate the subtlety of the author's message. Third, it is set on the water in a rural setting, so there's lots of language that would be new to many students. How well a student will comprehend this story hinges on their level of understanding—where they are currently performing on the Lectical Scale—and how much they know about living on the water in a rural setting. Assuming they understand the content of the story, it also depends on how good they are at decoding the somewhat ambiguous message of the story.

The first question that comes up for me is whether or not this is a good story selection for the average 15-year-old. The average phase of performance for most 15-year-olds is 10a. That's their productive level. When we prescribe learning recommendations to students performing in 10a, we choose texts that are about 1 phase higher than their current productive level. We refer to this as the "Goldilocks zone", because we've found it to be the range in which material is just difficult enough to be challenging, but not so difficult that the risk of failure is too high. Some failure is good. Constant failure is bad.
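As an illustration of the "one phase higher" rule of thumb described above, here is a small sketch. It assumes four phases (a through d) per Lectical level, consistent with 10c being "the third phase of four in level 10"; the function name and the assumption that phase d rolls over into the next level are mine, for illustration only.

```python
# Illustrative sketch of the "Goldilocks zone" rule described above:
# recommend material roughly one phase above a learner's current productive phase.
# Assumes four phases (a-d) per Lectical level; rollover into the next level is my assumption.

PHASES = "abcd"

def goldilocks_target(productive_phase: str) -> str:
    level, phase = int(productive_phase[:-1]), productive_phase[-1]
    i = PHASES.index(phase)
    if i < len(PHASES) - 1:
        return f"{level}{PHASES[i + 1]}"
    return f"{level + 1}{PHASES[0]}"

print(goldilocks_target("10a"))   # -> 10b, the target for a learner productive at 10a
print(goldilocks_target("10d"))   # -> 11a
```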

But this PISA story is intended to test comprehension; it's not a learning recommendation or resource. Here, its difficulty level raises a different issue. In this context, the question that arises for me is, "What is reading comprehension, when the text students are asked to decode presents different challenges to students living in different environments and performing in different Lectical Levels?" Clearly, this story does not present the same challenge to students performing in phase 10a as it presents to students performing in 10c. Students performing in 10a or lower are struggling to understand the basic content of the story. Students performing in 10c are grappling with the subtlety of the message. And if the student lives in a city and knows nothing about living on the water, even a student performing at 10c is disadvantaged.

Real world problems

Now, let's consider what it means to present a real-world problem. When we at Lectica use this term, we usually mean that the problem is ill-structured (like the world), without a "correct" answer. (We don't even talk about correctness.) The challenges we present to learners reveal the current level of their understandings—there is always room for growth. One of our interns refers to development as a process of learning to make "better and better mistakes". This is a VERY different mindset from the "right or wrong" mindset nurtured by conventional standardized tests.

What do PISA developers mean by "real world problem"? They clearly don't mean without a "correct" answer. Their scoring rubrics show correct, partial (sometimes), and incorrect answers. And it doesn't get any more subtle than that. I think what they mean by "real world" is that their problems are contextualized; they are simply set in the real world. But this is not a fundamental change in the way PISA developers think about learning. Theirs is still a model that is primarily about the ability to get right answers.

How knowledge is applied

Let's go back to the story about the woman and the panther. After they read the story, test-takers are asked to respond to a series of multiple choice and written response questions. In one written response question they are asked, "What does the story suggest was the woman’s reason for feeding the panther?"

The scoring rubric presents a selection of potential correct answers and a set of wrong answers. (No partially correct answers here.) It's pretty clear that when PISA developers ask “how well” students' knowledge is applied, they're talking about whether or not students can provide a correct answer. That's not surprising, given what we've observed so far. What's new and troubling here is that all "correct" answers are treated as though they are equivalent. Take a look at the list of choices. Do they look equally sophisticated to you?

  •  She felt sorry for it.
  • Because she knew what it felt like to be hungry.
  • Because she’s a compassionate person.
  • To help it live. (p. 77)

“She felt sorry for it.” is considered to be just as correct as “She is a compassionate person.” But we know the ideas expressed in these two statements are not equivalent. The idea of feeling sorry for can be expressed by children as early as phase 08b (6- to 7-year-olds). The idea of compassion (as sympathy) does not appear until level 10b. And the idea of being a compassionate person does not appear until 10c—even when the concept of compassion is being explicitly taught. Given that this is a test of comprehension—defined by PISA's developers in terms of understanding and interpretation—doesn't the student who writes, "She is a compassionate person," deserve credit for arriving at a more sophisticated interpretation?

I'm not claiming that students can't learn the word compassion earlier than level 10b. And I'm certainly not claiming that there is enough evidence in students' responses to the prompt in this assessment to determine if an individual who wrote "She felt sorry for it." meant something different from an individual who wrote, "She's a compassionate person." What I am arguing is that what students mean is more important than whether or not they get a right answer. A student who has constructed the notion of compassion as sympathy is expressing a more sophisticated understanding of the story than a student who can't go further than saying the protagonist felt sorry for the panther. When we, at Lectica, talk about how well knowledge is applied, we mean, “At what level does this child appear to understand the concepts she’s working with and how they relate to one another?” 

What is reading comprehension?

All of these observations lead me back to the question, "What is reading comprehension?" PISA developers define reading comprehension in terms of understanding and interpretation, and Lectical Assessments measure the sophistication of students' understanding and interpretation. It looks like our definitions are at least very similar.

We think the problem is not in the definition, but in the operationalization. PISA's items measure proxies for comprehension, not comprehension itself. Getting beyond proxies requires three ingredients.

  • First, we have to ask students to show us how they're thinking. This means asking for verbal responses that include both judgments and justifications for those judgments. 
  • Second, the questions we ask need to be more open-ended. Life is rarely about finding right answers. It's about finding increasingly adequate answers. We need to prepare students for that reality. 
  • Third, we need to engage in the careful, painstaking study of how students construct meanings over time.

This third requirement is such an ambitious undertaking that many scholars don't believe it's possible. But we've not only demonstrated that it's possible, we're doing it every day. We call the product of this work the Lectical™ Dictionary. It's the first curated developmental taxonomy of meanings. You can think of it as a developmental dictionary. Aside from making it possible to create direct tests of student understanding, the Lectical Dictionary makes it easy to describe how ideas evolve over time. We can not only tell people what their scores mean, but also what they're most likely to benefit from learning next. If you're wondering what that means in practice, check out our demo.
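To make the idea of a developmental dictionary a little more concrete, here is a toy sketch of the kind of lookup it enables, using the phase estimates from the panther example above ("felt sorry for" at 08b, compassion as sympathy at 10b, "compassionate person" at 10c). The data structure, matching rule, and function name are purely illustrative; the real Lectical Dictionary is a large curated taxonomy, not a Python dict.

```python
# Toy illustration of a developmental-dictionary lookup (not the real Lectical Dictionary).
# Phase estimates come from the panther example discussed above.

TOY_DICTIONARY = {
    "felt sorry for": "08b",          # earliest appearance of "feeling sorry for"
    "compassion": "10b",              # compassion as sympathy
    "compassionate person": "10c",    # being a compassionate person
}

def highest_phase(response):
    """Return the latest-appearing (most sophisticated) meaning matched in a response."""
    matches = [phase for phrase, phase in TOY_DICTIONARY.items() if phrase in response.lower()]
    return max(matches) if matches else None   # phase labels like "08b" < "10c" sort correctly

print(highest_phase("She felt sorry for it."))                  # -> 08b
print(highest_phase("Because she's a compassionate person."))   # -> 10c
```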