National leaders’ thinking: What we’ve learned so far…

In this article, I’ll be providing a summary of results from each group of leaders observed as part of Lectica’s National Leaders’ Study. Each time my colleagues and I complete a round of research for a particular group of national leaders, the results will first be presented in a special article, then summarized here. This article will be written and rewritten over several months, with regular updates. If at any point you want to get a quick sense of what we’ve learned so far, just come back to this article for an overview.

Summary of quantitative results

The following table compares the scores received by the leaders of countries included in the National Leaders’ Study so far. (If you don’t yet know what I mean by complexity level, see the first article in this series.)

| Country | Complexity score range | Complexity score difference | Leader average | Media average | Leader average – media average |
| --- | --- | --- | --- | --- | --- |
| USA | 1054–1163 | 109 | 1116 (1137 without P. Trump) | 1124 | -8 (13 without P. Trump) |
| Australia | 1111–1133 | 22 | 1125 | 1111 | 14 |

Key observations

  1. Lowest score: The average complexity level of President Trump’s interviews was 1054—near the average score received by 12th graders in a good high school.
  2. Highest score: The mean score for President Obama’s first two interviews was 1193. This is well above the average score received by CEOs in Lectica’s database and is in the ideal range for a national leader, who must be able to comprehend and work with issues that have a complexity level of 1200 and above.
  3. Fit-to-role: With the exception of Barack Obama, none of the leaders so far has demonstrated (in their interviews) a level of complexity that is a good match for the complexity level of many of the problems faced in office (1200+).
  4. Third interview scores: The scores of three of the five leaders whose first-interview scores were above average media scores (Barack Obama, Tony Abbott, and Malcolm Turnbull) dropped closer to media averages in their third interviews. We’re monitoring this potential trend.
  5. Media score comparison: The mean score for sampled U.S. media was 13 points higher than the mean score for Australian media.
  6. Leader score comparison: If we exclude President Trump as an extreme outlier, the average score for U.S. presidents was 12 points higher than the average score for Australian prime ministers.

Emerging concerns

  1. Difficulty evaluating candidates: In the interest of accessibility, voters are systematically being deprived of the evidence required to evaluate the competence of candidates. High-profile interview responses of national leaders are often the only place to observe anything like the actual thinking of candidates for office, yet it is well known that candidates and leaders are trained to simplify responses to interview questions. Moreover, national leaders’ speeches are written in language that simplifies issues to make them more accessible to the general public, and many candidates have not produced written works that can be relied upon as evidence of current capacity.
  2. Danger of electing incompetent candidates: When all candidates produce responses and read speeches in which issues are systematically simplified, it becomes very difficult to distinguish between different candidates’ level of understanding. This makes it easier to elect candidates that lack the level of understanding and skill required to cope with highly complex national and international issues.

Other articles in this series

National Leaders’ thinking: Australian Prime Ministers

How complex are the interview responses of the last four Australian prime ministers? How does the complexity of their responses compare to the complexity of the U.S. presidents’ responses?

Special thanks to my Australian colleague, Aiden M. A. Thornton, PhD Cand., for his editorial and research assistance.

This is the 4th in a series of articles on the complexity of national leaders’ thinking, as measured with CLAS, a newly validated electronic developmental scoring system. This article will make more sense if you begin with the first article in the series.

Just in case you choose not to read or revisit the first article, here are a few things to keep in mind:

  • I am an educational researcher and the CEO of a nonprofit that specializes in measuring the complexity level of people’s thinking skills and supporting the development of their capacity to work with complexity.
  • The complexity level of leaders’ thinking is one of the strongest predictors of leader advancement and success. See the National Leaders Intro for evidence.
  • Many of the issues faced by national leaders require principles thinking (level 12 on the skill scale, a.k.a. the Lectical Scale). See the National Leaders Intro for the rationale.
  • To accurately measure the complexity level of someone’s thinking (on a given topic), we need examples of their best thinking. In this case, that kind of evidence wasn’t available. As an alternative, my colleagues and I have chosen to examine the complexity level of prime ministers’ responses to interviews with prominent journalists.

Benchmarks for complexity scores

  • Most high school graduates perform somewhere in the middle of level 10.
  • The average complexity score of American adults is in the upper end of level 10, somewhere in the range of 1050–1080.
  • The average complexity score for senior leaders in large corporations or government institutions is in the upper end of level 11, in the range of 1150–1180.
  • The average complexity score (reported in our National Leaders Study) for the three U.S. presidents that preceded President Trump was 1137.
  • The average complexity score (reported in our National Leaders Study) for President Trump was 1053.
  • The difference between 1053 and 1137 generally represents a decade or more of sustained learning. (If you’re a new reader and don’t yet know what a complexity level is, check out the National Leaders’ Series introductory article.)

The data

In this article, we examine the thinking of the four most recent prime ministers of Australia—Julia Gillard, Kevin Rudd, Tony Abbott, and Malcolm Turnbull. For each prime minister, we selected three interviews based on the following criteria: They

  1. were conducted by prominent journalists representing respected news media;
  2. included questions that requested explanations of the Prime Minister’s perspective; and
  3. were either conducted within the Prime Minister’s first year in office or were the earliest interviews we could locate that met the first two criteria.

As noted in the introductory article of this series, we do not imagine that the responses provided in these interviews necessarily represent competence. It is common knowledge* that prime ministers and other leaders typically attempt to tailor messages to their audiences, so even when responding to interview questions, they may not show off their own best thinking. Media also tailor writing for their audiences, so to get a sense of what a typical complexity level target for top media might be, we used CLAS to score 11 articles from Australian news media on topics similar to those discussed by the four prime ministers in their interviews. We selected these articles at random—literally selecting the first ones that came to hand—from recent issues of the Canberra Times, The Age, the Sydney Morning Herald, and Adelaide Now. Articles from all of these newspapers landed in the lower range of the early systems thinking zone, with a mean score of 1109 (15 points lower than the mean for the U.S. media sample) and a range of 45 points.

Hypothesis

Based on the mean media score, and understanding that politicians, like media, generally attempt to tailor messages for their audience, we hypothesized that prime ministers would aim for a similar range. Since the mean score for the Australian media sample was 15 points lower than the mean score for the U.S. media sample, we anticipated that the average score received by Australian prime ministers would be a bit lower than the average score received by U.S. presidents.

The results

The table below shows the complexity scores received by the four prime ministers. (Contact us if you would like a copy of the interviews.) Complexity level scores are shown in the same order as the interview listings.

All of the scores received by Australian prime ministers fell well below the complexity level of many of the problems faced by national leaders. Although we cannot assume that the interview responses we scored are representative of these leaders’ best thinking, we can assert that we can see no evidence in these interviews that these prime ministers have the capacity to grasp the full complexity of many of the issues they faced (or are currently facing) in office. Instead, their scores suggest levels of skill that are more appropriate for mid- to upper-level managers in large organizations.

| Prime minister | Interviews (interviewer, outlet, date) | Complexity level scores | Mean complexity level | Mean zone |
| --- | --- | --- | --- | --- |
| Julia Gillard (2010–2013) | Laurie Oakes, Weekend Today, 6/27/2010; Jon Faine, ABC 774, 6/29/2010; Deborah Cameron, ABC Sydney, 7/07/2010 | 1108, 1113, 1113 | 1111 | Early systems thinking |
| Kevin Rudd (2007–2010) | Kerry O’Brien, ABC AM, 4/24/2008; Lyndal Curtis, ABC AM, 5/30/2008; Jon Faine, ABC 774 Brisbane, 6/06/2008 | 1133, 1138, 1129 | 1133 | Early systems thinking |
| Tony Abbott (2013–2015) | Alison Carabine, ABC Radio National, 12/16/2013; Ray Hadley, 1/29/2014; Chris Uhlman, ABC AM, 9/26/2014 | 1133, 1129, 1117 | 1126 | Early systems thinking |
| Malcolm Turnbull (2015–) | Michael Brissendon, ABC AM, 9/21/2015; Several journalists, 12/1/2015; Steve Austin, ABC Radio Brisbane, 1/17/2017 | 1133, 1138, 1113 | 1128 | Early systems thinking |

Comparison of U.S. and Australian results

There was less variation in the complexity scores of Australian prime ministers than in those of U.S. presidents. Mean scores for the U.S. presidents ranged from 1054–1163 (109 points), whereas the range for Australian prime ministers was 1111–1133 (22 points). If we exclude President Trump as an extreme outlier, the mean score for U.S. presidents was 12 points higher than for Australian prime ministers.
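The Australian figures above can be recomputed directly from the interview scores in the results table. Here is a minimal sketch; the score lists are transcribed from the table, and the whole-point rounding convention is our assumption:

```python
# Interview complexity scores for each Australian prime minister,
# transcribed from the results table above.
scores = {
    "Gillard":  [1108, 1113, 1113],
    "Rudd":     [1133, 1138, 1129],
    "Abbott":   [1133, 1129, 1117],
    "Turnbull": [1133, 1138, 1113],
}

# Mean complexity level per prime minister, rounded to whole points.
means = {pm: round(sum(s) / len(s)) for pm, s in scores.items()}

# Spread of the means (the 22-point range reported above).
spread = max(means.values()) - min(means.values())

# Overall leader average across all twelve interviews (reported as 1125).
leader_avg = round(sum(sum(s) for s in scores.values()) / 12)

print(means)       # {'Gillard': 1111, 'Rudd': 1133, 'Abbott': 1126, 'Turnbull': 1128}
print(spread)      # 22
print(leader_avg)  # 1125
```

Subtracting the Australian media mean of 1111 from the leader average of 1125 gives the 14-point leader–media gap reported in the comparison table.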

You may notice that the scores of two of the prime ministers who received a score of 1133 on their first interview had dropped by the time of their third interview. This is reminiscent of the pattern we observed for President Obama.

The mean score for all four prime ministers was 14 points higher than the mean for sampled media. Interestingly, if we exclude President Trump as an extreme outlier, the difference between the average score received by U.S. presidents and the mean for sampled U.S. media is almost identical: 13 points. Almost all of the difference between the mean scores of prime ministers and presidents (excluding President Trump) could be explained by media scores.

| Country | Complexity score range | Complexity score difference | Leader average | Media average | Leader average – media average |
| --- | --- | --- | --- | --- | --- |
| USA | 1054–1163 | 109 | 1116 (1137 without P. Trump) | 1124 | -8 (13 without P. Trump) |
| Australia | 1111–1133 | 22 | 1125 | 1111 | 14 |

The sample sizes here are too small to support a statistical analysis, but once we have conducted our analyses of the British and Canadian prime ministers, we will be able to examine these trends statistically—and find out if they look like more than a coincidence.

Discussion

In the first article of this series, I discussed the importance of attempting to “hire” leaders whose complexity level scores are a good match for the complexity level of the issues they face in their roles. I then posed two questions:

  • When asked by prominent journalists to explain their positions on complex issues, what is the average complexity level of national leaders’ responses?
  • How does the complexity level of national leaders’ responses relate to the complexity of the issues they discuss?

We now have a third question to add:

  • What is the relation between the complexity level of national leaders’ interview responses and the complexity level of respected media?

So far, we have learned that when national leaders explain their positions on complex issues, they do not — with the possible exception of President Obama — demonstrate that they are capable of grasping the full complexity of these issues. On average, their explanations do not rise to the mean level demonstrated by executive leaders in Lectica’s database.

We have also learned that when national leaders explained their positions on complex issues to the press, their explanations were 13–14 points higher on the Lectical Scale than the average complexity level of sampled media articles. We will be following this possible trend in upcoming articles about the British and Canadian leaders.

Interestingly, the Lectical Scores of two prime ministers whose average scores were above the media average dropped closer to the media average in their third interviews. We observed the same pattern for President Obama. It’s too soon to declare this to be a trend, but we’ll be watching.

As noted in the article about the thinking of U.S. presidents, the world needs leaders who understand and can work with highly complex issues, and particularly in democracies, we also need leaders whose messages are accessible to the general public. Unfortunately, the drive toward accessibility seems to have led to a situation in which candidates are persuaded to simplify their messages, leaving voters with one less way to evaluate the competence of our future leaders. How are we to differentiate between candidates whose capacity to comprehend complex issues is only as complex as that of a mid-level manager and candidates who have a high capacity to comprehend and work with these issues but feel compelled to simplify their messages? And in a world in which people increasingly seem to believe that one opinion is as good as any other, how do we convince voters of the critical importance of complex thinking and the expertise it represents?


*The speeches of presidents are generally written to be accessible to a middle school audience. The metrics used to determine reading level are not measures of complexity level; they are measures of sentence length, word length, and sometimes the commonness of words. For more on reading levels, see: How to interpret reading level scores.


Other articles in this series

Fit-to-role, well-being, & productivity

How to recruit the brain’s natural motivational cycle—the power of fit-to-role.

People learn and work better when the challenges they face in their roles are just right—when there is good fit-to-role. Improving fit-to-role requires achieving an optimal balance between an individual’s level of skill and role requirements. When employers get this balance right, they increase engagement, happiness (satisfaction), quality of communication, productivity, and even cultural health.

video version

Here’s how it works.

In the workplace, the challenges we’re expected to face should be just big enough to allow for success most of the time, but not so big that frequent failure is inevitable. My colleagues and I call this balance-point the Goldilocks zone, because it’s where the level of challenge is just right. Identifying the Goldilocks zone is important for three reasons:

First, and most obviously, it’s not good for business if people make too many mistakes.

Second, if the distance between employees’ levels of understanding and the difficulty of the challenges they face is too great, employees are less likely to understand and learn from their mistakes. This kind of gap can lead to a vicious cycle, in which, instead of improving or staying the same, performance gradually deteriorates.

Third, when a work challenge is just right, we’re more likely to enjoy ourselves—and feel motivated to work even harder. This is because challenges in the Goldilocks zone allow us to succeed just often enough to stimulate our brains to release pleasure chemicals called opioids. Opioids give us a sense of satisfaction and pleasure. And they have a second effect: they also trigger the release of dopamine (the striving chemical), which motivates us to reach for the next challenge, so we can experience the satisfaction of success once again.

The dopamine-opioid cycle will repeat indefinitely in a virtuous cycle, but only when enough of our learning challenges are in the zone—not too easy and not too hard. As long as the dopamine-opioid cycle keeps cycling, we feel engaged. Engaged people are happy people—they tend to feel satisfied, competent, and motivated. [1]

People are also happier when they feel they can communicate effectively and build understanding with those around them. When organizations get fit-to-role right for every member of a team, they’re also building a team whose members are more likely to understand one another. This is because the complexity level of role requirements for different team members is likely to be very similar. So, getting fit-to-role right for one team member means building a team in which members are performing within a complexity range that makes it relatively—but not too—easy for members to understand one another. Team members are happiest when they can be confident that—most of the time and with reasonable effort—they will be able to achieve a shared understanding with other members.

A team representing a diversity of perspectives and skills, composed of individuals performing within a complexity range of 10–20 points on the Lectical Scale is likely to function optimally.

Getting fit-to-role right also ensures that line managers are slightly more complex thinkers than their direct reports. People tend to prefer leaders they can look up to, and most of us intuitively look up to people who think a little more complexly than we do. [2] When it comes to line managers, if we’re as skilled as they are, we tend to wonder why they’re leading us. If we’re more skilled than they are, we are likely to feel frustrated. And if they’re way more skilled than we are, we may not understand them fully. In other words, we’re happiest when our line managers challenge us—but not too much. (Sound familiar?)

Most people work better with line managers who perform 15–25 points higher on the Lectical Scale than they do.
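The rule of thumb above can be expressed as a simple check. This is an illustrative sketch only: the 15–25 point band is taken from the text, while the function name and labels are ours, not a Lectica API:

```python
def manager_fit(report_score: int, manager_score: int) -> str:
    """Classify the manager-report complexity gap against the
    15-25 point rule of thumb described above (illustrative only)."""
    gap = manager_score - report_score
    if gap < 15:
        return "too close"  # we may wonder why they're leading us
    if gap > 25:
        return "too far"    # we may not understand them fully
    return "in the zone"    # challenging, but not too much

print(manager_fit(1100, 1120))  # in the zone
```

A report performing at 1100 would, on this heuristic, work best with a line manager performing between 1115 and 1125.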

Unsurprisingly, all this engagement and happiness has an impact on productivity. Individuals work more productively when they’re happily engaged. And teams work more productively when their members communicate well with one another.[3]

The moral of the story

The moral of this story is that employee happiness and organizational effectiveness are driven by the same thing—fit-to-role. We don’t have to compromise one to achieve the other. Quite the contrary. We can’t achieve either without achieving fit-to-role.

Summing up

To sum up, when we get fit-to-role right—in other words, ensure that every employee is in the zone—we support individual engagement & happiness, quality communication in teams, and leadership effectiveness. Together, these outcomes contribute to productivity and cultural health.

Getting fit-to-role right requires top-notch recruitment and people development practices, starting with the ability to measure the complexity of (1) role requirements and (2) people skills.

When my colleagues and I think about the future of recruitment and people development, we envision healthy, effective organizations characterized by engaged, happy, productive, and constantly developing employees & teams. We help organizations achieve this vision by…

  • reducing the cost of recruitment so that best practices can be employed at every level in an organization;
  • improving predictions of fit-to-role;
  • broadening the definition of fit-to-role to encompass the role, the team, and the position of a role in the organizational hierarchy; and
  • promoting the seamless integration of recruitment with employee development strategy and practice.

[1] Csikszentmihalyi, M. (2008). Flow: The psychology of happiness. HarperCollins.

[2] Oishi, S., Koo, M., & Akimoto, S. (2008). Culture, interpersonal perceptions, and happiness in social interactions. Personality and Social Psychology Bulletin, 34, 307–320.

[3] Oswald, A. J., Proto, E., & Sgroi, D. (2015). Happiness and productivity. Journal of Labor Economics, 33, 789–822.

How to interpret reading level scores

Flesch-Kincaid and other reading level metrics are sometimes employed to compare the arguments made by politicians in their speeches, interviews, and writings. What are these metrics, and what do they actually tell us about these verbal performances?

Flesch-Kincaid examines sentence length, word length, and syllable count. Texts are considered “harder” when they have longer sentences and use words with more letters, and “easier” when they have shorter sentences and use words with fewer letters. For decades, Flesch-Kincaid and other reading level metrics have been built into word processors. When a grammar checker advises you that the reading level of your article is too high, it’s likely that this warning is based on word and sentence length.
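For the curious, the Flesch-Kincaid grade-level formula itself is simple arithmetic over those counts. Here is a rough sketch; the syllable counter is a crude vowel-run heuristic, which is typical of automated implementations:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, with a crude silent-'e' rule."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Grade level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Longer sentences and more syllables per word push the grade up; nothing in the formula looks at what the words mean.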

Other reading level indicators, like Lexiles, use the commonness of words as an indicator. Texts are considered to be easier when the words they contain are more common, and more difficult when the words they contain are less common.

Because reading-level metrics are embedded in most grammar checkers, writers are continuously being encouraged to write shorter sentences with fewer, more common words. Writers for news media, advertisers, and politicians, all of whom care deeply about market share, work hard to create texts that meet specific “grade level” requirements. And if we are to judge by analyses of recent political speeches, this has considerably “dumbed down” political messages.

Weaknesses of reading level indicators

Reading level indicators look only at easy-to-measure features like length and frequency. But length and frequency are merely proxies for what these metrics purport to measure: how easy it is to understand the meaning intended by the author.

Let’s start with word length. Words of the same length or number of syllables can have meanings that are more or less difficult to understand. The word “information” has 4 syllables and 11 letters. The word “validity” has 4 syllables and 8 letters. Which concept, information or validity, do you think is easier to understand? (Hint: one concept can’t be understood without a pretty rich understanding of the other.)

How about sentence length? These two sentences express the same meaning. “He was on fire.” “He was so angry that he felt as hot as a fire inside.” In this case, the short sentence is more difficult because it requires the reader to understand that it should be read within a context presented in an earlier sentence—”She really knew how to push his buttons.”

Finally, what about commonness? Well, there are many words that are less common but no more difficult to understand than other words. Take “giant” and “enormous.” The word “enormous” doesn’t necessarily add meaning; it’s just used less often. It’s not harder, just less popular. And some relatively common words are more difficult to understand than less common words. For example, “evolution” is a common word with a complex meaning that’s quite difficult to understand, and “onerous” is an uncommon word that’s relatively easy to understand.

I’m not arguing that reducing sentence and word length and using more common words don’t make prose easier to understand, but metrics that use these proxies don’t actually measure understandability—or at least they don’t do it very well.

How reading level indicators relate to complexity level

When my colleagues and I analyze the complexity level of a text, we’re asking ourselves, “At what level does this person understand these concepts?” We’re looking for meaning, not word length or popularity. Level of complexity directly represents level of understanding.

Reading level indicators do correlate with complexity level. Correlations generally fall in the range of .40 to .60, depending on the sample and the indicator. Squaring these correlations suggests that 16% to 36% of what reading-level indicators measure is the same thing we measure. In other words, they are weak measures of meaning.[1] They are stronger measures of factors that affect readability but are not related directly to meaning: sentence length, word length, and/or commonness.
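The 16%–36% figures come from squaring the correlation coefficient, the standard “shared variance” reading of r:

```python
# Shared variance between reading-level indicators and complexity level,
# from the reported correlation range of .40 to .60.
for r in (0.40, 0.60):
    shared = r ** 2  # r squared = proportion of variance in common
    print(f"r = {r:.2f} -> {shared:.0%} shared variance")
# r = 0.40 -> 16% shared variance
# r = 0.60 -> 36% shared variance
```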

Here’s an example of how all of this plays out in the real world: The New York Times is said to have a grade 7 Flesch-Kincaid reading level, on average. But complexity analyses of its articles yield scores of 1100–1145. In other words, these articles express meanings that we don’t see in assessment responses until college and beyond. This would explain why the New York Times audience tends to be college educated.

We would say that by reducing sentence and word length, New York Times writers avoid making complex ideas harder to understand.

Summing up

Reading level indicators are flawed measures of understanding. They are also dinosaurs. When these tools were developed, we couldn’t do any better. But advances in technology, research methods, and the science of learning have taken us beyond proxies for understanding to direct measures of understanding. The next challenge is figuring out how to ensure that these new tools are used responsibly—for the good of all.

President Trump passed the Montreal Cognitive Assessment

Shortly after the President passed the Montreal Cognitive Assessment, a reader emailed with two questions:

  1. Does this mean that the President has the cognitive capacity required of a national leader?
  2. How does a score on this test relate to the complexity level scores you have been describing in recent posts?

Question 1

A high score on the Montreal Cognitive Assessment does not mean that the President has the cognitive capacity required of a national leader. This test result simply means there is a high probability that the President is not suffering from mild cognitive impairment. (The test has been shown to detect existing cognitive impairment 88% of the time [1].) In order to determine whether the President has the mental capacity to understand the complex issues he faces as a national leader, we need to know how complexly he thinks about those issues.

Question 2

The answer to the second question is that there is little relation between scores on the Montreal Cognitive Assessment and the complexity level of a person’s thinking. A test like the Montreal Cognitive Assessment does not require the kind of thinking a President needs to understand highly complex issues like climate change or the economy. Teenagers can easily pass this test.


[1] Tsoi, K. K., Chan, J. Y., Hirai, H. W., Wong, S. Y., & Kwok, T. C. (2015). Cognitive tests to detect dementia: A systematic review and meta-analysis. JAMA Internal Medicine, 175(9), 1450–1458. doi:10.1001/jamainternmed.2015.2152

 

President Trump on climate change

How complex are the ideas about climate change expressed in President Trump’s tweets? The answer is, they are even less complex than ideas he has expressed about intelligence, international trade, and immigration—landing squarely in level 10. (See the benchmarks, below, to learn more about what it means to perform in level 10.)

The President’s climate change tweets

It snowed over 4 inches this past weekend in New York City. It is still October. So much for Global Warming.
2:43 PM – Nov 1, 2011

 

It’s freezing in New York—where the hell is global warming?
2:37 PM – Apr 23, 2013

 

Record low temperatures and massive amounts of snow. Where the hell is GLOBAL WARMING?
11:23 PM – Feb 14, 2015

 

In the East, it could be the COLDEST New Year’s Eve on record. Perhaps we could use a little bit of that good old Global Warming…!
7:01 PM – Dec 28, 2017

Analysis

In all of these tweets, President Trump appears to assume that unusually cold weather is proof that climate change (a.k.a. global warming) is not real. The argument is an example of simple level 10, linear causal logic that can be represented as an “if, then” statement: “If the temperature right now is unusually low, then global warming isn’t happening.” Moreover, in these comments the President relies exclusively on immediate (proximal) evidence: “It’s unusually cold outside.” We see the same use of immediate evidence when climate change believers claim that a warm weather event is proof that climate change is real.

Let’s use some examples of students’ reasoning to get a fix on the complexity level of President Trump’s tweets. Here is a statement from an 11th grade student who took our assessment of environmental stewardship (complexity score = 1025):

“I do think that humans are adding [gases] to the air, causing climate change, because of everything around us. The polar ice caps are melting.”

The argument is an example of simple level 10, linear causal logic that can be represented as an “if, then” statement: “If the polar ice caps are melting, then global warming is real.” There is a difference between this argument and President Trump’s argument, however. The student is describing a trend rather than a single event.

Here is an argument made by an advanced 5th grader (complexity score = 1013):

“I think that fumes, coals, and gasses we use for things such as cars…cause global warming. I think this because all the heat and smoke is making the years warmer and warmer.”

This argument is also an example of simple level 10, linear causal logic that can be represented as an “if, then” statement: “If the years are getting warmer and warmer, then global warming is real.” Again, the difference between this argument and President Trump’s argument is that the student is describing a trend rather than a single event.

I offer one more example, this time of a 12th grade student making a somewhat more complex argument (complexity score = 1035).

“Humans have caused a lot of green house gasses…and these have caused global warming. The temperature has increased over the years and studies show that the ice is melting in the north and south pole, so, yes humans are causing climate change.”

This argument is also an example of level 10, linear causal logic that can be represented as an “if, then” statement: “If the temperature has increased and studies show that the ice at the north and south poles is melting, then humans are causing climate change.” In this case, the student’s argument is a bit more complex than in the previous examples. She has mentioned two variables (warming and melting) and explicitly uses scientific evidence to support her conclusion.

Based on these comparisons, it seems clear that President Trump’s tweets about climate change represent reasoning at the lower end of level 10.

Reasoning in level 11

Individuals performing in level 11 recognize that climate is an enormously complex phenomenon that involves many interacting variables. They understand that any single event or trend may be part of the bigger story, but is not, on its own, evidence for or against climate change.

Summing up

It concerns me greatly that someone who does not demonstrate any understanding of the complexity of climate is in a position to make major decisions related to climate change.


Benchmarks for complexity scores

  • Most high school graduates perform somewhere in the middle of level 10.
  • The average complexity score of American adults is in the upper end of level 10, somewhere in the range of 1050–1080.
  • The average complexity score for senior leaders in large corporations or government institutions is in the upper end of level 11, in the range of 1150–1180.
  • The average complexity score (reported in our National Leaders Study) for the three U. S. presidents that preceded President Trump was 1137.
  • The average complexity score (reported in our National Leaders Study) for President Trump was 1053.
  • The difference between 1053 and 1137 generally represents a decade or more of sustained learning. (If you’re a new reader and don’t yet know what a complexity level is, check out the National Leaders Series introductory article.)

 

President Trump on immigration

How complex are the ideas about immigration expressed in President Trump’s recent comments to congress?

On January 9th, 2018, President Trump spoke to members of Congress about immigration reform. In his comments, the President stressed the need for bipartisan immigration reform, and laid out three goals.

  1. secure our border with Mexico
  2. end chain migration
  3. close the visa lottery program

I have analyzed President Trump’s comments in detail, looking at each goal in turn. But first, his full comments were submitted to CLAS (an electronic developmental assessment system) for an analysis of their complexity level. The CLAS score was 1046. This score is in what we call level 10, and is a few points lower than the average score of 1053 awarded to President Trump’s arguments in our earlier research.


Here are some benchmarks for complexity scores:

  • The average complexity score of American adults is in the upper end of level 10, somewhere in the range of 1050–1080.
  • The average complexity score for senior leaders in large corporations or government institutions is in the upper end of level 11, in the range of 1150–1180.
  • The average complexity score (reported in our National Leaders Study) for the three U. S. presidents that preceded President Trump was 1137.
  • The difference between 1046 and 1137 represents a decade or more of sustained learning. (If you’re a new reader and don’t yet know what a complexity level is, check out the National Leaders Series introductory article.)

Border security

President Trump’s first goal was to increase border security.

“Drugs are pouring into our country at a record pace and a lot of people are coming in that we can’t have… we have tremendous numbers of people and drugs pouring into our country. So, in order to secure it, we need a wall.  We…have to close enforcement loopholes. Give immigration officers — and these are tremendous people, the border security agents, the ICE agents — we have to give them the equipment they need, we have to close loopholes, and this really does include a very strong amount of different things for border security.”

This is a good example of a level 10, if-then, linear argument. The gist of this argument is, “If we want to keep drugs and people we don’t want from coming across the border, then we need to build a wall and give border agents the equipment and other things they need to protect the border.”

As is also typical of level 10 arguments, this argument offers immediate concrete causes and solutions. The cause of our immigration problems is that bad people are getting into our country. The physical act of keeping people out of the country is a solution to this problem.

Individuals performing in level 11 would not be satisfied with this line of reasoning. They would want to consider underlying or root causes such as poverty, political upheaval, or trade imbalances—and would be likely to try to formulate solutions that addressed these more systemic causes.

Side note: It’s not clear exactly what President Trump means by loopholes. In the past, he has used this term to mean “a law that lets people do things that I don’t think they should be allowed to do.” The dictionary meaning of the term would be more like, “a law that unintentionally allows people to do things it was meant to keep them from doing.”

Chain migration

President Trump’s second goal was to end chain migration. According to Wikipedia, chain migration (a.k.a. family reunification) is a social phenomenon in which immigrants from a particular family or town are followed by others from that family or town. In other words, family members and friends often join loved ones who have already immigrated to a new country. Like many U. S. citizens, I’m a product of chain migration. The first of my relatives to arrive in this country, in the 17th century, later helped other relatives to immigrate.

President Trump wants to end chain migration, because…

“Chain migration is bringing in many, many people with one, and often it doesn’t work out very well.  Those many people are not doing us right.”

I believe that what the President is saying here is that chain migration occurs when one person immigrates to a new country and lots of other people known to (or related to?) that person are allowed to immigrate too. He is concerned that the people who follow the first immigrant aren’t behaving properly.

To support this claim, President Trump provides an example of the harm caused by chain migration.

“…we have a recent case along the West Side Highway, having to do with chain migration, where a man ran over — killed eight people and many people injured badly.  Loss of arms, loss of legs.  Horrible thing happened, and then you look at the chain and all of the people that came in because of him.  Terrible situation.”

Sayfullo Saipov, the perpetrator of the attack Trump appears to be referring to, was a Diversity Visa immigrant. Among other things, this means he was not sponsored, so he cannot have been a chain immigrant. On November 21, 2017, President Trump claimed that Saipov had been listed as the primary contact of 23 people who attempted to immigrate following his arrival in 2010, suggesting that Saipov was the first in a chain of immigrants. According to BuzzFeed, federal authorities have been unable to confirm this claim.

Like the border security example, Trump’s argument about chain migration is a good example of a level 10, if-then, linear argument. Here, the gist of his argument is that, If we don’t stop chain migration, then bad people like Sayfullo Saipov will come into the country and do horrible things to us. (I’m intentionally ignoring President Trump’s mistaken assertion that Saipov was a chain immigrant.)

Individuals performing in level 11 would not regard a single example of violent behavior as adequate evidence that chain migration is a bad thing. Before deciding that eliminating chain migration was a wise decision, they would want to know, for example, whether or not chain immigrants are more likely to behave violently (or become terrorists) than natural-born citizens.

The visa lottery (Diversity Visa Program)

The visa lottery was created as part of the Immigration Act of 1990, and signed into law by President George H. W. Bush. Application for this program is free; the only way to apply is to enter your data into a form on the State Department’s website. Individuals who win the lottery must undergo background checks and vetting before being admitted into the United States. (If you are interested in learning more, the Wikipedia article on this program is comprehensive and well-documented.)

President Trump wants to cancel the lottery program:

“…countries come in and they put names in a hopper.  They’re not giving you their best names; common sense means they’re not giving you their best names.  They’re giving you people that they don’t want.  And then we take them out of the lottery.  And when they do it by hand — where they put the hand in a bowl — they’re probably — what’s in their hand are the worst of the worst.”

Here, President Trump seems to misunderstand the nature of the visa lottery program. He claims that countries put forward names and that these are the names of people they do not want in their own countries. That is simply not the way the Diversity Visa Program works.

To support his anti-lottery position, Trump again appears to mention the case of Sayfullo Saipov (“that same person who came in through the lottery program”).

“But they put people that they don’t want into a lottery and the United States takes those people.  And again, they’re going back to that same person who came in through the lottery program. They went — they visited his neighborhood and the people in the neighborhood said, “oh my God, we suffered with this man — the rudeness, the horrible way he treated us right from the beginning.”  So we don’t want the lottery system or the visa lottery system.  We want it ended.”

I think that what President Trump is saying here is that Sayfullo Saipov was one of the outcasts put into our lottery program by a country that did not want him, and that his new neighbors in the U. S. had complained about his behavior from the start.

This is not a good example of a level 10 argument. In fact, it is not a good example of an argument at all. President Trump completely misrepresents the Diversity Immigrant Visa Program, leaving him with no basis for a sensible argument.

Summing up

The results of this analysis of President Trump’s statements about immigration provide additional evidence that he tends to perform in the middle of level 10 and that his arguments generally have a simple if-then structure. The analysis also reveals some apparent misunderstandings of the law and other factual information.

It is a matter for concern when a President of the United States does not appear to understand a law he wants to change.

 

President Trump on intelligence

How complex are the ideas about intelligence expressed in President Trump’s tweets?

President Trump recently tweeted about his intelligence. The media has already had quite a bit to say about these tweets. So, if you’re suffering from Trump tweet trauma, this may not be the article for you.

But you might want to hang around if you’re interested in looking at these tweets from a different angle. I thought it would be interesting to examine their complexity level, and consider what they suggest about the President’s conception of intelligence.

In the National Leaders Study, we’ve been using CLAS—Lectica, Inc.’s electronic developmental scoring system—to score the complexity level of several national leaders’ responses to questions posed by respected journalists. Unfortunately, I can’t use CLAS to score tweets; they’re too short. Instead, I’m going to use the Lectical Dictionary to examine the complexity of the ideas expressed in them.


If you aren’t familiar with the National Leaders series, you may find this article a bit difficult to follow.


The Lectical Dictionary is a developmentally curated list of about 200,000 words or short phrases (terms) that represent particular meanings. (The dictionary does not include entries for people, places, or physical things.) Each term in the dictionary has been assigned to one of 30 developmental phases, based on its least complex possible meaning. The 30 developmental phases span first speech (in infancy) to the highest adult developmental phase Lectica has observed in human performance. Each phase represents one quarter of a level (a, b, c, or d). Levels range from 5 (first speech) to 12 (the most complex level Lectica measures). Phase scores are named as follows: 09d, 10a, 10b, 10c, 10d, 11a, etc. Levels 10 through 12 are considered to be “adult levels,” but the earliest phase of level 10 is often observed in middle school students, and the average high school student performs in the 10b to 10c range.
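The phase-naming scheme described above is easy to mechanize. Here is a small illustrative sketch (my own, not Lectica's code) that enumerates the phase labels for the adult levels, using the two-digit level number plus a phase letter:

```python
# Enumerate phase labels for levels 10 through 12, following the naming
# scheme described above: each level has four phases, a through d,
# and level numbers are zero-padded to two digits (e.g., 09d).
def phase_labels(levels=range(10, 13)):
    return [f"{level:02d}{phase}" for level in levels for phase in "abcd"]

labels = phase_labels()
print(labels)  # ['10a', '10b', '10c', '10d', '11a', ..., '12d']
```

The same scheme extends downward (09a–09d and below) for the earlier phases.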

In the following analysis, I’ll be identifying the highest-phase Lectical Dictionary terms in the President’s statements, showing each term’s phase. Where possible, I’ll also look at the form of thinking in which these terms are embedded—black-and-white, if-then logic (10a–10d) versus shades-of-gray, nuanced logic (11a–11d).

The President’s statements

The first two statements are tweets made on 01–05–2018.

“…throughout my life, my two greatest assets have been mental stability and being, like, really smart.”

The two most complex ideas in this statement are the notion of having personal assets (10c), and the notion of mental stability (10b).

“I went from VERY successful businessman, to top T.V. Star…to President of the United States (on my first try). I think that would qualify as not smart, but genius…and a very stable genius at that!”

This statement presents an argument for the President’s belief that he is not only smart, but a stable genius (10b-10c). The evidence offered consists of a list of accomplishments—being a successful (09c) businessman, being a top star, and being elected (09b) president. (Stable genius is not in the Lectical Dictionary, but it is a reference back to the previous notion of mental stability, which is in the dictionary at 10b.)

The kind of thinking demonstrated in this argument is simple if-then linear logic. “If I did these things, then I must be a stable genius.”

Later, at Camp David, when asked about these Tweeted comments, President Trump explained further…

“I had a situation where I was a very excellent student, came out, made billions and billions of dollars, became one of the top business people, went to television and for 10 years was a tremendous success, which you’ve probably heard.”

This argument provides more detail about the President’s accomplishments—being an excellent (08a) student, making billions and billions of dollars, becoming a top business person, and being a tremendous success (10b) in television. Here the president demonstrates the same if-then linear logic observed in the second tweet, above.

Summing up

The President has spoken about his intelligence on numerous occasions. Across all of the instances I’ve identified, he makes a strong connection between intelligence and concrete accomplishments — most often wealth, fame, or performance (for example in school or in negotiations). I could not find a single instance in which he attributed any part of these accomplishments to external or mitigating factors — for example, luck, being born into a wealthy family, having access to expert advice, or good employees. (I’d be very interested in seeing any examples readers can send my way!)

President Trump’s statements represent the same kind of logic and meaning-making my colleagues and I observed in the interview responses analyzed for the National Leaders’ series. President Trump’s logic in these statements has a simple if-then structure, and the most complex ideas he expresses are in the 10b to 10c range. As yet, I have seen no evidence of reasoning above this range.

The average score of a US adult is in the 10c–10d range.

 

Statistics for all: significance vs. significance

There’s a battle out there no one’s tweeting about. It involves a tension between statistical significance and practical significance. If you make decisions that involve evaluating evidence—in other words, if you are human—understanding the distinction between these two types of significance will significantly improve your decisions (both practically and statistically).

Statistical significance

Statistical significance (a.k.a. “p”) is a calculation made to determine how confident we can be that a relationship between two factors (variables) is real. The lower a p value, the more confident we can be. Most of the time, we want p to be less than .05.

Don’t be misled! A low p value tells us nothing about the size of a relationship between two variables. When someone says that statistical significance is high, all this means is that we can be more confident that the relationship is real.
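To make this distinction concrete, here is a small Python sketch (my own illustration, using a normal approximation for the p-value, which is reasonable at large sample sizes). With enough data, even a very weak relationship can be highly statistically significant:

```python
import math
import numpy as np

# Simulate a large sample with a deliberately weak relationship.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)  # true correlation is only about .10

# Pearson correlation, then a two-sided p-value using a normal
# approximation to the t distribution (fine at this sample size).
r = float(np.corrcoef(x, y)[0, 1])
t = r * math.sqrt((n - 2) / (1 - r * r))
p = math.erfc(abs(t) / math.sqrt(2))

print(f"r = {r:.2f}, r-square = {r * r:.3f}, p = {p:.2g}")
```

Here p is far below .05 (the relationship is "real"), yet r-square shows the relationship explains only about 1% of the variance: statistically significant, practically tiny.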

Replication

Once we know we can be confident that a relationship between two variables is real, we should check to see if the research has been replicated. That’s because we can’t be sure a statistically significant relationship found in a single study is really real. After we’ve determined that a relationship is statistically significant and replicable, it’s time to consider practical significance. Practical significance has to do with the size of the relationship.

Practical significance

To figure out how practically significant a relationship is, we need to know how big it is. The size of a relationship, or effect size, is evaluated independently of p. For a plain English discussion of effect size, check out this article, Statistics for all: prediction.

Importance

The greater the size of a relationship between two variables, the more likely the relationship is to be important — but that’s not enough. To have real importance, a relationship must also matter. And it is the decision-maker who decides what matters.

Examples

Let’s look at one of my favorite examples. The results of high-stakes tests like the SAT and GRE — college entrance exams made by ETS — have been shown to predict college success. Effect sizes tend to be small, but the effects are statistically significant — we can have confidence that they are real. And evidence for these effects has come from numerous studies, so we know they are really real.

If you’re the president of a college, there is little doubt that these test scores have practical significance. Improving prediction of student success, even a little, can have a big impact on the bottom line.

If you’re an employer, you’re more likely to care about how well a student did in college than how they did prior to college, so SAT and GRE scores are likely to be less important to you than college success.

If you’re a student, the size of the effect isn’t important at all. You don’t make the decision about whether or not a school uses the SAT or GRE to filter students; that’s out of your control. What’s important to you is how a given college is likely to benefit you.

If you’re me, the size of the effect isn’t very important either. My perspective is that of someone who wants to see major changes in the educational system. I don’t think we’re doing our students any favors by focusing on the kind of learning that can be measured by tests like the GRE and SAT. I think our entire educational system leans toward the wrong goal—transmitting more and more “correct” information. I think we need to ask if what students are learning in school is preparing them for life.

Another thing to consider when evaluating practical significance is whether or not a relationship between two variables tells us only part of a more complex story. For example, the relationship between ethnicity and the rate of developmental growth (what my colleagues and I specialize in measuring) is highly statistically significant (real) and fairly strong (moderate effect size). But, this relationship completely disappears once socioeconomic status (wealth) is taken into account. The first relationship is misleading (spurious). The real culprit is poverty. It’s a social problem, not an ethnic problem.

Summing up

Most discussions of practical significance stop with effect size. From a statistical perspective, this makes sense. Statistics can’t be used to determine which outcomes matter. People have to do that part, but statistics, when good ones are available, should come first. Here’s my recipe:

  1. Find out if the relationship is real (p < .05).
  2. Find out if it is really real (replication).
  3. Consider the effect size.
  4. Decide how much it matters.
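The first three steps of this recipe can be mechanized; the fourth cannot. The hypothetical helper below (my own sketch, not a Lectica tool) walks a finding through the checklist and reports where it stops, leaving the final judgment to the decision-maker:

```python
def evaluate_relationship(p, replicated, r):
    """Walk a finding through the first three steps of the recipe.

    Step 4 (deciding how much the effect matters) is left to the
    human decision-maker, so it is reported here, not decided.
    """
    if p >= 0.05:
        return "stop: not statistically significant (may not be real)"
    if not replicated:
        return "stop: significant but not replicated (may not be really real)"
    variance_explained = r * r
    return (f"effect size r = {r:.2f} explains "
            f"{variance_explained:.0%} of the variance; "
            "now decide whether that matters to you")

print(evaluate_relationship(p=0.20, replicated=False, r=0.50))
print(evaluate_relationship(p=0.01, replicated=True, r=0.30))
```

Note the order: confidence first, replication second, effect size last. A large effect from an unreplicated study still stops at step 2.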

My organization, Lectica, Inc., is a 501(c)3 nonprofit corporation. Part of our mission is to share what we learn with the world. One of the things we’ve learned is that many assessment buyers don’t seem to know enough about statistics to make the best choices. The Statistics for all series is designed to provide assessment buyers with the knowledge they need most to become better assessment shoppers.

 

Statistics for all: Prediction

Why you might want to reconsider using 360s and EQ assessments to predict recruitment success


Measurements are often used to make predictions. For example, they can help predict how tall a 4-year-old is likely to be in adulthood, which students are likely to do better in an academic program, or which candidates are most likely to succeed in a particular job.

Some of the attributes we measure are strong predictors; others are weaker. For example, a child’s height at age 4 is a pretty strong predictor of adult height. Parental height is a weaker predictor. The complexity of a person’s workplace decision making, on its own, is a moderate predictor of success in the workplace. But the relationship between the complexity of their workplace decision making and the complexity of their role is a strong predictor.

How do we determine the strength of a predictor? In statistics, the strength of predictions is represented by an effect size. Most effect size indicators are expressed as decimals and range from .00 to 1.00, with 1.00 representing 100% accuracy of prediction. The effect size indicator you’ll see most often is r-square. If you’ve ever been forced to take a statistics course ;) you may remember that r represents the strength of a correlation. Before I explain r-square, let’s look at some correlation data.

The four figures below represent four different correlations, from weakest (.30) to strongest (.90). Let’s say the vertical axis (40–140) represents the level of success in college, and the horizontal axis (50–150) represents scores on one of four college entrance exams. The dots represent students. If you were trying to predict success in college, you would be wise to choose the college entrance exam that delivered an r of .90.

Why is an r of .90 preferable? Well, take a look at the next set of figures. I’ve drawn lines through the clouds of dots (students) to show regression lines. These lines represent the prediction we would make about how successful a student will be, given a particular score. It’s clear that in the case of the first figure (r =.30), this prediction is likely to be pretty inaccurate. Many students perform better or worse than predicted by the regression line. But as the correlations increase in size, prediction improves. In the case of the fourth figure (r =.90), the prediction is most accurate.
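What the figures show can be reproduced in a few lines of Python (my own simulation, not the original figure data): as the correlation grows, the spread of students around the regression line shrinks, so prediction improves.

```python
import numpy as np

# Simulate clouds of "students" at three correlation strengths.
rng = np.random.default_rng(42)
n = 5_000
x = rng.normal(size=n)

for r in (0.30, 0.60, 0.90):
    # Mix x with independent noise so corr(x, y) is approximately r.
    y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)
    slope, intercept = np.polyfit(x, y, 1)             # regression line
    residual_sd = np.std(y - (slope * x + intercept))  # cloud width
    print(f"r = {r:.2f}: spread around the regression line = {residual_sd:.2f}")
```

The residual spread falls from roughly .95 at r = .30 to roughly .44 at r = .90, which is exactly why the fourth figure's predictions are the most accurate.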

What does a .90 correlation mean in practical terms? That’s where r-square comes in. If we multiply .90 by .90 (calculate the square), we get an r-square of .81. Statisticians would say that the predictor (test score) explains 81% of the variance in college success. The 19% of the variance that’s not explained (1.00 – .81 = .19) represents the percent of the variance that is due to error (unexplained variance). The square root of .19 is the amount of error (.44).
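This arithmetic generalizes to any correlation. A quick check in Python (the .54 row anticipates the recruitment correlations discussed later in this article):

```python
import math

# Variance explained and error for a few correlation strengths.
for r in (0.30, 0.54, 0.90):
    r_square = r * r                 # variance explained by the predictor
    unexplained = 1.0 - r_square     # variance due to error
    error = math.sqrt(unexplained)   # amount of error
    print(f"r = {r:.2f}: explains {r_square:.0%}, error term = {error:.2f}")
```

Even the strongest of these, r = .90, leaves 19% of the variance unexplained; at r = .30 the unexplained share is 91%.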

Even when r = .90, error accounts for 19% of the variance.

Correlations of .90 are very rare in the social sciences—but even correlations this strong are associated with a significant amount of error. It’s important to keep error in mind when we use tests to make big decisions—like who gets hired or who gets to go to college. When we use tests to make decisions like these, the business or school is likely to benefit—slightly better prediction can result in much better returns. But there are always rejected individuals who would have performed well, and there are always accepted individuals who will perform badly.

For references, see: The complexity of national leaders’ thinking: How does it measure up?

Let’s get realistic. As I mentioned earlier, correlations of .90 are very rare. In recruitment contexts, the most predictive assessments (shown above) correlate with hire success in the range of .50–.54, explaining 25%–29% of the variance in hire success. That leaves a whopping 71%–75% of the variance unexplained, which is why the best hiring processes not only use the most predictive assessments, but also consider multiple predictive criteria.

On the other end of the spectrum, there are several common forms of assessment that explain less than 9% of the variance in recruitment success. Their correlations with recruitment success are lower than .30. Yet some of these, like 360s, reference checks, and EQ, are wildly popular. In the context of hiring, the size of the variance explained by error in these cases (more than 91%) means there is a very big risk of being unfair to a large percentage of candidates. (I’m pretty certain assessment buyers aren’t intentionally being unfair. They probably just don’t know about effect size.)

If you’ve read my earlier article about replication, you know that the power-posing research could not be replicated. You also might be interested to learn that the correlations reported in the original research were also lower than .30. If power-posing had turned out to be a proven predictor of presentation quality, the question I’d be asking myself is, “How much effort am I willing to put into power-posing when the variance explained is lower than 9%?”

If we were talking about something other than power-posing, like reducing even a small risk that my child would die of a contagious disease, I probably wouldn’t hesitate to make a big effort. But I’m not so sure about power-posing before a presentation. Practicing my presentation or getting feedback might be a better use of my time.

Summing up (for now)

A basic understanding of prediction is worth cultivating. And it’s pretty simple. You don’t even have to do any fancy calculations. Most importantly, it can save you time and tons of wasted effort by giving you a quick way to estimate the likelihood that an activity is worth doing (or product is worth having). Heck, it can even increase fairness. What’s not to like?



Statistics for all: Replication

Statistics for all: What the heck is confidence?

Statistics for all: Estimating confidence