From Piaget to Dawson: The evolution of adult developmental metrics

I've just added a new video about the evolution of adult developmental metrics to YouTube and LecticaLive. It traces the evolutionary history of Lectica's developmental model and metric.

If you are curious about the origins of our work, this video is a great place to start. If you'd like to see the reference list for this video, view it on LecticaLive.

 

 


Adaptive learning. Are we there yet?

Adaptive learning technologies are touted as an advance in education and a harbinger of what's to come. But although we at Lectica agree that adaptive learning has a great deal to offer, we have some concerns about its current limitations. In an earlier article, I raised the question of how well one of these platforms, Knewton, serves "robust learning"—the kind of learning that leads to deep understanding and usable knowledge. Here are some more general observations.

The great strength of adaptive learning technologies is that they allow students to learn at their own pace. That's big. It's quite enough to be excited about, even if it changes nothing else about how people learn. But in our excitement about this advance, the educational community is in danger of ignoring important shortcomings of these technologies.

First, adaptive learning technologies are built on adaptive testing technologies. Today, these testing technologies are focused on "correctness." Students are moved to the next level of difficulty based on their ability to get correct answers. This is what today's testing technologies measure best. However, although being able to produce or select correct answers is important, it is not an adequate indication of understanding. And without real understanding, knowledge is not usable and can't be built upon effectively over the long term.
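To make the mechanics concrete, here is a minimal sketch of the kind of correctness-driven difficulty rule adaptive testing platforms rely on. It is a generic illustration only, not any particular vendor's algorithm; the `Item` structure, the 1–5 difficulty scale, and the three-in-a-row thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
import random

@dataclass
class Item:
    prompt: str
    difficulty: int  # 1 (easiest) to 5 (hardest); scale assumed for illustration

def next_difficulty(current: int, recent_results: list[bool]) -> int:
    """Generic staircase rule: step difficulty up after a run of correct
    answers, down after a run of incorrect ones. Thresholds are illustrative."""
    if len(recent_results) < 3:
        return current
    window = recent_results[-3:]
    if all(window):      # three correct in a row -> harder items
        return min(current + 1, 5)
    if not any(window):  # three wrong in a row -> easier items
        return max(current - 1, 1)
    return current

def pick_item(bank: list[Item], target_difficulty: int) -> Item:
    """Select any item at the target difficulty (random choice for simplicity)."""
    candidates = [item for item in bank if item.difficulty == target_difficulty]
    return random.choice(candidates)
```

Notice that nothing in this loop asks what the learner understands; it only tracks whether recent answers were correct, which is exactly the limitation described above.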

Second, today's adaptive learning technologies are focused on a narrow range of content—the kind of content psychometricians know how to build tests for—mostly math and science (with an awkward nod to literacy). In public education during the last 20 years, we've experienced a gradual narrowing of the curriculum, largely because of high-stakes testing and its narrow focus. Today's adaptive learning technologies suffer from the same limitations and are likely to reinforce this trend.

Third, the success of adaptive learning technologies is measured with standardized tests of correctness. Higher scores will help more students get into college—after all, colleges use these tests to decide who will be admitted. But we have no idea how well higher scores on these tests translate into life success. Efforts to demonstrate the relevance of educational practices are few and far between. And notably, there are many examples of highly successful individuals who were poor players in the education game—including several of the world's most productive and influential people.

Fourth, some proponents of online adaptive learning believe that it can and should replace (or marginalize) teachers and classrooms. This is concerning. Education is more than a process of accumulating facts. For one thing, it plays an enormous role in socialization. Good teachers and classrooms offer students opportunities to build knowledge while learning how to engage and work with diverse others. Great teachers catalyze optimal learning and engagement by leveraging students' interests, knowledge, skills, and dispositions. They also encourage students to put what they're learning to work in everyday life—both on their own and in collaboration with others.

Lectica has a strong interest in adaptive learning and the technologies that deliver it. We anticipate that over the next few years, our assessment technology will be integrated into adaptive learning platforms to help expand their subject matter and ensure that students are building robust, usable knowledge. We will also be working hard to ensure that these platforms are part of a well-thought-out, evidence-based approach to education—one that fosters the development of tomorrow's skills: the full range of skills and knowledge required for success in a complex and rapidly changing world.


Introducing LecticaFirst: Front-line to mid-level recruitment assessment—on demand


The world's best recruitment assessments—unlimited, auto-scored, affordable, relevant, and easy

Lectical Assessments have been used to support senior and executive recruitment for over 10 years, but the expense of human scoring has prohibited their use at scale. I'm DELIGHTED to report that this is no longer the case. Because of CLAS—our electronic developmental scoring system—this fall we plan to deliver customized assessments of workplace reasoning with real-time scoring. We're calling this service LecticaFirst.

LecticaFirst is a subscription service.* It allows you to administer as many LecticaFirst assessments as you'd like, any time you'd like. It's priced to make it possible for your organization to pre-screen every candidate (up through mid-level management) before you look at a single resume or call a single reference. And we've built in several upgrade options, so you can easily obtain additional information about the candidates that capture your interest.

Learn more about LecticaFirst subscriptions


The current state of recruitment assessment

"Use of hiring methods with increased predictive validity leads to substantial increases in employee performance as measured in percentage increases in output, increased monetary value of output, and increased learning of job-related skills" (Hunter, Schmidt, & Judiesch, 1990).

Most conventional workplace assessments focus on one of two broad constructs—aptitude or personality. These assessments examine factors like literacy, numeracy, role-specific competencies, leadership traits, and cultural fit, and are generally delivered through interviews, multiple choice tests, or Likert-style surveys. Emotional intelligence is also sometimes measured, but thus far it has not produced results that can compete with aptitude tests (Zeidner, Matthews, & Roberts, 2004).

Like Lectical Assessments, aptitude tests are tests of mental ability (or mental skill). High-quality tests of mental ability have the highest predictive validity for recruitment purposes, hands down. Hunter and Hunter (1984), in their systematic review of the literature, found an effective range of predictive validity for aptitude tests of .45 to .54. Translated, this means that about 20% to 29% of success on the job was predicted by mental ability. These numbers do not appear to have changed appreciably since Hunter and Hunter's 1984 review.

Personality tests come in a distant second. In their meta-analysis of the literature, Tett, Jackson, and Rothstein (1991) reported an overall relation between personality and job performance of .24 (with conscientiousness as the best predictor by a wide margin). Translated, this means that only about 6% of job performance is predicted by personality traits. These numbers do not appear to have been challenged in more recent research (Johnson, 2001).
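The "translated" percentages in the last two paragraphs are simply squared correlations: a predictor's variance explained is the square of its validity coefficient. A quick sketch of the arithmetic, reproducing the figures quoted above:

```python
def variance_explained(r: float) -> float:
    """Proportion of variance in job performance accounted for by a predictor
    with validity coefficient r (the squared correlation)."""
    return r ** 2

# Aptitude (Hunter & Hunter, 1984): validities of .45 to .54
print(variance_explained(0.45))  # ~0.20, i.e., about 20% of variance
print(variance_explained(0.54))  # ~0.29, i.e., about 29% of variance

# Personality (Tett, Jackson, & Rothstein, 1991): overall validity of .24
print(variance_explained(0.24))  # ~0.06, i.e., about 6% of variance
```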

Predictive validity of various types of assessments used in recruitment

The following table shows average predictive validities for various forms of assessment used in recruitment contexts. The column "variance explained" is an indicator of how much of a role a particular form of assessment plays in predicting performance—its predictive power. When deciding which assessments to use in recruitment, the goal is to achieve the greatest possible predictive power with the fewest assessments. That's why I've included the last column, "variance explained (with GMA)." It shows what happens to the variance explained when an assessment of General Mental Ability is combined with the form of assessment in a given row. The best combinations shown here are GMA and work sample tests, GMA and integrity, and GMA and conscientiousness.

Form of assessment | Source | Predictive validity | Variance explained | Variance explained (with GMA)
Complexity of workplace reasoning | Dawson & Stein (2004); Stein, Dawson, Van Rossum, Hill, & Rothaizer (2003) | .53 | 28% | n/a
Aptitude (General Mental Ability, GMA) | Hunter (1980); Schmidt & Hunter (1998) | .51 | 26% | n/a
Work sample tests | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .54 | 29% | 40%
Integrity | Ones, Viswesvaran, & Schmidt (1993); Schmidt & Hunter (1998) | .41 | 17% | 42%
Conscientiousness | Barrick & Mount (1995); Schmidt & Hunter (1998) | .31 | 10% | 36%
Employment interviews (structured) | McDaniel, Whetzel, Schmidt, & Maurer (1994); Schmidt & Hunter (1998) | .51 | 26% | 39%
Employment interviews (unstructured) | McDaniel, Whetzel, Schmidt, & Maurer (1994); Schmidt & Hunter (1998) | .38 | 14% | 30%
Job knowledge tests | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .48 | 23% | 33%
Job tryout procedure | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .44 | 19% | 33%
Peer ratings | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .49 | 24% | 33%
Training & experience: behavioral consistency method | McDaniel, Schmidt, & Hunter (1988a, 1988b); Schmidt & Hunter (1998); Schmidt, Ones, & Hunter (1992) | .45 | 20% | 33%
Reference checks | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .26 | 7% | 32%
Job experience (years) | Hunter (1980); McDaniel, Schmidt, & Hunter (1988b); Schmidt & Hunter (1998) | .18 | 3% | 29%
Biographical data measures (Supervisory Profile Record Biodata Scale) | Rothstein, Schmidt, Erwin, Owens, & Sparks (1990); Schmidt & Hunter (1998) | .35 | 12% | 27%
Assessment centers | Gaugler, Rosenthal, Thornton, & Bentson (1987); Schmidt & Hunter (1998); Becker, Höft, Holzenkamp, & Spinath (2011) | .37 | 14% | 28%
EQ | Zeidner, Matthews, & Roberts (2004) | .24 | 6% | n/a
360 assessments | Beehr, Ivanitskaya, Hansen, Erofeev, & Gudanowski (2001) | .24 | 6% | n/a
Training & experience: point method | McDaniel, Schmidt, & Hunter (1988a); Schmidt & Hunter (1998) | .11 | 1% | 27%
Years of education | Hunter & Hunter (1984); Schmidt & Hunter (1998) | .10 | 1% | 27%
Interests | Schmidt & Hunter (1998) | .10 | 1% | 27%

Note: Arthur, Day, McNelly, & Edens (2003) found a predictive validity of .45 for assessment centers that included mental skills assessments.

The figure below shows the predictive power information from this table in graphical form. Assessments are color coded to indicate which are focused on mental (cognitive) skills, behavior (past or present), or personality traits. It is clear that tests of mental skills stand out as the best predictors.

Predictive power graph

Why use Lectical Assessments for recruitment?

Lectical Assessments are "next generation" assessments, made possible through a novel synthesis of developmental theory, primary research, and technology. Until now, multiple-choice aptitude tests have been the most affordable option for employers. But despite being more predictive than personality tests, aptitude tests still suffer from important limitations. Lectical Assessments address these limitations. For details, take a look at the side-by-side comparison of LecticaFirst tests with conventional tests, below.

Dimension | LecticaFirst | Aptitude
Accuracy | Level of reliability (.95–.97) makes them accurate enough for high-stakes decision-making. (Interpreting reliability statistics) | Varies greatly. The best aptitude tests have levels of reliability in the .95 range. Many recruitment tests have much lower levels.
Time investment | Lectical Assessments are not timed. They usually take 45–60 minutes, depending on the individual test-taker. | Varies greatly. For acceptable accuracy, tests must have many items and may take hours to administer.
Objectivity | Scores are objective. (Computer scoring is blind to differences in sex, body weight, ethnicity, etc.) | Scores on multiple choice tests are objective. Scores on interview-based tests are subject to several sources of bias.
Expense | Highly competitive subscription (from $6–$10 per existing employee annually). | Varies greatly.
Fit to role: complexity | Lectica employs sophisticated developmental tools and technologies to efficiently determine the relation between role requirements and the level of reasoning skill required to meet those requirements. | Lectica's approach is not directly comparable to other available approaches.
Fit to role: relevance | Lectical Assessments are readily customized to fit particular jobs, and are direct measures of what's most important—whether or not candidates' actual workplace reasoning skills are a good fit for a particular job. | Aptitude tests measure people's ability to select correct answers to abstract problems. It is hoped that these answers will predict how good a candidate's workplace reasoning skills are likely to be.
Predictive validity | In research so far: predicts advancement (R = .53**, R² = .28), National Leadership Study. | The aptitude (IQ) tests used in published research predict performance (R = .45 to .54, R² = .20 to .29).
Cheating | The written response format makes cheating virtually impossible when assessments are taken under observation, and very difficult when taken without observation. | Cheating is relatively easy and rates can be quite high.
Formative value | High. LecticaFirst assessments can be upgraded after hiring, then used to inform employee development plans. | None. Aptitude is a fixed attribute, so there is no room for growth.
Continuous improvement | Our assessments are developed with a 21st century learning technology that allows us to continuously improve the predictive validity of LecticaFirst assessments. | Conventional aptitude tests are built with a 20th century technology that does not easily lend itself to continuous improvement.

* CLAS is not yet fully calibrated for scores above 11.5 on our scale. Scores at this level are more often seen in upper- and senior-level managers and executives. For this reason, we do not recommend using LecticaFirst for recruitment above mid-level management.

**The US Department of Labor's highest category of validity, labeled "Very Beneficial," requires regression coefficients of .35 or higher (R > .34).


References

Arthur, W., Day, E. A., McNelly, T. A., & Edens, P. S. (2003). A meta‐analysis of the criterion‐related validity of assessment center dimensions. Personnel Psychology, 56(1), 125-153.

Becker, N., Höft, S., Holzenkamp, M., & Spinath, F. M. (2011). The Predictive Validity of Assessment Centers in German-Speaking Regions. Journal of Personnel Psychology, 10(2), 61-69.

Beehr, T. A., Ivanitskaya, L., Hansen, C. P., Erofeev, D., & Gudanowski, D. M. (2001). Evaluation of 360 degree feedback ratings: relationships with each other and with performance and selection predictors. Journal of Organizational Behavior, 22(7), 775-788.

Dawson, T. L., & Stein, Z. (2004). National Leadership Study results. Prepared for the U.S. Intelligence Community.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72(3), 493-511.

Hunter, J. E., & Hunter, R. F. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.

Johnson, J. (2001). Toward a better understanding of the relationship between personality and individual job performance. In M. R. Barrick (Ed.), Personality and work: Reconsidering the role of personality in organizations (pp. 83-120).

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988a). A meta-analysis of the validity of training and experience ratings in personnel selection. Personnel Psychology, 41(2), 283-309.

McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). Validity of employment interviews. Journal of Applied Psychology, 79, 599-616.

Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.

Stein, Z., Dawson, T., Van Rossum, Z., Hill, S., & Rothaizer, S. (2013, July). Virtuous cycles of learning: using formative, embedded, and diagnostic developmental assessments in a large-scale leadership program. Proceedings from ITC, Berkeley, CA.

Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.

Zeidner, M., Matthews, G., & Roberts, R. D. (2004). Emotional intelligence in the workplace: A critical review. Applied Psychology: An International Review, 53(3), 371-399.


Support from neuroscience for robust, embodied learning

Fluid intelligence and the connectome ("Human connector," by jgmarcelino from Newcastle upon Tyne, UK, via Wikimedia Commons)

For many years, we've been arguing that learning is best viewed as a process of creating networks of connections. We've defined robust learning as a process of building knowledge networks that are so well connected they allow us to put knowledge to work in a wide range of contexts. And we've described embodied learning as a way of learning that involves the whole person and is much more than the memorization of facts, terms, definitions, rules, or procedures.

New evidence from the neurosciences provides support for this way of thinking about learning. According to research recently published in Nature, people with more connected brains—specifically those with more connections across different parts of the brain—demonstrate greater intelligence, including better problem-solving skills, than those with less connected brains. And this is only one of several research projects reporting similar findings.

Lectica exists because we believe that if we really want to support robust, embodied learning, we need to measure it. Our assessments are the only standardized assessments that have been deliberately developed to measure and support this kind of learning. 


Correctness, argumentation, and Lectical Level

How correctness, argumentation, and Lectical Level work together diagnostically

In a fully developed Lectical Assessment, we include separate measures of aspects of arguments such as mechanics (spelling, punctuation, and capitalization), coherence (logic and relevance), and persuasiveness (use of evidence, argument, and psychology to persuade). (We do not evaluate correctness, primarily because most existing assessments already concern themselves with it.) When educators use Lectical Assessments, they use information about Lectical Level, mechanics, coherence, persuasiveness, and sometimes correctness to diagnose students' learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations

Case   | Lectical Level | Mechanics | Coherence | Persuasiveness | Correctness
Case 1 | high           | high      | low       | average        | high
Case 2 | high           | high      | high      | low            | low
Case 3 | low            | average   | low       | low            | high

Case 1

This student has relatively high Lectical Level, mechanics, and correctness scores, but their coherence score is low and their persuasiveness score is only average. Because lower coherence and persuasiveness scores suggest that a student has not yet fully integrated their new knowledge, this student is likely to benefit most from activities that require them to apply their existing knowledge in relevant contexts (using VCoL).

Case 2

This student's Lectical Level, mechanics, and coherence scores are high relative to expectations. Their knowledge appears to be well integrated, but the combination of low persuasiveness and low correctness suggests that there are gaps in their content knowledge relative to the targeted content. Here, we would suggest filling in the missing content knowledge in a way that integrates it into this student's well-developed knowledge network.

Case 3

This student's correctness score is high, their mechanics score is average, and their Lectical Level, coherence, and persuasiveness scores are low. This pattern suggests that the student has been memorizing content without integrating it effectively into their knowledge network, and has been doing so for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence, persuasiveness, and Lectical Level scores catch up with their correctness scores.
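For readers who like to see the diagnostic logic spelled out, here is a minimal sketch of how score patterns like the three cases above might be mapped to broad recommendations. The rules, thresholds, and function names are illustrative assumptions based on the cases described in this post, not Lectica's actual scoring or reporting logic.

```python
def recommend(scores: dict[str, str]) -> str:
    """Map a pattern of low/average/high scores to a broad learning recommendation.
    Keys: 'lectical', 'mechanics', 'coherence', 'persuasiveness', 'correctness'.
    The heuristics below paraphrase the three cases discussed above (assumed rules)."""
    integration = (scores["coherence"], scores["persuasiveness"])

    # Case 1 pattern: strong Lectical Level and correctness, weaker integration.
    if scores["lectical"] == "high" and scores["correctness"] == "high" and "low" in integration:
        return "Apply existing knowledge in relevant contexts (using VCoL) to consolidate it."

    # Case 2 pattern: well-integrated knowledge, but gaps in targeted content.
    if scores["coherence"] == "high" and scores["correctness"] == "low":
        return "Fill in missing content in ways that connect it to the existing knowledge network."

    # Case 3 pattern: high correctness alongside a low Lectical Level and weak integration.
    if scores["correctness"] == "high" and scores["lectical"] == "low":
        return "Pause new content; apply current knowledge in personally relevant contexts (using VCoL)."

    return "No single pattern detected; review the individual scores."

# Case 3 from the table above:
print(recommend({"lectical": "low", "mechanics": "average", "coherence": "low",
                 "persuasiveness": "low", "correctness": "high"}))
```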


What PISA measures. What we measure.

Like the items in Lectical Assessments, PISA items involve real-world problems. PISA developers also claim, as we do here at Lectica, that their items measure how knowledge is applied. So, why do we persist in claiming that Lectical Assessments and assessments like PISA measure different things?

Part of the answer lies in questions about what's actually being measured, and in the meaning of terms like "real world problems" and "how knowledge is applied." I'll illustrate with an example from Take the test: Sample questions from OECD's PISA assessments.

One of the reading comprehension items in "Take the test" involves a short story about a woman who is trapped in her home during a flood. Early in the story, a hungry panther arrives on her porch. The woman has a gun, which she keeps at her side as long as the panther is present. At first, it seems that she will kill the panther, but in the end, she offers it a ham hock instead. 

What is being measured?

There are three sources of difficulty in the story. First, its Lectical phase is 10c—the third phase of four in level 10. Second, the story is challenging to interpret because it's written to be a bit ambiguous; I had to read it twice in order to appreciate the subtlety of the author's message. Third, it is set on the water in a rural setting, so there's lots of language that would be new to many students. How well a student comprehends this story hinges on their level of understanding—where they are currently performing on the Lectical Scale—and how much they know about living on the water in a rural setting. Assuming they understand the content of the story, it also depends on how good they are at decoding its somewhat ambiguous message.

The first question that comes up for me is whether or not this is a good story selection for the average 15-year-old. The average phase of performance for most 15-year-olds is 10a. That's their productive level. When we prescribe learning recommendations to students performing in 10a, we choose texts that are about 1 phase higher than their current productive level. We refer to this as the "Goldilocks zone", because we've found it to be the range in which material is just difficult enough to be challenging, but not so difficult that the risk of failure is too high. Some failure is good. Constant failure is bad.

But this PISA story is intended to test comprehension; it's not a learning recommendation or resource. Here, its difficulty level raises a different issue. In this context, the question that arises for me is, "What is reading comprehension, when the text students are asked to decode presents different challenges to students living in different environments and performing in different Lectical Levels?" Clearly, this story does not present the same challenge to students performing in phase 10a as it presents to students performing in 10c. Students performing in 10a or lower are struggling to understand the basic content of the story. Students performing in 10c are grappling with the subtlety of the message. And if the student lives in a city and knows nothing about living on the water, even a student performing at 10c is disadvantaged.

Real world problems

Now, let's consider what it means to present a real-world problem. When we at Lectica use this term, we usually mean that the problem is ill-structured (like the world), without a "correct" answer. (We don't even talk about correctness.) The challenges we present to learners reveal the current level of their understandings—there is always room for growth. One of our interns refers to development as a process of learning to make "better and better mistakes". This is a VERY different mindset from the "right or wrong" mindset nurtured by conventional standardized tests.

What do PISA developers mean by "real world problem"? They clearly don't mean without a "correct" answer. Their scoring rubrics show correct, partial (sometimes), and incorrect answers. And it doesn't get any more subtle than that. I think what they mean by "real world" is that their problems are contextualized; they are simply set in the real world. But this is not a fundamental change in the way PISA developers think about learning. Theirs is still a model that is primarily about the ability to get right answers.

How knowledge is applied

Let's go back to the story about the woman and the panther. After they read the story, test-takers are asked to respond to a series of multiple choice and written response questions. In one written response question they are asked, "What does the story suggest was the woman’s reason for feeding the panther?"

The scoring rubric presents a selection of potential correct answers and a set of wrong answers. (No partially correct answers here.) It's pretty clear that when PISA developers ask “how well” students' knowledge is applied, they're talking about whether or not students can provide a correct answer. That's not surprising, given what we've observed so far. What's new and troubling here is that all "correct" answers are treated as though they are equivalent. Take a look at the list of choices. Do they look equally sophisticated to you?

  • She felt sorry for it.
  • Because she knew what it felt like to be hungry.
  • Because she’s a compassionate person.
  • To help it live. (p. 77)

“She felt sorry for it.” is considered to be just as correct as “She is a compassionate person.” But we know the ideas expressed in these two statements are not equivalent. The idea of feeling sorry for can be expressed by children as early as phase 08b (6- to 7-year-olds). The idea of compassion (as sympathy) does not appear until level 10b. And the idea of being a compassionate person does not appear until 10c—even when the concept of compassion is being explicitly taught. Given that this is a test of comprehension—defined by PISA's developers in terms of understanding and interpretation—doesn't the student who writes, "She is a compassionate person," deserve credit for arriving at a more sophisticated interpretation?

I'm not claiming that students can't learn the word compassion earlier than level 10b. And I'm certainly not claiming that there is enough evidence in students' responses to the prompt in this assessment to determine if an individual who wrote "She felt sorry for it." meant something different from an individual who wrote, "She's a compassionate person." What I am arguing is that what students mean is more important than whether or not they get a right answer. A student who has constructed the notion of compassion as sympathy is expressing a more sophisticated understanding of the story than a student who can't go further than saying the protagonist felt sorry for the panther. When we, at Lectica, talk about how well knowledge is applied, we mean, “At what level does this child appear to understand the concepts she’s working with and how they relate to one another?” 

What is reading comprehension?

All of these observations lead me back to the question, "What is reading comprehension?" PISA developers define reading comprehension in terms of understanding and interpretation, and Lectical Assessments measure the sophistication of students' understanding and interpretation. It looks like our definitions are at least very similar.

We think the problem is not in the definition, but in the operationalization. PISA's items measure proxies for comprehension, not comprehension itself. Getting beyond proxies requires three ingredients.

  • First, we have to ask students to show us how they're thinking. This means asking for verbal responses that include both judgments and justifications for those judgments. 
  • Second, the questions we ask need to be more open-ended. Life is rarely about finding right answers. It's about finding increasingly adequate answers. We need to prepare students for that reality. 
  • Third, we need to engage in the careful, painstaking study of how students construct meanings over time.

This third requirement is such an ambitious undertaking that many scholars don't believe it's possible. But we've not only demonstrated that it's possible, we're doing it every day. We call the product of this work the Lectical™ Dictionary. It's the first curated developmental taxonomy of meanings. You can think of it as a developmental dictionary. Aside from making it possible to create direct tests of student understanding, the Lectical Dictionary makes it easy to describe how ideas evolve over time. We can not only tell people what their scores mean, but also what they're most likely to benefit from learning next. If you're wondering what that means in practice, check out our demo.
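To give a sense of what a "developmental dictionary" might look like as a data structure, here is a hypothetical sketch. The phase estimates reuse the ones cited earlier in this post ("feeling sorry for" at 08b, compassion as sympathy at 10b, being a compassionate person at 10c); the `Entry` structure and the lookup function are illustrative assumptions, not the actual Lectical Dictionary.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    meaning: str  # a curated meaning, not just a word
    phase: str    # earliest phase at which the meaning typically appears

# Hypothetical entries, using phase estimates cited in the post above.
DICTIONARY = [
    Entry("feeling sorry for someone", "08b"),
    Entry("compassion (as sympathy)", "10b"),
    Entry("being a compassionate person", "10c"),
]

def phase_of(meaning: str) -> Optional[str]:
    """Look up the earliest phase at which a given meaning typically appears."""
    for entry in DICTIONARY:
        if entry.meaning == meaning:
            return entry.phase
    return None

print(phase_of("compassion (as sympathy)"))  # 10b
```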


Straw men and flawed metrics

Ten years ago, Kirschner, Sweller, and Clark published an article entitled "Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching."

In this article, Kirschner and his colleagues contrast outcomes for what they call "guidance instruction" (lecture and demonstration) with those from constructivism-based instruction. They conclude that constructivist approaches produce inferior outcomes.

The article suffers from at least three serious flaws

First, the authors, in making their distinction between guided instruction and constructivist approaches, have created a caricature of constructivist approaches. Very few experienced practitioners of constructivist, discovery, problem-based, experiential, or inquiry-based teaching would characterize their approach as minimally guided. "Differently guided" would be a more appropriate term. Moreover, most educators who use constructivist approaches include lecture and demonstration where these are appropriate.

Second, the research reviewed by the authors was fundamentally flawed. For the most part, the metrics employed to evaluate different styles of instruction were not reasonable measures of the kind of learning constructivist instruction aims to support—deep understanding (the ability to apply knowledge effectively in real-world contexts). They were measures of memory or attitude. Back in 2010, Stein, Fischer, and I argued that metrics can't produce valid results if they don't actually measure what we care about (Redesigning testing: Operationalizing the new science of learning). Why isn't this a no-brainer?

And finally, the longitudinal studies Kirschner and his colleagues reviewed had short time-spans. None of them examined the long-term impacts of different forms of instruction on deep understanding or long-term development. This is a big problem for learning research—one that is often acknowledged, but rarely addressed.

Since Kirschner's article was published in 2006, we've had an opportunity to examine the difference between schools that provide different kinds of instruction, using assessments that measure the depth and coherence of students' understanding. We've documented a 3- to 5-year advantage, by grade 12, for students who attend schools that emphasize constructivist methods over those that use more "guidance instruction."

To learn more, see:

Are our children learning robustly?

Lectica rationale

 


Adaptive learning, big data, and the meaning of learning

Knewton defines adaptive learning as "A teaching method premised on the idea that the curriculum should adapt to each user." In a recent blog post, Knewton's COO, David Liu, expanded on this definition. Here are some extracts:

You have to understand and have real data on content… Is the instructional content teaching what it was intended to teach? Is the assessment accurate in terms of what it’s supposed to assess? Can you calibrate that content at scale so you’re putting the right thing in front of a student, once you understand the state of that student? 

On the other side of the equation, you really have to understand student proficiency… understanding and being able to predict how that student is going to perform, based upon what they’ve done and based upon that content that I talked about before. And if you understand how well the student is performing against that piece of content, then you can actually begin to understand what that student needs to be able to move forward.

The idea of putting the right thing in front of a student is very cool. That's part of what we do here at Lectica. But what does Knewton mean by learning?

Curiosity got the better of me, so I set out to do some investigating. 

What does Knewton mean by learning?

In Knewton's white paper on adaptive learning, the authors do a great job of describing how their technology works.

To provide continuously adaptive learning, Knewton analyzes learning materials based on thousands of data points — including concepts, structure, difficulty level, and media format — and uses sophisticated algorithms to piece together the perfect bundle of content for each student, constantly. The system refines recommendations through network effects that harness the power of all the data collected for all students to optimize learning for each individual student.

They go on to discuss several impressive technological innovations. I have to admit, the technology is cool, but what is their learning model and how is Knewton's technology being used to improve learning and teaching?

Unfortunately, Knewton does not seem to operate with a clearly articulated learning model in mind. In any case, I couldn't find one. But based on the sample items and feedback examples shown in their white paper and on their site, what Knewton means by learning is the ability to consistently get right answers on tests and quizzes, and the way to learn (get more answers right) is to get more practice on the kinds of items students are not yet consistently getting right.

In fact, Knewton appears to be a high tech application of the content-focused learning model that's dominated public education since No Child Left Behind—another example of what it looks like when we throw technology at a problem without engaging in a deep enough analysis of that problem.

We're in the middle of an education crisis, but it's not because children aren't getting enough answers right on tests and quizzes. It's because our efforts to improve education consistently fail to ask the most important questions, "Why do we educate our children?" and "What are the outcomes that would be genuine evidence of success?"

Don't get me wrong. We love technology, and we leverage it shamelessly. But we don't believe technology is the answer. The answer lies in a deep understanding of how learning works and what we need to do to support the kind of learning that produces outcomes we really care about. 

 


A new kind of report card

When I was a kid, the main way school performance was measured was with letter grades. We got letter grades on almost all of our work. Getting an A meant you knew it all, a B meant you didn't quite know it all, a C meant you knew enough to pass, a D meant you knew so little you were on the verge of failing, and an F meant you failed. If you always got As, you were one of the really smart kids, and if you always got Ds and Fs, you were one of the dumb kids. Unfortunately, that's how we thought about it, plain and simple.

If I got a B, my teacher and parents told me I could do better and that I should work harder. If I got a C, I was in deep trouble, and was put on restriction until I brought my grade up. This meant more hours of homework. I suspect this was a common experience. It was certainly what happened on Father Knows Best and The Brady Bunch.

The best teachers also commented on our work, telling us where we could improve our arguments or where and how we had erred, and suggesting actions we could take to improve. In terms of feedback, this was the gold standard. It was the only way we got any real guidance about what we, as individuals, needed to work on next. Letter grades represented rank, punishment, and reward, but they weren't very useful indicators of where we were in our growth as learners. Report cards were for parents. 

Usher in Lectica and DiscoTest

One of our goals here at Lectica has been to make possible a new kind of report card—one that:

  1. delivers scores that have rich meaning for students, parents, and decision-makers,
  2. provides the kind of personal feedback good teachers offer, and
  3. gives students an opportunity to watch themselves grow.

This new report card—illustrated on the right—uses a single learning "ruler" for all subjects, so student growth in different subjects can be shown on the same scale. In the example shown here, each assessment is represented by a round button that links to an explanation of the student's learning edge at the time the assessment was taken.

This new report card also enables direct comparisons between growth trajectories in different subject areas. 

An additional benefit of this new report card is that it delivers a rich portfolio-like account of student growth that can be employed to improve admissions and advancement decisions. 

And finally, we're very curious about the potential psychological benefits of allowing students to watch how they grow. We think it's going to be a powerful motivator.

 


Lectical (CLAS) scores are subject to change

feedback_loopWe incorporate feedback loops called virtuous cycles in everything we do. And I mean everything. Our governance structure is fundamentally iterative. (We're a Sociocracy.) Our project management approach is iterative. (We use Scrum.) We develop ideas iteratively. (We use Design Thinking.) We build our learning tools iteratively. (We use developmental maieutics.) And our learning model is iterative. (We use the virtuous cycle of learning.) One important reason for using all of these iterative processes is that we want every activity in our organization to reward learning. Conveniently, all of the virtuous cycles we iterate through do double duty as virtuous cycles of learning.

All of this virtuous cycling has an interesting (and unprecedented) side effect: the score you receive on one of our assessments is subject to change. Yes, because we learn from every single assessment taken in our system, what we learn could cause your score on any assessment you take here to change. Now, it's unlikely to change very much, probably not enough to affect the feedback you receive, but the fact that scores change from time to time can really shake people up. Some people might even think we've lost the plot!

But there is method in our madness. Allowing your score to fluctuate a bit as our knowledge base grows is our way of reminding everyone that there's uncertainty in any test score, and of reminding ourselves that there's always more to learn about how learning works.
