Statistics for all: significance vs. significance

There’s a battle out there no one’s tweeting about. It involves a tension between statistical significance and practical significance. If you make decisions that involve evaluating evidence—in other words, if you are human—understanding the distinction between these two types of significance will significantly improve your decisions (both practically and statistically).

Statistical significance

Statistical significance (a.k.a. “p”) is a calculation made to determine how confident we can be that a relationship between two factors (variables) is real. The lower the p value, the more confident we can be. Most of the time, we want p to be less than .05.

Don’t be misled! A low p value tells us nothing about the size of a relationship between two variables. When someone says that statistical significance is high, all this means is that we can be more confident that the relationship is real.


Once we know we can be confident that a relationship between two variables is real, we should check to see if the research has been replicated. That’s because we can’t be sure a statistically significant relationship found in a single study is really real. After we’ve determined that a relationship is statistically significant and replicable, it’s time to consider practical significance. Practical significance has to do with the size of the relationship.

Practical significance

To figure out how practically significant a relationship is, we need to know how big it is. The size of a relationship, or effect size, is evaluated independently of p. For a plain English discussion of effect size, check out this article, Statistics for all: prediction.
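To make the independence of p and effect size concrete, here's a minimal sketch in Python (the numbers are invented for illustration, not data from any study). It computes Cohen's d, one common effect-size measure, and a two-sided z-test p value for the same small mean difference at two sample sizes:

```python
import math

def cohens_d(mean1, mean2, sd):
    """Effect size: the difference between two group means, in SD units."""
    return (mean1 - mean2) / sd

def two_sample_z(mean1, mean2, sd, n):
    """Two-sided z-test p value for two equal-sized groups (n each) with a common SD."""
    se = sd * math.sqrt(2.0 / n)
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal tail area
    return z, p

# The same tiny effect (d = 0.05) at two sample sizes:
d = cohens_d(100.5, 100.0, 10.0)
_, p_small = two_sample_z(100.5, 100.0, 10.0, 100)      # p is large: not significant
_, p_large = two_sample_z(100.5, 100.0, 10.0, 100_000)  # p falls far below .05
```

With a big enough sample, even a negligible difference earns a low p value, which is exactly why effect size has to be judged separately from statistical significance.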


The greater the size of a relationship between two variables, the more likely the relationship is to be important — but that’s not enough. To have real importance, a relationship must also matter. And it is the decision-maker who decides what matters.


Let’s look at one of my favorite examples. The results of high stakes tests like the SAT and GRE — college and graduate school entrance exams made by ETS — have been shown to predict college success. Effect sizes tend to be small, but the effects are statistically significant — we can have confidence that they are real. And evidence for these effects has come from numerous studies, so we know they are really real.

If you’re the president of a college, there is little doubt that these test scores have practical significance. Improving prediction of student success, even a little, can have a big impact on the bottom line.

If you’re an employer, you’re more likely to care about how well a student did in college than how they did prior to college, so SAT and GRE scores are likely to be less important to you than college success.

If you’re a student, the size of the effect isn’t important at all. You don’t make the decision about whether or not the school is going to use the SAT or GRE to filter students. Whether or not these assessments are used is out of your control. What’s important to you is how a given college is likely to benefit you.

If you’re me, the size of the effect isn’t very important either. My perspective is that of someone who wants to see major changes in the educational system. I don’t think we’re doing our students any favors by focusing on the kind of learning that can be measured by tests like the GRE and SAT. I think our entire educational system leans toward the wrong goal—transmitting more and more “correct” information. I think we need to ask if what students are learning in school is preparing them for life.

Another thing to consider when evaluating practical significance is whether or not a relationship between two variables tells us only part of a more complex story. For example, the relationship between ethnicity and the rate of developmental growth (what my colleagues and I specialize in measuring) is highly statistically significant (real) and fairly strong (moderate effect size). But this relationship completely disappears once socioeconomic status (wealth) is taken into account. The first relationship is misleading (spurious). The real culprit is poverty. It’s a social problem, not an ethnic problem.
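A small simulation makes this pattern easy to see. The code below uses synthetic data (purely illustrative, not our assessment data): group membership and an outcome are both driven by a confounding variable, so they correlate with each other, but the partial correlation, which removes the confounder's influence, shrinks toward zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def partial_corr(xs, ys, zs):
    """Correlation of x and y after partialling out z."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

random.seed(42)
ses = [random.gauss(0, 1) for _ in range(5000)]                    # the confounder
group = [1.0 if s + random.gauss(0, 1) > 0 else 0.0 for s in ses]  # correlated with SES
growth = [s + random.gauss(0, 1) for s in ses]                     # driven only by SES

raw = pearson(group, growth)                 # a substantial "spurious" correlation
adjusted = partial_corr(group, growth, ses)  # near zero once SES is controlled
```

The raw correlation looks like a real group effect; the adjusted one shows there is nothing left once the confounder is accounted for.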

Summing up

Most discussions of practical significance stop with effect size. From a statistical perspective, this makes sense. Statistics can’t be used to determine which outcomes matter. People have to do that part, but statistics, when good ones are available, should come first. Here’s my recipe:

  1. Find out if the relationship is real (p < .05).
  2. Find out if it is really real (replication).
  3. Consider the effect size.
  4. Decide how much it matters.
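For readers who like to see logic as code, here's the recipe as a sketch of a decision routine (the replication count and minimum-effect threshold are illustrative placeholders; step 4 remains a human judgment about what matters):

```python
def evaluate_relationship(p, replications, effect_size, min_effect_that_matters):
    """Walk the four-step recipe; all thresholds here are illustrative."""
    if p >= 0.05:
        return "not established"                # step 1: is it real?
    if replications < 2:
        return "needs replication"              # step 2: is it really real?
    if abs(effect_size) < min_effect_that_matters:
        return "real, but too small to matter"  # steps 3 and 4
    return "worth acting on"

evaluate_relationship(p=0.01, replications=3, effect_size=0.4,
                      min_effect_that_matters=0.2)  # "worth acting on"
```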

My organization, Lectica, Inc., is a 501(c)3 nonprofit corporation. Part of our mission is to share what we learn with the world. One of the things we’ve learned is that many assessment buyers don’t seem to know enough about statistics to make the best choices. The Statistics for all series is designed to provide assessment buyers with the knowledge they need most to become better assessment shoppers.



Statistics for all: Replication

(Why you should have been suspicious of power-posing from the start!)

I’ve got a free, low-tech life hack for you that will save significant time and money — and maybe even improve your health. All you need to do is one little thing. Before you let the latest research results change your behavior, check to see if the research has been replicated!

One of the hallmarks of modern science is the notion that one study of a new phenomenon—especially a single small study—proves nothing. Most of the time, the results of such studies can do little more than suggest possibilities. To arrive at proof, results have to be replicated—again and again, usually in a variety of contexts. This is important, especially in the social sciences, where phenomena are difficult to measure and the results of many new studies cannot be replicated.
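One way to see, quantitatively, why replications add up to stronger evidence is Fisher's method for combining the p values of independent studies. The sketch below is illustrative only (the p values are invented), but it shows three individually marginal results combining into a much stronger result:

```python
import math

def fisher_combined_p(pvalues):
    """Fisher's method: combine independent p values into a single test.

    Uses the closed-form chi-square survival function for even degrees
    of freedom (2 per study), so no stats library is needed.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)  # ~ chi-square with 2k df
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Three marginal results combine into strong evidence:
combined = fisher_combined_p([0.04, 0.06, 0.05])  # well below .05
```

Note that the converse also holds: a single marginal study, on its own, is just a single marginal study.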

Researchers used to be trained to avoid even implying that findings from a new study were proven facts. But when Amy Cuddy set out to share the results of her and her colleagues’ power-posing research, she didn’t simply imply that her results could be generalized. She unabashedly announced to an enthralled TED Talk audience that she’d discovered a “Free, no-tech life hack…that could significantly change how your life unfolds.”

Thanks to this talk, many thousands—perhaps millions—of people-hours have been spent power-posing. But it’s not the power-posers whose lives have changed. Unfortunately, as it turns out, it’s Dr. Cuddy’s life that changed significantly—when other researchers were unable to replicate her results. In fact, because she had made such strong unwarranted claims, Dr. Cuddy became the focus of severe criticism.

Although she was singled out, Dr. Cuddy is far from alone. She’s got lots of company. Many fads have begun just like Power Posing did. Here’s how it goes: A single small study produces results that have “novelty appeal,” the Today Show picks up the story, and thousands jump on the bandwagon! Sometimes, as in the case of power-posing, the negative impact is no worse than a bit of wasted time. But in other cases, such as when our health or pocketbooks are at stake, the impacts can be much greater.

“But it worked for me!” If you tried power-posing and believe it was responsible for your success in achieving an important goal, you may be right. The scientific method isn’t perfect — especially in the social sciences — and future studies with better designs may support your belief. However, I recommend caution in relying on personal experience. Humans have powerful built-in mental biases that lead us to conclude that positive outcomes are caused by something we did to induce them. This makes it very difficult for us to distinguish between coincidence and cause. And it’s one reason we need the scientific method, which is designed to help us reduce the impact of these biases.

Replication matters in assessment development, too

Over the last couple of decades, I’ve looked at the reliability & validity evidence for many assessments. The best assessment developers set a pretty high replication standard, conducting several validity & reliability studies for each assessment they offer. But many assessment providers—especially those serving businesses—are much more lax. In fact, many can point to only a single study of reliability and validity. To make matters worse, in some cases, that study has not been peer reviewed.

Be wary of assessments that aren’t backed by several studies of reliability and validity.


The assessment triangle: correctness, coherence, & complexity

How to use the assessment triangle diagnostically

An ideal educational assessment strategy—represented above in the assessment triangle—includes three indicators of learning—correctness (content knowledge), complexity (developmental level of understanding), and coherence (quality of argumentation). Lectical Assessments focus primarily on two areas of the triangle—complexity and coherence. Complexity is measured with the Lectical Assessment System, and coherence is measured with a set of argumentation rubrics focused on mechanics, logic, and persuasiveness. We do not focus on correctness, primarily because most assessments already target correctness.

At the center of the assessment triangle is a hazy area. This represents the Goldilocks Zone—the range in which the difficulty of learning tasks is just right for a particular student. To diagnose the Goldilocks Zone, educators evaluate correctness, coherence, and complexity, plus a given learner’s level of interest and tolerance for failure.

When educators work with Lectical Assessments, they use the assessment triangle to diagnose students’ learning needs. Here are some examples:

Level of skill (low, average, high) relative to expectations:

          Complexity   Coherence   Correctness
Case 1    high         low         high
Case 2    high         high        low
Case 3    low          low         high
Case 4    high         high        high

Case 1

This student has relatively high complexity and correctness scores, but his performance is low in coherence. Because lower coherence scores suggest that he has not yet fully integrated his existing knowledge, he is likely to benefit most from participating in interesting activities that require applying existing knowledge in relevant contexts (using VCoL).

Case 2

This student’s scores are high relative to expectations. Her knowledge appears to be well integrated, but the low correctness suggests that there are gaps in her content knowledge relative to targeted content. Here, we would suggest filling in the missing content knowledge in a way that engages the learner and allows her to integrate it into her well-developed knowledge network.

Case 3

The scores received by this student are high for correctness, while they are low for complexity and coherence. This pattern suggests that the student is memorizing content without integrating it effectively into their knowledge network—and may have been doing this for some time. This student is most likely to benefit from applying their existing content knowledge in personally relevant contexts (using VCoL) until their coherence and complexity scores catch up with their correctness scores.

Case 4

The scores received by this student are high for correctness, complexity, and coherence. This pattern suggests that the student has a high level of proficiency. Here, we would suggest introducing new knowledge that’s just challenging enough to keep her in her personal Goldilocks zone.
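The four cases can be summarized as a simple lookup, sketched below in Python (the suggestion labels paraphrase the cases above, and the function is hypothetical; a real diagnosis would also weigh the learner's interest and tolerance for failure):

```python
# (complexity, coherence, correctness) relative to expectations -> suggestion
DIAGNOSES = {
    ("high", "low",  "high"): "apply existing knowledge in relevant contexts (VCoL)",        # Case 1
    ("high", "high", "low"):  "fill content gaps and integrate them into the network",       # Case 2
    ("low",  "low",  "high"): "apply memorized content until coherence/complexity catch up", # Case 3
    ("high", "high", "high"): "introduce new knowledge in the Goldilocks Zone",              # Case 4
}

def diagnose(complexity, coherence, correctness):
    """Return a learning suggestion for a known pattern, or flag an uncovered one."""
    return DIAGNOSES.get((complexity, coherence, correctness),
                         "pattern not covered by the four cases")
```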

Summing up

The assessment triangle helps educators optimize learning by ensuring that students are always learning in the Goldilocks Zone. This is a good thing, because students who spend more time in the Goldilocks Zone not only enjoy learning more, they learn better and faster.


Straw men and flawed metrics

Ten years ago, Kirschner, Sweller, & Clark published an article entitled, Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching.

In this article, Kirschner and his colleagues contrast outcomes for what they call "guidance instruction" (lecture and demonstration) with those from constructivism-based instruction. They conclude that constructivist approaches produce inferior outcomes.

The article suffers from at least three serious flaws

First, the authors, in making their distinction between guided instruction and constructivist approaches, have created a caricature of constructivist approaches. Very few experienced practitioners of constructivist, discovery, problem-based, experiential, or inquiry-based teaching would characterize their approach as minimally guided. "Differently guided" would be a more appropriate term. Moreover, most educators who use constructivist approaches include lecture and demonstration where these are appropriate.

Second, the research reviewed by the authors was fundamentally flawed. For the most part, the metrics employed to evaluate different styles of instruction were not reasonable measures of the kind of learning constructivist instruction aims to support—deep understanding (the ability to apply knowledge effectively in real-world contexts). They were measures of memory or attitude. Back in 2010, Stein, Fisher, and I argued that metrics can't produce valid results if they don't actually measure what we care about (Redesigning testing: Operationalizing the new science of learning. Why isn't this a no-brainer?).

And finally, the longitudinal studies Kirschner and his colleagues reviewed had short time-spans. None of them examined the long-term impacts of different forms of instruction on deep understanding or long-term development. This is a big problem for learning research—one that is often acknowledged, but rarely addressed.

Since Kirschner's article was published in 2006, we've had an opportunity to examine the difference between schools that provide different kinds of instruction, using assessments that measure the depth and coherence of students' understanding. We've documented a 3 to 5 year advantage, by grade 12, for students who attend schools that emphasize constructivist methods vs. those that use more "guidance instruction".

To learn more, see:

Are our children learning robustly?

Lectica rationale



Lectica basics for schools

If you are a school leader, this post is for you. Here, you'll find information about Lectica, its mission, and our first electronically scored Lectical Assessment—the LRJA.


Lectica, Inc. is a 501(c)(3) charitable corporation. Its mission is to build and deliver learning tools that help students build skills for thinking and learning. These learning tools are backed by a strong learning model—the Virtuous Cycle of Learning (VCoL+7™)—and a comprehensive vision for educational testing and learning, which you can learn more about in our white paper—Virtuous cycles of learning: Redesigning testing during the digital revolution.

We have spent over 20 years developing our methods and the technology required to deliver our learning tools—known as Lectical™ Assessments or DiscoTests®—at scale. These assessments are backed by a large body of research, including ongoing investigations of their validity and reliability. Here are some links to research reports:

The following video provides an overview of our research and mission:

Current offerings

In the fall of 2016, we introduced our first electronically scored Lectical Assessment—the LRJA (an assessment of reflective judgment/critical thinking). The LRJA can be used in research and program evaluation as a summative assessment, or in the classroom as a formative assessment—or both.

The best way to learn about the LRJA is to experience it first-hand at lecticalive. Just click on this link, then select the "go straight to the demo" button. On the next page, fill in the sign up form with the educational level of your choice. Click "submit", then click on the "autofill" button (top right, under the header) to fill the responses form with an example.

If you're interested in working with the LRJA or would like to learn more about using Lectical Assessments to optimize thinking and learning, please contact us.


Second language learning predicts the growth of critical thinking

On November 20th, 2016, we presented a paper at the ACTFL conference in Boston. In this paper, we described the results of a 4-year research project, designed to address the question, "Does second language learning support the development of critical thinking as measured by the LRJA?". To learn more, view the presentation below.



Are our children learning robustly?

There are at least four reasons why people should learn robustly:

  1. It's fun!
  2. They'll learn more quickly.
  3. They'll keep growing longer.
  4. They'll be better prepared to participate fully in adult life.

Truly, there are no downsides to learning robustly. Yet robust learning is not what's happening for most students in most American schools. We have mounting—and disturbing—evidence that this is the case. 

The data in the figure below are from our database of reflective judgment assessments. These are open-response formative assessments of how well people think about and address thorny real world problems like bullying, television violence, dietary practices, and global warming. We've been delivering these assessments for several years now and have a diverse sample of over 20,000 completed assessments to learn from. 

We wanted to know how well schools are supporting development and what kind of role learning robustly might play in their performance. (Watch the video above to learn more about what counts as evidence of robust learning.) In particular, we wanted to know why students in one school—the Rainbow Community School—are outperforming students in other schools. (To learn about the Rainbow curriculum, click here.) 

We first looked at one of the key sources of evidence for robust learning—the quality of students' arguments. In the figure below, the Y axis represents the quality or "coherence" of students' arguments and the X axis represents their Lectical phase (or developmental phase, 1/4 of a Lectical Level). The highest coherence score students can receive is a 10.

In this figure, the Rainbow Community School is the clear leader, especially when it comes to students performing in lower phases, with inner-city (primarily low socioeconomic status) public schools at the low end, and more conventional private schools and high socioeconomic status public schools in the middle. So, how does this relate to student development? Since we regard coherence of argumentation as strong evidence of robust learning, and assert that robust learning is required to support optimal development, we would expect Rainbow students to develop more rapidly than students in schools with lower coherence scores.

Coherence by phase and school type

The figure below tells the story. When it comes to students' development on the Lectical Scale, Rainbow Community School students are way ahead of the pack. And our inner city schools are way behind. In fact, the average senior in our large (over 10,000 assessments) inner city sample is 5 years behind the projected score for the average senior in the Rainbow sample. Or in other words, inner city seniors, on average, are performing at the same level as Rainbow 7th graders.   

We know socioeconomic status is a factor that contributes to this gap, but shouldn't our schools be closing it rather than allowing it to grow larger? Take a look at the figure below. This figure assumes that students in the Rainbow Community School, on average, start out at about the same developmental level as students in private and high SES public schools, yet student growth is faster. In fact, the data project that Rainbow 9th graders would perform as well as seniors in the other schools. That's a 3-year advantage! We believe this difference is due to differences in instructional practices. What if we used these same practices in our inner city schools? If we could accelerate their learning as much as the Rainbow Community School has accelerated the learning of its students, inner-city students would be doing as well as private and high SES public schools!

Although socioeconomic status is a key factor, we think the differences seen here are at least partially due to fundamentally different ways of thinking about learning and teaching. Conventional schools tend to be primarily content focused. There is an emphasis on learning as remembering. The Rainbow Community School is skill focused. Its teachers use content as a vehicle for building core life skills, such as skills for learning, inquiry, evaluating information, making connections, communicating, conflict resolution, decision making, mindfulness, compassion, and building relationships. To build these skills students continuously engage in virtuous cycles of learning—cycles of information gathering, application, reflection, and goal setting—that exercise these skills while building robust connections between new and existing knowledge. Students not only learn content, they learn to use it effectively in their everyday lives. It becomes part of them. We call this embodied learning.

We're eager to study the impact of skill-focused curricula on the learning of less advantaged students. If you know of a school that's fostering robust learning AND serving disadvantaged students, we'd like to help them show off what they're accomplishing.

Note: Not only does Rainbow Community School ensure that its students are continuously engaged in VCoLs (virtuous cycles of learning), it uses a system of governance, Sociocracy, that supports virtuous cycling for everyone on staff as well as the continuous improvement of its curriculum. 

Appendix: Sample responses from 8th graders in different schools

Examples are taken from performances of students with average scores for their school. 

The question students answered: How is it possible that the two groups [pro and anti bullying] have such different ideas?

Rainbow Community School

It could be due to different experiences. Perhaps the ones going for the argument that a little bullying can be okay were disciplined more at home and have a tougher shell for things like this. [Parents] may base their initial ideas on their own experiences or their children's. It all really depends on the person and how they were raised.

High SES public School

This because they have different ideas and reasons for thinking what they believe and you can't change that. The parents are not the same and every one of them is different so they have a right to believe what they want to believe.

Low SES public school

Many people think different and many people look at things differently. So people get different ideas and opinions about things.


What is a holistic assessment?

Thirty years ago, when I was a hippy midwife, the idea of holism began to slip into the counter-culture. A few years later, this much misunderstood notion was all the rage on college campuses. By the time I was in graduate school in the nineties there was an impassable division between the trendy postmodern holists and the rigidly old fashioned modernists. You may detect a slight mocking tone, and rightly so. People with good ideas on both sides made themselves look pretty silly by refusing, for example, to use any of the tools associated with the other side. One of the more tragic outcomes of this silliness was the emergence of the holistic assessment.

Simply put, the holistic assessment is a multidimensional assessment that is designed to take a more nuanced, textured, or rich approach to assessment. Great idea. Love it.

It’s the next part that’s silly. Having collected rich information on multiple dimensions, the test designers sum up a person’s performance with a single number. Why is this silly? Because the so-called holistic score becomes pretty-much meaningless. Two people with the same score can have very little in common. For example, let’s imagine that a holistic assessment examines emotional maturity, perspective taking, and leadership thinking. Two people receive a score of 10 that may be accompanied by boilerplate descriptions of what emotional maturity, perspective taking, and leadership attitudes look like at level 10. However, person one was actually weak in perspective-taking and strongest in leadership, and person two was weak in emotional maturity and strongest in perspective taking. The score of 10, it turns out, means something quite different for these two people. I would argue that it is relatively meaningless because there is no way to know, based on the single “holistic” score, how best to support the development of these distinct individuals.

Holism has its roots in system dynamics, where measurements are used to build rich models of systems. All of the measurements are unidimensional. They are never lumped together into “holistic” measures. That would be equivalent to talking about the temperaturelength of a day or the lengthweight of an object*. It’s essential to measure time, weight, and length with appropriate metrics and then to describe their interrelationships and the outcomes of these interrelationships. The language used to describe these is the language of probability, which is sensitive to differences in the measurement of different properties.

In psychological assessment, dimensionality is a challenging issue. What constitutes a single dimension is a matter for debate. For DTS, the primary consideration is how useful an assessment will be in helping people learn and grow. So, we tend to construct individual assessments, each of which represents a fairly tightly defined content space, and we use only one metric to determine the level of a performance. The meaning of a given score is both universal (it is an order of hierarchical complexity and phase on the skill scale) and contextual (it is assigned to a performance in a particular domain in a particular context, and is associated with particular content). We independently analyze the content of the performance to determine its strengths and weaknesses—relative to its level and the known range of content associated with that level—and provide feedback about these strengths and weaknesses as well as targeted learning suggestions. We use the level score to help us tell a useful story about a particular performance, without claiming to measure “lengthweight”. This is accomplished by the rigorous separation of structure (level) and content.

*If we described objects in terms of their lengthweight, an object that was 10 inches long and 2 lbs could have a lengthweight of 12, but so could an object that was 2 inches long and 10 lbs.
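The footnote's arithmetic in code form (hypothetical objects, of course): two very different objects collapse to the same "holistic" score, and nothing about either object can be recovered from the score alone.

```python
def lengthweight(length_in, weight_lb):
    """The absurd 'holistic' score: summing incommensurable dimensions."""
    return length_in + weight_lb

score_a = lengthweight(10, 2)  # long and light
score_b = lengthweight(2, 10)  # short and heavy
score_a == score_b             # True: the score erases the distinction
```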


Kegan’s Subject-Object Interview and the LSUA

Before I write about the relation between Kegan's Subject-Object Interview and the LSUA (the Lectical Self-Understanding Assessment), I'd like to explain some differences between these assessments. First, the SOI is both an interview and an assessment system. It was developed by studying the interviews of a small sample of respondents (Does anyone know how many?) who were interviewed on several occasions over the course of several years (Again, does anyone know how many or how often?). The level definitions and the scoring criteria in the SOI are tied to the subject matter of the interviews in the original sample (construction sample). For this reason, the SOI is called a domain-specific assessment. Researchers would say that the levels were defined by "bootstrapping" from the longitudinal data. Critiques of this kind of assessment point to bias in their level definitions (due to their small and culturally narrow construction samples), the related conflation (confusion) of particular conceptual content with developmental levels, and a weak articulation of the lowest levels, which are not based on direct empirical evidence from appropriate-aged respondents.

With respect to the LSUA, I want to clarify that it is scored with the Lectical Assessment System (LAS), a content-independent developmental scoring system that was created, in part, by identifying the dimension that underlies all longitudinally bootstrapped developmental assessment systems*. The SOI was one of the assessment systems I studied on the way to developing the LAS. Consequently, if the LAS does what it is supposed to do, it should capture the developmental dimension that underlies Kegan's system even better than his scoring system, because the LAS is a second generation developmental scoring system that is not restrained by a content-driven scoring process (Dawson, 2002; Dawson, Xie, & Wilson, 2003; there is much written about this in our published work, available on our web site).

What is the relation between the LSUA and the Subject-Object Interview?

This is a difficult question to answer, partly because there is no research that directly compares the SOI and the LSUA. However, because the LAS is a domain independent scoring system that can be used to score any text that includes judgments and justifications, I have used it to score the SOI scoring manual. The developmental sequence for SOI levels 3 to 5 corresponds well to the dimension captured by the LAS, and levels 3-5 correspond roughly with Lectical Levels 10-12. However, Kegan's lower levels do not match up as well, possibly because his construction sample (the sample used to define his levels), as far as we can determine, did not include young children. (Kegan's original research was never published in a form that would allow us to evaluate the approach he took to defining his levels or the reliability and validity of the SOI. All we can locate are a few very small studies of inter-rater reliability, most of which are unpublished [Kegan, 2002].)

Comparisons of the Subject-Object Interview with other developmental assessment systems

There is some research comparing the SOI with other developmental assessment systems. In general, this research finds that the SOI and these other systems are likely to tap the same developmental dimension (see Pratt et al., 1991).

Ideally, we would like to conduct a direct comparison of the LAS and the scoring system Kegan developed to score the SOI, as we have done with other developmental assessment systems. (We are working with a graduate student who is planning to do this kind of comparison.) In the meantime, we can point to comparisons between the LAS and several other developmental assessment systems (Kohlberg, Armon, Kitchener & King, Perry) that were developed using methods similar to those used by Kegan, and have routinely found strong correlations (above .85) between these scoring systems and the LAS, especially when they are used to score the same material (Dawson, 2000, 2001, 2002a, 2004; Dawson, Xie, & Wilson, 2003).

Finally, some of Kegan's level definitions are almost identical to those of Kohlberg and Selman. In fact, I would argue that they are primarily an extension of Selman's original work on socio-moral perspective, which has informed most domain-based developmental assessment systems (including all of the systems mentioned here) since it was introduced in the 1960's (and was a great help to me when I was developing the LAS).

*The claim that there is a single developmental dimension that underlies these systems is NOT the same thing as a claim that an individual will be at the same level in different knowledge/skill areas.


Commons, M. L., Armon, C., Richards, F. A., Schrader, D. E., Farrell, E. W., Tappan, M. B., et al. (1989). A multidomain study of adult development. In D. Sinnott, F. A. Richards & C. Armon (Eds.), Adult development, Vol. 1: Comparisons and applications of developmental models. (pp. 33-56). New York: Praeger Publishers.

Dawson, T. L. (2000). Moral reasoning and evaluative reasoning about the good life. Journal of Applied Measurement, 1(4), 372-397.

Dawson, T. L. (2001). Layers of structure: A comparison of two approaches to developmental assessment. Genetic Epistemologist, 29, 1-10.

Dawson, T. L. (2002a). A comparison of three developmental stage scoring systems. Journal of Applied Measurement, 3, 146-189.

Dawson, T. L. (2002b). New tools, new insights: Kohlberg’s moral reasoning stages revisited. International Journal of Behavioral Development, 26, 154-166.

Dawson, T. L., Xie, Y., & Wilson, M. (2003). Domain-general and domain-specific developmental assessments: Do they measure the same thing? Cognitive Development, 18, 61-78.

Dawson, T. L. (2004). Assessing intellectual development: Three approaches, one sequence. Journal of Adult Development, 11, 71-85.

Kegan, R. (2002). A guide to the subject-object interview. Unpublished Scoring manual. Harvard Graduate School of Education.

King, P. M., Kitchener, K. S., Wood, P. K., & Davison, M. L. (1989). Relationships across developmental domains: A longitudinal study of intellectual, moral, and ego development. In M. L. Commons, J. D. Sinnot, F. A. Richards & C. Armon (Eds.), Adult development. Volume 1: Comparisons and applications of developmental models (pp. 57-71). New York: Praeger.

Lambert, H. V. (1972). A comparison of Jane Loevinger's theory of ego development and Lawrence Kohlberg's theory of moral development. University of Chicago, Chicago, IL.

Pratt, M. W., Diessner, R., Hunsberger, B., Pancer, S. M., & Savoy, K. (1991). Four pathways in the analysis of adult development and aging: Comparing analyses of reasoning about personal-life dilemmas. Psychology & Aging, 6, 666-675.

Sullivan, E. V., McCullough, G., & Stager, M. A. (1970). A developmental study of the relationship between conceptual, ego, and moral development. Child Development, 41, 399-411.
