Grade-level assessments reveal only a small slice of student learning.
Two decades ago, the national conversation about how to improve our public education system was full of energy and new ideas. Policy makers across the political spectrum were eager to adopt reforms that would raise the quality of the American educational experience while boosting student outcomes and ensuring our kids would remain competitive in what was then an already globalizing economy. 

The momentum around public education reform ultimately generated significant improvements in two key areas: standards and accountability. No Child Left Behind (NCLB), which President George W. Bush signed into law in 2001, set new federal expectations for school performance and introduced financial consequences for chronic substandard outcomes. The law shone a spotlight on struggling schools and districts that needed more help and resources.  

However, NCLB possessed at least one major blind spot: It did little to measure students’ actual academic growth. By punishing and rewarding schools on the basis of a narrow definition of proficiency — whether their students were mastering grade-level content — the law neglected other key indicators of student learning, such as how much progress students were making on standards above or below their grade. As a result, the entire educational ecosystem, including classroom instruction, was oriented around a single and very limited measure of student performance. 


The Every Student Succeeds Act (ESSA), which President Barack Obama signed in 2015 to replace NCLB, took a step toward correcting this problem by directing states to capture student “growth or another valid and reliable measure” as part of their accountability systems. For many, this was a positive development. In the stakeholder meetings held throughout the country to help guide decision making for ESSA, representatives from 48 states told the Obama administration that it was important to measure growth more accurately. 

And yet, as my colleagues and I at New Classrooms Innovation Partners demonstrate in a new report, The Iceberg Problem, current assessment and accountability systems still fail to measure how much students actually learn. We found that every state continues to test students and measure instructional effectiveness on the basis of grade-level performance. As a result, these assessments capture only a slice of student learning and cannot show precisely how much or how little a student has learned. Like an iceberg, only a small portion of students’ knowledge is visible, while the real truth of what they are learning — or not learning — remains hidden from view. And because assessments and accountability focus narrowly on grade-level proficiency, teachers tend to focus their instruction only on grade-level proficiency, instead of meeting students where they are. 

The trouble with relying on grade-level proficiency 

Imagine a student who enters the 5th grade achieving at a 2nd-grade level. She works hard, and by the end of the school year, she performs at a 4th-grade level — that’s a lot of academic growth, a full two years’ worth in the span of one. However, the state test she takes at the end of the year is designed only to measure how close she is to the 5th-grade proficiency targets, and it doesn’t ask her to demonstrate her mastery of 3rd- and 4th-grade material. Thus, it fails to pick up on the remarkable progress she made. Her performance is simply determined to be “below proficient” for the 5th grade. Much like the underwater portion of the iceberg, the bulk of her achievement remains invisible.  

When it comes to advanced students, too, tests that measure grade-level proficiency are blind to a lot of academic progress. For instance, say that another 5th grader races ahead and finishes the year performing at a 6th-grade level. Since the state’s summative assessment includes only items pegged to the 5th-grade standards, it will not reveal this student’s impressive mastery of 6th-grade material. His extra year’s growth, like that of the other student, will be invisible. 

As a teacher and principal, I saw firsthand how these narrow state assessments failed to reflect the complete picture of how much or how little our students were learning. At times, my colleagues and I attempted to design our own interim assessments, but they were clunky and did not fully capture the learning we were seeking to measure. The only way to solve this problem, I realized, would be for the state to make a systematic effort to improve the sensitivity of its tests. 

How accountability systems fail to measure growth 

As we note in our paper, ESSA places much greater importance than NCLB on measuring student growth on state tests, so as to reward the actual progress students make rather than just checking whether they reached a particular benchmark. In response, states have adopted a variety of growth metrics. Some use criterion-based metrics to measure the degree to which students are closer to meeting grade-level expectations than they were the previous year. Others use normative approaches to measure how students’ test scores compare to those of students with similar past performance. Still others use a hybrid of the two. 

The strengths and weaknesses of these metrics have been widely debated in education policy circles. However, these debates have largely overlooked the elephant in the room: All of these new metrics are designed to record student growth on traditional grade-level assessments. No matter how sophisticated they are, they cannot account for students’ learning of material that the tests themselves do not assess — namely, content above or below grade level. The new metrics have become more sensitive to, say, 5th graders’ growing mastery of 5th-grade material, but they’re still blind to 5th graders’ progress in learning 4th- and 6th-grade material. 

I was faced with the limitations of growth measures when I served as chief academic officer for the state of Delaware from 2013 to 2018. Even as our students were making major strides in college and career readiness, and even though our state won a competitive $119-million grant under President Obama’s Race to the Top initiative, our teachers were telling us that our assessments lacked the precision they needed to truly measure their students’ growth. While the state had designed new adaptive assessments for math during my tenure, those assessments were still grade-level focused, as required under federal law. So in the end, we were left with an adaptive assessment that could not provide a comprehensive picture of each student’s growth. 

Some state policy makers may disagree with me, arguing that they have in fact figured out a way to measure student growth using grade-level assessments, even when students are performing below or above grade level. However, such claims turn out to be dubious, especially in math. 

Often, we find that school leaders and policy makers alike do not fully understand the limitations of an annual summative grade-level assessment and speak about state growth metrics as if they were synonymous with student learning progress. In fact, the “growth” that districts and states generally report could more accurately be described as what my colleague Joel Rose calls “changes in relative performance,” because the content of each grade’s assessment can be quite different. Similarly, state and district leaders may be under the misimpression that scale scores from summative assessments that are linked to one another (known as vertical scaling) can be used to measure student learning growth precisely. Such comparisons are well supported when tests cover comparable content, but comparisons across grade-level summative assessments — whose content varies from one year to the next — rest on far weaker ground and are inappropriate for use in higher-stakes contexts (Patz, 2007). Recent guidance published by the National Education Policy Center was explicit about the misuse of vertically scaled instruments to measure student growth in high-stakes contexts (Chatterji, 2019). Behind closed doors — and even in the technical advisory meetings that ESSA requires of every state — most experts will agree that these systems are not precise enough to measure individual, teacher, or school performance. Sadly, this knowledge rarely reaches the policy makers who are ultimately responsible for assessment and accountability systems. 

The problems with growth measures are especially evident in mathematics. Math is cumulative: Concepts and skills build upon one another as students advance through school. The instruction students receive reflects a coherent body of interconnected concepts, designed around progressions from grade to grade, with students building new understanding on top of foundations laid in previous years. Thus, an 8th grader performing at the 5th-grade level in math cannot simply leap over the material from grades 6 and 7 and start performing at the 8th-grade level. Yet any assessments given at the beginning and end of the year to determine growth will remain focused on what is expected of students in the 8th grade. They will provide little help in determining where the student is at the beginning of the year, making it difficult for the teacher to adapt instruction accordingly. And any 6th- and 7th-grade skills the student masters during the year will not appear on the assessment. 

Flawed assessments lead to flawed instruction 

The problem here goes deeper than the fact that states are not accurately measuring the totality of a student’s academic growth. The focus on grade-level proficiency can cause skill gaps to grow in ways that make it harder for some students to achieve college and career readiness.  


When 6th-grade students are taught 6th-grade material, some of those skills will be learned and some will go unlearned for a variety of reasons — such as lack of prerequisite knowledge, uneven teacher quality, or student absences. The next year, as the focus of accountability shifts to the 7th-grade assessment, many of the unlearned skills from 6th grade remain unaddressed, even though those very skills may be essential to mastering 7th-grade content. By 8th grade, even more learning gaps accumulate, so that by the time students enter high school, they are simply unprepared for more advanced mathematical topics.  

While policy makers focus on how students perform on grade-level assessments, learning gaps continue to accumulate below the surface, making longer-term success harder to achieve. Unlocking students’ full potential requires seeing them more as individuals than as a homogeneous group enrolled in a particular grade level. Yet the policies that undergird statewide assessments send an unmistakable signal to middle-grade math teachers: Focus your instruction on the grade-level standards. This emphasis may be at odds with what is truly best for each student, given their possible unfinished learning from prior years. Policy makers may have intended for tests to serve as an educational “dipstick,” allowing us to gauge how students are performing at specific points in time. But in reality, those tests are driving rigid instructional practices that can cause some students to fall further behind. 

The unfortunate truth is that millions of students, including the vast majority of students from historically disadvantaged communities, are coming to middle school with unfinished learning from elementary school, and to high school with unfinished learning from the middle years. This places an immense burden on teachers not only to cover grade-level material but also to diagnose and fill each student’s unique learning gaps from previous grades — all within a single school year. That’s a tall order for even the most talented of teachers, and the challenge becomes even more daunting over time, as learning gaps continue to accumulate.  

Students arriving in middle and high school multiple years behind grade-level standards need a viable instructional bridge that enables them to catch up and move ahead. This requires a strategic mix of pre- and on-grade skills, often for more than a single school year. However, today’s assessment and accountability policies, oriented around annual grade-level proficiency, make it much harder to pursue such a flexible instructional approach. 

What better growth measurements could look like 

For states and districts seeking to measure comprehensive learning growth, our report offers a number of recommendations, some of which can be incorporated into revised state ESSA plans right now. One idea is to adopt adaptive assessments that incorporate standards from multiple grade levels to better measure growth during the year — and there’s no reason why states can’t do this in conjunction with their existing grade-level state assessment.   

ESSA’s requirement of a fifth indicator provides a golden opportunity for states to use the adaptive assessments that currently exist. The fifth indicator is broadly defined by ESSA as a measure of school quality and student success selected by states. Many states have chosen chronic absenteeism or other nonacademic indicators for this category, but nontraditional academic measures are also an option. In Nebraska, for example, all students in testing grades in districts and charters across the state have the opportunity to take adaptive tests that span multiple grades. The changes in school growth percentiles on such assessments could be incorporated into a fifth indicator score, yielding much more accurate growth measurements.  

Another opportunity is emerging with ESSA’s Innovative Assessment Demonstration Authority, commonly called innovative assessment waivers. These waivers allow up to seven states to design and pilot different types of summative tests that could roll out statewide after several years. Four states — Georgia, Louisiana, North Carolina, and New Hampshire — have already received approval from the U.S. Department of Education to design new ways of measuring student growth. 

An especially promising example is emerging in Georgia, which is arguably leading the nation in pushing for new assessment systems that are better suited to helping students prepare for college and career. The federal government recently approved Georgia’s plan for a new adaptive assessment system that will be administered three times during the school year. Georgia Measures of Academic Promise (GMAP) will be based on the state’s standards, but instead of measuring only grade-level proficiency, it will “provide longitudinal growth data, instructionally relevant insights, and summative proficiency scores” by measuring student learning on, below, and above grade level. (Much of Georgia’s plan is aligned with our report’s recommendations.) 

If this approach proves successful, those assessments will produce more accurate growth results during the year, which teachers can then use to make instructional adjustments. The assessments will also yield a summative proficiency score for each student at the end of the year, without requiring them to take the customary end-of-year, grade-level, statewide test. This approach could establish a new assessment ecosystem in Georgia that measures both proficiency and comprehensive learning growth, providing educators and the entire system with necessary and more timely information. And it would enable the state, districts, and schools to more frequently monitor instructional practices and support those that are generating results.  

Notably, political support for more meaningful growth measures played a key role in the plan’s development. The governor’s office, the Georgia Department of Education, and school districts and parents across the state have rallied behind the effort to measure learning more effectively.   

Still, despite many encouraging signs of support and progress, Georgia officials would be wise to consider a number of challenges posed by federal law as they build their plan. First, the Innovative Assessment Demonstration Authority flexibility waivers currently do not come with any funding support for states. States like Georgia that are on the leading edge of reform need more than signals from the federal government; they need monetary support.  

Second, while federal law attempts to encourage innovation, it does so with some significant limitations. For example, it requires students who participate in any pilot of a new assessment to take the traditional state test as well. Besides placing an unnecessary burden on districts looking to innovate, this requirement could also spur a backlash among families who worry about over-testing.   

Finally, the federal pilots focus strictly on assessments, neglecting the connection between growth, proficiency, and accountability. We still lack strong feedback loops between teaching and learning, assessments, and accountability systems. Innovation, therefore, will necessarily be limited until states are able to create accountability pilots that incorporate all of these factors. 

Despite these limitations, states have an opportunity and a responsibility to rethink how they measure student learning. If our nation’s public education system has any hope of reaching the countless students who fall through the cracks each day, it will need to reimagine how it measures and rewards comprehensive learning growth and then develop new learning models. Instead of continuing to gaze at just the tip of the iceberg, policy makers and educators must dive down and take stock of the whole thing. 

References 

Chatterji, M. (2019). A consumer’s guide to testing under the Every Student Succeeds Act (ESSA): What can the Common Core and other ESSA assessments tell us? Boulder, CO: National Education Policy Center. 

Patz, R.J. (2007). Vertical scaling in standards-based educational assessment and accountability systems. Washington, DC: Council of Chief State School Officers. 

Note: Portions of this article are excerpted from New Classrooms’ recent report, The Iceberg Problem, available at www.icebergproblem.org. 

ABOUT THE AUTHOR

MICHAEL WATSON is vice president of policy and advocacy for New Classrooms Innovation Partners.