K-12 Testing: To Infinity and Beyond Reason (Part One)

Doug Wren
Aug 27, 2021

Updated: Sep 24

It’s back to school time and kids everywhere are excited about the prospect of taking lots of tests to measure their “learning loss” during the pandemic and prepare them for state-mandated tests at the end of the school year. In all honesty, testing is not one of the reasons most children want to return to the classroom. But it is true that many school districts will require teachers to administer a barrage of tests to their students. What lessons can we learn from America’s preoccupation with testing?

While tests have been part of our educational system since the 19th century (National Education Association, 2020), the number of standardized tests given to K-12 students skyrocketed after the No Child Left Behind Act (NCLB) became law nearly 20 years ago (Walker, 2015). NCLB’s replacement, the Every Student Succeeds Act (ESSA), retained its predecessor’s requirement that every child in grades 3-8 be tested annually in reading and math.

It makes sense to hold public schools accountable for student learning; however, the use of standardized test scores as the primary measure of learning in US schools has resulted in a variety of unforeseen consequences, most notably the loss of instructional time. Based on survey data from over 3,000 teachers, experts estimated that students typically spend 45 school days per year preparing for and taking district- and state-mandated tests (Robelen, 2016). In some districts, testing and test-related activities involve 60 or more days each year.

The overemphasis on testing also affects students’ physical and emotional health (Simpson, 2016), and studies have linked teacher stress, burnout, and attrition to test-centered accountability practices at schools (Ryan et al., 2017). Knowing they are being held accountable for their students’ test scores, some educators buckle and resort to unethical practices such as item-teaching (Popham, 2001) and changing answers on tests (Chen, 2021).

Yet another outcome of American education’s misguided covenant with standardized testing is the deprofessionalization of teaching. In April 2019, the president of the American Federation of Teachers cited low wages, micromanagement, the forced focus on standardized testing, and teachers’ lack of freedom to teach the way they want as the main culprits that are “killing the soul of teaching” (Will, 2019).

Despite what many citizens, politicians, and (shockingly) educational leaders believe, excessive test prep and testing do not lead to improved student achievement. Evidence from the National Center for Education Statistics indicates the opposite. Take a look at the global rankings of American 15-year-olds over a 16-year period (pre-NCLB through ESSA):

Reading – US went from 15th in 2000 to 24th in 2015
Math – US went from 18th in 2000 to 40th in 2015
Science – US went from 14th in 2000 to 25th in 2015

Source: ProCon/Encyclopaedia Britannica, 2020

If you’re a veteran teacher, none of this should surprise you. As for me, I was fortunate to have taught at a time when teachers had more autonomy. Because my principals recognized that I was a competent educator, they treated me like a professional and let me do my job. For the last 12 years of my career, I served as educational measurement & assessment specialist in the central office of a large school district. There I witnessed several ill-advised decisions related to the use of tests.

Today’s story begins in 1995. I was a first-grade teacher in the suburbs of Atlanta during the day and a commuting grad student at night. Online classes were not really a thing yet, and the UGA College of Education required doctoral students to attend classes on campus. My favorite course was EPY 700: Educational Tests & Measurements. The class involved some statistics, but from an educator’s perspective, it was mostly common sense. My experience in EPY 700 led to subsequent coursework and dissertation research focused on test development. In this context, the word “test” includes all types of measurement instruments: checklists, questionnaires, surveys, rating scales, observation tools, as well as traditional and alternate assessments.

Thirteen years later, I took a job as an assessment specialist in the research office of another school district. The district was considered by many to be forward-thinking – the best in the region. My boss was eager for me to start reviewing the district’s local assessments, multiple-choice tests administered quarterly in the core content areas. Teachers and administrators used data from local assessments to gauge students’ attainment of the learning objectives that would be assessed on end-of-year, state-mandated tests.

With the help of a data specialist, I analyzed every test item (i.e., question) by using student data from previous administrations of the assessments. The data revealed which questions worked the way multiple-choice items are supposed to work and which tests were reliable. I didn’t even need to look at the actual test items to make recommendations for improving faulty items and overall test reliability. While the analysis formulas were somewhat complicated, my recommendations were not. I met with coordinators from the Department of Curriculum & Instruction (C&I)—the people who created and were responsible for these tests—to explain in layperson’s terms what should be done to make their tests better, suggestions such as:

Item #4 is too easy – 99% of examinees got it right. Unless you’re trying to identify a handful of students who didn’t master this objective, you should beef up the item.
Distractors C and D on Item #17 are not plausible (distractors are the incorrect choices on a multiple-choice test) – only 3% of examinees chose either C or D. The remaining 97% of examinees can easily remove C and D from consideration, make a guess between A and B, and have a 50/50 chance of getting it right. Can you replace C and D with more credible distractors?
Item #25 is a big problem – kids who did well on the entire test are missing this item at a higher rate than kids who did poorly on the test. Either discard or revise the item.
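
For readers curious about the arithmetic behind suggestions like these, here is a minimal sketch of a classical item analysis in Python. It is not the district's actual procedure or software. The function names, the four-option A-D format, and the six-student data set are all made up for illustration. The sketch computes three common statistics for each item: difficulty (the share of examinees who answered correctly), the share of examinees choosing each option, and a discrimination index (here, the correlation between getting the item right and the score on the rest of the test).

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def analyze_items(responses, key, options="ABCD"):
    """Return difficulty, discrimination, and option shares for each item.

    responses: one string per examinee, one chosen option per item.
    key: the correct option for each item, in the same order.
    """
    n = len(responses)
    # Score every response as 1 (correct) or 0 (incorrect).
    scored = [[int(choice == correct) for choice, correct in zip(row, key)]
              for row in responses]
    totals = [sum(row) for row in scored]
    report = []
    for i in range(len(key)):
        item = [row[i] for row in scored]
        # Score on the rest of the test, so the item is not correlated with itself.
        rest = [total - score for total, score in zip(totals, item)]
        counts = Counter(row[i] for row in responses)
        report.append({
            "item": i + 1,
            "difficulty": sum(item) / n,
            "discrimination": pearson(item, rest),
            "choice_share": {opt: counts[opt] / n for opt in options},
        })
    return report


# Hypothetical data: six examinees, four items (illustration only).
key = "ABCD"
responses = ["ABCA", "ABCA", "ABCB", "ABDD", "ACDD", "ACAD"]
report = analyze_items(responses, key)

for entry in report:
    print(entry)
```

In this toy data, everyone answers the first item correctly, which is the "too easy" pattern. Nobody picks option C on the fourth item, which is the "implausible distractor" pattern. The fourth item also has a negative discrimination value because the stronger students miss it more often than the weaker ones. With only six examinees the numbers mean little. Real item analyses rely on far larger samples.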

Much to my surprise, not all of the coordinators were pleased to have this information. Their bosses, the executive directors for elementary and secondary C&I, seemed lukewarm. I found out later that the directors told their subordinates they were under no obligation to revise the tests based on my recommendations. Some coordinators went back to the drawing board; others didn’t change a thing on their tests.

Over the next several years, the number of locally developed assessments increased, due to the belief among a few high-level administrators that more information from more tests would result in more children passing more state-mandated tests at the end of the school year. Part of their rationale was that the local assessments could predict each student’s success or failure on the end-of-year tests. It reached the point where principals were complaining vociferously about the lack of instructional time due to excessive testing requirements from “downtown.” The superintendent agreed and the local assessments became optional, although some principals continued to force their teachers to give every test. Soon afterwards, our superintendent left for a higher-paying superintendent’s position in a neighboring state.

Mandatory local assessments re-emerged with a vengeance two years later. Once again, it was a few downtown administrators who made the call, which was that the tests needed to be longer to adequately prepare students for the state tests. The idea was that lengthy multiple-choice tests would build students’ stamina for the grueling end-of-year tests. That’s what they told me when I made the lateral move to C&I.

To this day, I’m still looking for empirical research suggesting that children’s test-taking stamina can be increased by taking really long tests. To be fair, various test prep websites offer tips on building stamina for testing (e.g., Tungsten Prep, 2020; Tutoring Service of New York, 2020), but the tips are developmentally appropriate for high school students preparing for the ACT or SAT, not elementary and middle school kids.

Midway through the “Year of the Long Tests,” I received permission from my new boss, the Chief Academic Officer, to administer a survey to obtain staff feedback on our local assessments. Survey data confirmed that testing overkill was a problem across the district. Teachers, assistant principals, and principals consistently reported the following:

The reading and math tests were oppressively long.
Some students were taking three full class periods to complete a single reading test.
Computer labs and laptops in schools were not available for anything except testing.
Students and teachers had developed negative feelings towards tests in general.

Consequently, the number of tests decreased the next year and the length of quarterly reading and math tests was substantially reduced. I continued to review the local assessments in the core content areas, though it was business as usual. A new set of executive directors for elementary and secondary C&I—while thanking me for my efforts—were not very concerned whether their coordinators heeded my recommendations for revising tests. Thankfully, I did have a few good customers, including the elementary language arts and math coordinators.

It’s been almost two years since I retired, but I still keep up with happenings in my old district. I heard recently that mandatory local assessments will be rolled out in force for the 2021-22 school year. This reminded me of George Santayana’s famous quote: “Those who cannot remember the past are condemned to repeat it” (1905, p. 172).

As we reach the end of today’s story, I invite you to return to Edjacent’s blog page in two days to read the second part of K-12 Testing: To Infinity and Beyond Reason. Among other topics, Part 2 addresses these issues: