Jim Angermeyr

Two decades ago, Jim Angermeyr was one of a small group of educators and psychologists who came up with a new kind of standardized test, the growth model or value-added assessment, which has since become the gold standard.

Unlike the ubiquitous No. 2-pencil exams that measure how many correct answers a student can give, the Northwest Evaluation Association tests and others that use the same architecture are designed to get harder as long as students are acing them. When students start to falter, the questions get easier.

In addition to the dreaded, legally mandated Minnesota Comprehensive Assessments (MCAs), which reveal little data that is useful to teachers and administrators, most school districts now administer some version of Angermeyr’s test. Not only can it show where a particular child is with individual academic skills, but when administered in the fall and spring, the exam also shows whether a student made a year’s progress.

Next week, Angermeyr will retire from the last stop on a 37-year career in education, as director of research and evaluation for Bloomington Public Schools. (He is being replaced by his Minneapolis counterpart, Dave Heistad, which is a coup for Bloomington and cause for much weeping in the cities of the first class, but that’s another story.)

One would think, then, that Angermeyr would be thrilled to retire at the very apex of our national love affair with tests, the moment when the value-added assessment is being touted by policymakers throughout the country as a kind of cure-all for everything that ails education.

Perfect for its original purpose, but …

Rather, he is justifiably proud of his professional accomplishments, but deeply skeptical about the test-obsessed state of the field he’s leaving. As he sees it, his baby is terrific and perfect for its original purpose, but it’s being used to beat up on teachers, kids and principals — while failing to help close the achievement gap or better struggling schools.

We had to dog him all through MCA season, but the other day we finally got Angermeyr to sit for an exit interview. An edited version of that conversation follows.

MinnPost: Policymakers on both sides of the aisle now agree that the standardized testing mandated by No Child Left Behind has created a juggernaut of epic dimensions. The current thinking is that the way to fix this is to substitute the tests you worked on designing two decades ago. Yet I know from talking to you in the past that you are not a proponent of this, or of many of the ways we propose to use test data now.

Jim Angermeyr: I think among assessment professionals there’s always a recognition of the limitations of every test, including NWEA’s testing. We have a healthy respect for error and how to measure it. And always a certain amount of caution when you’re interpreting results.

That caution grows as the groups get smaller, like looking at a classroom instead of a whole school. And that caution grows even more when the stakes increase, because increasing the stakes can lead to all kinds of distortions, whether it’s the cheating that goes on in some of the schools that you’ve been reading about around the country, or whether it’s just the general over-emphasis on testing to the exclusion of other things.

MinnPost: How do you think we arrived at this place?

JA: To be honest, I think it’s a lot of external factors. It’s politicians and some policymakers who believe tests can do more than they really can. And there’s not enough people stopping and saying wait a minute. When you can summarize a whole bunch of complicated things in a single number, that has a lot of power and it’s hard to ignore, especially when it tells a story that you want to promote. And that’s where it gets really twisted.

MinnPost: So if you ran the universe, how would we use tests, and which tests?

JA: The tests we have, whether they’re state tests or commercially available tests, are by and large designed by modern psychometric theories, and they’re pretty sound. They do a good job of measuring some important outcomes. And they do a good job of aligning to the important standards. So it’s not the tests themselves.

Where the distortion comes in is that you can only test a limited amount of the domain. Even if it’s a domain like mathematics, you can’t cover everything. And so you make assumptions about kids’ skills in that broader domain. Do we have eighth graders who are good readers based on a pretty small sample of questions and items?

Testing professionals know that you’re just sampling the domain and you don’t try to make inferences further than that. But nonprofessionals do that all the time. “American students are 51st in the world in reading.” There are a lot of assumptions that are made before you can get to that conclusion, but people leap right over that.

If I was running the world, I would severely reduce the accountability stakes for tests. I would certainly eliminate things like No Child Left Behind. I would probably take away the current waiver. Even if it looks better, sometimes it’s still really the same wolf in different clothing.

I would do away with standards, to be honest. Even though on paper they sound kind of cool, they assume all kids are the same and they all make progress the same way and move in lockstep. And that’s just not accurate. Standards distort individual differences among kids. And that’s bad.

I would put testing back as a local control issue in school districts. I would take the emphasis off of evaluating and [compensating] teachers. I would put the emphasis on good training for principals and curriculum specialists and teachers on how to interpret data and use it for the kind of diagnosis and assessment that it was originally intended for.

MP: I have spent time in a handful of schools where the teachers use quizzes or assessments on the fly to determine how many kids got a lesson right, which ones need it delivered a different way and so on. They’ve literally kind of upped the number of children mastering the material by looking at that data. What do you think of that?

JA: Any activity that teachers use where they’re focused on “What do my kids know and understand?” as opposed to, “What have I covered and taught?” is a good thing. So I don’t want to denigrate this form of assessment as not valuable. But I don’t want to equate that kind of assessment and the inferences you make from it with the kind of assessment that we’re generally talking about in my world or the state talks about or No Child Left Behind talks about. Those are designed to measure the broader domain of subject areas: “How well can kids read?” Not, “How well did kids learn the six things I was covering in my class today?”

So two different kinds of tests for two different purposes. And teachers would generally like the latter and should do more of it, but it can’t substitute for the former.

However, we are getting kids to certain levels of proficiency on broader domains. So I still think we need summative accountability tests, but we need them locally managed because the pressure is different.

MP: Say more about that.

JA: Let’s go back 18 years, 19 years, when I first introduced NWEA testing in Rosemount-Apple Valley-Eagan. It was not for accountability purposes. It was designed primarily to help measure the growth of students from one year to the next so that we could evaluate whether our curriculum programs were working or not. It was a program evaluation tool. And it also obviously informed parents about progress kids were making. But we had a lot of cautions in there about don’t over interpret this, kids go up and down from one year to the next, etc.

It was really for that local curriculum improvement process. There was never any sense that we would grade schools on this or that we would rank teachers from high to low or that we would try to show that one school was doing a better job than another school.

Once you introduce that kind of stuff, I just think the distortions grow exponentially. But the test itself was basically the same test we’re using today.

MP: Do you feel like a small voice in the wilderness?

JA: No. I don’t think I’m saying much different than a lot of people say. We don’t get listened to very much by politicians. I mean, if you read the cautions about [using] value-added measures for [measuring] teacher effectiveness or the cautions that people have raised about judging schools based on test scores, that stuff’s been around a long time. And the testing community pretty much speaks with one voice about that. I think you and I talked before about how the biggest fans of value-added tests are economists, not educators.

MP: Yes. Where I struggle is that before No Child Left Behind, we really did have vast groups of kids whom nobody really assessed, and the emperor was never naked. We made certain assumptions about them that we’re probably still making. This group is disadvantaged in this way and so our expectations can only go so far.

JA: I agree. I’d like to think that those places and those schools were few and far between, but I know there were some. I think there have always been weak school systems with weak principals and some weak teachers that just did exactly what you said. I don’t know if the spotlight of No Child Left Behind or high-stakes testing has changed those schools, maybe they have.

I look at the lists of schools that are struggling now with that spotlight and they have been struggling for a long time. There are examples. You’ve written about some and I’ve seen some where there really are some changes, but they’re so rare as to be the exception that proves the rule.

So I don’t know. The grad rule is a good example. The amount of money we spend every year giving reading and math tests to students to give them a high school diploma has done absolutely nothing to improve the graduation rate; it’s done nothing to improve the quality of the graduates. The level of proficiency represented in those tests is so low I don’t think it’s changed anything.

Have there been a few kids who’ve had to struggle for a few more years and take whatever kind of interventions the schools can design to get them to pass it? Sure. Is it justified? When I think about all of the things that we could be spending that money on, I just don’t see that as very valuable. But the grad rule is one of those things you can never take away because politicians can point to it and say, “we’ve raised our standards,” even though it’s by a trivial amount.

Minnesota schools have historically been very strong, but it hasn’t been because we’ve had testing. There’s a lot of other things we can do to strengthen schools. And the biggest one is having high quality, talented people go into the profession. What are we doing to promote that? We’re giving them more tests. We’re making them sign on to [the state’s controversial merit pay program] Q-Comp. We’re making them do more and more that has little to do with their profession and more to do with keeping track of their Q-Comp points. And it’s not going to attract people to the profession.

MP: What other observations after 22 years in the assessment arena do you care to offer?

JA: I’ve really enjoyed it. Like I said, I’ve enjoyed working at schools. I’m always amazed by how much teachers do, whatever it is you ask them — and by and large without much complaining. But I worry about it.

My daughter’s a teacher now, a high school teacher. I worry about the fact that she doesn’t have a rule of 90 to be able to retire before she’s 66. She’ll be teaching for 40 years. I worry a little bit about the size of her classes because her district can’t pass a levy. I’m frankly worried about the role of technology and what that’s going to introduce into her life.

But by and large, I’ve had a good run. I really feel pleased with what I’ve contributed and what I’ve learned from people over the years.

Join the Conversation

7 Comments

  1. skepticism justified

    The current testing mania was just coming into focus when I left the classroom 16 years ago, so I feel fortunate to have missed the most recent cycle of teacher-bashing. It’s kind of refreshing to read that someone actively involved in test design understands the distortion that can (and does) take place when the sampled group is small, when multiple factors and results are reduced to a single letter or number, and – especially – when the test result gets used for political, rather than educational, purposes.

    I couldn’t agree more that the best way to strengthen a school system is to get smart, articulate, knowledgeable people in front of more classrooms. Those kinds of people are not attracted by imposing extra hoops to jump through, or by routine public thrashings by legislators or school board members whose knowledge of classroom teaching approaches zero.

  2. Thank goodness!!! Will anyone listen?

    Thank you so much for this. I love the NWEA-MAP for how it guides me and focuses me in the classroom. However, I could show you MAP scores from one year that overall look like I was the best teacher in the universe, and from other years that look bad on the surface. The ironic thing is those rough years I might have actually had to work harder to get the gains I did.

    Meanwhile, people like Teri Bonhoff will say that NOT using tests to fire teachers is indefensible, and there is no push back.

    People like Mike Cerisi will say you lack courage if you don’t want to use these scores to fire teachers, and there is no push back.

    People like that will just say we are all bought off by Ed MN, and there is no push back.

    Meanwhile, Mr. Angermeyr, The National Academies of Research, The Educational Testing Services, and RAND testing all quietly speak the truth no one cares about.

  3. Don’t kill the messenger

    Mr. Angermeyr is highly regarded, and it’s great you’ve recognized his work. It is good to hear/read his perspective after 37 years.

    Unfortunately, it would be a disaster for students and the state to return to the days when there was no comparable student performance information from school-to-school and district-to-district.

    Some people want to get rid of student standards and aligned tests because they don’t like what happens with the results. OK, let’s have our “arguments” about what the appropriate responses to the results should be – but don’t kill the messenger!

    Eliminating standards and aligned tests will simply return us to the days when some groups of students were expected and encouraged to prosper, and others were largely left to survive on their own.

    FYI – the results of those “dreaded” MCAs (and their counterparts in other states) have been used to identify and recognize effective teachers and schools – and highlight successful practices.

    1. I think you missed the entire point of the story

      The story was not about getting rid of value added tests. That is a red herring. It wasn’t about getting rid of standards. The story was about using tests in a productive and ethical way in order to improve education.

  4. Why not do away with standards for doctors, drivers too?

    Angermeyr says,

    “I would do away with standards, to be honest. Even though on paper they sound kind of cool, they assume all kids are the same and they all make progress the same way and move in lockstep.”

    Why stop with education standards? Why not do away with all standards for people like doctors? Drivers too? Teachers?

    All tests and measures are imperfect. But using a variety of assessments helps us understand what’s happening with youngsters, or anyone we assess.

    As a long time proponent of multiple measures (and that was the system used at a school where Ms. Hawkins attended), I think multiple measures are better than no standards.

    So tell us, Ms. Hawkins, given your very sympathetic interview, would you like to know if your kids can do a certain amount of math, or write a 5 page paper with coherent sentences, or know how to compare different products to determine which might be the best bargain? Or do you agree that we should do away with standards in education?

    1. So we should judge doctors by the same measure as teachers?

      Great. For example, infant mortality of poor, African American kids is way higher than white, middle class babies. If we followed your logic, we should fire the doctors who treat poor, minority women.

      Same thing for things like type 2 diabetes. It is epidemic in the poorer communities. Any doctor who sees poor patients should be fired if they get more patients with diabetes than a doctor in Edina.

  5. misuse of NWEA MAP

    As an elementary teacher I am not an expert at statistics. I was never trained in the methods of a statistician, nor did I ever take a statistics class. I am, however, an expert at practicing with children. I use assessment, both formative and summative, to guide my teaching, but I am really an expert at sitting down next to a child and deciding what guidance she or he needs next based on how that child reasons through a math problem or problem-solves through a new piece of text. This is the data I was trained to use in my practice. The qualitative moments, not the numbers.

    I have witnessed, from both a teacher perspective and a parent perspective, how this test is monumentally misused to rationalize some of the most severe marginalization of all children, often at the very tender age of six. Ironically, the adults behind this are not villains! These are teachers, administrators and policy makers, the very grown-ups that have children’s best interests at heart!

    I appreciate Angermeyr’s skepticism but it is too little, too late. He neglects to acknowledge the role of teacher as practitioner and the responsibility of the larger society in eliminating the gross social and economic inequities that are often behind poor school achievement. With this said, I hope adults are listening to him.
