Two decades ago, Jim Angermeyr was one of a small group of educators and psychologists who came up with a new kind of standardized test, the growth model or value-added assessment, which has since become the gold standard.
Unlike the ubiquitous No. 2-pencil exams that measure how many correct answers a student can give, the Northwest Evaluation Association tests and others that use the same architecture are designed to get harder as long as students are acing them. When subjects start to falter, the questions get easier.
In addition to the dreaded, legally mandated Minnesota Comprehensive Assessments (MCAs), which reveal little data that is useful to teachers and administrators, most school districts now administer some version of Angermeyr’s test. It can show where a particular child is with individual academic skills and, when administered in the fall and spring, whether a student has made a year’s progress.
Next week, Angermeyr will retire from the last stop on a 37-year career in education, as director of research and evaluation for Bloomington Public Schools. (He is being replaced by his Minneapolis counterpart, Dave Heistad, which is a coup for Bloomington and cause for much weeping in the cities of the first class, but that’s another story.)
One would think, then, that Angermeyr would be thrilled to retire at the very apex of our national love affair with tests, the moment when the value-added assessment is being touted by policymakers throughout the country as a kind of cure-all for everything that ails education.
Perfect for its original purpose, but …
Rather, he is justifiably proud of his professional accomplishments, but deeply skeptical about the test-obsessed state of the field he’s leaving. As he sees it, his baby is terrific and perfect for its original purpose, but it’s being used to beat up on teachers, kids and principals — while failing to help close the achievement gap or better struggling schools.
We had to dog him all through MCA season, but the other day we finally got Angermeyr to sit for an exit interview. An edited version of that conversation follows.
MinnPost: Policymakers on both sides of the aisle now agree that the standardized testing mandated by No Child Left Behind has created a juggernaut of epic dimensions. The current thinking is that the way to fix this is to substitute the tests you worked on designing two decades ago. Yet I know from talking to you in the past that you are not a proponent of this, or of many of the ways we propose to use test data now.
Jim Angermeyr: I think among assessment professionals there’s always a recognition of the limitations of every test, including NWEA’s testing. We have a healthy respect for error and how to measure it. And always a certain amount of caution when you’re interpreting results.
That caution grows as the groups get smaller, like looking at a classroom instead of a whole school. And that caution grows even more when the stakes increase because increasing the stakes can lead to all kinds of distortions, whether it’s the cheating that goes on in some of the schools that you’ve been reading about around the country, or whether it’s just the general over-emphasis on testing to the exclusion of other things.
MP: How do you think we arrived at this place?
JA: To be honest, I think it’s a lot of external factors. It’s politicians and some policymakers who believe tests can do more than they really can. And there’s not enough people stopping and saying wait a minute. When you can summarize a whole bunch of complicated things in a single number, that has a lot of power and it’s hard to ignore, especially when it tells a story that you want to promote. And that’s where it gets really twisted.
MP: So if you ran the universe, how would we use tests, and which tests?
JA: The tests we have, whether they’re state tests or commercially available tests, are by and large designed by modern psychometric theories, and they’re pretty sound. They do a good job of measuring some important outcomes. And they do a good job of aligning to the important standards. So it’s not the tests themselves.
Where the distortion comes in is that you can only test a limited amount of the domain. Even if it’s a domain like mathematics, you can’t cover everything. And so you make assumptions about kids’ skills in that broader domain. Do we have eighth graders who are good readers based on a pretty small sample of questions and items?
Testing professionals know that you’re just sampling the domain and you don’t try to make inferences further than that. But nonprofessionals do that all the time. “American students are 51st in the world in reading.” There are a lot of assumptions that are made before you can get to that conclusion, but people leap right over that.
If I was running the world, I would severely reduce the accountability stakes for tests. I would certainly eliminate things like No Child Left Behind. I would probably take away the current waiver. Even if it looks better, sometimes it’s still really the same wolf in different clothing.
I would do away with standards, to be honest. Even though on paper they sound kind of cool, they assume all kids are the same and they all make progress the same way and move in lockstep. And that’s just not accurate. Standards distort individual differences among kids. And that’s bad.
I would put testing back as a local control issue in school districts. I would take the emphasis off of evaluating and [compensating] teachers. I would put the emphasis on good training for principals and curriculum specialists and teachers on how to interpret data and use it for the kind of diagnosis and assessment that it was originally intended for.
MP: I have spent time in a handful of schools where the teachers use quizzes or assessments on the fly to determine how many kids got a lesson right, which ones need it delivered a different way and so on. They’ve literally kind of upped the number of children mastering the material by looking at that data. What do you think of that?
JA: Any activity that teachers use where they’re focused on “What do my kids know and understand?” as opposed to, “What have I covered and taught?” is a good thing. So I don’t want to denigrate this form of assessment as not valuable. But I don’t want to equate that kind of assessment and the inferences you make from it with the kind of assessment that we’re generally talking about in my world or the state talks about or No Child Left Behind talks about. Those are designed to measure the broader domain of subject areas: “How well can kids read?” Not, “How well did kids learn the six things I was covering in my class today?”
So two different kinds of tests for two different purposes. And teachers would generally like the latter and should do more of it, but it can’t substitute for the former.
However, we are getting kids to certain levels of proficiency on broader domains. So I still think we need summative accountability tests, but we need them locally managed because the pressure is different.
MP: Say more about that.
JA: Let’s go back 18 years, 19 years, when I first introduced NWEA testing in Rosemount-Apple Valley-Eagan. It was not for accountability purposes. It was designed primarily to help measure the growth of students from one year to the next so that we could evaluate whether our curriculum programs were working or not. It was a program evaluation tool. And it also obviously informed parents about progress kids were making. But we had a lot of cautions in there: don’t over-interpret this, kids go up and down from one year to the next, etc.
It was really for that local curriculum improvement process. There was never any sense that we would grade schools on this or that we would rank teachers from high to low or that we would try to show that one school was doing a better job than another school.
Once you introduce that kind of stuff, I just think the distortions grow exponentially. But the test itself was basically the same test we’re using today.
MP: Do you feel like a small voice in the wilderness?
JA: No. I don’t think I’m saying much different than a lot of people say. We don’t get listened to very much by politicians. I mean, if you read the cautions about [using] value-added measures for [measuring] teacher effectiveness or the cautions that people have raised about judging schools based on test scores, that stuff’s been around a long time. And the testing community pretty much speaks with one voice about that. I think you and I talked before about how the biggest fans of value-added tests are economists, not educators.
MP: Yes. Where I struggle is that before No Child Left Behind, we really did have vast groups of kids whom nobody really assessed, and the emperor was never naked. We made certain assumptions about them that we’re probably still making. This group is disadvantaged in this way and so our expectations can only go so far.
JA: I agree. I’d like to think that those places and those schools were few and far between, but I know there were some. I think there have always been weak school systems with weak principals and some weak teachers that just did exactly what you said. I don’t know if the spotlight of No Child Left Behind or high-stakes testing has changed those schools. Maybe they have.
I look at the lists of schools that are struggling now with that spotlight and they have been struggling for a long time. There are examples. You’ve written about some and I’ve seen some where there really are some changes, but they’re so rare as to be the exception that proves the rule.
So I don’t know. The grad rule is a good example. The amount of money we spend every year giving reading and math tests to students to give them a high school diploma has done absolutely nothing to improve the graduation rate; it’s done nothing to improve the quality of the graduates. The level of proficiency represented in those tests is so low I don’t think it’s changed anything.
Have there been a few kids who’ve had to struggle for a few more years and take whatever kind of interventions the schools can design to get them to pass it? Sure. Is it justified? When I think about all of the things that we could be spending that money on, I just don’t see that as very valuable. But the grad rule is one of those things you can never take away because politicians can point to it and say, “we’ve raised our standards,” even though it’s by a trivial amount.
Minnesota schools have historically been very strong, but it hasn’t been because we’ve had testing. There’s a lot of other things we can do to strengthen schools. And the biggest one is having high quality, talented people go into the profession. What are we doing to promote that? We’re giving them more tests. We’re making them sign on to [the state’s controversial merit pay program] Q-Comp. We’re making them do more and more that has little to do with their profession and more to do with keeping track of their Q-Comp points. And it’s not going to attract people to the profession.
MP: What other observations after 22 years in the assessment arena do you care to offer?
JA: I’ve really enjoyed it. Like I said, I’ve enjoyed working at schools. I’m always amazed by how much teachers do, whatever it is you ask them — and by and large without much complaining. But I worry about it.
My daughter’s a teacher now, a high school teacher. I worry about the fact that she doesn’t have a rule of 90 to be able to retire before she’s 66. She’ll be teaching for 40 years. I worry a little bit about the size of her classes because her district can’t pass a levy. I’m frankly worried about the role of technology and what that’s going to introduce into her life.
But by and large, I’ve had a good run. I really feel pleased with what I’ve contributed and what I’ve learned from people over the years.