Fifty-six years ago, a journalist by the name of Darrell Huff wrote a book entitled “How to Lie With Statistics.” Intended for the lay reader, it contained funny illustrations and witty, cautionary passages explaining things like why correlation does not imply causation. Happily, it’s still in circulation.
I read it during my J-school days and my takeaway then was the same as it is now: In most instances, statistics are as fungible as any commodity gets.
I raise this because there are three interesting controversies ripping through national education policy circles this week that Minnesotans might want to attend to. All three concern the use of standardized tests to assess teacher effectiveness, a topic that’s on the agendas of Gov. Mark Dayton and of both caucuses in the current state Legislature.
It’s not unreasonable to ask, as we take up yet another incendiary teacher accountability discussion, whether our statistics might fib a little — or at least merit serious challenge.
Blogger challenges Rhee’s claims
It seems a retired Washington, D.C., math teacher with a blog has done what the nation’s education writers could not do: He challenged former D.C. Schools Chancellor Michelle Rhee’s claims about her classroom achievements. Apparently, she claimed on her résumé that she had raised 90 percent of her students, in both reading and math, to the 90th percentile or above. Supposedly, they started at the 13th percentile.
Rhee did boost test scores, but only into the 50th percentile, data gathered by blogger G.F. Brandenburg in a Jan. 31 post purports to show. Why do I say purports? Because the underlying datasets would tell me, personally, less than a cup of dried tea leaves. More important, by my lights, Brandenburg’s numbers add up for Washington Post “Class Struggle” columnist Jay Mathews, a respected veteran who takes a frequent drubbing on the blog.
To be clear, Mathews disagrees with Brandenburg’s assertion that Rhee lied. He suggests there is a more nuanced story to be told about Rhee, her principal and the conclusions they drew from the data available at the time. I think Mathews would know, but I still think Rhee, a reform rock star with a national platform, has some ‘splainin’ to do.
Her answers matter for several reasons. Rhee was a Teach for America teacher at the time she supposedly turned in her stellar performance. Similar claims of exceptional performance by TFA recruits and other graduates of alternative teacher-preparation programs, including Rhee’s own New Teacher Project, are at the heart of calls for alternative teacher-licensure provisions here and elsewhere.
It matters because it suggests that Rhee, who gained both a national following and notoriety for her assault on D.C.’s lowest-performing teachers, might not have survived the kind of scrutiny she imposed as chancellor. And of course it matters very much to teachers everywhere whose performance may now be tied to that of their students.
An LA dust-up
Tidy segue to raging controversy No. 2: In August, the Los Angeles Times published a story based on an analysis by a RAND Corp. researcher on measuring teacher effectiveness, along with a database showing how each and every Los Angeles Unified School District teacher fared using the rubric. Within hours of its posting online, the database had garnered a quarter of a million hits. A month later, the paper came under fire from the teachers’ union, which asserted that its publication contributed to the suicide of a teacher who was rated “less effective.”
In the wake of the controversy, teachers’ unions in New York and elsewhere have petitioned to keep teacher evaluation data private. The counterargument: Parents and taxpayers have a right to know how effective teachers are.
Seems a couple of researchers with the Research and Evaluation Methodology Program at the University of Colorado at Boulder have parsed the LA Times’ data and concluded that, as a measurement of teacher effectiveness, it is “deeply flawed.” The researchers were unable to replicate RAND’s findings. Indeed, when they reran the same data using their own methodology, about half of the teachers’ ratings changed.
You can read the report. My takeaway: The Times’ ratings were most accurate in assessing the highest and lowest performers; within the ranks of the “average” teachers, things get distinctly muddy, with potentially grave consequences.
Small wonder that colleges of education, next on the “value-added” assessment firing line, are freaked out that the National Council on Teacher Quality and U.S. News and World Report are changing the way they rate teacher-preparation programs. Many of the changes are reportedly attempts to address colleges’ concerns that the ratings process is not transparent or accurate, but the council does plan “to supplement the content-based analysis at the heart of its methodology with information on candidate classroom performance culled from ‘value added’ data,” according to Education Week.
The stakes are high, indeed: One of the reforms pushed by the Obama administration’s Race to the Top education funding competition was the ability to measure the effectiveness of teacher-prep programs by tying alumni performance in the classroom to student achievement.
Jim Angermeyr is the director of Research, Evaluation & Testing for Bloomington Public Schools and one of the designers of a widely respected value-added test that lots of Minnesota schoolchildren take two or three times a year. He’s also something of a standardized-testing skeptic.
‘The inferences we’re drawing can be wrong’
His view of the controversies, in my vernacular: We’re looking at a bunch of blind men fondling an elephant. Economists, in his opinion, tend to be very supportive of the use of value-added data in evaluation. Educators and psychometricians, not so much.
“It’s not necessarily that the methodologies are wrong,” he said. “It’s that the inferences we’re drawing can be wrong.”
The kids are the greatest of the variables, of course. The tests may tell you a student is reading better or sliding in math, but they don’t tell you whether she spent the summer with a tutor or he is so young the test isn’t as accurate as it would be in an older child.
Nor is the same test used from year to year. A particular student or teacher may fare better on a test closely tied to the curriculum than on one aligned with a set of knowledge-based standards.
“You leave out a lot of the potential variables,” Angermeyr said. “They’re just not at the point where we should use them to make decisions about jobs.”
Meanwhile, there’s a great deal of evidence that good old-fashioned classroom observation by peers and skilled principals is a terrific way to gauge teacher effectiveness. Ask any parent: Even before a child heads into kindergarten, it’s usually possible to figure out which teachers are coveted, which seem just fine and which need coaching or a new line of work.
It’s enough to make one wonder whether we should slow down and make sure value-added data really is adding value.