Gas station without pumps

2012 May 21

Can’t do value-added measures with a low-ceiling test

Filed under: Uncategorized — gasstationwithoutpumps @ 18:35

Everyone who knows anything about state tests knows that they are mostly low-ceiling tests, designed to separate those who have not learned any of the material from the marginally competent.  Few of them are designed to measure the performance of students in the top 1%.  This makes the use of state tests for “value-added” measures of teacher performance a serious hazard for teachers of gifted students: their students start out well ahead, so there is no room on the test for them to improve.  The mismatch between the test and what the students are currently learning can also result in lower-than-expected scores, as bored students make careless clerical errors.

This is not just a theoretical concern: one teacher in New York has left the profession after being unfairly branded “the worst eighth-grade math teacher in New York City.”  More than a third of her honors 8th-grade math class earned a perfect score on the Regents Integrated Algebra exam, and all of them passed (much better than most classes of 10th graders).

But they did not do well on the 8th-grade state exam, which tested basic arithmetic they had covered three years earlier.  With no consequences for the kids (only for the teachers), they had no reason to work hard on the boring test, and they goofed off.

The only way a value-added measure of teaching can work is if the test measures what the teacher is teaching.  If the test is arithmetic, and the teacher is teaching algebra, then the test cannot be used to measure how much value the teacher has added.  If an appropriate pair of pre- and post-tests had been used (like the Regents exam), I suspect that this teacher would have come out looking reasonably good.

2011 May 10

Value-added teacher ratings, LA Times does it again

Filed under: Uncategorized — gasstationwithoutpumps @ 18:31

The LA Times has once again published thousands of “value-added” teacher ratings:
Los Angeles public schools, value-added teacher ratings: Times updates value-added ratings for Los Angeles elementary school teachers.

My opinions on the practice have not changed much from when I blogged about their previous posting.  This time, though, they published a FAQ page, which makes the process more transparent.  They also display the confidence interval graphically and avoid artificially expanding the tiny differences in the middle of the range, both of which should help readers interpret the results.

The presentation this time seems to be much better, though the rather severe limitations of using the standardized test to measure teacher competence are still unavoidably present.

2010 August 16

Value-added teacher ratings

Filed under: Uncategorized — gasstationwithoutpumps @ 15:18

Yesterday the LA Times published a story about assessing all the LAUSD teachers by a value-added approach.  They will be publishing a database of rankings for 6,000 elementary school teachers later this month, after giving teachers a brief period in which to comment on their rating.

Unfortunately, the article does not give the technical details of how the “value added” is calculated.  There are hints that it is done by looking at the average difference between the achieved score for the students and the expected score for the same students based on prior achievement scores (perhaps using percentiles).  It makes some difference what scale the scores are on and what the ceilings are for the tests.
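To make the hinted-at computation concrete, here is a minimal sketch in Python of one plausible version: map each student’s prior-year percentile onto the current year’s score distribution to get an expected score, then report the class-average residual as the value added.  The function names and the percentile mapping are my own illustration, not the Times’s actual method.

```python
import numpy as np

def expected_scores(prior_scores, prior_cohort, current_cohort):
    """Convert each student's prior-year percentile into an
    expected score on the current year's score distribution."""
    percentiles = [100 * np.mean(prior_cohort <= s) for s in prior_scores]
    return np.percentile(current_cohort, percentiles)

def value_added(actual_scores, expected):
    """Class-average residual: how far the students landed above
    or below where prior achievement predicted."""
    return float(np.mean(np.asarray(actual_scores) - expected))

# Invented numbers purely for illustration:
prior_cohort = np.array([300, 350, 400, 450, 500])
current_cohort = np.array([320, 370, 420, 470, 520])
expected = expected_scores([400, 450], prior_cohort, current_cohort)
print(value_added([430, 480], expected))  # negative if the class underperformed
```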

For low-ceiling tests (as most state tests are), a teacher of gifted students who are already hitting the maximum scores on the tests will always be seen as having no value added, because the students can’t show their improvement.
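A toy simulation (all numbers invented) shows how the ceiling clips away the growth of high-scoring students: both classes learn the same amount, but the class that starts near the maximum shows almost no measured gain.

```python
import numpy as np

CEILING = 600      # invented maximum scale score
TRUE_GROWTH = 40   # one year of genuine learning, the same for both classes

rng = np.random.default_rng(0)
starts = {
    "typical class": rng.normal(450, 30, size=30),
    "gifted class":  rng.normal(590, 30, size=30),
}

for label, start in starts.items():
    pre = np.clip(start, 0, CEILING)
    post = np.clip(start + TRUE_GROWTH, 0, CEILING)
    print(label, "measured gain:", round(float(np.mean(post - pre)), 1))

# The typical class shows roughly 40 points of gain; the gifted class,
# already pinned at the ceiling, shows only a few points.
```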

Quite predictably, the teachers’ union is outraged over the LA Times publishing the data, going so far as to call for a boycott of the newspaper, though the Times appears to have acquired the data quite legally through public-records requests.  The union is worried that teachers will be penalized for poor performance on the rating system, and that the seniority-based system on which teacher promotions have historically been made is in danger.  They do have a good point that the tests used are a far-from-adequate measure of how much learning has taken place, but an imperfect measure is much better than no measure of student performance at all, and much better than the cronyism of personal evaluations by the principal, which is the current system.  Tests before and after an intervention (here, teaching the students for a year) are the standard way to determine whether an intervention is successful.

One reasonable critique of the method is its assumption of causality:  “Teacher A’s students on average advance more than Teacher B’s students”  implies “Teacher A is a better teacher than Teacher B”.  This is a reasonable assumption, but it is not guaranteed to be true.  There are a lot of reasons why a class may perform better or worse that are not related to the teacher.  Still, averaging over at least three years and looking only at large differences should eliminate a lot of the one-time artifacts and minor statistical fluctuations.  Systematic biases in the assignment of students (for example, if one teacher gets a lot of hard cases and another gets a lot of teacher-pleasers) can certainly distort the picture.
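A quick simulation (again with invented numbers) illustrates why multi-year averaging helps: if each class’s result is the teacher’s true effect plus classroom-level noise, the spread of the ratings shrinks roughly with the square root of the number of years averaged.

```python
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.0   # a perfectly average teacher
noise_sd = 5.0      # invented year-to-year classroom fluctuation (score points)

# Simulate many hypothetical careers to see the spread of ratings.
one_year = true_effect + rng.normal(0, noise_sd, size=100_000)
three_year = true_effect + rng.normal(0, noise_sd, size=(100_000, 3)).mean(axis=1)

print("spread of one-year ratings:  ", round(float(one_year.std()), 2))    # ~5.0
print("spread of three-year ratings:", round(float(three_year.std()), 2))  # ~2.9, i.e. 5/sqrt(3)
```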

Teacher bloggers will soon be ranting and raving over this move by the LA Times (see, for example, Rational Mathematics).  I expect most will question the validity of the tests, as that is the easiest target.  Personally, I think that the non-random nature of the selection of students for each class is likely to be a bigger source of error.  The response in Education Week’s Teacher Beat is more measured, pointing out some of the other conclusions (which are well supported by other studies), such as that who the teacher is matters more than which school, and that paper qualifications have little correlation with effectiveness (measured in this value-added way).

I see one other danger: any ranking system will always put someone on top and someone on the bottom.  If there are huge differences in teacher effectiveness (as there seem to be between the extremes), this is not a major problem, but the risk of amplifying small differences in effectiveness into large differences in rank (particularly in the middle of the pack) is high.
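A small simulation (district size and effect scale invented) makes the rank-amplification danger visible: near the crowded middle of the distribution, a trivial difference in true effectiveness can move a teacher dozens of places in rank.

```python
import numpy as np

rng = np.random.default_rng(2)
effects = rng.normal(0, 1, size=6000)    # 6,000 teachers, effects bunched near zero
ranks = effects.argsort().argsort() + 1  # rank 1 = lowest effect

# Two teachers near the median whose true effects differ by only 0.01:
a = int(np.argmin(np.abs(effects)))
b = int(np.argmin(np.abs(effects - 0.01)))
print("difference in effect:", round(abs(float(effects[a] - effects[b])), 3))
print("difference in rank:  ", abs(int(ranks[a]) - int(ranks[b])))
# With effects this tightly bunched, a 0.01 gap in effectiveness
# typically translates into a rank gap of a few dozen places.
```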

Does anyone have more detailed information about the methods used in the LA Times ranking system?  I suppose the thing to do is to contact “Richard Buddin, a senior economist and education researcher at Rand Corp., who conducted the statistical analysis as an independent consultant for The Times.”  The RAND staff page gives his contact information as well as his CV.  His credentials certainly look good, and this is not the first time he has analyzed student performance to measure the effectiveness of teachers.
