I have been looking at the latest fad sweeping education, standards-based grading (SBG), and trying to see if it is something I should incorporate into my own grading practices.
My first post on SBG looked at some of the assumptions and guiding principles of SBG, concluding that it looked like a good idea if you took a reductionist view of education, where you could split your objectives into separately assessable standards.
My second post on SBG looked at the unspoken assumption that assessment is cheap, something that is not the case in many of my classes.
Another problem I have with SBG is that for a lot of standards, the goal is sustained performance, not one-shot success. It isn’t enough to get a comma right once—you have to get almost every comma right, every time you write. Similarly, it isn’t enough to use evidence from primary and secondary sources and cite them correctly once—you have to do it in every research paper you write.
If you forget to include the “sustained performance” or “automaticity” (to use a buzzword that elementary math teachers seem fond of) components of the standards, you get a sloppy implementation that reinforces the do-it-once-and-forget-it phenomenon that makes students unable to do more advanced work.
SBG aficionados believe in instantaneous, noise-free measures of achievement. If students take a long time before they “get it” but then demonstrate mastery, that’s fine. This leads to the practice of replacing a standard’s recorded grade with the most recent one. I think that is ok, as long as the standard keeps being assessed, but if you stop assessing a standard as soon as students have gotten a good enough score (which seems to be the usual way to handle it), then you have recorded their peak performance, not the best estimate of their current mastery. Think about the fluctuations in stock prices: the high for the year is rarely a good estimate of the current price, even if prices have been generally going up.
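A small simulation makes the point concrete. This is a hypothetical illustration, not data from any real class: true mastery rises steadily, but each observed score carries random noise, and the peak observed score systematically overstates where the student actually is.

```python
import random

random.seed(0)

# Hypothetical illustration: true mastery rises from 60 to 78 over
# ten assessments, but each observed score carries random noise.
true_mastery = [60 + 2 * i for i in range(10)]
scores = [m + random.gauss(0, 8) for m in true_mastery]

peak = max(scores)                 # "record the best score ever"
latest = scores[-1]                # "record the most recent score"
recent_avg = sum(scores[-3:]) / 3  # average of the last few

print(f"final true mastery: {true_mastery[-1]}")
print(f"peak score:   {peak:.1f}")    # biased upward by the noise
print(f"latest score: {latest:.1f}")  # unbiased but noisy
print(f"recent avg:   {recent_avg:.1f}")
```

The peak is the maximum of ten noisy draws, so it picks out whichever assessment happened to catch a positive fluctuation—exactly the year-high stock price of the analogy.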
If you want to measure sustained performance, you must assess the same standard repeatedly over the time scale for which you want the performance sustained (or as close as you can come, given the duration of the course and the opportunity costs of assessment). The much-derided average is intended precisely for this purpose: to get an accurate estimate of the sustained performance of a skill.
SBG tries to measure whether students have mastered each of a number of standards, under the assumption that mastery is essentially a step function (or, at least, a non-decreasing function of time). Under this assumption, the maximum skill ever shown in an assessment is a good estimate of their current skill level. There is substantial anecdotal evidence that this is a bad assumption: students cram and forget. Indeed, the biggest complaint of university faculty is that students often seem to have learned nothing from their prerequisite courses.
Conventional average-score grading makes a very different assumption: that mastery is essentially a constant (that students learn nothing). While cynics may view this as a more realistic assumption, it does make measuring learning difficult. One of the main advantages of averages is that they reduce the random noise from the assessments, but at the cost of removing any signal that does vary with time.
The approach used in financial analysis is the moving-window average, in which averages are taken over fixed-duration intervals. This smooths out the noisy fluctuations without eliminating the time-dependent variation. (There are better smoothing kernels than the rectangular window, but the rectangular window is adequate for many purposes.) If you look at a student’s transcript, you get something like this information, with windows of about a semester length. Each individual course grade may be making the assumption that student mastery was roughly constant for the duration of the course, but upward and downward trends are observable over time.
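A minimal sketch of the rectangular-window version, applied to a made-up sequence of assessment scores (the numbers here are invented for illustration):

```python
def moving_average(scores, window):
    """Rectangular-window moving average: the mean of each run of
    `window` consecutive scores."""
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Noisy but generally improving scores for one standard.
scores = [55, 60, 58, 70, 65, 75, 80, 78]
print(moving_average(scores, 3))
```

The smoothed sequence still shows the upward trend, but the assessment-to-assessment jitter is damped—unlike a whole-course average, which would flatten the trend away entirely.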
Can SBG be modified to measure sustained performance? Certainly the notion of having many separate standards that are individually tracked is orthogonal to the latest-assessment/average-assessment decision. Even student-initiated reassessment, which seems to be a cornerstone of SBG practice, is separate from the latest/average decision, though students are more likely to ask for reassessment if it will move their grade a lot, and the stakes are higher with a “latest” record. Student-initiated reassessment introduces a bias into the measurement, as noise that introduces a negative fluctuation triggers a reassessment, but noise that introduces a positive fluctuation does not.
Perhaps a hybrid approach, in which every standard is assessed many times and the most recent n assessments (for some n>1) for each standard are averaged, would allow measuring sustained performance without the assumption that it is constant over the duration of the course. If the last few assessments in the average are scheduled, teacher-initiated assessments, not triggered by low scores, then the bias of reassessing only the low-scorers is reduced.