Gas station without pumps

2010 December 18

Recent posts on standards-based grading

Filed under: Uncategorized — gasstationwithoutpumps @ 17:19

There are a couple of posts on standards-based grading that I’ve been meaning to talk about.

One is a New York Times article by Peg Tyre that presents standards-based grading (in 8th grade math at Ellis Middle School) as a panacea.  “Knowledge grades” are based solely on averaging end-of-unit tests.  There is a separate “life grade” for things like work habits, effort, and citizenship.   While this is a positive change in many ways (no more extra credit for bringing the teacher school supplies), it misses the point of standards-based grading, which usually means splitting the “knowledge” into separately evaluable topics or skills and allowing students to keep working until they have mastered each topic, not just averaging their one-shot end-of-unit tests.

I don’t know whether Ellis Middle School or  Ms. Tyre is abusing the terminology, or whether Ms. Tyre simply left out the important part of Ellis Middle School’s change in order to meet a deadline.  Either way the article is not about standards-based grading, despite the prominent mention of “a new, standards-based grading system.”

More recently, quantumprogress had a post on “Perfectionism and SBG“, in which he (she? I’ll assume male for the rest of this post, as the odds are higher, given that the blogger is a physics teacher) comments on the behavior of perfectionists under SBG scoring schemes in physics classes.

He wonders why a student wants to have perfect scores on all standards going into the test, particularly when the student expresses it as “I would like to enter the exam having showed understanding on absolutely everything.”  He doesn’t believe it is because the student really wants to understand everything, but relates it to “conceptual consumption,” in which ideas are treated like things to be acquired as status symbols (like some bird watchers’ life lists of species sighted).  He also relates the idea to video game achievement lists as described in “Conceptual Consumption and Kicks to the Head” (see also my posts Experience points for classes and Just scoring points).

He idealistically says

I want them to see learning as a process, I want them to see that just when you think you understand a concept, you realize how much more there is to understand, and that is a good thing. Precisely for this reason, it’s pointless to put together a list of “intellectual achievements” for you to achieve, …

Personally, I think that the video game designers have tapped into a good way to get engagement, and that we are better off as educators trying to exploit the gaining-points mentality than trying to fight it.  There is significant risk if the point system is poorly designed, both of students forgetting the material as soon as the points have been earned and of not valuing anything that does not have points.  But I believe that addressing those challenges head-on in designing the grading system works better than wringing one’s hands and wishing the students valued learning for its own sake (of course, a few will, and we don’t want to lose them in the point-chasing crowd).

So, as I see the challenge, it is not to magically instill a love of learning in students, but to set up a system in which continuous small rewards keep students’ attention on the learning and get them to push themselves to do things that previously they would have regarded as either impossible or not worth the effort.  The small rewards need not be “points” or “grades”, but they have to be observable by the students (changes in their knowledge and skills, which outsiders can easily observe, are not really accessible to student perception). Some larger rewards (like the “achievements” lists in some video games) can also be motivating for longer-term commitment.

I think that another point that came up in the comments on Conceptual Consumption and Kicks to the Head is important:

However given the short checkpoints, the challenge never felt insurmountable, and I generally (with a few exceptions) could tell where it was a flaw in my skills or tactics that had let me down—and so the fun was in trying to overcome that.

What do we have as educators that corresponds to the “short checkpoints”? What do we have that automatically tells students which flaw in their skills or tactics let them down, so that they immediately want to try again?


  1. Good points. I think standards-based grading, at least the way I try to implement it, can offer students the short checkpoints and give them the tools they need to understand their mistakes and the desire to try again. My greater issue is with students who think that’s all there is to do—show mastery of every concept, without ever really asking why, or trying to put their new understanding to use in new and unfamiliar situations. Of course, these are idealistic and rather nebulous goals, but I think they are worthy ones for our students to have. I also love the ways video games are implemented to teach you through small incremental progress and continuous feedback, and would love to make this more of a part of my classroom—we use WebAssign for homework, which is a big, but somewhat awkward, help, but I’d still like to get to the point where I could instantly grade quizzes (without having to move to multiple choice) when kids take them, and have them leave the room with a clear understanding of what they need to work on that night. I see something like this developing in this blog post about integrating social media into education, where a teacher was able to create an online certification center, so students could automatically assess and certify themselves as experts in various concepts (say reading velocity graphs or Free Body Diagrams in physics) and then students who are struggling with these topics could seek out these ‘experts’ for help.

    Comment by quantumprogress — 2010 December 18 @ 19:20

  2. Hello, I’ve been reading through your SBG-related posts and finding them very helpful. You have my gratitude for tackling this issue with a critical eye from a post-secondary perspective — I was about to give up on finding just such a thing. I teach electronics in a 2-year diploma program in Canada (we fall in between what Americans would call “vocational/technical” and an associate’s degree, neither of which exist in Canada). Reading the blogs of SBG advocates made me realize that I am already doing some of it, but I have some of the problems you describe w/r/t retention, synthesis, and reassessment. Here’s what I plan to try in January, including some tentative solutions to the problems you describe. Although I’ve been doing elements of this for a year now, I’m a new teacher talking about a program I haven’t even implemented yet — caveat emptor!

    Before the “what”, here’s the “why”. I’m asking myself: do my students know what they’re weak at? Do they know specifically what they need to do to improve? I’ve realized that answer is no, or at least not as much as they could. So SBG, or some elements of it, might help. If your students already know those things, and have opportunities to raise their grade as (if) their skills improve, then maybe SBG is a solution for a problem you don’t have.

    Retention: I plan to let grades go down as well as up. I don’t see how this contradicts the ideals of SBG… in fact I think it implements them more completely (let the grade reflect your understanding *now*.) I agree with you about the problem of noise-free measurement. However, I don’t see how this is worse than the conceptual void of summative assessment. I also plan to let my students know that any skill can show up on any test, so they’d better keep them fresh. I thought about doing some kind of weighted average of the last n scores (n>1) — haven’t decided. However, I don’t think this is incompatible with SBG. Even the purest single score keeper is using a window — it’s just a really narrow window (the length of the assessment). This is a quantitative difference, not a qualitative one.
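A minimal sketch of the "weighted average of the last n scores" idea above. The linearly increasing weights (newest score counts most) and the function name `windowed_score` are assumptions made for illustration; the comment leaves the weighting undecided. Setting `n=1` recovers the "really narrow window" of a pure latest-score scheme.

```python
from typing import Sequence


def windowed_score(scores: Sequence[float], n: int = 3) -> float:
    """Weighted average of the most recent n scores, newest weighted most.

    The weights 1, 2, ..., k (oldest to newest inside the window) are an
    illustrative assumption, not something specified in the comment.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not scores:
        raise ValueError("no scores to average yet")
    window = list(scores)[-n:]            # keep only the last n assessments
    weights = range(1, len(window) + 1)   # 1 for the oldest, k for the newest
    return sum(w * s for w, s in zip(weights, window)) / sum(weights)


history = [2, 4, 5, 3]  # hypothetical scores on one skill, oldest first
print(windowed_score(history, n=1))  # only the latest score counts
print(windowed_score(history, n=3))  # last three scores, newest weighted most
```

Because the newest score always carries the largest weight, a weak recent assessment pulls the grade down, which is the "grades go down as well as up" behaviour described above.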

    Synthesis is my next problem. I’m thinking of using a 5-point scale where getting a 5/5 means using the skill in combination with at least n other skills. If you can only demonstrate a skill in isolation, you’re stuck at a 4. (For very small values of 4. I will make 4 actually a 79%, since the school considers 80% to constitute “honours”).
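The cap described above can be sketched as follows. Only the 4 → 79% mapping comes from the comment; the other percentages, the default `n=2`, and the names `skill_level` and `PERCENT_FOR_LEVEL` are placeholder assumptions.

```python
# Only the 4 -> 79% entry comes from the comment; every other percentage
# in this table is a placeholder assumption.
PERCENT_FOR_LEVEL = {0: 0, 1: 20, 2: 40, 3: 60, 4: 79, 5: 100}


def skill_level(demonstrated_level: int, skills_combined_with: int, n: int = 2) -> int:
    """Cap a skill at 4 unless it was shown together with at least n other skills."""
    if not 0 <= demonstrated_level <= 5:
        raise ValueError("level must be between 0 and 5")
    if demonstrated_level == 5 and skills_combined_with < n:
        return 4  # demonstrated only in isolation: stuck at 4
    return demonstrated_level


isolated = skill_level(5, skills_combined_with=0)
combined = skill_level(5, skills_combined_with=3)
print(isolated, PERCENT_FOR_LEVEL[isolated])  # capped, just below honours
print(combined, PERCENT_FOR_LEVEL[combined])  # full marks need synthesis
```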

    Another tactic: keep a separate score for the conceptual skill (does this application require a leading or lagging circuit? Why?) and the mathematical skill (how much is this circuit leading/lagging). You can get a 4/5 on the former and a 1/5 on the latter. In your application: when you get the hopelessly jumbled program, you could record it as “0” (nothing to assess yet), and request that the student rework it in pseudo-code or flow-chart format. Would that allow you to tell whether the problem was Markov chains or Python syntax?

    Another option: give partially solved problems. I don’t know if this makes sense in your application, but sometimes I can solve a problem up to different points, and ask the student to identify what the next step should be, or ask them to do the next step. Or give them an incorrect solution and ask them to find the problem and correct it. Or explain in what ways the result will be distorted by this kind of error. If there are several possible answers, so much the better. Every time the student chooses strategy A, their grade changes for A. If B is also a valid strategy, but they never choose it, their grade for B stays in the “nothing to assess yet” category (0/5).

    Finally, I think it’s perfectly reasonable to ask the student to propose the question or application. If you want a 5/5, come see me during office hours, be prepared to present to me an application of a leading or lagging circuit that I have not covered in class, and be prepared to answer questions about why this is a logical use without resorting to your iphone. Maybe they even have to propose the topic and book the appointment a few days in advance.

    Reassessment is not cheap: that leads to the idea of how difficult it can be to craft a reasonable test of a skill. Don’t have any ideas about this except making the student do it (if that’s even possible). If they really want to raise their grade, perhaps they will. Or find like-minded people teaching in your exact same discipline and create a bank of assessments.

    Project-based learning: in this case, I think SBG is not the solution (or rather, SBG is a solution to a problem you don’t have.) In a project, where you are building something or designing something and you know all the attributes that thing must have, it is often obvious what’s not working. SBG probably doesn’t have any additional clarity to add. I would argue that SBG is not the cause of the reductive view of learning that you describe — rather, our school system is. SBG just de-obfuscates that fact. Let’s not shoot the messenger.

    Finally, you’ve written about the problem of students enrolling in your class without the pre-reqs. Does this mean they are being admitted without having taken a programming course? Or that they’re nominally passing the programming course without being able to do anything useful? If the latter, maybe your contribution to SBG-land is to convince the programming prof to use it ;) If the former, perhaps it’s time to exclude these people from the class.

    If neither is possible, I can think of one other option, but it may be unpalatable. Don’t grade them on their programming. It’s not a programming course, right? So don’t grade them on it. Grade them on evidence of their understanding of Markov chains (or whatnot). If they are unable to provide *any* evidence of that, then, well, why on earth should you give them any points? I’m suggesting withholding the points — not the feedback. When you get that jumbled mess of code, it’s probably worth writing a note explaining that their programming skills are too weak for you to see any evidence of what they’re learning in your course. They should withdraw and save their money and time, or get a tutor, or (etc.) if it’s truly that bad (I teach C and PLC programming, so I’m in a position to believe you). If it’s not that bad, then they should (insert feedback here) but giving them points for programming badly, in a non-programming course, doesn’t seem beneficial.

    I hope this is of some use to you, even if only as grist for the mill. If you have any thoughts on how to poke holes in my plan, I hope you will go ahead — otherwise my students will certainly do it in 2 weeks!

    Best regards, and thanks again for your thoughtful posts.


    Comment by Mylene — 2010 December 18 @ 23:15

    • Lots of food for thought here. Just some quick knee-jerk reactions (which may change after I’ve thought some more).

      Programming pre-reqs: The seniors told me that they had never designed a program themselves, just coded routines in scaffolding created by the instructor. I asked a CS prof who teaches some of the courses and is known to care deeply about pedagogy, and he talked with the instructor. It turns out that the students were right—there had been extensive scaffolding even in the “advanced programming” class. The professor is going to have long discussions with the instructor about the intent and level of the courses, so that the students get scaffolding at the beginning of the series, but are weaned off of it by the advanced programming course.

      Not grading programming: The course I teach is one of the last ones the undergrads take, but the first one the grads take. Assessing whether the students can program (and letting them know whether they are strong or weak programmers) is an important function of the course, and helps the students determine what sort of research projects they should undertake or what remedial work they need in programming. Programming is a crucial part of bioinformatics, but not everyone who comes into the program has the level of skill we expect to see, especially as we take a fair number of biologists who started programming only a year ago. So though the course is not primarily a programming course, I do talk a bit about what makes for good bioinformatics programs and I do evaluate the students on the skills they bring to the course (or develop during the course). The course is on models and algorithms, and I don’t believe anyone can really understand the algorithms we study without implementing them in a program.

      There have been attempts in bioinformatics to create various “courseware” depositories. I’ve been involved in two such projects, both of which seemed to me to be failures: either nothing was contributed or stuff was contributed but no one wanted to use it. I’ve shared my assignments freely, and I think that one or two of them have been picked up elsewhere, but the core of bioinformatics is still seen very differently by different practitioners, so there has not been as much sharing as one would like. I’ve yet to see someone else’s assignment that I thought I could work into the course (aside from assignments that former students helped me develop for the class). The last time I developed a completely new assignment, two of us worked on it for about 3 weeks to get a rough draft we were ok with. Creating projects that are interesting, instructive at the grad level, and doable in a week is hard!

      I have been doing reassessment (in the form of resubmissions of writing or programming assignments) for quite a while. Grades can definitely go down as well as up, though usually there is at least some improvement from one draft to the next.
      I don’t average the grades, but report all of them (with notes on what got better and what didn’t) in narrative evaluations. I do have higher standards on the regrades, as they have had the benefit of my feedback and debugging help, and I want to see more than just them fixing the stuff I showed them how to fix.

      The final letter grade tends to be approximately an average of the final draft grades, though I sometimes weight assignments by their importance.
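As a rough sketch of that arithmetic: take the last draft's grade on each assignment and average them, optionally weighting by importance. The numeric grades, the assignment names, and the function `final_grade` are hypothetical; the actual evaluations described above are narrative, and the letter grade is only approximately such an average.

```python
from typing import Dict, List, Optional


def final_grade(drafts: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the final draft's grade on each assignment.

    Earlier drafts are ignored here; assignments without an explicit
    weight default to a weight of 1.
    """
    total = 0.0
    weight_sum = 0.0
    for name, grades in drafts.items():
        if not grades:
            raise ValueError(f"no drafts recorded for {name}")
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total += w * grades[-1]   # only the final draft counts
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("nothing to average")
    return total / weight_sum


# Hypothetical grades, one list per assignment, drafts in order submitted.
grades = {"hw1": [70, 85], "hw2": [90], "hw3": [60, 55]}
print(final_grade(grades))                # equal weights
print(final_grade(grades, {"hw3": 2.0}))  # hw3 counted as twice as important
```

Note that hw3 in this example went down on resubmission, and the lower final draft is the one that counts.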

      Comment by gasstationwithoutpumps — 2010 December 18 @ 23:53

      • Interesting. Quick note: clearly, above, I meant that I hadn’t found any other post-secondary blogs in engineering and related fields on this subject. There are lots of post-secondary blogs that address SBG.

        Over the course of your SBG posts, you’ve defined these problems:

        1. Creating new assessments
        2. Synthesis
        3. Retention
        4. Motivation

        Here are my new, improved thoughts after reading your reply. Warning: this is long (I went back to make it shorter and it got longer. I’m going to stop now). I find this fascinating and helpful in picking apart my own ideas, so I hope this isn’t too much. If you want to continue the conversation offline, let me know. (If you simply don’t have the time to continue the conversation at all, no problem!)

        Create new assessments or redefine “topics”?
        “The hard part is not listing topics but coming up with meaningful assessments…” “I have a hard time pinpointing exactly what I want the students to know in a testable way.”

        What if you’re modeling the wrong object? I know that some people (notably Dan Meyer) assess a “concept list” but others (I will soon be among them) assess skills instead. Here’s the before-and-after view:

        Concept list: Hidden Markov Models
        Skills list:
        – can develop program to produce statistics
        – can use statistics to answer biological problems

        You put it like this: “I am not so concerned whether they have gotten all the details of Markov chains or of specific models for DNA sequences, as whether they have the ability to move from the description of a stochastic model to an implementation and to use that implementation to test a hypothesis.”

        So. If “move from a stochastic model to an implementation” is a skill that they do in all 6 programming assignments, then voila, 6 assessments. No need to spend weeks creating more assignments.

        “Part of my problem is that I’m not interested in whether the students know factoids about biology, programming… I’m interested in their synthesis”

        I am going to be so bold as to suggest that you *do* care. You care about their underlying skills because you are getting a headache while trying to grade hopelessly jumbled, ungradably messy assignments. And even if you don’t need this information, the students do.

        I’m going to go further out onto a limb from my previous comment and suggest that SBG isn’t inherently more reductionist than any other grading system. Answering an experimental question like “where [is] the flaw in their skill” requires testing variables independently while holding the others controlled. This isn’t reductionist: it’s experimental design (yes, I realize that there may be more than one causal variable; but can you think of a faster way to find that out than by testing the variables one by one?)

        Someone has to test the underlying skills in isolation from each other. Specifically, those tests need to “produce statistics to answer problems.” ;) People who could do this include
        – the profs of the pre-req courses, based on assessments of individual skills
        – the students themselves, based on their knowledge of their own strengths
        – you, based on… what? What statistics can answer this question?

        If the synthesis you require obscures data about the prerequisite they need to improve, there’s nothing you can do to help, and the problem will have to be solved by someone else. Let’s note that if the students’ weaknesses are obscure to you (with trained design/analysis skills and lots of experience about the causes of common mistakes), they are almost certainly obscure to the students.

        And no game layer of experience points can change that meaningfully.

        You ask, “How do we get students to value the knowledge and skills they acquire in class enough to retain them for future use?”

        I propose that video games create short-term engagement, but I know of none that create long-term engagement. Replayability is notoriously short. Except in the case of the small minority of players who devote their lives to competitive Donkey Kong. And I bet they are at least as rare as the students who love learning for its own sake. ;)

        I can think of 3 things that sometimes improve retention.
        1 — Repeated use of the material (mentioned by other commenters, to whom you replied that sometimes this works out in the curriculum design and sometimes it does not), which is independent of grading scheme.
        2 – Knowing what the heck the point is *before* I start learning. Most textbooks are organized like this: Learn some abstract, disconnected, apparently pointless stuff. Then, find out what it’s for. There is no good reason for this, and it’s bad design to boot (Dan might even call it pseudocontext). You don’t start by writing a program and then finding something to apply it to. You start by analyzing what the goal is. Then you find or create a tool that accomplishes your goal. So teaching in the reverse order can help. (Layman’s overview of “the point”, technical details, review of “the point”). Again, this is about curriculum design, and is independent of evaluation scheme.
        3 – Feeling like I’m in control of my improvement. (See above).

        “What do we have as educators that corresponds to the “short checkpoints”? What do we have that tells students where the flaw in their skill or tactics that let them down, automatically so that they immediately want to try again?”

        I go back to my initial point: the checkpoints should be small in the sense of bite-sized, but they don’t have to be small in time. It seems like you are trying to time-division multiplex when it would be more useful to frequency-division multiplex. Here are the things you’ve written that sound like SBG-testable skills you are proposing (the pattern of repetition suggests maybe only 3-4 discrete skills):

        – Choose the right tools to solve a bioinformatics problem (note: it might be possible, at least sometimes, to assess their choosing without assessing their solving. “I picked this tool because (forethought)” is a testable thing.)
        – Create programs
        – Create programs that produce meaningful statistics
        – Use statistics to answer biological problems (here again, it might be possible to give them the statistics – or maybe just a working program – and have them practice their “answering the problem” skills)
        – Create programs that produce statistics and use them to answer biological problems
        – Do research
        – Do research and write it up.
        – Analyze what problem you need to solve and design a program that can do that
        – Write programs, do research, write up the programs and research.
        – Move from description of stochastic model to implementation
        – Move from implementation to test hypothesis
        – Move from description of stochastic model to implementation to testing a hypothesis

        Several of these have pieces that are physically possible to assess independently, and could be ongoing (with increasing complexity) throughout the semester. I bet you would only have to assess the pre-req skills once (maybe a small, trivial assignment that students are expected to have completed before the first class meeting). But if you do need to create reassessments for the pre-req skills (toy programs), these are *much* cheaper than creating a whole new synthesis project.

        The existing 6 synthesis projects are surely more than enough to satisfy any SBG evangelist ;) The component skills don’t have to be separate; they just have to be discernible. (And if the student passed in a toy program that is total crap, that probably gives you some insight into what’s wrong with Project 1).

        Something else that we’ve danced around:
        Course and Program Design

        On one hand you don’t care about their ability to write toy programs. On the other hand you write that “assessing whether students can program is an important function of the course.” I don’t think you can have it both ways, and your current quandaries reflect this contradiction, rather than a failing of any grading scheme. Either you can already tell whether their programming skills are good enough (in which case you are able to give them a grade for “programming” on Project 3), or you can’t tell, which means you will need to assess those skills independently. How else can you fulfill important functions of the course like what sorts of research projects they should undertake? It looks to me as if there are three options.

        a) Pre-req courses stay the same
        b) Your course stays the same
        c) Most students do well in your course and make good decisions about future research, and you grade their assignments without needing painkillers.

        Pick any two.

        You state that “SBG assumes that diagnostic testing is cheap.”
        I disagree — SBG assumes that diagnostic testing is necessary.

        So, that’s my attempt to debug some of these issues… note that in this process, I am attempting to evaluate a messy synthesis when I have not independently assessed the underlying parts ;) so I recognize that I could be way off base here. Let me know.

        (Incidentally, “Let me know” may be applicable to your grading. What if part of the requirement for project submission was a short paragraph having the student let you know what they had the most trouble with? You might not agree, but it would be information about their perception, at least. And might help their ability to discern the “changes in their knowledge and skills[, which are] not really accessible to student perception” otherwise.)

        Ok, that’s enough of that. Hope the end of your semester went well!

        All the best,

        Comment by Mylene — 2010 December 21 @ 00:24

  3. Lots of good ideas in your comments, Mylène. I like getting substantive feedback in the comments, though your comment here is a bit big to digest in one sitting.

    As always when I get feedback, I have some knee-jerk reactions that may be totally wrong (it generally takes me several days to figure out which of my reactions are automatic defensiveness and which really reflect my beliefs and reasoning).

    I do give a pre-req programming skills check: the first of the six programming homeworks is a short “warmup” exercise. Unfortunately, it is also in a new language for most of the students (Python), and so it is difficult to separate the mistakes due to an unfamiliar language from the mistakes due to incompetent programming. The exercise does serve several purposes: to assess whether the students know some Python or can learn it quickly, to find out if they know how to document a program, and to find out if they can do simple I/O. The feedback I give the students on this homework often improves the subsequent homeworks substantially, since the students know that I will be reading their programs and looking at the in-program documentation (something most of them have never experienced in previous courses). Because most of the students are learning Python at the same time as doing this assignment, it has to be small—so small that it can’t really test program design skills, which only come out when a program is somewhat larger and more complicated.

    It was on the second or third assignment that the students who had never designed a program before (only done coding on scaffolded assignments) started realizing that they were in trouble. Together we managed to diagnose an unexpected deficiency in the prereq courses (and I’ve started trying to address that problem, though it is difficult as the course is in another department and our students make up only a tiny part of the enrollment. Luckily I’ve found a faculty member in the home department of the course who shared my concerns and will try to get them addressed.)

    I like the idea of asking the students what aspect of each assignment gave them the most difficulty. I might also ask for suggestions for ways that the difficulty could be reduced without reducing what they learned from the assignment. (Sometimes the difficulty is an essential part of the learning, sometimes it is a distraction from the essential parts.)

    Your skills list for the bioinformatics class is an interesting one, though not quite the list I would have chosen. (So what would I have chosen, if trying to express it in that form? I’ll have to get back to you on that.) One that struck me immediately as outside the scope of the course is “Analyze what problem you need to solve and design a program that can do that.” That seems to me too big for this course, which is usually only the second bioinformatics course the students take (the first one for most students is a “tools” class which teaches them about existing tools and how to select the appropriate one). I’m perfectly willing to shove the much more difficult task of novel algorithm design from statements of biological problems to later courses and thesis research. For this class, I’m content if they learn some of the standard algorithms of the field (like dynamic programming) thoroughly enough to implement them. I do try to teach them how to derive the algorithms from the problems, so that they know how to think about coming up with new dynamic programming algorithms, but I don’t expect them to be able to do that themselves in their first course.

    That’s it for the knee jerk. I’ll try to make more substantive responses later, either in the comments or in other blog posts.

    Comment by gasstationwithoutpumps — 2010 December 21 @ 08:33
