Gas station without pumps

2012 May 5

Grading scales

Filed under: Uncategorized — gasstationwithoutpumps @ 21:43

I have seen a lot of teacher blogs and forum posts in which some percentage⇒grade scale is taken as obvious or mandated.  These generally follow a simple pattern, with A, B, C, and D each taking up 10% of the range, and <60% being an F (see this web page, for example).  There is nothing sacred about this way of distributing grades: it is simply a US tradition, and it means designing tests so that most questions are very easy.

The consequence of this test design strategy is that most of the questions on the test tell you nothing about most of the students, and only a tiny fraction of the questions distinguish the good students from the excellent ones.  From an information-theoretic standpoint, this sort of test design only makes sense if all you care about is separating the failures from the rest of the students—otherwise it is a really stupid design choice.

Not all tests are designed in this stupid way.  For example, the AP tests, which are intended to group the test takers into 5 groups (5, 4, 3, 2, and 1, roughly corresponding to the traditional meanings of A, B, C, D, and F grades), use a scale in which getting about half the points will result in a “3”.  Here are the scores that I’ve been told are needed for the released 2008 AP Biology test (in the new scoring system, which requires students to guess stuff they don’t know, as there is no correction for random guessing):

  1. 0–57   0%–38%
  2. 58–68   39%–45%
  3. 69–80   46%–53%
  4. 81–94   54%–63%
  5. 95–150  63%–100%

Note that there is a symmetry here, with about as many points allocated for 1s as for 5s (rather than the 6-to-1 ratio of F-to-A in the traditional US scale).  There are still a lot of easy questions, but there are also a lot of hard ones.  (Each AP test is scaled differently, as the difficulty of the questions is not precisely matched, so calibration is done to try to make the final 5-point scores have comparable meanings from year to year.) Knowing any 2/3 of the AP Bio material thoroughly is enough to get a 5, as is knowing all of the material pretty well, but knowing less than half the material will result in a failure.
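
The symmetry can be checked with a few lines of Python. This is only a sketch: the cutoffs are the ones listed above for the released 2008 exam, and the names `CUTOFFS`, `ap_score`, and `band_widths` are made up for illustration.

```python
from bisect import bisect_right

# Lowest raw score that earns a 2, 3, 4, and 5 on the released 2008
# AP Biology exam, as listed above (150 raw points in total).
CUTOFFS = [58, 69, 81, 95]
MAX_RAW = 150

def ap_score(raw):
    """Convert a raw score (0 to 150) to the 1-5 AP scale."""
    if not 0 <= raw <= MAX_RAW:
        raise ValueError("raw score out of range")
    # bisect_right counts how many cutoffs are <= raw.
    return 1 + bisect_right(CUTOFFS, raw)

def band_widths():
    """Number of distinct raw scores that map to each AP score."""
    edges = [0] + CUTOFFS + [MAX_RAW + 1]
    return {score: edges[score] - edges[score - 1] for score in range(1, 6)}

widths = band_widths()
print(widths)                        # raw scores in each band
print(widths[1] / widths[5])         # 1-to-5 ratio on the AP scale
print(60 / 10)                       # F-to-A ratio on the traditional scale
print(ap_score(round(150 * 2 / 3)))  # score for 2/3 of the points
```

With these cutoffs the bands for 1 and 5 hold 58 and 56 raw scores, a ratio close to 1, against 6 for the traditional scale.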

Getting a 4 on an AP exam is not a matter of making one or two careless mistakes (as a B often is on traditionally scaled exams), but of not being able to answer a substantial fraction of the questions.

Although I like the AP scale much better than the traditional US scale, I don’t think that it is an optimal design.

Most people are interested in the border between 2s and 3s (since that is where many colleges put the no-credit/credit boundary), but placement in college courses may depend on thresholds above that (the 3-4 or 4-5 thresholds).  No one much cares about accurate placement of the 1-2 threshold, as both are considered failing (like an F or a D).  If I were designing a test to be used the way AP scores are used, I’d want a lot of questions to be near each of the difficulty levels associated with the important boundaries, with the 2-3 boundary getting the most attention.  I think that such a test would result in a scale more like

  1. 0%–5%
  2. 5%–30%
  3. 30%–60%
  4. 60%–80%
  5. 80%–100%

Note that this would reduce the ability of the test to distinguish at the top end slightly, but those distinctions are being lost when the raw score is converted to the 5-point scale anyway.

Such a test would have fewer easy questions than the current AP tests, and either fewer hard questions or fewer total questions (depending on whether the current difficulty of getting a 5 is in the difficulty of the questions or simply in the time pressure of the test).

Of course, it may be quite difficult to design such a test, as it is difficult to identify “easy” questions to eliminate.  I suspect that the exam writers have already discarded all the questions that most students scoring a 3 or better get right. A very informative question will split the test takers into two roughly equal-sized groups, but different questions will split the test takers differently.  The problem is that different students have learned different subsets of the many different types of knowledge and skills being tested.  (Note: I’ve simplified a bit here, as the 50-50 split is only maximally informative if the scores for the questions are independent of each other; we actually want to select conditionally independent questions, conditioned on what we are actually trying to measure, which can result in needing questions at easier and harder levels than the 50-50 level.)
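
To put numbers on the 50-50 claim, here is a small sketch that computes the binary entropy of a single right/wrong question. It looks at one question in isolation, so it ignores the dependence between questions mentioned in the note above; the function name `question_information` is just an illustrative choice.

```python
from math import log2

def question_information(p_correct):
    """Binary entropy, in bits, of one right/wrong question.

    p_correct is the fraction of test takers who answer correctly.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    if p_correct in (0.0, 1.0):
        return 0.0  # everyone gives the same result: nothing is learned
    q = 1.0 - p_correct
    return -(p_correct * log2(p_correct) + q * log2(q))

# Compare an even split with progressively easier questions.
for p in (0.5, 0.7, 0.9, 0.99):
    print(f"{p:.2f} correct -> {question_information(p):.3f} bits")
```

A question that half the test takers get right yields a full bit, while one that 90% get right yields less than half a bit, which is the sense in which a test full of very easy questions says little about most students.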

The premise of having a single scale score is that there is one thing being measured.  In the case of AP Bio tests, most likely this “construct” is what fraction of the required material has been learned.  To estimate this for a population of students who have learned different random subsets to different levels of mastery, you need to sample widely across all the material.  A weak student may have learned a small portion of the material very well, but that should not be sufficient to pass the test. Although questions that are easy for almost everyone can be eliminated, you will still have a lot of questions that many of the weak students get right, because the questions happen to be in the subset of the material that they learned.
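
The effect of sampling widely can be illustrated with a toy binomial model. The assumptions here are mine, not anything from the AP program: each question comes from an independently chosen topic, a student answers correctly exactly when the topic is one they learned, and nobody guesses. The 46% pass mark is the 3 cutoff from the 2008 table above, and `pass_probability` is a hypothetical helper.

```python
from math import ceil, comb

def pass_probability(fraction_learned, n_questions, pass_fraction):
    """Chance that a student reaches the pass mark under the toy model.

    Assumes independent questions, a correct answer exactly when the
    topic was learned, and no guessing (all simplifications).
    """
    # Smallest number of correct answers that reaches the pass mark;
    # the tiny offset guards against floating-point round-up.
    needed = ceil(pass_fraction * n_questions - 1e-9)
    f = fraction_learned
    return sum(comb(n_questions, k) * f**k * (1 - f)**(n_questions - k)
               for k in range(needed, n_questions + 1))

# A weak student who has learned 30% of the material thoroughly.
for n in (5, 20, 100):
    print(n, pass_probability(0.3, n, 0.46))
```

Under these assumptions the weak student passes a 5-question test about 16% of the time, and far less often on a 100-question test, which is why broad sampling is needed to keep a lucky match between questions and the learned subset from deciding the result.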

In that case, you can end up with a scale like the one this AP Bio test uses, where you need to get about 46% of the points to pass. Using more difficult questions and lowering the threshold (as I proposed earlier in this post) would mean that students with a deeper understanding of a smaller fraction of the material could pass.  On tests that cover a very cohesive body of material (like Calculus or Physics C: Mechanics), my idea might be good, but for survey courses of broader, more disparate fields (like AP Bio or any of the history tests), it may make more sense to test for wider coverage and less depth, as they currently do.
