I’m on a couple of the Advanced Placement teacher mailing lists (Physics because I’ve been home-schooling my son in calculus-based physics, biology because I’ve been attempting to get bioinformatics into AP bio courses as a teaching tool). On one of the lists, a fairly new teacher brought up a concern about grading—last year many of his or her top students got 1s on the corresponding AP exam. This triggered an excellent discussion about the meaning of grades and the value of the AP test. In this blog post I’m going to repeat my contributions to the discussion, lightly edited to remove any identifying info, with brief summaries of other views to show what I was responding to (often lumping several people’s ideas together).
Speaking as a college professor, one of the main values of the AP exams is providing a uniform external calibration for the level of high school classes. Most high school teachers don’t have that much communication with other teachers, particularly not on matters like what level of performance should be expected of students at different levels. The result has been enormous grade inflation over the past few decades, so that “A” is now the most common grade in many schools rather than a rare accolade. (The problem is common in colleges also, perhaps even worse than in high schools, especially in the humanities.)
Having an external calibration (an A in this course is roughly the same as a 5 on the AP exam) is very useful for gauging the level of the course, for the teacher, for the students, for parents of the students, and for the colleges that might admit the student. The AP exam scores, after all, are supposedly set to correlate with the grades the students would have gotten in a first-year course in college. It used to be that 5, 4, and 3 correlated well with grades of A, B, and C, but grade inflation in the colleges appears to have advanced faster than on the APs, so perhaps 5 and high 4s correspond to an A, and low 4s and 3s to a B (depending on the college, of course, as grade inflation is far from uniform).
Of course, the AP test does not measure all the things that go on in an AP course, and a student can do well in the course and poorly on the exam or vice versa, but if A students in a course are consistently getting 1s on the test, it leads one to suspect that grade inflation has happened. Similarly, if C students are routinely getting 5s, one suspects that the course grading is ridiculously harsh.
Others pointed out the obvious thing, that the test is a 3-hour snapshot of how a student did on an arbitrary subset of the material on one day. Seniors who have already been admitted to college (or decided not to attend) may have little incentive to do well, particularly if their college does not give credit for AP exams. One teacher, who posts a lot of good stuff on the mailing list, asked me directly:
I am curious to know what portion of the grades in the courses you teach are determined by one, largely multiple choice exam? I’ve never taken a worthwhile course where “passing” was ever determined in such a way.
He called it right on that one. I’ve never given a multiple-choice exam in 31 years of being a professor. Multiple-choice exams are very difficult to write well, and they are really only appropriate when there are enormous numbers of test takers: then the format reduces the cost and variance of grading, and the large cost of making the exam can be amortized over many students. For that matter, I give very few exams; most of my courses are graded on the basis of week-long or quarter-long projects, papers, and programs. I’m not particularly interested in the things testable by multiple-choice tests (mainly memory and simple reasoning tasks), but in what students can do with a sustained effort. My most recent course (Applied Circuits for Bioengineers) was graded mainly on the students’ weekly design reports about their lab work (about 5 pages of writing a week, and any mistakes in the schematics or explanations meant that they had to redo the writing).
I’m not going to defend the AP exams as great ways to evaluate learning, but they are better than the exams that most teachers write and rely on for grading, and they do have the advantage of uniformity across a large number of classrooms. A 5 on an AP exam may not tell me a lot about a student’s capabilities, but I believe it tells me more than an A from a teacher I’ve never met, who may have taught only a handful of AP students.
I agree that the goal of an AP course is not “college credit” or even “preparation for the AP exam,” but the learning that takes place. But grades and exam scores are used for selecting kids for college admission (as a better alternative to a simple lottery or to selection based solely on money or race), so it is better if the exams and grades are as meaningful as they can be made (at reasonable cost; a lot of state testing is providing very little useful data at enormously high cost in both money and lost time). Because teachers have so little opportunity to calibrate their own grading, an external test at the right level provides very useful information.
I agree that a single sample of a small number of students may not tell you much about the level of instruction in a class, but it may be a warning that recalibration is needed. There are many possible reasons for the discrepancy (difference in content between course and exam, difference in level of expectations, student test-taking ability or attitude, random noise, …). For a course labeled “AP”, the teacher has a responsibility to make sure that the content of the exam is covered in the course. As a parent, I would also want the level of expectations in the course to be as high as in a first-year college course. If the students are uniformly doing worse on the test than the teacher expects, then some reflection on why the expectations are wrong is needed.
Elsewhere in the discussion, another teacher asked:
What if, instead of grades, we could present colleges with better data about what we actually care about: skills that aren’t reportable by traditional methods.
When admissions officers are trying to select fewer than 6% of 38,828 applicants (as Stanford did this year), it is difficult to process voluminous communications from individual teachers. They rely on summary statistics (GPA and SAT scores, for example) for crude filtering, then concentrate on student essays and letters of recommendation. The very selective schools sometimes try to correct GPAs for grade inflation at the applicant’s high school (databases of information about each high school are being sold; I’ve no idea how accurate the information in them is, but some admissions offices use them).
The college faculty, who might care about the “skills that aren’t reportable by traditional methods”, are rarely part of the admissions decisions.
Public universities are usually forced to have a simple formula for most of their admissions, to be able to show the public that they are being scrupulously fair. If they took into account “skills that aren’t reportable by traditional methods”, parents of rejected students would scream to their legislators to cut off funding to the university. (An exception is always made for athletics, which is a sacred cow in the US.)
Later in the discussion, after others had proposed wider use of narrative transcripts and brought up the value that college admissions officers place on letters of recommendation, I wrote about my experience with narrative evaluations.
I’ve been engaged in this debate for decades. When I started teaching at UCSC, we had almost exclusively narrative evaluations, with optional grades in some classes. We moved to optional grades in all classes, then grades plus narratives, and now to grades with optional narratives. (Incidentally, in each vote I voted in favor of narratives with optional grades.) I have not noticed a difference in how easily students get into grad school, though one of the main arguments used against the narrative-only system was that it was hurting students’ chances of getting into med schools, as their “pass” grades were getting converted to Cs by the big med schools, no matter what the narrative said. I think that the real driver behind the switch from narratives to grades was the incredible workload of preparing narrative evaluations for large courses. In many classes, the “narrative” was computer-generated from a list of grades, and not very informative.
I had to review narrative transcripts for honors review of graduating seniors, and I often found it very difficult to interpret the narratives. It took a long time to read a narrative transcript, and it often told me very little about the student. There was no controlled vocabulary, and the same word might be used by one instructor for a barely passing performance and by another for a truly excellent performance. I could see why med schools could be frustrated by the difficulty of dealing with this format. Nowadays the honors review in the school of engineering is based on GPA, with a well-defined grey region where student research projects can make a difference in the honors rating. I understand that the honors review now takes only a fraction of the time it used to, despite large increases in the number of students reviewed, and the time is spent only on the cases where some thought is needed.
As a grad director, I read a lot of applicant files for admission to our program. GRE scores and GPA do matter, but not very much, as the GRE tests essentially the same material as the SAT (there is nothing college-level on the general GRE, and there isn’t a subject GRE in our field), and college GPAs are often highly inflated, depending on the college. We also get a lot of foreign applicants, whose grades come on a bewildering variety of different scales that are essentially uninterpretable. We expect high GRE scores of all applicants, but decide between them based mainly on their personal statements and letters of recommendation. What we are looking for is strong evidence that the students can do research (not just coursework), and the best evidence is that the student has already done substantial research. Many of our grad students come to us with multiple publications already, more than I had when I got my first faculty position. Narrative transcripts would not be a good substitute for letters of recommendation from faculty who had supervised the applicant’s research; the signal we’re looking for would be buried in the noise of irrelevant comments on coursework that is not that important to us.
As a homeschooling parent of a high school junior who is likely to fit in best at a super-selective college like Harvey Mudd, MIT, or Stanford, I worry a lot about how to put together his transcript, school profile, and counselor letter to show that he really would fit in. There are mailing lists with thousands of readers dedicated to parents worrying about how to get their home-schooled children into appropriate colleges. I’m having to rely heavily on external validation of his coursework, much of which is not even accredited. SAT 2 and AP exams form part of that validation, while college and university courses form another part. Science fairs and contest exams (AMC12 and AIME in math, F=mA in physics) form yet another part, though his contest scores are not stellar enough to make him a shoo-in at the colleges where he would fit, since he takes all his exams without prep. We are trying to get letters of recommendation from the university faculty (he has been at the top of his classes), since those will be particularly informative for admissions officers.
I’ve written blog posts about how homeschoolers can get into the University of California, which is highly bureaucratic but fairly straightforward.
In short, I like narrative feedback to students and their parents, but do not find narrative transcripts to be a particularly useful way to select students from a large pool (whether for honors or for admissions to the next level of education). Small numbers of well-selected letters of recommendation are much more helpful.
I didn’t say this on the AP forum, but I will have to provide a narrative transcript of my son’s high-school curriculum, as it is pieced together from a variety of different sources. We’ll probably provide two formats for the transcript: a one-page version that lists courses and grades (for those courses that have grades) and a multi-page transcript that describes the content and level of each course in more detail. Admissions officers in a hurry can glance at the one-page summary to see that he has taken all the expected courses, and those with more time or more interest can read the detailed descriptions to see that the courses were solid, even if not officially accredited.
Teachers participating in this discussion seemed to be in agreement that an AP class had several purposes, and that the most important ones were teaching the kids how to master college-level material and instilling both a knowledge of and a love of the subject. Preparing for the exam was seen as a secondary goal at best: useful for calibrating the level of the course, with the possible college credit serving as a nice incentive to get students to study, but not the central focus of the teaching. I found the discussion refreshing; teachers were talking about the goals of their courses and the value of different assessment techniques without falling into defensive postures or reciting meaningless eduspeak mantras. There was a general feeling that the AP test was about as good as a 3-hour test could be and useful for calibrating course levels, but that a single data point was not sufficient for evaluating student performance.
I did not see teachers blaming students or prior teachers for poor performance, nor talking about doing a lot of “test prep”, which seem to be the big problems with how standardized tests are handled in schools, judging from discussions I’ve seen elsewhere. The attitude among the AP teachers seems to be that if they teach a good course with student engagement in a lot of the right content, then the test will take care of itself. Most of the discussion on the AP list is about lesson plans, textbooks, sources for lab supplies, debugging lab procedures or test questions, and (at this time of the year) review materials. There are occasional complaints about administrators (mostly about ridiculously little teaching time for the course or overcrowded labs, but sometimes about administrators tying teacher evaluations to the single 3-hour test taken by the students), but most of the discussion focuses on the content and pedagogy. I wish there were such discussion sites for the college-level courses that I teach.