In Critiquing code, I mentioned that I spend a lot of time reading students’ code, particularly the comments, when grading programming assignments. For the assignment I graded this weekend, I did not even run the code—grading students entirely on the write-up and the source code.
The assignment involves writing a simulation for a couple of generative null models and gathering statistics to estimate the p-value and E-value of an observed event. Running their code would not tell me much, other than detecting gross failures, like crashing code. The errors that do occur are subtler ones: off-by-one errors, incorrect probability distributions for the codon generator, or an incorrectly defined event being counted. These errors can sometimes be detected by looking at the distribution of the results (particularly large shifts in the mean or variance), but not from looking at a small sample. The students had plots of results from my simulations available, so they could tell whether their simulations were producing similar results.
So I read the write-ups carefully, to see whether the students all understand p-value and E-value (this year, sadly, several still seem confused—I’ll have to try again on improving their understanding), to check whether the distributions the students plotted matched the expected results from my simulations and previous years’ students, and to see whether the students explained how they extracted p-values from the simulations (only a couple of students explained their method—most seem to have run a script that was available to them without even reading what the script did, much less explaining how it worked).
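For readers unfamiliar with the exercise, the kind of p-value and E-value extraction the students were supposed to explain can be sketched as a short Monte Carlo loop. This is a contrived toy version, not the actual assignment: the uniform i.i.d. null model, the motif `ATG` as the "event", and the function names are all mine.

```python
import random

def count_events(seq, motif="ATG"):
    """Count (possibly overlapping) occurrences of a motif in a sequence."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq[i:i + len(motif)] == motif)

def estimate_p_and_e(observed_count, seq_len, num_trials=10000, seed=42):
    """Monte Carlo estimates under a toy uniform-i.i.d. null model.

    p-value: fraction of trials with at least as many events as observed.
    E-value: mean number of events per trial under the null.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    total_events = 0
    for _ in range(num_trials):
        seq = "".join(rng.choice("ACGT") for _ in range(seq_len))
        count = count_events(seq)
        total_events += count
        if count >= observed_count:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_trials, total_events / num_trials
```

The two estimates fall out of the same loop, which is exactly why students who just ran the provided script without reading it missed an easy explanation: the p-value is a tail frequency, the E-value a sample mean.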
Whenever I saw a discrepancy in the results, I tried looking for the bug in the student code. In well-written, well-documented code, it generally was fairly easy to find a bug that would explain the discrepancy. In poorly written, poorly documented code, it was often impossible to figure out what the student was trying to do, much less where the code deviated from the intent. Even when the results appeared to be correct, I looked for subtle errors in the code (like off-by-one errors on length, which would have been too small to appear as a change in the results).
There was only one program so badly written that I gave up trying to figure it out—the student had clearly been taught to do everything with classes, but did not understand the point of classes, so he turned every function into its own class with a
__call__ method. His classes mostly did not hold data in the objects, but kept the data in namespaces passed as arguments. The factoring into classes or functions bore little resemblance to any sensible decomposition of the problem, and the interfaces were arbitrary and complex. The result looked like deliberately obfuscated code, though (from previous programs by this person) I think it represented a serious misunderstanding of an objects-first approach to teaching programming, rather than deliberate obfuscation. I instructed the student to redo the code without using any classes at all—not an approach I usually take with students, but the misuse of classes was so bad that I think starting over with more fundamental programming concepts is essential.
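To show the pattern (this is a contrived sketch of my own, not the student's actual code), compare a class that exists only to be called once, with its data living in a namespace dictionary, against the ordinary function that does the same job:

```python
# Antipattern: a "function" disguised as a class. The object holds no
# state; the data lives in a namespace dict that is mutated in place.
class ReverseComplement:
    def __call__(self, namespace):
        comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
        namespace["result"] = "".join(
            comp[base] for base in reversed(namespace["seq"]))

# The same logic as a plain function: explicit input, explicit output,
# no hidden mutation, trivially testable.
def reverse_complement(seq):
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq))
```

The class version forces the reader to trace which namespace keys each callable reads and writes, which is why code built entirely this way ends up looking obfuscated even when every individual piece is simple.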
Some of the students are now getting fairly good at documenting their code—I don’t seem to have any superstar programmers this year, but there are a couple of competent ones who are likely to do good work on their theses (assuming they don’t discard the documentation habits I’m enforcing in this course). Some of the students who started out as very poor programmers are beginning to understand what it takes to write readable, debuggable code, and their programs have improved substantially. Many of the students have figured out how to separate their I/O, their command line parsing, and their computation into clean separate parts of their code, without sacrificing much efficiency. Even several who were writing very confusing code at the beginning of the course have improved their decomposition of problems, simplified their style, and improved readability of their code enormously.
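The separation I'm after looks roughly like this (a minimal sketch of my own, not any student's program; `gc_fraction` and the `--digits` flag are invented for illustration): the computation is a pure function, the I/O and the argument parsing each get their own small function, and `main` just wires them together.

```python
import argparse
import sys

def gc_fraction(seq):
    """Pure computation: no I/O, so it can be tested on its own."""
    if not seq:
        return 0.0
    return sum(1 for base in seq.upper() if base in "GC") / len(seq)

def read_sequence(stream):
    """I/O kept separate: read a raw sequence, ignoring whitespace."""
    return "".join(stream.read().split())

def parse_args(argv):
    """Command-line parsing isolated from the computation."""
    parser = argparse.ArgumentParser(
        description="Report the GC fraction of a sequence on stdin")
    parser.add_argument("--digits", type=int, default=3,
                        help="decimal places in the output")
    return parser.parse_args(argv)

def main(argv=None):
    options = parse_args(argv)
    seq = read_sequence(sys.stdin)
    print(f"{gc_fraction(seq):.{options.digits}f}")

if __name__ == "__main__":
    main()
```

The efficiency cost of this structure is negligible, and the payoff is that each piece can be tested or reused without touching the others.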
In one comment on Critiquing code, “Al” commented “I would guess it has more to do with grit and the ability for a student to stick with a tough problem to find a solution.” I rejected that interpretation in my reply: “some of the worst programmers are putting in the most effort—it is just not well-directed effort. The assignments are supposed to be fairly short, and with clean design they can be, but the weaker programmers write much longer and more convoluted code that takes much longer to debug. So ‘grit’ gets them through the assignments, but does not make them into good programmers. Perhaps with much more time and ‘grit’, the really diligent students could throw out their first solutions and re-implement with cleaner designs, but I’ve rarely seen that level of dedication (or students with that much time).”
Now I’m not so sure that my reply was quite right. Some of the biggest improvements seem to be coming from students who are working very hard at understanding what makes a program good—when I complain about the vagueness of their variable descriptions or their docstrings, they improve them in the next assignment, as well as redoing the programs that elicited the feedback. But some of the students are falling behind—neither redoing assignments which they got “redo” on, nor keeping up with the newer assignments. So there may be some merit to the “grit” theory about who does well—it isn’t predictive for any single assignment, but it may help distinguish those who improve during the course from those who stay at roughly the same level as they entered the course.