Gas station without pumps

2013 November 8

Critiquing code

Filed under: Uncategorized — gasstationwithoutpumps @ 22:57

In The Female Perspective of Computer Science: Why Arts and Social Science Needs Code: Testimonials, Gail Carmichael continues her “Why are we learning this?” guide for arts and social science students with “a set of testimonials from people in the field that learned to code.” I’ve pulled out a little piece of one testimonial here:

Emily Daniels, Software Developer and Research Analyst, Applied Research and Innovation at Algonquin College

As an artist you probably already have a thick skin developed by years of crits where others continually tear down your work and expect you to pick up the pieces. This will prepare you for similar responses to your programs and is also immensely useful in software development. It seems from my experience that most computer studies programs don’t spend nearly enough time preparing people to respond well to negative or constructive feedback of their work. [emphasis added] It would benefit a lot of developers to be able to take criticism in stride like an artist can, so if you can, you are ahead of the game.

I think that this is an important critique of many engineering programs, and of computer science programs in particular. Many students finish CS programs with high grades but are still unable to write good programs.  A big part of the problem is that no one has ever looked at their programs—certainly not critically, with an eye toward making the students better programmers through pointing out things that they have still not mastered. I think that CS may need to have more of the sort of criticism that a good studio art class or writing circle has: strong feedback about what needs improvement tempered with some praise for what is good. (I’m not a proponent of the “3 good for 1 bad” school of ego-stroking—that approach provides a lot of emotional support but very little improvement in performance.)

I try to provide strong feedback in my first-year grad course in bioinformatics, where I require eight programming assignments and two writing assignments.  The prior programming experience of my students varies, from students who’ve had just two introductory programming courses to students who have had BS degrees in computer science and 25 years as programmers in industry.  As a general rule, more experience in programming results in more competence at the programming assignments, and those with CS degrees do better than those without, but the differences are not as large as one would expect. I’ve had students who had earned straight As in several previous programming classes but who could not produce adequate programs even for the “warm-up” assignment in three tries (with extensive feedback on the first two).  I’ve also had students who had only one or two prior programming classes work very hard and produce adequate (though not stellar) programs, even on the more difficult assignments near the end of the course.

For many of the students in my course, I am the first person to read their code, no matter how many previous programming courses they have had.  I’m often also the first person to give them feedback on their variable names, comments, and docstrings—the things that the compiler ignores but which are crucial for anyone trying to understand the code.  Many of the students have no idea how to write a program that can be read by someone else (or even by themselves in six months), because they have never written anything but throwaway code, and no one has taught them the difference between good programming practice and throwaway programming.

What are the differences between the really good programmers, the adequate programmers, and the poor programmers?  Is there any way to predict who will rise to the challenge and who will fall flat?

The really good programmers seem to be the smartest (they also do very well on the written work that does not include programming), and they are very good at decomposing problems into sensible subproblems that can be clearly described and independently implemented and tested.  The cleanness of their problem decomposition leads to clean data structures and simple functions that are easy to document.  I believe that many of them write their docstrings (defining what the functions are supposed to do) before writing the code that implements them.  Most of the top programmers have had a lot of previous programming experience, but not everyone with a lot of experience turns out to be a good programmer.  I don’t think I end up teaching the good programmers very much about programming; mainly I reinforce habits that they might otherwise have been tempted to let slide, since no one else seemed to care.

The adequate programmers produce fairly reasonable problem decompositions, slightly awkward data structures, and code that is almost right.  They often debug their programs into existence, messing up on subtle boundary conditions.  Their variable names tend to be vague, giving some indication of what is in the variable, but not a precise indication of its meaning.  They often make mistakes that result from interpreting a variable one way in one part of the code, but slightly differently in another part.  Their documentation seems to be an attempt at explanation after their code is finished and more or less working, so it provides them no help during the time they spend debugging, which is most of their time. Their programs are often several times longer than the programs by the good programmers, because they have added a lot of unnecessary special-case code to compensate for awkward program decomposition or data structures.

Many of the adequate programmers can become good programmers, with some help in learning how to decompose problems more cleanly.  One of the best ways I can think of for getting them to improve is to have them write their docstrings before they code the functions, to provide terse but fairly complete external descriptions of those functions.  If they can’t come up with terse, clean descriptions, then they’ve probably done the decomposition wrong, and a little more time spent on thinking about the problem at a high level will probably save them a lot of debugging time later on.  Getting them to focus on the precise meaning of their variables and data structures and getting them to think about edge conditions probably has a lasting impact on their programming ability. I see enormous improvements in some of these students over the 10 weeks I have them in class, and I like to think that the huge amount of time I spend on providing feedback has a lasting effect, not just a do-it-for-the-class-but-never-again effect.
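
As a rough illustration of the docstring-first habit described above, here is a minimal Python sketch. The function name `count_kmers` and its parameters are hypothetical, chosen only for this example; they are not taken from any actual course assignment. The docstring is written first and states the contract, including the edge cases, and the body then only has to satisfy it.

```python
def count_kmers(sequence, k):
    """Return a dict mapping each length-k substring of sequence to its count.

    Overlapping occurrences are counted separately.
    If k is larger than len(sequence), the result is an empty dict.
    Raises ValueError if k is not a positive integer.
    """
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    counts = {}
    # The last valid start position is len(sequence) - k, so the range
    # is empty when the sequence is shorter than k.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]  # a length-k string
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts
```

If the description cannot be kept this short, that is usually a hint that the function is doing more than one job and the decomposition deserves another look before any code is written.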

The poor programmers decompose the problem in random ways, often with highly inappropriate data structures (like copying an I/O format as an internal representation).  The awkward decomposition results in functions that cannot be tersely described, since the entire environment in which the function is embedded has to be just-so for the function to have any meaning at all.  Their variables usually have meaningless names (like “flag”, “i”, or “args”) or highly misleading names (like “kmer” for the length of a string, rather than for a length-k string).  Their code is often poorly tested, crashing on standard use cases or obvious boundary conditions.  In many cases it looks like the students have tried to “evolve” their code, making random mutations to the code in the hope of increasing its fitness.  As in biology, most mutations are deleterious, so this approach to programming only works if you make millions of tries and use very strict selection criteria.
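
To make the point about I/O formats concrete, here is a small Python sketch of the alternative: convert the file format to a simple internal representation once, at the boundary, so that the rest of the program never touches raw lines. FASTA is used only as a familiar bioinformatics example, and the function name `parse_fasta` is hypothetical; neither is taken from the assignments discussed here.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-format lines.

    name is the header text after '>' up to the first whitespace
    (the empty string if the header is blank).
    sequence is the concatenation of the lines that follow the header,
    each with leading and trailing whitespace stripped.
    Blank lines and any lines before the first header are ignored.
    """
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # Emit the final record, if there was at least one header.
    if name is not None:
        yield name, "".join(chunks)
```

Code downstream can then work with plain strings and tuples, and a change to the file format touches only this one function.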

I think that many of the poor programmers who have had several programming courses have only worked on “scaffolded” assignments, where they fill in the blanks on programs that someone else has decomposed for them.  They have never decomposed a problem into parts themselves, and have no idea how to go about doing it.

I don’t know how to convert the poor programmers into adequate programmers—I can point out their problems to them and suggest better decompositions or better data structures, but I don’t know whether they will learn from that. Many can take specific suggestions (about variable names or data structures) and implement the changes, but there does not seem to be much transference to the next problem, where they again do random decompositions of the problem and use meaningless variable names.  In many cases, I think that the muddiness of their code reflects muddiness of their thinking (their writing in English is often similarly disordered and confused).  If they are new to programming, there is hope that with practice they will learn to think more precisely, but if they have been programming for a while and are still flailing, I have no idea how to help them.  Luckily I don’t get many really poor programmers—most tend to drop after the first couple of assignments, realizing that their usual programming style is not going to get them through the course.

I have had students switch from being poor programmers to adequate programmers (in one case after failing the course three times and succeeding on the fourth try), but I don’t think I can take any credit for the improvement—I didn’t do anything differently on the fourth try than on the previous three. I’ve also had one student fail four times, so it isn’t just that I give up and pass students who aren’t doing the work.

I don’t generally fail students until near the end of the course—when earlier work is not up to passing quality the grade I give is “REDO”.  I expect students to redo the work fixing the problems that I’ve identified and resubmit it.  Many students do learn from the redone assignments, and start turning in adequate work the first time. For many of the weaker students, it may take more than one round of “redo” before their work is of high enough quality. A few never seem to get it, and make the same types of mistakes in assignment after assignment.

After about three assignments, I can tell pretty much how well the students will do for the rest of the course.  The top programmers on the first three assignments will continue to do well, often improving their coding in minor ways as they pick up the little bits of feedback I can provide them.  The adequate programmers who are striving to become good ones and the adequate programmers who are content to stay at their current skill levels are also evident.  I’d like to spend the most time on feedback for the adequate programmers striving to become better, since they are the ones most likely to benefit (they are also usually the majority of the class).  In practice, though, I spend most of my grading time on the bottom of the class, trying to figure out what is going on in really unreadable code.  I am sometimes tempted to triage the grading, with little time spent on the good programmers or the hopeless ones, and if the class were much larger I’d have to do that, but so far I’ve been trying to provide useful feedback to everyone.

I have yet to find any good way to predict who will do well before I’ve read the first two programs.  Number of CS courses, grades, or years of programming experience are only weak predictors.  I suspect that I might get more useful predictions from SAT scores or IQ tests measuring general intelligence, but I don’t have access to that information.


  1. Fantastic post, I arrived here in a round-about way (thank you interwebs), but I have a feeling SAT scores or IQ won’t help you either. I’ve been discussing this with a lot of people recently and I would guess it has more to do with grit and the ability for a student to stick with a tough problem to find a solution.

    I would be fascinated to hear if you examine this further.

    Comment by Al — 2013 November 10 @ 07:52 | Reply

    • I’m not seeing “grit” as being particularly predictive in my classes. Occasionally, yes—some of the repeated failures came from students giving up and not turning in work 3 weeks into the course, or never redoing the assignments that got grades of “redo”. But these students are all seniors or grad students in engineering or science programs. They’ve been through previous programming courses or through difficult courses like organic chemistry—they’ve been highly selected for their ability to stick with a tough problem.

      Indeed some of the worst programmers are putting in the most effort—it is just not well-directed effort. The assignments are supposed to be fairly short, and with clean design they can be, but the weaker programmers write much longer and more convoluted code that takes much longer to debug. So “grit” gets them through the assignments, but does not make them into good programmers. Perhaps with much more time and “grit”, the really diligent students could throw out their first solutions and re-implement with cleaner designs, but I’ve rarely seen that level of dedication (or students with that much time).

      Comment by gasstationwithoutpumps — 2013 November 10 @ 09:43 | Reply

  2. I don’t get this. Who is not reading their code? That is an appalling dereliction. Even when I was a grad student at an R1 university, and there were 200 students in CS1, their code was read. There were 10 TAs. The professor and the head TA developed assignments, and a grading standard (we didn’t use the term rubric back then) for it. Design was always part of the standard. When we TAs graded, we filled out a grading sheet for each program, and put comments on the code. I have followed that practice ever since.

    Would it be OK if I posted this on the SIGCSE mailing list for discussion? I think this is a very important topic.

    Comment by Bonnie — 2013 November 11 @ 04:39 | Reply

    • My grad students come from all over the country, and the undergrads both from my school and surrounding community colleges. A lot of schools seem to be relying primarily on I/O testing for beginning programming courses. Even those that are reading the code are generally only grading for “correctness” not style. Very few are grading based on the clarity and usefulness of the comments and docstrings. It may be that there are some upper-division programming courses that do finally start teaching students how to document their code properly and how to factor problems reasonably, but I’m not seeing much evidence for it in the students coming out of lower-division programming courses.

      Grading standards (“rubrics”) are great for providing uniform feedback across multiple graders, but they are not very good at inducing thoughtful, useful feedback. I find that I need to read code fairly carefully to catch the inventive and unusual ways in which students mess up a program—there is no way I can anticipate all the problems, even after giving essentially the same programming assignment 10 times.

      I have occasionally had a TA do grading for me, and I’ve found that most know so little about how to write programs well that they don’t produce much useful feedback on how to write well. (And the really top programmers are generally not available as TAs, since everyone wants to hire them as grad student researchers.) I’ve been lucky once or twice to have a top programmer as a TA (someone interested in the teaching experience, even though it meant a cut in pay from the researcher stipend), but my classes are small enough that I rarely get assigned a TA.

      Of course, my sample is small, and may be distorted by the large fraction of biologists rather than computer scientists that I see in the class.

      Comment by gasstationwithoutpumps — 2013 November 11 @ 08:37 | Reply

      • Honestly, when I was a grad student, I think we TAs were better programmers than the professors! :-) I agree that grading standards and rubrics do not cover all, but in CS1, the bulk of the mistakes tend to follow certain patterns, and when we got something really wild, we conferred with each other. But certainly, the expectation in my day was that someone with more expertise would read each student’s program, and that design would be part of the grade.
        One of the nice things about teaching at a smaller school is that our intro sections are small enough (around 25) that we can realistically read the code ourselves.

        Comment by Bonnie — 2013 November 11 @ 10:10 | Reply

        • I can think of several professors who are not good programmers, so it would not surprise me to find TAs who are better programmers than them.

          A class size of 25 is about right for a writing course with detailed feedback—and that is what a programming course should be, but rarely is.

          Comment by gasstationwithoutpumps — 2013 November 11 @ 11:08 | Reply

  3. […] Critiquing code, I mentioned that I spend a lot of time reading students’ code, particularly the comments, […]

    Pingback by Grading programming assignments | Gas station without pumps — 2013 November 12 @ 09:48 | Reply

  4. […] on why they have found it useful to learn to code (thanks to Alfred Thompson for the link).  Gas Stations Without Pumps has an interesting post based on one of the […]

    Pingback by Why Arts and Social Science Needs Code: Testimonials | Computing Education Blog — 2013 November 28 @ 22:08 | Reply
