Gas station without pumps

2013 May 17

Getting high school students excited about programming

Filed under: Uncategorized — gasstationwithoutpumps @ 20:33

When it comes down to it, what I’m feeling right now is probably what every teacher feels at some point—the magical epiphany that “A few weeks ago, my students didn’t know [what programming was], and now they’re [running home to work on coding challenges] and [saying that they want to study computer science in college].”

There were two parts to this success story:

  • a one-hour introduction to programming, which was really a thinly disguised pitch for the computer science course we’ll be offering for the first time next year. I had my students warm up with a Do Now asking them to identify some of the many ways they rely on coding in their everyday lives, without even realizing it. I used this PowerPoint as a basis for our discussion, which led into this (now semi-viral) video put out by Code.org, and finally some live coding in Python (the Word Smoosher was a big hit).
  • a visit by Jeremy Keeshin, cofounder of CodeHS.com. Thirty-two (out of 74) juniors from across the academic spectrum signed up for an after-school workshop that Jeremy ran to introduce students to some of the basics of coding as well as the terrific online platform for learning coding that he has developed. The program is such that students are able to watch short tutorial videos and work on challenges at their own pace, and Jeremy and I mainly circulated to help students troubleshoot …

It sounds like a very good beginning—I hope that the programming course that they are offering next year goes well.

Read the full story via Infinigons, etc.: Have an hour to fill with your students?

2013 March 21

Why Python first?

Filed under: home school,Uncategorized — gasstationwithoutpumps @ 11:21

On one of the mailing lists I subscribe to, I advocated for teaching Python after Scratch to kids (as I’ve done on this blog: Computer languages for kids), and one parent wanted to know why, and whether they should have used Python rather than Java in the home-school course they were teaching.  Here is my off-the-cuff reply:

Python has many advantages over Java as a first text-based language, but it is hard for me to articulate precisely which differences are the important ones.

One big difference is that Python does not require any declaration of variables. Objects are strongly typed, but names can be attached to any type of object—there is no static typing of variables. Python follows the Smalltalk tradition of “duck typing” (“If it walks like a duck and quacks like a duck, then it is a duck”). That means that operations and functions can be performed on any object that supports the necessary calls—there is no need for a complex class inheritance hierarchy.
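A minimal sketch of what duck typing looks like in practice. The function count_items and the class Countdown are hypothetical names invented for this illustration; the point is only that one undeclared parameter accepts a list, a file-like object, and a user-defined class that share no common base class beyond object.

```python
import io

def count_items(source):
    """Count the items produced by iterating over source.

    No type is declared for source: any object that supports iteration works.
    """
    total = 0
    for _ in source:
        total += 1
    return total

class Countdown:
    """Hypothetical class with no special base class; it only defines __iter__."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

print(count_items(["a", "b", "c"]))        # a list
print(count_items(io.StringIO("x\ny\n")))  # a file-like object, iterated line by line
print(count_items(Countdown(4)))           # a user-defined class

# Names are not typed: the same name can be rebound to objects of different types.
thing = 42
thing = "forty-two"

# Objects are strongly typed: a str and an int are not silently combined.
try:
    "1" + 1
except TypeError as error:
    print("TypeError:", error)
```

The equivalent Java would typically need a declared parameter type (an interface such as Iterable) that every argument's class must explicitly implement.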

Java has a lot of machinery that is really only useful in very large projects (where it may be essential), and this machinery interferes with the initial learning of programming concepts.

Python provides machinery that is particularly useful in small, rapid prototyping projects, which is much closer to the sorts of programming that beginners should start with. Python is in several ways much cleaner than Java (no distinction between primitive types and objects, for example), but there is a price to pay—Python can’t do much compile time optimization or error checking, because the types of objects are not known until the statements are executed. There is no enforcement of information hiding, just programmer conventions, so partitioning a large project into independent modules written by different programmers is more difficult to achieve than in statically typed languages with specified interfaces like Java.

As an example of the support for rapid prototyping, I find the “yield” statement in Python, which permits the easy creation of generator functions, a particularly useful feature for separating input parsing from processing, without having to load everything into memory at once, as is usually taught in early Java courses. Callbacks in Java are far more complicated to program.

Here is a simple example of breaking a file into space-separated words and putting the words into a hash table that counts how often they appear, then prints a list of words sorted by decreasing counts:

import sys

def readword(file_object):
    '''This generator yields one word at a time from a file-like object,
    using the white-space separation defined by split() to define the words.
    '''
    for line in file_object:
        words = line.strip().split()
        for word in words:
            yield word

# count maps each word to the number of times it has been seen so far
count = dict()
for word in readword(sys.stdin):
    count[word] = count.get(word, 0) + 1

# list the words in order of decreasing count
word_list = sorted(count.keys(), key=lambda w: count[w], reverse=True)
for word in word_list:
    print("{:5d} {}".format(count[word], word))

Notes: there is a slightly better way using Counter instead of dict, and there are slightly more efficient ways to do the sorting; this example was chosen for minimal explanation, not because it was the most Pythonic way to write the code. I typed this directly into the e-mail without testing it, but I then cut-and-pasted it into a file, and it seems to work correctly, though I might prefer it if the sort function used count and then alphabetic ordering to break ties. That can be done with one change:

word_list = sorted(count.keys(), key=lambda w:(-count[w],w))

Doing the same task in Java is certainly possible, but requires more setup, and changing the sort key is probably more effort.
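For comparison, here is a sketch of the Counter variant mentioned in the note, combined with the tie-breaking sort key. The function name word_counts and the sample text are made up for illustration, and io.StringIO stands in for sys.stdin so that the sketch runs without piped input.

```python
from collections import Counter
import io

def readword(file_object):
    """Yield one whitespace-separated word at a time from a file-like object."""
    for line in file_object:
        for word in line.split():
            yield word

def word_counts(file_object):
    """Return (word, count) pairs, highest count first, ties broken alphabetically."""
    count = Counter(readword(file_object))
    return sorted(count.items(), key=lambda pair: (-pair[1], pair[0]))

# Made-up sample input; replace sample with sys.stdin to read piped text.
sample = io.StringIO("the cat saw the dog\nthe dog ran\n")
for word, n in word_counts(sample):
    print("{:5d} {}".format(n, word))
```

Counter consumes the generator directly, so the explicit count.get(word, 0) + 1 bookkeeping disappears, while the generator still keeps parsing separate from counting.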

Caveat: my main programming languages are Python and C++ so my knowledge of Java is a bit limited.

Bottom-line: I recommend starting kids with Scratch, then moving to Python when Scratch gets too limiting, and moving to Java only once they need to transition to an environment that requires Java (university courses that assume it, large multi-programmer projects, job, … ). It might be better for a student to learn C before picking up Java, as the need for compile-time type checking is more obvious in C, which is very close to the machine. Most of the objects-first approach to teaching programming can be better taught in Python than in either C or Java. For that matter, it might be better to include a radically different language (like Scheme) before teaching Java.

The approach I used with my son was more haphazard, and he started with various Logo and Lego languages, added Scratch and C before Scheme and then Python.  He’s been programming for about 6 years now, and has only picked up Java this year, through the Art of Problem Solving Java course, which is the only Java-after-Python course I could find for him—most Java courses would have been far too slow-paced for him.  It was still a bit low-level for him, but he found ways to challenge himself by stretching the assigned problems into more complicated ones.  His recreational programming is mostly in Python, but he does some JavaScript for web pages, and he has done a little C++ for Arduino programming (mostly the interrupt routines for the Data Logger code he wrote for me).  I think that his next steps should be more CS theory (he’s just finished an Applied Discrete Math course, and the AoPS programming course covers the basics of data structures, so he’s ready for some serious algorithm analysis), computer architecture (he’s started learning about interrupts on the Arduino, but has not had assembly language yet), and parallel programming (he’s done a little multi-threaded programming with queues for communication for the Data Logger, but has not had much parallel processing theory—Python relies pretty heavily on the global interpreter lock to avoid a lot of race conditions).

2013 February 12

Descaffolding

Filed under: Circuits course — gasstationwithoutpumps @ 21:45

Grant Wiggins, in his post Autonomy and the need to back off by design as teachers, talks about the need for teachers to withdraw scaffolding so that students can learn to do stuff on their own:

Everywhere I go I see way too much scaffolded and prompted teaching – through twelfth grade. By high school, Socratic Seminar, Problem Based Learning, and independent research ought to be the norm not the exception: you have no hope for success in college or the workplace without such independence. Yet, practically no district curricula are written to signal, explicitly and by design, the need for increased student decision-making and independence in using their growing repertoire as courses and years unfold. Rather, the work just gets harder but is still highly directed. Endless worksheets, prompts, reminders, and ongoing feedback keep co-opting the development of student autonomy.

Unfortunately, the problem does not stop at 12th grade.  A few years ago, I had a particularly weak group of programmers in my senior bioinformatics class, and I was talking with them about their prior education.  It turned out that most of them had never designed a program before: they had coded, but always within a scaffold provided by the instructor, and they had no idea how to divide a problem into sub-problems, which I see as the very essence of engineering and of programming.  Now, if these students had only had the first Java programming class, I would have been sympathetic, but they had had this level of scaffolding all the way through an upper-division class called “Advanced Programming”.  (They’d also had the same teacher for all their programming courses: a good instructor, but one who scaffolds too much, and so a better teacher for the lower-level courses.)

I complained to a member of the computer-science department who cares deeply about teaching, and he promised to speak with the instructor.  Things were better in subsequent years, but this year I was again hearing from the seniors that all their programming courses involved writing code inside scaffolds provided by the faculty.  The gradual withdrawal of support doesn’t seem to have sunk in as an essential part of the pedagogy of the department (or of that particular instructor, at least).

In my circuits course, I’ve been trying to get the students to do things on their own, without having to be led every step of the way.  I’m making some progress on some aspects of the problem (they are no longer asking “is this right?” all the time in lab, but are taking to heart the “try it and see!” answer I nearly always give them), but progress is slow—I still see no evidence of them reading the assigned material before coming to class, or finding lab partners before the day of the lab, or even doing the prelab design assignments without my explicitly telling them to do so.

I do scaffold the lab assignments, with gradually increasing design complexity and autonomy over the quarter.  (Though I’m thinking of re-ordering some labs next year, so that they get more scaffolding on using gnuplot to model and fit data—that hit them too hard in the electrode lab this year.)

I keep expecting them to want to take things into their own hands and come up with things they want to try, but they all seem to approach labs as ritual exercises in performing pre-determined protocols—the legacy of badly designed physics, chemistry, and molecular bio labs. I need to kick them out of this “ritual magic” view of laboratory work.  Having them do designs before coming to lab to build and test them should help (it certainly did with the audio amp lab)—I’ll have to see if I can work that into the earlier labs more.  That might be easier when I split some of the labs, so that measuring a component is done in one lab, then designing with it before the next.

I am worried that some this year will not be able to do the more detailed design of the last three labs (the class-D power amp, the instrumentation amp for strain-gauge pressure sensors, and the EKG amplifier), even if they understand all the concepts needed and can design each block that goes into the final design.

I’ve started to notice that they are afraid to commit to an answer to an exercise or a design problem even when they do, in fact, know how to do the problem.  If they bring that extreme hesitancy to the final labs, where they have to make several design decisions, they’ll shut down before they get the design done.  They have enough resources (op amps, instrumentation amps, resistors, capacitors, PC board space, …) that they don’t need to come up with anything close to an “optimal” design.  There are lots of “good enough” designs that will do just fine for this course.  I think I need to do some more scaffolding of system-level design (like the block diagram for the audio amp), but I need to withdraw that scaffolding before the EKG lab.

I’m hoping that this week’s tinkering lab will encourage more open-ended exploration of a design space for them, and get them over their fear of not knowing the “right” answer. There is no “right” answer for the tinkering lab.  I did explore the space a little to make sure that there were some easy-to-find designs that were interesting—I don’t want them flailing in a design space that is too difficult to explore.  I also provided scaffolding in the form of systematic exercises in modifying the oscillator (like looking at the effect of adding resistors or capacitors between any pair of nodes—it helps that the initial circuit has only 4 nodes). But I’m not going to try to direct the students to any particular design—I really hope they come up with different designs from each pair of lab partners, and that someone comes up with some wildly different ideas that I did not even explore.

I plan to have the students coming out of the circuits course capable of doing some useful electronics design and of writing readable design reports—goals that are much harder to meet than the “pass a test on some circuits concepts” goal of the EE 101 course.  I’ll be pushing the students pretty hard in the class, because I know that they can do it, even if they are still not convinced of it.

I think that these students have been short-changed in the past by teachers who had low expectations of them. Because the bioengineering students take so many intro courses in so many different sciences, they’ve had little time for the advanced courses that might have stretched them—I’m having to do a lot of stretching them all at once, which is not comfortable for them or for me.

I wish we could have a year to develop the engineering practices at a saner pace, but 10 weeks of circuits is all they get, so I’m trying to make the most of it.

2013 January 22

Where you get your BS in CS matters

Filed under: Uncategorized — gasstationwithoutpumps @ 21:26

I used to be a firm believer that only your final degree matters: if you get a PhD from a prestigious department, it doesn’t matter where you did your undergrad work.  My own history has led me to believe this, as my Stanford PhD has been useful in opening doors that I don’t believe a Michigan State degree (my BS institution) would have.  But recently I’ve had cause to rethink this a little: where you do your BS does affect whether you go on to grad school, your chance of getting into a prestigious grad school, your chance of getting a fellowship to stay there, and maybe even your probability of finishing the PhD in a timely fashion. I got lucky in that my non-prestigious BS did not interfere with my getting into Stanford or getting graduate fellowships, but if I’d known then what I know now …

The Computing Research Association has recently released a report about where the CS PhDs in the US did their undergraduate work (thanks to Mark Guzdial for pointing me to it), and it is more lopsided than I thought:

Only one institution (MIT) had an annual average production of 15 or more undergraduates. Three other institutions (Berkeley, CMU, and Cornell) had an average production of more than 10 but less than 15. Together, these four baccalaureate institutions accounted for over 10% of all Ph.D.’s awarded to domestic students. The next 10% of all Ph.D.’s in that period came from only eight other baccalaureate institutions (Harvard, Brigham Young, Stanford, UT Austin, UIUC, Princeton, University of Michigan, and UCLA). In total, 54 (6.7%) of the 801 baccalaureate institutions accounted for 50% of the total Ph.D. production.

Of course, the top three institutions are the top three institutions in computer science by almost any measure (including size), so it is not too surprising that they produce a large number of BS students who go on to get PhDs.  Unfortunately, the report does not provide the rate of alumni going on to get PhDs in computer science by institution, but only in aggregate:

Fraction of BS awardees getting PhDs in computer science within 6 years, by type of baccalaureate institution. [figure copied from http://cra.org/resources/crn-online-view/exploring_the_baccalaureate_origin_of_domestic_ph.d._students_in_compu/]

It is clear that the research institutions send far more of their graduates on to get PhDs, but whether this reflects a difference in the goals of their students, the advising they get, or the quality of the education is unknown.

The report tries to get a proxy for quality by looking at how many students from an institution got NSF fellowships or honorable mentions in computer science.  Of course, this may reflect advising as much as it does educational quality, as many eligible students never apply for NSF fellowships.  The tilt towards research institutions is even stronger by this measure:

Approximately 80-90% of all awards were made to students who completed their undergraduate studies at research universities, which is somewhat higher than their representation (76%) in graduate programs overall.  Over the last ten years, students from four-year colleges received 10% of the GRF fellowships (they represent about 11% of students receiving a Ph.D.).  Students from master’s institutions received fewer than 6% even though they represent about 15% of the Ph.D.’s and 40% of all undergraduate degrees.

The report lists the top 22 institutions by number of NSF fellowships their alumni got in computer science (covering 51% of awardees).  Not surprisingly, the top 4 are MIT, Carnegie-Mellon, Stanford, and UC Berkeley (Cornell, which was 4th in number going on to get PhDs, was 11th in number getting NSF Fellowships; is that bad advising about applying for fellowships, or too theoretical an orientation for NSF?).  Interestingly, there is one 4-year school that makes it into the top 22 list for NSF Fellowships: Harvey Mudd, which beats out bigger schools like UC San Diego and UC Irvine (the only other UCs besides Berkeley to make it onto the top 22 list; UCLA doesn’t make the list).  A few other 4-year schools do respectably (Olin College of Engineering, Swarthmore, and Williams College), but most get just one or two students going on to get NSF fellowships in CS.

My son is currently a junior in high school and has expressed a desire to go to grad school in computer science, so we need to choose colleges to visit.  I don’t think we’ll have the time or energy to visit 22 colleges, but I think we should probably concentrate our visits on the colleges and universities that are sending kids on to grad school in large numbers and getting NSF fellowships for them—he is more likely to have the peer groups and advising he needs at such institutions.  Looking at the named institutions in the top 12 for production and in the top 22 for NSF, I get a pretty short list—only 10: MIT, Berkeley, CMU,  Cornell, Harvard,  Stanford, UT Austin, UIUC, Princeton, and University of Michigan.  We might want to add in some more West Coast institutions from the top 22: University of Washington, Cal Tech, Harvey Mudd, UCSD.

I don’t think we’ll visit all 14 campuses (Cornell is damned hard to get to—even worse than when I taught there 26 years ago, and UIUC is not much better), but at least this list is shorter than the other ones we’ve tried to compile, and we have prior evidence that these schools are good at getting many students on the path that he currently wants.  Harvey Mudd is the only small school on the list, and I wonder if we should add a couple of other small schools—Olin College of Engineering and Swarthmore, for example.  Of course, I don’t know when he or I will have time to visit colleges—we both have pretty full schedules this year.  He may have to apply to some without visiting them, and only visit if they accept him.

2012 December 7

On grading programs

Filed under: Uncategorized — gasstationwithoutpumps @ 10:01

One of our senior grad students sent me an e-mail today:

Subject: Pedagogy question
Date: Thu, 6 Dec 2012 22:07:34 -0800

I was talking about 205 with one of the first year students and it
occurred to me that I didn’t know how you tested people’s code. The
student was sure that you actually read the code and I always imagined
you had a pipeline that used unit testing on the submissions and only
read the code when things went wrong.

How do you do it? Thanks

The answer is that for most assignments I do both.  I have a Makefile that runs each student’s assignment on a number of test cases, and compares the output to the desired output.  For some assignments, where the output is not unique (for example, in sequence alignment, where there may be multiple correct alignments), I do some postprocessing of the outputs to get comparable values (for alignments, I rescore their output alignments and compare sequence lengths and scores, rather than the alignments themselves). I’m doing simple I/O testing, not unit tests, as unit tests need to be designed into the programs as they are built.  I’m not giving the students scaffolds to build their programs around, not even library specifications, so I can’t have a generic set of unit tests for their programs.
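The idea of simple I/O testing can be sketched in a few lines of Python. This is not the Makefile described above, which is not shown in the post; the function names normalize, run_case, and passes are hypothetical, and the "submission" is a stand-in one-line program invented so that the sketch is self-contained.

```python
import subprocess
import sys

def normalize(text):
    """Return the lines of text with carriage returns, trailing white space,
    and trailing blank lines removed, so trivial differences are ignored."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    while lines and lines[-1] == "":
        lines.pop()
    return lines

def run_case(command, input_text, timeout=10):
    """Run command (a list of arguments) with input_text on stdin; return its stdout."""
    result = subprocess.run(command, input=input_text, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout

def passes(command, input_text, expected_text):
    """True if the program's normalized output equals the normalized expected output."""
    return normalize(run_case(command, input_text)) == normalize(expected_text)

# Stand-in "student program": upper-cases whatever arrives on stdin.
submission = [sys.executable, "-c",
              "import sys; print(sys.stdin.read().upper(), end='')"]
print(passes(submission, "acgt\n", "ACGT\n"))
```

For assignments with more than one correct output, the exact comparison in passes would be replaced by a postprocessing step (rescoring an alignment, for instance) before comparing.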

I often have to put student-specific code into the Makefile to correct for minor student errors, like misnaming a command-line option for the program or having set the default values for options wrong.  Sometimes I have to make a copy of a program and edit it (for example, to remove MS-DOS carriage-return characters from the source code) before I can get it to run, but I try to avoid doing that after the first couple of assignments.

Some assignments do not lend themselves to simple I/O comparisons (such as the simulation tests to determine the frequency of long ORFs on the reverse strand of a gene). For those, I don’t run the programs, but rely on the student-reported histograms and model fits for the output check.

Without the I/O tests, it would be very difficult for me to grade the programs well, as there are often subtle bugs in student code that can only be teased out by testing unusual inputs—I certainly don’t have the days it would take to do a thorough reading and debugging of each student’s code.  Each year I try to improve at least one of the automated tests to do a more thorough job of testing—sometimes this improvement involves modifying the assignment to provide more clearly specified or more testable formats, sometimes it is just the addition of some more corner cases to the test suite, and sometimes it is better postprocessing to do more thorough checking of the output.  I reuse nearly the same assignments each year (though 2 assignments this year were used for the first time), so that the testing improves gradually from year to year.

After doing the I/O checks, I then read all the programs. I start with the programs that did not pass I/O tests, trying to provide debugging help.  That debugging help is usually the most time-consuming part of the grading.  On all the programs I try to provide some feedback on the documentation (which is usually awful, though I had one student this year who really got the point of documentation and did a better job of it than I usually do).

In addition to the documentation feedback, I try to provide suggestions to make the Python more efficient or more idiomatic. For example, I had one student this year who always used complicated “while” loops where simple “for” loops would have been much easier to read, and lots of times people would write multi-line loops that could be more easily expressed with a list comprehension.  Students who wrote very good, idiomatic Python got many fewer comments from me than students who were struggling, though I always tried to find at least one useful thing to say to them.
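To make that concrete, here is the same small computation written three ways. The data and variable names are made up for illustration and are not taken from any student's code.

```python
sequences = ["ACGT", "GGC", "TTAGA"]  # made-up data

# A "while" loop with manual index bookkeeping: correct, but harder to read.
lengths_while = []
i = 0
while i < len(sequences):
    lengths_while.append(len(sequences[i]))
    i += 1

# A "for" loop iterates over the items directly; no index variable is needed.
lengths_for = []
for seq in sequences:
    lengths_for.append(len(seq))

# A list comprehension expresses the same multi-line loop in one line.
lengths = [len(seq) for seq in sequences]

print(lengths_while == lengths_for == lengths)
```

All three produce the same list; the difference is in how much incidental machinery the reader has to check.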

The code-reading is rarely fun—maybe 10% of the students program well enough to make reading their code pleasant rather than painful, but I think that the feedback the students get from it is one of the most important parts of the course pedagogically.  Most of the students have never had such detailed feedback on their programming, because it is expensive to provide.  It takes me an average of 10–20 minutes per student to do the code reading (depending on the complexity of the assignment), which is barely feasible for a class of 20 once a week and would be prohibitive for a larger class or  with more frequent assignments. The I/O checks generally only take about 10 minutes per student, and could be reduced substantially if I just failed students who didn’t pass the I/O checks, rather than trying to patch my test or their code to allow almost working code to be tested.

I spend much more time on the weaker programmers than on the good ones, which may be why the senior grad student does not remember much of the feedback (he also took the course when we were still using Perl, rather than Python, and I provided less feedback on good programming style and idioms—it is damned hard to write a good program in Perl, and many Perl idioms are nasty, ugly hacks).

By the end of the course, everyone who passes is at least a minimally competent Python programmer and can write the sort of data-wrangling code that every computational biologist needs.  I think that this year students all ended up with a decent grasp of using tuples as keys or values for hashes and of using generators (made with “yield” statements) for doing input parsing.
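A small sketch of those two idioms together: a generator that parses input one record at a time, and a hash whose keys are tuples. The FASTA format, the function names read_fasta and adjacent_pair_counts, and the sample data are assumptions chosen for illustration; the course assignments themselves are not shown here.

```python
import io
from collections import Counter

def read_fasta(file_object):
    """Yield (name, sequence) pairs from FASTA-formatted input, one record at a time."""
    name, chunks = None, []
    for line in file_object:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # the last record has no following header

def adjacent_pair_counts(records):
    """Count adjacent letter pairs, keyed by the tuple (first_letter, second_letter)."""
    counts = Counter()
    for _, seq in records:
        for pair in zip(seq, seq[1:]):
            counts[pair] += 1
    return counts

# Made-up input; io.StringIO stands in for an open file.
sample = io.StringIO(">seq1\nACGA\nCG\n>seq2\nAC\n")
counts = adjacent_pair_counts(read_fasta(sample))
print(counts[("A", "C")])
```

Because read_fasta is a generator, only one record is held in memory at a time, and the counting code never needs to know how the input was formatted.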

Several of them also learned how to collaborate without copying. I had several students who talked to each other about each assignment, helping each other learn Python and the algorithms of the course, while writing distinctly separate code.  They were also very good about acknowledging their collaborators—a habit that more courses should be trying to develop. There was only one student I had to chide about copying code he did not understand (and that wasn’t working anyway), and even he had properly cited the original author of the code, so was not in any danger of an academic integrity violation.

Most of the students also can do substantially better documentation than when they started the course.    Over half the class is now routinely commenting on the meanings of their variables and the return values of their functions, though not always as clearly as I would like.

I’m considering rewriting my old “document-this-code” assignment from when I taught technical writing, updating it for Python and adding it to the course, which is already pretty full with 7 programs, a fellowship application, and a research paper, in addition to the “content” material about bioinformatic models and algorithms.  The workload is high for a 5-unit course, which should total only about 120 hours, including the 35 hours of class time (I suspect that the workload varies from 100 to  150 hours already), and so I’m hesitant to add another assignment. It seems that many of the students have never seen a decently commented program and have no idea what I mean when I complain about the vagueness of their comments. They understand when I complain about the lack of comments, but don’t know what they should put in the comments.
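As one guess at the difference between a vague comment and a useful one (this function and its comments are invented for illustration, not taken from the assignment), compare a comment like "# count" with comments that state what a variable means and what the function returns:

```python
def gc_fraction(sequence):
    """Return the fraction of letters in sequence that are G or C.

    sequence: a string of nucleotide letters; upper and lower case are both accepted.
    Returns a float between 0.0 and 1.0. An empty sequence returns 0.0,
    a choice made in this sketch to avoid dividing by zero.
    """
    # gc_count: number of positions in sequence holding G or C, ignoring case
    gc_count = sum(1 for letter in sequence.upper() if letter in "GC")
    return gc_count / len(sequence) if sequence else 0.0

print(gc_fraction("GGCA"))
```

The docstring answers the questions a caller would otherwise have to read the code to settle: what goes in, what comes out, and what happens in the corner case.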

All the students in my course have had previous programming courses, often several such courses.  (I had a couple of students start the course this year without prior programming courses, and drop after the first two assignments, with the intent of trying again after taking some programming courses.)  If I remember right, only one of the students started the course with a good documentation style.

I understand why the huge CS courses can’t provide the detailed feedback needed to get students to document well, but I think that it is a shame.  Writing programs is like writing papers in many ways, with the same concerns about organization at different scales and the need for clarity, completeness, correctness, and conciseness (the four Cs of technical writing—I was going to write a blog post about that over a year ago, and never finished it—one of my 167 unfinished draft posts).  Programming classes should be taught like writing classes, with detailed feedback from experienced programmers, but I’m afraid that administrators are so in love with mass production that they are more likely to want to make programming courses bigger with less feedback to the students rather than move to a high feedback, high cost approach that could actually produce good programmers.

Of course, administrators would love to convert the writing courses into mass-production classes also. Writing classes used to have only 20 students per instructor, in order to provide adequate feedback, but the continued defunding of instructional activity in favor of administrative bloat and constructing new research buildings has resulted in writing classes having more like 30 students per instructor, with the consequent decrease in the quantity and quality of feedback to the students.  It is only a matter of time before administrators decide that MOOCs with “peer feedback” are good enough for the peons, and eliminate small classes and professional feedback from writing courses also.

 

 

 

 
