Gas station without pumps

2014 September 30

Ebola genome browser

Filed under: Uncategorized — gasstationwithoutpumps @ 21:00

For the past week, I’ve been watching the genome browser team (led by Jim Kent) scramble to get together an information resource to aid in the fight against the Ebola virus.  They went public today:

We are excited to announce the release of a Genome Browser and information portal for the Jun. 2014 assembly of the Ebola virus (UCSC version eboVir3, GenBank accession KM034562) submitted by the Broad Institute. We have worked closely with the Pardis Sabeti lab at the Broad Institute and other Ebola experts throughout the world to incorporate annotations that will be useful to those studying Ebola. Annotation tracks included in this initial release include genes from NCBI, B- and T-cell epitopes from the IEDB, structural annotations from UniProt and a wealth of SNP data from the 2014 publication by the Sabeti lab. This initial release also contains a 160-way alignment comprising 158 Ebola virus sequences from various African outbreaks and 2 Marburg virus sequences. You can find links to the Ebola virus Genome Browser and more information on the Ebola virus itself on our Ebola Portal page.

Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP server or the Downloads page. The Ebola virus (eboVir3) browser annotation tracks were generated by UCSC and collaborators worldwide. See the Credits page for a detailed list of the organizations and individuals who contributed to this release and the conditions for use of these data.


Matthew Speir
UCSC Genome Bioinformatics Group

2014 September 25

Details matter

Filed under: Uncategorized — gasstationwithoutpumps @ 11:57

In a comment on Mark Guzdial’s blog post Pushback in California on Computing in Schools, “lizaloop” wrote

2. While I support the effort to bring programming into schools I’d like to see us also emphasize the concept of each person taking charge of his or her own learning. Because programming languages change so rapidly anyone who intends to do serious coding will have to repeat the learning process over and over. Our introductory CS classes need to focus more on “learning how to learn” than on the specifics of any one coding language.

While it is certainly true that programming languages go in and out of fashion (where are the COBOL programmers of my youth now?), and it is also true that people need to keep learning if they want to remain in any technical field for long, that doesn’t mean that intro CS courses should give up on teaching specifics and instead tackle the nebulous goal of “learning how to learn”.

In fact, one of the things that distinguishes computer programming from many other courses of study is that you must learn the specifics of the programming language and tools you are using if you are to get anything done.  “Big picture” thinking does not get programs written.  It does you no good to “learn how to learn”, if you then don’t learn any of the details.

One of the major things that students new to computer programming (and to other engineering fields) need to learn is that details matter.  The specifics of the computer programming language they are learning matter—not forever, since they might never use that particular programming language again—but while they are doing what they are doing right now, the details matter.  That is an enduring, transferable lesson, but it depends on having a course in which the specifics of the coding language are viewed as important.

Of course, this does not mean that the teacher has to spend a lot of time on the details—the compiler or interpreter will make it abundantly clear to the students that details matter.  But the teacher can strengthen or weaken that lesson by their own example and by their grading.  Teachers who only do sloppy pseudocode on the board, never filling in the missing details or correcting mistakes, will convey the impression that the details are unimportant.  Teachers who don’t mark down sign errors, punctuation errors, and off-by-one errors in quiz or exam answers also give the impression that “close is good enough”, which is emphatically not the case in real programming.  Teachers who never read their students’ programs, but only do crude I/O testing to grade programs, will give the impression that documentation doesn’t matter.

On the other hand, teachers who spend all their time fussing over semicolons and never talking about the bigger ideas of algorithms, data structures, problem decomposition, and program documentation will produce low-quality copy editors, not programmers. One reason I like Python as a pedagogic language is that the syntax is simpler than many other languages, so that I can get the details right without having to spend so much time explaining the details (and the students can spend more time on more interesting debugging than chasing down punctuation errors).

But I’m not trying to dump on lizaloop’s ideas entirely.  Computer programming courses are a good place to teach students to “learn how to learn”, though not by ignoring the specifics and following some sort of touchy-feely metacognition curriculum.  Indeed, it is because of the specific details that students must master to get anything done that programming is such a good subject for learning how to learn.

Those specifics are essential to the task at hand (not some nebulous future need that is all most math or science classes manage), but they are easily looked up in on-line documentation with generic Google searches.  Try googling things like python reverse string, HTML color, or c++ function pointer to see how easy it is to get online tutorials and documentation on specific details.

It is completely reasonable for a teacher to give students concepts and keywords, but expect them to look up some of the details for themselves.  For example, a teacher might explain RGB color space and how colors are often encoded as three one-byte numbers in hexadecimal format, but not give any specific color codes, expecting students to find the codes they need either by experimentation or by on-line search.
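As a small illustration of the encoding described above, here is a minimal C++ sketch that formats three one-byte channel values as an HTML-style `#RRGGBB` code. The helper name `rgb_to_hex` is hypothetical, chosen only for this example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one-byte red, green, and blue values
// as an HTML-style color code such as "#FF8000".
std::string rgb_to_hex(uint8_t red, uint8_t green, uint8_t blue)
{
    char buffer[8];  // '#' + 6 hex digits + terminating NUL
    std::snprintf(buffer, sizeof buffer, "#%02X%02X%02X",
                  static_cast<unsigned>(red),
                  static_cast<unsigned>(green),
                  static_cast<unsigned>(blue));
    return std::string(buffer);
}
```

Each channel occupies exactly two hexadecimal digits, which is why the format string pads with a leading zero.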

In the approach I’m suggesting, the students are still focused on the details of the coding language, but they are also learning how to learn (at least at the lowest level of learning how to look up specific facts).

Teaching students how to learn more complex concepts on their own (like how to choose which color space to work in) is probably a forlorn hope, but getting them to learn to look up specific facts (like how to transform from HSV color space to RGB color space) and low-level details (like HTML color codes or how to reverse a string in Python) is certainly a reasonable expectation for an intro CS course.

2014 September 21

Narrowing the gender gap in CS

Filed under: Uncategorized — gasstationwithoutpumps @ 13:41

Today’s post collects a few drafts of pointers to articles about narrowing the gender gap in computational fields.  The first article is from CACM,  Computing’s Narrow Focus May Hinder Women’s Participation | News | Communications of the ACM:

In her position as a professor of computer science at Union College, Barr found contextualizing computer science classes led to an increase in female enrollment. “We said, ‘let’s show them that computer science can be useful by giving themes to the introductory CS courses, so students can see their relevance,’” she said. “For us, it’s been enormously successful. Ten years ago we taught the introductory course to 29 students, and 14% of them were women. This year there were over 200 students, and 39% of them were women.” Beyond college, Barr said, she’d also like to see “a bigger funnel into the corporate world and the tech industry, with people coming from many other majors. It doesn’t have to be just CS majors.”

The suggestion there is that providing interesting applications in the intro courses helps retain student interest, particularly among female students.  The  article seems to have struck a chord with some female computer scientists.  Here, for example, is a response from Katrin Becker’s blog:

A big part of what attracted me to computer science was what I could do with what I was learning. That, and that programming is largely about lists, organizing, and puzzles—all things that women often find appealing.

Personally, I think that well-designed intro courses that excite students about the possibilities of the field would serve to retain more men as well as more women, but it is certainly possible that the effect is stronger for some groups of students than for others.  Exactly what applications are chosen may make a difference also—picking applications that fit male stereotypes (car engine controllers and missile guidance systems?) may even be counter-productive in narrowing the gender gap.

Another possible explanation for why women make up such a small part of engineering and the “hard” sciences comes from an article in The Washington Post,  Catherine Rampell: Women should embrace the B’s in college to make more later – The Washington Post:

A message to the nation’s women: Stop trying to be straight-A students.

No, not because you might intimidate easily emasculated future husbands. Because, by focusing so much on grades, you might be limiting your earning and learning potential.

The college majors that tend to lead to the most profitable professions are also the stingiest about awarding A’s. Science departments grade, on a four-point scale, an average of 0.4 points lower than humanities departments, according to a 2010 analysis of national grading data by Stuart Rojstaczer and Christopher Healy. And two new research studies suggest that women might be abandoning these lucrative disciplines precisely because they’re terrified of getting B’s.

The observation is that women are more deterred from entering a field by getting low grades than men are—they found that women who got Bs and Cs in their intro courses changed majors to ones that graded more leniently, while men with low grades continued slogging along in their initially chosen major.  The data was from economics, not engineering, departments, and I don’t know whether the same behaviors apply. The article cites another study that suggests that the same behavior occurs in STEM fields:

Arcidiacono’s research, while preliminary, suggests that women might also value high grades more than men do and sort themselves into fields where grading curves are more lenient.

The suggested action is to advise women not to be intimidated by B grades.  I don’t know whether that has been attempted anywhere, but I have my doubts that just telling people not to be afraid of Bs is really going to change their strategies for maintaining their self images.  Catherine Rampell also makes a rather careless mistake in saying

Remember, on net, many more women enter college intending to major in STEM or economics than exit with a degree in those fields. If women were changing their majors because they discovered new intellectual appetites, you’d expect to see greater flows into STEM fields, too.

The mistake is in assuming that switching to and from STEM fields is equally easy.  In fact, the much larger set of required courses and longer prerequisite chains make it much easier to switch out of STEM fields than into them.  Freshmen are advised to prepare for the most restrictive major they are interested in to keep their options open.  What seems to be happening is that women bail out of the tough majors at a higher level of performance than men do.

Of course, it is a mistake to think of “STEM” as a monolithic entity. From The Shriver Report – 10 Reasons Why America Needs 10,000 More Girls in Computer Science:

2. Girls Are Already Making the Grade in Bio (Science)

Using AP test-taking as a measure of pipeline illustrates the true nature of STEM participation for girls. Female test-takers exceed or are close to parity with males in psychology, calculus, biology, and chemistry, but only account for 18 percent of AP computer science test takers. According to the National Center for Education Statistics, women already make up nearly 60 percent of degree recipients in biology, a whopping 85 percent in health professions, and around 50 percent in social sciences. In fact, 20 times as many girls took the AP biology test, as did AP computer science. The majority of women in ’STEM’ fields choose life sciences, so simply saying we need to increase the number of women in STEM is a mistake. Instead, we need to narrow the conversation to focus on computing and IT fields, where the shortfall is the largest.

Not only are women already over-represented in biology at the BS level, but biology has been over-producing PhDs for a couple of decades relative to the demand, so that jobs in biology research are very difficult to get and generally pay substantially less than jobs in other science and engineering fields.  There are some very high paying jobs in biomedical research, but the demand for them far exceeds the supply; the “postdoc holding tank” in biology is enormous.

I don’t have any action items coming out of these articles.  I’ve already put together a freshman design course for the bioengineering majors that did hands-on, applied work providing applications for some low-level computer programming.  While I’ll continue to try to improve that course, there aren’t many lower-division courses taught by our department for majors (the others are bioethics and a no-prereq intro to biotechnology, both of which are dominated by non-majors).  The Baskin School of Engineering has just created a Computational Media department, which will take over the game design program (a predominantly male program) from CS, but which is expected to create some new computational media courses.  We’ll have to see whether these have any effect on the number of women in computational fields at our university.

2014 September 20

Improving feedback for fan

Filed under: freshman design seminar — gasstationwithoutpumps @ 12:22

I wanted to look at the step response of the fan and of the heater, so that I could see if I could derive somewhat reasonable control parameters by theory, rather than by cut-and-try parameter fiddling.  Most of the tutorials I’ve looked at give empirical ways of tuning PID controllers, rather than theoretical ones, even ones that use Laplace transforms to explain how PID controllers work and how to determine whether or not the control loop is stable if you are controlling a linear time-invariant system with a known transfer function.

When I first looked at the fan response, I noticed a problem with my tachometer code:

The tachometer gives two pulses per revolution, but the markers used are not perfectly spaced, so I get different estimates of the speed depending on which falling-edge-to-falling-edge pulse width I measure.  The difference between the two speeds is about 1.6%.

I rewrote the tachometer code to trigger on all four edges of a revolution, and to record the time at each edge in a circular buffer. This way I can use a full revolution of the fan for determining the speed, but get updated estimates every quarter revolution, available in micros_per_revolution:

volatile uint32_t old_tach_micros[4];  // time of last pulses of tachometer
	// used as a circular buffer
volatile uint8_t prev_tach_index=0;    // pointer into circular buffer
volatile uint32_t micros_per_revolution; // time for the most recent full revolution (4 edges)

#define MIN_TACH_PULSE  (100)   // ignore transitions sooner than this many
				// microseconds after previous transition

void  tachometer_interrupt(void)
{   uint32_t tach_micros = micros();

    if (tach_micros-old_tach_micros[prev_tach_index] < MIN_TACH_PULSE) return;
    prev_tach_index = (prev_tach_index+1)%4;  // increment circular buffer pointer
    micros_per_revolution= tach_micros-old_tach_micros[prev_tach_index];
    old_tach_micros[prev_tach_index] = tach_micros;
}

In setup(), I need to set up the interrupt with attachInterrupt(FAN_FEEDBACK_INT,tachometer_interrupt, CHANGE);
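To see why measuring over four edges removes the marker-spacing error, the buffer logic can be modeled on a desktop machine. The following is a minimal host-side sketch, not the Arduino code itself: the `Tachometer` struct and `rpm_from_period` are hypothetical names, timestamps are passed in rather than read from micros(), and the MIN_TACH_PULSE debounce check is left out for brevity.

```cpp
#include <cstdint>

// Host-side model of the circular buffer (no Arduino calls).
struct Tachometer {
    uint32_t old_micros[4];          // timestamps of the last 4 edges
    uint8_t  prev_index;             // slot holding the newest timestamp
    uint32_t micros_per_revolution;  // time spanned by the last 4 edges

    Tachometer() : old_micros{0, 0, 0, 0}, prev_index(0),
                   micros_per_revolution(0) {}

    // Record one edge seen at time `now` (in microseconds).
    void edge(uint32_t now)
    {
        prev_index = (prev_index + 1) % 4;  // this slot holds the oldest edge
        micros_per_revolution = now - old_micros[prev_index];
        old_micros[prev_index] = now;       // overwrite it with the newest
    }
};

// Convert a revolution period in microseconds to revolutions per minute.
double rpm_from_period(uint32_t micros_per_rev)
{
    return micros_per_rev == 0 ? 0.0 : 60.0e6 / micros_per_rev;
}
```

Feeding this model edges spaced 2400, 2600, 2450, and 2550 µs apart (unequal, but summing to 10000 µs per revolution) gives a steady 10000 µs period once the buffer has filled, even though no two adjacent intervals match.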

I think that this improved tachometer code may be a bit too much for first-time programmers to come up with. Circular buffers use a bunch of concepts (arrays, modular arithmetic) and are likely to cause a lot of off-by-one errors. Interrupts alone were a complicated enough concept for students to deal with. I don’t know whether the improvement in speed measurement would be justifiable in the freshman design course.

The new tachometer code did smooth out the measurements a lot, though, as expected—it reduces the fluctuation in measured speed to about 0.3%, which is limited by the resolution of the micros() timer I’m using on the Arduino board. I then tried recording some step responses, both for upward steps and downward steps. The upward steps are reasonably approximated by an exponential decay (like a charging curve):

The low speed with PWM=0 (always off) is 724.7 rpm, and the high speed with PWM=255 (always on) is 6766.3 rpm. The exponential fit is not perfect, but it is certainly a good enough approximation for designing a closed-loop system.

The response to a downward step, however, is not well modeled by a simple exponential decay:

The fan spins down gradually at first (with a time constant about 1.6s), but at low speed the speed changes faster (as if the time constant dropped to about 0.6s).

Note that the fan slows down much more gradually than it speeds up, which means that it is not a linear, time-invariant system. In a linear system, superimposing a step-up and a step-down would cancel, so the responses to the step up and step down should add to a constant value—the fan most definitely does not have that property.
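The superposition argument can be checked numerically with a first-order model. This is only a sketch under the assumption of a single-time-constant exponential response; the function names are hypothetical and `tau_seconds` is a free parameter, not a value fitted to the fan.

```cpp
#include <cmath>

// First-order (single time constant) model of a step between two speeds.
// `tau_seconds` is an assumed parameter, not a fitted value.
double step_response(double start_rpm, double end_rpm,
                     double tau_seconds, double t_seconds)
{
    return end_rpm + (start_rpm - end_rpm) * std::exp(-t_seconds / tau_seconds);
}

// In a linear system the up and down steps share one time constant,
// so their sum stays at low_rpm + high_rpm for every t.
double up_plus_down(double low_rpm, double high_rpm,
                    double tau_seconds, double t_seconds)
{
    return step_response(low_rpm, high_rpm, tau_seconds, t_seconds)
         + step_response(high_rpm, low_rpm, tau_seconds, t_seconds);
}
```

If the two calls to `step_response` are instead given different time constants, as the measured fan behavior suggests, the sum is no longer constant over time.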

I was curious whether the difference was just apparent for large steps, or also for small ones, so I tried steps between PWM duty cycles of 100/256 and 160/256:

A small upward step is again quick, with almost the same time constant as before.

The small downward step is faster than before, though still substantially slower than the upward step of the same size, and with an initially slower response than the final convergence.

I’m going to try writing a couple of ad hoc controllers for the fan, to see if they behave better than the PID controller I’ve been using:

  • open-loop control using just PWM=(setpoint-740)/25;
  • a simple on/off control with a single threshold;
  • on/off control with hysteresis, using two thresholds instead of one;
  • PI control with no anti-windup; and
  • a controller that goes to full on or full off when the error is large, to make a quick transition, then switches to approximately the right PWM value when the error is small, with PI control thereafter.
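The three simplest controllers in that list are short enough to sketch directly. This is an illustrative host-side version, not the code that was run on the Arduino: the function names are hypothetical, and the hysteresis half-width `band_rpm` is an assumed tuning parameter that the post does not specify.

```cpp
#include <algorithm>

// Limit a computed drive value to the 8-bit PWM range and truncate to int.
int to_pwm(double value)
{
    return static_cast<int>(std::min(255.0, std::max(0.0, value)));
}

// Open loop: invert the roughly linear speed-vs-PWM relation
// PWM = (setpoint - 740) / 25 quoted in the text.
int open_loop_pwm(double setpoint_rpm)
{
    return to_pwm((setpoint_rpm - 740.0) / 25.0);
}

// On/off control with a single threshold at the setpoint.
int on_off_pwm(double setpoint_rpm, double measured_rpm)
{
    return measured_rpm < setpoint_rpm ? 255 : 0;
}

// On/off control with hysteresis: two thresholds, and the previous
// output is held while the speed is between them.
int hysteresis_pwm(double setpoint_rpm, double measured_rpm,
                   double band_rpm, int previous_pwm)
{
    if (measured_rpm < setpoint_rpm - band_rpm) return 255;
    if (measured_rpm > setpoint_rpm + band_rpm) return 0;
    return previous_pwm;
}
```

For example, `open_loop_pwm(5000)` evaluates (5000-740)/25 = 170.4 and truncates it to 170.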

I think that the open-loop controller will have a steady, but wrong speed; the  crude on/off controllers will make an audible pulsing of the fan motor; the PI controller will suffer from overshoot when making big steps, and the on/off/PI controller should make nice steps, if I can tune it right.

I implemented all the controllers and ran a test switching between setpoints of 1000 RPM and 5000 RPM every 30 seconds.  Here are plots of the behavior with different control algorithms:

The PWM values computed by the various control algorithms show the integrator windup problem for PI clearly after the downward transitions—PI takes a long time to recover from the errors during the downward edge.

The mixed algorithm does a very good job of control, with little overshoot.  The simple PI algorithm has substantial overshoot, particularly when the control loop wants a PWM value outside the range [0,255]. Open loop has significant offset and wanders a bit.  On/off control oscillates at about 10 Hz, and adding hysteresis makes the oscillation larger but slower (about 5 Hz).

The errors for the mixed controller are only about ±0.3%, and the overshoot or ringing at the transitions is less than 40 RPM.  The simple PI controller overshoots by 340 RPM and takes 20 seconds to recover from the integrator windup on the downward transition.  The open-loop controller has offset errors of about 1% and a fluctuation of about ±0.7% at the high speed, and an offset of 1% and fluctuations of about ±0.5% at the low speed.  The on/off controller has an offset of about 0.5% at high speed with fluctuations of ±2%, and an offset of 28% with fluctuations of ±28% at low speed.  Adding hysteresis slows down the oscillations, but makes them larger (0.2% offset, fluctuations ±3% at high speed, and 44% offset with fluctuations of ±70% at low speed).

The mixed algorithm, which uses on/off control for large errors and PI for small errors, with back-calculation of the integral error when switching to the PI controller, seems to work very well.  But would I be able to get freshmen to the point of being able to develop that themselves within a 2-unit course?  Probably not, but I might be able to guide them through the development in a series of exercises that started with on/off control, then went to modeling and open-loop control, then the PI control, and finally the mixed control.  It would take most of the quarter.
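One way the mixed controller could be structured is sketched below. This is a guess at the shape of such a controller, not the code that produced the results above. It assumes the PI form pwm = kp*(error + integral/ti), uses the open-loop formula (setpoint-740)/25 as the "approximately right" PWM value for back-calculation, and introduces hypothetical names and parameters (`MixedController`, `big_error_rpm`) whose values are not given in the post.

```cpp
#include <algorithm>
#include <cmath>

// Limit a drive value to the PWM range [0,255].
double clamp_pwm(double value)
{
    return std::min(255.0, std::max(0.0, value));
}

struct MixedController {
    double kp;             // proportional gain, PWM counts per RPM (assumed)
    double ti;             // integral time in seconds (assumed)
    double big_error_rpm;  // error size that triggers full on/off (assumed)
    double integral;       // accumulated error, RPM*seconds
    bool   bang_bang;      // true while the last output was full on or off

    MixedController(double kp_, double ti_, double big_error_rpm_)
        : kp(kp_), ti(ti_), big_error_rpm(big_error_rpm_),
          integral(0.0), bang_bang(true) {}

    // One control step; returns a PWM value in [0,255].
    double update(double setpoint_rpm, double measured_rpm, double dt_seconds)
    {
        const double error = setpoint_rpm - measured_rpm;
        if (std::fabs(error) > big_error_rpm) {
            bang_bang = true;                // large error: full on or full off
            return error > 0 ? 255.0 : 0.0;
        }
        if (bang_bang) {
            // Back-calculate the integral so that the PI output starts at
            // the open-loop estimate of the PWM value for this setpoint.
            const double target = clamp_pwm((setpoint_rpm - 740.0) / 25.0);
            integral = ti * (target / kp - error);
            bang_bang = false;
        } else {
            integral += error * dt_seconds;  // ordinary PI integration
        }
        return clamp_pwm(kp * (error + integral / ti));
    }
};
```

Because the integral is reset on every hand-off from the on/off phase, the windup that accumulates during a large step never reaches the PI phase, which is the property the plots above reward.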

2014 September 17

Putting the heater in a box

Continuing the saga of the incubator project in the recent posts:

On my to-do list for the project

  • Put the whole thing into a styrofoam box, to see whether extra venting is needed to allow things to cool down, and to see how tightly temperature can be controlled. Find a smaller bread board or prototype board to put the controller on (my current bread boards are all 6.5″ long, and the box only has room for 6″, especially since I put the resistor in the center of the 6″×12″ aluminum plate, which just fits the box). I suppose I could drill a couple more holes in the plate and mount the resistor off center, but I rather like the idea of building the controller as an Arduino shield, so that the Arduino + controller is a single unit. Another possibility is to drill a hole in the styrofoam box and run cables through the box for the resistive heater, the fan, and the thermistor. Even if the grounds are connected outside the box, this is only 8 wires. Putting the control electronics outside the box would reduce the clutter in the box and make tweaking easier.

    I got this done today, by drilling a hole in the box and soldering long wires onto the resistor and the thermistor, so that all the active electronics could live outside the box. Incidentally, “drilling” did mean using a drill bit, but I held it and turned it with my fingers—styrofoam is so soft and grainy that I feared a power drill would tear out big chunks.

    Here is the interior of the styrofoam box, with the lid open.  The 6″×12″ aluminum plate covers the bottom.  The thermistor is on the left, propped up by a rubber foot, the resistor is in the center, and the fan is sitting on a foam pad on the right. (The foam is to reduce noise until I can get the fan properly mounted in a baffle.)

    As expected, I can heat up the thermistor fairly quickly, but if I overshoot on the temperature, it takes a very long time for the closed box to cool back down. Cooling off just 1°C took over half an hour.

  • Add some low-pass filtering to the temperature measurement to reduce noise. Just adding 4 measurements in quick succession would reduce the noise and give the illusion of extra precision.

    I did this also. With the box closed, the thermistor reading is fairly stable, with fluctuations of less than 0.1°C (which was the resolution with a single thermistor reading before adding 4 successive reads).

  • The fan controller occasionally has a little glitch where the tachometer either misses a pulse or provides an extra one (I think mainly an extra one due to ringing on the opposite edge).  I could try reducing this problem in three ways: 1) changing which edge I’m triggering on, 2) using more low-pass filtering before the Schmitt trigger in the edge detector, or 3) using median filtering to throw out any half-length or double-length pulses, unless they occur as a pair.  (Hmm, the half-length pulses would occur as a pair, so this might not help unless I go to median of 5, which would be a lot of trouble.)

    I fixed this also, using two techniques:

    1. In the program I have 3 states for the interrupt routine that catches the edges: normally, I check that the edge is within 3/4 to 3/2 of the previously recorded pulse—if so I record it and continue. If it is less than 3/4 as long, I skip it, and change to a skip state. If it is more than 3/2 as long, I skip it and switch to a force state. In the skip state, I ignore the reading and switch to the force state. In the force state, I accept the pulse length (whatever it is), and switch to the normal state. With this state machine, I ignore a double-length pulse or a pair of half-length pulses together.
    2. The rising edge of the pulse from the tachometer is very slow (thanks to the RC filter of the pullup), but the falling edge is sharp. Extraneous pulses are more likely to occur if I trigger on the slow edge rather than fast edge, so I switched the polarity to make sure that I was using the falling edge (which is the rising edge of the output of the Schmitt trigger).

    I think that changing which edge I used made a bigger difference than trying to suppress the erroneous reads digitally. I no longer hear the occasional hiccup where the control algorithm tries desperately to double or halve the fan speed because of a misread timing pulse.

  • Improve my anti-windup methods for both thermal and fan controllers, to reduce overshoot.

    I changed from using conditional integration and back calculation of the integration error to using a decay of the integration error based on the difference between the computed PWM setting and the limit when the limits were exceeded. I’m not sure this improved anything though, and it introduces yet another parameter to tune, so I may go back to the previous method. I did play around with the tuning parameters for the fan loop today, and realized that I still don’t have good intuition about the effect of parameter changes. I noticed that the fan control was oscillating a little (small fluctuations around the desired speed, but big enough that I could hear the changes), and I found ways to reduce the oscillations, but at the expense of slower response to step changes.

  • Improve my modeling of the thermal system, so that I can do more reasonable back calculation on setpoint change.

    I still need to do more thinking about the thermal modeling, since it is clear that I can’t afford overshoot when heating (though overshoot during slow cooling is unlikely to be a problem).
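The three-state pulse filter described in the list above (normal, skip, force) can be written down compactly. The sketch below follows the prose description rather than the actual interrupt routine, which is not shown in the post; the names `PulseFilter` and `offer` are hypothetical, and starting in the force state (so that the first pulse seeds the reference) is an assumption.

```cpp
#include <cstdint>

enum class PulseState { Normal, Skip, Force };

struct PulseFilter {
    PulseState state;          // current state of the filter
    uint32_t accepted_micros;  // last pulse length that was recorded

    // Assumption: start in Force so the first pulse is accepted as-is.
    PulseFilter() : state(PulseState::Force), accepted_micros(0) {}

    // Offer a newly measured pulse length; returns true if it was recorded.
    bool offer(uint32_t pulse_micros)
    {
        switch (state) {
        case PulseState::Normal:
            if (4ULL * pulse_micros < 3ULL * accepted_micros) {
                state = PulseState::Skip;   // shorter than 3/4: skip it
                return false;
            }
            if (2ULL * pulse_micros > 3ULL * accepted_micros) {
                state = PulseState::Force;  // longer than 3/2: skip it
                return false;
            }
            break;                          // plausible length: record it
        case PulseState::Skip:
            state = PulseState::Force;      // ignore the second short pulse
            return false;
        case PulseState::Force:
            state = PulseState::Normal;     // accept whatever comes next
            break;
        }
        accepted_micros = pulse_micros;
        return true;
    }
};
```

The comparisons are done in 64-bit integer arithmetic to avoid both floating point and overflow, which would matter inside an interrupt routine on a small microcontroller.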

Still on my list with no progress:

  • Consider using a PID controller for the temperature to get faster response without overshoot.  (If I can reduce the noise problem.)
  • Design and build baffling for the fan to get better airflow in the box. I’ve made a little paper and wire baffle, to get better air flow over the resistor, but I’ve not done the full baffling to get good airflow in the box.
  • Figure out how to get students to come up with workable designs, when they are starting from knowing nothing. I don’t want to give them my designs, but I want to help them find doable subproblems. Some of the subproblems they come up with may be well beyond the skills that they can pick up in the time frame of the course. The more I work on this project, the more I realize that I and the students will have to be happy with them having explored the options, without getting all the problems solved.

I want to add to the list today:

  • Add changes to the cumulative error term whenever KP or TI are changed, to keep the PWM output the same after the changes—currently changing any of the control loop parameters adds a huge disturbance to the system.
  • Separate the control algorithm better from the rest of the code, so that I can use the same code base and quickly switch between different control algorithms: on/off, on/off with hysteresis, proportional control, proportional control with offset, PI control, PI control with anti-windup variants, PID control.
  • Add an option for recording the response of the system over a long time, so that I can plot input and output of the system with gnuplot. This would be nice for the fan control loop, but I think it will be essential for the temperature control loop.
  • Research control algorithms other than PI and PID, particularly for asymmetric systems like the temperature control, where I can get fairly quick response to the inputs when heating, but very slow response when cooling.