Gas station without pumps

2015 September 2

Another bioinformatics teaching post

Filed under: Uncategorized — gasstationwithoutpumps @ 09:52

This seems to be a good time of year for posts about teaching bioinformatics:  I just got another post about teaching bioinformatics in my feed reader, Scripting for Biology – Online Virtual Classroom-based Module « Homolog.us – Bioinform:

I am building a number of online virtual classroom-based modules for researchers working on biological data. The description for the first one is attached below, and I will have a beta test starting Sept 14. Please feel free to pass to anyone interested. The beta test is free, and all course materials (including cloud account) will be provided. I currently have only a small number of spots left for this one. If interested, please email pandora at homolog.us.

The post describes an upcoming attempt to build teaching modules for researchers.  The classes will be chat-based and one thing particularly struck me:

We will keep the class size small (~10) so that I can monitor the work done by every student. Each student will be solving problems at his own pace without being impacted by the rest of the class. So, if someone learns fast, he can finish the modules quickly or go on to solve more difficult problems.

A class size of 10 is very good for personal attention—even at the University we rarely get the luxury of such a small class.  I wonder whether the modules are intended to scale to larger classes, or if the plan is always to have 10-student classes.

 

2015 September 1

Pedagogy for bioinformatics teaching

Filed under: Circuits course — gasstationwithoutpumps @ 10:48

I was complaining recently about the dearth of teaching blogs in my field(s), and, serendipitously, almost immediately afterwards I read a post by lexnederbragt, Active learning strategies for bioinformatics teaching:

The more I read about how active learning techniques improve student learning, the more I am inclined to try out such techniques in my own teaching and training.

I attended the third week of Titus Brown’s “NGS Analysis Workshop”. This third week entailed, as one of the participants put it, ‘the bleeding edge of bioinformatics analysis taught by Software Carpentry instructors’ and was a unique opportunity to both learn different analysis techniques, try out new instruction material, as well as experience different instructors and their way of teaching. …

I demonstrated some of my teaching and was asked by one of the students for references for the different active learning approaches I used. Rather than just emailing her, I decided to put these in this blog post.

It is good to see someone blogging about teaching bioinformatics—there aren’t many of us doing it, and most of us are more focused on research than on our pedagogical techniques.  For that matter, in my bioinformatics courses, I’ve only been making minor tweaks to my teaching techniques—increasing wait time after asking questions, randomizing cold calls better, being more aware of the buildup of clutter on the whiteboard, … .  Where I’ve been focusing my pedagogic attention is on my applied electronics course and (to a lesser extent) the freshman design seminar.

I’ll be starting my main bioinformatics course in just over 3 weeks, a first-quarter graduate course that is also taken by seniors doing a BS in bioinformatics.  This will be the 14th time I’ve taught the course (every year since 2001, except for one year when I took a full-year sabbatical).  Although the course has evolved somewhat over that time, it is difficult for me to make major changes to something I’ve taught so often—I’ve already knocked off most of the rough edges, so major changes will always seem inferior, even if they would end up being better after a year or two of tweaking.  I think that major changes in the course would require a change of instructor—something that will have to be planned for, as I’ll be retiring in a few years.

My main goals in this core bioinformatics course are to teach some stochastic modeling (particularly the importance of good null models), dynamic programming (via Smith-Waterman alignment), hidden Markov models, and some Python programming.  The course is pretty intense (the Python programming assignments take up a lot of time), but I think it sets the students up well for the subsequent course in computational genomics (which I do not teach) and for general bioinformatics programming in their research labs. I don’t cover de Bruijn graphs or assembly in this course—those are covered in subsequent courses, though both the exercises Lex mentions seem useful for a course that covers genome assembly.

The live-coding approach that Lex mentions in his blog seems more appropriate for an undergrad course than for a grad course.  I do use that approach for teaching gnuplot in my applied electronics course, though I’ve had trouble getting students to bring their data sets and laptops to class to work on their own plots for the gnuplot classes—I’ll have to emphasize that expectation next spring.

It might be possible to use a live-coding approach near the beginning of the quarter in the bioinformatics course, on the first assignment, when I’m trying to get students to learn to use the “yield” statement to make generators for input parsing. I’ve been thinking that a partial worked example would help students get started on the first program, so I could try live coding half the assignment and having them finish it for their first homework.

One of the really nice things about Python is how easily one can create input handlers that spit out one item at a time and how cleanly one can interface them to one-pass algorithms. Way too many of the students have only done programming in a paradigm that reads all input, does all processing, and prints all output.  Although there are some bioinformatics programs that need to work that way, most bioinformatics tasks involve too much data for that paradigm, and programs need to process data on the fly, without storing it all.  Getting students to cleanly separate I/O from processing while processing only one item at a time is the primary goal of the first two “warmup” Python programs in the course.
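A minimal sketch of that pattern, assuming FASTA input (the function names read_fasta and gc_fraction_per_record are hypothetical and are not taken from the course assignments). The parser is a generator that yields one record at a time, and the processing step consumes records without knowing where they came from:

```python
import io


def read_fasta(lines):
    """Yield (name, sequence) pairs, one record at a time, from FASTA lines.

    `lines` is any iterable of text lines (an open file, sys.stdin, ...).
    Only the record currently being read is held in memory.
    """
    name = None
    chunks = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header ends the previous record, so hand that one out.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # The last record has no following header to trigger the yield.
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction_per_record(records):
    """Yield (name, GC fraction) for each (name, sequence) pair in `records`."""
    for name, seq in records:
        gc = sum(seq.upper().count(base) for base in "GC")
        yield name, (gc / len(seq) if seq else 0.0)


# Illustration only: a tiny in-memory "file" stands in for real input.
example = io.StringIO(">seq1 demo\nACGT\nGGCC\n>seq2\nATAT\n")
results = list(gc_fraction_per_record(read_fasta(example)))
```

Because both functions are generators, chaining them keeps only one record in memory at a time; replacing the StringIO with an open file or sys.stdin would require no change to the processing code.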

One thing I will have to demonstrate in doing the live coding is writing the docstring before writing any of the code for a routine.  Students (and professional programmers) have a tendency to code first and document later, which often turns into code-first-think-later, resulting in unreadable, undebuggable code. I should probably make a bigger point of document-first coding in the gnuplot instruction also, though the level of commenting needed in gnuplot is not huge (plot scripts tend to be fairly simple programs).
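As a small illustration of document-first coding, here is a hypothetical routine (not one from the course) in which the docstring would be written first. Writing it forces decisions about case handling and invalid input before any code exists, and the body then only has to do what the docstring promises:

```python
# Translation table covering exactly the alphabet the docstring allows.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    seq: a string over the alphabet A, C, G, T, N in upper or lower case.
    The case of each base is preserved in the result.
    Any other character raises ValueError rather than being passed through.
    """
    bad = set(seq) - set("ACGTNacgtn")
    if bad:
        raise ValueError("unexpected characters: " + "".join(sorted(bad)))
    return seq.translate(_COMPLEMENT)[::-1]
```

The choices recorded in the docstring (preserve case, reject unknown characters) are assumptions made for this example; a different contract would be just as reasonable, as long as it is written down before the code.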

2014 September 30

Ebola genome browser

Filed under: Uncategorized — gasstationwithoutpumps @ 21:00

For the past week, I’ve been watching the genome browser team (led by Jim Kent) scramble to get together an information resource to aid in the fight against the Ebola virus.  They went public today:

We are excited to announce the release of a Genome Browser and information portal for the Jun. 2014 assembly of the Ebola virus (UCSC version eboVir3, GenBank accession KM034562) submitted by the Broad Institute. We have worked closely with the Pardis Sabeti lab at the Broad Institute and other Ebola experts throughout the world to incorporate annotations that will be useful to those studying Ebola. Annotation tracks included in this initial release include genes from NCBI, B- and T-cell epitopes from the IEDB, structural annotations from UniProt and a wealth of SNP data from the 2014 publication by the Sabeti lab. This initial release also contains a 160-way alignment comprising 158 Ebola virus sequences from various African outbreaks and 2 Marburg virus sequences. You can find links to the Ebola virus Genome Browser and more information on the Ebola virus itself on our Ebola Portal page.

Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP server or the Downloads page. The Ebola virus (eboVir3) browser annotation tracks were generated by UCSC and collaborators worldwide. See the Credits page for a detailed list of the organizations and individuals who contributed to this release and the conditions for use of these data.


Matthew Speir
UCSC Genome Bioinformatics Group

2013 November 12

Long weekend, little accomplished

Filed under: Uncategorized — gasstationwithoutpumps @ 22:47

I just had a 4-day weekend, in which I got little more accomplished than a usual 2-day weekend.

Much of Saturday was spent trying to use PacBio reads to improve a draft genome of a V. cholerae strain that I had built with 454 reads a couple of years ago.  There was no problem getting blasr to map the reads, and I could call variants with “samtools mpileup”, though that took 2 CPU days to complete.  Unfortunately, that did not tell me what I really needed to know, which was whether the original assembly was in the right order.  I found a couple of places where the PacBio read mapping indicated problems (either the reads all terminated their mapping at nearly the same point, or they suddenly switched from aligning very well to aligning poorly).  Unfortunately, I’ve not yet figured out a good way to automate this detection, so I’m not sure I can find all the places that might have problems.  Dips in average quality of the mpileup consensus over 50- or 100-base windows pull out the places where the alignments get bad, but not where they suddenly stop.  Furthermore, once I’ve identified the bad regions, I still need to break the genome apart there, rebuild the bad regions from the PacBio reads that map nearby, and see if I can stitch the genome back together (probably with extra repeats that had not been resolved in the 454 assembly).  I’m considering backing off and building a new genome assembly from just the PacBio reads (after cleaning them up using PacBio2CA and the 454 reads) and the Celera Assembler.  I can then compare the genome built from the PacBio reads and the one built from the 454 reads and resolve any discrepancies. Sigh, this project keeps getting bigger, just as I think I’m almost done.
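The two signals described above can be sketched as follows. This is only an illustrative heuristic with arbitrary thresholds, not a method used on this project: it assumes the per-base consensus qualities and the alignment end coordinates have already been extracted into plain Python lists (that extraction is not shown), and the function names are hypothetical.

```python
from collections import Counter


def low_quality_windows(quals, window=100, threshold=20.0):
    """Yield (start, mean) for each non-overlapping window of `quals`
    whose mean quality is below `threshold`."""
    for start in range(0, len(quals), window):
        chunk = quals[start:start + window]
        mean = sum(chunk) / len(chunk)
        if mean < threshold:
            yield start, mean


def clustered_read_ends(end_positions, bin_size=10, min_ends=5):
    """Return sorted (bin_start, count) pairs for the fixed-size bins in
    which at least `min_ends` alignments terminate."""
    counts = Counter((pos // bin_size) * bin_size for pos in end_positions)
    return sorted((start, n) for start, n in counts.items() if n >= min_ends)


# Made-up data for illustration: one bad 100-base stretch, and six
# alignments that all stop within a few bases of position 1000.
quals = [40] * 100 + [10] * 100 + [40] * 100
ends = [995, 1001, 1002, 1003, 1004, 1005, 1007, 2500]

dips = list(low_quality_windows(quals))
pileups = clustered_read_ends(ends)
```

Fixed bins can split a cluster of read ends across a bin boundary, and a raw count ignores local coverage, so a real detector would probably need overlapping windows and some normalization by depth.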

On Sunday, I did a bunch of small tasks: raked leaves and shredded them, updated the grad alumni web page, announced the Freshman Design Seminar class (which will happen winter quarter, though once again as a “Group Tutorial” to prototype the course before submitting the official paperwork), wrote a letter of recommendation for a student applying to grad schools, scanned in the flyer for “Planet of the Abes” (the recent Dinosaur Prom show), scanned a 35-year-old t-shirt of mine so that I can get another copy made, updated my paper list to include the just-released PNAS paper, wrote a blog post, and caught up with a lot of my e-mail (though there are still some advising e-mails that I haven’t taken care of).


The flyer for the Dinosaur Prom Improv show, drawn by Hunter Wallraff, who holds the copyright. Because the edge of the drawing was not reproduced on the flyer, I had to finish the S and add an E by hand for “Broadway Playhouse” to get something usable for the titling of the video. I did not correct the error in the URL for westperformingarts.com

On Monday, I did a lot of grading, wrote another blog post, used the Planet of the Abes flyer to make titles for the Dinosaur Prom video, and rendered the video (tying up my laptop all night).  I also cleaned up the scan of the old t-shirt and converted it to SVG so that a new silk screen printing can be done. I’ve tried looking for the copyright holder for the design, but I have no idea how to find him (or her)—Google image searches bring up nothing similar, and there is no signature on the design or the shirt. I started working on my slides for the talk I have to give on Thursday, but did not get much done.

This morning, I responded to more e-mail, wrote another blog post, did more grading, and returned the activity monitor I’ve been wearing for the past 2 weeks to the Sleep Center. In the afternoon, I did more grading at Gayle’s Bakery in Capitola, met with my son’s consultant teacher for a couple of hours, bought the usual weekly load of soy milk (only 2.5 gallons this week), did some other grocery shopping, finished the grading, recorded the grades, cleared the rest of the advising e-mail, and compared results on group theory problems with my son.  We’re a bit behind schedule there—he’s not finished all the Chapter 1 and Chapter 2 problems I assigned, and we’re supposed to be finishing Chapter 3 this week—I’ve not even assigned Chapter 3 problems yet.

Things I wanted to do this weekend but didn’t:

  • Get the slides done for Thursday’s talk
  • Get the Program Learning Objectives written for bioinformatics
  • Get assessment plans defined and written for the Program Learning Objectives for both bioengineering and bioinformatics.
  • Create a draft of a revised curriculum for the third track in bioengineering (which also needs a new name and a clearer focus).
  • Rewrite the handout for the next programming assignment in the Bioinformatics: Models and Algorithms course.
  • Write code for looking for regions of the Helicobacter pylori genome that are possibly swapped in the current assembly and test for which rearrangement is most consistent with our data.
  • Start testing the BitScope differential input device they sent me.
  • Start working on Chapter 3 problems in group theory.
  • Start writing a paper on the segmenter that I described in my blog 3 months ago.
  • Clear the leaves off the roof before the rains start, since the leaves form dams that keep the rain from running off into the gutters properly.

There were probably other things, but I forget what they were now.  Once the to-do list gets longer than my piece of paper can hold, things fall off it.

 

2013 May 16

Snarky critiques

Filed under: Uncategorized — gasstationwithoutpumps @ 09:07

I just read a marvelously snarky critique of the ENCODE papers (which most of the bioinformaticians I know considered flawed in their overestimates of how much of the human genome is “functional”).  Perhaps the best of the critiques is this one: On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE.

The article accuses the ENCODE authors of several academic sins:

Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently.

Sadly, the authors of ENCODE decided to disregard evolutionary conservation as a criterion for identifying function.

Some of their comments are marvelously snarky:

According to Eric Lander, a Human Genome Project luminary, ENCODE is the “Google Maps of the human genome” (Durbin et al. 2010). We beg to differ, ENCODE is considerably worse than even Apple Maps.

The article provides solid reasoning for why the estimate that about 80% of the genome is functional is completely bogus, and provides more reasonable estimates:

Ward and Kellis (2012) confirmed that ~5% of the genome is interspecifically conserved, and by using intraspecific variation, found evidence of lineage-specific constraint suggesting that an additional 4% of the human genome is under selection (i.e., functional), bringing the total fraction of the genome that is certain to be functional to approximately 9%. The journal Science used this value to proclaim “No More Junk DNA” (Hurtley 2012), thus, in effect rounding up 9% to 100%.

The ENCODE project produced a lot of good data, but some of the hype surrounding it irritated a lot of biologists and bioinformaticians, who are pleased to see the ENCODE hype so amusingly and accurately skewered.
