Gas station without pumps

2018 February 25

Weekend off!

Filed under: Uncategorized — gasstationwithoutpumps @ 15:43

I had only 2 hours of grading to do this weekend (but next weekend will make up for that, with more than 30 hours of grading), so I got a chance to do some other things for a change:

  • Buy groceries at Trader Joe’s.  (“Groceries” is misleading here, as I generally view Trader Joe’s as a beverage store—I bought soy milk, mineral water, hard cider, beer, port, and whiskey, plus cereal, chocolate, and prunes.  I don’t drink whiskey or mineral water and my wife doesn’t drink port or soy milk, but the cider and beer are for both of us.)
  • Do a protein structure prediction for a microbiology colleague.  I no longer use my own tools for protein-structure prediction, as they have succumbed to the changes in C++ and operating systems, so that they can no longer be compiled or run.  I’ve also not maintained the template library for several years.  Because the only predictions I get asked to make these days are ones for which there are good templates, I just use HHpred and Modeller on-line.  For that sort of prediction, they are quick and do an adequate job.  The goal of this prediction was to get a good guess of binding-site residues for a chemosensor, to guide site-directed mutagenesis.  Unfortunately, the available structures did not have ligands bound, and for most of them no one knows what the real ligand is anyway, so I had to make guesses based on the structure without solid evidence for how ligands bind to them.
  • Check whether the nFET and pFET we’ll be using next quarter have small enough gate capacitances to be driven directly from a comparator, or whether we’ll still need to use 74AC04 inverters as digital amplifiers.  We could probably just barely get away with using the comparators, but the chips end up running rather warm, so I’m still going to recommend using the digital amplifier.   One inverter for both the nFET and pFET gate seems to be fine, though—the rise and fall time is short enough that we don’t need to use a separate inverter for each gate.
  • Review courses for the Committee on Courses of Instruction meeting tomorrow—I only had 13 courses to review this time, and I’d already looked at half of them.

I still have this evening, so maybe I’ll repot the free live Christmas tree my wife picked up yesterday.  We gave our old one away in January, because it was getting pot bound and we did not want to transfer it to a larger pot (the current one was as heavy as we could haul up the steps).  The new one is tiny, but should last us several years before it gets to be too big.  Today might also be a good day to put the Christmas ornaments back in the attic, though we’ll probably have to rebox some of them, as Marcus (our kitten) has shredded some of the boxes.

2012 September 19

Automated assessment of protein structure prediction

Filed under: Uncategorized — gasstationwithoutpumps @ 22:04

A former student of mine today sent me a link to preliminary results from the latest CASP competition: Automated assessment of protein structure prediction in CASP10+ROLL (Hard targets).

The CASP competition is a community-wide experiment that attempts to measure progress in the field of computational prediction of protein structure from sequence.  The idea of the experiment is to distribute the sequences for proteins whose structures have not been released, but are known or about to be known (data collected and preliminary models built from the data).  The predictors use the sequences to predict the structures and register their predictions with the organizers of CASP.  When the structures are released, the organizers compare the registered predictions with the actual structures and report who has done particularly well.  A conference is then held at which the prediction groups who did particularly well on one or more aspects of prediction report how they did it.

These CASP competitions happen every two years, and I’ve been to many of them, generally doing well enough to be invited to speak. For the past few years, since I’ve had no funding, I’ve stopped development of my protein-structure prediction methods, and just maintained the old web servers that provide a free prediction service to the community.

The School of Engineering wants to bill me for the electricity and machine-room space that the old computers use, but I have no grant to charge them to.  The bill could be considerably reduced if the machines were replaced by newer machines that were smaller, faster, and lower power, but I have no funding for that either.  If anyone has some rack-mount Linux nodes that are less than 5 years old they want to donate, say 40–50 cores with local disks for every 2–8 cores, we could probably reduce the footprint in the machine room a lot.

Although I’ve not been doing active development lately, I did enter the SAM-T06 (written in 2006, based on methods developed in 2004) and SAM-T08 (written in 2008, based on improvements to SAM-T06 tested in 2006) servers into CASP-ROLL and CASP-10. My intent was just to provide a historical baseline of old methods that could be used for measuring progress in the field.

Although official results and results for human-involved predictions will not be available until the conference in Italy, December 9–12, the server-based predictions have been informally evaluated by Yang Zhang, whose server has done very well (best by several measures) in the past few CASPs.

In Zhang’s evaluations, the SAM-T08 server did quite well on the “hard” targets (3rd best of 67 servers), despite having had no development over the past 4 years, just weekly automatic updates to its library of models.  The method was developed to find “remote homologs”—proteins that are related to the target being predicted, but not closely related.  It seems to still be doing well at that task.

On the “easy” targets, where finding homologous proteins whose structure is known is easy, the task becomes one of choosing among different homologs, getting the alignment to the homologs as accurate as possible, and (possibly) combining information from different homologs.  The SAM-T08 method is not particularly good at choosing among homologs, and generally includes a few that are a bit too distant when there are many to choose among.  As a result, among the easy targets, SAM-T08 drops to 42nd out of 67 servers in Zhang’s automated assessment.  There isn’t a huge difference on the easy targets among the top 57 or so servers by his measure, as they are all pretty much pulling up the same templates and making minor tweaks to them.  The CASP assessor will probably pull out a variety of different measures to try to make finer distinctions among the methods.

If you combine the good results for the hard targets and the almost-as-good-as-everyone-else results for the easy targets, the SAM-T08-server comes out 8th of 67 servers for all targets. The older SAM-T06 server is in the middle of the pack, at 35th out of 67.  (Note: choosing other metrics will order the servers differently—I make no claim that the “8th” place position is in any way a robust estimate of the relative quality of the many servers.)

In a way it is very heartening that without my putting in any more work, my servers still do quite well. In another way it is depressing that the protein-structure prediction field seems to have made no progress in the past 4 years (and maybe the past 6).  I guess there is still some hope that a human-assisted prediction did much better but just hasn’t been automated yet, but I’m not holding my breath.  In the past few CASPs, the best human-assisted predictions were not really human assisted, but just the best servers run for longer, perhaps with hints taken automatically from other servers.

In a way, this lack of progress reinforces my decision to leave the field of protein-structure prediction, even though I still had a lot of ideas that could have been tried.  Almost all the ideas I had would have taken a lot of work to make tiny incremental improvements, and NIH has no interest in funding the hard work it takes to make small improvements.

NIH was looking for grant proposals that promised magical leaps forward, but I don’t think that there are any magical leaps coming in the next 10 years, and I was not willing to lie about that in grant proposals.  So NIH stopped funding me, giving the money to people who were better at hyping their research.

I was getting tired of having panels reject my proposals, sometimes for bogus reasons. On one proposal, one reviewer commented that my group didn’t need the money, even though at the time I had only one or two years left on a single grant that supported 2 grad students. I guess the reviewer had me confused with a different group that had 30 or more grad students and postdocs in it.  I’ve never had funding for a postdoc (though I paid one for a year out of money that was budgeted for my summer salary and a grad student).

I suppose in a way I really didn’t need the money: if I hadn’t been dedicated to training grad students, I could have done the work by myself without funding. I was already converting all the funds for my summer salary into grad student support, and most of the computers I used over the years were ones surplussed from other projects that would otherwise have been thrown out. I never had enough grant funding to buy my own cluster, though I did once have enough to buy a file server and UPS for it, and something like 6–10 years ago I did buy a new desktop computer for my office.  It would be nice to work on new computers, and to get a file server that is not so old and slow, but none of the federal agencies are interested in funding small equipment grants, and it isn’t worth the effort of writing such a proposal just to get it turned down.

It would have been nice to be able to hire someone to clean up the research code and make it better documented, more distributable, and maintainable. I tried a couple of times to get grants for that, but NIH would rather see the code quietly disappear, or have me spend 2–3 years doing it for free.  I’m not going to ask again, so the code will probably fade into oblivion.

There are a lot of unpublished bits of research in the SAM-T08 server (the scoring function for H-bonds without explicit hydrogens, the scoring function for disulfide bonds, the improvements to the HMM scoring in SAM, … ), but my writer’s block kicks in whenever I try to write them up, so without a co-author they are unlikely ever to get written.

Oh well, I don’t want to think about it any more; I’ll just get depressed.  Better to think about the courses I’ll be teaching and the new research collaborations I’m working on, where I can do productive work without writing f***ing grant proposals.

2011 January 22

Course redesign for protein informatics

Filed under: Uncategorized — gasstationwithoutpumps @ 16:25

For several years I have taught a graduate protein informatics course every Spring (2005–2009, skipping last year), which has focused mainly on my research area: protein structure prediction.

I will be teaching it again this Spring, and I’m thinking of doing a major revision to the course.  I’ve asked advice from students, alumni, and colleagues on a choice of two different directions to take the course, and I thought I would ask my blog readers also. Note: I’m on sabbatical next year, so this course won’t be offered again until 2013, if ever.  I’m currently teaching an unsustainable teaching load, and some courses will have to be shed.  I’ve only made fairly minor changes to the other 4 courses I’m teaching this year (partly because 2 of them were ones I designed and taught for the first time last year), so I can afford some time to think about more major changes to this course.

I’ve been thinking of two different ways to do the course:

  • PREDICTION: group project on protein structure prediction, building a full-scale prediction web site that uses a mix of UCSC tools and tools from other groups. This would be a replacement for the somewhat dated servers currently being run (which have not been modified since 2008 and really follow the 2006 protocols). Individuals would examine different alternatives for standard steps in the prediction, and the group would try to piece together the best choices. This is much more structured and group-oriented than previous offerings of the course. There are unlikely to be PhD-sized projects coming out of this version.
  • DESIGN: journal club and individual projects to come up with design methods for proteins. This will be even less structured than previous offerings, as we struggle to find doable projects in design. There are unlikely to be projects smaller than PhD size, so the challenge will be to find pieces of projects that are still interesting but doable in a quarter.


Prediction is a well-established field with many researchers, so there are a lot of papers to read and standard methods to teach. Testing methods have been well-developed and there is a wealth of data for both training and testing. Building a web service is a useful skill, even if protein structure prediction is not a field you wish to study. I’ve been working in protein structure prediction for about 15 years, and our group has been one of the top ones for most of that time. On the other hand, I’ve not gotten a grant funded in the field for a long time, and I’ve pretty much given up trying. There are only so many rejections I can take before giving up.  I have heard from alumni who graduated with MS degrees that setting up web services is indeed one of the standard tasks that they are expected to do.

Design is much more speculative than prediction, as we have essentially no way of telling if our designs are any good—there certainly isn’t time within a quarter to design a protein, express or synthesize it, purify it, and characterize it. There are few groups working in the field, so the field is not as crowded. Risk: one of the most famous groups has come under suspicion of fraud (or at least fooling themselves) in the past couple of years—reading the papers of that group and the challenges to those papers could be an interesting study in research ethics.

Some new ideas have come up in discussions with others about the redesign.  One that I had not thought of is adding a wet-lab component to the course.  I don’t do wet-lab research and have no wet-lab skills, so this would certainly require a co-instructor (who may be available, but I don’t know if there would be budget to pay him).  I had thought that our 10-week quarter was far too short to learn protein structure and design tools, do a design, express it, and test it in one quarter.  The advances in gene synthesis, though, make this much more feasible than it used to be, as one local company is promising 2-day turn-around for gene synthesis and delivery in an expression vector.  I don’t know that we’d be able to afford them (no prices on their web site!), but I’ve contacted them to check.  We would not get much past doing solubility checks (and maybe circular dichroism), but solubility is one of the big pitfalls of protein design, so even that would be valuable.

Added info:  the gene-synthesis company responded to my request for info last night (five-hour response to an e-mail request on a Saturday is pretty impressive).  Their prices range from $0.75/basepair (<15 business days) to $3/basepair (<5 business days), but for the protein sizes we’re interested in it would be a flat $1000 for a rush order. I suspect that the rush order would be too expensive for us, but we might be able to afford the standard order (which they claim averages 10 business days for the small genes we’d be interested in having synthesized).
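
To get a rough feel for what those quotes mean per design, here is a minimal Python sketch. The per-basepair prices and the flat rush price are the ones quoted above; the protein lengths in the loop are hypothetical examples, and the gene length is assumed to be just the coding sequence (three bases per residue plus a stop codon), ignoring any linkers, tags, or vector sequence the company might also charge for.

```python
# Prices quoted by the gene-synthesis company (US dollars).
STANDARD_PER_BP = 0.75   # delivery in under 15 business days
RUSH_PER_BP = 3.00       # delivery in under 5 business days
RUSH_FLAT = 1000.00      # flat rush price quoted for the small genes of interest


def gene_length_bp(n_residues: int) -> int:
    """Coding-sequence length in base pairs.

    Assumption: 3 bases per residue (start Met included in n_residues)
    plus one 3-base stop codon, and nothing else.
    """
    return 3 * n_residues + 3


def per_bp_cost(n_residues: int, price_per_bp: float) -> float:
    """Cost of synthesizing one gene at a given per-basepair price."""
    return gene_length_bp(n_residues) * price_per_bp


if __name__ == "__main__":
    # Hypothetical protein sizes, chosen only for illustration.
    for n in (100, 150, 200):
        standard = per_bp_cost(n, STANDARD_PER_BP)
        rush = per_bp_cost(n, RUSH_PER_BP)
        print(f"{n:4d} residues ({gene_length_bp(n):4d} bp): "
              f"standard ${standard:8.2f}, rush at $3/bp ${rush:8.2f}, "
              f"quoted flat rush ${RUSH_FLAT:8.2f}")
```

Multiplying the standard figure by the number of student designs gives a first guess at the course budget; how the flat rush price relates to the per-basepair rush price is something only the company can say.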

I’ll be on sabbatical next year, trying to choose the direction for my research for the next 5–10 years.  The two main contenders are protein design and assembly of genomes from next-generation data.  If I revamp the protein informatics course to do design rather than prediction, then the two courses I’ll be teaching in the Spring would correspond to the two fields.  (The other course is Banana Slug Genomics, which I taught for the first time last year.)

So—I’m looking for advice from my readers: should I keep the course subject in protein structure prediction and move it towards development rather than research, or should I change the subject to protein design with much more open-ended and speculative sorts of homework and projects?  Teaching the course exactly the way I last taught it two years ago is also a theoretical possibility (and certainly would be the least work), but I don’t think I want to do that.
