Gas station without pumps

2011 November 24

Harry Potter’s World—junk science at NLM

Filed under: Uncategorized — gasstationwithoutpumps @ 00:06

I was recently pointed to a site at the U.S. National Library of Medicine that uses a popular literary figure to inspire kids to learn real science: Harry Potter’s World: Renaissance Science, Magic, and Medicine. They have both an English-class lesson plan (7th–10th grade) and a science-class lesson plan (7th–11th grade). I was prepared to praise them for this integrated curriculum, which seems to me like an excellent way to try to bridge C.P. Snow’s two cultures in academia.

But I glanced quickly down their list of resources and saw Human Mendelian Traits and Human Mendelian Traits for Teachers. A quick look revealed that both were propagating serious myths about human genetics—myths that have been comprehensively debunked at Myths of Human Genetics.

Unfortunately, the myths form a core part of the lesson, and so there is not a lot salvageable once the myths are removed.  I think that it may be appropriate for the NLM to take this lesson plan off their site until they can rework it into something consistent with what is known about human genetics.  They are not doing anyone a favor by putting their brand name on junk science.

2011 November 19

Starting a local effort to get bioinformatics into AP bio

Filed under: Uncategorized — gasstationwithoutpumps @ 00:32

As those who have been reading my blog for a while know, I’m on a task force attempting to get bioinformatics into high school biology (particularly AP bio) classes, and I have a series of posts about Advanced Placement Biology courses and the AP Bio exam.  I recently posted about an attempt in Colorado to introduce bioinformatics into AP bio and invited grad students in my department to do something similar.

I had a couple of students respond right away, and I sent out queries to three local high school biology teachers that I had previously had contact with.  One responded enthusiastically, and so I set up a meeting for him, the two students, and me on Friday afternoon (Nov 18).  In the morning, a third student (who had not subscribed to the mailing list I’d announced the effort on) indicated an interest also, but she could not come to the meeting.  I expected the other two students to be there.

I was a little surprised to find not two students, but three at the meeting (the two who had sent me early e-mail had recruited yet another student).  We now have four grad students (Olga, Yulia, Dorothy, and Paola) interested in working on getting bioinformatics into bio classes at one (or more) of the local high schools. (Note: more are still welcome. UCSC grad students should probably talk with Olga, who seems to be recruiting others to join.)

We did a lot of brainstorming today to try to figure out the scope and nature of the project.  We agreed on some general principles:

  • The primary goal is to teach students biology, not computer science or bioinformatics.  The bioinformatics should be good support for the underlying biology lesson.
  • Whatever we produce should be made available on the web (with any answer keys kept behind password protection, should we end up producing anything that needs a key).
  • The students will present the lessons to the class (both to expose the high school students to college student role models and to give the grad students practice teaching), but the lessons should be teachable by non-bioinformaticians.  In particular, the high school teacher should be able to teach it himself next year.
  • If things work out well, it might be worth presenting a paper explaining the project (and advertising the materials) at a high school biology teachers conference (perhaps an NABT conference?).

We discussed various tools and possible topics to teach, but did not settle on any particular topic.  Instead each student will think about what tool or technique they would like to teach and what lesson it will support.  One idea that seemed to have some traction was to go through the process of identifying a gene, getting its sequence (with surrounding DNA) from a genome browser, designing primers for PCR, and verifying that the primers uniquely selected the gene again using the genome browser. We also talked about looking at sequence logos of huge alignments (perhaps of HIV proteins) to identify conserved regions, making phylogenetic trees, and other possible lessons. The use of the genome browser to show introns and exons and the greater conservation in most exons was also discussed.
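The primer-checking step can be sketched in a few lines of Python. This is only a toy illustration, not what a genome browser's in-silico PCR tool actually does: it assumes exact matches, looks at only the one strand it is given, and the function names (`reverse_complement`, `find_all`, `predicted_amplicon`) are made up for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(text, pattern):
    """Return every start position of pattern in text, allowing overlaps."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def predicted_amplicon(template, forward_primer, reverse_primer):
    """Return the predicted product if each primer matches exactly once, else None."""
    template = template.upper()
    fwd_sites = find_all(template, forward_primer.upper())
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we look for its reverse complement.
    rev_sites = find_all(template, reverse_complement(reverse_primer))
    if len(fwd_sites) != 1 or len(rev_sites) != 1:
        return None  # a primer is missing or binds in more than one place
    start = fwd_sites[0]
    end = rev_sites[0] + len(reverse_primer)
    if rev_sites[0] < start:
        return None  # primers point away from each other
    return template[start:end]


# Made-up 28-base "genome" with one binding site for each primer.
template = "TTGACCATGCAAAAAAAAGGCTTAGCTT"
product = predicted_amplicon(template, "GACCATGC", "GCTAAGCC")
```

A real check would search both strands of the whole genome and tolerate a few mismatches, which is why the lesson would use the genome browser rather than a script like this.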

One cool thing one could do with the PCR lesson is to have the students design primers, order the most promising set, and then do a PCR reaction and gel electrophoresis to confirm amplification of DNA of the right length. The problem is that we could design the primers, but not then do the PCR, as the high school does not have a thermocycler, and hand cycling with water baths for PCR is rather tedious. (I did once blog about a very low-cost capillary PCR method I read about, but I don’t know if it actually works.) The reagents for PCR cost about $100 for 200 reactions (for example, for the New England Biolabs kit) and need to be stored at -20°C.  The primers cost about 30¢ a base (in 10 nmole amounts, which I think is enough for about 200 reactions), so add about $20 to the cost of a PCR experiment for the class.  The most expensive thing is the thermocycler, which costs about $300–500 used (I saw one on ecrater.com for about $220, including shipping).  I wonder if any of our faculty with connections in the biotech industry can get surplus thermocyclers cheaper. In any case, it looks like doing the PCR wet lab experiment would cost about $500 in startup costs and $40/class in consumables.  This may be too much for the high school, unless we can get donations of reagents or loans of equipment.  The PCR reaction itself takes longer than a 90-minute block, so would have to be an after-school or weekend workshop, which may be too big a project for this year.

Obviously, we have not yet settled on what lesson(s) we want to present, and we’ll be doing brainstorming about it in early December, after the fall quarter winds down.  The hope is to have a lesson or two (probably one or two 50-minute classes followed by a hands-on block of 90 minutes) ready for testing with the high school classes by late January (to reinforce what they will have learned about DNA sequences and replication).

I promised the grad students some links to information about the AP bio curriculum, besides my earlier post about resources for teaching bioinformatics in high school.  Perhaps the most important is the AP bio home page for teachers, which has links to College Board’s resources and to the new AP biology curriculum (which affects AP bio tests starting in May 2013). Another is the BME 110 course at UCSC, which is an intro bioinformatics tools course for biologists.  It may be possible to adapt some of the assignments in that class to AP Bio, though the focus of that class is to teach the use of bioinformatics to students who have already had a few college biology courses, rather than to teach the fundamentals of biology, so most assignments will not be directly usable.

UPDATE (2011 Nov 19)

Ted pointed me to a page where PLoS Computational Biology is gathering resources relevant to bioinformatics in high school bio.  They have 4 things there currently (one the article I blogged about, the other three editorials that provide useful advice).

2011 November 7

A First Attempt to Bring Computational Biology into Advanced High School Biology Classrooms

Filed under: Uncategorized — gasstationwithoutpumps @ 00:32

As those who have been reading my blog for a while know, I’m on a task force attempting to get bioinformatics into high school biology (particularly AP bio) classes, and I have a series of posts about Advanced Placement Biology courses and the AP Bio exam. Of course, I’m not the only person interested in achieving this, and others have done far more than I have.

There is a nice paper in PLoS Computational Biology, A First Attempt to Bring Computational Biology into Advanced High School Biology Classrooms, that describes one attempt by researchers at University of Colorado, Boulder to get some understanding of BLAST and tree-building algorithms into high-school bio classrooms in a fairly minimal way (3 lessons of 1–2 class periods each).  They provide the curriculum they used and a “post-game” analysis, where they look at what they would do differently next time, based on the successes and failures of this first attempt.

The tools and basic approach they used seem reasonable, though I question the value of teaching what an algorithm is with the “make a peanut butter sandwich” example, classic as that is. Along with the students, I wonder about the relevance to an AP Bio class.  Doing the living computer exercise seems ok, but it might be better to apply it to a curriculum-relevant algorithm, such as creating a Punnett square.

This paper seems like an important resource, and so I have added it to the list I collected at Resources for bioinformatics in AP Bio.  Incidentally, the authors point to this blog, but to the early post Advanced Placement Bio changes announced, rather than to the more relevant Resources for bioinformatics in AP Bio.

Note to UCSC grad students: I have some contacts with local high school bio teachers, if some of you want to try one of these outreach experiments in education, and I would be glad to facilitate meeting and planning.

2011 November 2

Microbe DNA Swaps

Filed under: Uncategorized — gasstationwithoutpumps @ 02:00

Science News has just reported on a new, but unsurprising result: that bacteria are more likely to swap DNA with other organisms in the same environment [Nearness Key In Microbe DNA Swaps].

The paper they are reporting on

Ecology drives a global network of gene exchange connecting the human microbiome
Chris S. Smillie, Mark B. Smith, Jonathan Friedman, Otto X. Cordero, Lawrence A. David, & Eric J. Alm
Nature (2011) doi:10.1038/nature10571

identified recently transferred genes as those having “blocks of nearly identical DNA (more than 500 nucleotides, more than 99% identity) in distantly related genomes (less than 97% 16S rRNA similarity)”.  Given the fairly rapid drift of protein-coding genes in the “wobble bases” (the third base of codons, changing which often does not change the amino acid coded for), which is about 25 times faster than the changes to 16S ribosomal RNA, this definition of recently transferred genes seems reasonable.  There will be a few false positives, but not too many—they estimate that about 99% of their putative horizontal transfers are genuine.  About 27% of their predicted transfers include known mobile elements (phages, plasmids, transposons), but most of the transferred genes (87%) seem to be other genes not associated with the mechanism of transfer.
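To make the quoted definition concrete, here is a minimal sketch of the three thresholds as a Python predicate. It is my own illustration of the criterion as quoted, not the authors' pipeline (which the paper does not describe in enough detail to reproduce); the function and parameter names are hypothetical, and how the blocks and identities are computed in the first place is left out entirely.

```python
def is_putative_recent_transfer(block_length, block_identity, rrna_similarity):
    """Apply the thresholds quoted from the paper to one shared DNA block.

    block_length    -- length of the shared block in nucleotides
    block_identity  -- fraction of identical bases in the block (0.0 to 1.0)
    rrna_similarity -- 16S rRNA similarity of the two genomes (0.0 to 1.0)
    """
    long_enough = block_length > 500          # more than 500 nucleotides
    nearly_identical = block_identity > 0.99  # more than 99% identity
    distantly_related = rrna_similarity < 0.97  # less than 97% 16S similarity
    return long_enough and nearly_identical and distantly_related
```

The point of the third test is that a near-identical block between close relatives is what one expects from ordinary vertical inheritance, so only blocks shared by distantly related genomes count as evidence of transfer.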

They observed “most gene exchange occurring between isolates from ecologically similar, but geographically separated, environments.”  Of course, they observed the most exchange between human-associated bacteria, and people move around so much that “geographically separated” does not have much meaning for the bacteria they carry.  The mere fact that the samples could be collected indicates recent contact chains that could spread bacteria and allow DNA interchange between them.  Soil bacteria have slower transport, so it is not surprising that less DNA interchange was found there.

Because most of the DNA transport was found in human-associated bacteria and particularly between pathogenic bacteria, most of the study focused on those bacteria.  It is not clear to me whether these bacteria have more DNA exchange, or if the greater number of transfers is just due to the much heavier sampling of human-associated bacteria.  We might well find similar rates of DNA transfer in other bacteria, if we had similarly large databases of their genomes, with as thorough coverage of the ecological niches. The authors claim to have corrected for this effect and still seen a 25-fold greater exchange rate among human-associated bacteria, but I’m not sure that a good correction can be made without more data on soil bacteria. Soil bacteria for sequencing have been deliberately chosen to get maximum phylogenetic diversity, which is not an easily corrected bias: just looking at raw numbers of genomes when one group is chosen for similarity (disease-causing bacteria) and another selected for diversity (soil bacteria) will underestimate the sampling bias.  As is common in Nature papers, the Methods section of the paper is completely inadequate for determining what the authors actually did.

The conventional wisdom about gene transfer is that phylogeny matters most (because many of the horizontal transfers use viruses, plasmids, and other fairly host-specific mechanisms) and geography next (because of observed geographic patterns of antibiotic resistance and disease strains).  Others have observed that there have been massive exchanges between archaeal and bacterial hyperthermophiles, which inhabit similar niches but are very far apart phylogenetically and spatially. [Aravind, L., Tatusov, R. L., Wolf, Y. I., Walker, D. R. & Koonin, E. V. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet. 14, 442–444 (1998) http://dx.doi.org/10.1016/S0168-9525(98)01553-4].

I know that David Bernick has observed that two hyperthermophile archaeal species collected on opposite sides of the world (Pyrobaculum oguniense and Pyrobaculum arsenaticum) are very similar genetically (so much so that one even has CRISPR immune sequences protecting it from a virus that infects the other), despite not being able to live in the intervening environment and despite somewhat different metabolic requirements (one is a strict anærobe, the other can live in ærobic conditions).  If hyperthermophiles can have genetic exchange between hot springs separated by thousands of kilometers (probably on a time scale of tens of years or less, based on the CRISPR evidence), then exchanges between mesophilic bacteria carried around by humans should be much quicker.

So the new claim that environment matters most is hardly surprising, but some of the consequences are important.  For example, there is a large amount of horizontal gene transfer between organisms in the guts of animals and those in the guts of humans, even if the bacteria themselves do not live in multiple hosts.  That means that antibiotic resistance genes that are strongly selected for in cattle fed antibiotics can easily transfer to human pathogens, spreading antibiotic resistance much more rapidly than previously assumed.  This looks to me like strong evidence that we should immediately prohibit the use of antibiotics in animal feed for any purpose other than the treatment of existing disease, just as we restrict antibiotics for human use.

 

2011 October 15

Orthology databases

Filed under: Uncategorized — gasstationwithoutpumps @ 11:18

I recently came across this list of orthology databases, which can be very useful for people trying to figure out what various proteins and genes do, and even more useful for those trying to make tools that rely on comparative genomics.

What is orthology and why is it important?

Let’s start with a more general concept: homology.  Two sequences are said to be homologous (to be homologs) if they are both descended from a common ancestral sequence.  The term can also be applied to phenotypic traits, if the genes and regulatory control regions producing the traits are homologous.  Note that it is generally incorrect to talk about “percent homology”—as sequences either are or are not homologous.  Generally, when people use that term they mean “percent identity” (the fraction of bases or amino acids in the sequence that are identical between the sequences) or occasionally “percent similarity” (the ratio of the similarity of the two sequences to the similarity of a sequence to itself, using some arbitrary definition of similarity such as the Smith-Waterman alignment score with a particular BLOSUM matrix).
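As a small illustration of what “percent identity” means, here is a Python sketch that computes it for two sequences that have already been aligned. The choice of denominator is an assumption on my part (columns where neither sequence has a gap); other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different numbers for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Two 8-base sequences differing at one position: 7 of 8 columns match.
example = percent_identity("ACGTACGT", "ACGTTCGT")  # 87.5
```

Note that the result is a property of a particular alignment, not a yes-or-no statement about common ancestry, which is why it should not be called “percent homology”.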

It is, however, reasonable to talk about close or distant homology, referring to the evolutionary time back to the common ancestor.  Since close homologs usually have near-identical sequences and distant homologs have diverged much more, the percent identity between sequences is often a good proxy for the evolutionary distance.

Homologs arise in evolution by two main mechanisms: gene duplication within a genome and speciation. Biologists classify pairs of homologs according to which of these two mechanisms occurred at the split from the common ancestor.  If the split between the sequences occurred as a gene duplication, the homologs are referred to as paralogs, while if the split was a speciation event, then the sequences are called orthologs.

It is not always easy to tell whether a pair of homologs are paralogs or orthologs, as there may have been several intervening gene duplication or speciation events on the lineage of either sequence.  The evolutionary history needs to be accurately reconstructed to tell which event occurred at the split between the sequences.  Furthermore, the whole notion of homologs, paralogs, and orthologs assumes that the sequence is an atomic object in evolution, and that its evolutionary history is a tree.  Many proteins are formed by arrangements of protein domains, each of which may have a different evolutionary history, and for distantly related proteins, only parts of the proteins may be homologous, not the entire proteins.

It can also be the case that several paralogs within one species are orthologous to a single protein in a different species, if the duplication of the genes forming the paralogs occurred more recently than the speciation event.  For that matter, it is possible to have several paralogs in both species, each of which is orthologous to all the paralogs in the other species.
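The definitions above can be turned into a short Python sketch: given a gene tree whose internal nodes have already been labeled as speciation or duplication events (the hard part, which this sketch simply assumes has been done), a pair of genes is classified by the event at their most recent common ancestor. The `Node` class and the gene names are hypothetical, made up for this example.

```python
class Node:
    """A node in a gene tree whose internal nodes carry an event label."""

    def __init__(self, event=None, children=(), name=None):
        self.event = event              # "speciation" or "duplication"
        self.children = list(children)  # empty for leaves
        self.name = name                # gene name, set only for leaves


def leaf_names(node):
    """Return the set of gene names at or below a node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def classify_pair(root, gene_a, gene_b):
    """Return 'orthologs' or 'paralogs' from the event at the last common ancestor."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaf_names(root):
        raise ValueError("need two different genes that are both in the tree")
    node = root
    while True:
        # Move down while a single child still contains both genes.
        for child in node.children:
            names = leaf_names(child)
            if gene_a in names and gene_b in names:
                node = child
                break
        else:
            break  # no child contains both: node is the last common ancestor
    return "orthologs" if node.event == "speciation" else "paralogs"


# Speciation first, then a duplication on the (hypothetical) human lineage.
tree = Node("speciation", [
    Node("duplication", [Node(name="human_A1"), Node(name="human_A2")]),
    Node(name="mouse_A"),
])
```

With this toy tree, `human_A1` and `human_A2` are paralogs of each other, and each is an ortholog of `mouse_A`, which is the several-to-one situation described above.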

Why does orthology matter?

If it is so difficult to tell paralogs from orthologs, why does anyone bother?

Biologists believe (on theoretical, rather than empirical grounds) that genes or proteins that are orthologous are more likely to have a common function than ones that are paralogous.  This orthology conjecture drives a lot of functional inference and annotation.  It has recently been challenged by those who claim that sequence identity (closeness of homology) is a better predictor of functional similarity than the ortholog/paralog distinction, but the evidence on both sides of the question is still rather weak, particularly since inference of orthology is done mainly by closeness of homology, and so the two models are hard to distinguish.

Why so many databases?

Lots of people have needed to annotate pairs of sequences as orthologs or paralogs, but because they have different applications and because the ortholog/paralog inference is difficult, they have ended up with subtly different solutions.  For example, people looking for functions of protein domains may want orthology information for individual domains, while those looking for functions of whole proteins may want to use the domain architecture of the whole protein both for orthology inference and function prediction.  Those looking for subtle relationships among sequences from recently diverged species (such as the primates) may have very different needs from those looking for relationships between eukaryotic and bacterial or archaeal sequences.

Why not roll your own?

If choosing orthologs is so application-dependent, why use someone else’s database, rather than defining your own set?  Indeed, the proliferation of orthology databases comes from just this approach.  But creating a set of orthologs is quite tricky, and the simple techniques that people generally use when they create their own sets of orthologous pairs (like reciprocal best BLAST) do not work very well.  So if there is a well-constructed and maintained database whose definitions and scope are suitable for the problem you are addressing, it is probably better to use the database than to try to redo the work yourself—unless (of course) you have come up with a better method for inferring orthology than the methods used in the existing databases.  If you have a better method, you should probably put your own database on the web for others to use.
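For readers who have not seen it, here is roughly what the reciprocal-best-hit idea amounts to, as a Python sketch. The input format (nested dictionaries of similarity scores, such as BLAST bit scores) and the gene names are assumptions for the example; running BLAST and parsing its output are not shown.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores -- dict of {query: {target: score}}
    """
    return {
        query: max(targets, key=targets.get)
        for query, targets in scores.items()
        if targets
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )


# Made-up scores: b1 and b2 are both good matches to a1 (think of a recent
# duplication in species B), but only the higher-scoring one is reported.
a_vs_b = {"a1": {"b1": 200, "b2": 190}, "a2": {"b3": 150}}
b_vs_a = {"b1": {"a1": 200}, "b2": {"a1": 190}, "b3": {"a2": 150, "a1": 40}}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy example the pair (a1, b2) is never reported, because the method yields at most one partner per gene. That is one limitation of the approach: it cannot represent the case where several paralogs in one species are all orthologous to a single gene in the other.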

 

 
