Gas station without pumps

2011 February 5

Adding bioinformatics to AP Bio

Filed under: Uncategorized — gasstationwithoutpumps @ 09:43

A few days ago, I posted New A.P. Biology Is Ready, and added in the comments a pointer to the  overview for the new AP Bio curriculum, which in turn has a pointer to a 95-page document describing the objectives of the curriculum in great detail.

For those with less reading stamina and who don’t need to teach the course, there is a short presentation of the curricular design that may be more digestible.  This 32-slide presentation makes the overall point of the redesign clear and gives some illustrative examples of the sorts of questions that will be on the new AP Bio exam, but does not have the details needed to design a course.

AP bio teachers are now actively thinking about the new curriculum and how they would teach it. They will probably be doing intensive course development this summer, with testing and tweaking for the next year.  Within 2 years the new AP bio courses will probably be pretty much frozen, the way the old AP bio courses have been with their “dirty dozen” labs.

This is the ideal moment to give AP Bio teachers ideas for how bioinformatics exercises or computational labs can support teaching the “four big ideas” and the “essential understandings” of the new curriculum.

As a member of the ISCB Task Force trying to get bioinformatics into high-school biology, I would love to be able to point teachers to web sites with lesson plans for bioinformatics lessons that clearly support the goals of the new curriculum, but I don’t know of any.  I certainly don’t have time to develop such exercises, and I probably don’t have the expertise needed, as I have never taught biology and have not even taught a bioinformatics-for-biologists course.  I don’t know what the kids have trouble understanding, nor what sort of skills they bring to the class.

My one data point on high school students, my son, is far from typical of AP bio students, so generalizing from how I would teach him would not work. Also, he has no interest in biology or bioinformatics, so I couldn’t test out ideas on him even if I thought it would help.

One approach is to train AP bio teachers in bioinformatics, who could then use their new knowledge to design lesson plans to support their own courses. This is certainly important, and I would like to see bioinformatics faculty around the world developing training courses for biology teachers, but it is a slower approach than giving the teachers almost-ready-to-use lesson plans that they can learn from and teach without extensive training.

The four big ideas that drive the new AP Bio curriculum are pretty basic:

  1. The process of evolution drives the diversity and unity of life.
  2. Biological systems utilize energy and molecular building blocks to grow, reproduce, and maintain homeostasis.
  3. Living systems retrieve, transmit, and respond to information essential to life processes.
  4. Biological systems interact, and these interactions possess complex properties.

I think that bioinformatics is going to be essential to teaching ideas 1 and 3, but I don’t have lesson plans at my fingertips to give to the AP bio teachers. Do any of the readers of this blog have ideas for exercises or labs that they can share?  If you know of a biology teacher using bioinformatics (at a college freshman or AP bio level), please ask them to share some ideas here.

If you have some ideas for good lessons, but lack access to a class to test them with, share that here also.  Perhaps we can pair up people who have ideas but no students with people who have students and want to try new approaches to teaching the material.

Remember that the idea here is not to teach bioinformatics, but to use bioinformatics to teach biology.  Many of the lesson plans and exercises for existing bioinformatics classes are not suitable, as they assume that students have already had the biology and just need to learn the tools.  To integrate bioinformatics properly into lower-level biology classes, it is necessary for the tool use to be simple enough that it makes it easier to learn the biology.  Powerful tools are useless for this purpose if learning to use them is a bigger barrier than learning the biology without the tool.

36 Comments »

  1. A couple of years ago I wrote up the following bioinformatics exercise to go with a diabetes lab. I put it up here as an example of a lesson for bioinformatics in High School. Feedback would be appreciated.

    Bioinformatics of Diabetes and Insulin

    Finding the sequence for a Gene using today’s online databases.

    First go to the National Center for Biotechnology Information (NCBI) web site. http://www.ncbi.nlm.nih.gov/
    Use the following steps to find the sequence of the insulin gene in NCBI.
    Under search, change the dropdown tab that reads “All Databases” to “Gene”. In the search box enter “Insulin homo sapiens” and press GO. The result is about 700 entries. Which do we choose? (Note: When doing this in the computer lab at school the results took 3-5 minutes to display for each student. My computer on a weekend took less than 10 seconds. Slow connection or busy servers during school day may account for this).

    (Answer questions 1-4)

    The top entry will change, especially over time. It may read IGF1 or IGF2, which is the Official Symbol for human insulin-like growth factor 1 or 2, or it may read IDDM, a gene associated with insulin-dependent diabetes. These are not insulin, but in the case of IGF1 it is the same gene only it codes for a similar protein to insulin and is made in tissues other than the pancreas.

    Review the results of the first two pages of your search. Did you find a gene that is labeled Insulin? Go back and refine your search by entering in “INS” into the search box at the top of the page and press GO.

    (Answer questions 5-6)

    There are fewer results with this search, find the entry for Human Insulin (Homo sapiens). Click on the “INS” link. Things to take note of on this page:
    -The GeneID – only one ID per gene in the database.
    -The Lineage – this gives the complete taxonomy of the organism the gene came from.
    -The Summary – basic information on the gene’s function.
    -Genomic regions, transcripts, and products – The top line shows where on the chromosome the gene is located. The bottom line has introns in blue and exons in red. The bold red lines are the peptide A and peptide B portions of insulin. The fine red line is removed post-translationally. Right click on the link to the sequence viewer and open it in a new window. The top histogram shows a vertical red line. This is the location of the INS gene. The details of the gene are in the bottom half of the screen. Close the window.
    -Genomic context – Shows which chromosome INS is on and its location on that chromosome. Right click on the MapViewer link and open it in a new window. This is another graphical representation of where the INS gene is in the genome. Note the ideogram on the left side of the page. Again the red line shows the location of the INS gene. Close this window.
    -GeneRIFs: Gene References Into Function – A listing of the articles published about the function of insulin.
    -HIV-1 protein interactions – Specific information on HIV.
    -Interactions – How insulin interacts with other proteins and drugs.
    -Genotypes and Phenotypes – describes the various alleles of the gene and their effects.
    -Pathways – A list of the metabolic pathways that insulin is involved with in the body.
    -NCBI Reference Sequences – all the sequence information.

    (Answer questions 7-10)

    Go to the section header “NCBI Reference Sequences” (about ¾ of the way down the web page). Under the sub-header “mRNA and Protein(s)”, find the “Consensus CDS” sequence data, which will be a link that starts with “CCDS”. Select this link, which is “CCDS 7729.1”. This page provides the DNA sequence that is a consensus of all the submitted sequence data for this gene. The sequence IDs that were used to form the consensus sequence are listed, then the chromosomal locations for the sequence, and then the actual sequence data.

    Here are both the Nucleotide Sequence and the Translation or Amino Acid Sequence for the protein. With your mouse, click on a Nucleotide letter in the sequence, then an Amino Acid in the sequence. What do you observe? These sequences are copied here for reference.

    Human INS Nucleotide sequence:

    ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG

    Human INS Amino Acid sequence:

    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
    GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

    (Answer questions 11-14)
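    For anyone who wants to check the link between the two sequences by hand, here is a minimal Python sketch that translates the first 30 and last 15 nucleotides of the CDS quoted above using the standard genetic code. The function and variable names are my own, and the snippet is only an illustration, not part of the NCBI site. It also shows why 333 nucleotides give 111 codons but only 110 amino acids: the last codon is a stop.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragments copied from the start and end of the human INS CDS quoted above.
first_30 = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTG"
last_15 = "AACTACTGCAACTAG"

print(translate(first_30))  # MALWMRLLPL
print(translate(last_15))   # NYCN*

# The CDS is 333 nt long: 111 codons, of which the last is the stop codon.
total_codons = 333 // 3
print(total_codons, total_codons - 1)  # 111 110
```

    The two printed peptides match the beginning and end of the amino acid sequence shown above, with the trailing `*` standing for the untranslated stop codon.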

    Now copy the Nucleotide sequence and paste it as unformatted text into a text editor such as Notepad. There is a menu option at the top of the page called BLAST, select this. You will get a list of genomes to search in. Select the link for “list all genomic BLAST databases”. This link provides a list of all the partial genomes that have been sequenced and submitted to NCBI. Since Pig insulin was used at one time to treat diabetes, let’s find the sequence for Pig insulin. Under vertebrates/mammals/other mammals, find “Sus scrofa (pig)”. Select the blast link next to the pig (the circle with a B in it). Paste the Nucleotide sequence that you copied earlier into the area to enter a sequence. Make sure the radio button above the box is selected. Under “Database” select ‘RefSeq RNA’. For “Program” select BLASTN and for “Expect” select ‘0.01’. Now click on “Begin Search” at the bottom of the page. You will get a page with your search information, click on “View report”. Once your report is formatted, it will be displayed, this may take a minute or so as the search is performed. Scroll down and you will see your results.

    At the top of your search results you see information on the database used for the search. Following this is the graphic summary, a description of the matching sequence and the alignment of the sequences.

    (answer questions 15-22)

    The BLAST results give us an idea of how conserved the nucleotide sequence is between pig and human. The human DNA sequence is above the pig DNA sequence and there is a line between the nucleotide letters where the letters match. When there is no match there is no line and if there is a missing letter there is a dash in the sequence.

    (answer questions 23-24)
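    To make the alignment display less mysterious, here is a rough Python sketch of how the line of match marks and the identity count can be produced from two already-aligned sequences. The two short fragments are made up for illustration (they are not real human or pig data), and the function names are my own. This is not how BLAST itself finds the alignment, only how the result can be read.

```python
def match_line(top, bottom):
    """Return a BLAST-style middle line: '|' where aligned letters match."""
    return "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(top, bottom)
    )


def identity(top, bottom):
    """Return (matching columns, alignment length) for two aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return match_line(top, bottom).count("|"), len(top)


# Made-up aligned fragments; '-' marks a gap (a missing letter).
query = "ATGGCC-TGTGG"
sbjct = "ATGGCCCTGTGA"

print(query)
print(match_line(query, sbjct))
print(sbjct)

matches, length = identity(query, sbjct)
print(matches, length)  # 10 12

# Figures from the lesson's answer key: 283 of 333 nucleotides matched.
exact = 100 * 283 / 333
print(round(exact, 1), int(exact))  # 85.0 84
```

    Note that 283/333 is just under 85%, so a figure of 84% corresponds to dropping the fraction rather than rounding.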

    Use your back button to return to the CCDS page for insulin. For reference we can look at the total records for the Insulin gene by referencing the nucleotide and protein ID. Return to the sequence IDs at the top of the CCDS page. There are multiple sources for the CCDS information. The last two lines give the current Nucleotide and Protein IDs for the gene. Click on the Nucleotide report link on the NCBI source (last line) of the sequence information. Then click on the NM_000207 LOCUS report for the sequence where there is a lot of information and links on the insulin gene. Things to note:
    -The PUBMED lines are articles that have been written about this gene.
    -Read the summary under COMMENT about half way down the page. Where have you seen that before?
    -Under FEATURES find CDS.

    (Answer questions 25-26)

    Use your back button until you return to the CCDS page. From the NCBI source (last line) of the sequence IDs click on the Protein report for the protein ID NP_000198. This report is for the proinsulin precursor and the AA sequence is at the bottom of the page. There is a lot of information here and links to more about the protein. These are the official pages for the nucleotide and protein in NCBI.

    (Answer question 27)

    We thus have the nucleotide sequence for Insulin and the CCDS page also displays the AA sequence that would be made from this DNA gene. Is the entire AA sequence the insulin protein? Are there introns and exons in this DNA sequence that would be cut out of the AA sequence prior to the translation into insulin? In other words, what part of this actually makes up the insulin protein? It’s hard to tell from the NCBI information what makes up the protein.

    Copy the last 21 AA of the protein sequence, it starts with the letters “give”. Open up a new page in your browser to the Protein Data Bank at http://www.pdb.org. Select the Advanced search tab in the upper left then choose the search database. In the drop down box select “Sequence (Blast/Fasta)” and paste in the 21 AA of the sequence, remove spaces, and click on “Evaluate Query” to start the search.

    (Answer question 28)

    There are over 100 results and the pictures look different. How to know which one is correct? They all are! Each is a separate research result. A person doing research would need to sort through the results to come up with the structure they want to work with. To make things easy, enter the PDB ID “2omg”, a recent structure for insulin entered into PDB, in the search line at the top of the page and do a site search. The information on the 2omg study of insulin structure should appear. About half way down the page there is a Molecular description of the Asymmetric Unit. There is the Insulin A chain and the Insulin B chain. What this means is that there are two peptide chains and 3 copies of each chain in an insulin molecule. Thus the last 21 AA we entered in for the search is the A chain of insulin. How would you go about finding the AA sequence of the B chain? Click on the “Sequence Details” tab near the top of the page to get the sequence information on the chains. Locate the B chain AAs in the overall sequence in CCDS. By highlighting the end AA of the chains in CCDS you can see the two peptides in the insulin gene.

    (Answer questions 29-31)

    Return to the tab on structure summary in the PDB site. Under the picture of the protein on the right of the web page is a list of 3-dimensional viewers, click on MBT Protein Workshop. This will load the protein viewer in Java. Play with the viewer and see what the options are. They are improving this viewer all the time and there are always new options.

    Insulin from http://www.pdb.org, PDB ID 2omg

    As you can see the actual insulin protein is a hexamer, 6 separate AA chains that make up the final protein. There are 3 copies each of the 2 different chains. On the right side of the viewer is a list of the chains. Click on chain A to expand to the list of AA. Click on each AA to highlight it in the protein. Expand the list of AA for Chain A. click on the last AA in the list. This should show you the atoms of the last AA in Chain A. You can rotate the picture by left clicking anywhere on the picture and while holding the mouse button down moving the mouse around. Try to get the AA you have highlighted to the top of the screen.

    Finally, we return to Pig insulin. At the top of the PDB page enter in the PDB ID 1ZNI and click ‘site search’. This is the insulin that was given to diabetes patients for many years. Select the tab at the top of the page for sequence details. Here are the two different chains that make up the protein.

    (Answer questions 32-37)
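    Comparing the chains by eye is error-prone, so here is a small Python sketch that lists the positions where two equal-length peptide chains differ. The human B chain is read off the translation quoted earlier (the 30 residues that follow the 24-residue signal peptide); the pig B chain is typed in as I understand it to appear in the sequence details, so check it against the PDB page before relying on it. The function name is my own.

```python
def differences(a, b):
    """List (position, residue_in_a, residue_in_b) where two chains differ.

    Positions are 1-based, as is usual for protein sequences.
    """
    if len(a) != len(b):
        raise ValueError("chains must have the same length for this comparison")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Human B chain, taken from the amino acid sequence quoted earlier.
HUMAN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
# Pig B chain (assumed here; verify against the PDB sequence details).
PIG_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"

print(len(HUMAN_B), len(PIG_B))     # 30 30
print(differences(HUMAN_B, PIG_B))  # [(30, 'T', 'A')]
```

    The same function can be pointed at the two A chains to confirm whether they differ at all.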

    Insulin sequence & structure questions:
    1. What is the first entry of your search results?
    2. Do you think the first entry is Insulin? Why or Why not?
    3. What species is the gene from? What species is the second entry from?
    4. Review the results of the first two pages of your search. Did you find a gene that is labeled Insulin? What is the name of the gene?
    5. After refining your search to INS, how many hits did you get?
    6. How many of the hits are “INS”? What is the difference between the entries that are “INS” only?
    7. Read the summary. What is the role of insulin?
    8. How many nucleotides long is the gene? (Note:subtract the smaller number from the larger number).
    9. What does post-translationally mean?
    10. What chromosome contains the insulin gene? Describe where this gene is located on the chromosome.
    11. What does the acronym for CCDS mean?
    12. Do all human genes have a specific identification like CCDS?
    13. How long is the nucleotide sequence for insulin? Why is this number different from the number you entered in question 8 above?
    14. How many Amino acids are in the chain?
    15. What does the graphic summary tell you about the match?
    16. How do we know that our results are from the pig genome?
    17. Review: How many nucleotides are there for each codon? From the answer to questions 13&14, is there the right amount of Amino Acids for the number of nucleotides? Why or why not?
    18. What does the acronym BLAST stand for?
    19. When we run BLAST what are we attempting to do?
    20. How many sequences did your BLAST search return?
    21. How many nucleotides were matched?
    22. How close percentage wise is the match between these segments?
    23. From what you know of Protein translation does it seem that the pig insulin protein will be close to the human insulin protein?
    24. What is the report based on? DNA, RNA, mRNA, or Protein? (look at the top of the page).
    25. Under FEATURES find CDS. What is the length of this segment? (You need to subtract again).
    26. How many amino acids are there in the protein report?
    27. Divide the number of bp in the nucleotide report by the number of bp in a codon, how many should we have in the protein report? Are they equal? Why or why not?
    28. How many structure hits did you get from the protein search?
    29. What is the AA sequence of the B chain?
    30. Refer back to the Amino Acid sequence on the peptide page for insulin in NCBI. What happened to the AA that are between the A and B chains?
    31. On the PDB, why do you think the letters ‘c’ are highlighted in yellow in the sequences?
    32. How is pig insulin different from human insulin?
    33. Why did people have allergic reactions to pig insulin?
    34. Are the nucleotide differences between pig and human insulin significant?
    35. Has this exercise in looking at what is available online about insulin been helpful?
    36. What more would you like to know about online resources for insulin?
    37. What could be changed to improve this lesson?

    Insulin sequence & structure Answers to questions:
    1. What is the first entry of your search results? IGF2 or IDDM10
    2. Do you think the first entry is Insulin? Why or Why not? No. This entry is for insulin-like growth factor or insulin-dependent diabetes … Over time the search results will vary.
    3. What species is the IGF2 gene from? What species is the second entry from? Bos Taurus. This may vary over time.
    4. Review the results of the first two pages of your search. Did you find a gene that is labeled Insulin? What is the name of the gene? INS
    5. After refining your search to INS, how many hits did you get? 682 This will vary as time goes on.
    6. How many of the hits are “INS”? What is the difference between the entries that are “INS” only? They are for different species. The species names are in brackets [].
    7. Read the summary. What is the role of insulin? Binding of this mature form of insulin to the insulin receptor (INSR) stimulates glucose uptake.
    8. How many nucleotides long is the gene? (subtract the smaller number from the larger number) 1415 nucleotides.
    9. What does post-translationally mean? After translation which is after the mRNA has been made into an AA sequence on the ribosomes.
    10. What chromosome contains the insulin gene? 11
    11. What does the acronym for CCDS mean? Consensus CoDing Sequence (CCDS) Database.
    12. Do all genes have a specific identification like CCDS? Yes, all human genes now have a CCDS.
    13. How long is the nucleotide sequence for insulin? Why is this number different from the number you entered a couple of questions ago? 333 nt. The entire gene includes the introns and the expression control region of the gene.
    14. How many Amino acids are in the chain? 110
    15. What does the graphic summary tell you about the match? The red line illustrates that there is a good match for the whole length.
    16. How do we know that our results are from the pig genome? There are references to Sus scrofa.
    17. Review: How many nucleotides are there for each codon? Is there the right amount of Amino Acids for the number of nucleotides? Why or why not? 3: No: The last 3 nucleotides are a stop codon.
    18. What does the acronym BLAST stand for? Basic Local Alignment Search Tool
    19. When we run BLAST what are we attempting to do? Find nucleotide sequences that are identical or similar to the one we have.
    20. How many sequences did your BLAST search return? One
    21. How many nucleotides were matched? 283/333
    22. How close percentage wise is the match between these segments? 84%
    23. From what you know of Protein translation does it seem that the pig insulin protein will be close to the human insulin protein? The pig insulin would be very different. (The part that is different is in an intron.)
    24. What is the report based on? DNA, RNA, mRNA, or Protein? mRNA
    25. Under FEATURES find CDS. What is the length of this segment? 333 nt
    26. How many amino acids are there in the protein report? 110
    27. Divide the number of bp in the nucleotide report by the number of bp in a codon, how many should we have in the protein report? Are they equal? Why or why not? We should have 111. These are not equal. There is a stop codon in the nt sequence that is not translated into the AA sequence.
    28. How many structure hits did you get from the protein search? 129 (may change over time)
    29. What is the aa sequence of the B chain? FVNQHLCGSHLVEALYLVCGERGFFYTPKT
    30. What happened to the aa that were between the A and B chains? The missing portion (the fine red line noted earlier) was removed post-translationally, leaving the A and B chains.
    31. Why are the letters ‘c’ highlighted in yellow in the sequences? These are the AAs that make the disulfide bonds connecting the two peptide strands.
    32. How is pig insulin different from human insulin? There is only one AA difference. This AA is insignificant and does not affect the chemistry of pig insulin.
    33. Why do you think people had allergic reactions to pig insulin? There were impurities, other parts/chemicals of the pig in with the insulin.
    34. Refer back to question 23 and your answer there. Are the nucleotide differences between pig insulin gene and human insulin gene significant? No, they code for the same amino acids, the differences are in the areas that are cleaved out.
    35. Has this exercise in looking at what is available online about insulin helpful? Answers will vary.
    36. What more would you like to know about online resources for insulin? Answers will vary.
    37. What could be changed to improve this lesson? Answers will vary.

    Please send feedback from the last 3 questions to mrkregear@charter.net.

    Comment by Archie Kregear — 2011 February 5 @ 10:47 | Reply

    • It is great to have an example lesson to show people. I’m curious to hear from bio instructors about the usefulness and appropriateness (in terms of level and support of the curricular goals) of this lesson.

      From bioinformatics people, what do you think of this order for the searches? Would you start with the NCBI gene database? I almost never start searches there when I want info on a protein: I start with the protein database or even go directly to PDB. If I wanted info on the gene, I’d probably start with the UCSC genome browser rather than with the NCBI gene database.

      An even bigger problem is that I could get essentially all the useful results of this long series of searches by looking up insulin on Wikipedia, plus a lot more. It might be better to start from the Wikipedia article, then ask the students to dig deeper: finding the differences between human and pig insulin and mapping the residues on the structure, for example, as in the later questions here.

      Comment by gasstationwithoutpumps — 2011 February 5 @ 13:47 | Reply

  2. I would be happy to help.

    I think there are 2 key ideas here:
    a) linear molecular evolution. Chance events in a single genome drive change. This would include base substitutions, deletions, inversions.
    b) non-linear evolution. Events driven by recombination between 2 genomes. This would include horizontal transfer, viral/plasmid integrations, or recombination in diploid organisms.

    I can talk to an A/P bio teacher on the best way to teach bioinformatics… soon….

    Comment by David Bernick — 2011 February 5 @ 11:23 | Reply

    • David, I’m sure you can come up with some great ideas, but remember that the goal is not to teach bioinformatics, but to teach biology by using bioinformatics. I think I agree with you that bioinformatics will be most useful in teaching evolution, which seems to be a hard concept for some students to grasp.

      Comment by gasstationwithoutpumps — 2011 February 5 @ 13:50 | Reply

  3. I noticed the name A. Malcolm Campbell in the link to curriculum design. I’m aware of his excellent reputation for teaching Biology at the Intro and Advanced levels (mainly Genomics) at Davidson College and of his commitment to education of youth. Although he could easily be an R1 PI, he prefers teaching at an undergrad-only college. His Web Site is here: http://www.bio.davidson.edu/people/macampbell/macampbell.html . He makes heavy use of the Web for teaching, so there might be exercises there. You also might contact him directly, since he is listed as a collaborator for the new AP course in the link on curricular design.

    Comment by Jim Tripp — 2011 February 5 @ 14:22 | Reply

  4. I got an email response which I will copy here (removing some identifying information):


    I like the idea of including bioinformatics in AP Biology, but these are the hurdles I face personally:
    – Although I have an M.S. (and a.b.d. PhD) in molecular biology (from ***), I don’t know a heck of a lot about the field.
    – In addition to knowing the subject area, I would need to develop a curriculum, lesson plans, activities and resources to accompany it.
    – The computer resources it would require would presumably mean upgrading our existing computers. We have five in-class computers and access to several computer labs, but I think we’d need in-class computers for each student.

    We did do one bioinformatics-related project recently. My students (I have 38 AP Bio students in two classes) extracted their mtDNA at **** (who then sequenced it for them). We ran our sequences through the Bioserver databases and the kids generated phylogenetic trees comparing their DNA to ancient and contemporary sequences. But after that I was stuck, I didn’t know how to extend that activity or what to attempt as a subsequent project. Any suggestions?

    Comment by gasstationwithoutpumps — 2011 February 5 @ 18:46 | Reply

    • Having their own mitochondrial data is cool, and I think that you are already ahead of the curve in having them do phylogenetic trees on this data.

      Do you have them find and annotate all the genes in their mitochondrial DNA? or was only part of the mtDNA sequenced? How was the sequence assembly done to get the mtDNA sequence from the sequencing reads? Did the students try mapping the individual reads to the assembled sequence?

      I don’t have many ideas what could be done with this data to support the AP bio “essential understandings” though. Could someone else help out with some ideas?

      Comment by gasstationwithoutpumps — 2011 February 5 @ 18:55 | Reply

      • wow.. this is really interesting.

        I just checked at mitoweb, and various mitochondrial alleles could be used to map population movement of a/p students. This area, of course, can be a beautiful study of human populations. Be very careful about this data ever being tied to individual students…human subjects review, etc.

        This kind of study can be done with other organisms that are not subject to these rules..the mycobacteriophage study is a really good example, and it seems it could easily be an a/p bio project. After collecting, isolating and sequencing phage, they can then be analyzed to determine gene content, relatedness, and reorganization of their genes—all wonderful bioinformatic studies.

        Comment by David Bernick — 2011 February 5 @ 19:32 | Reply

      • I got further clarification. They don’t sequence the entire mitochondrial DNA, but only a short hypervariable sequence from it:

        The sequence that we look at is a hypervariable region that doesn’t code for any proteins. Given my druthers, I’d like to come up with a project that involves searching an organism’s genome database for a gene and another that involves comparing the genomes between two organisms.

        That sounds like a specific enough and common enough request that someone here should be able to come up with some example assignments out of their homework files. Anyone?

        Comment by gasstationwithoutpumps — 2011 February 6 @ 07:28 | Reply

      • What I have done with this data is follow up with the Genographic Project video and discussions. https://genographic.nationalgeographic.com/genographic/index.html
        regards,
        Archie Kregear

        Comment by Archie Kregear — 2011 February 6 @ 19:48 | Reply

  5. How about constructing hypotheses about the phylogenies of species whose genomes/proteomes are available in a database then testing these hypotheses by constructing cladograms based on individual proteins/genes?
    If this is possible, I would need to know which species to use (which are available and which are appropriate for comparison), how to construct a cladogram using genetic/a.a data (is there software available that an AP student could EASILY learn? Would simply counting the number of differences be sufficient/realistic?), and how to analyze the data well enough to determine whether or not they support the hypothesis.
    Cladistic analysis is included in the framework, and I would love to use a quantitative lab to support phylogenetics in my classroom.

    Comment by Jake Schroeder — 2011 February 6 @ 05:53 | Reply

    • In my research, I rarely build trees, so I’m not familiar with the current tree-building software. I’ve used Felsenstein’s old Phylip package, which has fairly simple command-line programs like dnadist, neighbor, drawtree, and drawgram. This is a fairly widely used package, but I don’t know how easy it is to use by an AP Bio student, who may have no command-line computer skills.

      The documentation (at least in the version I have) is terrible, in that it assumes expertise in phylogenetic tree-building algorithms that few bioinformaticians or biologists would have, much less students in their first biology class. There do seem to be some more tutorial presentations now available on the web (like http://koti.mbnet.fi/tuimala/oppaat/phylip2.pdf), but none of them are short, so many AP bio students would probably find Phylip overwhelming.

      Whole-genome phylogenies are probably too computationally expensive for high-school computer resources, except for virus-sized genomes. Of course, there are huge numbers of viral sequences publicly available. NCBI claims to have 163 genomes for Human papillomavirus (but the same search technique only turns up 2 HIV genomes, when I know there are 1000s available, so I suspect that I am not using the genome database correctly or that their indexing really sucks).
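      On the question of whether simply counting differences is enough: as a first classroom step it can at least be tried. Here is a minimal Python sketch that counts mismatches between every pair of already-aligned sequences and reports the closest pair. The species names and sequences are invented toy data, and the function name is my own. Raw counts ignore multiple substitutions at the same site, which is one reason programs like dnadist apply a correction, so treat this as a rough illustration rather than a tree-building method.

```python
from itertools import combinations


def count_differences(a, b):
    """Count mismatched columns between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical, already-aligned toy sequences (not real data).
aligned = {
    "species_A": "ATGGCCCTGTGGATG",
    "species_B": "ATGGCCCTGTGGACG",
    "species_C": "ATGGCTCTATGGACG",
    "species_D": "TTGACTCTATCGACG",
}

# Difference count for every unordered pair of species.
distances = {
    (s, t): count_differences(aligned[s], aligned[t])
    for s, t in combinations(aligned, 2)
}

for (s, t), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{s} vs {t}: {d} differences")

# The pair with the fewest differences would be joined first in a simple tree.
closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

      Students could draw a cladogram by hand from such a table, joining the closest pair first, and then compare it with the tree they hypothesized.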

      Comment by gasstationwithoutpumps — 2011 February 6 @ 07:17 | Reply
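
On the question above of whether simply counting differences could work: as a rough first pass it can, provided the sequences are already aligned. What follows is a minimal sketch in Python, not a description of what dnadist or any other Phylip program does internally. The function names, species labels, and sequences are all invented for illustration; none of this is real data.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count mismatched columns between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    Returns (number of differences, number of columns compared).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            diffs += 1
    return diffs, compared


def distance_table(aligned):
    """Proportion of differing sites for every pair of named sequences."""
    table = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
        diffs, compared = count_differences(seq_a, seq_b)
        table[(name_a, name_b)] = diffs / compared if compared else float("nan")
    return table


# Hypothetical toy alignment (made-up sequences, not from any database).
toy_alignment = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAACCTTC",
    "species_D": "TCGAAC-TTG",
}

if __name__ == "__main__":
    for pair, value in distance_table(toy_alignment).items():
        print(pair, f"{value:.2f}")
```

A table like this is the sort of input that distance-based tree programs work from. The raw proportion tends to underestimate the true divergence between distantly related sequences, because repeated substitutions at the same site are counted at most once, so it is most defensible for closely related species.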

    • As a possible class assignment in phylogeny, we could ask how the alpha and beta globins came to be in vertebrates. These genes have been sequenced and are available in GenBank. Jalview is free, and can construct trees from an alignment. These trees can be compared to each other, and to our understanding of speciation.

      This analysis shows both a gene duplication that gave rise to the alpha and beta lineages, and species divergence since the duplication. It also provides a nice display of evolution (molecular and organism) using bioinformatics as a method to analyze the molecular evidence.

      Comment by dbernick — 2011 February 6 @ 09:45 | Reply
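
To make the globin idea concrete, here is a small sketch of how a table of pairwise distances can be turned into a tree by repeatedly merging the closest pair (average-linkage clustering, often called UPGMA). This is a swapped-in textbook method chosen for brevity, not a claim about what Jalview or Phylip compute. The function name `upgma_newick` and the distances are invented for illustration; the numbers are not measured from real globin sequences.

```python
from itertools import combinations


def upgma_newick(names, dist):
    """Build a tree topology by average-linkage clustering.

    names: list of leaf labels.
    dist:  dict mapping (label, label) pairs to a distance.
    Returns a Newick string without branch lengths.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two merged clusters' distances.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree + ";"


# Hypothetical distances: made-up numbers chosen so that the two alpha
# globins resemble each other more than either resembles a beta globin.
globin_names = ["human_alpha", "mouse_alpha", "human_beta", "mouse_beta"]
globin_dist = {
    ("human_alpha", "mouse_alpha"): 0.15,
    ("human_beta", "mouse_beta"): 0.20,
    ("human_alpha", "human_beta"): 0.55,
    ("human_alpha", "mouse_beta"): 0.57,
    ("mouse_alpha", "human_beta"): 0.56,
    ("mouse_alpha", "mouse_beta"): 0.58,
}

if __name__ == "__main__":
    print(upgma_newick(globin_names, globin_dist))
    # prints ((human_alpha,mouse_alpha),(human_beta,mouse_beta));
```

With these invented numbers the sequences group by gene (alpha with alpha, beta with beta) rather than by species, which is the pattern expected when the duplication is older than the species split. Average-linkage clustering assumes roughly constant rates of change across lineages, so it is only a teaching approximation.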

  6. I got another link via email to http://www.grochbiology.org which has 4 bioinformatics activities for AP students. Here is the whole message (reproduced with permission):

    Hi, I have 4 bioinformatics activities that I’ve refined over the years: one from a NABT article, one from Kim, one I made up to augment Bio-Rad’s Fish Lab (Protein Profiler), and one on primates to support Human Evolution. I’ve received a lot of help from Swami’s NGBW at UC San Diego (from Mark Miller, who made a tutorial for each activity). Here are the links to the activities and the tutorials.

    Bears and Pandas http://www.grochbiology.org/PandasBearsNewVersion.htm (new version) (comparing nucleotide sequences from http://www.ngbw.org/labs/bears/bears_rev3.pdf : Adapted from Maier, C.A. (2001) “Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear & Giant Panda Ancestry.” The American Biology Teacher. 63:9, Pages 642-646.)

    Here is the information about the extension to the Bio-Rad lab: http://www.grochbiology.org/FishPostLabInformationRevD2010.pdf http://www.grochbiology.org/fish%20lab%20genbank.doc http://www.grochbiology.org/fish_protein_rubric.htm

    Walruses, Whales, Seals http://www.grochbiology.org/WhalesActivity.htm (direct link to online tutorial http://www.ngbw.org/labs/seals/seals.htm ) (use of clustal W-P, boxshade, clustal distance, Newick, drawtree) adapted from Kim Foglia

    Primate Activity http://www.grochbiology.org/PrimateActivityNewVersion.htm (new version) http://www.ngbw.org/labs/primates/primate_lesson.htm (help tutorial) (use of clustal W-P, boxshade, clustal distance, Newick data, drawtree).

    Doing the bioinformatics can seem overwhelming, but my students run the tutorial animation at the same time as their assignments, a bit at a time. NGBW (Next Generation Biology Workbench) has all the tools for doing bioinformatics in one place; there is no need to download anything, just register as a user. The only issue is that NGBW is a site used by researchers all over the world (check out the map on the front page, http://www.ngwb.org), and if someone does a huge run (an entire bacterial genome) the site can crash… just send an email and they’ll reset the server.

    Robin Groch
    Accelerated Biology , Biology, AP Biology
    San Ramon Valley High School
    501 Danville Blvd. (formerly 140 Love Lane as of August 1, 2007)
    Danville, CA 94526
    Web:http://www.grochbiology.org

    Comment by gasstationwithoutpumps — 2011 February 6 @ 09:46 | Reply

  7. This post prompted an e-mail response from the Bay Area Biotechnology Education Consortium (http://www.babec.org/), an organization that I was not aware of previously. Apparently they have done some bioinformatics workshops and modules for high-school biology teachers. I don’t know how eager they are to share their stuff nationwide, though (their mission is focused on the San Francisco Bay Area).

    Comment by gasstationwithoutpumps — 2011 February 7 @ 10:28 | Reply

  8. It has been pointed out to me that the HHMI Holiday Lecture for 2010 on viruses emphasizes the important role of bioinformatics in biology.

    I blogged about the lectures before they happened, but I’ve never had the time to watch them.

    Comment by gasstationwithoutpumps — 2011 February 8 @ 14:00 | Reply

  9. It has also been suggested that I “check out the NWABR website. http://www.nwabr.org/education/itest.html I went to a workshop about this Bio-ITEST module, and am really excited to use it with my students. It includes bioinformatics and bioethics.”

    Comment by gasstationwithoutpumps — 2011 February 8 @ 14:09 | Reply

  10. Another recommendation:
    http://evolution.berkeley.edu/
    supposedly has a lot of educational resources, and searching “molecular phylogenetics” is supposed to get some useful bioinformatics labs.

    Comment by gasstationwithoutpumps — 2011 February 18 @ 08:42 | Reply

  11. http://www.indiana.edu/~ensiweb/evol.fs.html
    has been suggested as another useful bioinformatics lab.

    Comment by gasstationwithoutpumps — 2011 February 18 @ 08:43 | Reply

  12. Another pointer, from an AP bio teacher, which seems to be using bioinformatics on a virus genome:
    http://wiki.pingry.org/u/ap-biology/images/b/bb/TMV_bioinformatics.pdf

    Comment by gasstationwithoutpumps — 2011 March 3 @ 17:15 | Reply

  13. Here is a link that I found at NSTA.

    http://www.genome.gov/glossary/index.cfm?id=17

    It is very good at defining concepts in genomics.

    -Archie

    Comment by Archie Kregear — 2011 March 14 @ 21:29 | Reply

  14. Another site I came across recently, http://ecsite.cs.colorado.edu/?page_id=353, has an example lesson they taught to high-school bio classes. It actually has 3 lessons: one on algorithms, one on BLAST, and one on phylogenetic trees.

    Comment by gasstationwithoutpumps — 2011 April 16 @ 11:38 | Reply

  15. […] Adding bioinformatics to AP Bio […]

    Pingback by Blogoversary « Gas station without pumps — 2011 June 5 @ 10:51 | Reply

  16. http://caseit.uwrf.edu is the site for the Case It molecular biology simulation, which is described as follows:


    CaseItv6.05 is a simulation that performs common laboratory procedures on any DNA or protein sequence. It has all of the capabilities of the earlier version (5.03), but adds new features such as bioinformatics analysis and autoloading of DNA and protein samples to speed loading of gels, blots, ELISA, and 96-well PCR.

    The bioinformatics analysis is actually done by another free package, MEGA4, from http://www.megasoftware.net/mega4/mega.html

    Unfortunately, both tools are Windows-only, so I’ll never use or test them, since all the machines I use are either Linux or Mac OS X.

    Comment by gasstationwithoutpumps — 2011 June 5 @ 19:36 | Reply

  17. Another pointer: http://www.dnalc.org/

    Comment by gasstationwithoutpumps — 2011 July 12 @ 23:42 | Reply

  18. Another pointer: http://www.silencinggenomes.org

    Comment by gasstationwithoutpumps — 2011 July 12 @ 23:55 | Reply

  19. […] bioinformatics in AP Bio classes, many of which I had informally gathered as comments on the post Adding bioinformatics to AP Bio. The comments there are worth reading, as I have not collected here all the ideas, just those that […]

    Pingback by Resources for bioinformatics in AP Bio « Gas station without pumps — 2011 July 23 @ 14:16 | Reply

