Gas station without pumps

2014 October 22

Banana Slug genome crowd funding

Filed under: Uncategorized — gasstationwithoutpumps @ 21:20
T-shirt design from the first offering of the class.

A few years ago, I taught a Banana Slug Genomics course, based on some sequencing done for free as a training exercise for a new technician.  I’ve mentioned the course occasionally on this blog.

The initial, donated sequencing runs did not produce enough data or high enough quality data to assemble the genome to an annotatable state, though we did get a lot of snippets and a reasonable estimate of the genome size (about 2.3GB total and about 1.2GB unique, so a lot of repeats).  All the class notes are in a wiki at https://banana-slug.soe.ucsc.edu/ and the genome size estimates are at https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish.

I did manage to assemble the mitochondrion after the class ended (notes at https://banana-slug.soe.ucsc.edu/computer_resources:assemblies:mitochondrion), but I now think I made a serious error in doing the assembly, treating variants due to a heterogeneous mitochondrial population as repeats instead.  The mitochondrion was relatively easy, because it is much shorter than the nuclear genome (probably in the range 23kB to 36kB, depending on whether the repeats are real) and has many more copies in the DNA library, so coverage was high enough to assemble it—the hard part was just selecting the relevant reads out of the sea of nuclear reads.

Ariolimax dolichophallus at UCSC, from larger image at http://commons.wikipedia.org/wiki/File:Banana_slug_at_UCSC.jpg

The banana slug genomics class has not been taught since Spring 2011, because there was no new data, and we’d milked the small amount of sequence data we had for all that we could get out of it.  I’ve played with the idea of trying to get more sequence data, but Ariolimax dolichophallus is not the sort of organism that funding agencies love: it isn’t a pathogen, it isn’t a crop, it isn’t an agricultural pest, and it isn’t a popular model organism for studying basic biology. Although it has some cool biology (only capable of moving forward, genital opening on the side of its head, penis as long as its body, sex for up to 24 hours, sometimes will gnaw off penis to separate after sex, …), funding agencies just don’t see why anyone should care about the UCSC mascot.

Obviously, if anyone is ever going to determine the genome of this terrestrial mollusk, it will be UCSC, and the sequencing will be done because it is a cool thing to do, not for monetary gain.  Of course, there is a lot of teaching value in having new data on an organism that is not closely related to any of the already sequenced organisms: the students will have to do almost everything from scratch, for real, as there is no back-of-the-book to look up answers in.

At one point I considered asking alumni for donations to fund more sequence data, but our dean at the time didn’t like the idea (or perhaps the course) and squelched the plan, not allowing us to send any requests to alumni. When the University started getting interested in crowd funding, I put out tentative feelers with development about getting the project going, but the development people I talked with all left the University, so the project fizzled.  I had a full teaching load, so I did not push to add starting a crowd-funding campaign, and teaching a course based on it, to my workload.

This fall, seemingly out of nowhere (but perhaps prompted by the DNA Day celebrations last spring or by the upcoming 50-year anniversary of UCSC), I was asked what it would take to actually get a complete draft genome of the slug—someone else was interested in pushing it forward!  I talked with other faculty, and we decided that we could make some progress for about $5k–10k, and that for $20k in sequencing we could probably create a draft genome with most of the genes annotated.  This is a lot cheaper than 5 years ago, when we did the first banana slug sequencing.

Although the top tentacles of the banana slug are called eyestalks and are light sensing, they do not have vertebrate-style eyes as shown in this cartoon. Nor do they stick out quite that much.

And now there is a crowd funding campaign at http://proj.at/1rqVNj8 to raise $20k to do the project right!  They even put together this silly video to advertise the project:

Nader Pourmand will supervise students building the DNA library for sequencing during the winter, and Ed Green and I will teach the grad students in the spring how to assemble and annotate the genome.  Ed has much more experience at that than me, having worked with Neanderthal, Denisovan, polar bear, alligator, and other eukaryotic genomes, while I’ve only worked on tiny prokaryotic ones. (He’s also more famous and more photogenic, which is why he is in the advertising video.) We’re both taking on this class as overload this year (it will make my 6th course, in addition to my over-300-student advising load and administrative jobs), because we really like the project. Assuming that we get good data and can assemble the slug genome into big enough pieces to find genes, we’ll put up a genome browser for the slug.

I’m hoping that this time the class can do a better job of the Wiki, so that it is easier to find things on it and there is more background information.  I’d like to make the site a comprehensive overview of banana-slug facts and research, as well as a detailed lab notebook of the process we follow for constructing the genome.

Everyone, watch the video, visit the crowd funding site, read the info there (and as much of the Wiki as you can stomach), and tell your friends about the banana-slug-sequencing effort.  (Oh, and if you feel like donating, we’ll put the money to very good use.)

Update 30 Oct 2014: UCSC has put out a press release about the project.

Update 31 Oct 2014: It looks like they’ve made a better URL for the crowd-funding project: http://crowdfund.ucsc.edu/sluggenome

2012 March 20

Petridish, another science crowd-funder

Filed under: Uncategorized — gasstationwithoutpumps @ 11:08

Thanks to a post on the New Zealand blog misc.ience (Petridish – the new kid on the science crowdfunding block), I’ve found out about another crowd-funding service, in addition to SciFund, that I blogged about before. Petridish was created specifically for science funding, and I’m not yet sure what its advantages and disadvantages are compared to SciFund.

As with any funding source, the important questions include

  • How much money can be raised?
  • What is the probability of getting the funding?
  • How much effort is involved in trying to get that funding?
  • What strings are attached to the funding?

SciFund charges 4% plus another 4% for credit-card processing. I believe that they are also a for-profit company, since they don’t mention tax deduction anywhere.

Petridish is a for-profit company, and they take 5% of all donations (the research projects are also responsible for credit-card fees, which I believe run another 3–5% depending on the card used, and can be much higher for tiny transactions, due to fixed minimum fees). Petridish is looking into ways to make (part of) donations tax-deductible, but they are unlikely to be successful at that.

SciFund is a keep-it-all funder—the person requesting the funding gets everything that is raised (minus fees), whether or not they reach their funding goal.  This allows setting a higher goal, though there are some incentives in place for keeping the goals realistic. Many projects reach their funding deadline without coming close to their initial funding goals and some go well over—funding amounts seem to be in the range $10–$10000 ($300–$3000 if you remove a few outliers), almost independent of what the funder requested, with a median of about $1000.

PetriDish is an all-or-nothing funder: “Projects will only be funded if they reach their goal before the deadline set by the researcher.”  That means that researchers have to guess how successful the crowd-funding will be when setting their goals, despite having no access to information about how many people visit the site, nor what the success rate is for other projects. (That information may become available, once PetriDish has some history to share.)
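The difference between the two funding models can be put in a few lines of Python. This is only a sketch: the function name `payout` and the single combined `fee_rate` are assumptions made for illustration, not anything either site publishes.

```python
def payout(raised, goal, fee_rate, all_or_nothing):
    """Return the dollars a researcher receives under one funding model.

    raised         -- total pledged by the deadline
    goal           -- funding goal set by the researcher
    fee_rate       -- combined platform and card fees as a fraction (assumed)
    all_or_nothing -- True for a Petridish-style funder, False for keep-it-all
    """
    if all_or_nothing and raised < goal:
        return 0.0  # goal missed, so the project is not funded at all
    return raised * (1.0 - fee_rate)


# Hypothetical project: $1000 pledged against a $3000 goal, 8% total fees.
for model in (False, True):
    print("all-or-nothing" if model else "keep-it-all", payout(1000, 3000, 0.08, model))
```

With a missed goal the keep-it-all model still pays out most of what was pledged, while the all-or-nothing model pays nothing, which is why the goal-setting guess matters so much more under the second model.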

Researchers who guess wrong are unlikely to get a second chance: “We hand select the most interesting and meaningful projects we find to be featured on our site and then allow you to get involved.”  So not only do researchers have to guess at the tastes of the general public, but they also have to guess at the tastes of an unknown review panel.  The panel may be easier to please than a typical funding agency panel, though, as PetriDish is not risking any money by accepting a proposal—just a little bit of credibility if the project is bad.

I think that the keep-it-all funding of SciFund makes more sense for science funding.   Crowd-funding will rarely pay for a complete project—it will almost always be a small add-on that will enable doing a little more, not making or breaking a project.  Forcing the scientists to gamble on how much to ask for seems silly in that context.

What strings are attached?  Projects must offer rewards to the individuals funding the project, just like SciFund:

Every reward is unique to its project. Some rewards offered on Petridish include:

  • Souvenirs from the field, like a rock from the highest peak in Madagascar or a vial of water from 400 feet below the surface.
  • Talks or dinners with famous researchers
  • Limited edition photographs or artistic renditions of the subject matter
  • Acknowledgements in journals
  • Naming rights for new discoveries, like new species
  • In person participation in a field project

In my earlier post about SciFund, I discussed the possibility of using it to get some funding for banana slug genomics—a project that has some potential for being achievable with only about $5000 or $10000 in funds (as long as no one is paid from the funds—even one quarter of grad student funding costs too much).  The expensive part of scientific research is nearly always the personnel, and I don’t see any way that crowd-funding will make the slightest dent in that cost.

I see SciFund and Petridish as more an opportunity for outreach and publicity for cool projects than as serious sources of funding for science. In that context, I’m seriously tempted to put together a funding request for banana slug genomics, which has a “coolness” factor that few of my other projects have.  What’s stopping me is mainly my fear of the University bureaucracy, who will prohibit me from attempting crowd-funding, soak up any money that comes in as “overhead”, or just make it so difficult to use the money that it would be less painful to fund things out of my retirement savings.

2011 November 3

SciFund crowd-sourcing science funding

Filed under: Uncategorized — gasstationwithoutpumps @ 02:25

For those science projects that don’t need a lot of money, but that you can’t get Federal funding agencies interested in (or for which the time and effort it would take to write a proposal and get it funded are out of proportion to the money needed), there is now an alternative: crowd-funding.

Here’s how it works at SciFund (part of RocketHub):

A “creative” proposes a project, describing it with text, pictures, and/or videos.  There are 3 required components: a funding goal, a deadline, and rewards for the “fuelers” who donate money.  The rewards can be anything legal, except investment opportunities, lotteries, revenue share, or equity.  They can be tangible (like t-shirts, copies of artistic works, … ) or experiences (like seminars, opportunities to participate in research, … ).

RocketHub collects a percentage of all donations (4% if you make your target goal, 8% if you don’t) and passes on shares of credit-card fees (for another 4%), thus keeping 8–12% of money collected, which is not a bad overhead (according to Charity Navigator, the median for charities is about 10%).
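The fee arithmetic above works out as follows. This is a rough sketch using only the percentages quoted in this post; the function name `rockethub_net` is made up, and the flat 4% card fee is an approximation.

```python
def rockethub_net(raised, goal):
    """Net dollars after fees, using the percentages quoted in this post."""
    platform = 0.04 if raised >= goal else 0.08  # 4% if the goal is met, else 8%
    card = 0.04                                  # approximate credit-card share
    return raised * (1.0 - platform - card)


# A $5000 request that is fully funded, and one that only reaches $1000.
print(rockethub_net(5000, 5000))
print(rockethub_net(1000, 5000))
```

So the total overhead is 8% when the goal is reached and 12% when it is not.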

Ariolimax dolichophallus at UCSC. Image via Wikipedia

I’ve been wondering if it would be worthwhile to put together a funding request for reagents for finishing the banana slug sequencing.  I think that we need between $5000 and $10000 for that, which may be in the range of crowd funding.  I’d have to pin down the amount better and get commitments from volunteers to do the sequencing if the money comes through.  Rewards could be banana-slug genomics t-shirts or coffee mugs (though the donations would have to be big enough to pay for the extra cost of the rewards).   We could also offer a webinar about the banana slug and its sequencing for any level of donation.

Of course, one problem with this idea is that the most likely people to contribute to a crowd-funding campaign for sequencing Ariolimax dolichophallus are UCSC alumni who are proud of the unusual mascot, but the dean of the School of Engineering has told faculty not to contact alumni about the silly idea of sequencing the banana slug (or at least, I’ve heard rumors to that effect; I’m sure that the development office wants to keep a tight handle on the reins of any fund-raising).

We’d have to come up with a way to convince people that sequencing the banana slug is really cool, as the competition for “fuelers” is pretty stiff, and some of the projects on SciFund sound pretty cool.

2011 July 5

More on the banana slug mitochondrion

Last week I thought I was done with assembling the mitochondrial genome of UCSC’s beloved banana slug mascot Ariolimax dolichophallus. All that was left was finding some wet-lab volunteers to do some PCR to disambiguate a repeat region.  I even blogged about it.  Despite that, I’ve spent the past week still working on the genome.

First, I had to come up with primers for the wet-lab person to order to do the long-range PCR.  That took me longer than it should have, because I’d never designed primers before, and used the wrong set of parameters for Primer3. That is, I used the default set, but was later told that the SantaLucia 1998 set is preferable.  I also had to find out the favorite melting temperature for the people doing the wet-lab work (they preferred 55°C).  Even after that, the high AT content of the banana-slug mitochondrial genome required some fussing to get adequate GC content in the primer and a GC clamp at the 3′ end (all this was new to me, so it took me over a day to get primers that satisfied everyone).  I believe that one or two pairs of primers have now been ordered for doing the long-range PCR.  When that is working, I’ll have to design more primers for doing Sanger sequencing.
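Two of the checks mentioned above, GC content and a GC clamp at the 3′ end, are easy to sketch in Python. This is not what Primer3 does internally, and it does not compute melting temperature; the function names and the clamp width of 2 bases are assumptions chosen for illustration, and the example sequence is invented.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return sum(primer.count(base) for base in "GC") / len(primer)


def has_gc_clamp(primer, window=2):
    """True if the last `window` bases (the 3' end) are all G or C.

    Primers are written 5' to 3', so the 3' end is the end of the string.
    The default window of 2 is an assumed convention, not a fixed rule.
    """
    return all(base in "GC" for base in primer.upper()[-window:])


candidate = "ATTATAATTTAGCTATGC"  # invented AT-rich example, not a real primer
print(gc_fraction(candidate), has_gc_clamp(candidate))
```

In an AT-rich genome most windows fail one check or the other, which is where the fussing comes from.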

After doing the primer design, I decided to use my new “look-for-exit” Python program to see if the Illumina reads that I had identified as mitochondrial suggested any other variants on the assembled genome.  I’d already used the program to find lots of variants of the 615-long repeat, but I’d not applied it to the whole mitochondrial genome.  When I did, I found another region about 150 base pairs upstream of the long repeats that seemed to be another repeat region.  So I ended up spending another day or two extracting all the variants of that repeat (which seems to be about 20 copies of a 185-long sequence).  I also designed primers for doing long-range PCR over the short repeat region, though that went much faster than my first attempt.

When I had all variants I could find of both the short and the long repeats, I did another mapping of all the Illumina reads with BWA, to identify the mitochondrial ones.  I expected the number of reads to go up slightly, as reads that matched the new repeat blocks were kept.  Instead the number went down!  Something was wrong. The single-ended merged reads behaved as expected, but fewer paired-end reads were being kept, even though I was asking BWA for read pairs in which either read mapped.  It turns out that if both ends map, but BWA doesn’t like the separation, it throws the pair out.  Because I now had more copies of the repeats, but probably in the wrong order, BWA sometimes found a better mapping, but then didn’t keep the pair, because the separation was too big.  So I had to redo the mapping setting the maximum insert size to be bigger than the length of the genome.
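The effect described above can be illustrated with a toy model of the pair filter. This is not BWA's actual logic; `keep_pair`, its arguments, and the example positions are all hypothetical, and the circularity of the genome is ignored.

```python
def keep_pair(pos1, pos2, max_insert=None):
    """Decide whether to keep a read pair as mitochondrial (toy model).

    pos1, pos2 -- mapped start positions, or None if that read did not map
    max_insert -- if set, a pair with both ends mapped farther apart than
                  this is thrown out, mimicking the behaviour seen above
    """
    if pos1 is None and pos2 is None:
        return False  # neither end mapped
    if pos1 is not None and pos2 is not None and max_insert is not None:
        return abs(pos2 - pos1) <= max_insert
    return True  # only one end mapped, or no separation limit


# Both ends map, but a misordered repeat puts them far apart.
print(keep_pair(100, 9000, max_insert=500))    # small limit: pair is lost
print(keep_pair(100, 9000, max_insert=30000))  # limit above genome length: kept
```

Setting the limit larger than the genome means no separation can exceed it, so every pair with at least one mapped end survives.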

Now I have lots of reads, and look-for-exit does not find any more variants supported by more than 7 reads (there is one C->T change that has 7 reads supporting it, but over 500 supporting the C, so that may just be sequencing error or a rare somatic mutation due to deamination).

Notes on this stuff have been put on the mitochondrion page of the banana slug genomics wiki.

So I’m ready to put the mitochondrial genome aside and do some work on another project, right? 

Well, I thought so, but I woke up in the middle of the night with more things to do.

First I tried to figure out how to annotate the genes of the mitochondrial genome.  There does not seem to be a high-quality pipeline like RAST, which I’ve used for bacterial annotation.  There is a crude web tool called DOGMA, but it produced essentially unusable outputs, without even the option of saving the results as a GenBank record.  It did suggest that I had all the standard protein genes except ATP8, which I already knew from doing BLAST searches.  (It turns out that losing ATP8 is quite common in mollusks, as well as some other clades, so losing it is no big surprise.)

I’ll probably have to write my own ORF finder to nail down the ends of the protein genes, since no one bothers with ORF finders for the 14 or so different mitochondrial genetic codes.  Most people apparently blast the genome to find the protein genes, then adjust the ends of the ORFs by hand—I might end up doing that myself, though my computer science soul cringes a bit at doing things by hand.  With only 12 or 13 proteins, though, it may not be worth the trouble of programming (unless I decide to do a population study that requires annotating lots of mitochondrial genomes).
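A bare-bones ORF finder of the sort described above might look like the following Python sketch. It assumes NCBI translation table 5 (invertebrate mitochondrial), in which TGA codes for tryptophan rather than stop; the set of start codons is an assumption taken from that table, and the code only scans the forward strand of a linear sequence, ignoring the circular wrap-around.

```python
STOPS = {"TAA", "TAG"}  # table 5: TGA is Trp, so it is not a stop here
STARTS = {"ATG", "ATA", "ATT", "ATC", "GTG", "TTG"}  # assumed start codons


def find_orfs(seq, min_codons=30):
    """Return (start, end) for forward-strand ORFs; end is exclusive
    and includes the stop codon. min_codons counts the stop as well."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Tiny invented example: the in-frame TGA does not end the ORF.
print(find_orfs("CCATGGCTGCTGCTTGAGCTTAAGG", min_codons=1))
```

Because so many codons count as starts, this always picks the most upstream one, so the 5′ ends would still need checking against BLAST hits by hand.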

I also ran the draft genome (with the repeat variants present but still not correctly ordered) through tRNAscan-SE to find the tRNAs.  The DOGMA server had found hundreds of tRNAs (the long repeat includes some tRNA genes), but tRNAscan-SE only found 7 tRNA genes.  I looked in mitochondrion genome papers and found that was a common result for mollusk mitochondria: people were using BLAST to find the tRNA genes (there are usually around 20).  I talked with Todd Lowe, the author of tRNAscan-SE, and he admitted that they had never developed good models for mitochondrial tRNAs.  Slugs and snails may be particularly difficult, as it seems there is some post-transcriptional modification to make the aminoacyl tails base pair.  I’ve put the tRNA search aside for now (though I’ll probably look for the tRNAs using BLAST), but Todd and I will look at it together later this summer, perhaps putting together some mitochondrial tRNA models to improve tRNAscan-SE.  There are lots of annotated mitochondrial genomes out there, so it should not be too hard to put together some decent covariance models of the tRNAs.

So the annotation is mostly on hold now.  Can I get some sleep or work on something else?

No, I can’t do that, because I’m still being bugged by the fact that I haven’t quite used all the data I have to try to order the repeats.  I’m now working on a new Python program to try mapping the paired-ends to the genome to try to come up with better ordering of the repeats.  BWA is not much use to me for this, as I’ve not figured out a way to get it to report multiple hits (one read is in repeat S04, and its partner is later in the genome in S02 or S08, for example).  So I’m doing crude mapping using suffix arrays and trying to figure out how to get the right balance between tolerating sequencing error and identifying subtly different repeat blocks.  I’m also trying to figure out how much I should use the quality information (if at all).  Maybe when I’ve got this program working, I’ll be able to sleep through the night and go back to working on a different project.
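For readers who have not seen the technique, here is a minimal Python sketch of suffix-array lookup that reports every place a read occurs, which is the property needed for repeats. It is not the program described above: it handles exact matches only, with no tolerance for sequencing error and no use of quality values, and the naive construction is only suitable for something as small as a mitochondrial genome. The function names are made up.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in sorted suffix order.

    Naive construction; fine for tens of kilobases, too slow for more.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, read):
    """Return sorted start positions of every exact occurrence of read."""
    k = len(read)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose k-long prefix is >= read
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < read:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose k-long prefix is > read
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= read:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


genome = "ACGTACGTAC"  # invented toy sequence with a repeated block
sa = build_suffix_array(genome)
print(find_all(genome, sa, "ACG"))
```

A read that falls inside a repeat comes back with several positions, and it is the pairing of those candidate positions with the mate's positions that can constrain the repeat order.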

2011 June 21

Banana slug mitochondrial genome done (almost)

Filed under: Uncategorized — gasstationwithoutpumps @ 19:12

Banana slug on UCSC campus (same species, but not same individual as the one being sequenced). Image via Wikipedia

Today I released a draft sequence for the banana slug (Ariolimax dolichophallus) mitochondrial genome on the Banana Slug Genomics wiki.  Some rough notes about the assembly are on the wiki on a page set aside for the mitochondrion and there is a link there to the fasta sequence file.

This unfunded project was a spinoff of the larger (also unfunded) project to sequence the entire genome for the banana slug.  We had one run of Illumina paired-end data, with a rather small fragment length donated by the UCSC sequencing center (it was a test or training run for a new technician or a new machine, I believe).  They have donated some other data for the project, but most has been too low coverage to be of any real use.

I have twice taught a class on assembling the banana slug genome, learning the material myself along with the grad students.  We have nowhere near enough data (particularly, no data from large DNA fragments) to assemble the whole genome: there are about 2Gbases in the genome and we’re getting an N50 of about 232 base pairs, less than the read length in some technologies!
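For anyone unfamiliar with N50: it is the contig length such that contigs at least that long contain half of the assembled bases. A minimal Python sketch follows, with invented contig lengths rather than anything from the slug assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half
    of the total assembled bases. Returns 0 for an empty assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Invented example: 32 bases total, and the two 8-base contigs cover half.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

An N50 in the low hundreds means half of the assembly sits in pieces too short to contain a typical gene.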

The mitochondrion, however, is small (about 20k bases) and overrepresented in the data (probably about 175x coverage), so I thought it would be easy to pull out the mitochondrial reads and assemble them.

It was doable, but nowhere near as easy as I expected.  The tiny DNA fragments we had were not long enough to span even a single copy of a repeat in a nasty repeat region in the mitochondrial genome, and I had to write special-purpose software just to close the circle and get all the mitochondrial reads.  The closest previously sequenced mitochondria are so dissimilar that using them to select reads was not going to help.

After more than 40 attempts, I finally got a complete genome, with (I think) all the variants of the repeat sequence, but with no data to use to order the repeats.  I’ll be setting this project aside now, unless some wet-lab person volunteers to buy some primers and do some PCR to disambiguate the repeats.

Update:

The closest sequenced mitochondrial genomes at NCBI are

NC_010220.1     Biomphalaria tenagophila mitochondrion, complete genome
>gb|EF433576.1| Biomphalaria tenagophila strain Taim-RS mitochondrion, complete genome

NC_005439.1 Biomphalaria glabrata mitochondrion, complete genome
>gb|AY380531.1| Biomphalaria glabrata strain 1742 mitochondrion, complete genome

name                      max score  total score  query coverage  E-value  max identity
Biomphalaria tenagophila  2679       4235         51%             0.0      71%
Biomphalaria glabrata     2562       3934         52%             0.0      69%