Today I released a draft sequence for the banana slug (Ariolimax dolichophallus) mitochondrial genome on the Banana Slug Genomics wiki. Some rough notes about the assembly are on the wiki, on a page set aside for the mitochondrion, and there is a link there to the FASTA sequence file.
This unfunded project was a spinoff of the larger (also unfunded) project to sequence the entire genome of the banana slug. We had one run of Illumina paired-end data with a rather small fragment length, donated by the UCSC sequencing center (it was a test or training run for a new technician or a new machine, I believe). They have donated some other data for the project, but most of it has been too low in coverage to be of any real use.
I have twice taught a class on assembling the banana slug genome, learning the material myself along with the grad students. We have nowhere near enough data (particularly, no data from large DNA fragments) to assemble the whole genome: there are about 2 Gbases in the genome and we’re getting an N50 of about 232 base pairs, which is less than the read length in some technologies!
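For readers who have not met N50 before: it is the contig length such that contigs of that length or longer contain at least half of the assembled bases. Here is a minimal sketch of the usual calculation; the function name and the example lengths are made up for illustration and are not taken from the slug assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together hold at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("need at least one contig length")


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

An N50 of a couple of hundred bases means that half of the assembled sequence sits in contigs no longer than a single short read, which is why the whole-genome assembly is not yet useful.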
The mitochondrion, however, is small (about 20k bases) and overrepresented in the data (probably about 175x coverage), so I thought it would be easy to pull out the mitochondrial reads and assemble them.
It was doable, but nowhere near as easy as I expected. The tiny DNA fragments we had were not long enough to span even a single copy of a repeat in a nasty repeat region in the mitochondrial genome, and I had to write special-purpose software just to close the circle and get all the mitochondrial reads. The closest previously sequenced mitochondria are so dissimilar that using them to select reads was not going to help.
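To give a flavor of what pulling out reads without a close reference can look like, here is a rough sketch of one generic approach: start from a seed sequence and repeatedly recruit any read that shares a k-mer with what has been collected so far. This is not the special-purpose software mentioned above; the function names, the seed, and the default values of k and max_rounds are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def recruit_reads(seed, reads, k=21, max_rounds=50):
    """Return sorted indices of reads reachable from seed via shared k-mers.

    Hypothetical helper: each round adds every unrecruited read that
    shares a k-mer (on either strand) with the growing bait set, then
    adds that read's k-mers to the bait set.
    """
    bait = kmers(seed, k) | kmers(revcomp(seed), k)
    chosen = set()
    for _ in range(max_rounds):
        added = False
        for idx, read in enumerate(reads):
            if idx in chosen:
                continue
            read_kmers = kmers(read, k) | kmers(revcomp(read), k)
            if bait & read_kmers:
                chosen.add(idx)
                bait |= read_kmers
                added = True
        if not added:  # nothing new recruited, so stop early
            break
    return sorted(chosen)


# Toy data, for illustration only.
seed = "ACGGTCATTG"
reads = ["GCTAACTTCC", "TTTTTTTTTT", "CATTGGCTAA"]
print(recruit_reads(seed, reads, k=5))
```

A scheme like this is greedy: a k-mer shared with a nuclear repeat can drag in unrelated reads, and short fragments give it nothing with which to order copies of a repeat, so it illustrates the read-selection step only.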
After more than 40 attempts, I finally got a complete genome, with (I think) all the variants of the repeat sequence, but with no data to use to order the repeats. I’ll be setting this project aside now, unless some wet-lab person volunteers to buy some primers and do some PCR to disambiguate the repeats.
The closest sequenced mitochondrial genomes at NCBI are:
- NC_010220.1 Biomphalaria tenagophila mitochondrion, complete genome
- EF433576.1 (GenBank) Biomphalaria tenagophila strain Taim-RS mitochondrion, complete genome
- NC_005439.1 Biomphalaria glabrata mitochondrion, complete genome
- AY380531.1 (GenBank) Biomphalaria glabrata strain 1742 mitochondrion, complete genome