Gas station without pumps

2019 February 17

Full-genome sequencing pricing

Filed under: Uncategorized — gasstationwithoutpumps @ 12:23

In the comments on Dante Labs is a scam, there has been some discussion on pricing of whole-genome sequencing.  There are a lot of companies out there with different business models, different pricing schemes, and subtly different offerings—all of which is undoubtedly confusing to consumers.  I’ve been trying to collect pricing information for the past year, and I’m still often confused by the offerings.

Consumers buy sequencing for two main purposes: to find out about their ancestry and to find out about the genetic risks to their health.

For ancestry, there is no real need for sequencing: the information from DNA microarrays (as used by companies like 23andme) is more than sufficient, and those companies have big proprietary databases that allow more precise ancestry information than the public databases accessible to companies that do full sequencing.  The microarray approach is currently far cheaper than sequencing, though the difference is shrinking.

The major, well-documented risk factors for health are also covered by the DNA microarrays, but there are thousands of risk factors being discovered and published every year, and the DNA microarray tests need to be redesigned and rerun on a regular basis to keep up. If whole-genome sequencing is done, almost all of the data needed for analysis is collected at once, and only analysis needs to be redone.  (This is not quite true: long-read sequencing is beginning to provide information about structural rearrangements of the genome that are not visible in the older short-read technologies, and some of these structural rearrangements are clinically significant, though usually only in cancer tumors, not in the germ line.)

For most consumers mildly interested in ancestry and genetic risks, the 23andme $200 package is all they need.  If they are just interested in ancestry, there are even cheaper options ($100 from 23andme or one of its competitors; I have no idea which is better).

My interest in my genome is to try to figure out the genetics of my inherited low heart rate.  It is not a common condition, and it seems to be beneficial rather than harmful (at any rate, my ancestors who had it were mostly long-lived), so the microarrays are not looking for variants that might be responsible.  Whole genome sequencing would give me a much larger pool of variants to examine to try to track down the cause.  To get high probability of seeing every variant, I would need 30× sequencing of my whole genome.  If I thought that the problem was in a protein-coding gene, I could get 100× exome sequencing instead.
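One way to see why 30× is the usual target is to treat the number of reads covering a site as a Poisson random variable whose mean is the sequencing depth. This is only a rough sketch: real coverage is less uniform than Poisson, and the 10-read threshold for a confident call is a hypothetical value chosen for illustration, not something stated in the post.

```python
import math


def prob_depth_below(mean_depth: float, min_reads: int) -> float:
    """P(fewer than min_reads reads at a site), assuming Poisson(mean_depth) coverage."""
    return sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )


MIN_READS = 10  # hypothetical threshold for calling a variant confidently

for depth in (15, 30):
    p = prob_depth_below(depth, MIN_READS)
    print(f"{depth}x: P(fewer than {MIN_READS} reads at a site) = {p:.2e}")
```

Under these assumptions, a few percent of sites fall below the threshold at 15×, while at 30× the fraction is several orders of magnitude smaller.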

The problem with whole-genome sequencing is that everybody has about a million variants, almost all of which are irrelevant to any specific health question.  The variants that have already been studied and well documented are not too hard to deal with, but most of them are already in the DNA microarrays, so whole-genome sequencing doesn’t offer much more on them.  Looking for a rare variant that has not been well studied is much harder—which of the millions of base changes matters?

The popular, and expensive, approach in recent genomics literature is to do genome-wide association studies (GWAS).  These take a large population of people with and without the phenotype of interest, then look for variants that reliably separate the groups.  If there are many possible hypotheses (generally in the thousands or millions), a huge population is needed to separate out the real signal from random noise.  Many of the early GWAS papers were later shown to have bogus results, because the researchers did not have a proper appreciation of how easy it was to fool themselves.

Earlier studies focussed on families, where there is a lot of common genetic background, and each additional person in the study cuts the candidate hypothesis pool almost in half.  To narrow down from a million candidate variants to only one would take a little over 20 closely related people (assuming that the phenotype was caused by just a single variant—always a dangerous assumption).  I can probably get 4 or 5 of my relatives to participate in a study like this, but probably not 20.  I don’t think I want to pay for 20 whole-genome sequencing runs out of my own pocket anyway.
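The halving argument can be checked with a few lines of arithmetic. The sketch below assumes each additional relative independently leaves a fixed fraction of the candidate variants in play. The function name and the 0.55 value (standing in for "almost in half") are illustrative assumptions, not figures from the post.

```python
import math


def relatives_needed(candidates: float, surviving_fraction: float = 0.5) -> int:
    """Smallest n such that candidates * surviving_fraction**n <= 1."""
    return math.ceil(math.log(candidates) / -math.log(surviving_fraction))


# Exact halving: a million candidates need about log2(1e6) relatives.
print(relatives_needed(1e6))

# If each relative removes slightly less than half (assumed 0.55 surviving),
# a few more people are needed.
print(relatives_needed(1e6, surviving_fraction=0.55))
```

The count grows only logarithmically with the number of candidates, which is why starting from a much smaller candidate pool cuts the number of relatives needed so sharply.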

I have some hope of working with a smaller number of samples, though, as there has been an open-access paper on inherited bradycardia implicating about 16 genes.  If I have variants in those genes or their promoters, they are likely to be the interesting variants, even if no one has previously seen or studied the variants.  Of course, the size of the region means I’m likely to have about 80 variants in those regions just by chance, so I’ll still need to have some of my relatives’ genomes to narrow down the possibilities, but 8 or 9 relatives may be enough to get a solid conjecture.  (Proving that the variant is responsible would be more difficult—I’d either need a much larger cohort or someone would have to do genetic experiments in animal models.)

How expensive is the whole-genome sequencing anyway?  It can be hard to tell, as different labs offer different packages and many require more than the advertised price.

A university research lab like UC Davis will do the DNA library prep and 30× sequencing for about $1000, but not the extraction of the DNA from a spit kit or cheek swabs.  That is a fairly cheap procedure (about $50, I think), but arranging for one lab to do the extraction and ship to another lab increases the complexity of the logistics, to the point where I don’t think I’d ever get around to doing it.  Storing the sequencing results (FASTQ files), doing the mapping of the reads to a reference genome to get BAM files, and calling variants to get VCF files adds to the cost, though cloud-based systems are available that make this reasonably cheap (I think about $50 a year for storage and about $50 for the analysis).  Interpreting the VCF files can be aided by using Promethease for $12 to find relevant entries in SNPedia.

One company offers packages from $545 to $2900, with an extra $250 for analysis.  The most relevant package for what I want would be the 30× sequencing package for $1295, probably without their $250 analysis, which I suspect is not much more than a consumer-friendly rewrite of the results from Promethease (which can be very hard to read, so most consumers would need the rewrite).  Their pricing is a little weird, as the 15× sequencing is less than half the price of 30×, while the underlying technology should make the 30× cheaper per base.  I’ll have to check on exactly what is included in the $1295 package, as that is looking like the best deal I can find right now.
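For a side-by-side view, the quoted prices can be totalled for the first year. This is only a sketch: several figures are the post's own rough guesses, and it assumes (unverified, as the post notes) that the $1295 package covers extraction and data delivery, so that only the Promethease report is added to it.

```python
# First-year costs in US dollars, using the figures quoted above.
university_route = {
    "library prep + 30x sequencing": 1000,
    "DNA extraction at a second lab": 50,   # rough guess in the post
    "cloud storage, first year": 50,        # rough guess in the post
    "mapping and variant calling": 50,      # rough guess in the post
    "Promethease report": 12,
}

# Assumption: the package needs nothing extra beyond Promethease.
package_route = {
    "30x sequencing package": 1295,
    "Promethease report": 12,
}

university_total = sum(university_route.values())
package_total = sum(package_route.values())

print(f"University lab route: ${university_total}")
print(f"Commercial package:   ${package_total}")
print(f"Difference:           ${package_total - university_total}")
```

On these numbers the gap is small enough that the simpler logistics of a single vendor could easily outweigh it.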

BGI advertises bulk whole-genome sequencing at low prices for researchers, but never responded to my email (from my university account) trying to get actual prices.  A lot of other companies (like Novogene) also have “request a quote” buttons.  My usual reaction to that is that if you have to ask the price, you can’t afford it.  Secret pricing is almost always ridiculously high pricing, and I prefer not to deal with companies that have secret pricing.

Dante Labs advertises very low prices, but does not deliver results—they seem to be a scam.

Veritas Genetics offers a low price ($999), but that does not include giving you back your data—they want to hang onto it and sell you additional “tests” that cost ridiculously large amounts.  I believe they will sell the VCF file (but not the BAM or FASTQ files it is based on) for an additional fee.

Most of the other companies I’ve seen have 30× whole-genome sequencing priced at over $2000, which is a little out of my price range.


2014 September 30

Ebola genome browser

Filed under: Uncategorized — gasstationwithoutpumps @ 21:00

For the past week, I’ve been watching the genome browser team (led by Jim Kent) scramble to get together an information resource to aid in the fight against the Ebola virus.  They went public today:

We are excited to announce the release of a Genome Browser and information portal for the Jun. 2014 assembly of the Ebola virus (UCSC version eboVir3, GenBank accession KM034562) submitted by the Broad Institute. We have worked closely with the Pardis Sabeti lab at the Broad Institute and other Ebola experts throughout the world to incorporate annotations that will be useful to those studying Ebola. Annotation tracks included in this initial release include genes from NCBI, B- and T-cell epitopes from the IEDB, structural annotations from UniProt and a wealth of SNP data from the 2014 publication by the Sabeti lab. This initial release also contains a 160-way alignment comprising 158 Ebola virus sequences from various African outbreaks and 2 Marburg virus sequences. You can find links to the Ebola virus Genome Browser and more information on the Ebola virus itself on our Ebola Portal page.

Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP server or the Downloads page. The Ebola virus (eboVir3) browser annotation tracks were generated by UCSC and collaborators worldwide. See the Credits page for a detailed list of the organizations and individuals who contributed to this release and the conditions for use of these data.

Matthew Speir
UCSC Genome Bioinformatics Group

2012 June 21

Crowdfunding genome project

Filed under: Uncategorized — gasstationwithoutpumps @ 20:37

Manuel Corpas is trying to get the genomes of 5 members of his family sequenced, so that he can release the data for public analysis and development of genome analysis tools.

[Crowdfunding Genome Project] Day 2: BGI Officially Agrees Sequencing « Manuel Corpas’ Blog.

Donations Sought For Whole Genome Sequencing: 40 Days To Go!

He previously released the genotyping of the same 5 members of his family, so you know that he is serious about doing a public release of the data.

2011 June 20

Human mutation rates

Filed under: Uncategorized — gasstationwithoutpumps @ 10:02

I just finished reading an article on human mutation rates:

Variation in genome-wide mutation rates within and between human families by Donald F Conrad, Jonathan E M Keebler, Mark A DePristo, Sarah J Lindsay, Yujun Zhang, Ferran Casals, Youssef Idaghdour, Chris L Hartl, Carlos Torroja, Kiran V Garimella, Martine Zilversmit, Reed Cartwright, Guy A Rouleau, Mark Daly, Eric A Stone, Matthew E Hurles, & Philip Awadalla for the 1000 Genomes Project
Nature Genetics
(2011) Published online 12 June 2011

The article computes mutation rates for two triples (father, mother, and child) who have been thoroughly re-sequenced as part of the 1000 Genomes Project.  For each triple, they identify possible sites of de novo mutations (appearing in the child but not inherited from either parent) using different methods, then re-examine each of the possible candidates with further sequencing, to try to separate out germ-line (inheritable) mutations from somatic (in the body) or cell-culture mutations.

They found that few of the observed de novo mutations in the sequencing were actually germ-line mutations (only about one in 20).  The final mutation rates they got were about 1e-8 (one change in 10^8 bases).  This rate is comparable with sex-averaged rates from other more population-based estimates, but at the low end.  They point out that mutation rates may vary between individuals (based on age and environmental conditions), and that a few high-mutation-rate individuals may make the mean rate over many generations higher than the most frequently observed rate at the current time, so both the 1e-8 rate and the highest estimates (4e-8 for paternal mutations estimated from species-divergence from chimps) may still be consistent.  Other possible explanations for the wide spread are given, for example, that the divergence from chimp may be further back in time than the current best estimates.

If we take the 1e-8 mutation rate as typical, we would expect to see about 60 de novo mutations in each individual (remember that the 3Gbase human genome size is the haploid size, but humans are diploid, so we inherit about 6Gbases from our parents).  The variation from person to person could be quite wide though, even if there were no environmental factors affecting the mutation rate: a Poisson process has a standard deviation of the square root of the mean, so mean 60 implies a standard deviation of about 8.
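The arithmetic behind those two numbers is short enough to spell out. The sketch below uses only the rate and genome size given above. The two-standard-deviation range at the end is an added illustration that leans on a normal approximation to the Poisson distribution, which is an assumption rather than something from the article.

```python
import math

MUTATION_RATE = 1e-8        # mutations per base per generation
DIPLOID_GENOME_BASES = 6e9  # two copies of the ~3 Gbase haploid genome

# Poisson model: the mean equals rate times number of bases,
# and the standard deviation is the square root of the mean.
expected = MUTATION_RATE * DIPLOID_GENOME_BASES
std_dev = math.sqrt(expected)

low = expected - 2 * std_dev
high = expected + 2 * std_dev

print(f"expected de novo mutations: {expected:.0f}")
print(f"standard deviation:         {std_dev:.1f}")
print(f"rough 2-sigma range:        {low:.0f} to {high:.0f}")
```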

One surprising result they got was that for one of the triples, the paternal mutation rate was lower than the maternal one (most estimates have the paternal mutation rate around 4 times the maternal rate, attributed to higher numbers of replications of DNA in the male germ line).  The ages of the parents at conception were not recorded for either triple, but age almost certainly plays a major role in mutation rate. The 4 estimates of mutation rate they got (2 maternal and 2 paternal) had about an 8-to-1 range (much wider than the error bars on the individual estimates), so clearly many more triples need to be examined to get a broader picture of maternal and paternal mutation rates in the population as a whole.  It would be good to have triples in which the ages of the parents are recorded, and to have further generations sequenced to make germline/non-germline mutations easier to separate.

Estimated mutation rates, with previously published estimates above the green line, and new ones below it. Figure copied from the article.

2011 February 7

World’s largest genome?

Filed under: Uncategorized — gasstationwithoutpumps @ 11:20

The Scientist has just published a news article, Jaume and the Giant Genome, which announces the existence of a 150 gigabase genome.  Not surprisingly, this huge genome is in a plant, “the Japanese canopy plant (Paris japonica), native to the mountains surrounding Nagano.” The genome size was measured at Kew Gardens, which has a huge repository of diverse plants (perhaps 12% of the known plant species).

I doubt that anyone will be rushing to sequence such a huge genome (50 times larger than the human genome).  I wonder how long it will be before someone discovers a still longer plant genome.
