Gas station without pumps

2018 February 24

Direct-to-consumer genome sequencing

Filed under: Uncategorized — gasstationwithoutpumps @ 10:31

I’ve been thinking of getting my whole genome sequenced, along with my father’s and perhaps my siblings’ and my son’s (assuming I get their consent).  I’m primarily interested in seeing if I can determine what causes my Dad and me to have low resting heart rates (mine is around 50bpm).

The condition is known as bradycardia, which just means slow heart rate, and is applied to any resting heart rate under 60bpm.  It is not a dangerous condition, unless it gets extreme, in which case it can lead to passing out or falling asleep at inappropriate times. Most bradycardia is due to aging (collagen buildup in the heart or scarring from heart attacks), some is due to extreme exercise, but for a small fraction of cases it is inherited.  Since it affects me, my Dad, and some other relatives, despite very different diets and exercise levels, and I’ve had it since becoming an adult, it is almost certainly genetic, not environmental.

Treatment, if needed, is to use a pacemaker to maintain a minimum heart rate. I expect that I will need to get a pacemaker sometime in the next 10–20 years, but with that support, I don’t expect any shortening of my lifespan.

My dad had to get a pacemaker installed in his 70s or 80s, but he is still going strong with it at 92.  There are debates about whether to set his minimum level at what his resting heart rate was for decades (around 50, as mine is) or around 70 (more typical for adults)—there may be some tradeoff between more alertness at the higher heart rate and better sleep at the lower heart rate.  Pacemakers have already gotten fairly sophisticated and try to distinguish between sleeping, resting, and active states.  I suspect that they will continue to get better, borrowing techniques from exercise-tracking wearables.

A 2016 survey article, “Inherited bradyarrhythmia: A diverse genetic background” by Taisuke Ishikawa, Yukiomi Tsuji, and Naomasa Makita (Journal of Arrhythmia 32 (2016) 352–358), lists sixteen genes that have been associated with bradycardia, with most of the variants being autosomal dominant loss-of-function mutations.  Many of the genes code for ion channels or are involved in calcium handling.

I’d like to check my own genome to see whether I have variants in or near any of these genes known to be involved in heart-rate regulation.  There are several different sorts of genetic tests available:

  • Clinical tests for specific genetic variants (like the BRCA breast-cancer genes or Tay-Sachs). These are generally old technology and provide small amounts of information.  They are often quite expensive.
  • DNA microarray panels. These look for a large set of known genetic variants, generally ones that are either fairly common or have been associated with genetic diseases.  This is what 23andme offers, as it is the cheapest technology.  Because inherited bradycardia is not common, and because it can have many different causes, there are no individual markers common enough to appear in standard SNP panels like the tests used by 23andme.
  • Transcriptome sequencing.  These sequence only the messenger RNA currently being produced for translation to proteins.  This method tells a lot about the state of the cells, but misses regulatory regions (which don’t code for the proteins) and any protein not currently being made.  The results are very different depending on what tissue is being sampled.  It is commonly done in research (including cancer/normal cell comparisons), but I don’t know any direct-to-consumer companies offering it, as the results are difficult to interpret outside the scope of specific experiments.
  • Exome sequencing.  This is targeted DNA sequencing that tries to sequence all the protein-coding portions of the genome.  It is probably the most common approach for direct-to-consumer sequencing, and is offered by several companies (including Helix, Novogene, …).  Some regulatory regions may be included in the sequencing, but most of the data is for protein-coding regions.  Helix has brought the cost of exome sequencing well below $1000, but their business model makes the sequencing cheap and sells analysis apps at very high prices.  It is possible to buy the variant call data (though not the raw sequence data) from them and run standard analysis pipelines on Amazon Cloud, but you need to be a bioinformatician to figure out how to run the analyses, and interpretation is still a problem. The best price on whole-exome sequencing that includes getting all the data is probably from Dante Labs: $495.
  • Whole genome sequencing.  This is currently the most expensive approach, and it tries to cover most of the genome (highly repetitive regions like the centromeres and telomeres produce data, but the reads can’t be mapped to a reference genome, because of the repetitions). It is also the only approach that can uncover novel variants in regulatory regions.  So far, the best price I’ve seen on whole genome sequencing is from Dante Labs: $695.

Because I don’t know whether the variants are in the genes or in regulatory regions nearby, I’m considering getting whole-genome sequencing.  The Dante Labs website provides the most technical data of any of the direct-to-consumer sites I’ve seen:  they do 30X sequencing and return the raw data (FASTQ format), alignment to a reference genome (BAM format), and variant calls (gVCF format).  They don’t document what pipelines they use for mapping and variant calling (information needed for publication these days). They also don’t provide much interpretation of the variants, so far as I can tell from their website, just running through SNPeff, which is a reasonable first cut.  They do include all the data in their price (many sites charge extra for you to get the data), and point to third-party websites for interpretation.
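As a rough sketch of what working with the variant calls could look like, the Python below pulls the non-reference calls out of VCF-style text. It assumes a single-sample file with the standard fixed columns followed by a FORMAT column and one sample column. The function name and the example records are made up for illustration, and for real gVCF files an established library would be a better choice than hand-rolled parsing.

```python
def called_variants(lines):
    """Yield (chrom, pos, ref, alt) for each non-reference allele in the sample's genotype.

    Assumes tab-separated VCF text with a single sample column (an assumption,
    not a description of any vendor's actual output).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # Pair FORMAT keys (e.g. GT:DP) with the sample's values.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        genotype = sample.get("GT", ".")
        alleles = genotype.replace("|", "/").split("/")
        alts = alt.split(",")
        for allele_index in sorted(set(alleles)):
            if allele_index in ("0", "."):
                continue  # reference allele or missing call
            allele = alts[int(allele_index) - 1]
            if allele == "<NON_REF>":
                continue  # gVCF placeholder, not a real variant
            yield (chrom, int(pos), ref, allele)


# Hypothetical records, not real data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t1000\t.\tA\t<NON_REF>\t.\t.\tEND=1999\tGT:DP\t0/0:30",
    "chr1\t2000\t.\tG\tT,<NON_REF>\t50\tPASS\t.\tGT:DP\t0/1:28",
    "chr2\t500\t.\tC\tCA,<NON_REF>\t60\tPASS\t.\tGT:DP\t1/1:31",
]
variants = list(called_variants(example))
```

The reference block on the first record is skipped because its genotype is 0/0, which is why a gVCF can be much larger than the list of actual variants it contains.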

With the gVCF file, I could do standard searches against variant databases such as dbSNP and OMIM (though I believe that SNPeff already does that), to get information about known variants, and I can also use a genome browser to look for variants that are near the genes known to be involved in bradycardia.  If I get the data for n very closely related genomes (me, my Dad, my siblings, my son, …), rather than just mine, I should be able to reduce the number of variants that are candidates by a factor of 2^n (from an expected 3 million to about 190,000 for 4 genomes).  Proximity to the known cardiac pacemaker genes should reduce the candidates to around 240 with a single genome and around 25 with 4 or more genomes, even if the mutation is idiosyncratic to our family and not one of the already known variants related to bradycardia.
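The arithmetic in the paragraph above can be checked with a few lines of Python, along with a sketch of the two filters it describes: keep the variants shared by every sequenced relative who has the slow heart rate, then keep those that fall near a gene of interest. The halving per genome is the idealized estimate, not a proper genetic model, and the gene interval, window size, and variant tuples are placeholders rather than real coordinates.

```python
def expected_candidates(total_variants, n_genomes):
    """Idealized estimate: each additional related genome halves the candidates."""
    return total_variants / 2 ** n_genomes


def shared_variants(genomes):
    """Variants present in every genome; each genome is a set of (chrom, pos, ref, alt)."""
    return set.intersection(*genomes)


def near_genes(variants, gene_intervals, window=100_000):
    """Keep variants within `window` bases of any (chrom, start, end) interval.

    The default window is an arbitrary illustrative choice.
    """
    kept = []
    for chrom, pos, ref, alt in sorted(variants):
        for gene_chrom, start, end in gene_intervals:
            if chrom == gene_chrom and start - window <= pos <= end + window:
                kept.append((chrom, pos, ref, alt))
                break
    return kept


estimate = expected_candidates(3_000_000, 4)  # 187,500, i.e. "about 190,000"

# Hypothetical variant sets for three affected relatives.
me = {("chr1", 1500, "A", "G"), ("chr1", 900_000, "C", "T"), ("chr2", 300, "G", "A")}
dad = {("chr1", 1500, "A", "G"), ("chr1", 900_000, "C", "T")}
son = {("chr1", 1500, "A", "G"), ("chr1", 900_000, "C", "T"), ("chr3", 10, "T", "C")}

genes = [("chr1", 50_000, 60_000)]  # placeholder interval, not a real gene
shared = shared_variants([me, dad, son])
candidates = near_genes(shared, genes)
```

In this toy example two variants are shared by all three genomes, but only the one at position 1500 lies within the window around the placeholder gene.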

Note: more than 4 genomes will reduce the overall pool of candidate variants, but not the number near known genes, because variants near each other on the genome will be genetically linked: either both variants will be inherited or neither will be.  With 4 or 5 genomes, I can probably narrow down the candidates to just one of the 16 known cardiac pacemaker genes, but not to a specific variation near that gene, unless I get lucky and there is either an already known variant or a mutation in the coding region that would obviously disrupt function.  Of course, if I’m that lucky, I might be able to guess the relevant variant from my genome alone.

I think that my process will probably be to get my own genome sequenced and see what I can do with the data, then ask for my Dad’s genome.  After that, I can ask my siblings and my son (and perhaps my nephews and nieces) for more data, to see whether we can pin down the variant.  I find it interesting that this sort of analysis, which used to require million-dollar grants, is now accessible to citizen scientists at a price less than many spend on their hobbies.  The software has to be made more user-friendly and more easily accessible, but I think that is coming.
