Gas station without pumps

2019 March 14

Another spit kit sent

Filed under: Uncategorized — gasstationwithoutpumps @ 00:09

I sent in another spit kit 2019 Mar 13—this one to fullgenomes.com.  I had ordered from them shortly before Dante Labs sent me email saying that my data had been ready for months—Dante Labs had just neglected to tell me how to get the data.  Dante Labs also says that they are finally sending me the raw data I requested (not just a VCF file). Getting the raw data is good, as I can run it through the best mapping and variant-calling software pipelines, and check those calls against the ones made by the lab.

When I get the fullgenomes data, I’ll compare it with the Dante Labs data—I expect some differences, as they use different sequencing platforms (Illumina for fullgenomes, BGI for Dante Labs).

I’ve done a minimal comparison of the Dante Labs VCF file with the 23andme data—just looking at the top hits in the Promethease analysis of each.  They seem to be saying the same thing.  When I get some time, I’ll write a little script that goes through the VCF file looking for the genotype at each site in the 23andme data, to see how many discrepancies there are.
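A minimal sketch of what such a comparison script might look like, in Python. It assumes the usual 23andme raw-data layout (tab-separated rsid, chromosome, position, genotype, with `#` comment lines and `--` for no-calls), a single-sample VCF whose FORMAT column starts with GT, and that both files use the same reference build. The function names are made up for illustration, and indels are simply skipped.

```python
def read_23andme(lines):
    """Return {(chrom, pos): genotype}, genotype letters sorted (e.g. 'AG')."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        if "-" in genotype:  # '--' marks a no-call on the array
            continue
        calls[(chrom, int(pos))] = "".join(sorted(genotype))
    return calls


def read_vcf(lines):
    """Return {(chrom, pos): genotype} for single-base calls in a one-sample VCF."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom.startswith("chr"):  # assumption: array file uses '1', not 'chr1'
            chrom = chrom[3:]
        alleles = [ref] + alt.split(",")
        # GT is assumed to be the first colon-separated entry of the sample column
        gt = fields[9].split(":")[0].replace("|", "/").split("/")
        if "." in gt:
            continue  # no-call in the VCF
        called = [alleles[int(i)] for i in gt]
        if any(len(a) != 1 for a in called):
            continue  # skip indels and symbolic alleles
        calls[(chrom, pos)] = "".join(sorted(called))
    return calls


def compare(array_calls, vcf_calls):
    """Count array sites that agree, disagree, or are absent from the VCF."""
    same = different = missing = 0
    for site, genotype in array_calls.items():
        if site not in vcf_calls:
            missing += 1  # could be homozygous reference or uncovered
        elif vcf_calls[site] == genotype:
            same += 1
        else:
            different += 1
    return same, different, missing
```

Both readers accept any iterable of lines, so an open file handle works as well as a list of strings. Sites missing from the VCF are counted separately rather than as discrepancies, because a variant-only VCF cannot say whether such a site matches the reference or was just not covered.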

One problem with VCF files is that they only report the differences from the reference genome—there is no way to distinguish “not covered by enough reads” from “homozygous for the reference allele”.  The gVCF format attempts to fix that, but at the expense of an enormous increase in file size.

I think that there might be a use for a new format that provides terse genotype information (in a format like that used by 23andme) for every location in SNPedia (110,026 currently) or for every one in dbSNP (113,862,023 validated clusters in build 150).  Doing 114M locations in the format used by 23andme would take about 3GB, which is smaller than a gVCF file, but much larger than the variant-only VCF files (about 175MB).  The 23andme format is much less informative than the VCF file, as it just has the genotype call, with no information about how reliable the call is, so I’m not really sure it would be worthwhile to create a 3GB file in such a format.
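The 3GB estimate can be checked with a quick back-of-the-envelope calculation. The sketch below assumes one tab-separated line per site in the 23andme style (rsid, chromosome, position, genotype); the sample line is hypothetical and deliberately on the long side, so the result is closer to an upper bound than an average.

```python
# Hypothetical worst-case-ish line: 9-digit rs number, 9-digit position.
SAMPLE_LINE = "rs123456789\t12\t123456789\tAG\n"
BYTES_PER_LINE = len(SAMPLE_LINE.encode("ascii"))


def estimated_size_bytes(num_sites, bytes_per_line=BYTES_PER_LINE):
    """Uncompressed size of a one-line-per-site genotype file."""
    return num_sites * bytes_per_line


# Site counts are the ones quoted in the post.
for name, sites in [("SNPedia", 110_026), ("dbSNP build 150", 113_862_023)]:
    print(f"{name}: {estimated_size_bytes(sites) / 1e9:.3f} GB")
```

With these assumptions the dbSNP-sized file comes out a little over 3GB and the SNPedia-sized one at about 3MB, which is consistent with the figures above.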

It will be a few weeks before I do anything interesting with the genome data, as I have about 53 hours of grading still to do over the next week and a half.

2019 February 28

Thirty-fourth weight progress report

Filed under: Uncategorized — gasstationwithoutpumps @ 08:04

This post is yet another weight progress report, continuing the previous one, part of a long series since I started in January 2015.

My weight continues to drop now that I’m teaching again and bicycling up to work every day.  I should be back in my self-imposed target range in another week, and in the middle of it by the middle of April.

My weight loss is not quite as fast or steady as on the 2015 diet, but I’m still making progress.  I’m not doing the large, raw-vegetable lunches this time, but just skipping lunch, which is much less trouble.

My exercise for February included about 4.85 miles/day of cycling, and I’m still running 2km two or three times a week. That’s not going to get me to my goal of eventually running a marathon, but I’ve not injured myself again, and it helps keep me in shape.

I did have an occasion last week to be grateful for the running practice—I got to class on Wednesday (2019 Feb 20) and realized I had not picked up the quizzes from the print shop.  There were still eight minutes before class, so I ran from Thimann 1 to the basement of Baskin Engineering, picked up the quizzes and ran back, getting back to the lecture hall with two minutes to spare.  I was a little out of breath and sweaty, but the two minutes before lecture started were enough for my breathing to recover.

2019 February 17

Full-genome sequencing pricing

Filed under: Uncategorized — gasstationwithoutpumps @ 12:23

In the comments on Dante Labs is a scam, there has been some discussion on pricing of whole-genome sequencing.  There are a lot of companies out there with different business models, different pricing schemes, and subtly different offerings—all of which is undoubtedly confusing to consumers.  I’ve been trying to collect pricing information for the past year, and I’m still often confused by the offerings.

Consumers buy sequencing for two main purposes: to find out about their ancestry and to find out about the genetic risks to their health.

For ancestry, there is no real need for sequencing—the information from DNA microarrays (as used by companies like 23andme or ancestry.com) is more than sufficient, and those companies have big proprietary databases that allow more precise ancestry information than the public databases accessible to companies that do full sequencing.  The microarray approach is currently far cheaper than sequencing, though the difference is shrinking.

The major, well-documented risk factors for health are also covered by the DNA microarrays, but there are thousands of risk factors being discovered and published every year, and the DNA microarray tests need to be redesigned and rerun on a regular basis to keep up. If whole-genome sequencing is done, almost all of the data needed for analysis is collected at once, and only the analysis needs to be redone.  (This is not quite true: long-read sequencing is beginning to provide information about structural rearrangements of the genome that are not visible in the older short-read technologies, and some of these structural rearrangements are clinically significant, though usually only in cancer tumors, not in the germ line.)

For most consumers mildly interested in ancestry and genetic risks, the 23andme $200 package is all they need.  If they are just interested in ancestry, there are even cheaper options ($100 from 23andme or ancestry.com—I have no idea which is better).

My interest in my genome is to try to figure out the genetics of my inherited low heart rate.  It is not a common condition, and it seems to be beneficial rather than harmful (at any rate, my ancestors who had it were mostly long-lived), so the microarrays are not looking for variants that might be responsible.  Whole genome sequencing would give me a much larger pool of variants to examine to try to track down the cause.  To get high probability of seeing every variant, I would need 30× sequencing of my whole genome.  If I thought that the problem was in a protein-coding gene, I could get 100× exome sequencing instead.

The problem with whole-genome sequencing is that everybody has about a million variants, almost all of which are irrelevant to any specific health question.  The variants that have already been studied and well documented are not too hard to deal with, but most of them are already in the DNA microarrays, so whole-genome sequencing doesn’t offer much more on them.  Looking for a rare variant that has not been well studied is much harder—which of the millions of base changes matters?

The popular, and expensive, approach in recent genomics literature is to do genome-wide association studies (GWAS).  These take a large population of people with and without the phenotype of interest, then look for variants that reliably separate the groups.  If there are many possible hypotheses (generally in the thousands or millions), a huge population is needed to separate out the real signal from random noise.  Many of the early GWAS papers were later shown to have bogus results, because the researchers did not have a proper appreciation of how easy it was to fool themselves.

Earlier studies focussed on families, where there is a lot of common genetic background, and each additional person in the study cuts the candidate hypothesis pool almost in half.  To narrow down from a million candidate variants to only one would take a little over 20 closely related people (assuming that the phenotype was caused by just a single variant—always a dangerous assumption).  I can probably get 4 or 5 of my relatives to participate in a study like this, but probably not 20.  I don’t think I want to pay for 20 whole-genome sequencing runs out of my own pocket anyway.

I have some hope of working with a smaller number of samples, though, as there has been an open-access paper on inherited bradycardia implicating about 16 genes.  If I have variants in those genes or their promoters, they are likely to be the interesting variants, even if no one has previously seen or studied the variants.  Of course, the size of the region means I’m likely to have about 80 variants in those regions just by chance, so I’ll still need to have some of my relatives’ genomes to narrow down the possibilities, but 8 or 9 relatives may be enough to get a solid conjecture.  (Proving that the variant is responsible would be more difficult—I’d either need a much larger cohort or someone would have to do genetic experiments in animal models.)
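The halving argument behind these relative counts can be sketched as a short loop. This is only an illustration under an idealized assumption: each additional relative keeps a fixed fraction of the remaining candidate variants, and a single variant causes the phenotype. The function name and the 0.55 value used for "almost in half" are my own choices for illustration.

```python
def relatives_needed(candidates, fraction_kept=0.5):
    """Count genomes needed to shrink the candidate pool to at most one variant,
    assuming each genome keeps `fraction_kept` of the remaining candidates."""
    count = 0
    remaining = float(candidates)
    while remaining > 1:
        remaining *= fraction_kept
        count += 1
    return count


# A million candidates across the whole genome versus about 80 in the candidate genes.
for pool in (1_000_000, 80):
    exact = relatives_needed(pool, 0.5)
    almost = relatives_needed(pool, 0.55)
    print(f"{pool} candidates: {exact} relatives at exact halving, {almost} at 0.55")
```

At exact halving a million candidates take 20 relatives and 80 candidates take 7; with the slightly worse 0.55 fraction the counts rise to 24 and 8, which is in line with the "little over 20" and "8 or 9" estimates.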

How expensive is the whole-genome sequencing anyway?  It can be hard to tell, as different labs offer different packages and many require more than the advertised price.

A university research lab like UC Davis will do the DNA library prep and 30× sequencing for about $1000, but not the extraction of the DNA from a spit kit or cheek swabs.  That is a fairly cheap procedure (about $50, I think), but arranging for one lab to do the extraction and ship to another lab increases the complexity of the logistics, to the point where I don’t think I’d ever get around to doing it.  Storing the sequencing results (FASTQ files), mapping the reads to a reference genome to get BAM files, and calling variants to get VCF files all add to the cost, though cloud-based systems are available that make this reasonably cheap (I think about $50 a year for storage and about $50 for the analysis).  Interpreting the VCF files can be aided by using Promethease for $12 to find relevant entries in SNPedia.

Fullgenomes.com offers packages from $545 to $2900, with an extra $250 for analysis.  The most relevant package for what I want would be the 30× sequencing package for $1295, probably without their $250 analysis, which I suspect is not much more than a consumer-friendly rewrite of the results from Promethease (which can be very hard to read, so most consumers would need the rewrite).  Their pricing is a little weird, as the 15× sequencing is less than half the price of 30×, while the underlying technology should make the 30× cheaper per base.  I’ll have to check on exactly what is included in the $1295 package, as that is looking like the best deal I can find right now.

BGI advertises bulk whole-genome sequencing at low prices for researchers, but never responded to my email (from my university account) trying to get actual prices.  A lot of other companies (like Novogene) also have “request a quote” buttons.  My usual reaction to that is that if you have to ask the price, you can’t afford it.  Secret pricing is almost always ridiculously high pricing, and I prefer not to deal with companies that have secret pricing.

Dante Labs advertises very low prices, but does not deliver results—they seem to be a scam.

Veritas Genetics offers a low price ($999), but that does not include giving you back your data—they want to hang onto it and sell you additional “tests” that cost ridiculously large amounts.  I believe they will sell the VCF file (but not the BAM or FASTQ files it is based on) for an additional fee.

Most of the other companies I’ve seen have 30× whole-genome sequencing priced at over $2000, which is a little out of my price range.

 

2019 February 15

Why do I write?

Filed under: Circuits course — gasstationwithoutpumps @ 19:56

On Why Do You Write? Charles French asks

I have a question for all you out there who write, and that includes writers of books, poetry, plays, nonfiction, and blogs. If I left out any kind of writing, you are included also.

Why do you write?

I wrote my textbook Applied Analog Electronics because I was creating a course for which I could find no suitable textbook. I wanted a college-level introduction to electronics that was focused on designing things, not on applied math. I don’t have an objection to math (there is plenty in my textbook), but I wanted it to be there to solve a particular design problem, not just with sterile exercises. The central theme of the book had to be iterative engineering with design, construction, and debugging of interesting circuits, with almost everything else as support for that activity.

All I could find on the market either delayed design until the third or fourth course (which seems to be the standard approach in EE departments) or was very hand-holding—telling students exactly what to wire and leaving no electronics design to the students.

When I started the book writing, I already had a fairly thorough set of lab handouts and felt that the book would be a simple rewrite with a bit of additional material. Boy, was I wrong!

The book has taken over much of my life (when I’m not teaching the course from it or grading student work) for the past few years. I had a “finished” draft at the beginning of January, but students in my class have pointed out about 170 problems with it, and they are only halfway through the book. A lot of the problems were tiny copy-editing things (commas, spaces, spelling errors), but some were substantive. I have about 50 to-do notes accumulated for me to work on this summer.

I think that this year’s students have been motivated to find errors by the token amount I pay for each error found (25¢) and by the “leaderboard” on Piazza, where I keep track of what I owe each student. To encourage more feedback, I try to be generous in allocating the quarters—something doesn’t have to be a real mistake, if I agree that the wording can be improved or something needs to be rewritten for clarity or completeness.  Students can ask questions about something they don’t understand, and if that triggers a specific idea for a change to the book, I give credit for that also.  (Having question-triggered corrections means that even students at the bottom of the class can get credit for book corrections.)

The question of why I write on this blog is a harder one.  Sometimes I am trying to share something I learned, sometimes I’m asking for help finding a solution to a problem, sometimes I’m motivating myself by making something public (like my weight and exercise records), sometimes I’m just thinking out loud (like many of my posts about the design of my course).  I’d like to say that I blog for the social connections, but so few people respond to my posts that I can’t really pretend even to myself that I am having a conversation.

I think that a few of my posts have been valued (at least Google thinks enough of them for people to come to them with searches), so I have some incentive to keep on writing.

2019 February 1

Thirty-third weight progress report

Filed under: Uncategorized — gasstationwithoutpumps @ 20:08

This post is yet another weight progress report, continuing the previous one, part of a long series since I started in January 2015.

Starting teaching again, with the daily commute up the hill and skipped lunches, has helped start bringing my weight back down.

I still have about 10 pounds to lose to get back to the weight I want—where I was a year ago.

My exercise for January included about 4.44 miles/day of cycling, and I’m now running 2km every other day. That’s not going to get me to my goal of eventually running a marathon, but it should keep me from injuring myself again, while maintaining enough muscle and flexibility so that I don’t have to start all over next summer.

My speed is not great (equivalent to an 8–8:30-minute mile), but I’m not exhausted at the end of 2km.
