Gas station without pumps

2019 March 31

Thirty-fifth weight progress report

Filed under: Uncategorized — gasstationwithoutpumps @ 14:21

This post is yet another weight progress report, continuing the previous one, part of a long series since I started in January 2015.

I have found it harder than I expected to keep my weight within my target range.

My weight continues to drop (slowly), and I’m back to the weight I was a year ago. I hope to be able to pull the weight down a little further this spring, and I really want to avoid the big summer and fall weight gains I had last year.

My exercise for March included about 3.67 miles/day of cycling, which is a drop from February—probably attributable to exam week and Spring break, during which I was not commuting to work every day, but grading from home.

I’m now doing 2.5km runs two or three times a week. That’s not going to get me to my goal of eventually running a marathon, but I’ve not injured myself again, and it helps keep me in shape so that I can try upping the distance over the summer. I’m going to try again for a 15km run in August or September. There seems to be about one pay-to-enter run a month in Santa Cruz (starting around March), so I shouldn’t have any trouble finding something to enter.


I’m maintaining about an 8:20 mile pace (3.2 m/s, 11.6 km/hr) at the current distance.
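The unit conversion behind those figures can be checked with a few lines of Python. This is only an illustrative sketch; the function name `pace_to_speed` is made up for this example.

```python
METERS_PER_MILE = 1609.344  # international mile

def pace_to_speed(minutes, seconds):
    """Convert a per-mile pace into (meters/second, kilometers/hour)."""
    seconds_per_mile = 60 * minutes + seconds
    meters_per_second = METERS_PER_MILE / seconds_per_mile
    return meters_per_second, meters_per_second * 3.6

mps, kmh = pace_to_speed(8, 20)
print(f"{mps:.1f} m/s, {kmh:.1f} km/hr")
```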


2019 March 27

Comparing 23andme and Dante Labs data

Filed under: Uncategorized — gasstationwithoutpumps @ 06:22

I got the grading for the last lab of winter quarter done yesterday (it took me several days longer than I expected, even allowing for an hour per paper; they took me more like 2 hours each). I have to turn in grades today, and I just found out last night that the graders had not finished grading homework 11, so I need to grade that also.

Before I found out that I have unexpectedly even more grading to do, I had taken an hour to write a short Python program to compare my data from 23andme with my data from Dante Labs. The two seem very concordant, and so I now believe I have gotten good data from Dante Labs:

23_and_me                              | vcf from Dante Labs
638531 genotype_sites                  | 3499617 genotype_sites
  chr   no-call haploid diploid matches|   chr   no-call haploid diploid mismatches
  chrM    306    3995       0      0   |   chrM      0      21       0      2   
  chr1   1177       0   48337  11756   |   chr1      0       0  267787     65   
  chr2   1121       0   50654  12303   |   chr2      0       0  287274     59   
  chr3   1013       0   42011  10184   |   chr3      0       0  240607     43   
  chr4   1006       0   38468   9719   |   chr4      0       0  261070     56   
  chr5    885       0   36147   8733   |   chr5      0       0  212325     41   
  chr6    956       0   43067   8505   |   chr6      0       0  204102     53   
  chr7    857       0   33500   8348   |   chr7      0       0  203647     44   
  chr8    626       0   31057   7690   |   chr8      0       0  183159     21   
  chr9    700       0   25746   6431   |   chr9      0       0  148637     65   
  chr10   656       0   29869   7494   |   chr10     0       0  177729     35   
  chr11   605       0   30337   7523   |   chr11     0       0  174926     28   
  chr12   677       0   28755   7366   |   chr12     0       0  165987     26   
  chr13   392       0   21688   5571   |   chr13     0       0  133928     30   
  chr14   472       0   19489   4871   |   chr14     0       0  111553     23   
  chr15   452       0   18554   4757   |   chr15     0       0  103635     16   
  chr16   504       0   19893   5019   |   chr16     0       0  107064     20   
  chr17   510       0   18891   4370   |   chr17     0       0   89744     24   
  chr18   307       0   17368   4591   |   chr18     0       0  100640     12   
  chr19   551       0   14366   3554   |   chr19     0       0   75073     29   
  chr20   295       0   14486   3603   |   chr20     0       0   73566     19   
  chr21   227       0    8380   2261   |   chr21     0       0   52060     18   
  chr22   244       0    8671   2073   |   chr22     0       0   45153     12   
  chrX   1033   14970     527   3663   |   chrX      0   74099    1801     36   
  chrY    506    3226       1    161   |   chrY      0    1393    2637      2   

  total 16078   22191  600262 150546   |   total     0   75513 3424104    779                     

Count of types of genotype
   CT   41086                          |    CT  696904
   AG   40444                          |    AG  696669
   CC  142899                          |    CC  358678
   GG  142411                          |    GG  358891
   TT  104122                          |    TT  320081
   AA  104648                          |    AA  319335
   GT    9760                          |    GT  173342
   AC    9810                          |    AC  172678
   CG     321                          |    CG  178718
   AT     215                          |    AT  148808
   C     5797                          |    C    18824
   G     5495                          |    G    19064
   T     5343                          |    T    18873
   A     5272                          |    A    18752
   --   16078                          |    --       0
   II    3245                          |    II       0
   DD    1259                          |    DD       0
   I      195                          |    I        0
   D       89                          |    D        0
   DI      42                          |    DI       0

There are only 779 sites where both 23andme and DanteLabs call a variant and disagree about what it is—a 0.5% disagreement, which is lower than I would have expected given the differences in the technology and the error rates of DNA chips. I think that 23andme is being fairly conservative and not calling many of the low-quality hybridization reads.

The biggest difference seems to be that Dante Labs does not cover the mitochondrion—the very small number of variant calls there could be mismapping of reads from homologous regions of the nuclear genome. Of course, 23andme does extremely thorough coverage of the mitochondrion, in order to get as much maternal haplotype data as feasible. If you are looking for maternal ancestry information or mitochondrial variants related to disease, the Dante Labs whole-genome sequencing is not the way to go.

The 23andme data also has a lot of coverage of the Y chromosome, in an attempt to get as much paternal haplotype information as possible, but the VCF file has few calls on the Y chromosome, and many of them are diploid calls, probably from homology to the X chromosome (the 23andme sites appear to be carefully chosen to avoid the homologous regions of the X and Y chromosomes, which may or may not be reasonable, depending on what is going on in those regions). Again, if you are mainly interested in ancestry information, the Dante Labs whole-genome sequencing is probably not the way to go.

The Dante Labs vcf file does not include deletion and insertion genotypes (the I and D codes in the 23andme data), but I think that the full data Dante Labs sent me on disk may have that information in a different VCF file. It may be a while before I have time to examine that more detailed data.

There are about 5.5 times as many SNPs in the VCF file as in the 23andme file, but only about a quarter of the 23andme sites are matched by the Dante Labs variants—the rest may be places where I am homozygous for the reference allele, which the VCF file does not report, or they may be places where Dante Labs had insufficient coverage to do a variant call. It will take a lot more work for me to analyze the Dante Labs data to figure out which is correct. The 23andme genotype data has a lot more homozygous calls than heterozygous ones, so I suspect that the bulk of the differences will be just that I am homozygous for the reference allele.
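The ratios quoted in this post can be reproduced from the totals in the table. The sketch below assumes that the disagreement rate is taken over the sites where both files have a call (matches plus mismatches); the variable names are just for this example.

```python
# Totals copied from the comparison table in this post.
sites_23andme = 638531
sites_vcf = 3499617
matches = 150546
mismatches = 779

# Assumption: disagreement is measured over sites called in both files.
disagreement = mismatches / (matches + mismatches)
size_ratio = sites_vcf / sites_23andme
matched_fraction = matches / sites_23andme

print(f"disagreement among shared sites: {disagreement:.2%}")
print(f"VCF sites per 23andme site:      {size_ratio:.1f}")
print(f"23andme sites matched in VCF:    {matched_fraction:.1%}")
```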

The most common SNP variants in the Dante Labs VCF file are CT (or the equivalent on the other strand AG), which is to be expected, as C⇒T conversion is common in DNA, because of C⇒U deamination and subsequent treatment of U as T in replication.

The Dante Labs data shows a much higher proportion of CG and AT variants than the 23andme data, and I don’t know how to interpret that. Perhaps when I get the fullgenomes data, which uses a different sequencing technology, I’ll be able to compare VCF files and see if there is a technology effect here.

I clearly have a lot more work to do to interpret the data, but this preliminary look convinces me that I have good data from Dante Labs.

I retract, with apologies to them, my former claim that Dante Labs is a scam. It appears that they just had very bad delivery times and poor customer service. If they are now delivering data, they may actually be a good deal, as their prices are much lower than other whole-genome sequencing services. (Of course, it is still possible that they are only delivering data to a fraction of their customers, but I have no information about that; I know only that the data they eventually sent me seems to be good.)

2019 March 17

Sabbaticals until retirement revisited

Filed under: Uncategorized — gasstationwithoutpumps @ 10:58

In Sabbaticals until retirement, three years ago, I outlined a sabbatical plan for using up my sabbatical credits slowly:

year     Fall  Winter  Spring  credits left
2015–16   +1     +1      +1        20
2016–17   –6     +1      +1        16
2017–18   –6     +1      +1        12
2018–19   –6     +1      +1         8
2019–20   –6     +1      +1         4
2020–21   +1     –5      +1         1

I followed that plan through this year, but I won’t be able to continue with it, due to a misunderstanding on my part of the rules for sabbatical leave.  I can turn in n credits for n/9 salary, but only for n≥6, so the n=5 plan for 2020–21 cannot be made to work.  I found this out when I tried this year to modify the plan to

year     Fall  Winter  Spring  credits left
2019–20   –5     +1      +1         5
2020–21   –5     +1      +1         2

Because I can’t take 5/9 salary, I am going to switch to taking a leave without pay this fall, and then full-salary sabbatical in 2020:

year     Fall  Winter  Spring  credits left
2019–20   –0     +1      +1        10
2020–21   –9     +1      +1         3
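The bookkeeping in these tables is simple enough to check mechanically. Here is a small sketch that replays a plan from the 8 credits left at the end of 2018–19 and tests the "n credits for n/9 salary, but only for n≥6" rule. The function names are invented for this example, and the upper bound of 9 credits per quarter is an assumption (nothing above full salary), not something the rules quoted here state.

```python
def credits_left(start, plan):
    """Replay per-quarter credit changes; return the balance after each year."""
    balance = start
    history = []
    for year, changes in plan:
        balance += sum(changes)
        history.append((year, balance))
    return history

def sabbatical_allowed(n):
    """n credits buy n/9 salary, but only for n >= 6.
    The cap at 9 (full salary) is an assumption for this sketch."""
    return 6 <= n <= 9

# New plan: leave without pay in Fall 2019 (no credit earned or spent),
# then a full-salary sabbatical in Fall 2020.
new_plan = [
    ("2019-20", (0, +1, +1)),
    ("2020-21", (-9, +1, +1)),
]

for year, balance in credits_left(8, new_plan):
    print(year, balance)

print("5/9 salary allowed?", sabbatical_allowed(5))
print("9/9 salary allowed?", sabbatical_allowed(9))
```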

I’ve decided that I need the break from grading more than I need the money—if I taught all three quarters next year with the number of hours per week I’ve been putting in this year, I’d burn out and retire a year earlier, which would cost me more.

The new plan will cost me about $5000 in extra insurance premiums (the University pays a share for medical, dental, and vision care insurance for sabbatical leave, but not leave without pay) in addition to losing a sabbatical-leave credit (worth about $5000 before taxes, or $3500 after taxes). Doing the leave without pay this fall allows me to take full salary for Fall 2020 sabbatical, using one more sabbatical-leave credit than if I took 8/9 pay this Fall.  If I had known about the 6/9 minimum earlier, I would have revised the plan for Fall 2018 to take 7/9 pay, rather than 6/9.

I can’t contribute to my HSA (Health Savings Account) while on leave without pay, so I need to change my contributions for the months that I will not be on leave.  The insurance premiums for the health care do count as allowable expenses for the HSA.

The Sabbaticals until retirement post also discussed the possibility of doing a “service buy-back” to buy service credit on my retirement for the foregone salary.  At the time it looked like a good investment, but the paperwork involved was daunting (I thought I had done it all and sent it in, but all that triggered was them sending me the paperwork to do all over again).  I’ll have to decide again on the service buyback this spring or early summer, since there is a 3-year limit on doing the buyback at a reasonable rate—after that they charge so much that it is clearly not a good investment.   The buyback I could do this year would get me 1/3 year extra service credit, which would increase my retirement salary by 0.83% of my HAPC (highest average plan compensation—essentially my annual salary at full time). I can use annuity calculators to figure out about how much that is worth and compare it to what the University would charge me.

2019 March 14

Another spit kit sent

Filed under: Uncategorized — gasstationwithoutpumps @ 00:09

I sent in another spit kit 2019 Mar 13, this one to fullgenomes. I had ordered from them shortly before Dante Labs sent me email saying that my data had been ready for months; Dante Labs had just neglected to tell me how to get the data. Dante Labs also says that they are finally sending me the raw data I requested (not just a VCF file). Getting the raw data is good, as I can run it through the best mapping and variant-calling software pipelines, and check those calls against the ones made by the lab.

When I get the fullgenomes data, I’ll compare it with the Dante Labs data—I expect some differences, as they use different sequencing platforms (Illumina for fullgenomes, BGI for Dante Labs).

I’ve done a minimal comparison of the Dante Labs VCF file with the 23andme data—just looking at the top hits in the Promethease analysis of each.  They seem to be saying the same thing.  When I get some time, I’ll write a little script that goes through the VCF file looking for the genotype at each site in the 23andme data, to see how many discrepancies there are.
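A minimal sketch of what such a script could look like follows. It is an illustration only, and it rests on several assumptions: the 23andme file is tab-separated with rsid, chromosome, position, and genotype columns; the VCF is a single-sample file with a GT field; both files use the same reference build and coordinates; and chromosome names differ only by a "chr" prefix and MT versus M. The function names and the demo lines (including the rs numbers) are made up.

```python
def norm_chrom(name):
    """Map '1', 'chr1', 'MT', 'chrM' onto a single 'chr...' naming style."""
    if name.startswith("chr"):
        name = name[3:]
    if name == "MT":
        name = "M"
    return "chr" + name

def read_23andme(lines):
    """Return {(chrom, pos): genotype} from 23andme-style lines."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        calls[(norm_chrom(chrom), int(pos))] = "".join(sorted(genotype))
    return calls

def read_vcf(lines):
    """Return {(chrom, pos): genotype} for SNP calls in a single-sample VCF."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        gt_index = fields[8].split(":").index("GT")
        gt = fields[9].split(":")[gt_index].replace("|", "/").split("/")
        # Skip no-calls and anything that is not a single-base allele.
        if "." in gt or any(len(alleles[int(i)]) != 1 for i in gt):
            continue
        genotype = "".join(sorted(alleles[int(i)] for i in gt))
        calls[(norm_chrom(chrom), int(pos))] = genotype
    return calls

def compare(calls_23, calls_vcf):
    """Count matches and mismatches at sites called as bases in both files."""
    matches = mismatches = 0
    for site, genotype in calls_23.items():
        if site not in calls_vcf or not set(genotype) <= set("ACGT"):
            continue  # not in VCF, or a no-call/indel code in 23andme
        if genotype == calls_vcf[site]:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches

# Made-up demo data, not real genotypes.
demo_23 = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs1\t1\t100\tAG\n",
    "rs2\t1\t200\tCC\n",
    "rs3\tMT\t300\tT\n",
    "rs4\t2\t400\t--\n",
]
demo_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\n",
    "chr2\t400\t.\tG\tA\t50\tPASS\t.\tGT\t0/1\n",
]
print(compare(read_23andme(demo_23), read_vcf(demo_vcf)))
```

Note that this sketch compares genotype strings exactly, so a haploid call in one file and a homozygous diploid call in the other would be counted as a mismatch; a real comparison would need to decide how to treat those.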

One problem with VCF files is that they only report the differences from the reference genome—there is no way to distinguish “not covered by enough reads” from “homozygous for the reference allele”.  The gVCF format attempts to fix that, but at the expense of an enormous increase in file size.

I think that there might be a use for a new format that provides terse genotype information (in a format like that used by 23andme) for every location in SNPedia (110,026 currently) or for every one in dbSNP (113,862,023 validated clusters in build 150).  Doing 114M locations in the format used by 23andme would take about 3GB, which is smaller than a gVCF file, but much larger than the variant-only VCF files (about 175MB).  The 23andme format is much less informative than the VCF file, as it just has the genotype call, with no information about how reliable the call is, so I’m not really sure it would be worthwhile to create a 3GB file in such a format.
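The 3GB figure is a back-of-the-envelope estimate, and it can be reproduced roughly as follows. The sample line is made up; it assumes the four-column, tab-separated layout (rsid, chromosome, position, genotype) with fairly long rsids and positions, so the real average line length could be somewhat smaller.

```python
DBSNP_SITES = 113_862_023   # validated clusters in dbSNP build 150 (from the post)
SNPEDIA_SITES = 110_026     # SNPedia locations (from the post)

# Hypothetical line in a 23andme-like layout: rsid, chromosome, position, genotype.
sample_line = "rs123456789\t12\t123456789\tAG\n"
bytes_per_line = len(sample_line.encode("ascii"))

def size_gb(sites, bytes_each=bytes_per_line):
    """Uncompressed file size in gigabytes for the given number of sites."""
    return sites * bytes_each / 1e9

print(f"{bytes_per_line} bytes per line")
print(f"dbSNP:   {size_gb(DBSNP_SITES):.1f} GB")
print(f"SNPedia: {size_gb(SNPEDIA_SITES) * 1000:.1f} MB")
```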

It will be a few weeks before I do anything interesting with the genome data, as I have about 53 hours of grading still to do over the next week and a half.
