Gas station without pumps

2020 July 16

Updated plot for COVID-19

Filed under: Uncategorized — gasstationwithoutpumps @ 19:36

My previous COVID plot showed New York having reached its peak and California doing really well, but things have changed a lot for California in the past month, and even more dramatically for Southern states:

Santa Cruz has shot up in the past month, but is still doing better than much of California.

I added Santa Cruz (county) as a possible location to highlight, but I’m having to manually copy data from the Santa Cruz website, which is a pain, as they update it daily with corrections extending back a week or more. I probably should try to find where the data exists in downloadable form.  Santa Cruz is still a month or two behind California as a whole, but seems to be catching up.  We’ll probably hit a peak just as school starts.

Florida has now reached the top of the leaderboard in terms of cases/million each day, as Arizona seems to have moved past its own peak.  Louisiana is probably the only state that is seeing a second wave (rather than a delayed first wave), having brought the new cases down for quite a while.  It probably won’t be long before Arizona and Louisiana surpass New York in total cases per capita.

Bay Tree Bookstore at UCSC has come out with “Fiat Face Mask”:

It isn’t a very creative design, but it has a certain appeal to it. I’m getting one to add to my rotation of cloth masks.

2020 June 14

New plot for COVID-19

Filed under: Uncategorized — gasstationwithoutpumps @ 22:29

It has been a while since the post I did on exponential and logistic growth models not working.  I’ve continued to scrape data from websites and plot the curves with gnuplot, but they have been very uninteresting—I was seeing almost linear growth in both US and CA curves, both for confirmed cases and for deaths.

I was getting a little bored with the manual data entry, and I did not have a good set of California data, because I had been too lazy to enter it daily.  So today I decided to waste a little time cloning the JHU github repository of data and writing a Python program to extract data from it.  This turned out to be messier than I thought, as JHU has changed the format of the files and data a couple of times.

I started by parsing the US-only files, because they seemed to be pretty clean and uniform, but they only go back 63 days (since 2020 April 12), so miss the early days of the pandemic.  I then started parsing the world-wide data files, which have a lot more rows (more than one per county for California) but fewer columns.  I needed to write routines that would merge data from multiple rows if I wanted state-wide numbers, and the format changed at least once, so that I had to recognize “San Diego County, CA” in “Province/State” as being the same state as “California” in “Province_State”.
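The merging step described above can be sketched roughly as follows. This is only an illustration, not the program actually used: the lookup table, the function names, and the sample counts are all made up, and a real table would need every state abbreviation.

```python
from collections import defaultdict

# Hypothetical lookup table; only a few entries are shown for illustration.
STATE_ABBREVIATIONS = {"CA": "California", "NY": "New York", "WA": "Washington"}


def canonical_state(province_state):
    """Map 'San Diego County, CA' or 'California' to 'California'."""
    name = province_state.strip()
    if "," in name:
        # Older rows use 'Place, XX' with a two-letter state code at the end.
        abbreviation = name.rsplit(",", 1)[1].strip()
        return STATE_ABBREVIATIONS.get(abbreviation, name)
    return name


def merge_by_state(rows):
    """Sum the counts of all rows that belong to the same state."""
    totals = defaultdict(int)
    for province_state, confirmed in rows:
        totals[canonical_state(province_state)] += confirmed
    return dict(totals)


# Made-up counts, just to exercise the functions.
rows = [("San Diego County, CA", 3), ("Santa Clara County, CA", 5), ("New York", 11)]
state_totals = merge_by_state(rows)
print(state_totals)
```

Unknown abbreviations fall through unchanged, so rows that the table does not cover stay visible instead of being silently dropped.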

It has also been a while since I’ve used matplotlib, so it took me some time to figure out how to do such simple things as requesting that logarithmic axes use plain numbers rather than 10^2 and 10^3.
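One way to get plain numbers on logarithmic axes is to give matplotlib a custom tick formatter. The sketch below is an assumption about how this could be done, not necessarily what the plotting script does; `plain_number` and `use_plain_log_axes` are hypothetical names.

```python
def plain_number(value, pos=None):
    """Format a tick value as a plain number (100, 1000) rather than 10^2, 10^3."""
    if value == int(value):
        return str(int(value))
    return f"{value:g}"


def use_plain_log_axes(ax):
    """Put both axes of a matplotlib Axes on log scales with plain-number ticks."""
    # Imported here so the formatter itself can be used without matplotlib.
    from matplotlib.ticker import FuncFormatter

    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.xaxis.set_major_formatter(FuncFormatter(plain_number))
    ax.yaxis.set_major_formatter(FuncFormatter(plain_number))


print(plain_number(100.0), plain_number(1000.0), plain_number(0.01))
```

Calling `use_plain_log_axes(ax)` on an existing Axes object before plotting should be enough; matplotlib's built-in `ScalarFormatter` is an alternative to the custom function.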

Anyway, I think I’ve finally gotten the files parsed and been able to extract and plot some data.  For my first plot, I chose new cases/day vs. total cases for each state, which I could not do with gnuplot (because it doesn’t provide an easy way to take the differences between adjacent days or to do rolling-window averages).
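The two operations that were awkward in gnuplot are short in plain Python. This is a minimal sketch with made-up totals; the trailing window and its length are assumptions, since the post does not say what window the plot uses.

```python
def daily_new(cumulative):
    """Differences between adjacent days of a cumulative count."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=7):
    """Trailing rolling-window average; the first few windows are shorter."""
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


# Made-up cumulative case counts for eight consecutive days.
totals = [10, 12, 17, 25, 30, 42, 50, 61]
new_cases = daily_new(totals)
smoothed = rolling_mean(new_cases, window=3)
print(new_cases)
print(smoothed)
```

Plotting `smoothed` against `totals[1:]` on log-log axes gives the new-cases vs. total-cases view described above.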

I highlighted two states here: California, because that is the one I live in, and New York, because it has been hit the hardest with COVID-19.

New York has clearly peaked and has a declining new-case rate, while California is still slowly growing. I don’t think that the numbers, even with the per-capita scaling, are really comparable between California and New York, because the California fraction of tests that are positive has remained relatively small, and the new-case rate has tracked with the number of tests fairly well. I think that a lot of the growth in California has been due to increased testing and confirming a larger fraction of the cases, rather than an increase in the actual rate of new infections. (The hospitalization reports plotted by the LA Times indicate a slow decrease in California hospitalizations lately.)

2020 April 25

Worse than flu?

Filed under: Uncategorized — gasstationwithoutpumps @ 09:26

When people write about COVID-19, they often compare it to influenza, either to the annual flu season or to the 1918–1920 pandemic of “Spanish flu”.  There is often an underlying motive in which comparison they make: comparing to the annual flu, which is deadly but not scary, is used to push back against shelter-in-place restrictions, while comparison with the 1918–1920 pandemic is used to support increases in the restrictions. Some people have even been conjecturing that all the hand washing and social distancing would reduce normal flu deaths to the point where there would be less mortality than usual.

While we don’t know yet how bad COVID-19 is going to get, we can certainly look at how bad it has gotten so far.  Does it look more like Spanish flu or like seasonal flu?  First, we can look at the raw numbers.  The latest number of deaths attributed to COVID-19 in the US is 52,356, and going up at about 2,000 deaths per day, while Spanish flu killed about 675,000 in the US over its two-year course and seasonal flu kills 12,000–61,000 per year in the US.  So right now, COVID-19 looks worse than the worst flu season in the past 10 years, but perhaps not as bad as Spanish flu. Even if all the precautionary measures completely eliminated flu, COVID-19 is still going to cause more deaths than any savings we might see.

We should also be aware that the first spike of Spanish flu was not nearly as bad as the second wave, and we are still in the first spike of COVID-19, so there is plenty of time for COVID-19 to get worse.

For recent years, we can also look at death rates per week.  The New Atlantic has done a comparison of COVID-19 with recent flu seasons and with the main causes of death:

COVID-19 causes more deaths per week in the US than any other cause. The 2017–18 flu season used for comparison was the worst in the past decade. Copied from

One problem with many of the plots of deaths due to various causes is that reporting of deaths is often delayed and the cause of death recorded on the death certificate is often inaccurate. Different places are recording COVID-19 deaths differently, with some underestimating the numbers substantially by only counting deaths in hospitals of patients who had a positive test for SARS-CoV-2 virus. To correct for this sort of error, some statisticians like to look at “excess deaths”: the number of people dying per week compared to the number who normally die in that week of the year. Plotted over a few years, excess mortality plots usually point out particularly bad flu seasons or natural disasters (like hurricane Maria in Puerto Rico in 2017).
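The idea of excess deaths can be sketched in a few lines. The numbers below are made up, and taking the baseline as the plain average of the same week in earlier years is an assumption for illustration; sites that publish excess mortality may well use more careful baselines.

```python
from statistics import mean


def weekly_baseline(prior_years):
    """Average deaths for each week of the year across the prior years."""
    return [mean(week) for week in zip(*prior_years)]


def excess_deaths(observed, baseline):
    """Observed deaths minus the normal number for the same week."""
    return [o - b for o, b in zip(observed, baseline)]


# Made-up weekly death counts: three earlier years and the current year.
prior_years = [
    [100, 110, 105, 95],
    [104, 106, 103, 97],
    [102, 108, 107, 99],
]
observed = [103, 130, 160, 150]

baseline = weekly_baseline(prior_years)
excess = excess_deaths(observed, baseline)
print(excess)
```

Because the calculation uses only total deaths, it does not depend on what cause was written on each death certificate.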

One site that plots excess mortality (for Europe) is, which has the following graphs:

The recent spike in excess deaths is clear in this plot copied from The dip at the very end is probably due to delays in getting data, rather than an actual dip in deaths.

Death rate per week tells you how bad things are at the moment, but a short spike of high death rate may not be as bad as a much longer one of more moderate death rate, so it is useful to look at the cumulative excess mortality:
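A small made-up example shows why the cumulative view matters: the first series has a much higher peak, but the second adds up to more deaths.

```python
from itertools import accumulate

# Made-up weekly excess deaths for two hypothetical scenarios.
short_spike = [0, 50, 60, 10, 0, 0, 0, 0, 0, 0]
long_moderate = [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]

# Running totals: each entry is the excess accumulated up to that week.
cumulative_short = list(accumulate(short_spike))
cumulative_long = list(accumulate(long_moderate))

print(max(short_spike), cumulative_short[-1])
print(max(long_moderate), cumulative_long[-1])
```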

Cumulative excess deaths in Europe, copied from We can see that the cumulative effect of the spike so far is already as bad as a really bad flu season, and the spike is nowhere near over.

The euromomo site allows you to look at individual countries (not all are seeing that excess mortality—just the ones with bad COVID-19 infection rates). They also break the numbers into four different age ranges—the spike is occurring in adult deaths, but not child or teen deaths.

I have not yet found a recent plot of excess mortality in the US—the ones I’ve found all end in early April, before many COVID-19 deaths had occurred.

2020 April 24

Exponential and logistic models aren’t working

Filed under: Uncategorized — gasstationwithoutpumps @ 00:01

In Exponential and logistic growth I showed how both exponential and logistic models fit the data available then (28 March), but we couldn’t really determine where the logistic function would saturate—we were early enough in the growth of the pandemic that everything still looked exponential.  I’ve been continuing to plot the data for US confirmed cases and US deaths, and I’ve added the Santa Cruz County confirmed cases as well.
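The reason early data cannot pin down the saturation level can be checked numerically. The sketch below uses the standard logistic form, ceiling / (1 + exp(-rate * (t - midpoint))), with made-up parameters; these are not the values fitted in the earlier post.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Standard logistic curve that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def early_exponential(t, ceiling, rate, midpoint):
    """Pure exponential that the logistic approaches well before the midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))


# Made-up parameters for illustration only.
ceiling, rate, midpoint = 1_000_000.0, 0.2, 60.0

# Ratio of logistic to exponential at several days: close to 1 early on,
# falling to exactly one half at the midpoint.
ratios = [
    logistic(t, ceiling, rate, midpoint) / early_exponential(t, ceiling, rate, midpoint)
    for t in (0, 20, 40, 60)
]
print(ratios)
```

Well before the midpoint the two curves are nearly indistinguishable, so data from that period fit either model about equally well.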

The US death rate still seems to be reasonably modeled by a logistic function, but the total-case rate is not.  I’ve added a curve for the confirmed case rate in Santa Cruz County and a few points for confirmed cases in California.

I did not expect a logistic model to fit the confirmed-case data well, as we do not have a single population under uniform infection conditions (which is what gives rise to the logistic function), but instead have a sum of several different populations, each with a different infection rate, and we have infection rates changing with time, as communities impose different isolation rules. What is surprising is how well the logistic model still fits the death-rate data.

The logistic model may be failing for another reason on the confirmed-case data: saturation of testing. We can see this effect somewhat more clearly if we use a linear scale for the y axis:

The growth of confirmed cases is linear now, neither exponential nor logistic.  The attempt to fit a logistic function results in the early curve being way too high and the late curve coming down too low, in an attempt to stretch out the inflection point to match the straight-line growth.  Both California and Santa Cruz County confirmed cases seem to be growing linearly rather than logistically also—probably because of the very limited testing in California.

If we are limited by the number of tests done, then the growth of confirmed cases is limited by the number of tests, rather than the number of actual cases.

Some people have suggested looking at the fraction of tests that are positive to track the progress of the pandemic, under the assumption that testing is being rationed more or less the same in different places. (That doesn’t seem to be true: in some parts of the country it is almost impossible for someone to get the test unless they are a hospital worker or about to be admitted to a hospital, while other places are testing more aggressively.)

Nate Silver of FiveThirtyEight has made an attempt to compute the positive rate for different states in Coronavirus Cases Are Still Growing In Many U.S. States.  There is a lot of difficulty doing this, as states have been even sloppier about reporting negative test numbers than positive ones, and so Nate attempted to smooth out the numbers.  He reports positive rates from 1% (Hawaii) to 54% (New Jersey), with California fairly low at 7%. Santa Cruz County is very low, with about 3.7% positive tests (114 confirmed cases in 3067 total tests).
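The Santa Cruz County figure is just the ratio of the two counts quoted above, as this short check shows (the function name is made up for illustration):

```python
def positive_rate_percent(confirmed, total_tests):
    """Percentage of tests that came back positive."""
    return 100.0 * confirmed / total_tests


# Counts quoted above for Santa Cruz County.
santa_cruz = positive_rate_percent(114, 3067)
print(f"{santa_cruz:.1f}%")
```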

I suspect that part of the low rate for California is that much of the testing is being spent on retesting asymptomatic health-care workers, because there aren’t enough symptomatic cases to saturate the testing capacity, while New Jersey is overwhelmed with cases and only testing the very ill.

In Death rate from COVID-19 by county, I commented that the AP map in Death rate from COVID-19 by county (same name, but different links!) looked a little strange, with empty areas right next to hot spots.  It seems that a lot of the strangeness was due to incomplete data sets.  For quite a while, almost all the counties in Virginia were blank, despite the death rate in Virginia as a whole going up, then the county data finally made it to the database used and the map was quickly filled in.  Rhode Island county-level data is still inaccurate—the state has death rates comparable to the surrounding states, despite the counties all being reported with tiny numbers.  The hot spot in southwestern Georgia doesn’t seem to be getting much press—maybe because it is a rural, black population in a state controlled by white Republicans.

California is still looking fairly good, though four counties now have death rates over 50/million, with Los Angeles the worst at 78.9/million (Santa Cruz County is still fairly low with 2 deaths or 7.3/million).  In contrast, the highest death rate is New York City at 1941/million, which exceeds that of any country (San Marino is highest at 1191/million).

2020 April 1

Best visualizations of the COVID-19 spread

Filed under: Uncategorized — gasstationwithoutpumps @ 10:03

This is a followup on my post Exponential and logistic growth, which was intended mainly as a teaching opportunity for showing the value of log scales on graphs.

In the comments, Miguel Aznar pointed to videos from MinutePhysics ( and TomRocksMaths (, and whatisron pointed to and

The best static visualization I’ve seen is at, which allows you to choose log or linear y-axis scales, plotting confirmed cases, active cases, new cases/day, deaths, or recoveries, and (most importantly) having a choice of plotting either raw numbers or normalized by population size.  The normalization by population size is important for comparing efficacy of different approaches, as the total numbers mainly tell you how big the country or state is, and not how much of an impact the COVID-19 pandemic is having.  All the plots have time on the x axis, but start the clock at different times for different countries or states, with time=0 being where the case count=20, death count=5, case rate=1/million, or death rate=1/million.  One could probably get a denser clustering of the curves by being more sophisticated about the definition of time=0, but this method has the advantage of simplicity.  Another nice feature of this visualization is that you can choose which country or state to highlight, so you can, for example, highlight your own state to see how it compares with the cloud of others.
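The time=0 convention is easy to reproduce. This sketch is a guess at the simplest way to do the alignment, not the site's actual code, and the case counts are made up.

```python
def align_to_threshold(series, threshold):
    """Drop the days before the series first reaches the threshold.

    Index 0 of the result is then time=0 for that country or state.
    Returns an empty list if the threshold is never reached.
    """
    for day, value in enumerate(series):
        if value >= threshold:
            return series[day:]
    return []


# Made-up cumulative case counts for two hypothetical places.
cases_a = [1, 3, 8, 21, 40, 90]
cases_b = [0, 0, 2, 5, 19, 25, 60]

aligned_a = align_to_threshold(cases_a, 20)
aligned_b = align_to_threshold(cases_b, 20)
print(aligned_a, aligned_b)
```

Dividing each count by the population in millions before aligning, with a threshold of 1, would give the per-capita version of the same alignment. Places that never reach the threshold produce an empty series and so do not appear on the plot.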

Some very interesting outliers are the countries with very slow spread (Taiwan and Japan), or initial rapid spread followed by shutting down the spread (China and South Korea).  Iraq has had fairly slow spread, but Spain and Turkey have had extremely rapid spread.  The United States has pretty much been following Italy’s curve for confirmed cases, but fortunately not for deaths (Italy is at over 200 deaths per million and has not plateaued yet).  Spain and Belgium have had the fastest growth in per-capita deaths, and Spain may still overtake Italy; both are at the point where their health-care capacities are exceeded.  Taiwan and Japan have been so successful at slowing the spread of cases that they don’t even appear on the death plots, not having reached the 1 death/million threshold that the graph makers were using.

If the US reaches 200 deaths/million (as seems likely given how we are following Italy’s curve for per-capita cases), we can expect to see 66,000 deaths in the US.  Since Italy’s death toll is still going up fairly rapidly, and there is currently no evidence that the US is doing any better than Italy at slowing the spread of COVID-19, we can probably expect 2–3 times that death toll (consistent with the optimistic scenarios from the US government).
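The arithmetic behind these numbers is a single multiplication. The US population of about 330 million is an assumption here; it is the figure implied by 66,000 deaths at 200 deaths/million.

```python
# Assumption: US population of roughly 330 million (implied by 66,000 / 200).
US_POPULATION_MILLIONS = 330

deaths_per_million = 200
expected_deaths = deaths_per_million * US_POPULATION_MILLIONS

# Range if the eventual toll is 2 to 3 times that figure.
low_estimate = 2 * expected_deaths
high_estimate = 3 * expected_deaths

print(expected_deaths, low_estimate, high_estimate)
```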

At the state level, New York, Michigan, and New Jersey stand out for very rapid growth of cases and of deaths (so it isn’t just more testing).  Oregon and Vermont stand out for slow growth of deaths.  California is low for confirmed cases per capita, but middle of the pack for deaths per capita, so California is probably way behind on testing.
