Gas station without pumps

2020 April 24

Exponential and logistic models aren’t working

Filed under: Uncategorized — gasstationwithoutpumps @ 00:01

In Exponential and logistic growth I showed how both exponential and logistic models fit the data available then (28 March), but we couldn’t really determine where the logistic function would saturate—we were early enough in the growth of the pandemic that everything still looked exponential.  I’ve been continuing to plot the data for US confirmed cases and US deaths, and I’ve added the Santa Cruz County confirmed cases as well.

The US death rate still seems to be reasonably modeled by a logistic function, but the total-case rate is not.  I’ve added a curve for the confirmed case rate in Santa Cruz County and a few points for confirmed cases in California.

I did not expect a logistic model to fit the confirmed-case data well, as we do not have a single population under uniform infection conditions (which is what gives rise to the logistic function), but instead have a sum of several different populations, each with a different infection rate, and the infection rates change with time as communities impose different isolation rules. What is surprising is how well the logistic model still fits the death-rate data.

The logistic model may be failing on the confirmed-case data for another reason: saturation of testing. We can see this effect somewhat more clearly if we use a linear scale for the y axis:

The growth of confirmed cases is linear now, neither exponential nor logistic.  The attempt to fit a logistic function results in the early curve being way too high and the late curve coming down too low, in an attempt to stretch out the inflection point to match the straight-line growth.  Both California and Santa Cruz County confirmed cases seem to be growing linearly rather than logistically also—probably because of the very limited testing in California.

If we are limited by the number of tests done, then the growth of confirmed cases is limited by the number of tests, rather than the number of actual cases.
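Here is a minimal sketch of that effect, as a toy model rather than anything fitted to real data. The function name, the doubling time, and the daily cap are all hypothetical, and the model simply ignores any case that exceeds the day's cap. Once the number of new true cases per day passes the cap, the cumulative confirmed count grows by a constant amount per day, which is a straight line on a linear plot.

```python
# Toy model (all numbers hypothetical): true infections grow exponentially,
# but at most a fixed number of positives can be confirmed per day.

def confirmed_cases(days, doubling_time=3.0, initial=100.0,
                    max_positives_per_day=2000.0):
    """Return the cumulative confirmed count for each day under a daily cap."""
    confirmed = []
    total = 0.0
    previous_true = 0.0
    for day in range(days):
        true_cases = initial * 2 ** (day / doubling_time)
        new_true = true_cases - previous_true
        previous_true = true_cases
        # Confirmations are limited by testing, not by the true case count.
        total += min(new_true, max_positives_per_day)
        confirmed.append(total)
    return confirmed

series = confirmed_cases(40)
# Day-to-day increments: exponential at first, then pinned at the cap.
daily = [later - earlier for earlier, later in zip(series, series[1:])]
print(daily[:3], daily[-3:])
```

With these made-up parameters the increments stop growing around day 20 and stay at the cap afterwards.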

Some people have suggested looking at the fraction of tests that are positive to track the progress of the pandemic, under the assumption that testing is being rationed more or less the same in different places. (That doesn’t seem to be true: in some parts of the country it is almost impossible for someone to get the test unless they are a hospital worker or about to be admitted to a hospital, while other places are testing more aggressively.)

Nate Silver of FiveThirtyEight has made an attempt to compute the positive rate for different states in Coronavirus Cases Are Still Growing In Many U.S. States.  There is a lot of difficulty doing this, as states have been even sloppier about reporting negative test numbers than positive ones, and so Nate attempted to smooth out the numbers.  He reports positive rates from 1% (Hawaii) to 54% (New Jersey), with California fairly low at 7%. Santa Cruz County is very low, with about 3.7% positive tests (114 confirmed cases in 3067 total tests).
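The Santa Cruz County figure is just the ratio of the two counts given above. A short sketch of the arithmetic (the function name is mine, and this is the raw ratio, not the smoothed estimate that FiveThirtyEight computed):

```python
def positive_rate(confirmed, total_tests):
    """Fraction of tests that came back positive."""
    return confirmed / total_tests

# Counts quoted in the text for Santa Cruz County.
santa_cruz = positive_rate(114, 3067)
print(f"{santa_cruz:.1%}")  # 3.7%
```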

I suspect that part of the low rate for California is that much of the testing is being spent on retesting asymptomatic health-care workers, because there aren’t enough symptomatic cases to saturate the testing capacity, while New Jersey is overwhelmed with cases and only testing the very ill.

In Death rate from COVID-19 by county, I commented that the AP map in Death rate from COVID-19 by county (same name, but different links!) looked a little strange, with empty areas right next to hot spots.  It seems that a lot of the strangeness was due to incomplete data sets.  For quite a while, almost all the counties in Virginia were blank, despite the death rate in Virginia as a whole going up, then the county data finally made it to the database used and the map was quickly filled in.  Rhode Island county-level data is still inaccurate—the state has death rates comparable to the surrounding states, despite the counties all being reported with tiny numbers.  The hot spot in southwestern Georgia doesn’t seem to be getting much press—maybe because it is a rural, black population in a state controlled by white Republicans.

California is still looking fairly good, though four counties now have death rates over 50/million, with Los Angeles the worst at 78.9/million (Santa Cruz County is still fairly low with 2 deaths or 7.3/million).  In contrast, the highest death rate is New York City at 1941/million, which exceeds that of any country (San Marino is highest at 1191/million).

2020 April 1

Best visualizations of the COVID-19 spread

Filed under: Uncategorized — gasstationwithoutpumps @ 10:03

This is a followup on my post Exponential and logistic growth, which was intended mainly as a teaching opportunity for showing the value of log scales on graphs.

In the comments, Miguel Aznar pointed to videos from MinutePhysics and TomRocksMaths, and whatisron pointed to two further links.

The best static visualization I’ve seen is at, which allows you to choose log or linear y-axis scales, plotting confirmed cases, active cases, new cases/day, deaths, or recoveries, and (most importantly) having a choice of plotting either raw numbers or normalized by population size. The normalization by population size is important for comparing efficacy of different approaches, as the total numbers mainly tell you how big the country or state is, and not how much of an impact the COVID-19 pandemic is having. All the plots have time on the x axis, but start the clock at different times for different countries or states, with time=0 being where the case count=20, death count=5, case rate=1/million, or death rate=1/million. One could probably get a denser clustering of the curves by being more sophisticated about the definition of time=0, but this method has the advantage of simplicity. Another nice feature of this visualization is that you can choose which country or state to highlight, so you can, for example, highlight your own state to see how it compares with the cloud of others.
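The two transformations described above (starting each region's clock at a threshold and normalizing by population) are easy to sketch. This is my own illustration with made-up counts and hypothetical function names, not the code behind that visualization:

```python
def align_series(counts, threshold):
    """Drop leading days so that index 0 is the first day at or above threshold."""
    for start, value in enumerate(counts):
        if value >= threshold:
            return counts[start:]
    return []  # the region never reached the threshold

def per_million(counts, population):
    """Convert raw counts to counts per million inhabitants."""
    return [count * 1e6 / population for count in counts]

# Hypothetical cumulative case counts for two regions on the same calendar days.
region_a = [2, 5, 11, 24, 50, 105]
region_b = [0, 0, 3, 9, 21, 44]

aligned_a = align_series(region_a, threshold=20)  # [24, 50, 105]
aligned_b = align_series(region_b, threshold=20)  # [21, 44]
rate_a = per_million(aligned_a, population=10_000_000)
print(aligned_a, aligned_b, rate_a)
```

After alignment, both regions start near the same count at time=0, so their curves can be compared by shape rather than by calendar date.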

Some very interesting outliers are the countries with very slow spread (Taiwan and Japan), or initial rapid spread followed by shutting down the spread (China and South Korea). Iraq has had fairly slow spread, but Spain and Turkey have had extremely rapid spread. The United States has pretty much been following Italy’s curve for confirmed cases, but fortunately not for deaths (Italy is at over 200 deaths per million and has not plateaued yet). Spain and Belgium have had the fastest growth in per-capita deaths, and Spain may still overtake Italy; both are at the point where their health-care capacities are exceeded. Taiwan and Japan have been so successful at slowing the spread of cases that they don’t even appear on the death plots, not having reached the 1 death/million threshold that the graph makers were using.

If the US reaches 200 deaths/million (as seems likely given how we are following Italy’s curve for per-capita cases), we can expect to see 66,000 deaths in the US. Since Italy’s death toll is still going up fairly rapidly, and there is currently no evidence that the US is doing any better than Italy at slowing the spread of COVID-19, we can probably expect 2–3 times that death toll (consistent with the optimistic scenarios from the US government).
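The arithmetic behind those numbers, as a short sketch. The US population of 330 million is an assumption on my part; it is the figure implied by 66,000 deaths at 200 deaths/million.

```python
US_POPULATION = 330_000_000  # assumed; implied by 66,000 deaths at 200/million
DEATHS_PER_MILLION = 200     # Italy's per-capita level quoted in the text

baseline = DEATHS_PER_MILLION * US_POPULATION // 1_000_000
low, high = 2 * baseline, 3 * baseline

print(baseline)   # 66000
print(low, high)  # 132000 198000
```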

At the state level, New York, Michigan, and New Jersey stand out for very rapid growth of cases and of deaths (so it isn’t just more testing). Oregon and Vermont stand out for slow growth of deaths. California is low for confirmed cases per capita, but middle of the pack for deaths per capita, so California is probably way behind on testing.

2020 March 28

Exponential and logistic growth

Filed under: Uncategorized — gasstationwithoutpumps @ 16:37

I was just thinking that the current COVID-19 crisis provides a teachable moment for showing the advantage of semilogy plots (log scale on the y axis, linear on the x axis) for showing exponential data. So I grabbed information from one of the many sources reporting the number of cases and number of deaths due to COVID-19 and plotted them in different ways.

Here is what the data looks like on a linear graph:

This shows the rapid growth of the cases, but is hard to project into the future. It is also hard to see whether the growth rate is changing or how the deaths are related to the cases.

My next step was to use a log scale and to fit exponential curves to the data (actually fitting the log of the model to the log of the data, to avoid being biased by just the most recent data). I fit the data from March 2 on, since there were small-number effects before then.
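Fitting the log of the model to the log of the data turns the exponential fit into an ordinary straight-line fit. Here is a minimal sketch of that idea using plain least squares on synthetic data. The function name and the data are hypothetical, and this is not necessarily the fitting tool I used for the plots.

```python
import math

def fit_exponential(days, counts):
    """Least-squares line through (day, log2(count)).

    Returns (doubling_time, count_at_day_0).
    """
    logs = [math.log2(count) for count in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
             / sum((t - mean_t) ** 2 for t in days))
    intercept = mean_y - slope * mean_t
    # A slope of 1/T in log2 units means the count doubles every T days.
    return 1 / slope, 2 ** intercept

# Synthetic data that doubles every 2.5 days, starting from 100.
days = list(range(10))
counts = [100 * 2 ** (d / 2.5) for d in days]
doubling_time, start = fit_exponential(days, counts)
print(doubling_time, start)
```

Because the fit is done in log space, each point carries the same weight in relative terms, so the largest, most recent counts do not dominate the fit.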

The doubling time is alarmingly short for both the number of cases and for the number of deaths. Of course, exponential growth models don’t work well when projected indefinitely far into the future: at some point the infection rate predicted by the model exceeds 100%, which makes no sense. A better model is a logistic model, $A \frac{2^{(t-t_c)/\lambda}}{1+2^{(t-t_c)/\lambda}}$, which has three parameters:
$A$, the eventual fraction of the population affected
$t_c$, the time at which half of $A$ is affected
$\lambda$, the doubling time (during the early, nearly exponential part of the curve).
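A direct transcription of that formula, as a sketch with hypothetical parameter values, shows what the three parameters do: the curve passes through A/2 at t_c, and well before t_c it doubles every λ days.

```python
def logistic(t, A, t_c, lam):
    """Logistic growth: A * 2^((t - t_c)/lam) / (1 + 2^((t - t_c)/lam))."""
    growth = 2 ** ((t - t_c) / lam)
    return A * growth / (1 + growth)

# Hypothetical parameters: 1% eventually affected, midpoint at day 30,
# doubling time of 3 days.
A, t_c, lam = 0.01, 30.0, 3.0

midpoint = logistic(t_c, A, t_c, lam)  # half of A
early_ratio = logistic(-27.0, A, t_c, lam) / logistic(-30.0, A, t_c, lam)
print(midpoint, early_ratio)  # early_ratio is very close to 2
```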

I did fits of the logistic-growth model with three different assumptions about the eventual fraction affected: everyone, 10%, and 1% of the population, then fit the other parameters.
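One way to do a fit with A held fixed (an assumption for illustration, not necessarily how I did the fits for the plots) is to note that if y is the logistic model, then log2(y/(A−y)) = (t−t_c)/λ, which is a straight line in t. The sketch below uses hypothetical names and synthetic data from the early part of a curve.

```python
import math

def fit_logistic_fixed_A(days, fractions, A):
    """Fit t_c and lambda for a fixed A by a line fit to log2(y / (A - y))."""
    logits = [math.log2(y / (A - y)) for y in fractions]
    n = len(days)
    mean_t = sum(days) / n
    mean_z = sum(logits) / n
    slope = (sum((t - mean_t) * (z - mean_z) for t, z in zip(days, logits))
             / sum((t - mean_t) ** 2 for t in days))
    intercept = mean_z - slope * mean_t
    # The line is z = t/lambda - t_c/lambda.
    return -intercept / slope, 1 / slope  # (t_c, lambda)

def logistic(t, A, t_c, lam):
    growth = 2 ** ((t - t_c) / lam)
    return A * growth / (1 + growth)

# Synthetic early-phase data generated with A = 1%, t_c = 30, lambda = 3.
days = list(range(15))
fractions = [logistic(d, 0.01, 30.0, 3.0) for d in days]

t_c_small, lam_small = fit_logistic_fixed_A(days, fractions, A=0.01)
t_c_big, lam_big = fit_logistic_fixed_A(days, fractions, A=1.0)
print(t_c_small, lam_small)
print(t_c_big, lam_big)
```

On this synthetic early data, assuming A=1 instead of A=0.01 barely changes the fitted doubling time and mostly pushes t_c later, which is why the early data cannot tell the models apart.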

Right now, we can’t distinguish between any of these models, but by April 15 we may be able to see which curve we are on. I’ve added data points for Santa Cruz County and for California, which are both about five days behind the national curve.

I made no attempt to make projections for the deaths. That number seems to be growing at a slower rate than the total number of cases, which probably indicates that recent testing has been uncovering a larger fraction of the actual cases than earlier testing, and that the actual number of cases is doubling about every 3.2 days, not every 2.4 days. If that is correct, we may see a slow-down of the number of cases that does not indicate saturation of the logistic model, but just testing catching up with the backlog of cases.
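To see what those two doubling times imply together, here is a rough sketch. It assumes that both quantities are purely exponential and that the 3.2-day figure really does describe the actual cases. If confirmed cases double every 2.4 days while actual cases double every 3.2 days, then the detected fraction (confirmed/actual) is itself growing exponentially, with a growth rate equal to the difference of the two rates.

```python
CONFIRMED_DOUBLING = 2.4  # days, from the fit to confirmed cases
ACTUAL_DOUBLING = 3.2     # days, assumed to match the slower growth of deaths

# Rates in doublings per day subtract when one exponential is divided by another.
rate_gap = 1 / CONFIRMED_DOUBLING - 1 / ACTUAL_DOUBLING
detected_fraction_doubling = 1 / rate_gap

print(f"{detected_fraction_doubling:.1f} days")  # about 9.6 days
```

The detected fraction cannot keep doubling for long, since it cannot exceed 100%, so the confirmed-case curve has to bend toward the slower rate eventually.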

2017 February 27

Understanding semilog plots

Filed under: Circuits course — gasstationwithoutpumps @ 10:22

I realized in doing the homework grading this weekend that few of the students in my electronics course had any intuitive understanding of semilog plots (that is, plots in which one axis is linear and the other logarithmic).  I had been assuming that providing the following plot

Forward voltage as a function of current for a 1N914BTR diode.

would let students see that the voltage grows approximately with the logarithm of the current, and that means that a voltage difference corresponds to a current ratio. Very few students got that from the picture, the formulas, or the description in the text. They almost all wanted to pretend that the diode was a linear device with voltage proportional to current (i.e., that it was a resistor), so that a 6% change in current would result in a 6% change in voltage.  The whole point of using the diode was to introduce the exponential non-linearity, so this confusion definitely needs to be cleared up.
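A small numeric sketch may make the point concrete. It assumes the usual logarithmic diode approximation, in which the voltage difference between two currents is n·V_T·ln(I2/I1). The ideality factor n=1 is a hypothetical value for illustration, not a measured property of the 1N914BTR, and V_T is taken as roughly 25.85 mV (near 300 K).

```python
import math

THERMAL_VOLTAGE = 0.02585  # volts, roughly kT/q near 300 K
IDEALITY = 1.0             # hypothetical value, not measured for the 1N914BTR

def voltage_change(current_ratio, n=IDEALITY, v_t=THERMAL_VOLTAGE):
    """Voltage difference corresponding to a given ratio of diode currents."""
    return n * v_t * math.log(current_ratio)

dv_6pct = voltage_change(1.06)   # a 6% increase in current
dv_decade = voltage_change(10)   # a factor-of-10 increase in current

print(f"{dv_6pct * 1000:.2f} mV, {dv_decade * 1000:.1f} mV")
```

Under these assumptions a 6% change in current shifts the voltage by only about 1.5 mV, a tiny fraction of the forward voltage, and the shift is the same whatever the starting current. That is what equal vertical steps for equal horizontal ratios on the semilog plot mean.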

I was going to try to explain semilog plots and exponential/logarithmic relationships in class today, but my cold has gotten so bad that I had to cancel class. That means I have another 2 days to figure out how to explain the concepts. If any of my readers can think of ways to get students to interpret semilog plots correctly, please let me know. I think that the relationships are too obvious to me for me to help students past their misunderstandings; I can’t get far enough into their mindsets to lead them out of confusion.
