Gas station without pumps

2020 April 24

Exponential and logistic models aren’t working

Filed under: Uncategorized — gasstationwithoutpumps @ 00:01

In Exponential and logistic growth I showed how both exponential and logistic models fit the data available then (28 March), but we couldn’t really determine where the logistic function would saturate—we were early enough in the growth of the pandemic that everything still looked exponential.  I’ve been continuing to plot the data for US confirmed cases and US deaths, and I’ve added the Santa Cruz County confirmed cases as well.
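A minimal sketch of why early data cannot pin down the saturation level: well before the inflection point, a logistic curve is almost indistinguishable from an exponential with the same growth rate. The parameter values (`K`, `r`, `t0`) are hypothetical and chosen only for illustration; they are not fitted to the US data.

```python
import math

def exponential(t, a, r):
    """Unbounded exponential growth: a * exp(r * t)."""
    return a * math.exp(r * t)

def logistic(t, K, r, t0):
    """Logistic growth that saturates at K, with its inflection point at t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Hypothetical parameters, for illustration only (not fitted to any data).
K, r, t0 = 1_000_000.0, 0.25, 60.0
a = K * math.exp(-r * t0)  # makes the exponential match the logistic's early tail

for t in (10, 30, 50, 60, 70):
    ratio = logistic(t, K, r, t0) / exponential(t, a, r)
    print(f"day {t:2d}: logistic / exponential = {ratio:.3f}")
```

The ratio stays very close to 1 until shortly before day `t0`, is exactly one half at the inflection point, and falls off quickly afterwards, so a fit to early data says almost nothing about `K`.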

The US death rate still seems to be reasonably modeled by a logistic function, but the total-case rate is not.  I’ve added a curve for the confirmed case rate in Santa Cruz County and a few points for confirmed cases in California.

I did not expect a logistic model to fit the confirmed-case data well, as we do not have a single population under uniform infection conditions (which is what gives rise to the logistic function), but instead have a sum of several different populations, each with a different infection rate, and the infection rates change with time as communities impose different isolation rules. What is surprising is how well the logistic model still fits the death-rate data.
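For reference, the standard single-population model behind the logistic curve can be written as below. The symbols (N for cumulative cases, r for the growth rate, K for the saturation level, t_0 for the inflection time) are introduced here for illustration.

```latex
\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right)
\quad\Longrightarrow\quad
N(t) = \frac{K}{1 + e^{-r\,(t - t_0)}}

% Several populations, each with its own parameters:
N_{\text{total}}(t) = \sum_i \frac{K_i}{1 + e^{-r_i\,(t - t_{0,i})}}
```

A sum of logistic curves with different r_i and t_{0,i} is not, in general, a single logistic curve, and the picture is further complicated when the rates themselves vary with time.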

The logistic model may be failing on the confirmed-case data for another reason: saturation of testing. We can see this effect somewhat more clearly if we use a linear scale for the y axis:

The growth of confirmed cases is linear now, neither exponential nor logistic.  The attempt to fit a logistic function results in the early curve being way too high and the late curve coming down too low, in an attempt to stretch out the inflection point to match the straight-line growth.  Both California and Santa Cruz County confirmed cases seem to be growing linearly rather than logistically also—probably because of the very limited testing in California.

If we are limited by the number of tests done, then the growth of confirmed cases is limited by the number of tests, rather than the number of actual cases.
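A toy simulation of this effect, under a deliberately crude assumption: the true cumulative cases follow a logistic curve, but no more than a fixed number of new cases can be confirmed per day because of test capacity. All the numbers are made up for illustration, and the constant daily ceiling is a stand-in for a real testing limit, not a model of how testing actually works.

```python
import math

def logistic(t, K, r, t0):
    """Cumulative cases under a logistic model saturating at K."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Every number here is hypothetical, chosen only to illustrate the effect.
K, r, t0 = 1_000_000.0, 0.2, 60.0   # assumed "true" cumulative-case curve
max_confirmed_per_day = 2_000.0     # assumed ceiling imposed by test capacity

daily_confirmed = []
for day in range(1, 81):
    # New true cases on this day, from the difference of the cumulative curve.
    new_true = logistic(day, K, r, t0) - logistic(day - 1, K, r, t0)
    # Only as many cases as the tests allow can be confirmed.
    daily_confirmed.append(min(new_true, max_confirmed_per_day))

cumulative = 0.0
for day, new in enumerate(daily_confirmed, start=1):
    cumulative += new
    if day % 10 == 0:
        print(f"day {day:2d}: +{new:7.1f} confirmed, {cumulative:9.1f} total")
```

Once the true daily increase exceeds the ceiling, the confirmed count rises by the same amount every day, which gives a straight line on a linear y axis no matter what the underlying epidemic is doing.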

Some people have suggested looking at the fraction of tests that are positive to track the progress of the pandemic, under the assumption that testing is being rationed more or less the same in different places. (That doesn’t seem to be true: in some parts of the country it is almost impossible for someone to get the test unless they are a hospital worker or about to be admitted to a hospital, while other places are testing more aggressively.)

Nate Silver of FiveThirtyEight has made an attempt to compute the positive rate for different states in Coronavirus Cases Are Still Growing In Many U.S. States.  There is a lot of difficulty doing this, as states have been even sloppier about reporting negative test numbers than positive ones, and so Nate attempted to smooth out the numbers.  He reports positive rates from 1% (Hawaii) to 54% (New Jersey), with California fairly low at 7%. Santa Cruz County is very low, with about 3.7% positive tests (114 confirmed cases in 3067 total tests).
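The Santa Cruz County figure is just confirmed cases divided by total tests. A short check of that arithmetic, set next to the state-level rates quoted above (the variable names are only for illustration):

```python
# Figures quoted in the post.
santa_cruz_cases = 114
santa_cruz_tests = 3067
santa_cruz_rate = santa_cruz_cases / santa_cruz_tests

# Smoothed state-level positive rates as reported in the FiveThirtyEight article.
reported_rates = {"Hawaii": 0.01, "California": 0.07, "New Jersey": 0.54}
reported_rates["Santa Cruz County"] = santa_cruz_rate

# List the places from lowest to highest positive rate.
for place, rate in sorted(reported_rates.items(), key=lambda kv: kv[1]):
    print(f"{place:18s} {rate:6.1%}")
```

Note that the county number is a raw ratio, while the state numbers were smoothed, so the comparison is only rough.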

I suspect that part of the low rate for California is that much of the testing is being spent on retesting asymptomatic health-care workers, because there aren’t enough symptomatic cases to saturate the testing capacity, while New Jersey is overwhelmed with cases and only testing the very ill.

In Death rate from COVID-19 by county, I commented that the AP map in Death rate from COVID-19 by county (same name, but different links!) looked a little strange, with empty areas right next to hot spots.  It seems that a lot of the strangeness was due to incomplete data sets.  For quite a while, almost all the counties in Virginia were blank, despite the death rate in Virginia as a whole going up, then the county data finally made it to the database used and the map was quickly filled in.  Rhode Island county-level data is still inaccurate—the state has death rates comparable to the surrounding states, despite the counties all being reported with tiny numbers.  The hot spot in southwestern Georgia doesn’t seem to be getting much press—maybe because it is a rural, black population in a state controlled by white Republicans.

California is still looking fairly good, though four counties now have death rates over 50/million, with Los Angeles the worst at 78.9/million (Santa Cruz County is still fairly low with 2 deaths or 7.3/million).  In contrast, the highest death rate is New York City at 1941/million, which exceeds that of any country (San Marino is highest at 1191/million).
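The per-million figures are deaths divided by population, scaled by a million. As a sanity check, the Santa Cruz County numbers above (2 deaths, 7.3/million) imply a population of about 274,000. That population is back-computed from the rate here, not taken from a census source, and the helper names are hypothetical.

```python
def per_million(deaths, population):
    """Deaths per million residents."""
    return deaths / population * 1_000_000

def implied_population(deaths, rate_per_million):
    """Population implied by a death count and a per-million rate."""
    return deaths / rate_per_million * 1_000_000

# Santa Cruz County: 2 deaths at 7.3/million (figures from the post).
sc_population = implied_population(2, 7.3)
print(f"implied Santa Cruz County population: {sc_population:,.0f}")
print(f"round trip: {per_million(2, sc_population):.1f} per million")

# Ratios between the rates quoted in the post.
print(f"New York City vs. Los Angeles: {1941 / 78.9:.1f}x")
print(f"New York City vs. San Marino:  {1941 / 1191:.2f}x")
```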

5 Comments »

  1. There are so many unknowns in the testing results. I liked the approach of using total deaths and comparing to last year. I didn’t manage to find this data myself but NYT did a good job digging up numbers https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html

    Still those numbers should be underestimates of the real covid19 deaths since we’re all getting in fewer car accidents.

    Comment by Rfon — 2020 April 24 @ 07:44 | Reply

    • Yes, I like the “excess deaths” method also, though it is scarier (most places have been reporting much smaller numbers than the estimate from excess deaths). You are right that it is still an underestimate, not just because of reduced car crashes, but also because social distancing has reduced flu deaths to extremely low levels (the smart-thermometer people showed flu-like illnesses dropping to almost zero, from the usual 2% at this time of year). Transmission of flu seems to be easier to disrupt than transmission of COVID-19.

      Comment by gasstationwithoutpumps — 2020 April 24 @ 08:39 | Reply

  2. I asked Dr. David Ghilarducci during his Santa Cruz Local presentation if the Santa Cruz “total case” numbers were really reporting more about “numbers of tests” than “numbers of cases” and he seemed to confirm this. But testing is shifting. So an increase in test availability, which would be a good thing, would potentially be reflected by yielding a corresponding increase in daily new cases. An increasing % of people infected would also yield increased new daily cases if the testing otherwise was the same. But a shift towards repeat-testing healthcare workers could appear to lower “new cases”. Essentially, I think it means that while the daily “new cases” number is what SC Pub Health is reporting, and people are probably most closely watching — it may not be super informative without knowing about testing capacity changes and test application changes and other factors.

    Comment by whatisron — 2020 April 24 @ 08:16 | Reply

    • Both the death counts and the case counts are probably huge underestimates of the actual spread of the disease, but the death counts are probably not as far off.

      I’m curious how much the tests are costing, given that anything medical in the US has greatly inflated prices.

      Comment by gasstationwithoutpumps — 2020 April 24 @ 08:43 | Reply

  3. […] has been a while since the post I did on exponential and logistic growth models not working.  I’ve continued to scrape data from websites and plot the curves with gnuplot, but they […]

    Pingback by New plot for COVID-19 | Gas station without pumps — 2020 June 14 @ 22:29 | Reply

