Gas station without pumps

2020 July 16

Updated plot for COVID-19

Filed under: Uncategorized — gasstationwithoutpumps @ 19:36

My previous COVID plot showed New York having reached its peak and California doing really well, but things have changed a lot for California in the past month, and even more dramatically for Southern states:

Santa Cruz has shot up in the past month, but is still doing better than much of California.

I added Santa Cruz (county) as a possible location to highlight, but I’m having to manually copy data from the Santa Cruz website, which is a pain, as they update it daily with corrections extending back a week or more. I probably should try to find where the data exists in downloadable form.  Santa Cruz is still a month or two behind California as a whole, but seems to be catching up.  We’ll probably hit a peak just as school starts.

Florida has now reached the top of the leaderboard in terms of cases/million each day, as Arizona seems to have moved past its own peak.  Louisiana is probably the only state that is seeing a second wave (rather than a delayed first wave), having brought the new cases down for quite a while.  It probably won’t be long before Arizona and Louisiana surpass New York in total cases per capita.

Bay Tree Bookstore at UCSC has come out with “Fiat Face Mask”:

It isn’t a very creative design, but it has a certain appeal to it. I’m getting one to add to my rotation of cloth masks.

2020 June 14

New plot for COVID-19

Filed under: Uncategorized — gasstationwithoutpumps @ 22:29

It has been a while since the post I did on exponential and logistic growth models not working.  I’ve continued to scrape data from websites and plot the curves with gnuplot, but they have been very uninteresting—I was seeing almost linear growth in both US and CA curves, both for confirmed cases and for deaths.

I was getting a little bored with the manual data entry, and I did not have a good set of California data, because I had been too lazy to enter it daily.  So today I decided to waste a little time cloning the JHU GitHub repository of data and writing a Python program to extract data from it.  This turned out to be messier than I thought, as JHU has changed the format of the files and data a couple of times.

I started by parsing the US-only files, because they seemed to be pretty clean and uniform, but they only go back 63 days (since 2020 April 12), so they miss the early days of the pandemic.  I then started parsing the world-wide data files, which have a lot more rows (more than one per county for California) but fewer columns.  I needed to write routines that would merge data from multiple rows if I wanted state-wide numbers, and the format changed at least once, so I had to recognize “San Diego County, CA” in “Province/State” as being the same state as “California” in “Province_State”.
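A minimal sketch of that kind of merging, assuming each row has already been read (for example with `csv.DictReader`) into a dict keyed by column name. The `Confirmed` column name, the helpers `normalize_state` and `state_totals`, and the two-entry abbreviation table are illustrative assumptions, not the code I actually wrote:

```python
from collections import defaultdict

# Hypothetical, deliberately tiny lookup table: extend it with every state needed.
STATE_ABBREVIATIONS = {"CA": "California", "NY": "New York"}


def normalize_state(name):
    """Map an older entry like 'San Diego County, CA' to its state name."""
    if "," in name:
        abbrev = name.rsplit(",", 1)[1].strip()
        return STATE_ABBREVIATIONS.get(abbrev, name)
    return name


def state_totals(rows):
    """Sum the 'Confirmed' column over all rows belonging to the same state.

    Older files label the state column 'Province/State', newer ones 'Province_State'.
    """
    totals = defaultdict(int)
    for row in rows:
        raw = row.get("Province_State") or row.get("Province/State") or ""
        totals[normalize_state(raw)] += int(row.get("Confirmed") or 0)
    return dict(totals)


rows = [
    {"Province/State": "San Diego County, CA", "Confirmed": "3"},
    {"Province/State": "Santa Clara County, CA", "Confirmed": "2"},
    {"Province_State": "California", "Confirmed": "10"},
]
print(state_totals(rows))  # {'California': 15}
```

In a real run each daily file uses only one of the two formats; mixing them in one list here just shows that both spellings land in the same bucket.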

It has also been a while since I’ve used matplotlib, so it took me some time to figure out how to do such simple things as requesting that logarithmic axes use plain numbers rather than 10^2 and 10^3.
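For reference, one way to get plain tick labels on log axes in matplotlib is to swap in a `ScalarFormatter` with scientific notation and offsets turned off. This is a sketch assuming a reasonably recent matplotlib; the helper name `plain_log_axes` and the toy data are made up for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter, ScalarFormatter


def plain_log_axes(ax):
    """Log-scale both axes but label ticks 100, 1000, ... instead of 10^2, 10^3."""
    ax.set_xscale("log")
    ax.set_yscale("log")
    for axis in (ax.xaxis, ax.yaxis):
        fmt = ScalarFormatter()     # separate formatter instance per axis
        fmt.set_scientific(False)   # never fall back to 1e3-style labels
        fmt.set_useOffset(False)    # no "+1e5" offset text in the corner
        axis.set_major_formatter(fmt)
        axis.set_minor_formatter(NullFormatter())  # leave minor ticks unlabeled


fig, ax = plt.subplots()
ax.plot([10, 100, 1000, 10000], [1, 5, 40, 300])  # toy data
plain_log_axes(ax)
fig.canvas.draw()
```

The order matters: calling `set_xscale` or `set_yscale` after installing the formatters would put the default log formatters back.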

Anyway, I think I’ve finally gotten the files parsed and been able to extract and plot some data.  For my first plot I chose to plot the new cases/day vs. total cases for each state, which I could not do with gnuplot (because it doesn’t provide an easy way to take the differences between adjacent days or to do rolling-window averages).
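The differencing and smoothing that gnuplot makes awkward take only a few lines of Python. This sketch assumes the cumulative counts are a plain list with one entry per day; the 7-day default window and the helper names are illustrative choices, not necessarily what I used:

```python
def daily_new(cumulative):
    """New cases per day: difference between adjacent days' cumulative totals."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=7):
    """Trailing rolling-window average; the first window-1 days get no value."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def new_vs_total(cumulative, window=7):
    """Pairs of (total cases, smoothed new cases/day), aligned on the same day."""
    smoothed = rolling_mean(daily_new(cumulative), window)
    totals = cumulative[window:]  # total on the last day of each averaging window
    return list(zip(totals, smoothed))


print(new_vs_total([0, 1, 3, 6, 10, 15], window=3))  # [(6, 2.0), (10, 3.0), (15, 4.0)]
```

The resulting pairs can be handed straight to matplotlib's `loglog`. A trailing window keeps each smoothed value tied to data that existed on that day; a centered window would be the other obvious choice.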

I highlighted two states here: California, because that is the one I live in, and New York, because it has been hit the hardest with COVID-19.

New York has clearly peaked and has a declining new-case rate, while California is still slowly growing. I don’t think that the numbers, even with the per-capita scaling, are really comparable between California and New York, because the California fraction of tests that are positive has remained relatively small, and the new-case rate has tracked with the number of tests fairly well. I think that a lot of the growth in California has been due to increased testing and confirming a larger fraction of the cases, rather than an increase in the actual rate of new infections. (The hospitalization reports plotted by the LA Times indicate a slow decrease in California hospitalizations lately.)

2020 April 24

Exponential and logistic models aren’t working

Filed under: Uncategorized — gasstationwithoutpumps @ 00:01

In Exponential and logistic growth I showed how both exponential and logistic models fit the data available then (28 March), but we couldn’t really determine where the logistic function would saturate—we were early enough in the growth of the pandemic that everything still looked exponential.  I’ve been continuing to plot the data for US confirmed cases and US deaths, and I’ve added the Santa Cruz County confirmed cases as well.

The US death rate still seems to be reasonably modeled by a logistic function, but the total-case rate is not.  I’ve added a curve for the confirmed case rate in Santa Cruz County and a few points for confirmed cases in California.

I did not expect a logistic model to fit the confirmed-case data well, as we do not have a single population under uniform infection conditions (which is what gives rise to the logistic function), but instead have a sum of several different populations, each with a different infection rate, and the infection rates change with time as communities impose different isolation rules. What is surprising is how well the logistic model still fits the death-rate data.
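One way to see why a sum of populations is not logistic: for a single logistic curve with eventual size $A$, midpoint $t_c$, and doubling time $\lambda$, the transformed value $\log_2\bigl(y/(A-y)\bigr)$ equals $(t-t_c)/\lambda$, a straight line in $t$ with constant slope $1/\lambda$. The sketch below uses made-up parameters for a fast community and a slower, later one, and checks that slope for one logistic and for the sum of two:

```python
import math


def logistic(t, A, t_c, lam):
    s = 2 ** ((t - t_c) / lam)
    return A * s / (1 + s)


def logit_slopes(ys, A):
    """Day-to-day slope of log2(y / (A - y)); constant (= 1/lam) only for one logistic."""
    logits = [math.log2(y / (A - y)) for y in ys]
    return [b - a for a, b in zip(logits, logits[1:])]


days = range(41)
# Hypothetical parameters: a fast, early community plus a slower, later one.
single = [logistic(t, 1.0, 10, 2) for t in days]
mixed = [logistic(t, 1.0, 10, 2) + logistic(t, 1.0, 30, 4) for t in days]

single_slopes = logit_slopes(single, 1.0)
mixed_slopes = logit_slopes(mixed, 2.0)  # eventual size is the sum of both A's
print(min(single_slopes), max(single_slopes))
print(min(mixed_slopes), max(mixed_slopes))
```

The single curve gives a slope of 0.5 every day, while the sum's slope sags in the middle and recovers later, so no single choice of $t_c$ and $\lambda$ can match it everywhere.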

The logistic model may be failing on the confirmed-case data for another reason: saturation of testing. We can see this effect somewhat more clearly if we use a linear scale for the y axis:

The growth of confirmed cases is linear now, neither exponential nor logistic.  The attempt to fit a logistic function results in the early curve being way too high and the late curve coming down too low, in an attempt to stretch out the inflection point to match the straight-line growth.  Both California and Santa Cruz County confirmed cases seem to be growing linearly rather than logistically also—probably because of the very limited testing in California.

If testing capacity is the bottleneck, then the growth of confirmed cases is set by the number of tests performed, rather than by the number of actual cases.
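A toy model makes the point: if actual new infections keep growing exponentially but testing can confirm at most some fixed number of cases per day, the cumulative confirmed count becomes a straight line once the cap is reached. The cap of 1000/day, the 3-day doubling, and the assumption that unconfirmed cases are simply never counted are all hypothetical, chosen only for illustration:

```python
def confirmed_with_test_cap(true_new, cap):
    """Cumulative confirmed cases when at most `cap` cases can be confirmed per day.

    Hypothetical model: each day's confirmations are the smaller of the actual new
    infections and the testing limit; cases beyond the limit are never confirmed.
    """
    total = 0
    cumulative = []
    for new in true_new:
        total += min(new, cap)
        cumulative.append(total)
    return cumulative


# Hypothetical numbers: actual new cases double every 3 days; testing caps at 1000/day.
true_new = [round(100 * 2 ** (day / 3)) for day in range(30)]
confirmed = confirmed_with_test_cap(true_new, cap=1000)
increments = [b - a for a, b in zip(confirmed, confirmed[1:])]
print(increments)
```

The daily increments grow until they hit the cap, and from then on every day adds exactly 1000, which is linear growth of the cumulative count no matter how fast the actual infections are growing.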

Some people have suggested looking at the fraction of tests that are positive to track the progress of the pandemic, under the assumption that testing is being rationed in more or less the same way in different places. (That doesn’t seem to be true: in some parts of the country it is almost impossible for someone to get tested unless they are a hospital worker or about to be admitted to a hospital, while other places are testing more aggressively.)

Nate Silver of FiveThirtyEight has made an attempt to compute the positive rate for different states in Coronavirus Cases Are Still Growing In Many U.S. States.  There is a lot of difficulty doing this, as states have been even sloppier about reporting negative test numbers than positive ones, and so Nate attempted to smooth out the numbers.  He reports positive rates from 1% (Hawaii) to 54% (New Jersey), with California fairly low at 7%. Santa Cruz County is very low, with about 3.7% positive tests (114 confirmed cases in 3067 total tests).

I suspect that part of the low rate for California is that much of the testing is being spent on retesting asymptomatic health-care workers, because there aren’t enough symptomatic cases to saturate the testing capacity, while New Jersey is overwhelmed with cases and only testing the very ill.

In Death rate from COVID-19 by county, I commented that the AP map in Death rate from COVID-19 by county (same name, but different links!) looked a little strange, with empty areas right next to hot spots.  It seems that a lot of the strangeness was due to incomplete data sets.  For quite a while, almost all the counties in Virginia were blank, despite the death rate in Virginia as a whole going up, then the county data finally made it to the database used and the map was quickly filled in.  Rhode Island county-level data is still inaccurate—the state has death rates comparable to the surrounding states, despite the counties all being reported with tiny numbers.  The hot spot in southwestern Georgia doesn’t seem to be getting much press—maybe because it is a rural, black population in a state controlled by white Republicans.

California is still looking fairly good, though four counties now have death rates over 50/million, with Los Angeles the worst at 78.9/million (Santa Cruz County is still fairly low with 2 deaths or 7.3/million).  In contrast, the highest death rate is New York City at 1941/million, which exceeds that of any country (San Marino is highest at 1191/million).
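Per-million rates are just count / population × 1,000,000, so a reported rate also pins down the population it was computed from. As a quick check using the Santa Cruz numbers above (2 deaths, 7.3/million), the implied county population is about 274,000; because the published rate is rounded, that figure is only approximate:

```python
def per_million(count, population):
    """Deaths (or cases) per million residents."""
    return count / population * 1_000_000


def implied_population(count, rate_per_million):
    """Population implied by a reported count and its per-million rate."""
    return count / rate_per_million * 1_000_000


# Santa Cruz County numbers from the text: 2 deaths, 7.3/million.
pop = implied_population(2, 7.3)
print(round(pop))                     # 273973
print(round(per_million(2, pop), 1))  # 7.3
```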

2020 March 28

Exponential and logistic growth

Filed under: Uncategorized — gasstationwithoutpumps @ 16:37

I was just thinking that the current COVID-19 crisis provides a teachable moment for showing the advantage of semilogy plots (log scale on the y axis, linear on the x axis) for showing exponential data.  So I grabbed information from one of the many sources reporting the number of cases and number of deaths due to COVID-19 and plotted them in different ways.

Here is what the data looks like on a linear graph:

This shows the rapid growth of the cases, but is hard to project into the future. It is also hard to see whether the growth rate is changing or how the deaths are related to the cases.

My next step was to use a log scale and to fit exponential curves to the data (actually fitting the log of the model to the log of the data, to avoid being biased by just the most recent data). I fit the data from March 2 on, since there were small-number effects before then.
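Fitting the log of the model to the log of the data turns the exponential fit into ordinary linear regression of $\log_2(\text{cases})$ against time, with the doubling time equal to the reciprocal of the slope. A minimal sketch, assuming days and counts are plain lists and every count is positive (this is the textbook least-squares formula, not necessarily the exact fitting routine I used):

```python
import math


def fit_doubling_time(days, counts):
    """Least-squares fit of log2(count) = log2(n0) + t / doubling_time.

    Fitting in log space weights each day's relative error equally, instead of
    letting the largest (most recent) counts dominate.  Returns (n0, doubling_time).
    """
    logs = [math.log2(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    slope = sxy / sxx                    # doublings per day
    intercept = mean_y - slope * mean_t  # log2 of the fitted count at t = 0
    return 2 ** intercept, 1 / slope


# Synthetic check: 50 cases on day 0, doubling every 2.4 days.
days = list(range(15))
counts = [50 * 2 ** (t / 2.4) for t in days]
n0, doubling = fit_doubling_time(days, counts)
print(n0, doubling)
```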

The doubling time is alarmingly short for both the number of cases and the number of deaths. Of course, exponential growth models don’t work well when projected indefinitely far into the future: at some point the model predicts that more than 100% of the population is infected, which makes no sense. A better model is a logistic model, $A \frac{2^{(t-t_c)/\lambda}}{1+2^{(t-t_c)/\lambda}}$, which has three parameters:
$A$, the eventual fraction of the population affected
$t_c$, the time at which half of $A$ is affected
$\lambda$, the doubling time during the early, nearly exponential phase.
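The logistic function above translates directly into code. This sketch uses the equivalent form $A/(1+2^{-(t-t_c)/\lambda})$ when $t \ge t_c$, so that neither branch ever computes a huge power of 2; the function and argument names are illustrative:

```python
def logistic(t, A, t_c, lam):
    """A * 2^((t - t_c)/lam) / (1 + 2^((t - t_c)/lam)), evaluated without overflow."""
    x = (t - t_c) / lam
    if x >= 0:
        return A / (1 + 2.0 ** -x)  # same value; here 2^-x <= 1
    s = 2.0 ** x                    # here s < 1
    return A * s / (1 + s)


print(logistic(20, 1000, 20, 3))  # 500.0
```

At $t = t_c$ the curve is exactly $A/2$; well before $t_c$ it doubles every $\lambda$ days, and well after $t_c$ it levels off at $A$.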

I did fits of the logistic-growth model with three different assumptions about the eventual fraction affected: everyone, 10%, and 1% of the population, then fit the other parameters.
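With $A$ held fixed, the remaining two parameters can be found without a nonlinear fitter, because $\log_2\bigl(y/(A-y)\bigr) = (t-t_c)/\lambda$ is linear in $t$. This linearized least-squares fit is a swapped-in technique for illustration, not necessarily how the fits here were done; it requires every data point to satisfy $0 < y < A$, and the function name is hypothetical:

```python
import math


def fit_logistic_fixed_A(days, counts, A):
    """Fit t_c and lam of the logistic with A held fixed.

    log2(y / (A - y)) = (t - t_c) / lam is a straight line in t, so ordinary
    least squares on the transformed data gives both remaining parameters.
    """
    z = [math.log2(y / (A - y)) for y in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_z = sum(z) / n
    sxz = sum((t - mean_t) * (v - mean_z) for t, v in zip(days, z))
    sxx = sum((t - mean_t) ** 2 for t in days)
    lam = sxx / sxz                # reciprocal of the fitted slope
    t_c = mean_t - mean_z * lam    # where the fitted line crosses z = 0
    return t_c, lam


# Synthetic check: A = 1000, t_c = 40, lam = 3, observed only on days 0..20.
days = list(range(21))
counts = [1000 * 2 ** ((t - 40) / 3) / (1 + 2 ** ((t - 40) / 3)) for t in days]
print(fit_logistic_fixed_A(days, counts, 1000))
```

Calling it three times, with A set to the whole population, 10% of it, and 1% of it, gives three candidate curves. Early in an epidemic all three fit the data about equally well, which is why they can't yet be told apart.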

Right now, we can’t distinguish between any of these models, but by April 15 we may be able to see which curve we are on. I’ve added data points for Santa Cruz County and for California, which are both about five days behind the national curve.

I made no attempt to make projections for the deaths. That number seems to be growing at a slower rate than the total number of cases, which probably indicates that recent testing has been uncovering a larger fraction of the actual cases than earlier testing did, and that the actual number of cases is doubling about every 3.2 days, not every 2.4 days. If that is correct, we may see a slowdown in the number of confirmed cases that does not indicate saturation of the logistic model, but just testing catching up with the backlog of cases.
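To put the two doubling times side by side: a doubling time $\lambda$ corresponds to a daily growth factor of $2^{1/\lambda}$, and two series doubling every 2.4 and 3.2 days drift apart by a factor of $2^{d(1/2.4 - 1/3.2)}$ after $d$ days. A small sketch of that arithmetic (helper names are illustrative):

```python
def daily_growth_factor(doubling_days):
    """Multiplicative growth per day for a given doubling time: 2^(1/doubling_days)."""
    return 2 ** (1 / doubling_days)


def divergence(doubling_fast, doubling_slow, days):
    """Factor by which the faster-doubling series pulls ahead after `days` days."""
    return 2 ** (days / doubling_fast - days / doubling_slow)


print(round(daily_growth_factor(2.4), 3))  # 1.335
print(round(daily_growth_factor(3.2), 3))  # 1.242
print(round(divergence(2.4, 3.2, 14), 2))  # 2.75
```

So over two weeks, confirmed cases doubling every 2.4 days would pull ahead of actual cases doubling every 3.2 days by a factor of about 2.75, a gap that only testing catch-up could account for, and one that cannot keep growing indefinitely.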

2017 February 27

Understanding semilog plots

Filed under: Circuits course — gasstationwithoutpumps @ 10:22

I realized in doing the homework grading this weekend that few of the students in my electronics course had any intuitive understanding of semilog plots (that is, plots in which one axis is linear and the other logarithmic).  I had been assuming that providing the following plot

Forward voltage as a function of current for a 1N914BTR diode.

would let students see that the voltage grows approximately with the logarithm of the current, and that means that a voltage difference corresponds to a current ratio. Very few students got that from the picture, the formulas, or the description in the text. They almost all wanted to pretend that the diode was a linear device with voltage proportional to current (i.e., that it was a resistor), so that a 6% change in current would result in a 6% change in voltage.  The whole point of using the diode was to introduce the exponential non-linearity, so this confusion definitely needs to be cleared up.
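One way to make the semilog relationship concrete is the simplified diode equation $V \approx n V_T \ln(I/I_s)$: a voltage difference depends only on the current ratio, and each decade of current adds the same $n V_T \ln 10$ to the voltage, which is why the curve is roughly a straight line on a semilog plot. The values of $n$ and $I_s$ below are placeholders for illustration, not fitted 1N914BTR parameters:

```python
import math

V_T = 0.02585  # thermal voltage kT/q in volts, near room temperature (~300 K)

# Placeholder diode parameters for illustration, NOT fitted 1N914BTR values.
I_S = 1e-9  # saturation current in amps (hypothetical)
N = 2.0     # ideality factor (hypothetical)


def diode_voltage(current):
    """Forward voltage from the simplified diode equation V = N * V_T * ln(I / I_S)."""
    return N * V_T * math.log(current / I_S)


def delta_v(i1, i2):
    """Voltage change when current goes from i1 to i2: depends only on i2 / i1."""
    return N * V_T * math.log(i2 / i1)


def volts_per_decade():
    """Voltage added by each 10x increase in current: the slope on a semilog plot."""
    return N * V_T * math.log(10)


# A 6% current increase gives the same small step at 10 uA and at 10 mA.
for i in (10e-6, 10e-3):
    step_mv = delta_v(i, 1.06 * i) * 1000
    print(f"{i:.0e} A: V = {diode_voltage(i):.3f} V, +6% current -> +{step_mv:.1f} mV")
print(f"{volts_per_decade() * 1000:.0f} mV per decade")
```

With these placeholder values, a 6% increase in current raises the voltage by only about 3 mV, out of a forward voltage of several hundred millivolts, at any current level, and each factor of 10 in current adds about 0.12 V.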

I was going to try to explain semilog plots and exponential/logarithmic relationships in class today, but my cold has gotten so bad that I had to cancel class. That means I have another 2 days to figure out how to explain the concepts. If any of my readers can think of ways to get students to interpret semilog plots correctly, please let me know. I think that the relationships are so obvious to me that I have trouble helping students past their misunderstandings: I can’t get far enough into their mindsets to lead them out of confusion.
