Gas station without pumps

2020 October 10

Two maps

Filed under: Uncategorized — gasstationwithoutpumps @ 08:34

These maps seem to be red in many of the same places. Coincidence or causal connection?

2020 April 24

Exponential and logistic models aren’t working

Filed under: Uncategorized — gasstationwithoutpumps @ 00:01

In Exponential and logistic growth I showed how both exponential and logistic models fit the data available then (28 March), but we couldn’t really determine where the logistic function would saturate—we were early enough in the growth of the pandemic that everything still looked exponential.  I’ve been continuing to plot the data for US confirmed cases and US deaths, and I’ve added the Santa Cruz County confirmed cases as well.

The US death rate still seems to be reasonably modeled by a logistic function, but the total-case rate is not.  I’ve added a curve for the confirmed case rate in Santa Cruz County and a few points for confirmed cases in California.

I did not expect a logistic model to fit the confirmed-case data well, as we do not have a single population under uniform infection conditions (which is what gives rise to the logistic function), but instead have a sum of several different populations, each with a different infection rate, and we have infection rates that change with time, as communities impose different isolation rules. What is surprising is how well the logistic model still fits the death-rate data.
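To illustrate the point about summing populations, here is a minimal sketch with two made-up sub-populations, each following its own logistic curve. The saturation levels, growth rates, and midpoints are invented for illustration and are not fitted to any of the data discussed here. A single logistic curve has one peak in new cases per day, so counting the peaks of the combined curve is a quick check on whether the sum still looks logistic.

```python
import math

def logistic(t, cap, rate, midpoint):
    """Cumulative cases at day t for one population under uniform conditions."""
    return cap / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical sub-populations: (saturation level, growth rate per day, midpoint day).
# These numbers are invented for illustration, not fitted to any data.
SUBPOPULATIONS = [(100_000, 0.30, 30), (400_000, 0.10, 90)]

def total_cases(t):
    """Cumulative cases summed over all sub-populations."""
    return sum(logistic(t, cap, rate, mid) for cap, rate, mid in SUBPOPULATIONS)

# New cases per day for the combined population.
daily_new = [total_cases(t + 1) - total_cases(t) for t in range(181)]

# A single logistic curve has exactly one peak in new cases per day.
# Find the local maxima of the combined curve.
peaks = [t for t in range(1, len(daily_new) - 1)
         if daily_new[t - 1] < daily_new[t] > daily_new[t + 1]]
print("peaks in daily new cases at days:", peaks)
```

With these invented parameters the combined curve has more than one peak, which no single logistic function can reproduce. Real data would also have infection rates that change over time, which this sketch does not model.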

The logistic model may be failing on the confirmed-case data for another reason: saturation of testing. We can see this effect somewhat more clearly if we use a linear scale for the y axis:

The growth of confirmed cases is linear now, neither exponential nor logistic.  The attempt to fit a logistic function results in the early curve being way too high and the late curve coming down too low, in an attempt to stretch out the inflection point to match the straight-line growth.  Both California and Santa Cruz County confirmed cases seem to be growing linearly rather than logistically also—probably because of the very limited testing in California.

If we are limited by the number of tests done, then the growth of confirmed cases is limited by the number of tests, rather than the number of actual cases.
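A toy model makes the effect concrete. The sketch below assumes that true new cases grow exponentially and that the number of cases confirmed per day can never exceed a fixed testing capacity. The function name, the parameter names, and all of the numbers are hypothetical, chosen only to show the shape of the curve.

```python
def cumulative_confirmed(days, initial_new_cases, daily_growth, max_confirmations_per_day):
    """Toy model: confirmed cases per day are capped by testing capacity."""
    totals = []
    total = 0.0
    for day in range(days):
        true_new = initial_new_cases * daily_growth ** day       # exponential spread
        confirmed_new = min(true_new, max_confirmations_per_day)  # capped by testing
        total += confirmed_new
        totals.append(total)
    return totals

# All numbers here are hypothetical.
curve = cumulative_confirmed(days=40, initial_new_cases=10,
                             daily_growth=1.25, max_confirmations_per_day=500)

# Day-to-day increases in the cumulative confirmed count.
increments = [later - earlier for earlier, later in zip(curve, curve[1:])]
print("early increments:", increments[:3])
print("late increments: ", increments[-3:])
```

The early increments grow by a constant factor (exponential growth), but once the true case count passes the cap, every increment equals the cap and the cumulative curve becomes a straight line, whatever the real epidemic is doing.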

Some people have suggested looking at the fraction of tests that are positive to track the progress of the pandemic, under the assumption that testing is being rationed more or less the same in different places. (That doesn’t seem to be true: in some parts of the country it is almost impossible for someone to get the test unless they are a hospital worker or about to be admitted to a hospital, while other places are testing more aggressively.)

Nate Silver of FiveThirtyEight has made an attempt to compute the positive rate for different states in Coronavirus Cases Are Still Growing In Many U.S. States.  There is a lot of difficulty doing this, as states have been even sloppier about reporting negative test numbers than positive ones, and so Nate attempted to smooth out the numbers.  He reports positive rates from 1% (Hawaii) to 54% (New Jersey), with California fairly low at 7%. Santa Cruz County is very low, with about 3.7% positive tests (114 confirmed cases in 3067 total tests).
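The Santa Cruz County figure is simple arithmetic on the two counts quoted above. A small sketch (the function name is just for illustration):

```python
def positive_rate_percent(confirmed_cases, total_tests):
    """Percentage of tests that came back positive."""
    return 100.0 * confirmed_cases / total_tests

# Santa Cruz County counts quoted in the post.
santa_cruz = positive_rate_percent(114, 3067)
print(f"{santa_cruz:.1f}%")  # prints 3.7%
```

Note that this uses raw cumulative counts, with none of the smoothing that the state-level numbers went through.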

I suspect that part of the low rate for California is that much of the testing is being spent on retesting asymptomatic health-care workers, because there aren’t enough symptomatic cases to saturate the testing capacity, while New Jersey is overwhelmed with cases and only testing the very ill.

In Death rate from COVID-19 by county, I commented that the AP map in Death rate from COVID-19 by county (same name, but different links!) looked a little strange, with empty areas right next to hot spots.  It seems that a lot of the strangeness was due to incomplete data sets.  For quite a while, almost all the counties in Virginia were blank, despite the death rate in Virginia as a whole going up, then the county data finally made it to the database used and the map was quickly filled in.  Rhode Island county-level data is still inaccurate—the state has death rates comparable to the surrounding states, despite the counties all being reported with tiny numbers.  The hot spot in southwestern Georgia doesn’t seem to be getting much press—maybe because it is a rural, black population in a state controlled by white Republicans.

California is still looking fairly good, though four counties now have death rates over 50/million, with Los Angeles the worst at 78.9/million (Santa Cruz County is still fairly low with 2 deaths or 7.3/million).  In contrast, the highest death rate is New York City at 1941/million, which exceeds that of any country (San Marino is highest at 1191/million).

2020 April 16

Death rate from COVID-19 by county

Filed under: Uncategorized — gasstationwithoutpumps @ 20:37

The Associated Press has finally released a more interesting map of the COVID-19 distribution in the US: Death rate from COVID-19 by county, which shows deaths/100,000. Previously they were doing a map by raw numbers, which showed hot spots wherever the population was high, but did not distinguish between true hot spots, like New York City, and relatively mild outbreaks, like Los Angeles. This map is also based on death counts, which are more uniformly counted than confirmed cases (though still subject to varying definitions).

The high death rates in New York, New Jersey, Louisiana, Michigan, and Massachusetts have been in the news a lot, but I’ve not seen much about the hot spot in southwestern Georgia.  Also, the geographic distribution is a little strange—some areas have very high death rates, while neighboring ones are still near zero (contrast Rhode Island with its neighbors New York and Massachusetts).

This new map has some problems of its own—some “hot spots” are due to small-sample effects in unpopulated areas, like Oldham County, TX, with one death giving a rate of 47.85/100,000 and the highest death rate in California being Mono County, where one death gives 7.06/100,000.  The highest death rate in California that isn’t just a small-sample artifact is Los Angeles County, at 4.53 deaths/100,000, which is way less than New York City’s 136/100,000, which exceeds the death rate of any country (even tiny San Marino, the country with the world’s highest COVID-19 death rate, has only 113.2/100,000).
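The small-sample effect can be checked by back-computing the population implied by each quoted rate. This is a rough sketch: the populations are derived from the rates above rather than taken from census figures, and the function names are only for illustration.

```python
def rate_per_100k(deaths, population):
    """Death rate expressed per 100,000 residents."""
    return 100_000 * deaths / population

def implied_population(deaths, rate):
    """Population consistent with a death count and a rate per 100,000."""
    return 100_000 * deaths / rate

# One death in each county, with the rates quoted in the post.
oldham_pop = implied_population(1, 47.85)
mono_pop = implied_population(1, 7.06)
print(round(oldham_pop), round(mono_pop))

# In a county this small, a single additional death doubles the rate.
print(rate_per_100k(2, oldham_pop))
```

The implied populations come out to roughly 2,090 for Oldham County and 14,164 for Mono County, so a single death moves either rate by an amount that would take hundreds of deaths in a large county.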

Despite the problems with small samples for lightly populated counties, this map gives a much more revealing picture of where there are serious problems with COVID-19 than other maps I have seen.

2011 July 4

Interactive Maps

Filed under: Uncategorized — gasstationwithoutpumps @ 15:45

In recent months, I’ve noticed a growing trend for newspapers, government agencies, and think tanks to present data on interactive maps, which are a fun way to explore certain types of data.  Here are three recent examples:

An interactive map can be a very handy way to look at something that varies from place to place, but can be highly misleading, if the variation is due to some cause other than location.

For example, the broadband availability map claims to be about broadband availability for schools. I searched for Santa Cruz County and got a pretty map:

Broadband map from data.ed.gov showing Santa Cruz County.

Unfortunately you have to drill down to the individual schools, clicking on each one, to find out that there is no data whatsoever about the connectivity of the schools. They have taken a map of what some (unspecified) vendors have claimed is available in the community and overlain it with the locations of the schools. The implied statement about the availability of broadband at the schools is pure conjecture.

This sort of interactive map, which seems to be providing one sort of data, while actually providing something else, is such bad data display as to border on fraud. An academic researcher who tried to publish something like that using a government grant would probably get investigated for scientific fraud (but the government’s own publications, produced at much higher cost, are not subject to any sort of review).

The New York Times has some strange things on their interactive map also. Because they treat all data in units of a county, the resolution of the map varies enormously. At 20,105 square miles, San Bernardino County has more area than 9 states (Rhode Island, Delaware, Connecticut, New Jersey, New Hampshire, Vermont, Massachusetts, Hawaii, and Maryland).

So the resolution of the map in California is terrible. Does it matter?  The map makers intended for people to infer correlation (about metropolitan or rural unemployment, for example) from the map, and so provided selection for metropolitan areas. Note that the whole Central Valley, which is primarily agricultural, has been listed as a “metropolitan area”, because the total population of the large counties exceeds the NY Times cutoff; even Shasta County has been considered a metropolitan area.  This map is essentially useless for separating rural from metropolitan areas, and so the main correlation that the map-maker wanted us to see is essentially not possible (at least not in the large-county states; it may work in New York State, which is probably all that the map-maker thought about).

Map of unemployment in "metropolitan areas" in California, from the NYTimes interactive map.

Maps are useful for seeing geographic patterns, which sometimes lead to conjectures about causation. For example, combining the NY Times unemployment map with the Brookings Institution map of population age distributions leads to the question “Is unemployment low in the Great Plains because everyone of working age has moved out?”

One of several images available from the Brookings Institution interactive map.

Maps may be good for coming up with conjectures, but they are no substitute for scatter diagrams or histograms for following up on the conjecture.  Unfortunately, with some data at the state level, some at the county level, some at the Congressional District level, and some at the census tract level, it can be very difficult to put together convincing arguments from disparate data sources.

Luckily, as a blogger, I’m not responsible for creating such data analyses.

2011 February 8

World Family Names

Filed under: Uncategorized — gasstationwithoutpumps @ 06:48

There is a moderately interesting graphical presentation of the geographical distribution of family names at World Family Names. The presentation uses a Flash viewer, so it is difficult to capture other than as screenshots.

Map of Austria with high incidence of family name

Here is a screenshot of the map of Austria with my family name showing high frequency in a surprising region.

Interestingly, it gives my rather rare family name a high incidence in Feldkirchen in Kärnten, Hermagor, Austria.  I’m not aware of any relatives there, so either someone has moved or there is a lost branch of the family (which seems unlikely given how complete the family tree is).  I suppose that the data cannot be expected to be very accurate for rare names.
