How to map the covid-19 outbreak
By comparison with yesterday’s map from the CDC, here’s a map with the kind of granularity that the CDC has the data to produce and that would inform the public meaningfully. It was published on the US news website of The Guardian newspaper (London):
The states are important to show, because of the importance of state governments in treating and limiting the pandemic. At the same time, the foci of recorded cases are shown by simple graduated circles that give a clear sense of where within each state the infected people are.
[update 16 March] The quality of the data about infection has not been very good in the US, because of the lack of testing equipment and policy decisions. And now, as the virus spreads through communities, cases are likely outstripping the ability to test and therefore to record. As people are encouraged to stay home and self-isolate if they show symptoms, only the really sick are being tested as they go to hospitals. In this respect, it makes sense to no longer map the outbreak in as granular a manner as The Guardian did last week. Here’s the map from this morning:
Now don’t everyone rush off to West Virginia!
Also, The Guardian has a set of maps of the outbreak by region and then a “repeated multiple” of maps to show the spread. It’s now almost global:
[update 20 March 2020] Today’s maps indicate why the Guardian’s approach to mapping the incidence of covid-19 infections in the US is better than that of the CDC. Here is the current CDC map, with data through 19 March:
Notice two things. First, in order to maintain the same number of categories (8, actually one or two more than the preferred 5–7 to permit ready discrimination by viewers) as the magnitude of the values has increased, the CDC has had to change the data range for each category. This is sort-of ok, in that it gives a broad idea of the incidence of the virus, but it precludes comparing each day’s maps to develop an analysis over time, except for the apparent severity of the infection in each state compared to the others. But, again, by using the choropleth to show absolute values, this map does not really show actual severity (bearing in mind, of course, that the quality of the data set is still not great).
Second, the category ranges have clearly been chosen to emphasize variation in the lower end of the data. The lowest category has a range of 4 cases (1–5), the highest 3,999 (1,001–5,000). Overall, the classification scheme is geometric so that the ranges of the categories increase exponentially, but inconsistently so, and with a variable number of values in each category: 4 (n=3), 44 (n=26), 49 (n=12), 99 (n=3), 299 (n=7), 499 (n=1), 3,999 (n=2). In principle, a geometrically increasing classification scheme emphasizes differences within a data set that is heavily skewed to the low end. In this case, however, the CDC has not implemented it well, and about half of the states and territories are grouped in a single category (6–50) that does, just, include the median (47, I think).
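For anyone who wants to check the arithmetic, here is a rough sketch in Python. The class bounds are my reconstruction from the ranges quoted above (only 1–5, 6–50 and 1,001–5,000 are stated outright; the others are inferred from the widths), so treat them as an assumption rather than the CDC's published legend. The counts are the n values given in the paragraph.

```python
# Class bounds (low, high) and number of states/territories in each class.
# Bounds other than 1-5, 6-50 and 1,001-5,000 are inferred from the
# widths quoted in the post: an assumption, not the CDC's own legend.
classes = [
    (1, 5, 3),
    (6, 50, 26),
    (51, 100, 12),
    (101, 200, 3),
    (201, 500, 7),
    (501, 1000, 1),
    (1001, 5000, 2),
]

total = sum(n for _, _, n in classes)

# Width of each class, as counted in the post (high minus low).
widths = [hi - lo for lo, hi, _ in classes]

# Growth factor from one class width to the next; a consistent
# geometric scheme would give roughly the same factor every time.
ratios = [round(b / a, 2) for a, b in zip(widths, widths[1:])]

# Share of all mapped units that falls in each class.
shares = [round(n / total, 2) for _, _, n in classes]


def class_of(value):
    """Return the (low, high) bounds of the class containing value."""
    for lo, hi, _ in classes:
        if lo <= value <= hi:
            return (lo, hi)
    return None


print("widths:", widths)
print("growth factors:", ratios)
print("shares:", shares)
print("median of 47 falls in:", class_of(47))
```

The growth factors jump around rather than holding steady, which is the inconsistency noted above, and the share for the 6–50 class comes out at just under half.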
I am struck by how unhelpful the CDC map is in the context of the big news today that the virus is exploding in New York state, more particularly in New York City and its suburbs, so that New York has outstripped California and Washington. Then I looked at today’s Guardian map, and the degree of the problem is much more apparent:
Wow! This brings the story home. I must admit that the Guardian’s map lacks a scale for the graduated circles, so it’s hard to know how impressionistic it is and whether the newspaper has followed the CDC in altering its classification/scaling to match the increasing range of data values. We will see in a couple of days’ time when the circles for New York and its neighboring states start to overlap.
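Since the Guardian gives no scale, here is a minimal sketch of the usual convention for graduated circles, in which the area of the circle (not its radius) is proportional to the value. I have no idea whether the Guardian does it this way. The function name, the 40-unit maximum radius, and the choice of the largest value as the reference are all my own assumptions for illustration. The two case counts are the approximate New York figures from the 22 March update.

```python
import math


def circle_radius(value, max_value, max_radius=40.0):
    """Radius for a graduated circle whose AREA is proportional to value.

    max_radius is the radius drawn for max_value (hypothetical default).
    """
    if value <= 0 or max_value <= 0:
        return 0.0
    # Area scales with the square of the radius, so take the square root.
    return max_radius * math.sqrt(value / max_value)


# Approximate New York counts mentioned in this post.
earlier, later = 5_000, 16_000

r_earlier = circle_radius(earlier, later)
r_later = circle_radius(later, later)

print("radius for", earlier, "cases:", round(r_earlier, 2))
print("radius for", later, "cases:", round(r_later, 2))
print("area ratio:", round((r_later / r_earlier) ** 2, 2))
```

Under this convention a threefold rise in cases makes the circle less than twice as wide, which is one reason a map without a legend for the circles is hard to read precisely.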
[update 22 March 2020] Wow, New York is getting hit hard: 5,000 or so cases two and a half days ago, almost 16,000 this afternoon. Ufda! Here’s the latest map from the Guardian: