Acceleration of COVID-19 Cases

We can visualize how an epidemic is growing in different countries by plotting the number of new cases on a map. However, we might also be interested in the second derivative, or how fast the number of new cases itself is growing.

I am not sure if there is a common name out there for such a metric, but for the purpose of this blog post let’s call it acceleration.

I was curious to see what the acceleration of reported COVID19 cases is among different countries, so I plotted the second derivative of reported case count (\(\frac{d^2 n}{dt^2}\)).

I created this map using publicly available data on March 3rd, 2020.

Reported numbers represent the difference between the number of the new COVID-19 cases between March 2nd, and March 1st, 2020. The unit is case/day2. Positive numbers show an increase in the rate or speed of detecting new cases (acceleration), while negative values suggest a decrease in the speed of case detection (deceleration).

The second map shows the number of newly reported cases (\(\frac{dn}{dt}\)) on the same day:

The number of cases can still be growing even if a country is experiencing a negative acceleration. Take China. The number of reported cases was decelerating fastest in China, however China still ranked 4th in the number of new cases reported on March 2nd. What this means is that new cases are being detected, albeit at a slowing pace compared to the previous days.

A change in the speed of the case detection can reflect many different things. It can be due to an actual change in disease transmission rate or simply be an artifact of a change in the number of people getting tested, due to changes in policy or availability of tests. It is important to underscore that we cannot make much sense out of this data, in the absence of context and subject knowledge.

The two maps above are based on the publicly available dataset on Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE.

We can use the same dataset to produce similar maps in R. One big advantage of using R is that we can create a pipeline from the data source to the visualization that can easily update plots and maps as newer data becomes available.

The first code snippet shown below downloads latest data from Johns Hopkins University in order to calculate the average rate and acceleration of cases during the past three days:

library(tidyverse)
library(lubridate)
library(ggthemes)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(RColorBrewer)

#downloading latest data from JHU

 caseType <- "confirmed_global"
      url <- paste0("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_", caseType, ".csv")
      time_series_19_covid_Confirmed <- read_csv(url)
      
      covidCases <- time_series_19_covid_Confirmed %>% rename (name = "Country/Region") %>%
#        filter (name != "US" | (name == "US" & `Province/State` %in% state.name)) %>% group_by(name) %>%
                group_by(name) %>%
                summarise_at(vars(5:(length(time_series_19_covid_Confirmed)-1)), sum, na.rm = FALSE)  %>% mutate(name = replace(name, name == "US", "United States"))

Now we define a function, called derivative that takes the number of cases in each day, and calculates the first order derivative of it. Applying it once gives us the rate, and applying it twice gives us the second order derivative, or the acceleration.

# Calculating the average acceleration during the past three days

derivative <- function(sites) {
  for (i in 1:dim(sites)[1]) {
    for (j in length(sites):6){
      sites[i, j] <- as.numeric(sites[i, j] - sites[i, j-1])
    }  
  }
  return(sites)
}  

covidRate         <- derivative(covidCases)
covidAcceleration <- derivative(derivative(covidCases))

We can then average both the rate and the acceleration over the past three days and merge this data with geograhical data to create a map, as shown below:

# averaging acceleration over the past 3 days
covidAcceleration[, "threeDayAcceleration"] <- as.vector((covidAcceleration[length(covidAcceleration)] + covidAcceleration[length(covidAcceleration)-1] + covidAcceleration[length(covidAcceleration)-2])/3)

# merging datasets and plotting the map of Coronavirus
world <- ne_countries(scale = "medium", returnclass = "sf")

covidAccelerationWorld <- world %>% left_join(covidAcceleration)

world_points<- st_centroid(world)
world_points <- cbind(world, st_coordinates(st_centroid(world$geometry)))

ggplot(data = covidAccelerationWorld) +
  geom_sf(aes(fill=threeDayAcceleration)) + scale_fill_distiller(type = "div", palette = "RdBu", aesthetics = "fill") +
  coord_sf(xlim = c(-30, 130), ylim = c(-0, 70), expand = FALSE) +
  ggtitle("Acceleration of Reported COVID-19 Cases", paste0("as of ", lubridate::now(), " PST")) +
  theme_tufte()

A similar piece of code can be used to create a map of the rate of reported cases:

Alternatively, we can visualize this using the Leaflet package, to give the map a more sleek look and interactive features:

## Plotting with Leaflet
library(leaflet)

covidAccelerationWorldLeaf <- covidAccelerationWorld %>% filter (!is.na(threeDayAcceleration))
map <- leaflet(data = covidAccelerationWorldLeaf) %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)

# Create a continuous palette function
clrs <- rev(brewer.pal(11, "RdBu"))
scale_range <- c(-1, 1) * max(abs(covidAccelerationWorldLeaf$threeDayAcceleration))
         
pal <- colorNumeric(
       palette = clrs,
      domain = scale_range)

# Apply the function to provide RGB colors to addPolygons
labels <- sprintf(
  "<strong>%s</strong><br/>%g cases/day<sup>2</sup>",
   covidAccelerationWorldLeaf$name,  round(covidAccelerationWorldLeaf$threeDayAcceleration, 1)
) %>% lapply(htmltools::HTML)

map %>%
  addPolygons(smoothFactor = 0.2, fillOpacity = 0.7,
              fillColor = ~pal(threeDayAcceleration), weight = 0.5, color = "white", opacity = 1, dashArray = "3", 
  highlight = highlightOptions(
    weight = 0.5,
    color = "#666",
    dashArray = "",
    fillOpacity = 0.7,
    bringToFront = TRUE), 
  label = labels,
  labelOptions = labelOptions(
    style = list("font-weight" = "normal", padding = "3px 8px"),
    textsize = "15px",
    direction = "auto"))  %>% 
  addLegend("bottomright", pal = pal, values = ~threeDayAcceleration,
    title = "Acceleration",
    opacity = 0.6
  )

3-Day Average Acceleration of Reported COVID-19 Cases as of 2020-03-24 11:06:40 PST

Similarly, we can create a Leaflet map for the rate of reported cases:

3-Day Average Rate of Reported COVID-19 Cases as of 2020-03-24 11:06:44 PST

A bar chart can make it a bit easier to compare rate and acceleration among certain countries:

#selecting countries for bar chart
targetBar <- c("Australia",   
            "Austria",
            "Canada",
            "China",
            "France",
            "Germany",
            "Hong Kong",
            "Iran",
            "Italy",
            "Japan",
            "Kuwait",
            "Malaysia",
            "Netherlands",
            "Singapore",
            "South Korea",
            "Spain",
            "United Kingdom",
            "United States")

# plottting the bar chart
barChartDataAcceleration <- covidAcceleration %>% filter (name %in% targetBar & !is.na(threeDayAcceleration))
ggplot(data = barChartDataAcceleration) +
  geom_col(aes(y = threeDayAcceleration, x = reorder(name, threeDayAcceleration), fill=threeDayAcceleration)) +
        scale_fill_distiller(type = "div", palette = "RdBu",  limits = c(-1, 1) * max(abs(barChartDataAcceleration$threeDayAcceleration)), aesthetics = "fill")+
  coord_flip() + xlab ("") + ylab ("acceleration") + 
  ggtitle("Acceleration of Reported COVID-19 Cases") +
  labs(caption = paste0("(Rolling 3-day average as of ", lubridate::now(), " PST)")) + 
  theme_tufte()

A similar piece of code can be used to produce the bar chart for the rate of reported cases:

All of our visualizations so far showed rate and acceleration at one point in time. We may also be interested in the trend of acceleration over the past few weeks:

# Plotting accelration over time
targetLine <- c("Canada", 
            "China",
            "France",
            "Iran",
            "Germany",
            "Italy",
            "South Korea",
            "United States")

rollingAvg <- function(data) {
  for (i in 1:dim(data)[1]) {
    for (j in length(data):4){
      data[i, j] <- as.numeric(((data[i, j] + data[i, j-1] + data[i, j-2])/3))
    }  
  }
  return(data)
} 

lineDataAcceleration <- rollingAvg(barChartDataAcceleration) %>% select(-threeDayAcceleration) %>% pivot_longer(cols = -1, names_to = "date", values_to = "acceleration") %>% mutate(date=mdy(date)) %>%filter(date>"2020-02-21" & name %in% targetLine)

# The palette with black:
colourBlindPal <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

ggplot(data = lineDataAcceleration, aes(x=date, y=acceleration, colour = name)) +
  geom_line(size=1) + xlab ("") + ylab ("cases/day^2") +
  ggtitle("Acceleration of Reported COVID-19 Cases") + 
  labs(caption = paste0("(Rolling 3-day average as of ", lubridate::now(), " PST)")) + 
  scale_colour_manual(values=colourBlindPal) +
  theme_tufte() +
  theme(legend.title=element_blank()) 

And for the trend of rate: