Patrick Ball, Human Rights Data Analysis Group
Kristian Lum, Human Rights Data Analysis Group
Tarak Shah, Human Rights Data Analysis Group
Megan Price, Human Rights Data Analysis Group
Right now, the world is eager for information about the impact of the coronavirus and its disease, COVID-19. How many people will be infected? How many people will ultimately die?
These are largely questions for epidemiologists. We are not epidemiologists; we are statisticians who work for the Human Rights Data Analysis Group. We use data to learn about the world, and we study mortality caused by deliberate violence, not infectious disease. But many of the tools and concepts we use in our work – specialized statistical models designed to help us read incomplete and noisy data collected in situations with high uncertainty – give us insight into the models, assumptions, and data used by scientists to learn about the currently unfolding epidemic. That said, we want to be clear: what we are doing here is interpreting the good work of other scientists, not reporting our own. This essay is based on two extraordinarily good studies (see note 1) which were recently discussed on the blog of one of the most eminent statisticians in the US. You can read the conversation here.
An important aspect of the disease that we all would like to understand is how deadly it is. Of those people who become infected, how many die? If we had complete data on every infection and every death, this would be simple. We could simply divide (number who die) / (number infected) to get our answer. Unfortunately, the data are far from complete.
The World Health Organization gets data from different countries and regions about how many people are diagnosed with COVID-19, and how many of those people have died. But not all people who are infected are diagnosed. As has been pretty clear in the US recently, not very many people are being tested, even when they are exhibiting symptoms. Many of the people who are infected only develop mild symptoms, or no symptoms at all, and therefore never get tested. If we try to come up with a fatality rate based on the numbers we do have, we’ll underestimate the number of people who are infected, and therefore we will overestimate how deadly the virus is.
What we need to know is how many people are actually infected in a given region – those who have been diagnosed plus those who have not. Specifically, we need to estimate what proportion of infections go on to be diagnosed. This is sometimes called the reporting rate.
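To make the arithmetic concrete, here is a minimal Python sketch of how a reporting rate changes a fatality estimate. The function names and the counts (1,000 diagnosed cases, 30 deaths, a reporting rate of 25 per cent) are invented for illustration and come from neither study. The sketch also assumes, for simplicity, that every death is counted and that there is no delay between infection and death.

```python
def naive_fatality_rate(deaths, diagnosed):
    """Deaths divided by diagnosed cases only."""
    return deaths / diagnosed


def adjusted_fatality_rate(deaths, diagnosed, reporting_rate):
    """Deaths divided by the estimated total number of infections.

    reporting_rate is the proportion of infections that get diagnosed,
    so estimated infections = diagnosed / reporting_rate.
    """
    if not 0 < reporting_rate <= 1:
        raise ValueError("reporting_rate must be greater than 0 and at most 1")
    estimated_infections = diagnosed / reporting_rate
    return deaths / estimated_infections


# Hypothetical counts, for illustration only.
diagnosed = 1000
deaths = 30

naive = naive_fatality_rate(deaths, diagnosed)
adjusted = adjusted_fatality_rate(deaths, diagnosed, reporting_rate=0.25)
print(f"naive: {naive:.2%}, adjusted: {adjusted:.2%}")
```

With these made-up numbers, the naive rate is 3 per cent, but if only a quarter of infections are diagnosed, the same deaths are spread over four times as many infected people and the rate falls to 0.75 per cent.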
Two recent studies try to answer this question. One study by Li et al. (cited below) bases its estimates of the undiagnosed infected population on data on the number of confirmed infections and deaths in cities across China, and data on how much travel took place between each of those cities. Li et al. estimate that only about a seventh of all infected people are diagnosed. One of the authors of the Li et al. study estimates that the likelihood that an infected person dies is approximately 0.24–0.48 per cent, on average.
In a different study, Riou et al. (also cited below) build a model from a series of assumptions about how the disease progresses. The assumptions are not wild guesses; they are based on measurements made by the Chinese public health agency plus what we know from previous epidemics. Interestingly, they also make use of data on infections among passengers on a cruise ship. These data are unique: because a stranded cruise ship has a small, self-contained population, everyone on board, passengers and crew, could be tested. In this one case, it was actually possible to know how many infected people there were.
In the real world, we cannot know the true number of infected people because not all people can be tested. Riou et al. used the information they learned from the cruise ship population to inform their model about the much larger general population. They put their models together and estimated a total fatality rate of about 1.6 per cent. That means they estimate that 1.6 per cent of the people infected will die, on average.
But the average masks the real risk to each person. All mortality risks are age-specific, and it’s very clear that elderly people who are infected with this strain of coronavirus are at much greater risk of death than younger people (both papers analyze the risk for different ages). Similarly, people with hypertension, diabetes and other diseases are also at relatively greater risk.
These studies use different models, and each model rests on its own set of assumptions. Both sets of assumptions seem reasonable to us given what we know at the moment. Yet the two papers come to very different conclusions about the deadliness of the disease. We think one key difference between the studies (but not the only one) is that Li et al. estimate that only about 14 per cent of all infected people are documented, while Riou et al. estimate that about 27 per cent of infections were accurately documented by the Chinese public health authorities.
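A rough sketch of why the documentation rate matters so much: the snippet below takes the two rates quoted above (14 per cent and 27 per cent) and shows how many total infections each would imply for the same number of diagnosed cases. The figure of 1,000 diagnosed cases and the helper name implied_infections are hypothetical, and this is not how either study actually computes its estimate.

```python
def implied_infections(diagnosed, reporting_rate):
    """Total infections implied if only reporting_rate of them are diagnosed."""
    return diagnosed / reporting_rate


diagnosed = 1000  # hypothetical count, for illustration only

# Documentation rates as quoted in the text for each study.
reporting_rates = {"Li et al.": 0.14, "Riou et al.": 0.27}

implied = {
    name: implied_infections(diagnosed, rate)
    for name, rate in reporting_rates.items()
}

for name, total in implied.items():
    print(f"{name}: about {total:,.0f} infections per {diagnosed:,} diagnosed")

# How many times more infections the lower rate implies.
ratio = implied["Li et al."] / implied["Riou et al."]
print(f"ratio: {ratio:.2f}")
```

For a fixed number of deaths, the lower documentation rate implies nearly twice as many infections and so roughly half the fatality rate. That accounts for part, but not all, of the gap between the two studies' estimates, which fits the point that the documentation rate is not the only difference between them.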
These differences reflect just how much uncertainty there is right now. It is right that the scientific models differ – because we genuinely don’t have enough data to know which set of assumptions is more realistic. The best data would include tests of people who have symptoms and some who don’t. That would give us a better sense of how many people are infected but not sick. However, testing is expensive and in short supply, so in the short term, testing will be mostly for those with symptoms. We know some asymptomatic people are being tested (especially if they have a known exposure or live in areas with high testing rates) but we don’t know how many. In time, we are likely to have more and better data through better tracking and more effective data sharing among national public health authorities (for more information on how testing is happening, see this summary). With better data, we will learn which models make more sense, and we will improve the modeling of the fatality rate – and learn how to decrease it.
What is certain now is that there are steps we can take in our day-to-day lives to slow the spread of COVID-19. Neither the infectiousness nor the deadliness of the disease is set in stone. By washing our hands and engaging in social distancing, we know that we can slow down how quickly the disease spreads. That in turn will reduce the burden on our medical systems, which means more care for people who are infected and gravely ill, and more care means fewer deaths.
1. The first study is Riou, J, MJ Counotte, A Hauser, and CL Althaus. 2020. “Adjusted age-specific case fatality ratio during the COVID-19 epidemic in Hubei, China, January and February 2020.” medRxiv. The second study is Li, R, S Pei, B Chen, Y Song, T Zhang, W Yang, and J Shaman. 2020. “Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (COVID-19).” medRxiv.
Image © National Institute of Allergy and Infectious Diseases (NIAID)