Leaked Chinese Virus Database Covers 230 Cities, 640,000 Updates
New information may offer insight into the honesty of China’s coronavirus numbers.
Beijing claims that since the coronavirus pandemic began at the end of last year, there have been only 82,919 confirmed cases and 4,633 deaths in mainland China. Those numbers could be roughly accurate, in which case a detailed account would be an important tool in judging the spread of the virus. But it’s also possible that the numbers presented to the rest of the world are vastly understated compared to Beijing’s private figures. The opacity of the Chinese Communist Party’s system, and its mistrust of outsiders, make that hard to judge, but learning more about the coronavirus data used directly by Chinese officials is invaluable for governments elsewhere. A dataset of coronavirus cases and deaths from the military’s National University of Defense Technology, leaked to Foreign Policy, offers insight into how Beijing has gathered coronavirus data on its population. The source of the leak, who asked to remain anonymous because of the sensitivity of sharing Chinese military data, said that the data came from the university. The school publishes an online coronavirus data tracker that matches the leaked information, except that it is far less detailed: It shows only a map of cases, not the underlying data.
The dataset contains inconsistencies, and it may not be comprehensive enough to contradict Beijing’s official numbers, but it is the most extensive dataset known to exist on coronavirus cases in China. More importantly, it could serve as a valuable trove of information for epidemiologists and public health experts around the globe: a dataset that Beijing has almost certainly not shared with U.S. officials or doctors. (The World Health Organization and the U.S. Centers for Disease Control and Prevention did not immediately respond to requests for comment.)
While not fully comprehensive, the data is incredibly rich: There are more than 640,000 updates covering at least 230 cities. In other words, there are 640,000 rows, each purporting to show the number of cases at a specific location at the time the data was gathered. Each update includes the latitude, longitude, and “confirmed” number of cases at the location, for dates ranging from early February to late April.
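Because each row is a snapshot of the count at one location on one date, the 640,000 updates are not 640,000 separate cases; adding up every row would count the same patients once per update. A minimal sketch of how a researcher might collapse such snapshots, assuming a hypothetical record with only the fields described above (the leaked file’s real column names and format are not public), keeps just the most recent count for each location:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Update:
    """One row of the dataset as the article describes it.

    Hypothetical field names: the leaked file's actual columns are not public.
    """
    latitude: float
    longitude: float
    reported_on: date
    confirmed: int


def latest_confirmed(updates: list[Update]) -> dict[tuple[float, float], int]:
    """Return the most recent "confirmed" count for each location.

    Each row is a snapshot, so summing every row would count the same
    cases once per update. Keeping only the latest row per location
    (keyed here by its coordinates) avoids that double counting.
    """
    latest: dict[tuple[float, float], Update] = {}
    for u in updates:
        key = (u.latitude, u.longitude)
        prev = latest.get(key)
        # Replace the stored row only if this one is strictly newer.
        if prev is None or u.reported_on > prev.reported_on:
            latest[key] = u
    return {key: u.confirmed for key, u in latest.items()}
```

Summing the values of the returned dictionary then gives one count per location rather than one per update. Keying on coordinates is itself an assumption: it only works if a given place is always recorded with identical latitude and longitude, which may not hold in the real data.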
For locations in and around the center of the outbreak in Wuhan, Hubei province, the data also includes deaths and those who “recovered.” It’s unclear how the dataset’s authors define “confirmed” and “recovered”: Like other countries, China has updated its counting methods, as demonstrated in mid-February when Hubei’s reported cases spiked because officials announced they were including patients diagnosed with CT scans. But China’s outbreak, unlike those elsewhere, peaked before rigorous testing methods were widely available, and the Communist Party often manipulates data for political purposes.
The data reviewed by Foreign Policy includes hospital locations, but it also includes place names corresponding to apartment compounds, hotels, supermarkets, railway stations, restaurants, and schools across the breadth of the country. The dataset reports one case of coronavirus in a KFC in the eastern city of Zhenjiang on March 14, for example, while a church in the northeastern provincial capital of Harbin saw two cases on March 17. (The data does not include the names of the individuals who contracted or died from the disease, and the reports of the cases in the dataset could not be independently verified.)
It’s not yet clear how the university gathered the data. The online version says it aggregated the data from the National Health Commission (China’s health ministry), media reports, and other public sources. According to its website, the university, based in the central Chinese city of Changsha, is “under the direct leadership of the Central Military Commission,” the body that oversees China’s military. The military has played a large role in mobilizing against the virus: It has helped enforce quarantines, transport supplies, and treat patients. A propaganda message on a prominent military website in China reads, “In the fight against the epidemic, the people’s army is on the move!”
The man most responsible for building the database appears to be Zhang Haisu, a director at the school’s Information and Communication Department. In a May press release, the university credits Zhang with building the “Fight the Virus to Return to Work Database” and praises his dedication. A note on the data tracker’s website reads, “Currently our country is taking forceful measures, and the epidemic situation is being strictly managed and controlled. Please understand this correctly when using the relevant data.” The site features a contact email for a Zhang Haisu; no one responded when Foreign Policy reached out. The university did not respond to a request for comment.
Foreign Policy and 100Reporters, which are co-publishing this piece, are not making the database publicly available for now, for security reasons, but are exploring ways to make the data available to researchers studying the spread of the coronavirus.
For its popular coronavirus tracker, Johns Hopkins University gathers its data on China from DXY, a Chinese medical platform that aggregates cases in the country. But DXY provides information only at the provincial level. Richer information would benefit researchers as well as ordinary people eager to know more about how the disease spread and how it has affected other countries. Patterns in the data could add to what is known about the disease, and about the ways Beijing manipulates its numbers. Medical researchers expressed skepticism in mid-April, after Wuhan revised its number of coronavirus deaths from 2,579 to 3,869, an increase of almost exactly 50 percent.
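The size of the Wuhan revision is easy to check. This short calculation uses only the two death tolls reported above:

```python
# Wuhan's reported coronavirus deaths before and after the mid-April revision
before, after = 2579, 3869

increase = after - before   # additional deaths added by the revision
ratio = increase / before   # increase relative to the original figure

print(f"{increase} more deaths, a {ratio:.2%} increase")

# A 50 percent increase on 2,579 would be 3,868.5 deaths, so 3,869 is the
# smallest whole number at least 50 percent above the original figure.
print(before * 1.5)
```

Run as written, it reports 1,290 additional deaths, an increase of 50.02 percent, and prints 3868.5 as the exact 50 percent mark.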
Why does Beijing restrict access to its coronavirus data? Possibly because of malice or mistrust toward the United States at a time when tensions are running high. Possibly because of bureaucratic errors. And possibly because Beijing fears that outside researchers will learn of its extensive cover-up, which would destroy the narrative that an authoritarian nation like China is better equipped to protect its people against a pandemic. Even the public version of the National University of Defense Technology dataset sporadically blocks American IP addresses. To access the military university’s website hosting the map for the first time, one of the present authors had to use a virtual private network to appear to be browsing from Uruguay.
Maria Krol Sinclair is an independent researcher living in Washington, D.C. She primarily researches space and technology policy.