Bin Yu and team curating COVID-19 data repository and forecasting county-level death counts in the US

April 20, 2020

The Yu Group at UC Berkeley Statistics and EECS has compiled and cleaned, and continues to update, a large corpus of hospital- and county-level data from a variety of public sources to aid data science efforts to combat COVID-19 (see covidseverity.com). At the hospital level, our data include the location of the hospital, the number of ICU beds, the total number of employees, and the hospital type. At the county level, our data include COVID-19 cases and deaths from USAFacts and the New York Times (NYT), automatically updated every day, along with demographic information, health resource availability, COVID-19 health risk factors, and social mobility information.

An overview of each data set in this corpus is provided here. We will add more relevant data sets as we find them. We prepared these data to support healthcare supply distribution efforts through short-term (days-ahead) prediction of COVID-19 deaths (and cases) at the county level. We are using the predictions and the hospital data to arrive at a COVID Pandemic Severity Index (c-PSI) for each hospital. This project is in partnership with Response4Life.

[Image: Yu et al., COVID-19 deaths map]

A paper on the current approaches can be found at this link. More detailed information, including data source descriptions, is provided on GitHub.

Curating a COVID-19 data repository and forecasting county-level death counts in the United States
April 17, 2020  |  Pending at arXiv.org
Nick Altieri, Rebecca Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Bin Yu
Featured Fellows

Bin Yu

Statistics, UC Berkeley
Faculty Affiliate

Rebecca Barter

Statistics
Alumni - Data Science Fellow