Curating a COVID-19 data repository and forecasting county-level death counts in the United States

Nick Altieri, Rebecca Barter, James Duncan, Raaz Dwivedi, Karl Kumbier, Xiao Li, Robert Netzorg, Briton Park, Chandan Singh, Yan Shuo Tan, Tiffany Tang, Yu Wang, Bin Yu

arXiv.org
May 16, 2020

Abstract: In recent weeks, the novel Coronavirus causing COVID-19 has dramatically changed the shape of our global society and economy to an extent modern civilization has never experienced. In this paper, we collate a large data repository containing COVID-19 information from a range of different sources. We use this data to develop several predictors for forecasting the short-term (e.g., over the next week) trajectory of COVID-19-related recorded deaths at the county level in the United States using data from January 22, 2020, to April 8, 2020. Specifically, we produce several different predictors and combine their forecasts using ensembling techniques, resulting in an ensemble we refer to as Combined Linear and Exponential Predictors (CLEP). Our individual predictors include county-specific exponential and linear predictors, an exponential predictor that pools data together across counties, and a demographics-based exponential predictor. We also incorporate a linear predictor and demographic features into our ensemble. The hope is that an understanding of the expected number of deaths over the next week or so will help guide necessary county-specific decision-making and provide a realistic picture of the direction in which we are heading.



Featured Fellows

Rebecca Barter

Statistics
Alumni - Data Science Fellow

Bin Yu

Statistics, UC Berkeley
BIDS Senior Fellow