Harnessing Google Health Trends API Data for Epidemiologic Research: A Methodological Approach

Krista Neumann, Susan M. Mason, Kriszta Farkas, N Jeanie Santaularia, Jennifer Ahern, Corinne A. Riddell

2021 Interdisciplinary Association for Population Health Science (IAPHS) Conference
October 20, 2021

Berkeley Computational Social Science Fellow Krista Neumann, a PhD student in Epidemiology & Biostatistics in the School of Public Health at UC Berkeley, presented this poster at the 2021 Interdisciplinary Association for Population Health Science (IAPHS) Conference, held virtually on October 19-21.

Introduction: Data from the Google Health Trends Application Programming Interface (GHT-API) can be useful for characterizing epidemiological patterns of exposure and disease. To access the data, researchers specify the search term(s), geographic region, and time period of interest, and the GHT-API returns an estimated scaled proportion of all Google searches. However, there is little formal guidance on how to craft a GHT-API search strategy that most accurately measures a construct of interest. Specific challenges include: 1) data suppression when the number of searches falls below a specific, undocumented threshold; and 2) sampling variation, because the GHT-API estimates proportions from a uniformly distributed random sample that is updated once a day. Our objective is to describe best practices for using the GHT-API to measure a construct of interest.
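To make the two challenges concrete, the following is a minimal Python sketch of one way a researcher might summarize repeated daily draws of the same query. It is an illustration under stated assumptions, not the method used in the poster: the function name `summarize_draws` is hypothetical, the input is assumed to have been retrieved already (no API calls are shown), and suppressed values are represented as `None`, which is a choice of this sketch rather than documented GHT-API behavior.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def summarize_draws(
    draws: Sequence[Sequence[Optional[float]]],
) -> list[dict]:
    """Summarize repeated draws of one query, period by period.

    Each element of `draws` is one time series of scaled search
    proportions, retrieved on a different day (hypothetical input).
    A value of None marks a suppressed estimate (assumption of this sketch).
    """
    if not draws:
        raise ValueError("at least one draw is required")
    n_periods = len(draws[0])
    if any(len(d) != n_periods for d in draws):
        raise ValueError("all draws must cover the same time periods")

    summary = []
    for t in range(n_periods):
        # Keep only the draws in which this period was not suppressed.
        observed = [d[t] for d in draws if d[t] is not None]
        summary.append(
            {
                # Average across draws to dampen sampling variation.
                "mean": mean(observed) if observed else None,
                # Between-draw spread; needs at least two observed values.
                "sd": stdev(observed) if len(observed) > 1 else None,
                # How often this period was suppressed across draws.
                "n_suppressed": len(draws) - len(observed),
            }
        )
    return summary


if __name__ == "__main__":
    # Made-up numbers for illustration only; not real GHT-API output.
    example_draws = [
        [10.0, None, 4.0],
        [12.0, None, None],
        [14.0, 3.0, 6.0],
    ]
    for period, row in enumerate(summarize_draws(example_draws)):
        print(period, row)
```

Reporting the suppression count alongside the mean matters because averaging only the non-suppressed draws may overstate search interest in periods where some draws fell below the threshold.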

Motivating Case Study: We sought to examine trends in child abuse and neglect during the COVID-19 pandemic. Of concern is the possibility that pandemic-related challenges (e.g., school closures) may reduce the number of child abuse and neglect cases detected via traditional data sources, even if the incidence of abuse and neglect increased. We thus investigated the GHT-API as a real-time data source for capturing state-level trends in child abuse and neglect.

Featured Fellows

Krista Neumann

Public Health, UC Berkeley
Computational Social Science Fellow