Privacy-Preserving Publishing of Individual-Level Pandemic Data Based on a Game Theoretic Model

Abinitha Gourabathina, Chao Yan, Zhiyu Wan, Bradley A Malin, J Thomas Brown

doi:10.1109/bibm55620.2022.9995513

Abstract

Sharing individual-level pandemic data is essential for accelerating the understanding of a disease. For example, COVID-19 data have been widely collected to support public health surveillance and research. In the United States, these data must be de-identified before they are released to the public because of privacy concerns. However, current publishing approaches for individual-level pandemic data, such as those adopted by the U.S. Centers for Disease Control and Prevention (CDC), have not adapted over time to account for the dynamic nature of infection rates. Thus, the policies generated by these strategies may either raise privacy risk or impair data utility (or usability). To optimize the tradeoff between privacy risk and data utility, we introduce a game theoretic model that adaptively generates policies for publishing individual-level COVID-19 data according to infection dynamics. We model the data publishing process as a two-player Stackelberg game between a data publisher and a data recipient and then search for the best strategy for the publisher. In this game, we consider 1) the average accuracy of predicting future case counts for all demographic groups, and 2) the mutual information between the original data and the released data. We use COVID-19 case data from Vanderbilt University Medical Center from March 2020 to December 2021 to demonstrate our model and evaluate its effectiveness. The experimental results show that our game theoretic model outperforms all baseline approaches, including those adopted by the CDC, while maintaining low privacy risk.
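As a rough illustration of the second quantity named in the abstract, the following minimal sketch estimates the mutual information between an original attribute and its released (generalized) version from empirical frequencies. It is not the authors' implementation: the function names `mutual_information` and `generalize`, the toy ages, and the bin widths are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(original, released):
    """Empirical mutual information, in bits, between two equal-length sequences."""
    if len(original) != len(released):
        raise ValueError("sequences must have the same length")
    n = len(original)
    joint = Counter(zip(original, released))  # counts of (x, y) pairs
    px = Counter(original)                    # marginal counts of x
    py = Counter(released)                    # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def generalize(age, width):
    """Hypothetical generalization: map an exact age to a bin of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical toy data, not taken from the paper.
ages = [23, 37, 45, 52, 68, 71, 29, 84]
fine = [generalize(a, 10) for a in ages]    # 10-year age bins
coarse = [generalize(a, 50) for a in ages]  # 50-year age bins

mi_fine = mutual_information(ages, fine)
mi_coarse = mutual_information(ages, coarse)
```

On this toy input, the finer policy retains more information about the original ages than the coarser one (`mi_fine` exceeds `mi_coarse`), which is the kind of comparison a publisher could use when weighing candidate generalization policies.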

Full Text