Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation.

Yun William Yu, Griffin M Weber

doi:10.2196/18735

Abstract

Background: Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks.

Objective: This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed.

Methods: We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques.

Results: In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer.

Conclusions: Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
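The abstract names HyperLogLog as the core of the method. As a rough illustration only, the following is a minimal sketch of plain HyperLogLog in Python, showing how per-site sketches can be merged at a hub to estimate the number of distinct patients without summing per-site counts. It is not the authors' implementation: it omits the masking and homomorphic encryption described above, and the function names (`hll_sketch`, `hll_merge`, `hll_estimate`), the register parameter `p=10`, the SHA-256 hash, and the `patient-N` identifiers are all assumptions made for this example.

```python
import hashlib
import math


def hll_sketch(patient_ids, p=10):
    """Build a HyperLogLog sketch with 2**p registers (hypothetical helper)."""
    m = 1 << p
    registers = [0] * m
    for pid in patient_ids:
        # Derive a 64-bit hash from the identifier (SHA-256 is an assumption).
        h = int.from_bytes(hashlib.sha256(pid.encode()).digest()[:8], "big")
        bucket = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[bucket] = max(registers[bucket], rank)
    return registers


def hll_merge(sketches):
    """Combine sketches by taking the element-wise maximum of the registers."""
    return [max(regs) for regs in zip(*sketches)]


def hll_estimate(registers):
    """Estimate the distinct count from a sketch (bias constant valid for m >= 128)."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction (linear counting).
        return m * math.log(m / zeros)
    return raw


# Two simulated sites that share 2,000 patients (illustrative data only).
site_a = hll_sketch(f"patient-{i}" for i in range(0, 6000))
site_b = hll_sketch(f"patient-{i}" for i in range(4000, 10000))
merged = hll_merge([site_a, site_b])
print("sum of site counts:", 6000 + 6000)
print("HLL estimate of distinct patients:", round(hll_estimate(merged)))
```

Because merging is an element-wise maximum, the merged sketch is identical to the sketch that would be built from the pooled patient list, so shared patients are counted once. With 1,024 registers the standard HyperLogLog relative error is roughly 1.04/sqrt(1024), about 3%.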

Highlights

Widespread adoption of electronic health records has generated vast amounts of data, which are increasingly being used in clinical, epidemiological, and public health research [1].
We propose a new method for combining data from sites in a federated clinical data network, based on the HyperLogLog (HLL) probabilistic sketching algorithm [19].
The hospitals determine which of their patients match the query and return a result to the hub.

Summary

Introduction

Background: Widespread adoption of electronic health records has generated vast amounts of data, which are increasingly being used in clinical, epidemiological, and public health research [1]. An alternative approach is to create federated clinical data research networks, which broadcast queries to multiple sites, run analyses locally, and combine the results. In this way, sites retain control over their patient data. However, because patients can receive care from multiple sites in the network, adding the per-site counts frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, these methods either have large trade-offs in accuracy and privacy or are not scalable to large networks.
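The double-counting problem can be made concrete with a toy example. The sketch below is purely illustrative: the site names, patient identifiers, and function names are hypothetical, and the exact distinct count shown here requires pooling identifiers in one place, which is precisely what a federated network tries to avoid.

```python
def naive_total(site_patients):
    """Sum the aggregate counts returned by each site."""
    return sum(len(patients) for patients in site_patients.values())


def distinct_total(site_patients):
    """Count distinct patients across sites (needs pooled identifiers)."""
    return len(set().union(*site_patients.values()))


# Hypothetical query results: patient "p2" received care at both sites.
matches = {
    "hospital_a": {"p1", "p2", "p3"},
    "hospital_b": {"p2", "p4"},
}

print(naive_total(matches))     # 5, because "p2" is counted twice
print(distinct_total(matches))  # 4
```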

Journal: Journal of Medical Internet Research
Publication Date: Nov 3, 2020
Citations: 10
License type: cc-by
