Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

Kassaye Yitbarek Yigzaw, Johan Gustav Bellika, Antonis Michalas

doi:10.1186/s12911-016-0389-x

Abstract

Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.

Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.

Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
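To make the problem concrete, the following is a minimal sketch of deterministic record linkage over horizontally partitioned data, showing how a patient recorded at two custodians inflates a naive count. It is not the protocol proposed in the paper and offers no privacy guarantee: everything runs in one process, and the field names (`national_id`, `birth_date`), the shared key, the custodian names, and the sample records are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical identifying fields used as the deterministic linkage rule.
LINKAGE_FIELDS = ("national_id", "birth_date")


def linkage_key(record, secret):
    """Return a keyed hash of the normalized identifying fields."""
    normalized = "|".join(str(record[f]).strip().lower() for f in LINKAGE_FIELDS)
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deduplicate(partitions, secret):
    """Keep the first occurrence of each linkage key across all custodians."""
    seen = set()
    result = {}
    for custodian, records in partitions.items():
        kept = []
        for record in records:
            key = linkage_key(record, secret)
            if key not in seen:
                seen.add(key)
                kept.append(record)
        result[custodian] = kept
    return result


# Made-up sample data: the first patient appears at both laboratories.
partitions = {
    "lab_a": [
        {"national_id": "A-001", "birth_date": "1990-01-01", "test": "positive"},
        {"national_id": "A-002", "birth_date": "1985-06-15", "test": "negative"},
    ],
    "lab_b": [
        {"national_id": " a-001 ", "birth_date": "1990-01-01", "test": "positive"},
        {"national_id": "A-003", "birth_date": "1972-11-30", "test": "positive"},
    ],
}

secret = b"hypothetical-shared-key"
deduped = deduplicate(partitions, secret)

naive_total = sum(len(records) for records in partitions.values())
unique_total = sum(len(records) for records in deduped.values())
print(naive_total, unique_total)
```

In this toy example the naive total counts the shared patient twice, while the deduplicated total counts each patient once. A real deployment would additionally need to prevent any single party from seeing the other custodians' linkage keys, which is the part the paper's protocol addresses.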

Highlights

The focus of this paper is the reuse of health data horizontally partitioned between data custodians, such that each data custodian provides the same attributes for a set of patients.
Security analysis: We prove the security of the proposed protocol in the presence of corrupt data custodians or a corrupt coordinator who tries to learn information as a result of the protocol execution.
For an adversary that controls a set of data custodians, the adversary’s view is a combination of the corrupt data custodians’ views.

Summary

Introduction

The focus of this paper is the reuse of health data horizontally partitioned between data custodians, such that each data custodian provides the same attributes for a set of patients. Reusing data from multiple data custodians provides a sufficient number of patients who satisfy the inclusion criteria of a particular study. When data are collected across multiple data custodians, the data of a heterogeneous mix of patients can be reused. There has been substantial interest in the reuse of EHR data for public health surveillance, which requires data from multiple data custodians covering the geographic area of interest [5,6,7]. The increased adoption of EHR systems has led, and continues to lead, to the collection of large amounts of health data [1]. Survey and registry data are being collected. These data could aid in the development of



Journal: BMC Medical Informatics and Decision Making
Publication Date: Jan 3, 2017
Citations: 150
License type: open-access
