Distance-based linkage of personal microbiome records for identification and its privacy implications

Rudolf Mayer, Markus Hittmeir, Andreas Ekelhart

doi:10.1016/j.cose.2023.103538

Open Access

Journal: Computers & Security
Publication Date: Oct 16, 2023
Citations: 1
License type: CC BY

Affiliation: SBA Research, TU Wien

Abstract

Due to its high potential for analysis in clinical settings, research on the human microbiome has been flourishing for several years. As an increasing amount of microbiome data is gathered and stored, analysing the temporal and individual stability of microbiome readings, and the resulting privacy risks, has gained importance. In 2015, Franzosa et al. demonstrated the feasibility of matching and linking individuals in microbiome-based datasets from the Human Microbiome Project, which could lead to re-identification of individuals and thus has privacy implications for microbiome study designs. Their technique is based on the construction of body site-specific metagenomic codes that maintain a certain stability over time. In this paper, we establish a distance-based technique for personal microbiome identification, combined with a solution for avoiding spurious, false positive matches. In a direct comparison with the approach of Franzosa et al., assuming that information is available as microbial records rather than at the more detailed (but less likely to be shared) nucleic acid level, our method improves upon the identification results on most of the considered datasets. Our main finding is a 30% increase in the average percentage of true positive identifications on the widely studied microbiome of the gastrointestinal tract. While we particularly recommend our method for application to the gut microbiome, we also observed substantial identification success on other body sites. Our results demonstrate the potential privacy threats in microbiome data gathering, storage, sharing, and analysis, and thus underline the need for solutions that protect the microbiome as personal and sensitive medical data.
We also show that the method is robust to various hyper-parameter settings. Based on our observations, we further identify challenges in personal microbiome identification research, specifically the scarcity of benchmark data and associated data analysis tasks. Drawing on this experience, we propose solutions for a more systematic and comparable evaluation, also considering the costs entailed in applying privacy-preserving methods.
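The abstract describes the identification technique only as distance-based and combined with a mechanism for avoiding false positive matches. It does not name the distance measure or the rejection rule. The following is a minimal Python sketch of distance-based record linkage, assuming (1) Bray-Curtis dissimilarity between microbial abundance profiles and (2) a hypothetical acceptance rule that combines an absolute distance cutoff (`max_dist`) with a nearest/second-nearest ratio test (`min_ratio`). The functions `bray_curtis` and `link_records`, their parameters, and the toy profiles are all illustrative assumptions. They are not the authors' implementation.

```python
from math import inf


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def link_records(queries, references, max_dist=0.5, min_ratio=1.2):
    """Link each query profile (e.g. a later visit) to its nearest reference.

    Hypothetical false-positive filter (an assumption, not the paper's rule):
    a link is reported only if the nearest distance is at most max_dist and
    the second-nearest distance is strictly greater than min_ratio times it.
    Returns a dict mapping query id -> reference id, or None if rejected.
    """
    links = {}
    for qid, q in queries.items():
        # Rank all reference profiles by their distance to this query
        ranked = sorted((bray_curtis(q, r), rid) for rid, r in references.items())
        best_d, best_id = ranked[0]
        second_d = ranked[1][0] if len(ranked) > 1 else inf
        # Accept only matches that are both close and unambiguous
        if best_d <= max_dist and second_d > min_ratio * best_d:
            links[qid] = best_id
        else:
            links[qid] = None
    return links


if __name__ == "__main__":
    # Toy abundance profiles over four taxa (made-up numbers for illustration)
    visit1 = {"A": [10, 0, 5, 1], "B": [0, 8, 2, 6], "C": [3, 3, 3, 3]}
    visit2 = {"a2": [9, 1, 5, 0], "b2": [1, 7, 3, 5], "x": [0, 0, 0, 20]}
    print(link_records(visit2, visit1))
    # -> {'a2': 'A', 'b2': 'B', 'x': None}
```

In this toy run, `x` stays unlinked because even its closest reference lies beyond `max_dist`. The ratio test also rejects queries whose two closest references are almost equally distant. Both checks trade a few true positives for fewer spurious links, which is the general trade-off the abstract alludes to.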