Abstract

Due to its high potential for analysis in clinical settings, research on the human microbiome has been flourishing for several years. As an increasing amount of data on the microbiome is gathered and stored, analysing the temporal and individual stability of microbiome readings, and the succeeding privacy risks, has gained importance. In 2015, Franzosa et al. demonstrated the feasibility of matching and linking individuals in microbiome-based datasets from the Human Microbiome Project, which could lead to re-identification of individuals, and thus poses privacy implications for microbiome study designs. Their technique is based on the construction of body site-specific metagenomic codes that maintain a certain stability over time.In this paper, we establish a distance-based technique for personal microbiome identification, which is combined with a solution for avoiding spurious, false positive matches. In a direct comparison with the approach from Franzosa et al., which assumes that information is available as microbial records, rather than at the more detailed (but less likely to be shared) nucleic acid level, our method improves upon the identification results on most of the considered datasets. Our main finding is an increase of the average percentage of true positive identifications of 30% on the widely studied microbiome of the gastrointestinal tract. While we particularly recommend our method for application on the gut microbiome, we also observed substantial identification success on other body sites. Our results demonstrate the potential of privacy threats in microbiome data gathering, storage, sharing, and analysis, and thus underline the need for solutions to protect the microbiome as personal and sensitive medical data. We also show that the method is robust to various hyper-parameter settings.Based on our observations, we further identify challenges in personal microbiome identification research, specifically, the scarcity of benchmark data and associated data analysis tasks. Based on our experience, we propose solutions for a more systematic and comparable evaluation, considering also aspects of costs entailed with applying privacy-preserving methods.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.