Abstract

Modern machine learning models applied to various omic data analysis tasks pose threats of privacy leakage for the patients whose data are included in those datasets. Here, we proposed a secure and privacy-preserving machine learning method (PPML-Omics) built on a decentralized, differentially private federated learning algorithm. We applied PPML-Omics to data from three sequencing technologies and addressed the privacy concerns of three major omic data analysis tasks using three representative deep learning models. We examined privacy breaches in depth through privacy attack experiments and demonstrated that PPML-Omics protects patients' privacy. In each of these applications, PPML-Omics outperformed comparison methods under the same privacy guarantee, demonstrating its versatility in simultaneously balancing privacy-preserving capability and utility in omic data analysis. Furthermore, we gave a theoretical proof of the privacy-preserving capability of PPML-Omics, suggesting that it is the first mathematically guaranteed method with robust and generalizable empirical performance for protecting patients' privacy in omic data.
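The core idea referenced above, locally privatizing each client's model update before aggregating it in a decentralized fashion, can be illustrated with a minimal sketch. This is an illustrative toy and not the PPML-Omics implementation; the clipping bound, noise scale, and shuffling-based aggregation below are assumptions chosen for demonstration.

```python
# Minimal sketch of one decentralized, differentially private federated round.
# NOTE: illustrative only; parameters and the shuffling step are assumptions,
# not the authors' PPML-Omics algorithm.
import numpy as np


def clip_and_noise(update, clip_norm=1.0, noise_std=0.5, rng=None):
    """Clip a client's update to a fixed L2 norm and add Gaussian noise
    (the standard Gaussian-mechanism step in differentially private learning)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=update.shape)


def decentralized_round(client_updates, rng=None):
    """Aggregate privatized client updates without a trusted central server:
    each client privatizes its update locally, then updates are shuffled
    (so no party can link an update to its origin) and averaged."""
    rng = rng or np.random.default_rng()
    privatized = [clip_and_noise(u, rng=rng) for u in client_updates]
    rng.shuffle(privatized)              # stand-in for decentralized mixing
    return np.mean(privatized, axis=0)   # averaged update for the shared model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=8) for _ in range(5)]  # toy gradients from 5 clients
    print(decentralized_round(updates, rng=rng))
```

In this sketch, the noise added after clipping bounds each client's influence on the aggregate, which is what yields a differential privacy guarantee; the shuffle stands in for the decentralized aggregation that removes the need for a trusted central server.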
