Abstract

The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework for analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.

Highlights

  • Federated learning (FL) relies on a centralized server which facilitates training a shared model and addresses critical issues such as data privacy, security, access rights, and heterogeneity[6]

  • The MEM model is locally trained with differentially private SGD (DP-SGD) to provide quantitative privacy bounds, and the local MEM models are centrally aggregated through FedAvg (a minimal sketch of the DP-SGD step follows this list)

  • We propose differentially private federated learning as a viable method for learning from decentralized medical data such as histopathology images
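As an illustration of the local training step named in the highlights, below is a minimal NumPy sketch of one DP-SGD update, assuming per-example gradients are already available. The function and parameter names (dp_sgd_step, clip_norm, noise_multiplier) are illustrative and not the paper's implementation.

```python
# Sketch of one DP-SGD step: per-example gradient clipping + Gaussian noise.
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.01, clip_norm=1.0, noise_multiplier=1.1):
    """Apply one differentially private SGD update.

    per_example_grads: array of shape (batch_size, num_params), one gradient
    per training example, as required for per-example clipping.
    """
    # 1. Clip each example's gradient to L2 norm <= clip_norm, bounding any
    #    single example's influence on the update (the sensitivity).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # 2. Sum the clipped gradients and add Gaussian noise calibrated to the
    #    clipping norm; noise_multiplier sets the privacy/utility trade-off.
    batch_size = per_example_grads.shape[0]
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # 3. Ordinary gradient descent step on the privatized gradient.
    return weights - lr * noisy_grad
```

Because each example's contribution is bounded by clip_norm and the added noise is scaled to that bound, the resulting update admits quantitative (epsilon, delta) privacy accounting, which is what provides the privacy bounds mentioned above.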



Introduction

Federated learning (FL) is a distributed learning framework in which many clients collaboratively train a model under the orchestration of a centralized server, which facilitates training a shared model and addresses critical issues such as data privacy, security, access rights, and heterogeneity[6]. In FL, every client locally trains a copy of the centralized model, represented by the model weights ω, and reports its updates back to the server for aggregation across clients, without disclosing local private data. The central server receives the updated weights $\omega^i_{t+1}$ from all participating clients and averages them to update the central model, $\omega_{t+1} \leftarrow \sum_{i=1}^{N} \frac{n_i}{n}\,\omega^i_{t+1}$, where $n_i$ is the number of data points used by client $i$ and $n = \sum_{i=1}^{N} n_i$. Li et al.[12] proposed a new framework for robust FL where the central server learns to detect and remove malicious updates using a spectral anomaly detection model, leading to targeted defense. Li et al.[15] tackle the problem of domain adaptation with a physics-driven generative approach that disentangles the information about model and geometry from the imaging sensor[6].
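To make the aggregation rule above concrete, here is a minimal NumPy sketch of FedAvg: the server forms a weighted average of client weights in proportion to local dataset size. The function name fedavg and the example values are ours, not the paper's implementation.

```python
# Sketch of FedAvg aggregation: w_{t+1} = sum_i (n_i / n) * w^i_{t+1}.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client model updates into the new central model.

    client_weights: list of parameter vectors w^i_{t+1}, one per client.
    client_sizes:   list of n_i, the number of data points at client i.
    """
    n = float(sum(client_sizes))  # total data points across all clients
    aggregated = np.zeros_like(client_weights[0])
    for w_i, n_i in zip(client_weights, client_sizes):
        aggregated += (n_i / n) * w_i  # each client contributes proportionally
    return aggregated

# Example: three clients (e.g., hospitals) with unequal dataset sizes.
clients = [np.array([0.2, 0.8]), np.array([0.4, 0.6]), np.array([0.1, 0.9])]
sizes = [100, 300, 600]
w_next = fedavg(clients, sizes)  # weighted toward the larger clients
```

Weighting by $n_i/n$ makes the aggregate equivalent to an average over all data points pooled across clients, which is why clients with more data, such as larger hospitals, influence the central model more.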

