Abstract

The successful training of deep learning models for diagnostic deployment in medical imaging applications requires large volumes of data. Such data cannot be procured without consideration for patient privacy, mandated both by legal regulations and by the ethical requirements of the medical profession. Differential privacy (DP) provides information-theoretic privacy guarantees to patients and can be implemented in deep neural network training through the differentially private stochastic gradient descent (DP-SGD) algorithm. Here we present deepee, a free and open-source framework for differentially private deep learning with the PyTorch deep learning framework. Our framework is based on parallelised execution of neural network operations to obtain and modify the per-sample gradients. The process is abstracted via a data structure that maintains shared memory references to the neural network weights, preserving memory efficiency. We furthermore offer specialised data loading procedures and privacy budget accounting based on the Gaussian Differential Privacy framework, as well as automated modification of the user-supplied neural network architecture to ensure DP conformity of its layers. We benchmark our framework's computational performance against other open-source DP frameworks and evaluate its application on the paediatric pneumonia dataset (an image classification task) and on the Medical Segmentation Decathlon Liver dataset (a medical image segmentation task). We find that neural network training with rigorous privacy guarantees is possible while maintaining acceptable classification performance and excellent segmentation performance. Our framework compares favourably to related work with respect to memory consumption and computational performance. Our work presents an open-source software framework for differentially private deep learning, which we demonstrate in medical imaging analysis tasks.
It serves to further the utilisation of privacy-enhancing techniques in medicine and beyond in order to assist researchers and practitioners in addressing the numerous outstanding challenges towards their widespread implementation.
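The per-sample gradient mechanism described in the abstract can be illustrated with a minimal sketch. This is not deepee's actual API (which we do not reproduce here) but uses PyTorch's `torch.func` utilities to show the same idea: a gradient computation is mapped over the batch dimension while every parallel evaluation references a single shared set of weights.

```python
# Hypothetical illustration of parallelised per-sample gradients with
# shared weights, using torch.func rather than the deepee API.
import torch
from torch.func import functional_call, grad, vmap

model = torch.nn.Linear(3, 1)
# One shared set of parameters; all per-sample computations below
# reference these tensors rather than holding private copies.
params = {k: v.detach() for k, v in model.named_parameters()}

def sample_loss(p, x, y):
    # Loss for a single sample, evaluated with the shared parameters.
    pred = functional_call(model, p, (x.unsqueeze(0),))
    return torch.nn.functional.mse_loss(pred, y.unsqueeze(0))

# vmap maps grad(sample_loss) over the batch dimension of x and y,
# yielding one gradient per sample in a single parallelised call.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))

xs, ys = torch.randn(8, 3), torch.randn(8, 1)
grads = per_sample_grads(params, xs, ys)
print(grads["weight"].shape)  # one weight gradient per sample
```

The resulting `grads["weight"]` tensor carries a leading batch dimension of 8, i.e. one gradient per training example, which is the quantity DP-SGD clips and noises.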

Highlights

  • The successful training of deep learning models for diagnostic deployment in medical imaging applications requires large volumes of data

  • Artificial Intelligence (AI) is a heavily data-centric domain: the success of machine learning (ML) models depends on the quality and quantity of data that is available during training

  • We present a technical implementation of the Differential privacy (DP)-Stochastic Gradient Descent (SGD) algorithm based on parallelised execution, which makes our framework universally compatible with any neural network layer while enabling substantial performance improvements


Introduction

The successful training of deep learning models for diagnostic deployment in medical imaging applications requires large volumes of data. Although differential privacy (DP) is formally an information-theoretic privacy guarantee, in practice it is typically achieved through computationally secure means, that is, the addition of carefully calibrated noise to the training process, making individual contributions indistinguishable from each other. In their seminal paper, Abadi et al.[12] demonstrated the successful application of DP to the training of deep neural networks, termed differentially private stochastic gradient descent (DP-SGD). The authors of this and subsequent works noted that the utilisation of DP-SGD unavoidably reduces the utility of the resulting models, a well-known effect termed the privacy-utility trade-off[13]. Addressing this trade-off[14] and enabling the widespread real-world utilisation of privacy-preserving ML in medical imaging and beyond requires robust software tools, suitable for implementation within widely-used deep learning libraries and embodying current best practices. We present a technical implementation of the DP-SGD algorithm based on parallelised execution, which makes our framework universally compatible with any neural network layer while enabling substantial performance improvements.
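The DP-SGD procedure referenced above can be sketched in plain PyTorch. The following is a simplified, loop-based illustration under our own naming, not deepee's implementation (which parallelises the per-sample step): each per-sample gradient is clipped to an L2 norm bound, the clipped gradients are summed, and calibrated Gaussian noise is added before the parameter update.

```python
# Minimal DP-SGD sketch (hypothetical helper, not the deepee API):
# per-sample gradients are clipped, summed, and noised before the update.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_mult=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # sequential loop; deepee parallelises this step
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in params]
        # Clip this sample's gradient to L2 norm <= clip_norm.
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        factor = (clip_norm / (total + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * factor)
    n = len(xs)
    with torch.no_grad():
        for p, s in zip(params, summed):
            # Gaussian noise calibrated to the clipping bound.
            noise = torch.randn_like(s) * noise_mult * clip_norm
            p.add_(-(lr / n) * (s + noise))
```

Because each sample's influence on the update is bounded by `clip_norm` and masked by the noise, any single contribution is rendered indistinguishable, which is exactly the property the privacy accountant quantifies.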

