Abstract

BackgroundAnalysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.MethodsWe use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. In order to securely compute the log-partial likelihood in each iteration, we run into several technical challenges to preserve the efficiency and security of our solution. To tackle these technical challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to get insight into the computational and communication effort that is needed.ResultsOur secure solution is implemented in a setting with three different machines, each presenting a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient.ConclusionsOur secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising high level of security and privacy.

Highlights

  • Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data

  • In this article we focus on the latter scenario and propose a privacy-preserving version of the Cox proportional hazards (CPH) [6] model for vertically-partitioned data

  • A benefit of the MPyC platform facilitates asynchronous evaluation of multi-party computation (MPC) protocols, implying that a party can perform local computations while she is waiting for other players’ information. Another main reason for using MPyC is its communication-efficient protocol for performing inner products, which is a key operation in our protocol for training the CPH model

Read more

Summary

Introduction

Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. In biomedical research, linking data from multiple sources can make analyses more robust and allows studies to take into account additional information. These combined datasets might reveal patterns that the data in isolation cannot. Combining data can lead to discoveries with profound clinical implications

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.