Abstract

The adoption of advanced data analytics methods has been limited in industries governed by strict data reuse regulations, such as healthcare. Barriers to data access and sharing have affected numerous research and development initiatives in healthcare, resulting in major delays, extensive use of resources for data access, and findings derived from datasets that are too small to be generalizable. Federated machine learning offers a solution to the problems health data analytics projects are facing by providing a way of complying with strict regulatory requirements without sacrificing privacy. Computing frameworks supporting federated machine learning are still in their infancy, and their performance in realistic settings has been studied only to a limited extent. To expand existing knowledge of federated learning in realistic deployment settings, three groups of experiments were designed to compare the performance of a neural network-based model trained in a federated manner with that of an equivalent baseline model trained on centrally stored data. Experiments were conducted on the MIMIC-III dataset and modelled a binary classification problem: predicting in-hospital mortality. The effects of varying the amount of data, the number of computational nodes, and the data distribution in the federated network on model performance and on training and inference durations were studied. Models trained in federated settings achieved predictive performance comparable to that of the baseline in terms of area under the ROC curve and F1 score. Data distribution across computing nodes had minimal to no effect on model performance or on training and inference durations. However, federated model training and inference took approximately 9 and 40 times longer, respectively, than the equivalent tasks executed in centralized settings. These results indicate that federated learning is a viable solution for enabling advanced data analytics in environments regulated by strict privacy requirements.
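
To make the idea of training across computing nodes concrete, the following is a minimal sketch of one aggregation step. It assumes federated averaging (FedAvg), in which each node's locally trained parameters are averaged with a weight proportional to its share of the data. The abstract does not state which aggregation rule was used, and the function name `federated_average` and the example values are hypothetical; none of this is PySyft's API.

```python
from typing import List, Sequence


def federated_average(
    node_params: Sequence[Sequence[float]],
    node_sizes: Sequence[int],
) -> List[float]:
    """Combine per-node parameter vectors into one global vector.

    Assumption: FedAvg-style weighting, where each node contributes in
    proportion to the number of training records it holds.
    """
    if not node_params or len(node_params) != len(node_sizes):
        raise ValueError("need one non-empty parameter vector per node size")
    total = sum(node_sizes)
    if total <= 0:
        raise ValueError("total number of records must be positive")

    dim = len(node_params[0])
    averaged = [0.0] * dim
    for params, size in zip(node_params, node_sizes):
        if len(params) != dim:
            raise ValueError("all parameter vectors must have the same length")
        for i, value in enumerate(params):
            # Each node's parameter is scaled by its share of the data.
            averaged[i] += value * size / total
    return averaged


if __name__ == "__main__":
    # Hypothetical example: three nodes holding 100, 300, and 600 records.
    local_params = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
    record_counts = [100, 300, 600]
    print(federated_average(local_params, record_counts))
```

Only parameter vectors and record counts cross node boundaries in this sketch; the records themselves stay where they are, which is the property that makes the approach attractive under strict data reuse regulations.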

Highlights

  • Advanced data analytics methods are revolutionizing industries by making business processes increasingly data-driven

  • A fixed number (n = 32) of PySyft workers were allocated an increasing amount of data used for model training

  • Federated learning provides a means of harnessing the power of machine learning (ML) in a regulation-compliant way and accelerating continuous learning from data generated in routine care—the Learning Healthcare System [18]–[20]

Summary

Introduction

Advanced data analytics methods are revolutionizing industries by making business processes increasingly data-driven. While the use of machine learning (ML) and artificial intelligence (AI) has reached high market penetration and scale in industries such as retail, finance, manufacturing, and education, their use in healthcare is lagging. The potential for using AI in a healthcare context is highly debated and hyped [1], [2]. If this potential is to be realized, it is important to acknowledge that the healthcare sector faces challenges unlike those affecting other industries. The complexity of the healthcare landscape, influenced by medical.
