Abstract

The development of machine learning solutions in medicine is often hindered by difficulties associated with sharing patient data. Distributed learning aims to train machine learning models locally, without requiring data sharing. However, the utility of distributed learning for rare diseases, where each contributing local center holds only a few training examples, has not been investigated. The aim of this work was to simulate distributed learning models by ensembling with artificial neural networks (ANN), support vector machines (SVM), and random forests (RF) and to evaluate them using four medical datasets. Distributed learning by ensembling locally trained agents improved performance compared to models trained on data from a single institution, even when only very few training examples were available per local center. Distributed learning performance improved as more locally trained models were added to the ensemble. Local class imbalance reduced distributed SVM performance but did not affect distributed RF and ANN classification. Our results suggest that distributed learning by ensembling can be used to train machine learning models without sharing patient data and is suitable for small datasets.
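The core idea of the abstract, ensembling classifiers trained independently at each local center and combining them by majority vote, can be illustrated with a minimal sketch. This is not the paper's actual setup: the center count, the toy 1-D data, and the nearest-centroid classifier are assumptions chosen so the example stays self-contained; the study itself used ANN, SVM, and RF models on medical datasets.

```python
import random
from collections import Counter

random.seed(0)

def make_data(n_per_class):
    """Toy 1-D, two-class data: class 0 centered at 0.0, class 1 at 3.0."""
    data = [(random.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(random.gauss(3.0, 1.0), 1) for _ in range(n_per_class)]
    return data

class NearestCentroid:
    """A deliberately simple local classifier: predict the class whose
    training-data centroid is closest to the input."""
    def fit(self, data):
        sums, counts = {}, {}
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))

def ensemble_predict(models, x):
    """Majority vote over the locally trained models."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]

# Simulate 7 local centers, each holding only 3 examples per class.
# Each center trains on its own data alone -- no patient data is shared.
local_models = [NearestCentroid().fit(make_data(3)) for _ in range(7)]

# Evaluate a single center's model vs. the ensemble on held-out data.
test = make_data(100)
single_acc = sum(local_models[0].predict(x) == y for x, y in test) / len(test)
ens_acc = sum(ensemble_predict(local_models, x) == y for x, y in test) / len(test)
print(f"single-center accuracy: {single_acc:.2f}")
print(f"ensemble accuracy:      {ens_acc:.2f}")
```

Only the models' predictions cross institutional boundaries here, which is what lets the ensemble benefit from all centers' data without any record leaving its center.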
