Federated Learning in Healthcare: A Benchmark Comparison of Engineering and Statistical Approaches for Structured Data Analysis.
Background: Federated learning (FL) holds promise for safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also developed privacy-preserving algorithms, though these are less recognized. Our goal was to bridge this gap with the first comprehensive comparison of FL frameworks from both domains. Methods: We assessed 7 FL frameworks, encompassing both engineering-based and statistical FL algorithms, and compared them against local and centralized modeling of logistic regression and least absolute shrinkage and selection operator (Lasso). Our evaluation utilized both simulated data and real-world emergency department data, focusing on comparing both estimated model coefficients and the performance of model predictions. Results: The findings reveal that statistical FL algorithms produce much less biased estimates of model coefficients. Conversely, engineering-based methods can yield models with slightly better prediction performance, occasionally outperforming both centralized and statistical FL models. Conclusion: This study underscores the relative strengths and weaknesses of both types of methods, providing recommendations for their selection based on distinct study characteristics. Furthermore, we emphasize the critical need to raise awareness of and integrate these methods into future applications of FL within the healthcare domain.
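As a rough illustration of the comparison the abstract describes (not the paper's actual frameworks or data), the toy sketch below contrasts an engineering-style FL algorithm (FedAvg-style weight averaging) with a centralized fit of the same logistic regression on pooled simulated data. All names, sites, and settings here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_gd(X, y, w, lr=0.1, epochs=20):
    """A few epochs of logistic-regression gradient descent on one site's data."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

# Three simulated "sites" drawn from the same underlying logistic model.
true_w = np.array([1.0, -2.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(500, 2))
    y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)
    sites.append((X, y))

# Engineering-style FL (FedAvg): average locally trained weights each round.
w_fed = np.zeros(2)
for _ in range(50):
    w_fed = np.mean([local_gd(X, y, w_fed) for X, y in sites], axis=0)

# Centralized baseline: pool all sites' data and fit one model.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
w_cen = local_gd(X_all, y_all, np.zeros(2), epochs=1000)

print(w_fed, w_cen)
```

With IID sites like these the averaged coefficients track the centralized fit; the study's comparison concerns how closely each class of FL method recovers the centralized coefficients under realistic, heterogeneous data.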
8
- 10.1016/j.jbi.2023.104485
- Sep 1, 2023
- Journal of Biomedical Informatics
9
- 10.1093/biostatistics/kxaa031
- Sep 10, 2020
- Biostatistics
32
- 10.1080/24754269.2021.1974158
- Sep 14, 2021
- Statistical Theory and Related Fields
517
- 10.1057/palgrave.jors.2600425
- Mar 1, 1997
- Journal of the Operational Research Society
73
- 10.1371/journal.pdig.0000033
- May 19, 2022
- PLOS Digital Health
176
- 10.1016/j.radonc.2016.10.002
- Oct 28, 2016
- Radiotherapy and Oncology
173
- 10.1136/amiajnl-2012-000862
- Jan 1, 2012
- Journal of the American Medical Informatics Association: JAMIA
25
- 10.1145/3510540
- Jun 28, 2022
- ACM Transactions on Intelligent Systems and Technology
35
- 10.1038/s41597-022-01782-9
- Oct 27, 2022
- Scientific Data
- Research Article
- 10.1038/s44401-025-00035-2
- Aug 12, 2025
- npj Health Systems
FairFML: fair federated machine learning with a case study on reducing gender disparities in cardiac arrest outcome prediction
- Preprint Article
- 10.2196/preprints.74202
- Mar 22, 2025
BACKGROUND: The current landscape of Emergency Care (EC) is marked by high demand, leading to issues such as Emergency Department boarding, overcrowding, and subsequent delays that impact the quality and safety of patient care. Integrating data science into EC can enhance decision-making with predictive, preventative, personalized, and participatory approaches. However, gaps in adherence to fairness, accountability, interpretability, and responsibility are evident, particularly due to barriers in data-sharing, which often result in a lack of transparency and robust oversight in these applications. OBJECTIVE: The Fair, Accountable, Interpretable and Responsible (FAIR)-EC collaboration adapts the existing FAIR principles to address emerging challenges as data science integrates with EC. This initiative aims to transform EC by establishing ethical artificial intelligence (AI) standards specifically tailored for this integration. By bridging the gap between EC professionals, data scientists, and other stakeholders, the collaboration promotes international cooperation that leverages advanced data science techniques to enhance EC outcomes across different care settings. METHODS: We propose a federated research design that enables analyses of extensive datasets from various global institutions without compromising patient privacy. This approach transforms epidemiological research with advanced data science techniques, emphasizing the harmonization of data for comprehensive analyses across different healthcare systems. RESULTS: The FAIR-EC initiative has facilitated the collection and analysis of datasets from diverse geographical regions, enabling the examination of regional variations in EC practices. Initial projects have demonstrated promising outcomes, including the successful development of a federated scoring system and the adaptation of association studies and predictive models across various regions. These efforts highlight the feasibility of leveraging advanced data science techniques to address the complexities of EC while preserving patient privacy. CONCLUSIONS: FAIR-EC integrates data science ethically and effectively into EC, addressing challenges like fragmented data, real-time handoffs, and public health crises. Its federated design harmonizes diverse data streams while preserving privacy, and its emphasis on ethical AI aligns with the dynamic nature of EC. Despite challenges in data variability and system complexity, FAIR-EC establishes a strong foundation for innovation in global EC.
- Research Article
- 10.1371/journal.pdig.0001008
- Sep 18, 2025
- PLOS Digital Health
Patients with substance misuse who are admitted to the hospital are at heightened risk for adverse outcomes, such as readmission and death. This study aims to develop methods to identify at-risk patients to facilitate timely interventions that can improve outcomes and optimize healthcare resources. To accomplish this, we leveraged the Substance Misuse Data Commons to predict 30-day death or readmission from hospital discharge in patients with substance misuse. We explored several machine learning algorithms and approaches to integrate information from multiple data sources, such as structured features from a patient's electronic health record (EHR), unstructured clinical notes, socioeconomic data, and emergency medical services (EMS) data. Our gradient-boosted machine model, which combined structured EHR data, socioeconomic status, and EMS data, was the best-performing model (c-statistic 0.746 [95% CI: 0.732-0.759]), outperforming other machine learning methods and structured data source combinations. The addition of unstructured text did not improve performance, suggesting a need for further exploration of how to leverage unstructured data effectively. Feature importance plots highlighted the importance of prior hospital and EMS encounters and discharge disposition in predicting our primary outcome. In conclusion, we integrated multiple sources that offer complementary information beyond the typically used EHR for risk assessment in patients with substance misuse.
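To make the multi-source integration idea concrete, here is a minimal, hypothetical sketch: three synthetic feature blocks stand in for the structured EHR, socioeconomic, and EMS sources, and a plain logistic regression (not the study's gradient-boosted machine) is fit on one source alone versus all sources combined, comparing c-statistics. Everything below is simulated and illustrative only.

```python
import numpy as np

rng = np.random.default_rng(8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 1000
# Toy feature blocks standing in for the study's three structured sources.
ehr = rng.normal(size=(n, 3))  # structured EHR features
ses = rng.normal(size=(n, 2))  # socioeconomic features
ems = rng.normal(size=(n, 2))  # EMS encounter features

# The outcome depends on all three blocks, so single-source models underfit.
w_true = np.concatenate([[1.0, -0.5, 0.0], [0.8, 0.0], [-0.7, 0.3]])
X_full = np.hstack([ehr, ses, ems])
y = (rng.random(n) < sigmoid(X_full @ w_true)).astype(float)

def fit(X, y, lr=0.1, epochs=500):
    """Logistic regression by gradient descent (stand-in for the GBM)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def auc(scores, y):
    """Rank-based c-statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc_ehr = auc(ehr @ fit(ehr, y), y)
auc_full = auc(X_full @ fit(X_full, y), y)
print(auc_ehr, auc_full)
```

The combined-feature model scores a higher in-sample c-statistic than the EHR-only model, mirroring the qualitative finding that complementary sources improve risk prediction.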
- Research Article
- 10.1016/j.compbiomed.2025.111084
- Oct 1, 2025
- Computers in Biology and Medicine
Developing federated time-to-event scores using heterogeneous real-world survival data.
- Research Article
1
- 10.1016/j.jbi.2025.104780
- May 1, 2025
- Journal of Biomedical Informatics
FedIMPUTE: Privacy-preserving missing value imputation for multi-site heterogeneous electronic health records.
- Research Article
2
- 10.1016/j.ophtha.2024.10.017
- Apr 1, 2025
- Ophthalmology
Privacy Preserving Technology using Federated Learning and Blockchain in protecting against Adversarial Attacks for Retinal Imaging
- Research Article
1167
- 10.1109/twc.2020.3024629
- Oct 2, 2020
- IEEE Transactions on Wireless Communications
In this article, the problem of training federated learning (FL) algorithms over a realistic wireless network is studied. In the considered model, wireless users execute an FL algorithm while training their local FL models using their own data and transmitting the trained local FL models to a base station (BS) that generates a global FL model and sends the model back to the users. Since all training parameters are transmitted over wireless links, the quality of training is affected by wireless factors such as packet errors and the availability of wireless resources. Meanwhile, due to the limited wireless bandwidth, the BS needs to select an appropriate subset of users to execute the FL algorithm so as to build a global FL model accurately. This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize an FL loss function that captures the performance of the FL algorithm. To seek the solution, a closed-form expression for the expected convergence rate of the FL algorithm is first derived to quantify the impact of wireless factors on FL. Then, based on the expected convergence rate of the FL algorithm, the optimal transmit power for each user is derived, under a given user selection and uplink resource block (RB) allocation scheme. Finally, the user selection and uplink RB allocation is optimized so as to minimize the FL loss function. Simulation results show that the proposed joint federated learning and communication framework can improve the identification accuracy by up to 1.4%, 3.5% and 4.1%, respectively, compared to: 1) An optimal user selection algorithm with random resource allocation, 2) a standard FL algorithm with random user selection and resource allocation, and 3) a wireless optimization algorithm that minimizes the sum packet error rates of all users while being agnostic to the FL parameters.
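The selection-and-transmission loop described above can be caricatured in a few lines. The sketch below is a hypothetical toy (vector-mean "models", a fixed packet error rate, random rather than optimized user selection) meant only to show where limited resource blocks and packet errors enter a FedAvg-style round, not the paper's optimization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: the "model" is just a mean vector each user estimates noisily.
n_users, dim = 10, 4
true_model = np.ones(dim)
user_data = [true_model + rng.normal(scale=0.5, size=dim) for _ in range(n_users)]

def fl_round(global_model, selected, packet_error_rate=0.1):
    """One round: selected users send local updates; some are lost to errors."""
    received = []
    for u in selected:
        local = 0.5 * global_model + 0.5 * user_data[u]  # toy local update
        if rng.random() > packet_error_rate:             # update survives the link
            received.append(local)
    if not received:                                     # all packets lost this round
        return global_model
    return np.mean(received, axis=0)

# Limited resource blocks: only 3 of 10 users may transmit per round.
model = np.zeros(dim)
for _ in range(100):
    selected = rng.choice(n_users, size=3, replace=False)
    model = fl_round(model, selected)

print(model)
```

In the paper, user selection and transmit power are chosen by optimizing the derived convergence bound rather than sampled at random as here.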
- Conference Article
18
- 10.1109/spawc48557.2020.9154266
- May 1, 2020
Federated learning (FL) has emerged as a key technology for enabling next-generation AI at scale. Classical FL systems use single-hop cellular links to deliver the local models from mobile workers to edge routers, which then reach the remote cloud servers via the high-speed Internet core for global model averaging. Due to their cost-efficiency, wireless multi-hop networks have been widely exploited to build communication backbones. Therefore, enabling FL over wireless multi-hop networks can make it accessible in a low-cost manner to everyone (e.g., under-developed areas and disaster sites). Wireless multi-hop FL, however, suffers from profound communication constraints, including noisy and interference-rich wireless links, which result in slow and nomadic FL model updates. To address this, we suggest a novel machine-learning-enabled wireless multi-hop FL framework, namely FedAir, that can greatly mitigate the adverse impact of wireless communications on FL performance metrics such as model convergence time. This allows us to rapidly prototype, deploy, and evaluate FL algorithms over an ML-enabled, programmable wireless router (ML-router). The experiments on the deployed testbed validate and show that the wireless multi-hop FL framework can greatly accelerate the runtime convergence speed of the de-facto FL algorithm, FedAvg.
- Conference Article
64
- 10.1109/icc40277.2020.9148815
- Jun 1, 2020
In this paper, the convergence time of federated learning (FL), when deployed over a realistic wireless network, is studied. In particular, with the considered model, wireless users transmit their local FL models (trained using their locally collected data) to a base station (BS). The BS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all users. Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step. Meanwhile, since each user has unique training data samples and the BS must wait to receive all users' local FL models to generate the global FL model, the FL performance and convergence time will be significantly affected by the user selection scheme. In consequence, it is necessary to design an appropriate user selection scheme that enables all users to execute an FL scheme and efficiently train it. This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize the FL convergence time while optimizing the FL performance. To address this problem, a probabilistic user selection scheme is proposed, under which the BS connects, with high probability, to the users whose local FL models have large effects on its global FL model. Given the user selection policy, the uplink RB allocation can be determined. To further reduce the FL convergence time, artificial neural networks (ANNs) are used to estimate the local FL models of the users that are not allocated any RBs for local FL model transmission, which enables the BS to include more users' local FL models to generate the global FL model and thereby improve the FL convergence speed and performance.
Simulation results show that the proposed ANN-based FL scheme can reduce the FL convergence time by up to 53.8%, compared to a standard FL algorithm.
- Conference Article
10
- 10.1109/spawc48557.2020.9154300
- May 1, 2020
In this paper, the problem of training federated learning (FL) algorithms over a wireless network with mobile users is studied. In the considered model, several mobile users and a network base station (BS) cooperatively perform an FL algorithm. In particular, the wireless mobile users train their local FL models and send the trained local FL model parameters to the BS. The BS then integrates the received local FL models to generate a global FL model and sends it back to all users. Due to the limited training time at each iteration, the number of users that can transmit their local FL models to the BS will be affected by changes in the users' locations and wireless channels. This joint learning, user selection, and wireless resource allocation problem is formulated as an optimization problem whose goal is to minimize the FL loss function, which captures the FL performance, while meeting the transmission delay requirement. To solve this problem, a closed-form expression for the expected convergence rate of the FL algorithm is first derived to quantify the impact of the users' mobility and wireless factors on FL. Then, based on the expected FL convergence rate, the user selection and uplink resource allocation are optimized at each FL iteration so as to minimize the FL loss function while satisfying the FL parameter transmission delay requirement. Simulation results show that the proposed approach can reduce the FL loss function value by up to 20% compared to a standard FL algorithm.
- Conference Article
91
- 10.1109/globecom38437.2019.9013160
- Dec 1, 2019
In this paper, the problem of training federated learning (FL) algorithms over a realistic wireless network is studied. In particular, in the considered model, wireless users perform an FL algorithm that trains their local FL models using their own data and send the trained local FL models to a base station (BS) that will generate a global FL model and send it back to the users. Since all training parameters are transmitted over wireless links, the quality of the training will be affected by wireless factors such as packet errors and availability of wireless resources. Meanwhile, due to the limited wireless bandwidth, the BS must select an appropriate subset of users to execute the FL learning algorithm so as to build a global FL model accurately. This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize an FL loss function that captures the performance of the FL algorithm. To address this problem, a closed-form expression for the expected convergence rate of the FL algorithm is first derived to quantify the impact of wireless factors on FL. Then, based on the expected convergence rate of the FL algorithm, the optimal transmit power for each user is derived, under a given user selection and uplink resource block (RB) allocation scheme. Finally, the user selection and uplink RB allocation is optimized so as to minimize the FL loss function. Simulation results show that the proposed joint federated learning and communication framework can reduce the FL loss function value by up to 10% and 16%, respectively, compared to 1) an optimal user selection algorithm with random resource allocation and 2) a random user selection and resource allocation algorithm.
- Research Article
283
- 10.1109/twc.2020.3042530
- Dec 11, 2020
- IEEE Transactions on Wireless Communications
In this paper, the convergence time of federated learning (FL), when deployed over a realistic wireless network, is studied. In particular, a wireless network is considered in which wireless users transmit their local FL models (trained using their locally collected data) to a base station (BS). The BS, acting as a central controller, generates a global FL model using the received local FL models and broadcasts it back to all users. Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step. Moreover, since each user has unique training data samples, the BS prefers to include all local user FL models to generate a converged global FL model. Hence, the FL training loss and convergence time will be significantly affected by the user selection scheme. Therefore, it is necessary to design an appropriate user selection scheme that can select the users who can contribute toward improving the FL convergence speed more frequently. This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize the FL convergence time and the FL training loss. To solve this problem, a probabilistic user selection scheme is proposed such that the BS is connected to the users whose local FL models have significant effects on the global FL model with high probabilities. Given the user selection policy, the uplink RB allocation can be determined. To further reduce the FL convergence time, artificial neural networks (ANNs) are used to estimate the local FL models of the users that are not allocated any RBs for local FL model transmission at each given learning step, which enables the BS to improve the global model, the FL convergence speed, and the training loss. 
Simulation results show that the proposed approach can reduce the FL convergence time by up to 56% and improve the accuracy of identifying handwritten digits by up to 3%, compared to a standard FL algorithm.
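A stripped-down, hypothetical rendering of one round of the probabilistic selection idea: users whose local models deviate most from the global model are sampled with higher probability, and unselected users' models are filled in by a crude estimator (a mean, standing in for the paper's ANN) before aggregation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy: each user's "local model" is a single scalar parameter.
n_users = 8
local_models = rng.normal(size=n_users)
global_model = 0.0

# Probabilistic selection: users whose local models differ most from the
# current global model (i.e., would change it most) get higher probability.
effects = np.abs(local_models - global_model)
probs = effects / effects.sum()
selected = rng.choice(n_users, size=3, replace=False, p=probs)

# Users without resource blocks have their models estimated instead; the mean
# of the received models is a crude stand-in for the paper's ANN estimator.
received = {u: local_models[u] for u in selected}
estimate = np.mean(list(received.values()))
full = [received.get(u, estimate) for u in range(n_users)]
global_model = np.mean(full)

print(global_model)
```

Including estimates for the unserved users lets the aggregation draw on all users' models, which is the mechanism the paper uses to speed convergence.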
- Book Chapter
55
- 10.1007/978-3-030-70604-3_6
- Jan 1, 2021
In the medical and healthcare industry, where the available information or data is never sufficient, Federated Learning (FL) can help achieve excellence by empowering AI models to learn on private data without compromising privacy. It has opened the door to ample research because of its high communication efficiency, which is linked to distributed training problems. The primary objective of the chapter is to highlight the adaptability and working of FL techniques in the healthcare system, especially in drug development, clinical diagnosis, digital health monitoring, and various disease prediction and detection systems. The first section of the chapter comprises a background study on an FL framework for healthcare, the FL working model in healthcare, and various important benefits of FL. The next section of the chapter describes reported work that highlights different research efforts in the fields of electronic health record systems, drug discovery, and disease prediction systems using the FL model. The final section of the chapter presents a comparative analysis, which compares different FL algorithms for different health sectors using parameters such as accuracy, area under the curve, precision, recall, and F-score. Keywords: Federated learning; Electronic health record; Drug discovery; Medical imaging; Disease prediction
- Conference Article
8
- 10.1109/dcas57389.2023.10130231
- Apr 14, 2023
In modern devices, such as smartphones and IoT, AI hardware implements different ML models to train on massive amounts of data for various applications. However, given the sensitivity of this data, privacy and security concerns, or both, may prevent users from accessing the data storage to conduct ML training using conventional methods. Federated learning (FL) has consequently emerged to keep training data distributed among smart mobile devices while aggregating locally processed updates. In addition to improving model training performance as the number of clients rises, FL also creates a privacy-preserving shared data model. FL execution, however, can be time-consuming. This study proposes a parallelized approach to enhance the performance and privacy of the FL algorithm (FedAvg). In this regard, the FedAvg algorithm is extended to our proposed model, distributed FedAvg (D-FedAvg), which enables several clients to collaborate concurrently and train a single learning model. To evaluate the performance of the proposed model, we investigated the impact of various numbers of clients and training rounds on the result. To validate our findings, we conducted extensive experiments on the MNIST dataset using two federated learning models, MLP and CNN. The results show that our D-FedAvg can maintain data privacy and dramatically improve the execution time of FL compared to traditional FedAvg. According to the study, the FL framework's time complexity increases as the number of clients and rounds increases, but parallelization allows it to operate 2–3 times faster than usual.
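The parallelization idea (several clients training concurrently within each FedAvg round) can be sketched as below. This is an illustrative toy with a linear model and thread-based concurrency, not the authors' D-FedAvg implementation; all settings are made up.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(2)

def local_train(args):
    """A client's local work: a few gradient steps on its own data."""
    w, X, y = args
    w = w.copy()
    for _ in range(10):
        w -= 0.01 * X.T @ (X @ w - y) / len(y)
    return w

true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(8):
    X = rng.normal(size=(200, 3))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=200)))

w = np.zeros(3)
for _ in range(30):
    # Clients train concurrently within the round, as in the parallelized idea.
    with ThreadPoolExecutor(max_workers=4) as pool:
        local_models = list(pool.map(local_train, [(w, X, y) for X, y in clients]))
    w = np.mean(local_models, axis=0)  # server-side FedAvg aggregation

print(w)
```

Since each client's training within a round is independent of the others, the map over clients parallelizes cleanly, which is the property a distributed FedAvg exploits.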
- Research Article
16
- 10.1109/jstsp.2022.3223498
- Jan 1, 2023
- IEEE Journal of Selected Topics in Signal Processing
Wireless federated learning (FL) is a collaborative machine learning (ML) framework in which wireless client-devices independently train their ML models and send the locally trained models to the FL server for aggregation. In this paper, we consider the coexistence of privacy-sensitive client-devices and privacy-insensitive yet computing-resource constrained client-devices, and propose an FL framework with a hybrid centralized training and local training. Specifically, the privacy-sensitive client-devices perform local ML model training and send their local models to the FL server. Each privacy-insensitive client-device can have two options, i.e., (i) conducting a local training and then sending its local model to the FL server, and (ii) directly sending its local data to the FL server for the centralized training. The FL server, after collecting the data from the privacy-insensitive client-devices (which choose to upload the local data), conducts a centralized training with the received datasets. The global model is then generated by aggregating (i) the local models uploaded by the client-devices and (ii) the model trained by the FL server centrally. Focusing on this hybrid FL framework, we firstly analyze its convergence feature with respect to the client-devices' selections of local training or centralized training. We then formulate a joint optimization of client-devices' selections of the local training or centralized training, the FL training configuration (i.e., the number of the local iterations and the number of the global iterations), and the bandwidth allocations to the client-devices, with the objective of minimizing the overall latency for reaching the FL convergence. Despite the non-convexity of the joint optimization problem, we identify its layered structure and propose an efficient algorithm to solve it. 
Numerical results demonstrate the advantage of our proposed FL framework with the hybrid local and centralized training as well as our proposed algorithm, in comparison with several benchmark FL schemes and algorithms.
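The hybrid aggregation step, averaging models uploaded by privacy-sensitive clients together with a model the server trains centrally on data uploaded by privacy-insensitive clients, can be sketched as follows. This is a hypothetical least-squares toy, not the paper's framework; the sample-size weighting is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def train(X, y, w0, lr=0.05, steps=50):
    """Least-squares gradient descent, used both on-device and at the server."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

true_w = np.array([1.0, 3.0])

def make_site(n):
    X = rng.normal(size=(n, 2))
    return X, X @ true_w + rng.normal(scale=0.1, size=n)

sensitive = [make_site(300) for _ in range(3)]    # upload local models only
insensitive = [make_site(300) for _ in range(2)]  # upload raw data to the server

# Server pools the insensitive clients' data once for centralized training.
X_c = np.vstack([X for X, _ in insensitive])
y_c = np.concatenate([y for _, y in insensitive])

w_global = np.zeros(2)
for _ in range(20):
    local_models = [train(X, y, w_global) for X, y in sensitive]
    central_model = train(X_c, y_c, w_global)  # centrally trained on pooled data
    # Aggregate local models and the central model, weighted by sample counts
    # (the weighting scheme is an assumption, not taken from the paper).
    w_global = np.average(local_models + [central_model], axis=0,
                          weights=[300, 300, 300, 600])

print(w_global)
```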
- Conference Article
3
- 10.1109/iwcmc55113.2022.9825004
- May 30, 2022
Federated Learning (FL) is one of the hot research topics, and it utilizes Machine Learning (ML) in a distributed manner without directly accessing private data on clients. However, FL faces many challenges, including the difficulty of obtaining high accuracy, the high communication cost between clients and the server, and security attacks related to adversarial ML. To tackle these three challenges, we propose an FL algorithm inspired by evolutionary techniques. The proposed algorithm groups clients randomly into many clusters, each with a model selected randomly to explore the performance of different models. The clusters are then trained in a repetitive process where the worst-performing cluster is removed in each iteration until one cluster remains. In each iteration, some clients are expelled from clusters, either due to using poisoned data or due to low performance. The surviving clients are exploited in the next iteration. The remaining cluster with surviving clients is then used for training the best FL model (i.e., the remaining FL model). Communication cost is reduced since fewer clients are used in the final training of the FL model. To evaluate the performance of the proposed algorithm, we conduct a number of experiments using the FEMNIST dataset and compare the result against the random FL algorithm. The experimental results show that the proposed algorithm outperforms the baseline algorithm in terms of accuracy, communication cost, and security.
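A minimal, hypothetical sketch of the elimination loop: clusters carry randomly chosen candidate models, the worst client is expelled from each cluster every iteration, and the worst-scoring cluster is dropped until one survives. The scoring function and all quantities are toy stand-ins, not the paper's FL training.

```python
import numpy as np

rng = np.random.default_rng(4)

# Held-out validation data for scoring candidate models (toy linear predictor).
X_val = rng.normal(size=100)
y_val = 2.0 * X_val

def score(cluster):
    """Cluster fitness: validation fit of its model, nudged by client quality."""
    pred = cluster["weight"] * X_val
    return -np.mean((pred - y_val) ** 2) + np.mean(cluster["clients"])

# Four clusters, each with a randomly selected model weight and five clients
# whose scalar "quality" stands in for data quality / poisoning.
clusters = [{"weight": rng.uniform(0, 4), "clients": list(rng.normal(0, 0.1, 5))}
            for _ in range(4)]

while len(clusters) > 1:
    # Expel each cluster's worst client (stand-in for poisoned/low performers).
    for c in clusters:
        if len(c["clients"]) > 1:
            c["clients"].remove(min(c["clients"]))
    # Drop the worst-performing cluster; survivors continue to the next round.
    clusters.sort(key=score)
    clusters.pop(0)

best = clusters[0]  # the surviving cluster trains the final FL model
print(best["weight"])
```

Because only the surviving cluster's clients take part in the final training, fewer model uploads are needed, which is the source of the communication savings the abstract claims.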
- Research Article
20
- 10.1016/j.future.2023.10.013
- Oct 31, 2023
- Future Generation Computer Systems
FederatedTrust: A solution for trustworthy federated learning
- Research Article
26
- 10.1109/tmc.2022.3216837
- Jan 1, 2024
- IEEE Transactions on Mobile Computing
Federated learning (FL) enables collaborative model training without centralizing data. However, the traditional FL framework is cloud-based and suffers from high communication latency. On the other hand, the edge-based FL framework that relies on an edge server co-located with mobile base station for model aggregation has low communication latency but suffers from degraded model accuracy due to the limited coverage of edge server. In light of high-accuracy but high-latency cloud-based FL and low-latency but low-accuracy edge-based FL, this paper proposes a new FL framework based on cooperative mobile edge networking called cooperative federated edge learning (CFEL) to enable both high-accuracy and low-latency distributed intelligence at mobile edge networks. Considering the unique two-tier network architecture of CFEL, a novel federated optimization method dubbed cooperative edge-based federated averaging (CE-FedAvg) is further developed, wherein each edge server both coordinates collaborative model training among the devices within its own coverage and cooperates with other edge servers to learn a shared global model through decentralized consensus. Experimental results based on benchmark datasets show that CFEL can largely reduce the training time to achieve a target model accuracy compared with prior FL frameworks.
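The two-tier aggregation in CE-FedAvg, intra-edge averaging followed by decentralized consensus among edge servers, can be sketched with plain averaging and a doubly stochastic mixing matrix; the numbers and topology below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(5)

# Three edge servers, each covering four devices; a device's "model" is a
# noisy 3-dimensional parameter vector around a common optimum.
edges = [[rng.normal(loc=1.0, scale=0.2, size=3) for _ in range(4)]
         for _ in range(3)]

# Tier 1: each edge server averages the models of its own devices (FedAvg).
edge_models = [np.mean(devs, axis=0) for devs in edges]

# Tier 2: decentralized consensus among edge servers. Each server repeatedly
# mixes its model with its neighbours' using a doubly stochastic matrix, so
# all servers converge to the network-wide average without a cloud server.
mix = np.array([[0.50, 0.25, 0.25],
                [0.25, 0.50, 0.25],
                [0.25, 0.25, 0.50]])
W = np.stack(edge_models)
for _ in range(20):
    W = mix @ W

print(W[0])  # every row is now (numerically) the same consensus model
```

Because the mixing matrix is doubly stochastic, repeated mixing preserves the network-wide average while driving all edge models into agreement, which is the consensus property a scheme like CE-FedAvg relies on.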
- Conference Article
21
- 10.1109/icc45855.2022.9838362
- May 16, 2022
Deploying federated learning (FL) over wireless networks with resource-constrained devices requires balancing accuracy, energy efficiency, and precision. Prior art on FL often requires devices to train deep neural networks (DNNs) using a 32-bit precision level for data representation to improve accuracy. However, such algorithms are impractical for resource-constrained devices, since DNNs can require the execution of millions of operations. Thus, training DNNs with a high precision level incurs a high energy cost for FL. In this paper, a quantized FL framework that represents data with a finite level of precision in both local training and uplink transmission is proposed. Here, the finite level of precision is captured through the use of quantized neural networks (QNNs) that quantize weights and activations in fixed-precision format. In the considered FL model, each device trains its QNN and transmits a quantized training result to the base station. Energy models for the local training and the transmission with quantization are rigorously derived. An energy minimization problem is formulated with respect to the level of precision while ensuring convergence. To solve the problem, we first analytically derive the FL convergence rate and use a line search method. Simulation results show that our FL framework can reduce energy consumption by up to 53% compared to a standard FL model. The results also shed light on the tradeoff between precision, energy, and accuracy in FL over wireless networks.
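The core mechanic, quantizing weights to a fixed precision before uplink transmission, can be sketched as below. This is a hypothetical toy with a linear model and uniform fixed-point quantization, not the paper's QNN training or energy model.

```python
import numpy as np

rng = np.random.default_rng(6)

def quantize(w, bits=8, w_max=4.0):
    """Uniform fixed-point quantization of weights to 2**bits levels."""
    levels = 2 ** bits - 1
    clipped = np.clip(w, -w_max, w_max)
    steps = np.round((clipped + w_max) / (2 * w_max) * levels)
    return steps / levels * (2 * w_max) - w_max

true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(200, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=200)))

w = np.zeros(2)
for _ in range(40):
    updates = []
    for X, y in clients:
        local = w.copy()
        for _ in range(5):  # local training at full precision (a simplification)
            local -= 0.05 * X.T @ (X @ local - y) / len(y)
        updates.append(quantize(local))  # weights quantized before uplink
    w = np.mean(updates, axis=0)

print(w)
```

Fewer bits shrink each uplink payload at the cost of a larger quantization error floor, which is the precision-energy-accuracy tradeoff the paper optimizes over.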
- Research Article
4
- 10.1016/j.iot.2022.100638
- Nov 1, 2022
- Internet of Things
FLAGS framework for comparative analysis of Federated Learning algorithms
- Research Article
- 10.34133/hds.0377
- Oct 13, 2025
- Health Data Science
- Discussion
- 10.34133/hds.0339
- Jul 21, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0322
- Jun 18, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0325
- Jun 17, 2025
- Health Data Science
- Supplementary Content
- 10.34133/hds.0321
- Jun 12, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0284
- Apr 30, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0280
- Apr 15, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0143
- Jan 1, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0161
- Jan 1, 2025
- Health Data Science
- Research Article
- 10.34133/hds.0151
- Jan 1, 2025
- Health Data Science