Abstract
Federated Learning (FL) has emerged as a promising paradigm for privacy-preserving collaborative machine learning. However, data imbalance, exacerbated by the non-IID nature of distributed datasets, significantly degrades model performance and fairness in FL systems. This paper investigates the implementation and evaluation of simple resampling techniques for addressing data imbalance within the FL framework. Using the MIMIC-III healthcare dataset, we simulated an FL environment with ten virtual clients to test three resampling methods: SMOTE oversampling, random undersampling, and a hybrid of the two. The study employed logistic regression models and evaluated performance using both standard metrics and novel FL-specific metrics. Experimental results demonstrate that the hybrid resampling technique significantly outperforms the other methods, improving the F1-score by 13.1% and reducing communication rounds by 25.3%. Statistical analyses, including repeated-measures ANOVA and hierarchical linear modeling, confirm the robustness of these findings across varied client data distributions. This research provides a replicable framework for addressing data imbalance in FL, contributing to enhanced model fairness and efficiency in privacy-sensitive applications.
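To make the hybrid approach concrete, the sketch below shows one plausible per-client resampling step: SMOTE-style interpolation to grow the minority class combined with random undersampling of the majority class. This is an illustrative implementation in plain NumPy, not the paper's exact procedure; the function name, the meet-in-the-middle target size, and the binary-label assumption are all assumptions made here for clarity.

```python
import numpy as np

def hybrid_resample(X, y, minority=1, majority=0, rng=None):
    """Illustrative hybrid resampling for a binary task (assumed setup,
    not the paper's exact method): SMOTE-style interpolation for the
    minority class plus random undersampling of the majority class."""
    rng = np.random.default_rng(rng)
    X_min, X_maj = X[y == minority], X[y == majority]
    # Meet in the middle: grow minority and shrink majority to a common size.
    target = (len(X_min) + len(X_maj)) // 2

    # SMOTE-style oversampling: synthesize points on the segment between
    # a minority sample and its nearest minority neighbour.
    synth = []
    for _ in range(max(0, target - len(X_min))):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = np.argsort(d)[1]          # nearest neighbour, skipping self
        lam = rng.random()            # interpolation coefficient in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_min_up = np.vstack([X_min, np.array(synth)]) if synth else X_min

    # Random undersampling: keep a uniform subset of the majority class.
    keep = rng.choice(len(X_maj), size=target, replace=False)
    X_maj_down = X_maj[keep]

    X_bal = np.vstack([X_min_up, X_maj_down])
    y_bal = np.concatenate([np.full(len(X_min_up), minority),
                            np.full(len(X_maj_down), majority)])
    return X_bal, y_bal
```

In an FL simulation such as the one described, each virtual client would apply this step to its local shard before training its logistic regression model, so no raw data leaves the client.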