Abstract

Data-driven artificial intelligence algorithms depend on large amounts of training data. However, because data are often private and decentralized, constructing large-scale training sets is costly, which restricts the further application of artificial intelligence algorithms in downstream fields. To address this problem, federated learning has attracted growing research interest in recent years; it aims to use decentralized private data for model training while preserving privacy. However, the non-independent and identically distributed (non-IID) nature of data across devices causes federated learning to face problems such as data imbalance and label bias, which in turn degrade the generalization performance of the model. Data heterogeneity has thus become a key challenge in federated learning. This paper explores the impact of data heterogeneity on federated learning and synthesizes recent research results in this area. By analyzing solution approaches based on adapting to the data distribution, adding regularization terms, contrastive learning, and multi-task learning, it provides a comprehensive overview for researchers. The paper further summarizes the remaining challenges posed by data heterogeneity in federated learning and discusses potential directions for future research.
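As a concrete illustration of the regularization-based approaches surveyed here, the sketch below shows FedProx-style local training: each client minimizes its local loss plus a proximal term (mu/2)·||w − w_global||², which keeps local models close to the global model and limits client drift under non-IID data. This is a minimal NumPy sketch on synthetic linear-regression clients; the function names, data, and hyperparameters are illustrative assumptions, not code from any surveyed system.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.01, steps=100):
    """One client's local training with a FedProx-style proximal term.

    Minimizes  mean((X w - y)^2) + (mu/2) * ||w - w_global||^2,
    so the local solution stays anchored to the global model even
    when this client's data distribution differs from the others'.
    """
    w = w_global.copy()
    n = len(y)
    for _ in range(steps):
        # Gradient of the local squared loss plus the proximal term.
        grad = 2.0 * X.T @ (X @ w - y) / n + mu * (w - w_global)
        w -= lr * grad
    return w

# Two clients with differently distributed (non-IID) local data:
# each client's labels come from a different ground-truth model.
rng = np.random.default_rng(0)
w_true_a, w_true_b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
Xa, Xb = rng.normal(size=(50, 2)), rng.normal(size=(50, 2))
ya, yb = Xa @ w_true_a, Xb @ w_true_b

w_global = np.zeros(2)
for _ in range(20):  # federated rounds: local updates, then averaging
    wa = fedprox_local_update(w_global, Xa, ya)
    wb = fedprox_local_update(w_global, Xb, yb)
    w_global = (wa + wb) / 2
```

Setting `mu=0` recovers plain FedAvg local training; a larger `mu` trades local fit for stability of the global model, which is the central tension the regularization-based methods discussed in this survey try to balance.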
