A state-of-the-art survey on solving non-IID data in Federated Learning

Xiaodong Ma, Jia Zhu, Zhihao Lin, Shanxuan Chen, Yangjie Qin

doi:10.1016/j.future.2022.05.003

Abstract

Federated Learning (FL), proposed in recent years, has received significant attention from researchers because it enables multiple clients to cooperatively train global models without revealing private data. This training mode protects users’ privacy without violating regulatory supervision, and it aggregates scattered data to unlock its potential. However, the data samples on each participating device in FL are usually not independent and identically distributed (non-IID), which creates serious statistical heterogeneity challenges for FL. In this article, we analyze and establish a definition of the non-IID data problem and set out a series of challenges that it may bring to FL. We classify existing methods for solving this problem by the researcher’s entry point and the subsequent sub-methods, aiming to provide a comprehensive study of solutions to the non-IID data problem in FL. Our research shows that non-IID data not only reduces the performance of the FL model but also discourages users from actively participating in the FL process. Compared with data-side methods based on sharing, enhancement, and selection, it is more common for researchers to improve federated learning from the perspective of models, algorithms, and frameworks to solve non-IID problems. To the best of our knowledge, although many efforts have been made to address the problem of non-IID data, few authoritative systematic reviews exist in this field, and those that do are not up to date. In this article, we fill this gap and provide researchers with state-of-the-art research results for solving non-IID problems in FL, with the aim of promoting the further implementation of FL.

Full Text