Abstract

Federated learning is a decentralized form of deep learning that trains a shared model on data distributed across clients (such as mobile phones and wearable devices), preserving data privacy by never exposing raw data to the data center (server). After each client computes new model parameters by stochastic gradient descent (SGD) on its own local data, these locally computed parameters are aggregated to produce an updated global model. Many state-of-the-art studies aggregate the different client-computed parameters by averaging them, but none theoretically explains why averaging parameters is a good approach. In this paper, we treat each client-computed parameter as a random vector, owing to the stochastic nature of SGD, and estimate the mutual information between two client-computed parameters at different training phases, using two methods in two learning tasks. The results confirm the correlation between different clients and show that mutual information increases with training iterations. However, when we further compute the distance between client-computed parameters, we find that the parameters become more correlated without getting closer. This phenomenon suggests that averaging parameters may not be the optimal way of aggregating trained parameters.
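The aggregation step questioned in the abstract is plain parameter averaging. The sketch below is a minimal illustration, not code from the paper: the NumPy setup, client count, and parameter dimension are assumptions. It shows averaging client-computed parameter vectors into a global model and computing the distance between two clients' parameters, the two quantities contrasted above.

```python
# Minimal sketch (illustrative assumptions only): FedAvg-style aggregation by
# averaging client-computed parameter vectors, plus the L2 distance between
# two clients' parameters.
import numpy as np

def aggregate_by_averaging(client_params):
    """Element-wise mean of the client parameter vectors (the averaging baseline)."""
    return np.mean(np.stack(client_params, axis=0), axis=0)

def parameter_distance(theta_a, theta_b):
    """Euclidean distance between two flattened parameter vectors."""
    return np.linalg.norm(theta_a - theta_b)

# Example: three clients, each holding a 10-dimensional parameter vector.
rng = np.random.default_rng(0)
client_params = [rng.normal(size=10) for _ in range(3)]
global_params = aggregate_by_averaging(client_params)
print(parameter_distance(client_params[0], client_params[1]))
```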

Highlights

  • Nowadays, more and more intelligent devices, such as smartphones, wearable devices, and autonomous vehicles, are widely used [1,2], generating a wealth of data

  • This study explores the correlation between clients’ model parameters in federated learning

  • By estimating the mutual information (MI) between different client-computed parameters in two learning tasks using two methods, we confirm the existence of correlation between different clients (a simplified MI-estimation sketch follows this list)
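
As a rough illustration of what estimating MI between two client-computed parameters involves, the sketch below uses a simple 2-D histogram (binning) estimator. The paper's two estimation methods are not specified here, so this estimator, the way samples are formed from parameter coordinates, and all names are assumptions.

```python
# Minimal sketch, assuming a joint-histogram (binning) estimate of mutual
# information between corresponding coordinates of two client-computed
# parameter vectors; this is an illustrative stand-in, not the paper's method.
import numpy as np

def mutual_information_hist(x, y, bins=20):
    """Estimate I(X; Y) in nats from paired samples using a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()            # joint probability estimate
    px = pxy.sum(axis=1, keepdims=True)  # marginal of X
    py = pxy.sum(axis=0, keepdims=True)  # marginal of Y
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Example: treat the coordinates of two clients' parameter vectors as paired samples.
rng = np.random.default_rng(1)
theta_client_a = rng.normal(size=5000)
theta_client_b = theta_client_a + 0.5 * rng.normal(size=5000)  # correlated by construction
print(mutual_information_hist(theta_client_a, theta_client_b))
```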



Introduction

More and more intelligent devices, such as smartphones, wearable devices, and autonomous vehicles, are widely used [1,2], generating a wealth of data. The generated data can be used to develop deep learning models that power applications such as speech recognition, face detection, and text entry. With the increasing amount of generated data and the growing computing power of smart devices [3], recent studies have explored distributed training of models on these edge devices [4,5]. Federated learning [7] can be viewed as an extension of conventional distributed deep learning [8], as it aims to train a high-quality shared model while keeping data distributed over clients. Each client computes an updated model based on its own locally collected data, which is not shared with others.
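
A minimal sketch of the local update step described above, assuming a linear model with squared loss and plain SGD; the model, data shapes, and hyperparameters are illustrative assumptions. The point is only that each client trains on its private data and returns parameters, so no raw data leaves the client.

```python
# Minimal sketch (illustrative assumptions only): one client's local SGD update
# on its private data; only the updated parameters would be sent to the server.
import numpy as np

def local_sgd_update(theta, local_x, local_y, lr=0.1, epochs=1):
    """One client's update for a linear model with squared loss."""
    theta = theta.copy()
    for _ in range(epochs):
        for x, y in zip(local_x, local_y):
            grad = 2.0 * (x @ theta - y) * x   # gradient of (x . theta - y)^2
            theta -= lr * grad
    return theta

# Example: a client refines the current global parameters on its private data.
rng = np.random.default_rng(2)
theta_global = np.zeros(4)
x_local, y_local = rng.normal(size=(32, 4)), rng.normal(size=32)
theta_client = local_sgd_update(theta_global, x_local, y_local)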

