Abstract

There is growing interest in applying distributed machine learning to edge computing, forming federated edge learning. Federated edge learning faces non-i.i.d. and heterogeneous data, and the communication between edge workers, which may span distant locations and unstable wireless networks, is more costly than their local computation. In this work, we propose DONE, a distributed approximate Newton-type algorithm with a fast convergence rate for communication-efficient federated edge learning. First, for strongly convex and smooth loss functions, DONE approximates the Newton direction in a distributed manner using the classical Richardson iteration on each edge worker. Second, we prove that DONE has linear-quadratic convergence and analyze its communication complexities. Finally, experimental results with non-i.i.d. and heterogeneous data show that DONE attains performance comparable to Newton's method. Notably, DONE requires fewer communication iterations than distributed gradient descent and outperforms the state-of-the-art approaches DANE, FEDL, and GIANT in the case of non-quadratic loss functions.
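To make the key idea concrete, the following is a minimal sketch (not the authors' implementation) of how a single edge worker could approximate the Newton direction d solving H d = g with the classical Richardson iteration, assuming a strongly convex and smooth local loss so that the local Hessian H is symmetric positive definite. The step size alpha, iteration count, and the toy quadratic data below are illustrative assumptions, not values from the paper.

import numpy as np

def richardson_newton_direction(H, g, alpha, num_iters):
    # Approximate d = H^{-1} g via the fixed-point update
    # d <- d - alpha * (H d - g), which converges for SPD H
    # whenever 0 < alpha < 2 / lambda_max(H).
    d = np.zeros_like(g)
    for _ in range(num_iters):
        d = d - alpha * (H @ d - g)
    return d

# Toy example: local Hessian and gradient of a strongly convex quadratic loss.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
H = A.T @ A / 20 + 0.1 * np.eye(5)          # local Hessian (strongly convex)
g = rng.standard_normal(5)                   # local gradient
alpha = 1.0 / np.linalg.eigvalsh(H).max()    # safe Richardson step size
d_approx = richardson_newton_direction(H, g, alpha, num_iters=200)
print(np.linalg.norm(H @ d_approx - g))      # residual shrinks as iterations grow

Because each Richardson update needs only local Hessian-vector products and the current gradient, a worker can refine its approximate Newton direction without exchanging full Hessians, which is what makes this style of update attractive when communication dominates local computation.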
