Asynchronous Dual Free Stochastic Dual Coordinate Ascent for Distributed Data Mining

Zhouyuan Huo, Xue Jiang, Heng Huang

doi:10.1109/icdm.2018.00032

Abstract

Primal-dual distributed computational methods have broad applications in large-scale data mining. However, previous primal-dual distributed methods are not applicable when the dual formulation is unavailable, e.g., for sum-of-non-convex objectives. Moreover, these algorithms and their theoretical analyses rest on the fundamental assumption that the machines in a cluster compute at similar speeds. In practice, the straggler problem is unavoidable in distributed systems because slow machines exist, so the total computational time of distributed optimization methods depends heavily on the slowest machine. In this paper, we address these two issues by proposing a novel distributed asynchronous dual-free stochastic dual coordinate ascent algorithm for distributed data mining. Our method does not need the dual formulation of the target problem in its computation. We tackle the straggler problem through asynchronous communication, which significantly alleviates the negative effect of slow machines. We also analyze the convergence of our method and prove a linear convergence rate even if the individual functions in the objective are non-convex. Experiments on both convex and non-convex loss functions validate our claims.

Full Text