Abstract

In this paper, we develop an adaptive dual free Stochastic Dual Coordinate Ascent (adfSDCA) algorithm for regularized empirical risk minimization problems. This is motivated by the recent work on dual free SDCA of Shalev-Shwartz (2016). The novelty of our approach is that the coordinates to update at each iteration are selected non-uniformly from an adaptive probability distribution, and this extends the previously mentioned work which only allowed for a uniform selection of "dual" coordinates from a fixed probability distribution. We describe an efficient iterative procedure for generating the non-uniform samples, where the scheme selects the coordinate with the greatest potential to decrease the sub-optimality of the current iterate. We also propose a heuristic variant of adfSDCA that is more aggressive than the standard approach. Furthermore, in order to utilize multi-core machines, we consider a mini-batch adfSDCA algorithm and develop complexity results that guarantee the algorithm's convergence. The work is concluded with several numerical experiments to demonstrate the practical benefits of the proposed approach.

Highlights

  • In this work we study the l2-regularized Empirical Risk Minimization (ERM) problem, which is widely used in the field of machine learning

  • Non-uniform Sampling Procedure: Rather than using a uniform sampling of coordinates, which is the commonly used approach, here we propose the use of non-uniform sampling from an adaptive probability distribution

  • Reduced variance SGD methods have become very popular in the past few years; see, for example, [3, 7, 8, 27]. It is shown in Shalev-Shwartz [24] that uniform dual free SDCA is an instance of a reduced variance SGD algorithm, and a similar result applies to adaptive dual free Stochastic Dual Coordinate Ascent (adfSDCA) in Algorithm 1
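The adaptive, non-uniform coordinate selection described in the highlights can be illustrated with a minimal sketch. It is not the paper's Algorithm 1. It assumes a squared loss (ridge regression), a per-example residual `kappa`, sampling probabilities proportional to `|kappa_i|`, and a step size that zeroes the sampled residual exactly. The function name `adaptive_dual_free_sdca` and all parameter names are hypothetical. For clarity, every residual is recomputed at each iteration, which a practical implementation would avoid.

```python
import numpy as np


def adaptive_dual_free_sdca(X, y, lam=0.1, iters=2000, seed=0):
    """Hypothetical sketch: dual-free SDCA for ridge regression with
    residual-proportional (adaptive, non-uniform) coordinate sampling."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    v = np.einsum("ij,ij->i", X, X)    # squared row norms ||x_i||^2
    alpha = np.zeros(n)                # pseudo-dual variables
    w = X.T @ alpha / (lam * n)        # invariant: w = X^T alpha / (lam * n)
    for _ in range(iters):
        # Residual kappa_i = alpha_i + phi_i'(x_i^T w),
        # with phi_i(z) = 0.5 * (z - y_i)^2 (assumed loss).
        kappa = alpha + X @ w - y
        total = np.abs(kappa).sum()
        if total < 1e-12:              # all residuals vanish: w is optimal
            break
        p = np.abs(kappa) / total      # adaptive distribution (assumed form)
        i = rng.choice(n, p=p)
        # Assumed step size: sets kappa_i to zero for the squared loss.
        step = lam * n / (v[i] + lam * n)
        delta = -step * kappa[i]
        alpha[i] += delta
        w += delta * X[i] / (lam * n)  # preserves the invariant above
    return w, alpha
```

Calling `w, alpha = adaptive_dual_free_sdca(X, y, lam=0.1, iters=20000)` on a small dense problem should return a `w` close to the closed-form ridge solution. Coordinates whose residual is already zero receive zero probability, so the sampler spends its effort where the current iterate is furthest from optimality.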


Summary

INTRODUCTION

In this work we study the $\ell_2$-regularized Empirical Risk Minimization (ERM) problem, which is widely used in the field of machine learning. Many algorithms have been proposed to solve this problem, denoted (P), over the past few years, including SGD [2], SVRG and S2GD [3,4,5], and SAG/SAGA [6,7,8]. Another very popular approach to solving $\ell_2$-regularized ERM problems is to consider the dual formulation $\max_{\alpha \in \mathbb{R}^n} D(\alpha)$. In many cases algorithms that employ non-uniform coordinate sampling outperform naïve uniform selection, and in some cases they decrease the number of iterations needed to achieve a desired accuracy by several orders of magnitude; see, for example, [15, 23]. Throughout this work we let $\mathbb{R}_+$ denote the set of nonnegative real numbers and we let $\mathbb{R}^n_+$ denote the set of n-dimensional vectors with all components being real and nonnegative.
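For orientation, the primal problem (P) and its dual are commonly written as follows in the SDCA literature. This summary does not define the symbols, so the loss functions $\phi_i$, their convex conjugates $\phi_i^*$, the data vectors $x_i$, and the regularization parameter $\lambda > 0$ are assumptions taken from that standard setting and may differ in detail from the paper's own notation.

```latex
\begin{align*}
\min_{w \in \mathbb{R}^d} \; P(w)
  &:= \frac{1}{n}\sum_{i=1}^{n} \phi_i(w^\top x_i) + \frac{\lambda}{2}\|w\|^2,
  \tag{P}\\
\max_{\alpha \in \mathbb{R}^n} \; D(\alpha)
  &:= -\frac{1}{n}\sum_{i=1}^{n} \phi_i^*(-\alpha_i)
      - \frac{\lambda}{2}\Bigl\|\frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i x_i\Bigr\|^2.
  \tag{D}
\end{align*}
```

In this form each dual coordinate $\alpha_i$ is tied to one training example, which is why selecting a coordinate amounts to selecting an example.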

Contributions
Outline
THE ADAPTIVE DUAL FREE SDCA ALGORITHM
Adaptive Dual Free SDCA as a Reduced Variance SGD Method
CONVERGENCE ANALYSIS
Case I
Case II
HEURISTIC ADFSDCA
MINI-BATCH ADFSDCA
Efficient Single Coordinate Sampling
Non-uniform Mini-Batch Sampling
Mini-Batch adfSDCA Algorithm
Expected Separable Overapproximation
NUMERICAL EXPERIMENTS
Comparison for a Variety of adfSDCA Approaches
Mini-Batch adfSDCA