Dual Free Adaptive Minibatch SDCA for Empirical Risk Minimization

Xi He, Martin Takáč, Rachael Tappenden

doi:10.3389/fams.2018.00033

Abstract

In this paper, we develop an adaptive dual free Stochastic Dual Coordinate Ascent (adfSDCA) algorithm for regularized empirical risk minimization problems. This is motivated by the recent work on dual free SDCA of Shalev-Shwartz (2016). The novelty of our approach is that the coordinates to update at each iteration are selected non-uniformly from an adaptive probability distribution, and this extends the previously mentioned work which only allowed for a uniform selection of dual coordinates from a fixed probability distribution. We describe an efficient iterative procedure for generating the non-uniform samples, where the scheme selects the coordinate with the greatest potential to decrease the sub-optimality of the current iterate. We also propose a heuristic variant of adfSDCA that is more aggressive than the standard approach. Furthermore, in order to utilize multi-core machines, we consider a mini-batch adfSDCA algorithm and develop complexity results that guarantee the algorithm's convergence. The work is concluded with several numerical experiments to demonstrate the practical benefits of the proposed approach.

Highlights

In this work we study the l2-regularized Empirical Risk Minimization (ERM) problem, which is widely used in the field of machine learning
Non-uniform sampling procedure: rather than using uniform sampling of coordinates, which is the commonly used approach, here we propose the use of non-uniform sampling from an adaptive probability distribution.
Reduced variance SGD methods have become very popular in the past few years, see for example [3, 7, 8, 27]. It is shown in Shalev-Shwartz [24] that uniform dual free SDCA is an instance of a reduced variance SGD algorithm, and a similar result applies to adaptive dual free Stochastic Dual Coordinate Ascent (adfSDCA) in Algorithm 1.
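The non-uniform sampling idea in the highlights above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: it assumes a squared loss (ridge regression), a "dual residue" kappa_i = alpha_i + (x_i^T w - y_i), sampling probabilities proportional to |kappa_i| * sqrt(||x_i||^2 + n*lambda), and a matching step size theta. The function name `adfsdca_ridge`, the probability rule, and the step-size formula are choices made for this sketch.

```python
import numpy as np

def adfsdca_ridge(X, y, lam, n_iters, seed=0):
    """Sketch of adaptive dual-free SDCA for ridge regression (assumed setup)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                 # pseudo-dual variables, one per example
    w = X.T @ alpha / (lam * n)         # primal iterate tied to alpha
    # Per-coordinate scaling sqrt(||x_i||^2 + n*lam) (assumed weighting).
    scale = np.sqrt(np.sum(X ** 2, axis=1) + n * lam)
    for _ in range(n_iters):
        # Dual residue for the squared loss: alpha_i + phi_i'(x_i^T w).
        kappa = alpha + (X @ w - y)
        weights = scale * np.abs(kappa)
        total = weights.sum()
        if total < 1e-12:               # all residues vanish: nothing to update
            break
        p = weights / total             # adaptive probabilities, recomputed each step
        theta = n * lam * np.sum(kappa ** 2) / total ** 2   # assumed step size
        i = rng.choice(n, p=p)          # non-uniform coordinate selection
        step = theta * kappa[i] / p[i]
        alpha[i] -= step
        w -= step * X[i] / (lam * n)    # keeps w = X^T alpha / (lam * n)
    return w, alpha

# Small synthetic problem (illustrative data only).
data_rng = np.random.default_rng(1)
n, d, lam = 50, 5, 0.1
X = data_rng.standard_normal((n, d))
y = X @ data_rng.standard_normal(d) + 0.1 * data_rng.standard_normal(n)

w, alpha = adfsdca_ridge(X, y, lam, n_iters=3000)
# Closed-form ridge solution, used only as a reference point.
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print(np.linalg.norm(w - w_star))
```

Recomputing every residue at each iteration costs O(nd) and is done here only for clarity; the outline's section on efficient single coordinate sampling suggests the paper treats this cost separately.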

Summary

INTRODUCTION

In this work we study the l2-regularized Empirical Risk Minimization (ERM) problem, which is widely used in the field of machine learning. Many algorithms have been proposed to solve problem (P) over the past few years, including SGD [2], SVRG and S2GD [3,4,5], and SAG/SAGA [6,7,8]. Another very popular approach to solving l2-regularized ERM problems is to consider the dual formulation $\max_{\alpha \in \mathbb{R}^n} D(\alpha)$. In many cases algorithms that employ non-uniform coordinate sampling outperform naïve uniform selection, and in some cases they decrease the number of iterations needed to achieve a desired accuracy by several orders of magnitude, see for example [15, 23]. Throughout this work we let $\mathbb{R}_+$ denote the set of nonnegative real numbers and we let $\mathbb{R}^n_+$ denote the set of n-dimensional vectors with all components being real and nonnegative.
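For orientation, the primal problem (P) and its dual are not written out in this summary. In the SDCA literature they usually take the form below, which is assumed here rather than quoted from the paper: $\phi_i$ is a convex loss, $x_i \in \mathbb{R}^d$ is the i-th data point, $\lambda > 0$ is the regularization parameter, and $\phi_i^*$ is the convex conjugate of $\phi_i$.

```latex
\begin{align}
  \min_{w \in \mathbb{R}^d} \; P(w) &:= \frac{1}{n} \sum_{i=1}^{n} \phi_i(x_i^\top w) + \frac{\lambda}{2} \|w\|^2, \tag{P} \\
  \max_{\alpha \in \mathbb{R}^n} \; D(\alpha) &:= -\frac{1}{n} \sum_{i=1}^{n} \phi_i^*(-\alpha_i) - \frac{\lambda}{2} \Big\| \frac{1}{\lambda n} \sum_{i=1}^{n} \alpha_i x_i \Big\|^2.
\end{align}
```

Under this form, a dual vector $\alpha$ induces the primal point $w = \frac{1}{\lambda n} \sum_{i} \alpha_i x_i$, which is the link a dual free method can maintain without evaluating $D(\alpha)$ itself.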

Contributions

Outline

THE ADAPTIVE DUAL FREE SDCA ALGORITHM

Adaptive Dual Free SDCA as a Reduced Variance SGD Method

CONVERGENCE ANALYSIS

Case I

Case II

HEURISTIC ADFSDCA

MINI-BATCH ADFSDCA

Efficient Single Coordinate Sampling

Non-uniform Mini-Batch Sampling

Mini-Batch adfSDCA Algorithm

Expected Separable Overapproximation

NUMERICAL EXPERIMENTS

Comparison for a Variety of adfSDCA Approaches

Mini-Batch adfSDCA

Journal: Frontiers in Applied Mathematics and Statistics
Publication Date: Jul 25, 2018
Citations: 5
License type: CC BY 4.0