Abstract

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function. The theoretical speedup, as compared to the serial method, and referring to the number of iterations needed to approximately solve the problem with high probability, is a simple expression depending on the number of parallel processors and a natural and easily computable measure of separability of the smooth component of the objective function. In the worst case, when no degree of separability is present, there may be no speedup; in the best case, when the problem is separable, the speedup is equal to the number of processors. Our analysis also works in the mode when the number of blocks being updated at each iteration is random, which allows for modeling situations with busy or unreliable processors. We show that our algorithm is able to solve a LASSO problem involving a matrix with 20 billion nonzeros in 2 h on a large memory node with 24 cores.
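The "simple expression" mentioned in the abstract can be made concrete for the τ-nice sampling analyzed in the paper: the theoretical speedup factor is τ / (1 + (τ − 1)(ω − 1)/(n − 1)), where τ is the number of processors, n the number of coordinates and ω the degree of partial separability of the smooth component. The minimal Python sketch below (the function name and example values are illustrative, not taken from the paper) evaluates this expression and recovers the two extreme cases stated above.

    # Illustrative Python sketch (our naming, not the paper's): theoretical speedup
    # of PCDM under a tau-nice sampling, using beta = 1 + (tau-1)(omega-1)/(n-1).
    def pcdm_speedup(tau, omega, n):
        """Return the speedup factor tau / beta.

        tau   -- number of processors (coordinates updated per iteration)
        omega -- degree of partial separability of the smooth part (1 <= omega <= n)
        n     -- total number of coordinates
        """
        beta = 1.0 + (tau - 1) * (omega - 1) / (n - 1)
        return tau / beta

    # Fully separable problem (omega = 1): speedup equals the number of processors.
    print(pcdm_speedup(tau=24, omega=1, n=1_000_000))          # 24.0
    # No partial separability (omega = n): essentially no speedup.
    print(pcdm_speedup(tau=24, omega=1_000_000, n=1_000_000))  # 1.0

With 24 processors and ω much smaller than n, the factor stays close to 24, which is the regime exploited in the LASSO experiment mentioned above.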

Highlights

  • Big data optimization: Recently there has been a surge in interest in the design of algorithms suitable for solving convex optimization problems with a huge number of variables [12,15]

  • Coordinate descent methods: The number of iterations a coordinate descent method (CDM) requires to solve a smooth convex optimization problem is O(nLR²/ε), where ε is the error tolerance, n is the number of variables, L is the average of the Lipschitz constants of the gradient of the objective function associated with the variables, and R is the distance from the starting iterate to the set of optimal solutions (see the sketch after this list)

  • Complexity: We show theoretically (Sect. 7) and numerically (Sect. 8) that parallel coordinate descent methods (PCDMs) accelerate on their serial counterpart for partially separable problems
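To give a rough sense of scale for the serial bound in the second highlight (all numeric values below are hypothetical placeholders, not figures from the paper), note that the bound grows linearly in n, and that the parallel method divides it by roughly the speedup factor sketched under the abstract.

    # Hypothetical values; the serial bound itself is O(n * L * R^2 / eps).
    n = 1_000_000_000    # number of variables (the scale of the paper's LASSO experiment)
    L_avg = 1.0          # average coordinate Lipschitz constant (assumed)
    R = 10.0             # distance from the starting iterate to the optimal set (assumed)
    eps = 1e-3           # target accuracy (assumed)
    tau, omega = 24, 10  # processors and degree of partial separability (assumed)

    serial_iters = n * L_avg * R**2 / eps
    beta = 1.0 + (tau - 1) * (omega - 1) / (n - 1)
    parallel_iters = serial_iters * beta / tau

    print(f"serial CDM iterations    ~ {serial_iters:.1e}")    # ~1.0e+14
    print(f"parallel PCDM iterations ~ {parallel_iters:.1e}")  # roughly 24x fewer here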

Summary

Big data optimization

There has been a surge in interest in the design of algorithms suitable for solving convex optimization problems with a huge number of variables [12,15]. The size of problems arising in fields such as machine learning [1], network analysis [29], PDEs [27], truss topology design [16] and compressed sensing [5] usually grows with our capacity to solve them, and is projected to grow dramatically in the coming decade. Much of computational science is currently facing the “big data” challenge, and this work is aimed at developing optimization algorithms suitable for the task.

Coordinate descent methods
Parallelization
Research idea
Minimizing a partially separable composite objective
Examples of partially separable functions
Brief literature review
Contents
Parallel block coordinate descent methods
Inner products
Smoothness of f
Strong convexity
Algorithms
Summary of contributions
Method
Revision note requested by a reviewer
Block samplings
Technical results
Expected separable overapproximation
Nonoverlapping uniform samplings
Nice samplings
Doubly uniform samplings
Iteration complexity
Iteration complexity: convex case
Iteration complexity: strongly convex case
Numerical experiments
A LASSO problem with 1 billion variables
Progress to solving the problem
Parallelization speedup
Theory versus reality
Training linear SVMs with bad data for PCDM
L2-regularized logistic regression with good data for PCDM
ESO for a convex combination of samplings
Findings
ESO for a conic combination of functions
