Abstract

Large-scale data science trains models on data sets containing massive numbers of samples. Training is often formulated as the solution of empirical risk minimization problems, optimization programs whose complexity scales with the number of elements in the data set. Stochastic optimization methods overcome this challenge, but they come with their own set of limitations. This article discusses recent developments that accelerate the convergence of stochastic optimization through the exploitation of second-order information. This is achieved with stochastic variants of quasi-Newton methods that approximate the curvature of the objective function using stochastic gradient information. The reasons why this leads to faster convergence are discussed, along with the introduction of an incremental method that exploits memory to achieve a superlinear convergence rate. This is the best-known convergence rate for a stochastic optimization method. Stochastic quasi-Newton methods are applied to several problems, including prediction of the click-through rate of an advertisement displayed in response to a specific search engine query by a specific visitor. Experimental evaluations showcase reductions in overall computation time relative to stochastic gradient descent algorithms.
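
To make the high-level description concrete, the following is a minimal sketch of one way a stochastic quasi-Newton iteration can be built, in the spirit of online BFGS-type methods: curvature is estimated from differences of stochastic gradients evaluated on the same mini-batch, and the resulting inverse Hessian approximation preconditions the stochastic gradient step. This is an illustrative assumption rather than the article's incremental algorithm; the function name `stochastic_bfgs`, the gradient oracle `grad_fn`, the mini-batch size, and the regularization constant are all hypothetical choices.

```python
# Illustrative sketch of a stochastic quasi-Newton (online BFGS-style) iteration.
# NOT the article's incremental method; names and constants are assumptions.
import numpy as np

def stochastic_bfgs(grad_fn, w, data, n_iters=1000, step=0.1, reg=1e-4, seed=0):
    """grad_fn(w, batch) -> stochastic gradient; data: array of samples
    indexable by an integer array (e.g., a NumPy array of feature rows)."""
    rng = np.random.default_rng(seed)
    d = w.size
    B_inv = np.eye(d)                                  # inverse Hessian approximation
    for _ in range(n_iters):
        batch = data[rng.integers(len(data), size=16)]  # sample a mini-batch
        g = grad_fn(w, batch)
        w_new = w - step * B_inv @ g                   # preconditioned (quasi-Newton) step
        # Curvature pair from stochastic gradients on the SAME mini-batch,
        # with a small regularization term so that y @ s > 0 and B_inv stays
        # positive definite (assumed sufficient here; a safeguard is common).
        s = w_new - w
        y = grad_fn(w_new, batch) - g + reg * s
        rho = 1.0 / (y @ s)
        V = np.eye(d) - rho * np.outer(s, y)
        B_inv = V @ B_inv @ V.T + rho * np.outer(s, s)  # BFGS inverse update
        w = w_new
    return w
```

In this sketch the key design choice is that both gradients in the curvature pair use the same mini-batch, so the difference reflects curvature of the sampled objective rather than sampling noise; limited-memory variants would store only a few (s, y) pairs instead of a dense matrix.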
