Abstract

Oja's algorithm is widely applied for computing principal eigenvectors in practical problems, so it is useful to understand the theoretical relationships between the learning rate, convergence rate, and generalization error of this algorithm under noisy samples. In this paper, we show that under mild assumptions on the sampling noise, both the generalization error and the convergence rate exhibit linear relationships with the learning rate in the large-sample-size and small-learning-rate regime. In addition, when the algorithm is near convergence, we provide a refined characterization of the generalization error, which suggests an optimal design of the learning rate for noisy data. Moreover, we investigate the minibatch variant of Oja's algorithm and demonstrate that the effective learning rate of minibatch training is scaled down by a factor characterized by the batch size, which provides theoretical insight and guidance for choosing the learning rate in minibatch training. Finally, our theoretical results are validated by experiments on both synthetic data and the MNIST dataset.
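For reference, the update at the heart of the paper can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes i.i.d. samples x with E[x xᵀ] = A, renormalizes onto the unit sphere after each step, and, in the minibatch case, estimates A by the batch mean of x xᵀ.

```python
import numpy as np

def oja_step(phi, x, eta):
    """Single-sample Oja update: x x^T is a noisy estimate of A."""
    phi = phi + eta * x * (x @ phi)   # phi + eta * (x x^T) phi
    return phi / np.linalg.norm(phi)  # project back onto the unit sphere

def oja_minibatch_step(phi, X, eta):
    """Minibatch variant: rows of X are samples, and the batch mean
    of x x^T replaces the single-sample estimate of A."""
    phi = phi + eta * X.T @ (X @ phi) / len(X)
    return phi / np.linalg.norm(phi)
```

In this averaged form, the paper's result says minibatch training behaves like single-sample training with a learning rate shrunk by a batch-size-dependent factor; the exact factor is characterized in the paper.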

Highlights

  • Understanding the fundamental correlations between the learning rate, convergence rate, and generalization error of machine learning algorithms is an important issue in designing effective and efficient algorithms for real applications [1]

  • While the top eigenvector of a matrix can be efficiently computed by the well-known power iteration algorithm [6], in practice the objective matrix is typically estimated from a small batch of data samples, which introduces estimation noise at each iteration (see the sketch after this list)

  • We study the asymptotic tradeoff between the learning rate, generalization error, and convergence rate of Oja's algorithm in the large-sample-size and small-learning-rate regime, requiring only that the variance of the estimation noise be finite
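To make the contrast in the second highlight concrete, here is textbook power iteration, which assumes full access to the matrix A itself; in the streaming setting, each multiplication by A must instead be replaced by a noisy sample-based estimate, as in the Oja updates sketched above. This is a generic sketch, not code from the paper.

```python
import numpy as np

def power_iteration(A, iters=100, seed=0):
    """Textbook power iteration: requires the full matrix A."""
    v = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v               # exact multiplication by A ...
        v /= np.linalg.norm(v)  # ... which streaming algorithms cannot perform
    return v
```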


Introduction

Understanding the fundamental correlations between the learning rate, convergence rate, and generalization error of machine learning algorithms is an important issue in designing effective and efficient algorithms for real applications [1]. We investigate this issue in computing the top eigenvector of a positive semidefinite matrix A. Such problems appear in many machine learning scenarios, including streaming principal component analysis (PCA) [2], canonical correlation analysis (CCA) [3], and, recently, the HGR maximal correlation problem [4], [5]. The learning rate η in Oja's algorithm controls the magnitude of the update step for the estimated eigenvector φ_n at each iteration; it essentially improves the generalizability of the algorithm at the cost of a slower convergence rate. Understanding the fundamental structure of this tradeoff is critical for designing efficient algorithms.
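This tradeoff can be seen in a small synthetic experiment. The setup below is our own illustration with hypothetical parameter choices, not the paper's experimental design: a larger η moves φ_n toward the top eigenvector faster but settles at a larger steady-state error, while a smaller η converges more slowly to a more accurate estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20
# Hypothetical spectrum with a well-separated top eigenvalue.
eigvals = np.linspace(1.0, 0.1, d)
eigvals[0] = 2.0
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(eigvals) @ Q.T
top = Q[:, 0]  # true top eigenvector of A

for eta in (0.1, 0.01, 0.001):
    phi = rng.standard_normal(d)
    phi /= np.linalg.norm(phi)
    for _ in range(20000):
        x = Q @ (np.sqrt(eigvals) * rng.standard_normal(d))  # E[x x^T] = A
        phi += eta * x * (x @ phi)   # Oja update with noisy estimate x x^T
        phi /= np.linalg.norm(phi)
    err = 1.0 - (top @ phi) ** 2     # sin^2 of the angle to the truth
    print(f"eta = {eta:g}: sin^2 error after 20000 steps = {err:.2e}")
```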
