A Random Algorithm for Low-Rank Decomposition of Large-Scale Matrices With Missing Entries.

Yiguang Liu Yiguang Liu,Wenzheng Xu Wenzheng Xu,Yifei Pu Yifei Pu,Chunguang Li Chunguang Li,Yinjie Lei Yinjie Lei

doi:10.1109/tip.2015.2458176

Abstract

A random submatrix method (RSM) is proposed to calculate the low-rank decomposition U(m×r)V(n×r)(T) (r < m, n) of the matrix Y∈R(m×n) (assuming m > n generally) with known entry percentage 0 < ρ ≤ 1. RSM is very fast as only O(mr(2)ρ(r)) or O(n(3)ρ(3r)) floating-point operations (flops) are required, compared favorably with O(mnr+r(2)(m+n)) flops required by the state-of-the-art algorithms. Meanwhile, RSM has the advantage of a small memory requirement as only max(n(2),mr+nr) real values need to be saved. With the assumption that known entries are uniformly distributed in Y, submatrices formed by known entries are randomly selected from Y with statistical size k×nρ(k) or mρ(l)×l , where k or l takes r+1 usually. We propose and prove a theorem, under random noises the probability that the subspace associated with a smaller singular value will turn into the space associated to anyone of the r largest singular values is smaller. Based on the theorem, the nρ(k)-k null vectors or the l-r right singular vectors associated with the minor singular values are calculated for each submatrix. The vectors ought to be the null vectors of the submatrix formed by the chosen nρ(k) or l columns of the ground truth of V(T). If enough submatrices are randomly chosen, V and U can be estimated accordingly. The experimental results on random synthetic matrices with sizes such as 13 1072 ×10(24) and on real data sets such as dinosaur indicate that RSM is 4.30 ∼ 197.95 times faster than the state-of-the-art algorithms. It, meanwhile, has considerable high precision achieving or approximating to the best.

Full Text