Abstract

We study the problem of recovering matrices that are simultaneously low-rank and row- and/or column-sparse. Such matrices arise in recent applications in cognitive neuroscience, imaging, computer vision, macroeconomics, and genetics. We propose GDT (Gradient Descent with hard Thresholding), an algorithm that efficiently recovers matrices with this structure by minimizing a bi-convex function over a nonconvex set of constraints. We show linear convergence of the iterates obtained by GDT to a region within statistical error of an optimal solution. As an application of our method, we consider multi-task learning problems and show that the statistical error rate obtained by GDT nearly matches the minimax rate. Experiments on both simulated and real data sets demonstrate competitive performance and much faster running speed than existing methods.

Highlights

  • Many problems in machine learning, statistics and signal processing can be formulated as optimization problems with a smooth objective and nonconvex constraints

  • We show that the statistical error nearly matches the optimal minimax rate, while the algorithm achieves the best performance in terms of estimation and prediction error in simulations

  • We propose a new gradient descent with hard thresholding (GDT) algorithm to efficiently solve optimization problems with simultaneous low-rank and row- and/or column-sparsity structure on the coefficient matrix


Summary

Introduction

Many problems in machine learning, statistics, and signal processing can be formulated as optimization problems with a smooth objective and nonconvex constraints. Compared to existing work on optimization over low-rank matrices with (alternating) gradient descent, we must study a projection onto a nonconvex set in each iteration (in our case, a hard-thresholding operation), which requires delicate analysis and novel theory. Our algorithm does not require a fresh independent sample in each iteration and allows for non-Gaussian errors, while at the same time achieving a nearly optimal error rate compared to the information-theoretic minimax lower bound for the problem. The proposed algorithm can be applied to the regression step of any multi-task reinforcement learning (MTRL) algorithm (we chose Fitted Q-iteration (FQI) for presentation purposes) to compute optimal policies for MDPs. Compared to [26], which uses a convex relaxation, our algorithm is much more efficient in high dimensions.
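The iteration described above, alternating gradient steps on the two factors with a hard-thresholding projection after each step, can be sketched in NumPy as follows. This is a minimal illustration, not the paper's exact algorithm: the initialization (a truncated SVD of a least-squares estimate), the adaptive step sizes, and all variable names are assumptions made for the sketch.

```python
import numpy as np

def hard_threshold_rows(M, s):
    """Projection onto row-sparse matrices: keep the s rows of M with
    largest l2 norm and zero out the rest."""
    norms = np.linalg.norm(M, axis=1)
    keep = np.argsort(norms)[-s:]
    out = np.zeros_like(M)
    out[keep] = M[keep]
    return out

def gdt(X, Y, rank, s1, s2, n_iter=200):
    """Sketch of gradient descent with hard thresholding for
        min_{U, V} 0.5 * ||Y - X U V^T||_F^2
    with U constrained to at most s1 nonzero rows and V to at most s2.
    Initialization and step-size choice here are illustrative assumptions,
    not the scheme analyzed in the paper."""
    # Crude initialization: truncated SVD of a least-squares estimate.
    Theta0 = np.linalg.lstsq(X, Y, rcond=None)[0]
    Uf, sv, Vt = np.linalg.svd(Theta0, full_matrices=False)
    U = Uf[:, :rank] * np.sqrt(sv[:rank])
    V = Vt[:rank].T * np.sqrt(sv[:rank])
    LX = np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        # Gradient step in U, then project onto the row-sparse set.
        R = X @ U @ V.T - Y
        step_u = 1.0 / (LX * max(np.linalg.norm(V, 2) ** 2, 1e-8))
        U = hard_threshold_rows(U - step_u * (X.T @ R @ V), s1)
        # Gradient step in V, then project onto the row-sparse set.
        R = X @ U @ V.T - Y
        step_v = 1.0 / (LX * max(np.linalg.norm(U, 2) ** 2, 1e-8))
        V = hard_threshold_rows(V - step_v * (R.T @ (X @ U)), s2)
    return U, V
```

The hard-thresholding step is the projection onto the nonconvex constraint set mentioned above: it is cheap (a row-norm sort), which is what makes each iteration efficient in high dimensions.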

Related work
Organization of the paper
Gradient descent with hard thresholding
Theoretical result
Regularity conditions
Main result
Proof sketch of Theorem 1
Application to multi-task learning
GDT for multi-task learning
Application to multi-task reinforcement learning
Synthetic datasets
Norwegian paper quality dataset
Calcium imaging data
Proof of Lemma 2
Proof of Lemma 3
Proof of Lemma 5
Conclusion
