Abstract

Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in both memory and time, so an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint becomes a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only N(log N)² non-zero numbers for an N×N full matrix. Our new method is especially useful for scalable neural attention modeling. Unlike conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants on long sequences in synthetic data sets and in the Long Range Arena benchmarks. Our code is publicly available at https://github.com/RuslanKhalitov/SparseFactorization.
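
For intuition, the following is a minimal sketch (not the authors' implementation) of approximating an N×N matrix by a product of log₂N sparse full-rank factors. The chord-style sparsity pattern, the direct gradient-descent fit, and all function names below are illustrative assumptions; the paper instead trains neural networks to map input data to the non-zero entries.

    import torch

    def chord_mask(n: int) -> torch.Tensor:
        """Assumed chord-style sparsity: row i is non-zero at column i and at (i + 2^k) mod n."""
        log_n = n.bit_length() - 1                                # assumes n is a power of two
        rows = torch.arange(n).unsqueeze(1)                       # shape (n, 1)
        offsets = torch.tensor([0] + [2 ** k for k in range(log_n)])
        cols = (rows + offsets) % n                               # shape (n, log_n + 1)
        mask = torch.zeros(n, n, dtype=torch.bool)
        mask[rows.expand(-1, cols.shape[1]), cols] = True
        return mask

    def fit_sparse_factors(target: torch.Tensor, steps: int = 2000, lr: float = 0.05):
        """Fit log2(n) masked factors to `target` by minimizing the Frobenius error."""
        n = target.shape[0]
        log_n = n.bit_length() - 1
        mask = chord_mask(n).float()
        factors = [torch.nn.Parameter(0.1 * torch.randn(n, n)) for _ in range(log_n)]
        opt = torch.optim.Adam(factors, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            approx = torch.eye(n)
            for f in factors:
                approx = approx @ (f * mask)                      # mask keeps each factor sparse
            loss = torch.linalg.norm(approx - target)
            loss.backward()
            opt.step()
        return [f.detach() * mask for f in factors], loss.item()

    if __name__ == "__main__":
        torch.manual_seed(0)
        n = 64
        target = torch.randn(n, n)                                # a dense, high-rank test matrix
        factors, err = fit_sparse_factors(target)
        nnz = sum(int((f != 0).sum()) for f in factors)
        print(f"residual Frobenius norm: {err:.3f}, non-zeros stored: {nnz} / {n * n}")

With n = 64, each of the six factors stores 64 × 7 entries, so the whole product keeps on the order of N(log N)² numbers rather than the N² = 4096 entries of the dense matrix.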
