Abstract

We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for n data points requires Ω(n²) similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise, e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in time sublinear in the size of the matrix, producing a rank-s approximation with just O(ns) similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate the high accuracy of the approximated similarity matrices on tasks of document classification, sentence similarity, and cross-document coreference.
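To make the cost model concrete, the sketch below illustrates a generic Nyström-style low-rank approximation built from a sampled column subset; it is not the paper's exact indefinite-matrix algorithm, only a minimal illustration of how O(ns) exact similarity evaluations can yield a rank-s approximation of the full n × n matrix. The function `sim` is a hypothetical user-supplied pairwise similarity (e.g., a transformer-based scorer), and the eigenvalue-based pseudoinverse is one simple way to handle a symmetric but possibly indefinite landmark block.

```python
# Minimal sketch of a Nystrom-style rank-s approximation from O(n*s) similarity calls.
# Assumptions (not from the paper): `sim(a, b)` is a symmetric similarity function,
# landmarks are sampled uniformly, and the landmark block is inverted via a
# truncated eigendecomposition to tolerate indefiniteness.

import numpy as np

def nystrom_approx(points, sim, s, seed=None):
    """Approximate the n x n similarity matrix of `points` with rank <= s."""
    rng = np.random.default_rng(seed)
    n = len(points)
    landmarks = rng.choice(n, size=min(s, n), replace=False)

    # C: similarities between all n points and the s landmarks -- O(n*s) calls to sim.
    C = np.array([[sim(points[i], points[j]) for j in landmarks] for i in range(n)])

    # W: the s x s landmark-landmark block (the landmark rows of C).
    W = C[landmarks, :]

    # Pseudoinverse of the symmetric (possibly indefinite) W via eigendecomposition,
    # truncating near-zero eigenvalues for numerical stability.
    evals, evecs = np.linalg.eigh((W + W.T) / 2.0)
    keep = np.abs(evals) > 1e-10 * np.abs(evals).max()
    W_pinv = evecs[:, keep] @ np.diag(1.0 / evals[keep]) @ evecs[:, keep].T

    # Rank-s approximation of the full similarity matrix: K_hat = C W^+ C^T.
    return C @ W_pinv @ C.T
```

This sketch never forms the full matrix from exact computations; only the n × s block C is evaluated, which is the source of the sublinear number of similarity calls relative to the n² entries being approximated.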
