Abstract
Latent semantic analysis (LSA) is a mathematical/statistical method for discovering hidden concepts relating terms and documents within a document collection (i.e., a large corpus of text). Each document and term of the corpus is expressed as a vector whose elements correspond to these concepts, forming a term-document matrix. LSA then applies a low-rank approximation to the term-document matrix in order to remove irrelevant information, extract the more important relations, and reduce the computational time. The irrelevant information, called "noise," has no noteworthy effect on the meaning of the document collection, and removing it is an essential step in LSA. The singular value decomposition (SVD) has been the main tool for obtaining this low-rank approximation. Since the document collection is dynamic (i.e., the term-document matrix is subject to repeated updates), the approximation must be renewed, either by recomputing the SVD or by updating it. However, the computational cost of recomputing or updating the SVD of the term-document matrix is very high when new terms and/or documents are added to the preexisting collection. This issue has opened the door to using other matrix decompositions for LSA, such as ULV- and URV-based decompositions. This study shows that the truncated ULV decomposition (TULVD) is a good alternative to the SVD in LSA modeling.
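To make the low-rank step concrete, here is a minimal NumPy sketch of SVD-based truncation as used in LSA. The term-document matrix and the rank k = 2 are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Tiny illustrative term-document matrix A (terms x documents);
# entries are raw term counts (a real system would typically use tf-idf).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation discards the smallest singular values ("noise").
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, A_k is the best rank-k approximation
# to A in both the spectral and Frobenius norms; the Frobenius error
# equals the root-sum-of-squares of the discarded singular values.
err = np.linalg.norm(A - A_k, ord='fro')
print(k, round(err, 4))
```

Documents (columns of `A_k`, or equivalently rows of `Vt[:k, :]`) can then be compared in the k-dimensional concept space instead of the full term space.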
Highlights
Academic Editor: Danilo Pianini
Latent semantic analysis (LSA) is a mathematical/statistical method for discovering hidden concepts relating terms and documents within a document collection
The computational time of recomputing or updating the singular value decomposition (SVD) of the term-document matrix is very high when adding new terms and/or documents to a preexisting document collection. Therefore, this issue has opened the door to using other matrix decompositions for LSA, such as ULV- and URV-based decompositions. This study shows that the truncated ULV decomposition (TULVD) is a good alternative to the SVD in LSA modeling
In LSA, where document collections are dynamic over time, i.e., the term-document matrix is subject to repeated updates, the SVD becomes prohibitive due to its high computational expense. Thus, alternative decompositions have been proposed for these applications, such as low-rank ULV/URV decompositions [7] and the truncated ULV decomposition (TULVD)
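The excerpt does not spell out the TULVD update algorithm, but the expense it refers to can be illustrated with the classical cheap workaround in SVD-based LSA: "folding in" a new document into an existing rank-k factorization instead of recomputing the SVD. The matrix, the rank k, and the function name below are illustrative assumptions:

```python
import numpy as np

def fold_in_document(Uk, sk, d):
    """Project a new document vector d (term counts) into the existing
    rank-k concept space without recomputing the SVD.

    Uk: m x k matrix of left singular vectors; sk: the k largest
    singular values. Returns the k concept-space coordinates of d.
    """
    return (Uk.T @ d) / sk

# Existing collection (terms x documents) and its rank-k SVD.
A = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

# Fold in a new document; its concept coordinates can be compared
# (e.g., by cosine similarity) with the columns of Vt[:k, :].
d_new = np.array([0.0, 1.0, 1.0, 0.0])
d_hat = fold_in_document(Uk, sk, d_new)
print(d_hat.shape)
```

Folding in is only an approximation: the singular vectors are never updated, so the factorization degrades as documents accumulate. This is exactly the regime in which updatable decompositions such as the TULVD are attractive.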
Summary
Throughout the paper, uppercase letters such as A denote matrices. The n × n identity matrix is denoted by I_n. The norm ‖·‖ denotes the spectral norm, and ‖·‖_F denotes the Frobenius norm. The notation R^(m×n) represents the set of m × n real matrices. An m × n matrix A is written A = [a_ij], where a_ij is the entry of A in row i and column j, with 1 ≤ i ≤ m and 1 ≤ j ≤ n.
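As a quick concrete check of the two norms in this notation (a small NumPy illustration, not from the paper): the spectral norm is the largest singular value, while the Frobenius norm is the root-sum-of-squares of all entries (equivalently, of all singular values).

```python
import numpy as np

# Diagonal example with singular values 4 and 3.
A = np.array([[3.0, 0.0],
              [0.0, 4.0]])

# Spectral norm ||A||: the largest singular value of A.
spectral = np.linalg.norm(A, ord=2)

# Frobenius norm ||A||_F: sqrt of the sum of squared entries,
# i.e., sqrt(3^2 + 4^2) = 5 here.
frobenius = np.linalg.norm(A, ord='fro')

print(spectral, frobenius)  # 4.0 5.0
```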