Sentence level matrix representation for document spectral clustering

Víctor Mijangos, Gerardo Sierra, Azucena Montes

doi:10.1016/j.patrec.2016.11.008

Journal: Pattern Recognition Letters
Publication Date: Nov 21, 2016
Citations: 22

Affiliation: Universidad Nacional Autónoma de México, Centro Nacional de Investigación y Desarrollo Tecnológico

Abstract

Representing a document as a single vector in R^n is the traditional approach in vector space models. However, this representation tends to ignore the discourse and syntactic structure of texts. A matrix representation, such as the one offered by the Doc2Vec word embedding method, preserves these characteristics. To integrate a sentence-level matrix representation of documents into a clustering algorithm, we use a Frobenius-based inner product that allows kernel functions to be defined for spectral clustering. We show that this methodology provides advantages over traditional clustering algorithms and performs better than the bag-of-words (BoW) representations used in Information Retrieval (IR).
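
The idea of a Frobenius-based kernel between document matrices can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian form of the kernel, the `sigma` parameter, the function names, and the toy matrices are all assumptions made for illustration, and every document is assumed to have already been brought to the same shape (sentences x embedding dimensions).

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij = trace(A^T B)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel on the Frobenius distance (an assumed kernel choice)."""
    # ||A - B||_F^2 expanded through the inner product.
    sq_dist = (frobenius_inner(a, a)
               - 2.0 * frobenius_inner(a, b)
               + frobenius_inner(b, b))
    # Guard against tiny negative values caused by rounding.
    return math.exp(-max(sq_dist, 0.0) / (2.0 * sigma ** 2))


def affinity_matrix(docs, sigma=1.0):
    """Pairwise kernel values between all document matrices."""
    return [[gaussian_kernel(a, b, sigma) for b in docs] for a in docs]


# Hypothetical toy documents: 2 sentences, 2 embedding dimensions each.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.9, 0.1], [0.1, 0.9]]
doc_c = [[-1.0, 0.0], [0.0, -1.0]]

affinity = affinity_matrix([doc_a, doc_b, doc_c])
```

In this toy setting `doc_a` and `doc_b` receive a much higher affinity than `doc_a` and `doc_c`. A matrix of this kind could then be handed to any spectral clustering routine that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`.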

Full Text