A document representation framework with interpretable features using pre-trained word embeddings

Narendra Babu Unnam, P Krishna Reddy

doi:10.1007/s41060-019-00200-5

Abstract

We propose an improved framework for document representation using word embeddings. Existing models represent a document as a position vector in the same space as the word embeddings. As a result, they cannot capture the multiple aspects or the broad context of the document. Because of this low representational power, existing approaches also perform poorly at document classification, and the document vectors they produce have uninterpretable features. In this paper, we propose a document representation framework that captures multiple aspects of the document with interpretable features. In this framework, a document is represented in a different feature space, in which each dimension corresponds to a potential feature word with relatively high discriminating power. A given document is modeled as the distances between the feature words and the document. To build this representation, we propose two criteria for selecting potential feature words and a distance function to measure the distance between a feature word and the document. Experimental results on multiple datasets show that the proposed model consistently outperforms the baseline methods at document classification. The proposed approach is simple and represents the document with interpretable word features. Overall, the proposed model provides an alternative framework for representing larger text units with word embeddings and opens scope for new approaches that improve document representation and its applications.
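
The idea of modeling a document as distances to feature words can be illustrated with a minimal sketch. The abstract does not specify the distance function or the feature-word selection criteria, so everything below is an assumption made for illustration: the toy two-dimensional embeddings, the hand-picked feature words, and the choice of "minimum cosine distance between the feature word and any word in the document" as the word-to-document distance. It is not the authors' method.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def represent_document(doc_words, feature_words, embeddings):
    """Return one value per feature word: its distance to the document.

    ASSUMPTION: the word-to-document distance is the minimum cosine
    distance between the feature word and any in-vocabulary document
    word. The paper defines its own distance function.
    """
    doc_vectors = [embeddings[w] for w in doc_words if w in embeddings]
    representation = []
    for feature_word in feature_words:
        if not doc_vectors:
            representation.append(1.0)  # no known words in the document
            continue
        feature_vector = embeddings[feature_word]
        representation.append(
            min(cosine_distance(feature_vector, v) for v in doc_vectors)
        )
    return representation

# Hypothetical toy embeddings; real ones would come from a pre-trained model.
EMBEDDINGS = {
    "goal": [1.0, 0.0],
    "match": [0.9, 0.1],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
}
# Hypothetical feature words, picked by hand rather than by the paper's criteria.
FEATURE_WORDS = ["goal", "election"]

doc = ["match", "goal", "unknown"]
vec = represent_document(doc, FEATURE_WORDS, EMBEDDINGS)
print(dict(zip(FEATURE_WORDS, vec)))
```

Each dimension of the resulting vector is tied to a named feature word, which is what makes the representation interpretable: a small value in the "goal" dimension says the document contains something close to "goal" in the embedding space.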

Journal: International Journal of Data Science and Analytics
Publication Date: Nov 25, 2019
Citations: 2