Abstract
Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise embeddings for those words produce low-quality text representations that distort the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations and achieve better classification performance.
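The abstract names a context clustering algorithm for identifying polysemous words but does not give implementation details here. Below is a minimal sketch of how such a step might look, assuming pretrained word vectors and a fixed context window; the function names, window size, cluster count, and silhouette threshold are all illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def context_vectors(corpus, target, embeddings, window=5):
    """Average the embeddings of the words surrounding each occurrence of `target`.
    `corpus` is a list of tokenized sentences; `embeddings` maps word -> vector."""
    vecs = []
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok != target:
                continue
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            ctx = [embeddings[w] for w in ctx if w in embeddings]
            if ctx:
                vecs.append(np.mean(ctx, axis=0))
    return np.array(vecs)

def is_polysemous(corpus, target, embeddings, k=2, threshold=0.1):
    """Heuristic (assumed, not from the paper): treat `target` as polysemous
    if its occurrence contexts split into well-separated clusters."""
    X = context_vectors(corpus, target, embeddings)
    if len(X) <= k:
        return False  # too few occurrences to cluster reliably
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return silhouette_score(X, labels) > threshold
```

The intuition is that a word like "apple" occurs in two distinct context populations (fruit vs. company), so its context vectors cluster cleanly, whereas a monosemous word's contexts form one diffuse cloud.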
Highlights
Representation learning is a fundamental problem in natural language processing (NLP) and is crucial in text classification tasks.
Traditional representation models for text classification can be roughly divided into two types: models based on linear operations, which utilize tools for word embedding training, such as word2vec [1] or GloVe [2], to learn word-level representations that are later combined to form text representations [3]–[5]; and models based on deep neural networks, which use various neural network structures, such as convolutional neural networks (CNNs) [6]–[9], recurrent neural networks (RNNs) based on long short-term memory (LSTM) [10]–[12], neural networks based on attention mechanisms [13], generative adversarial networks (GANs) [14], [15], reinforcement learning (RL) [16], [17], graph convolutional networks (GCNs) [18]–[20], and pretrained language models [21], [22], to extract complex syntactic and semantic meaning from texts to generate text representations
PAVRM-Center denotes a variant of our proposed polyseme-aware vector representation model (PAVRM) in which the context-based algorithm for constructing polysemous word representations is replaced with one based on center vectors (see the sketch after this list).
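The excerpt does not define the center-vector algorithm, so the following is only one plausible reading, offered as a hedged sketch: cluster a polysemous word's context vectors, treat each cluster centroid as the "center" of one sense, and represent each occurrence by its nearest center. All names and the fixed sense count `k` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def center_representations(context_vecs, k=2):
    """Cluster the context vectors of a polysemous word and use each
    cluster centroid as the representation of one sense (assumed scheme)."""
    km = KMeans(n_clusters=k, n_init=10).fit(context_vecs)
    return km.cluster_centers_  # shape (k, dim): one center vector per sense

def sense_vector(occurrence_ctx, centers):
    """Pick the sense whose center is most cosine-similar to this
    occurrence's context vector."""
    sims = centers @ occurrence_ctx / (
        np.linalg.norm(centers, axis=1) * np.linalg.norm(occurrence_ctx) + 1e-9)
    return centers[np.argmax(sims)]
```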
Summary
Representation learning is a fundamental problem in natural language processing (NLP) and is crucial in text classification tasks. Traditional representation models for text classification can be roughly divided into two types: models based on linear operations, which utilize tools for word embedding training, such as word2vec [1] or GloVe [2], to learn word-level representations that are later combined to form text representations [3]–[5]; and models based on deep neural networks, which use various neural network structures, such as convolutional neural networks (CNNs) [6]–[9], recurrent neural networks (RNNs) based on long short-term memory (LSTM) [10]–[12], neural networks based on attention mechanisms [13], generative adversarial networks (GANs) [14], [15], reinforcement learning (RL) [16], [17], graph convolutional networks (GCNs) [18]–[20], and pretrained language models [21], [22], to extract complex syntactic and semantic meaning from texts to generate text representations. Although these models achieve very good performance in many text classification tasks, they still neglect an important factor: polysemous words. The following two sentences, extracted from two different sentiment classification tasks (movie reviews and baby product reviews), are taken as examples.
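The example sentences themselves are cut off in this excerpt. Independent of them, the linear-operation family mentioned above can be sketched: form a text representation by averaging its word vectors, with an optional hook for substituting sense-specific vectors for polysemous words. The `sense_lookup` hook is a hypothetical interface, not an API from the paper.

```python
import numpy as np

def text_representation(tokens, embeddings, sense_lookup=None):
    """Bag-of-embeddings baseline: average the word vectors of a text.
    `sense_lookup(token, tokens)` is a hypothetical hook returning a
    sense-specific vector for polysemous words (e.g., via center vectors);
    it returns None for ordinary words, which fall back to static embeddings."""
    vecs = []
    for tok in tokens:
        if sense_lookup is not None:
            v = sense_lookup(tok, tokens)
            if v is not None:
                vecs.append(v)
                continue
        if tok in embeddings:
            vecs.append(embeddings[tok])
    if not vecs:
        return np.zeros(next(iter(embeddings.values())).shape)
    return np.mean(vecs, axis=0)
```

The resulting vector can then be fed to any downstream classifier, which is how a polyseme-aware representation can be "introduced into existing models" as the abstract describes.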