Abstract
In natural language processing, text must first be transformed into a machine-readable representation, and the quality of downstream natural language processing tasks depends heavily on the quality of those representations. In this survey, we systematize and analyze 50 neural models from the last decade. The models are grouped by neural network architecture into shallow, recurrent, recursive, convolutional, and attention models. We further categorize them by representation level, input level, model type, and model supervision. We focus on task-independent representation models, discuss their advantages and drawbacks, and identify promising directions for future neural text representation models. We describe the evaluation datasets and tasks used in the papers that introduced the models and compare the models on the relevant evaluations. The quality of a representation model can be judged by its ability to generalize to multiple unrelated tasks. Benchmark standardization is visible among recent models, and the number of different tasks on which models are evaluated is increasing.
Highlights
Natural language is a valuable and rich source of information for many applications
Vector representations of text can be constructed in many different ways; this paper provides a survey of the neural models that generate continuous and dense text representations
We observe a trend in the number of tasks on which neural text representation models are evaluated (Table A2): in recent years, each model is evaluated on a growing number of tasks
Summary
Natural language is discrete and sparse and, as such, a challenging source of data. For text to be usable as input data, it first has to be transformed into a suitable representation, usually a vector of numbers encoding the text’s features. The basic non-neural text representation methods, which preserve a very limited amount of information, are one-hot encoding and TF-IDF (term frequency-inverse document frequency) [1]. One-hot encoding creates a Boolean vector of values for each word. To represent larger units of text (multi-word units such as phrases, sentences, or documents), the vector has a “1” for each word that appears in the unit in focus (the bag-of-words representation). TF-IDF vectors replace the Boolean values of one-hot vectors with term frequencies, normalized by the inverse document frequencies, as illustrated in the sketch below.
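As a concrete illustration (not taken from the surveyed models), the following Python sketch builds bag-of-words count vectors and TF-IDF vectors for a toy corpus using only the standard library. The toy corpus, the vocabulary construction, and the smoothed variant of the inverse document frequency are assumptions made for the example.

    # Minimal sketch of the non-neural baselines described above:
    # bag-of-words counts and TF-IDF weights for a toy corpus (assumed for illustration).
    import math
    from collections import Counter

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets",
    ]
    docs = [doc.split() for doc in corpus]
    vocab = sorted({word for doc in docs for word in doc})

    def bag_of_words(doc):
        """Vector of raw term counts over the vocabulary (nonzero for each word present)."""
        counts = Counter(doc)
        return [counts[w] for w in vocab]

    def tf_idf(doc):
        """Term frequency weighted by a smoothed inverse document frequency (an assumed variant)."""
        counts = Counter(doc)
        n_docs = len(docs)
        vec = []
        for w in vocab:
            tf = counts[w] / len(doc)                    # normalized term frequency
            df = sum(1 for d in docs if w in d)          # document frequency
            idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency
            vec.append(tf * idf)
        return vec

    for doc in docs:
        print(bag_of_words(doc))
        print([round(x, 2) for x in tf_idf(doc)])

The resulting vectors are high-dimensional and sparse (one dimension per vocabulary word), which is precisely the limitation that the neural models surveyed here address with continuous, dense representations.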