Combining Embeddings of Input Data for Text Classification

Zuzanna Parcheta, Francisco Casacuberta, Germán Sanchis-Trilles, Robin Rendahl

doi:10.1007/s11063-020-10312-w

Abstract

Automatic text classification is an essential part of text analysis. Classification can be improved at several levels, such as the preprocessing step or the network implementation. In this paper, we focus on how combining different text encoding methods affects classification accuracy. To this end, we implemented a multi-input neural network that can encode input text using several text encoding techniques: BERT, a neural embedding layer, GloVe, skip-thoughts and ParagraphVector. The text can be represented at different tokenisation levels: sentence level, word level, byte pair encoding level and character level. Experiments were conducted on seven datasets covering languages from different families: English, German, Swedish and Czech. Some of these languages feature agglutination and grammatical case. Two of the seven datasets originate from real commercial scenarios: (1) classifying ingredients into their corresponding classes, using a corpus provided by Northfork; and (2) classifying texts according to the English level of their writers, using a corpus provided by ProvenWord. The combination of text encoding techniques with which the developed architecture achieves an improvement depends on the characteristics of each dataset. Once the best combination of embeddings at different levels had been determined, different multi-input neural network architectures were compared. The results obtained with the best embedding combination and the best network architecture were then compared with state-of-the-art approaches. On the data used in the experiments, these results were better than the state-of-the-art baselines.
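As a rough illustration of what combining encodings at several tokenisation levels can look like, the sketch below encodes a text at word level and at character level, pools each encoding, and joins the two into one feature vector for a classifier. Everything in it is an assumption made for illustration: the hash-based `toy_embedding` stands in for trained encoders such as GloVe or BERT, concatenation is one common way to merge inputs (the abstract does not state how the paper's network merges them), and the classifier weights are untrained placeholders.

```python
import hashlib
import math


def toy_embedding(token, dim):
    """Deterministic pseudo-embedding (hypothetical stand-in for a trained lookup)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first `dim` bytes (dim <= 32) to floats in [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def mean_pool(vectors, dim):
    """Average a list of vectors; return a zero vector for empty input."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def encode_word_level(text, dim=8):
    """Word-level input: one vector per whitespace-separated token, then pooled."""
    words = text.lower().split()
    return mean_pool([toy_embedding("w:" + w, dim) for w in words], dim)


def encode_char_level(text, dim=4):
    """Character-level input: one vector per non-space character, then pooled."""
    chars = [c for c in text.lower() if not c.isspace()]
    return mean_pool([toy_embedding("c:" + c, dim) for c in chars], dim)


def combine(text):
    """Concatenate the per-level encodings into a single feature vector."""
    return encode_word_level(text) + encode_char_level(text)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Single linear layer followed by softmax over the classes."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


features = combine("whole wheat flour")
# Untrained, illustrative weights for two hypothetical classes.
weights = [[0.1] * len(features), [-0.1] * len(features)]
probs = classify(features, weights, [0.0, 0.0])
print(len(features), probs)
```

Adding a further level (for example byte pair encoding or a sentence encoder) would mean adding one more `encode_*` function and appending its output in `combine`; in a trained system the embeddings and classifier weights would be learned rather than fixed.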

Journal: Neural Processing Letters
Publication Date: Aug 11, 2020
Citations: 7
License type: other-oa

Similar Papers

An Innovative Word Encoding Method For Text Classification Using Convolutional Neural Network
Amr Adel Helmy ... Rania Hodhod
01 Dec 2018

Chinese sentence semantic matching based on multi-level relevance extraction and aggregation for intelligent human–robot interaction
Wenpeng Lu ... Hao Wu
Applied Soft Computing | VOL. 131
11 Nov 2022

Effects of speech cues in French-speaking children with dysarthria.
Erika S Levy ... Luca Campanelli
International Journal of Language & Communication Disorders | VOL. 55
20 Feb 2020

INTRAINDIVIDUAL DIFFERENCES IN LEVELS OF WRITTEN LANGUAGE
Virginia W Berninger ... Donald T Mizokawa
Reading & Writing Quarterly | VOL. 10
01 Jul 1994
