Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers

Frederico Dias Souza, João Baptista De Oliveira E Souza Filho

doi:10.1007/s00521-022-08068-6

Open Access

https://doi.org/10.1007/s00521-022-08068-6

Abstract

Text classification is a natural language processing (NLP) task relevant to many commercial applications, such as e-commerce and customer service. Classifying such texts accurately is often challenging because of intrinsic language aspects, such as irony and nuance. To accomplish this task, one must provide a robust numerical representation for documents, a process known as embedding. Embedding is a key NLP field that has advanced significantly in the last decade, especially after the introduction of the word-to-vector concept and the popularization of Deep Learning models for solving NLP tasks, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based Language Models (TLMs). Despite the impressive achievements in this field, the literature on generating embeddings for Brazilian Portuguese texts is scarce, especially for commercial user reviews. Therefore, this work aims to provide a comprehensive experimental study of embedding approaches targeting binary sentiment classification of user reviews in Brazilian Portuguese. The study covers models ranging from classical (Bag-of-Words) to state-of-the-art (Transformer-based). The methods are evaluated on five open-source databases with pre-defined data partitions, made available in an open digital repository to encourage reproducibility. The fine-tuned TLMs achieved the best results in all cases, followed by the feature-based TLM, LSTM, and CNN, whose relative ranks varied with the database under analysis.
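
To make the classical end of that range concrete, the following is a minimal sketch of a Bag-of-Words embedding feeding a binary sentiment classifier. It is not the authors' pipeline: the four training reviews are invented for illustration (they are not drawn from the five databases), the helper names (`tokenize`, `build_vocabulary`, `embed`, `predict`) are hypothetical, and the nearest-centroid classifier with cosine similarity is a stand-in chosen only to keep the example dependency-free.

```python
from collections import Counter
import math

PUNCTUATION = ".,!?;:"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def build_vocabulary(documents):
    """Map every distinct training token to a fixed vector position."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def embed(text, vocabulary):
    """Bag-of-Words embedding: one term count per vocabulary entry."""
    vector = [0.0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:  # tokens unseen in training are ignored
            vector[vocabulary[tok]] = float(count)
    return vector

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented toy reviews (assumption): label 1 = positive, 0 = negative.
TRAIN = [
    ("produto excelente, entrega rápida", 1),
    ("adorei o produto, recomendo", 1),
    ("produto péssimo, chegou quebrado", 0),
    ("não recomendo, entrega atrasada", 0),
]

vocabulary = build_vocabulary([text for text, _ in TRAIN])
centroids = {
    label: centroid([embed(t, vocabulary) for t, y in TRAIN if y == label])
    for label in (0, 1)
}

def predict(text):
    """Return the label whose class centroid is most similar to the review."""
    vector = embed(text, vocabulary)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

if __name__ == "__main__":
    for review in ("entrega rápida e produto excelente",
                   "chegou quebrado, péssimo"):
        print(review, "->", predict(review))
```

Because the representation only counts words, "recomendo" and "não recomendo" contribute the same "recomendo" feature; this loss of word order and context is one of the limitations that contextual approaches such as TLMs are meant to address.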

Journal: Neural Computing and Applications
Publication Date: Dec 1, 2022
Citations: 2
