Abstract

Today, increasing numbers of people interact online, and the explosion of online communication produces a vast amount of textual comments. A major drawback of online environments, however, is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment and, more generally, comments that may hurt someone’s feelings. In this scenario, detecting this kind of toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications, including Sentiment Analysis and emotion detection, across numerous datasets. Such models need no pre-defined, hand-picked features; instead, they learn sophisticated features from the input datasets by themselves. In this domain, word embeddings have been widely used to represent words in Sentiment Analysis tasks, proving very effective. Therefore, in this paper, we investigate the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, we evaluate the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity. The results suggest that Long Short-Term Memory layers in combination with mimicked word embeddings are a good choice for this task.

Highlights

  • In recent years, short text has been continuously created due to the explosion of online communication, social networks, and e-commerce platforms

  • We evaluate the use of four word embedding representations based on Word2Vec [9,13] and Bidirectional Encoder Representations from Transformers (BERT) [10] algorithms for the task of toxicity detection in online textual comments

  • The literature already showed [50] that deep learning methods that are trained with word embeddings outperform those trained with tf-idf features
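The tf-idf baseline mentioned above can be sketched as follows. This is an illustrative setup only: the toy comments, the label matrix, and the choice of logistic regression as the per-label classifier are assumptions for demonstration, not the exact configuration used in the paper; the six label columns mirror the six toxicity types the paper targets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus with one binary target column per toxicity type
comments = [
    "have a great day everyone",
    "you are an idiot and I hate you",
    "thanks for the helpful answer",
    "I will find you, watch out",
]
# columns: toxic, severe_toxic, obscene, threat, insult, identity_hate
labels = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
]

# tf-idf features feeding one independent classifier per toxicity label
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultiOutputClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, labels)

preds = model.predict(["you idiot"])
print(preds.shape)  # one row, six binary label columns
```

Deep models trained on word embeddings replace the sparse tf-idf matrix with dense, order-preserving representations, which is what the cited comparison [50] attributes their advantage to.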

Introduction

Short text is continuously being created due to the explosion of online communication, social networks, and e-commerce platforms. To extract knowledge from it, the research community designs supervised Machine Learning systems that support and improve online services by mining and using this information. Many such approaches fall within the Sentiment Analysis research topic, which classifies data into positive or negative classes and includes several subtasks, such as emotion detection, aspect-based polarity detection [8], etc. In this work, we analyze four deep learning models based on Dense, Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) layers to detect various levels of toxicity within online textual comments. We evaluate the use of four word embedding representations based on Word2Vec [9,13] and Bidirectional Encoder Representations from Transformers (BERT) [10] algorithms for this task.
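As an illustration of how such embedding-based inputs are typically prepared for recurrent layers, the sketch below maps each token of a comment to its embedding vector and pads the sequence to a fixed length, producing the (sequence_length, embedding_dim) matrices an LSTM consumes. The tiny vocabulary, the random embedding table, and the dimensions are hypothetical, not those used in the paper.

```python
import numpy as np

EMB_DIM = 4   # toy embedding size (real Word2Vec/BERT vectors have hundreds of dims)
MAX_LEN = 6   # fixed sequence length expected by the network

# Hypothetical pretrained embedding table (word -> vector)
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=EMB_DIM) for w in ["you", "are", "great", "an", "idiot"]}
UNK = np.zeros(EMB_DIM)   # out-of-vocabulary words map to a zero vector
PAD = np.zeros(EMB_DIM)   # padding vector for short comments

def embed_comment(text: str) -> np.ndarray:
    """Return a (MAX_LEN, EMB_DIM) matrix for one comment."""
    tokens = text.lower().split()[:MAX_LEN]          # truncate long comments
    vecs = [embeddings.get(t, UNK) for t in tokens]  # embedding lookup
    vecs += [PAD] * (MAX_LEN - len(vecs))            # right-pad short comments
    return np.stack(vecs)

batch = np.stack([embed_comment(c) for c in ["you are great", "you are an idiot"]])
print(batch.shape)  # (batch, MAX_LEN, EMB_DIM)
```

Dense models would instead collapse each matrix into a single fixed-size vector (e.g. by averaging rows), while CNN and LSTM layers operate on the full padded sequence.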

Related Work
Problem Formulation
Preprocessing
Deep Learning Models
Dense Model
CNN Model
LSTM Model
Bidirectional LSTM
Word Embeddings Representations
Word2Vec
Word Embeddings Preparation
Experimental Study
The Dataset
Baselines
Results and Discussion
Comparison with the Kaggle Challenge
Baseline Comparison
Dense-Based Model
CNN-Based Model
LSTM-Based Model
BiLSTM-Based Model
Overall Evaluation of the Deep Learning Models
Overall Evaluation of Word Embeddings
Conclusions and Future Work