Cross-lingual hate speech detection using domain-specific word embeddings.

Ayme Arango Monnar, Jorge Perez Rojas, Barbara Poblete Labra

doi:10.1371/journal.pone.0306521

Abstract

THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. Hate speech detection in online social networks is a multidimensional problem that depends on language and cultural factors. Most supervised learning resources for this task, such as labeled datasets and Natural Language Processing (NLP) tools, have been tailored specifically to English. However, a large portion of web users around the world speak other languages, which creates an important need for efficient multilingual hate speech detection approaches. In particular, such approaches should be able to leverage the limited cross-lingual resources that currently exist. Cross-lingual transfer for this task has been difficult to achieve. We therefore propose a simple yet effective method to approach this problem. To our knowledge, ours is the first attempt to create a multilingual embedding model specific to this problem. We validate our approach through an extensive comparative evaluation against several well-known general-purpose language models that, unlike ours, have been trained on massive amounts of data. We focus on a zero-shot cross-lingual evaluation scenario, in which we classify hate speech in one language without access to any labeled data in that language. Despite their simplicity, our embeddings outperform more complex models in most of the experimental settings we tested. In addition, we provide further evidence of the effectiveness of our approach through an ad hoc qualitative exploratory analysis, which captures how hate speech is expressed in different languages. This analysis allows us to find new cross-lingual relations between words in the hate-speech domain. Overall, our findings indicate that there are common patterns in how hate speech is expressed across languages and that our proposed model can capture such relationships.
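The zero-shot cross-lingual scenario described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier or the embedding training procedure, so everything below is an assumption made for illustration: the two-dimensional vectors in `TOY_EMBEDDINGS` are hand-made placeholders standing in for a shared multilingual embedding space, the words are neutral stand-ins, and the nearest-centroid classifier is a simple substitute for whatever model the authors actually use. The only point shown is the evaluation setup: labels are available in one language (here English), and predictions are made on another (here Spanish) with no labeled data in it.

```python
# Hypothetical, hand-made vectors in a shared (aligned) space.
# These are NOT the authors' embeddings; they only mimic the idea that
# words with similar meaning in different languages lie close together.
TOY_EMBEDDINGS = {
    # English (source language, labeled data available)
    "hateful": (0.90, 0.10),
    "awful": (0.80, 0.20),
    "friendly": (0.10, 0.90),
    "nice": (0.20, 0.80),
    # Spanish (target language, no labeled data)
    "odioso": (0.85, 0.15),
    "horrible": (0.80, 0.25),
    "amable": (0.15, 0.85),
    "lindo": (0.20, 0.90),
}


def embed(text):
    """Represent a text as the average of its known word vectors."""
    vectors = [TOY_EMBEDDINGS[tok] for tok in text.lower().split()
               if tok in TOY_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)  # no known words: fall back to the origin
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def fit_centroids(examples):
    """Compute one mean vector per label from (text, label) pairs."""
    sums, counts = {}, {}
    for text, label in examples:
        vec = embed(text)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc)
            for label, acc in sums.items()}


def predict(centroids, text):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    vec = embed(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2
                              for a, b in zip(vec, centroids[label])),
    )


# Train on English only (1 = hostile placeholder class, 0 = benign).
english_train = [
    ("hateful awful", 1),
    ("awful", 1),
    ("friendly nice", 0),
    ("nice", 0),
]
centroids = fit_centroids(english_train)

# Zero-shot: classify Spanish texts without any Spanish labels.
spanish_test = ["odioso horrible", "amable lindo"]
predictions = [predict(centroids, text) for text in spanish_test]
```

The transfer works in this toy case only because the Spanish and English vectors were placed in the same space by construction; in practice the quality of that shared space is what determines whether zero-shot transfer succeeds.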

Journal: PloS one
Publication Date: Jul 30, 2024
License type: CC BY 4.0
