Tamil Offensive Language Detection: Supervised versus Unsupervised Learning Approaches

Vimala Balakrishnan, Kumanan N Govaichelvan, Vithyatheri Govindan

doi:10.1145/3575860

Abstract

Most natural language processing research targets English; very few studies explore under-resourced languages, including the Dravidian languages. We present a novel study on detecting offensive language in a corpus of Tamil comments collected from YouTube. The study compares two machine learning approaches, supervised and unsupervised, for detecting offensive patterns in textual communications. In the first (supervised) setup, offensive language detection models were developed using traditional machine learning algorithms (Random Forest, Logistic Regression, Support Vector Machine, and AdaBoost) and assessed based on human labeling. In the second (unsupervised) setup, we used K-means (K = 2) to cluster the unlabeled data before training the same set of machine learning algorithms to detect offensive communications. Performance scores indicate that unsupervised clustering was more effective than human labeling, with ensemble classifiers achieving accuracies of 99.70% and 99.87% on the balanced and imbalanced datasets, respectively. This suggests that the unsupervised approach can be used effectively to detect offensive language in low-resourced languages.
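
The unsupervised setup described in the abstract (cluster with K = 2, then train a classifier) can be illustrated roughly as follows. This is only a minimal sketch and not the authors' implementation. It assumes the cluster assignments are used as training labels, uses made-up English placeholder comments instead of the Tamil corpus, assumes a simple L2-normalised bag-of-words representation (the abstract does not state the features), and substitutes a small hand-written logistic regression for the four classifiers named in the abstract. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Made-up English placeholder comments; the paper's corpus is Tamil YouTube comments.
comments = [
    "you are a stupid idiot",
    "stupid idiot go away",
    "what an idiot so stupid",
    "great video thanks for sharing",
    "thanks great song love it",
    "love this video great work",
]

def build_vocab(texts):
    return sorted({word for text in texts for word in text.split()})

def vectorize(text, vocab):
    """L2-normalised bag-of-words vector (an assumed feature choice)."""
    counts = Counter(text.split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_two_clusters(rows, max_iters=20):
    """Plain K-means with K = 2 and a deterministic start:
    the first row and the row farthest from it seed the two centroids."""
    first = rows[0]
    farthest = max(rows, key=lambda r: sq_dist(r, first))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(max_iters):
        new_labels = [
            0 if sq_dist(r, centroids[0]) <= sq_dist(r, centroids[1]) else 1
            for r in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_logistic_regression(rows, labels, lr=0.5, epochs=200):
    """Logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def predict(x, weights, bias):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0

vocab = build_vocab(comments)
rows = [vectorize(c, vocab) for c in comments]

# Step 1: cluster the unlabeled comments; cluster ids act as pseudo-labels.
pseudo_labels = kmeans_two_clusters(rows)

# Step 2: train a classifier on the pseudo-labels instead of human labels.
weights, bias = train_logistic_regression(rows, pseudo_labels)
train_predictions = [predict(x, weights, bias) for x in rows]
new_prediction = predict(vectorize("you idiot", vocab), weights, bias)
print(pseudo_labels, train_predictions, new_prediction)
```

K-means cluster ids are arbitrary, so deciding which of the two clusters corresponds to "offensive" needs a separate step (for example, inspecting a few members of each cluster). The abstract does not say how this mapping was made, so the sketch leaves it out.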

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing	Publication Date: Mar 24, 2023
Citations: 1	License type: pd

Similar Papers

An exhaustive measurement of re-sampling detection in lossy compressed images using deep learning approach
Vijayakumar Kadha ... Santos Kumar Das
Engineering Applications of Artificial Intelligence | VOL. 129
30 Nov 2023

Enhancing Large-Diameter Tunnel Construction Safety with Robust Optimization and Machine Learning Integrated into BIM
Jagendra Singh ... Sandeep Kumar
The Open Civil Engineering Journal | VOL. 18
07 Oct 2024

Offensive language detection in low resource languages: A use case of Persian language.
Marzieh Mozafari ... Noel Crespi
PloS one | VOL. 19
01 Jan 2024

OLF-ML: An Offensive Language Framework for Detection, Categorization, and Offense Target Identification Using Text Processing and Machine Learning Algorithms
Md Nahid Hasan ... Taghrid Tahani Preeti
Mathematics | VOL. 12
06 Jul 2024
