Transformer-based model for offensive content recognition in Dravidian languages

S Divya, N Sripriya

doi:10.34117/bjdv9n12-006

Abstract

This paper describes a model for detecting offensive content in comments collected from social media. Such comments often include expressions and emoticons and are mostly written in code-mixed language, which makes them difficult to classify. The proposed system uses a multi-head attention model to extract features from code-mixed Tamil input data. Several classification algorithms are applied to the extracted features to identify offensive comments, and the final label for each comment is chosen by majority voting over the labels produced by the individual algorithms. The system is validated on the validation set and evaluated on the Tamil CodeMix test data from the dataset published for the HASOC task (Task2-subtask1) at FIRE 2021. It achieves a weighted average F1 score of 0.83 and ranked 3rd in the official ranking.
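
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the label strings `"OFF"` and `"NOT"`, and the tie-breaking rule (prefer the earliest classifier among the tied labels) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per comment.

    predictions: list of label lists, one list per classifier, all of
    the same length (one label per comment). Hypothetical helper, not
    taken from the paper.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_comments = len(predictions[0])
    if any(len(p) != n_comments for p in predictions):
        raise ValueError("all classifiers must label the same number of comments")

    voted = []
    # zip(*predictions) yields, for each comment, the tuple of labels
    # assigned to it by every classifier in order.
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: keep the label from the earliest classifier
        # among those sharing the highest vote count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Illustrative labels from three hypothetical classifiers on four comments.
clf_a = ["OFF", "NOT", "OFF", "NOT"]
clf_b = ["OFF", "OFF", "NOT", "NOT"]
clf_c = ["NOT", "OFF", "OFF", "NOT"]

final_labels = majority_vote([clf_a, clf_b, clf_c])
print(final_labels)
```

Using an odd number of classifiers on a two-class problem avoids ties altogether, so the tie-break rule only matters when the number of voters is even or there are more than two classes.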

Full Text