Cross-lingual offensive speech identification with transfer learning for low-resource languages

Xinyi Liu, Fang Chen, Chun Xu, Yuanyuan Huang, Xiayang Shi, Shaolin Zhu

doi:10.1016/j.compeleceng.2022.108005

Abstract

Most research on identifying offensive speech on social media platforms focuses on English and other resource-rich languages. Recently proposed methods for detecting offensive speech in low-resource languages require labeled data. In this work, we propose an unsupervised model that can detect offensive speech in low-resource languages. Our method does not depend on any labeled data in the low-resource languages. Specifically, we propose an agreement-regularized training scheme that combines adversarial learning and transfer learning. We also augment the low-resource training data with sample regeneration methods to maintain the performance of the offensive speech identification model when it is transferred from rich-resource to low-resource languages. Extensive experiments on four low-resource languages demonstrate that our model is on par with or outperforms supervised methods on real-world offensive speech detection tasks, without using any annotated data for the low-resource languages.

Full Text