Abstract

Most research on identifying offensive speech on social media platforms focuses on English and other high-resource languages. Recently proposed methods for detecting offensive speech in low-resource languages still require labeled data. In this work, we propose an unsupervised model that detects offensive speech in low-resource languages without depending on any labeled data in those languages. Specifically, we propose an agreement-regularized training scheme that combines adversarial learning and transfer learning. We further augment the low-resource training data with sample regeneration methods so that the trained offensive speech identification model maintains its performance when transferred from high-resource to low-resource languages. Extensive experiments on four low-resource languages demonstrate that our model is on par with or outperforms supervised methods on real-world offensive speech detection tasks, without using any annotated data for the low-resource languages.
