Zero-Shot Hate to Non-Hate Text Conversion Using Lexical Constraints

Zishan Ahmad, Asif Ekbal, Vinnakota Sai Sujeeth

doi:10.1109/tcss.2022.3175259

Abstract

Demand for systems that tackle hate speech has increased with the rapid growth of social media platforms. One way of controlling hate speech in text is to transform the text into a non-hate version while preserving the rest of its content. When parallel data is unavailable, unsupervised back-translation-based text style transfer is a common approach to such problems. In this article, we propose a zero-shot style-transfer technique that performs effective unsupervised hate-to-non-hate conversion without using any hate-domain text for training. When decoding the outputs produced by the system, we add a step that introduces lexical constraints to better preserve content. Detailed empirical evaluation shows that the zero-shot method outperforms classical unsupervised style-transfer methods while also reducing the amount of data required for training.

Full Text