Abstract

Demand for systems that tackle hate speech has grown with the rapid expansion of social media platforms. One way of controlling hate speech in text is to transform the text into a non-hate version while preserving the rest of its content. When parallel data is unavailable, unsupervised back-translation-based text style transfer is a common approach to such problems. In this article, we propose a zero-shot style-transfer technique that performs effective unsupervised hate-to-non-hate conversion without using any hate-domain text for training. During decoding, we add a step that imposes lexical constraints on the system's outputs to better preserve content. Detailed empirical evaluation shows that the zero-shot method outperforms classical unsupervised style-transfer methods while also reducing the amount of training data required.
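
The abstract does not say how the lexical constraints are applied during decoding. As a rough illustration of the idea only, the sketch below swaps in a simpler technique: it reranks already-decoded candidate outputs so that content words from the source are kept and blocked terms are excluded. This is post-hoc reranking, not constrained beam search, and it is not the authors' method. All names (`tokenize`, `content_constraints`, `pick_candidate`), the blocked-term set, and the stopword list are hypothetical.

```python
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def content_constraints(source: str, blocked: Set[str], stopwords: Set[str]) -> Set[str]:
    """Source tokens the output should keep: neither blocked nor stopwords."""
    return {
        tok for tok in tokenize(source)
        if tok not in blocked and tok not in stopwords
    }


def pick_candidate(source: str, candidates: List[str],
                   blocked: Set[str], stopwords: Set[str]) -> str:
    """Pick the candidate that avoids blocked terms and covers the most constraints.

    Assumption: candidates come from some upstream style-transfer model;
    here they are plain strings supplied by the caller.
    """
    constraints = content_constraints(source, blocked, stopwords)

    def score(candidate: str):
        tokens = set(tokenize(candidate))
        is_clean = not (tokens & blocked)
        coverage = len(tokens & constraints) / len(constraints) if constraints else 1.0
        # Clean candidates always outrank unclean ones; coverage breaks ties.
        return (is_clean, coverage)

    return max(candidates, key=score)


if __name__ == "__main__":
    # "blockedterm" is a neutral placeholder for a term from a hate lexicon.
    blocked = {"blockedterm"}
    stopwords = {"those", "the"}
    source = "Those blockedterm fans ruined the match yesterday."
    candidates = [
        "Those blockedterm fans ruined the match yesterday.",
        "Some people were there.",
        "Those fans ruined the match yesterday.",
    ]
    print(pick_candidate(source, candidates, blocked, stopwords))
```

A constrained decoder would enforce the same constraints while generating tokens rather than after the fact; reranking is used here only because it fits in a few self-contained lines.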

