Abstract

We present O-Norm (Offense Normalizer), a system for normalizing offensive words and phrases by reversing user-created obfuscations. O-Norm is intended as a preprocessing tool for pipelines that run on user-created text, particularly tasks where obscenities carry significant signal, such as sentiment, toxicity, and abuse detection. O-Norm is trained on a purely generated, context-free dataset derived from a curse dictionary. This generative dataset lets O-Norm accommodate new words as language evolves, and also allows it to be readily retrained in other Latin-alphabet languages, since it requires no manual annotations. O-Norm is based on a character-level transformer network that attempts de-obfuscation only on out-of-vocabulary (OOV) tokens. In an 80/20 train-test split, O-Norm achieves an F1 score of 89.6% over 141 curses in a generated dataset with 2.16 million unique training points. An inspection of O-Norm's output on a sample of social media posts from Kaggle's Jigsaw corpus shows 95.7% accuracy at de-obfuscating transformations in toxic user-created text.
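The OOV-gating design described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the vocabulary, the leetspeak mapping, and the `deobfuscate` stand-in (which substitutes for the character-level transformer) are all assumptions.

```python
# Illustrative sketch of OOV-gated de-obfuscation (assumed design, not O-Norm's actual code).

# Stand-in vocabulary; a real system would use a full word list.
VOCAB = {"this", "is", "toxic", "text", "you", "are"}

# Toy character map replacing the transformer de-obfuscator.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def deobfuscate(token: str) -> str:
    # Placeholder for the character-level transformer model.
    candidate = token.translate(LEET)
    # Only accept the candidate if it maps to a known word.
    return candidate if candidate in VOCAB else token

def normalize(text: str) -> str:
    out = []
    for tok in text.split():
        low = tok.lower()
        # In-vocabulary tokens pass through; only OOV tokens are
        # sent to the de-obfuscator, as in the paper's pipeline.
        out.append(low if low in VOCAB else deobfuscate(low))
    return " ".join(out)

print(normalize("this is t0xic text"))  # → "this is toxic text"
```

Gating on OOV tokens keeps ordinary words untouched and limits the model's opportunity to corrupt clean text.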
