ConBERT-RL: A policy-driven deep reinforcement learning based approach for detecting homophobia and transphobia in low-resource languages

Vivek Suresh Raj, Chinnaudayar Navaneethakrishnan Subalalitha, Lavanya Sambath, Frank Glavin, Bharathi Raja Chakravarthi

doi:10.1016/j.nlp.2023.100040

Abstract

In this work, we present a novel framework for discriminatory comment classification in targeted low-resource languages, enabling the identification of discriminatory comments and promoting a safer online environment in linguistically diverse contexts. Recent work based on Bidirectional Encoder Representations from Transformers (BERT) and its variants has produced promising results, particularly for Tamil words transliterated into English. Such approaches can be viewed as transfer learning, fine-tuning from a general domain to a targeted downstream task. However, for knowledge to transfer effectively from a source task to a target task, their feature spaces have to be correlated (with a similarity measure serving as the metric of knowledge transfer), a condition that previous work has left unexplored. In practice, such similarity conditions are often violated. We propose ConBERT-RL, a framework that uses a Concatenated representation, powered by BERT, in a Reinforcement Learning (RL) setting to capture problem-specific features and the nuances of Tamil words transliterated into English when deciding whether a comment is discriminatory. ConBERT-RL fuses the learned hidden-state representation from our pre-trained Classifier-Model (CM) with the broader contextualized representation (pooled output) from BERT. The key idea is to use this concatenated representation to drive our policy network for hate comment classification. To learn such a policy effectively, we use the REINFORCE algorithm to guide the ConBERT-RL agent in making informed decisions. To demonstrate the generality of ConBERT-RL, we conduct experiments on offensive comment classification using a dataset of Tamil words transliterated into English.
ConBERT-RL significantly improves micro-average accuracy, reaching 90% (≈1.0% absolute improvement over BERT+FC) on the dataset of Tamil words transliterated into English and 93% (≈3.0% absolute improvement over BERT+FC) on an English-only dataset. To further support our argument, we present a 2-dimensional t-distributed Stochastic Neighbour Embedding (t-SNE) visualization of ConBERT-RL’s concatenated representation. Additionally, to compare how well each feature space captures the problem of discriminatory comments, we present a graph network comparison of our concatenated representation with the BERT output embedding. We also design and conduct a systematic evaluation of ConBERT-RL’s broader capability to capture the contextual words in the vicinity of the primary offensive term that amplify that term in the given input comment. We show that ConBERT-RL is robust and effective in capturing targeted language-specific features for hate comment classification.
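
The core mechanism described in the abstract (concatenating two feature vectors and training a policy on the result with REINFORCE) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The names `concat_state`, `LinearPolicy`, and `train_toy_agent` are hypothetical, the two-dimensional vectors are toy stand-ins for the CM hidden state and the BERT pooled output, the linear softmax policy replaces whatever policy network the paper uses, and the +1/-1 reward is an assumption.

```python
import math
import random


def concat_state(cm_hidden, bert_pooled):
    """Join the two feature vectors into a single policy input."""
    return list(cm_hidden) + list(bert_pooled)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Hypothetical linear softmax policy: one weight row per action."""

    def __init__(self, state_dim, n_actions, seed=0):
        self.rng = random.Random(seed)
        self.weights = [
            [self.rng.uniform(-0.01, 0.01) for _ in range(state_dim)]
            for _ in range(n_actions)
        ]

    def probs(self, state):
        logits = [sum(w * x for w, x in zip(row, state)) for row in self.weights]
        return softmax(logits)

    def sample(self, state):
        p = self.probs(state)
        return self.rng.choices(range(len(p)), weights=p)[0]

    def greedy(self, state):
        p = self.probs(state)
        return p.index(max(p))

    def reinforce_update(self, state, action, reward, lr=0.1):
        # REINFORCE: theta += lr * reward * grad log pi(action | state).
        # For a linear softmax, d log pi(a|s) / d W[k] = (1[k == a] - p_k) * s.
        p = self.probs(state)
        for k, row in enumerate(self.weights):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for j, x in enumerate(state):
                row[j] += lr * reward * coeff * x


def train_toy_agent(epochs=100):
    # Toy stand-ins for real embeddings: (cm_hidden, bert_pooled, label).
    data = [
        ([1.0, 0.0], [0.5, 0.0], 0),  # label 0: not offensive
        ([0.0, 1.0], [0.0, 0.5], 1),  # label 1: offensive
    ]
    policy = LinearPolicy(state_dim=4, n_actions=2)
    for _ in range(epochs):
        for cm_hidden, bert_pooled, label in data:
            state = concat_state(cm_hidden, bert_pooled)
            action = policy.sample(state)  # classification decision as an action
            reward = 1.0 if action == label else -1.0  # assumed reward scheme
            policy.reinforce_update(state, action, reward)
    return policy, data


if __name__ == "__main__":
    policy, data = train_toy_agent()
    for cm_hidden, bert_pooled, label in data:
        state = concat_state(cm_hidden, bert_pooled)
        print(label, policy.greedy(state), policy.probs(state))
```

In a realistic setting the two inputs would be high-dimensional vectors produced by the trained models, and a variance-reducing baseline would typically be subtracted from the reward. Both are omitted here to keep the update rule visible.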

Full Text