In this paper, a novel dual-channel system for multi-class text emotion recognition is proposed, along with a novel technique to explain its training and predictions. The architecture of the proposed system comprises an embedding module, a dual-channel module, an emotion classification module, and an explainability module. The embedding module extracts textual features from the input sentences as embedding vectors using the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. These embedding vectors are then fed to the dual-channel network, whose two channels are built from a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network. The intuition behind using CNN and BiLSTM in both channels is to exploit the strength of the convolutional layer for feature extraction and of the BiLSTM layer for capturing the order and sequence information of the text. The outputs of the two channels are embedding vectors that are concatenated and fed to the emotion classification module. The proposed system's architecture has been determined through thorough ablation studies, and a framework has been developed to analyze its computational cost. The emotion classification module learns and projects the emotion embeddings onto a hyperplane in the form of clusters. The proposed explainability technique explains the training and predictions of the system by analyzing the inter- and intra-cluster distances and the intersections of these clusters. The proposed approach's consistent accuracy, precision, recall, and F1-score results on the ISEAR, Aman, AffectiveText, and EmotionLines datasets demonstrate its applicability to diverse texts.
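To make the described pipeline concrete, the following is a minimal sketch of a BERT-fed dual-channel CNN+BiLSTM classifier in PyTorch. It follows only what the abstract states (BERT embeddings, two channels each combining a convolutional layer and a BiLSTM, concatenation, then a classification head); the layer sizes, kernel width, pooling strategy, and the `bert-base-uncased` checkpoint are assumptions for illustration, not the authors' reported configuration, and the explainability module is omitted.

```python
# Illustrative sketch only; hyperparameters and pooling are assumed, not taken from the paper.
import torch
import torch.nn as nn
from transformers import BertModel


class DualChannelEmotionClassifier(nn.Module):
    def __init__(self, num_emotions: int, hidden_dim: int = 128):
        super().__init__()
        # Embedding module: pre-trained BERT produces token-level embedding vectors.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        emb_dim = self.bert.config.hidden_size  # 768 for bert-base

        # Each channel stacks a convolutional layer (local feature extraction)
        # and a BiLSTM layer (order/sequence information), per the abstract.
        def make_channel() -> nn.ModuleDict:
            return nn.ModuleDict({
                "conv": nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1),
                "bilstm": nn.LSTM(hidden_dim, hidden_dim, batch_first=True,
                                  bidirectional=True),
            })

        self.channel_a = make_channel()
        self.channel_b = make_channel()

        # Emotion classification head on the concatenated channel outputs.
        self.classifier = nn.Linear(4 * hidden_dim, num_emotions)

    def _run_channel(self, channel: nn.ModuleDict, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, emb_dim); Conv1d expects (batch, emb_dim, seq_len).
        h = torch.relu(channel["conv"](x.transpose(1, 2))).transpose(1, 2)
        out, _ = channel["bilstm"](h)   # (batch, seq_len, 2 * hidden_dim)
        return out.mean(dim=1)          # pool over time (mean pooling assumed)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([self._run_channel(self.channel_a, emb),
                           self._run_channel(self.channel_b, emb)], dim=-1)
        return self.classifier(fused)   # logits over emotion classes
```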