Emotion Recognition in Conversations (ERC) is a key step towards successful human–machine interaction. While the field has seen tremendous advancement in the last few years, new applications and implementation scenarios present novel challenges and opportunities. These range from leveraging the conversational context, speaker, and emotion dynamics modelling, to interpreting common sense expressions, informal language, and sarcasm, addressing challenges of real-time ERC, recognizing emotion causes, different taxonomies across datasets, multilingual ERC, and interpretability. This survey starts by introducing ERC, elaborating on the challenges and opportunities of this task. It proceeds with a description of the emotion taxonomies and a variety of ERC benchmark datasets employing such taxonomies. This is followed by descriptions comparing the most prominent works in ERC with explanations of the neural architectures employed. Then, it provides advisable ERC practices towards better frameworks, elaborating on methods to deal with subjectivity in annotations and modelling and methods to deal with the typically unbalanced ERC datasets. Finally, it presents systematic review tables comparing several works regarding the methods used and their performance. Benchmarking these works highlights resorting to pre-trained Transformer Language Models to extract utterance representations, using Gated and Graph Neural Networks to model the interactions between these utterances, and leveraging Generative Large Language Models to tackle ERC within a generative framework. This survey emphasizes the advantage of leveraging techniques to address unbalanced data, the exploration of mixed emotions, and the benefits of incorporating annotation subjectivity in the learning phase.
Read full abstract