Our work explores the differences between GRU-based and transformer-based approaches to sentiment analysis on text dialog. Beyond overall performance on the downstream task, we assess the knowledge-transfer capabilities of the models through a thorough zero-shot analysis at the task level and through a cross-lingual evaluation across five European languages. The ability to generalize across tasks and languages is important, as the data needed for a particular application may be scarce or nonexistent. We evaluate on established benchmark datasets and on a novel synthetic dialog dataset containing Romanian call-center conversations. We study the most appropriate combination of synthetic and real data for fine-tuning on the downstream task, enabling our models to perform in low-resource environments. We also leverage the informative power of the conversational context, showing that appending the previous four utterances of the same speaker to the input sequence yields the greatest improvement in inference performance. The cross-lingual and cross-task evaluations show that the transformer-based models have superior transfer abilities compared to the GRU model, especially in the zero-shot setting. Owing to its prior intensive fine-tuning on multiple labeled datasets for various tasks, FLAN-T5 excels in the zero-shot task experiments, obtaining a zero-shot accuracy of 51.27% on the IEMOCAP dataset, while the classical BERT model obtains the highest zero-shot accuracy on the MELD dataset, at 55.08%.
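The context-augmentation step (appending the previous four utterances of the same speaker to the input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the separator, the ordering of context utterances, or how the input is tokenized, so the function name `build_input_with_context`, the `" [SEP] "` separator, and the "target first, then context oldest to newest" ordering are all hypothetical choices.

```python
from typing import List, Tuple


def build_input_with_context(
    dialog: List[Tuple[str, str]],
    index: int,
    num_context: int = 4,
    sep: str = " [SEP] ",
) -> str:
    """Return the utterance at `index` followed by up to `num_context`
    earlier utterances from the same speaker.

    Hypothetical format: `dialog` is a list of (speaker, text) pairs, and the
    pieces are joined with `sep`. Neither choice comes from the paper.
    """
    speaker, target = dialog[index]
    # Earlier utterances by the same speaker, in chronological order.
    same_speaker = [text for spk, text in dialog[:index] if spk == speaker]
    # Keep only the most recent `num_context` of them (none if num_context <= 0).
    context = same_speaker[-num_context:] if num_context > 0 else []
    # Append the context after the target utterance, oldest first.
    return sep.join([target] + context)


dialog = [
    ("A", "hi"),
    ("B", "hello"),
    ("A", "my order is late"),
    ("B", "sorry"),
    ("A", "it was due monday"),
    ("A", "this is frustrating"),
]
print(build_input_with_context(dialog, 5, num_context=2))
# prints "this is frustrating [SEP] my order is late [SEP] it was due monday"
```

Only utterances from the same speaker are collected, matching the abstract's description. The other speaker's turns ("hello", "sorry") are skipped. Whether the context should precede or follow the target utterance in the model input is a design choice the abstract leaves open.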