Leveraging Large Language Model ChatGPT for enhanced understanding of end-user emotions in social media feedbacks

Nek Dil Khan,Javed Ali Khan,Jianqiang Li,Tahir Ullah,Qing Zhao

doi:10.1016/j.eswa.2024.125524

Abstract

For software evolution, user feedback has become a meaningful way to improve applications. Recent studies show a significant increase in analysing end-user feedback from various social media platforms for software evolution. However, less attention has been given to the end-user feedback for low-rating software applications. Also, such approaches are developed mainly on the understanding of human annotators who might have subconsciously tried for a second guess, questioning the validity of the methods. For this purpose, we proposed an approach that analyzes end-user feedback for low-rating applications to identify the end-user opinion types associated with negative reviews(an issue or bug). Also, we utilized Generative Artificial Intelligence (AI), i.e., ChatGPT, as an annotator and negotiator when preparing a truth set for the deep learning (DL) classifiers to identify end-user emotion. For the proposed approach, we first used the ChatGPT Application Programming Interface(API) to identify negative end-user feedback by processing 71853 reviews collected from 45 apps in the Amazon store. Next, a novel grounded theory is developed by manually processing end-user negative feedback to identify frequently associated emotion types, including anger, confusion, disgust, distrust, disappointment, fear, frustration, and sadness. Next, two datasets were developed, one with human annotators using a content analysis approach and the other using ChatGPT API with the identified emotion types. Next, another round is conducted with ChatGPT to negotiate over the conflicts with the human-annotated dataset, resulting in a conflict-free emotion detection dataset. Finally, various DL classifiers, including LSTM, BILSTM, CNN, RNN, GRU, BiGRU and BiRNN, are employed to identify their efficacy in detecting end-users emotions by preprocessing the input data, applying feature engineering, balancing the data set, and then training and testing them using a cross-validation approach. We obtained an average accuracy of 94%, 94%, 93%, 92%, 91%, 91%, and 85%, with LSTM, BILSTM, RNN, CNN, GRU, BiGRU and BiRNN, respectively, showing improved results with the truth set curated with human and ChatGPT. Using ChatGPT as an annotator and negotiator can help automate and validate the annotation process, resulting in better DL performances.

Full Text