Sentiment analysis is a vital task in natural language processing (NLP) that aims to identify and extract the emotional states and opinions expressed in text. In this study, we conduct a comprehensive comparison of large language models (LLMs), such as ChatGPT and Google Bard, with conventional sentiment analysis methods. We employ a rigorous evaluation framework that covers four essential metrics: accuracy, precision, recall, and the F1-score. Our results reveal that TextBlob outperforms the other methods, achieving an accuracy of 69% and a precision of 83%. In contrast, Bard shows relatively poor performance, with only 39% accuracy and 46% precision. This study offers valuable insights into the diverse capabilities of AI models in sentiment analysis. A key finding is the importance of selecting a model according to the specific requirements of the task: each model has its own strengths and weaknesses, which are reflected in its performance profile. Moreover, the context in which these models operate is crucial. For instance, ChatGPT generates varied responses, Bard struggles with multi-sentence inputs, and the Robustly Optimized BERT Pretraining Approach (RoBERTa) balances precision and recall. This study also reveals the performance gap between LLMs and state-of-the-art deep learning methods. We believe this work will inspire future research on, and applications of, ChatGPT and similar AI models in sentiment analysis and related tasks.
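As an illustration only (not the authors' actual data or pipeline), the following minimal Python sketch shows how an evaluation over the four reported metrics might be set up with TextBlob and scikit-learn, assuming a binary positive/negative labeling and a zero polarity threshold; the sample texts and gold labels are invented for demonstration.

```python
# Hypothetical sketch: scoring a TextBlob sentiment classifier on the four
# metrics used in the study (accuracy, precision, recall, F1-score).
# Texts, labels, and the 0.0 polarity threshold are illustrative assumptions.
from textblob import TextBlob
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

texts = [
    "I love this product, it works perfectly.",
    "Terrible experience, would not recommend.",
    "Absolutely fantastic service!",
    "The update broke everything and support was useless.",
]
gold = [1, 0, 1, 0]  # 1 = positive, 0 = negative (assumed binary setup)

# TextBlob returns a polarity score in [-1, 1]; threshold at 0 for a label.
pred = [1 if TextBlob(t).sentiment.polarity > 0 else 0 for t in texts]

print(f"accuracy:  {accuracy_score(gold, pred):.2f}")
print(f"precision: {precision_score(gold, pred):.2f}")
print(f"recall:    {recall_score(gold, pred):.2f}")
print(f"f1-score:  {f1_score(gold, pred):.2f}")
```

The same scoring loop applies to any of the compared models: only the prediction step changes, while the metric computation stays fixed across methods.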