Sarcasm, a form of verbal irony, uses remarks that intentionally convey the opposite of their literal meaning, typically to criticize a situation or to undermine it humorously. It is often signaled by verbal and non-verbal cues such as changes in tone, incongruence across modalities, and word emphasis. As technology advances and more people express their opinions online, there is a growing need for efficient models that can detect the intended meaning of such content, including whether it is sarcastic. Machine learning algorithms, linguistic models, ensemble learning, and multimodal approaches all play important roles in this effort. While most research has focused on unimodal approaches using textual data, the field is now turning to multimodal sarcasm detection, which integrates diverse data types, including images, text, and audio, to improve accuracy and automate sarcasm classification. In this review, we examine multimodal sarcasm detection and trace its evolution from early unimodal methods to current multimodal techniques. We aim to provide a comprehensive overview of the field's challenges, methodologies, and prospects.

Keywords: Sarcasm Detection, Multimodal Approaches, Machine Learning Algorithms, Linguistic Models, Ensemble Learning