Multi-modal humor segment prediction in video

Zekun Yang, Haruo Takemura, Yuta Nakashima

doi:10.1007/s00530-023-01105-x

Abstract

Humor can be induced by various signals that humans emit in the visual, linguistic, and vocal modalities. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level from text (e.g., a speech transcript), sometimes combined with other modalities such as video and speech. Because their predictions are made per sentence, these methods by design ignore humor conveyed by the visual modality. In this work, we first provide new humor annotations for a sitcom, defining ground-truth humor as temporal segments derived from the laughter track. We then propose a method to find these temporal humor segments. We adopt a sliding-window approach in which, for each window, the visual modality is described by pose and facial features and the linguistic modality by the subtitles. We use long short-term memory (LSTM) networks to encode the temporal dependencies in the pose and facial features, and a pre-trained BERT model to encode the subtitles. Experimental results show that our method improves humor prediction performance.
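
The abstract does not state how laughter intervals are turned into ground-truth segments, how windows are sized, or how window-level outputs become predicted segments. The Python sketch below illustrates one plausible sliding-window labeling scheme under explicit assumptions; it is not the authors' implementation. The rule that the `lead` seconds before each laughter onset count as humor, the window length `win`, the `stride`, the overlap threshold `min_ratio`, the score `threshold`, and all function names are hypothetical. The LSTM and BERT encoders that would score each window are omitted.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching (start, end) segments into disjoint ones."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def humor_from_laughter(laughter: List[Segment], lead: float = 3.0) -> List[Segment]:
    """HYPOTHETICAL rule: the `lead` seconds before each laughter onset count as humor."""
    return merge_segments([(max(0.0, start - lead), start) for start, _ in laughter])


def sliding_windows(duration: float, win: float, stride: float) -> List[Segment]:
    """Fixed-length windows, advanced by `stride`, that fit inside [0, duration]."""
    if duration < win:
        return []
    count = int((duration - win) // stride) + 1
    return [(i * stride, i * stride + win) for i in range(count)]


def overlap(a: Segment, b: Segment) -> float:
    """Length of the intersection of two segments (0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def label_windows(windows: List[Segment], humor: List[Segment],
                  min_ratio: float = 0.5) -> List[int]:
    """Label a window 1 if at least `min_ratio` of its length is covered by humor."""
    humor = merge_segments(humor)  # disjoint segments, so coverage is not double-counted
    labels = []
    for w in windows:
        covered = sum(overlap(w, h) for h in humor)
        labels.append(int(covered / (w[1] - w[0]) >= min_ratio))
    return labels


def windows_to_segments(windows: List[Segment], scores: List[float],
                        threshold: float = 0.5) -> List[Segment]:
    """Merge windows whose score (e.g., a model's humor probability) passes `threshold`."""
    return merge_segments([w for w, s in zip(windows, scores) if s >= threshold])


if __name__ == "__main__":
    laughter = [(5.0, 6.5), (12.0, 13.0)]   # toy laughter-track intervals
    humor = humor_from_laughter(laughter)    # [(2.0, 5.0), (9.0, 12.0)]
    windows = sliding_windows(15.0, win=2.0, stride=1.0)
    labels = label_windows(windows, humor)
    print(labels)                            # [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    print(windows_to_segments(windows, labels))  # [(1.0, 6.0), (8.0, 13.0)]
```

Labeling windows by their overlap with ground-truth segments turns segment detection into per-window binary classification, which suits a model that scores each window from its pose, facial, and subtitle features. Note that merging positive windows yields segments slightly wider than the toy ground truth; the window length and stride control this trade-off.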

Journal: Multimedia Systems
Publication Date: Jun 3, 2023
License type: CC BY 4.0
