Abstract

Meetings are a common activity that poses particular challenges for systems designed to assist them. One such task is active speaker detection, which can provide useful information for human-interaction modeling or human-robot interaction. Active speaker detection is mostly done using speech; however, visual and contextual information can provide additional insight. In this paper we propose an active speaker detection framework that integrates audiovisual features with social information drawn from the meeting context. The visual cue is processed with a Convolutional Neural Network (CNN) that captures spatio-temporal relationships. We analyze several CNN architectures with two visual cues: raw pixels (RGB images) and motion (estimated with optical flow). Contextual reasoning is performed with an original methodology based on the gaze of all participants. We evaluate our proposal on a standard public benchmark, the AMI corpus, and show that adding visual and contextual information improves active speaker detection performance.
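
To make the described pipeline concrete, below is a minimal sketch (not the authors' implementation) of a two-stream spatio-temporal CNN of the kind the abstract outlines, written in PyTorch. The layer sizes, the late-fusion strategy, and the `context` vector standing in for gaze-based features of all participants are illustrative assumptions.

```python
# Minimal sketch of a two-stream spatio-temporal CNN for active speaker
# detection: one stream for raw RGB clips, one for optical-flow clips,
# late-fused with a gaze-based context vector. All sizes are assumptions.
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """3D CNN encoding a short clip of shape (batch, channels, frames, H, W)."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x).flatten(1)  # (batch, 32)

class TwoStreamASD(nn.Module):
    """Fuses an RGB stream, an optical-flow stream, and a context feature
    (e.g. gaze statistics of all meeting participants) into a binary
    speaking / not-speaking score."""
    def __init__(self, context_dim: int = 8):
        super().__init__()
        self.rgb_stream = StreamCNN(in_channels=3)   # raw pixels
        self.flow_stream = StreamCNN(in_channels=2)  # horizontal/vertical flow
        self.classifier = nn.Linear(32 + 32 + context_dim, 1)

    def forward(self, rgb, flow, context):
        feats = torch.cat(
            [self.rgb_stream(rgb), self.flow_stream(flow), context], dim=1)
        return self.classifier(feats)  # logit; apply sigmoid for probability

# Usage with dummy tensors: 4 clips of 8 frames at 64x64 resolution.
model = TwoStreamASD()
rgb = torch.randn(4, 3, 8, 64, 64)
flow = torch.randn(4, 2, 8, 64, 64)
context = torch.randn(4, 8)  # hypothetical gaze-based context features
print(model(rgb, flow, context).shape)  # torch.Size([4, 1])
```

Late fusion is only one plausible design choice here; the paper's comparison of several CNN architectures and cues suggests the exact backbone and fusion point are the variables under study.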
