Human interaction categorization by using audio-visual cues

M J Marín-Jiménez, R Muñoz-Salinas, E Yeguas-Bolivar, N Pérez De La Blanca

doi:10.1007/s00138-013-0521-1

Abstract

Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the large intra-class variability (due to differences in the way actions are performed, lighting conditions and camera viewpoints, amongst others) and the small inter-class variability (e.g., the visual difference between hug and kiss is very subtle). Most previous works have focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such visual-only approaches have not proven discriminative enough. This work proposes the Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism for the HIR problem than the traditional Visual Bag of Words (VBOW). We show that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, where the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
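
The abstract does not specify how the AVBOW representation is built. One common way to realize an audio-visual bag of words, assumed here purely for illustration, is to quantize visual and audio descriptors against two separate codebooks (e.g., learned beforehand with k-means), turn each modality into a normalized word histogram, and concatenate the two histograms (early fusion). The function names, the hard nearest-codeword assignment, the L1 normalization and the audio_weight parameter are all hypothetical; this is a minimal sketch, not the authors' exact pipeline.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bow_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """L1-normalized histogram of codeword occurrences (hard assignment, an assumption)."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def avbow(visual_desc: List[Vector], visual_codebook: List[Vector],
          audio_desc: List[Vector], audio_codebook: List[Vector],
          audio_weight: float = 1.0) -> List[float]:
    """Hypothetical AVBOW: visual histogram followed by the (weighted) audio histogram."""
    visual_hist = bow_histogram(visual_desc, visual_codebook)
    audio_hist = bow_histogram(audio_desc, audio_codebook)
    return visual_hist + [audio_weight * h for h in audio_hist]


# Toy example: 2-word visual codebook (2-D descriptors), 3-word audio codebook (1-D descriptors).
visual_codebook = [[0.0, 0.0], [1.0, 1.0]]
audio_codebook = [[0.0], [5.0], [10.0]]
visual_desc = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [0.0, 0.2]]
audio_desc = [[4.0], [6.0], [9.0], [11.0]]
features = avbow(visual_desc, visual_codebook, audio_desc, audio_codebook)
print(features)  # [0.5, 0.5, 0.0, 0.5, 0.5]
```

In the toy example the resulting 5-bin vector holds the two visual bins followed by the three audio bins. Keeping the modalities in separate bins lets a downstream classifier weight visual and audio words independently, which is what a visual-only VBOW histogram cannot do.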

Journal: Machine Vision and Applications
Publication Date: Jun 1, 2013
Citations: 18
License type: cc-by-nc-nd

Similar Papers

Image Classification Model Using Visual Bag of Semantic Words
Yali Qi ... Yeli Li
Pattern Recognition and Image Analysis | VOL. 29
01 Jul 2019

New bag of deep visual words based features to classify chest x-ray images for COVID-19 diagnosis.
Chiranjibi Sitaula ... Sunil Aryal
Health Information Science and Systems | VOL. 9
18 Jun 2021

Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval
Xin Chen ... Xiajiong Shen
01 Jan 2009

Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification
Yousef Alqasrawi ... Daniel Neagu
Signal, Image and Video Processing | VOL. 7
20 Oct 2011