Abstract

The natural environment and our interaction with it are essentially multisensory: we may deploy visual, tactile and/or auditory senses to perceive, learn about and interact with our environment. Our objective in this study is to develop a scene analysis algorithm that uses multisensory information, specifically vision and audio. We develop a proto-object-based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a 360° field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the proto-object-based audiovisual saliency map detects and localizes salient objects and events in agreement with human judgment. In addition, the proto-object-based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures more valid salient events than the unisensory saliency maps do. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications.
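The linear combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes two conspicuity maps of equal size, each rescaled to [0, 1] before mixing. The function names, the min-max normalization and the equal default weights are illustrative assumptions, not the normalization or weights used in the study.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D conspicuity map stored as rows of floats


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1]; a constant map becomes all zeros.

    Min-max scaling is an assumption made for this sketch.
    """
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def combine(visual: Map, auditory: Map,
            w_v: float = 0.5, w_a: float = 0.5) -> Map:
    """Weighted sum of normalized visual and auditory maps.

    The weights w_v and w_a are hypothetical defaults.
    """
    same_rows = len(visual) == len(auditory)
    if not same_rows or any(len(a) != len(b) for a, b in zip(visual, auditory)):
        raise ValueError("maps must have the same shape")
    v = normalize(visual)
    a = normalize(auditory)
    return [[w_v * x + w_a * y for x, y in zip(rv, ra)]
            for rv, ra in zip(v, a)]


def most_salient(m: Map) -> Tuple[int, int]:
    """Return (row, col) of the largest value; the first one wins on ties."""
    cells = [(r, c) for r, row in enumerate(m) for c in range(len(row))]
    return max(cells, key=lambda rc: m[rc[0]][rc[1]])


# Toy 2x2 maps: the lower-left cell is strong in both modalities.
visual_map = [[0.0, 2.0], [4.0, 0.0]]
auditory_map = [[0.0, 0.0], [1.0, 1.0]]
avsm = combine(visual_map, auditory_map)
peak = most_salient(avsm)
```

In this toy case the location that is conspicuous in both modalities receives the highest combined value, which is the behaviour a linear audiovisual combination is meant to produce.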

Highlights

  • Scientists and engineers have traditionally separated the analysis of a multisensory scene into its constituent sensory domains

  • We can conclude that the unisensory saliency maps detect valid unisensory events which agree with human judgment

  • We show that a proto-object-based audiovisual saliency map detects salient unisensory and multisensory events, which agree with human judgment

Summary

Introduction

Scientists and engineers have traditionally separated the analysis of a multisensory scene into its constituent sensory domains. Recent evidence from neuroscience [1,6] suggests that this traditional view is becoming obsolete. In that view, the low-level areas of cortex are strictly unisensory and process sensory information independently, and the information is merged only later in higher-level associative areas. The newer picture is supported by many fMRI [7,8], EEG [9] and neurophysiological experiments [10,11] at various neural population scales. There is enough evidence to suggest an interplay of connections among the thalamus, primary sensory areas and higher-level association areas, which are responsible for audiovisual integration.
