Abstract

It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a modality complementary to video; compared with vision, audio can provide faster localization over a wider field of view. We present a particle-filter based framework that fuses these modalities to track people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of our proposed tracker is its ability to seamlessly handle the temporary absence of some measurements (e.g., camera occlusion or silence). Another advantage is the possibility of self-calibrating the joint system to compensate for imprecise knowledge of array or camera parameters, by treating those parameters as containing an unknown statistical component that can be estimated within the particle filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting recording system. The system also performs high-level semantic analysis of the scene by maintaining participant tracks, recognizing turn-taking events, and recording an annotated transcript of the meeting. Experimental results are presented. Our system operates in real time and is shown to be robust and reliable.
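The abstract does not give implementation details, so the following is a minimal illustrative sketch, in NumPy, of the kind of particle-filter fusion it describes: each particle carries a planar speaker position plus a microphone-array calibration offset (standing in for the "unknown statistical component" estimated during tracking), and the update step simply skips any modality whose measurement is absent, which is one way occlusion or silence can be handled. All function names, noise scales, and the one-dimensional audio model here are assumptions for illustration, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500                                   # number of particles
# State per particle: [x, y, array_offset]; the offset models the unknown
# statistical component of the array geometry (self-calibration).
particles = np.column_stack([
    rng.uniform(0, 5, N),                 # x position (m)
    rng.uniform(0, 5, N),                 # y position (m)
    rng.normal(0, 0.1, N),                # array calibration offset (m)
])
weights = np.full(N, 1.0 / N)

def predict(particles):
    """Random-walk motion model; the calibration offset drifts slowly."""
    particles[:, :2] += rng.normal(0, 0.05, (len(particles), 2))
    particles[:, 2] += rng.normal(0, 0.001, len(particles))
    return particles

def update(particles, weights, video_meas=None, audio_meas=None):
    """Re-weight particles; a missing modality is simply skipped,
    which handles temporary occlusion or silence gracefully."""
    w = weights.copy()
    if video_meas is not None:            # camera gives a 2-D position fix
        d2 = np.sum((particles[:, :2] - video_meas) ** 2, axis=1)
        w *= np.exp(-d2 / (2 * 0.1 ** 2))
    if audio_meas is not None:            # array gives a 1-D position cue,
        # corrected by each particle's own calibration offset
        residual = particles[:, 0] + particles[:, 2] - audio_meas
        w *= np.exp(-residual ** 2 / (2 * 0.2 ** 2))
    w /= w.sum()
    return w

def resample(particles, weights):
    """Resample particles back to uniform weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx].copy(), np.full(len(particles), 1.0 / len(particles))

# One tracking step with the camera occluded (only audio available):
particles = predict(particles)
weights = update(particles, weights, video_meas=None, audio_meas=2.4)
particles, weights = resample(particles, weights)
print("estimate:", np.average(particles[:, :2], axis=0, weights=weights))
```

Because each modality contributes an independent likelihood factor, dropping a measurement merely widens the posterior instead of breaking the tracker, which is consistent with the robustness claim in the abstract.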

Highlights

  • The goal of most machine perception systems is to mimic the performance of human and animal systems

  • We present a probabilistic framework for combining results from the two modes and develop a particle filter based joint audio-video tracking algorithm

  • We present experimental results showing the potential of the developed algorithm

Summary

INTRODUCTION

The goal of most machine perception systems is to mimic the performance of human and animal systems. Computing capabilities have reached a level where it is possible to build systems that combine multiple audio and video sensors and perform meaningful joint analysis of a scene, such as joint audiovisual speaker localization, tracking, speaker-change detection, and remote speech acquisition using beamforming; such analysis is necessary for developing natural, robust, and environment-independent applications. Applications of such systems include novel human-computer interfaces, robots that sense and perceive their environment, perceptive spaces for immersive virtual or augmented reality, and so forth. We present experimental results showing the potential of the developed algorithm.

ALGORITHMS
Particle filter formulation
Update algorithm
Self-calibration
Motion model
Video measurements
Audio measurements
Occlusion handling
Face detection and tracking
Turn-taking detection
SYSTEM SETUP
RESULTS
Synthetic data
Real data
Annotated meeting recording
SUMMARY AND CONCLUSIONS