Abstract

The goal of this paper is to evaluate how the fusion of multimodal features (i.e., audio, RGB and depth) can help in the challenging task of identifying people by the way they walk, i.e., gait recognition, and by extension in the tasks of gender and shoes recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly on binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio or depth patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging `TUM GAID' dataset, which contains audio and depth recordings in addition to image sequences. The experimental results show that using either early or late fusion techniques to combine feature descriptors from the three modalities (i.e., RGB, depth and audio) improves the state-of-the-art results on the standard experiments defined on the dataset for the tasks of gait, gender and shoes recognition. Additional experiments on CASIA-B (where only the visual modality is available) support the benefits of feature fusion as well.
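The sketch below illustrates the two fusion strategies mentioned in the abstract: early fusion (concatenating per-modality descriptors before training a single classifier) and late fusion (training one classifier per modality and combining their scores). The descriptor dimensions, the linear-SVM classifiers, the random toy data and the equal-weight score average are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of early vs. late fusion of per-modality gait descriptors.
# All sizes, classifiers and weights are assumptions for illustration only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_ids = 60, 20, 5              # toy split sizes (hypothetical)
dims = {"rgb": 128, "depth": 64, "audio": 32}   # assumed descriptor sizes

def toy_split(n):
    """Random per-modality descriptors plus cyclic identity labels."""
    X = {m: rng.normal(size=(n, d)) for m, d in dims.items()}
    y = np.arange(n) % n_ids
    return X, y

Xtr, ytr = toy_split(n_train)
Xte, yte = toy_split(n_test)

# Early fusion: concatenate the modality descriptors, train one classifier.
Ztr = np.hstack([Xtr[m] for m in dims])
Zte = np.hstack([Xte[m] for m in dims])
early_pred = SVC(kernel="linear").fit(Ztr, ytr).predict(Zte)

# Late fusion: one classifier per modality, then average their class scores.
scores = np.zeros((n_test, n_ids))
for m in dims:
    clf = SVC(kernel="linear", probability=True).fit(Xtr[m], ytr)
    scores += clf.predict_proba(Xte[m])          # equal weights (assumption)
late_pred = np.argmax(scores, axis=1)

print("early-fusion accuracy:", np.mean(early_pred == yte))
print("late-fusion accuracy:", np.mean(late_pred == yte))
```

With real gait descriptors the choice between the two strategies trades off joint modeling of cross-modal correlations (early fusion) against robustness when one modality is noisy or missing (late fusion).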
