Abstract

Automatic person identification based on acoustic footsteps captured by a microphone array raises fewer privacy concerns than camera-based solutions. Moreover, the time difference of sound arrival among multiple audio channels enables estimation of a person's direction, which helps robots perform person-following tasks in a socially compliant manner. To the best of our knowledge, no acoustic footstep benchmark dataset collected from a microphone array has previously been built for robots. In this paper, we propose an improved acoustic footstep-based person identification dataset (AFPID-II), designed to cover the experimental covariates that are commonly considered to degrade footstep recognition accuracy. Specifically, we build the AFPID-II dataset from 41 subjects using a lightweight microphone array in unconstrained indoor rooms, varying the covariates of clothing, shoes, and room type during footstep collection. The AFPID-II dataset contains over 14 hours of footstep audio (around 88,467 footstep events), considerably more than the earlier SFootBD and TUM GAID datasets. We also present a baseline identification method (AFPI-Net), which fully exploits the acoustic features through multimodal feature fusion. Experimental results show that room type degrades identification accuracy most severely (by around 78%), followed by shoe type (31%) and clothing (13%).
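To make the direction-estimation idea mentioned above concrete, the sketch below shows the standard approach: estimating the time difference of arrival (TDOA) between two channels with generalized cross-correlation with phase transform (GCC-PHAT), then converting that delay to a bearing angle under a far-field assumption. This is a minimal illustration of the generic technique, not the paper's pipeline; the function names gcc_phat and doa_from_tdoa, the 16 kHz sample rate, and the 10 cm microphone spacing are all illustrative assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the TDOA (seconds) of sig relative to ref via GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # Phase transform: keep only phase information to sharpen the peak.
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Rearrange so lag 0 sits in the middle, then pick the peak lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def doa_from_tdoa(tau, mic_distance, c=343.0):
    """Convert a TDOA to a bearing angle (degrees) for a two-mic pair,
    assuming a far-field source: sin(theta) = c * tau / d."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0))))


if __name__ == "__main__":
    fs, d = 16_000, 0.1             # assumed sample rate and mic spacing
    rng = np.random.default_rng(0)
    step = rng.standard_normal(fs)  # stand-in for one footstep event
    delay = 3                       # simulated 3-sample inter-channel delay
    left = step
    right = np.concatenate((np.zeros(delay), step[:-delay]))
    tau = gcc_phat(right, left, fs, max_tau=d / 343.0)
    print(f"TDOA = {tau * 1e6:.1f} us, DOA ~ {doa_from_tdoa(tau, d):.1f} deg")
```

Bounding the search window with max_tau to the physically possible delay (mic spacing divided by the speed of sound) is a common safeguard against spurious correlation peaks from reverberation.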
