Abstract

This paper presents a perception sensor network (PSN) for detecting audio-based emergency situations, such as a human scream. The PSN consists of multiple units, each equipped with a Kinect and a pan-tilt-zoom (PTZ) camera. Audio signals acquired by the Kinect microphone array are used for both sound source classification and sound source localization. To handle multi-person scenarios, we propose an audio-visual fusion method that identifies the single speaking person among several people. The PSN system was demonstrated in a scenario with four persons, in which the system was able to detect and localize the screaming person and send a robot to that location to check on his or her condition.
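The abstract does not specify how sound source localization or audio-visual fusion are implemented. As a hedged illustration only, the sketch below shows one common way to approach both with a microphone array: GCC-PHAT estimates the time difference of arrival (TDOA) between two microphones, the TDOA is converted to a bearing under a far-field assumption, and that bearing is matched against the bearings of people detected by a camera. GCC-PHAT is a swapped-in technique, not necessarily the authors' method. The 16 kHz sample rate, 0.2 m microphone spacing, shared azimuth frame for audio and camera, and the helper names `doa_from_tdoa` and `pick_speaker` are all hypothetical assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs):
    """Estimate the delay (in seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = sig.shape[0] + ref.shape[0]  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so index k corresponds to lag (k - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


def doa_from_tdoa(tau, mic_distance, c=343.0):
    """Hypothetical helper: far-field bearing (degrees from broadside) of a mic pair."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0))))


def pick_speaker(sound_azimuth, person_azimuths, tolerance=15.0):
    """Hypothetical fusion step: index of the detected person nearest the sound bearing.

    Assumes camera and audio bearings share one reference frame; returns None if
    no person lies within `tolerance` degrees of the sound bearing.
    """
    if not person_azimuths:
        return None
    errors = [abs(a - sound_azimuth) for a in person_azimuths]
    best = int(np.argmin(errors))
    return best if errors[best] <= tolerance else None


if __name__ == "__main__":
    fs = 16000  # assumed sample rate
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)
    sig = np.concatenate((np.zeros(5), ref[:-5]))  # `sig` lags `ref` by 5 samples
    tau = gcc_phat(sig, ref, fs)
    angle = doa_from_tdoa(tau, mic_distance=0.2)  # assumed 0.2 m spacing
    print(tau, angle, pick_speaker(angle, [-40.0, 10.0, 30.0, 60.0]))
```

Matching bearings with a fixed angular tolerance is the simplest possible fusion rule; a real system with four people would likely also need to handle reverberation, overlapping speech, and calibration between each Kinect and its PTZ camera.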
