Abstract

In this paper, we present a perception sensor network (PSN) that detects audio- and vision-based emergency situations, such as a quarrel between students involving screaming and punching, in order to help maintain school safety. At the system level, the PSN is composed of ambient sensor units that use a Kinect, a pan-tilt-zoom (PTZ) camera, and a control board to acquire raw audio signals and color and depth images. Audio signals, acquired by the Kinect microphone array, are used to recognize sound classes and localize the sound source. Vision signals, acquired from the Kinect and the PTZ camera stream, are used to detect the locations of humans, identify them by name, and recognize their gestures. Within the system, fusion methods are used to combine multiple-person detection and tracking, face identification, and audio–visual emergency recognition. Two approaches, a matching pursuit algorithm and a dense trajectories covariance matrix, are also applied to reliably recognize abnormal activities of students. In this way, human-caused emergencies are detected automatically, together with the place of occurrence, the subject, and the emergency type. A PSN consisting of four units was used in experiments to detect a designated target performing abnormal actions in multi-person scenarios. Evaluation of the individual perception capabilities and of the integrated system confirmed that the proposed system can provide meaningful information of substantive support to teachers or staff members in school environments.
