Abstract

The Kinect sensor is a powerful tool for applications that require machine vision and voice recognition. The sensor can detect and track up to two individuals within its field of view and output 20 key 3D “skeleton” joints on each individual at 30 frames per second. The sensor also has a sound-localizing microphone array that computes the azimuth of the primary sound source within its range. While these skeleton data are accurate most of the time, the 20 tracking points exhibit a high level of jitter due to noise and estimation error, and when a subject moves slightly out of the sensor’s field of view for a short period, there is no built-in capability to continue tracking by extrapolating the positions of these points. In addition, the sensor does not take advantage of the sound source angle when the tracked subject is speaking. In this work, tracking with the sensor is improved by applying an extended Kalman filter. The filter smooths out the jitter, adds the capability to continue tracking for a short period when the subject moves out of the sensor’s range, and improves tracking accuracy by incorporating the information contained in the sensor’s sound source angle. The efficacy of the filter is demonstrated by applying it to the skeleton head joint, a tracking point near the center of the subject’s head.
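
The paper’s implementation is not reproduced here, but the following minimal Python sketch illustrates the structure such a filter could take: a constant-velocity state model for the head joint, a linear measurement update from the 30 fps skeleton stream, and a nonlinear measurement update from the microphone-array azimuth (the nonlinearity of the azimuth measurement is what calls for an extended rather than a plain Kalman filter). The motion model, the coordinate convention, the class and method names, and all noise covariances below are illustrative assumptions, not values taken from the paper.

    import numpy as np

    class HeadJointEKF:
        """EKF for one Kinect skeleton joint (illustrative sketch, not the paper's code).

        State x = [px, py, pz, vx, vy, vz]: head-joint position and velocity in a
        Kinect-style frame with +z pointing away from the sensor.
        """

        def __init__(self, dt=1.0 / 30.0):          # 30 fps skeleton stream
            self.x = np.array([0., 0., 2., 0., 0., 0.])  # assume subject starts ~2 m away
            self.P = np.eye(6)                      # state covariance
            self.F = np.eye(6)                      # constant-velocity model (assumed)
            self.F[:3, 3:] = dt * np.eye(3)
            self.Q = 1e-3 * np.eye(6)               # process noise (placeholder value)
            self.R_pos = 1e-2 * np.eye(3)           # joint measurement noise (placeholder)
            self.R_az = np.array([[1e-2]])          # azimuth measurement noise (placeholder)

        def predict(self):
            # Time update; running this alone extrapolates the track while the
            # subject is briefly outside the sensor's field of view.
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q

        def update_position(self, z_pos):
            # Linear update from the skeleton head joint (this smooths the jitter).
            H = np.hstack([np.eye(3), np.zeros((3, 3))])
            self._update(np.asarray(z_pos, float), H @ self.x, H, self.R_pos)

        def update_azimuth(self, z_az):
            # Nonlinear update from the microphone array's sound source angle.
            # Assumed convention: azimuth = atan2(px, pz); angle wrapping omitted.
            px, pz = self.x[0], self.x[2]
            r2 = px * px + pz * pz
            H = np.zeros((1, 6))
            H[0, 0] = pz / r2                       # d(atan2(px, pz)) / d(px)
            H[0, 2] = -px / r2                      # d(atan2(px, pz)) / d(pz)
            self._update(np.array([z_az]), np.array([np.arctan2(px, pz)]), H, self.R_az)

        def _update(self, z, h, H, R):
            y = z - h                               # innovation
            S = H @ self.P @ H.T + R                # innovation covariance
            K = self.P @ H.T @ np.linalg.inv(S)     # Kalman gain
            self.x = self.x + K @ y
            self.P = (np.eye(6) - K @ H) @ self.P

A hypothetical per-frame loop would call predict once per frame, apply the position update whenever the skeleton joint is reported, and apply the azimuth update only while the subject is speaking:

    ekf = HeadJointEKF()
    ekf.predict()                                   # once per 30 fps frame
    ekf.update_position([0.1, 0.3, 2.0])            # head joint, if tracked
    ekf.update_azimuth(np.arctan2(0.1, 2.0))        # sound angle, if subject speaks
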
