Abstract

An improved Depth Image CamShift (DI_CamShift) algorithm is proposed to achieve accurate tracking of gestures in sign language video. First, Kinect is used to obtain depth image information for the sign language gestures. The search window is then adjusted according to the principal-axis direction angle and mass-center position calculated from the depth images. Finally, the minimum depth value within the search window is used to determine the target gesture area. Experimental results show that the algorithm is robust and can effectively track sign language gestures.

I. Introduction

Gesture tracking in video is a key prerequisite for related studies such as sign language recognition and gesture control, especially under complicated backgrounds and unconstrained conditions. In complex scenes, the video may contain no target object or more than one candidate, so accurate detection and tracking facilitate any further processing. Because color images contain rich information and features, tracking and segmentation methods based on skin color are widely used. Although skin color can easily distinguish a hand from other objects, it can be disturbed by objects of similar color, and it cannot determine which hand is the target when several hands appear in the video at the same time. Since a depth image contains the distance between the camera and each hand, the foremost hand can be defined as the target hand, which solves this problem. While skin-color pixel values play a dominant role in the tracked motion area, combining them with depth image information makes skin-color-based gesture tracking more accurate. Kinect, short for the Kinect for Xbox 360, is a motion sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs.
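The quantities named in the abstract, the mass-center position and the principal-axis ("spindle") direction angle of the depth image, plus the minimum depth used to pick the foremost hand, can all be derived from image moments. The following is a minimal NumPy sketch, not the paper's implementation; the depth-range thresholds are illustrative assumptions:

```python
import numpy as np

def depth_window_params(depth, near_mm=400, far_mm=1200):
    """Estimate the search-window mass center and principal-axis angle
    from a Kinect-style depth image (millimeters), in the spirit of
    DI_CamShift. The near/far range limits are illustrative, not from
    the paper."""
    mask = (depth > near_mm) & (depth < far_mm)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # Mass-center position (from the zeroth and first moments).
    xc, yc = xs.mean(), ys.mean()
    # Central second moments give the principal-axis direction angle.
    mu20 = ((xs - xc) ** 2).mean()
    mu02 = ((ys - yc) ** 2).mean()
    mu11 = ((xs - xc) * (ys - yc)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return (xc, yc), theta

def forefront_depth(depth):
    """Smallest valid depth value: the foremost candidate hand.
    A reading of 0 means 'no measurement' on the Kinect sensor."""
    valid = depth[depth > 0]
    return valid.min() if valid.size else None
```

The orientation angle lets the tracker align (and later size) the search window with the elongated hand region rather than using an axis-aligned box only.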
Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users to control and interact with the Xbox 360 without touching a game controller. The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot and is designed to be positioned lengthwise above or below the video display. The device features an RGB camera, a depth sensor, and a multi-array microphone running proprietary software. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. Kinect is therefore a 3D multifunction camera that can capture color images and 3D depth information at the same time. Using Kinect as the video acquisition device, this research obtains depth image information for the gestures in correspondence with the color gesture video. This paper improves the classic CamShift algorithm by using the Kinect depth image information, and verifies the accuracy and robustness of the new algorithm through gesture tracking experiments.

II. CamShift

CamShift (Continuously Adaptive Mean Shift) is a non-parametric iterative algorithm that searches a probability distribution, with the MeanShift algorithm at its core, based on the target's color features. The basic idea of CamShift is to apply MeanShift to a continuous image sequence; it tracks a moving object in video as follows: (1) Compute the probability distribution image and run MeanShift on it. (2) Use the result of the previous frame as the initial search window for the next video frame. (3) Iterate the procedure described above. CamShift can adaptively adjust the search window to the size of the object in the video, which greatly improves tracking performance. It makes full use of the advantages of MeanShift, which is simple and easy to compute, and realizes adaptive window-size control without increasing computational complexity.
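The MeanShift relocation at the core of steps (1)-(2) repeatedly moves the search window to the centroid of the probability mass inside it until the shift is negligible. A simplified NumPy sketch (not the paper's implementation), assuming a precomputed probability/back-projection image:

```python
import numpy as np

def mean_shift(prob, win, max_iter=10, eps=1.0):
    """Relocate a search window on a probability image by MeanShift.

    prob: 2-D array of target probabilities (e.g. a back-projection).
    win:  (x, y, w, h) initial window in pixel coordinates.
    Returns the converged window; size is fixed here (CamShift adds
    the adaptive resizing on top of this step)."""
    x, y, w, h = win
    for _ in range(max_iter):
        roi = prob[y:y + h, x:x + w]
        m00 = roi.sum()                      # zeroth moment in the window
        if m00 == 0:
            break                            # no target mass: stay put
        ys, xs = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
        xc = (xs * roi).sum() / m00          # centroid inside the window
        yc = (ys * roi).sum() / m00
        dx = int(round(xc - (w - 1) / 2))    # shift window center onto it
        dy = int(round(yc - (h - 1) / 2))
        x = max(0, min(prob.shape[1] - w, x + dx))
        y = max(0, min(prob.shape[0] - h, y + dy))
        if abs(dx) < eps and abs(dy) < eps:  # converged
            break
    return (x, y, w, h)
```

Step (2) then simply feeds the converged window of one frame in as the initial window for the next frame.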
After the MeanShift iteration is completed, CamShift adjusts the search window size for the next frame.
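In Bradski's original CamShift formulation, this adaptive resizing is driven by the zeroth moment of the probability image inside the converged window. A minimal sketch of that rule (the divisor of 256 assumes an 8-bit back-projection image with maximum value 255):

```python
import math

def camshift_window_size(m00, max_prob=256):
    """CamShift's adaptive window side length after a MeanShift pass:
    s = 2 * sqrt(M00 / max_prob), where M00 is the zeroth moment of
    the probability image inside the converged window. With an 8-bit
    back-projection, max_prob = 256 (Bradski's convention)."""
    return 2.0 * math.sqrt(m00 / max_prob)
```

Because M00 grows with the amount of target probability mass under the window, the window automatically expands as the tracked hand approaches the camera and shrinks as it recedes.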
