Abstract

Hand gestures can be used for natural and intuitive human-computer interaction. To achieve this goal, computers should be able to visually recognize hand gestures from video input. However, vision-based hand tracking and gesture recognition are extremely challenging because hand gestures are complex and highly varied, owing to the many degrees of freedom of the human hand. Moreover, computer vision algorithms are notoriously brittle and computationally intensive, which makes most current gesture recognition systems fragile and inefficient. This thesis proposes a new architecture that addresses real-time vision-based hand tracking and gesture recognition by combining statistical and syntactic analysis. The fundamental idea is to use a divide-and-conquer strategy based on the hierarchical composition property of hand gestures, so that the problem can be decoupled into two levels. The low level of the architecture focuses on hand posture detection and tracking with Haar-like features and the AdaBoost learning algorithm. The Haar-like features effectively capture the appearance properties of the hand postures. The AdaBoost learning algorithm significantly improves speed and constructs an accurate cascade of classifiers by combining a sequence of weak classifiers. To recognize different hand postures, a parallel-cascade structure is implemented. This structure achieves real-time performance and high classification accuracy. The 3D position of the hand is recovered according to the camera's perspective projection. To make the system robust against cluttered backgrounds, background subtraction and noise removal are applied. For high-level hand gesture recognition, a stochastic context-free grammar (SCFG) is used to analyze the syntactic structure of the hand gestures, using terminal strings converted from the postures detected by the low level of the architecture.
Given an input string, the corresponding hand gesture is identified, based on a similarity measurement and the probabilities associated with the production rules, by finding the production rule with the greatest probability of generating that string. For hand motion analysis, two SCFGs are defined to analyze two structured hand gestures with different trajectory patterns: the rectangle gesture and the diamond gesture. Based on the different probabilities associated with these two grammars, the SCFGs can effectively disambiguate distorted trajectories and classify them correctly. An application of gesture-based interaction with a 3D gaming virtual environment is implemented. With this system, the user can navigate the 3D gaming world by driving the avatar car with a set of hand postures. To manipulate virtual objects, the user can use a set of hand gestures to select the target traffic sign and open a window showing information about the corresponding learning object. This application demonstrates that the gesture-based interface can achieve improved interaction that is more intuitive and flexible for the user.
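
The grammar-based trajectory classification described above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual grammars, motion primitives, or rule probabilities, so everything below is an assumption made for illustration: the eight direction tokens, the four-stroke grammar shape, the nonterminal names (`S`, `X`, `Y`, `S1` to `S4`), and the 0.8/0.1/0.1 emission probabilities are all hypothetical. The sketch scores a token string under two toy SCFGs in Chomsky normal form with the standard inside algorithm and picks the grammar that assigns the higher probability.

```python
from collections import defaultdict

# Hypothetical motion primitives, in clockwise order (image coordinates).
DIRECTIONS = ["r", "dr", "d", "dl", "l", "ul", "u", "ur"]


def make_grammar(strokes, p_ideal=0.8):
    """Build a toy four-stroke SCFG in Chomsky normal form (assumed shape).

    Each stroke nonterminal emits its ideal direction with p_ideal and each
    of the two adjacent directions with the remaining mass split evenly.
    """
    binary = {("S", "X", "Y"): 1.0, ("X", "S1", "S2"): 1.0, ("Y", "S3", "S4"): 1.0}
    lexical = {}
    p_neighbour = (1.0 - p_ideal) / 2
    for idx, ideal in enumerate(strokes, start=1):
        i = DIRECTIONS.index(ideal)
        lexical[(f"S{idx}", ideal)] = p_ideal
        for j in (i - 1, i + 1):
            lexical[(f"S{idx}", DIRECTIONS[j % len(DIRECTIONS)])] = p_neighbour
    return binary, lexical


def inside_probability(tokens, binary, lexical, start="S"):
    """Probability that `start` derives `tokens` (inside algorithm)."""
    n = len(tokens)
    if n == 0:
        return 0.0
    # chart[i][j][A] = probability that nonterminal A derives tokens[i..j]
    chart = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for (lhs, terminal), p in lexical.items():
            if terminal == tok:
                chart[i][i][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right child
                for (lhs, left, right), p in binary.items():
                    p_left = chart[i][k].get(left, 0.0)
                    p_right = chart[k + 1][j].get(right, 0.0)
                    if p_left and p_right:
                        chart[i][j][lhs] += p * p_left * p_right
    return chart[0][n - 1].get(start, 0.0)


def classify(tokens, grammars):
    """Return (best_label_or_None, scores) over all candidate grammars."""
    scores = {name: inside_probability(tokens, *g) for name, g in grammars.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0.0 else None), scores


GRAMMARS = {
    "rectangle": make_grammar(["r", "d", "l", "u"]),
    "diamond": make_grammar(["dr", "dl", "ul", "ur"]),
}

# A rectangle whose third stroke drifted from "l" to "dl".
print(classify(["r", "d", "dl", "u"], GRAMMARS))
```

With these assumed probabilities, a rectangle whose third stroke is distorted from `l` to `dl` still scores higher under the rectangle grammar than under the diamond grammar, which is the kind of disambiguation the abstract describes. A string that neither grammar can derive gets a score of zero under both and is left unclassified.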
