We present a systematic exploration of how video game context (e.g., player and environmental state) can be used to modify and augment existing 3D gesture recognizers to improve accuracy on large gesture sets. Specifically, we develop and evaluate three strategies for incorporating context into 3D gesture recognizers. The first, CA-Linear, modifies the well-known Rubine linear classifier so that it handles unsegmented input streams and is retrained every frame using contextual information. The second, CA-DTW, is a GPU implementation of dynamic time warping (DTW) that reduces the overhead of traditional DTW by using context to evaluate only relevant time sequences inside a multithreaded kernel. The third, CA-SVM, is a multiclass SVM with per-class probability estimation whose output is combined with a context-based prior probability distribution. We evaluate each strategy using a Kinect-based, third-person-perspective VE game prototype that combines parkour-style navigation with hand-to-hand combat. Using a simple gesture collection application to collect a set of 57 gestures, and the game prototype, which implements 37 of these gestures, we conduct three experiments. In the first experiment, we evaluate the effectiveness of several established classifiers on our gesture set and demonstrate state-of-the-art results using our proposed method. In the second experiment, we generate 500 random scenarios, each with between 5 and 19 of the 57 gestures in context. We show that the contextually aware classifiers CA-Linear, CA-DTW, and CA-SVM significantly outperform their non-contextually aware counterparts by 37.74%, 36.04%, and 20.81%, respectively. Based on the results of the second experiment, we derive upper-bound expectations for the in-game performance of the three CA classifiers: 96.61%, 86.79%, and 96.86%, respectively. Finally, the third experiment is an in-game evaluation of the three CA classifiers with and without context. 
Our results show that, by using context, we achieve an average in-game recognition accuracy of 89.67% with CA-Linear (65.10% without context), 79.04% with CA-DTW (58.1% without context), and 90.85% with CA-SVM (75.2% without context).
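The two context mechanisms described above can be illustrated with a small sketch: restricting DTW template matching to the gestures that are valid in the current context, and weighting per-class classifier probabilities by a context prior before renormalizing. This is a minimal, sequential CPU sketch under stated assumptions, not the authors' GPU kernel or SVM pipeline. The function names (`ca_dtw_classify`, `ca_posterior`), the representation of a gesture as a list of per-frame feature vectors, and the dictionary layouts are all hypothetical. The fusion step also assumes the classifier was trained on roughly balanced classes, so that multiplying p(c|x) by the context prior P(c) is proportional to a context-aware posterior.

```python
import math
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical representation: one feature vector (e.g., joint positions) per frame.
Frame = Sequence[float]
Sequence_ = List[Frame]


def dtw_distance(a: Sequence_, b: Sequence_) -> float:
    """Classic O(n*m) dynamic time warping with a Euclidean per-frame cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ca_dtw_classify(sample: Sequence_,
                    templates: Dict[str, List[Sequence_]],
                    active: Set[str]) -> Optional[str]:
    """Nearest-template DTW that only evaluates gestures valid in the current context."""
    best_label, best_dist = None, float("inf")
    for label, examples in templates.items():
        if label not in active:
            continue  # Context prunes this gesture, so no DTW cost is paid for it.
        for template in examples:
            d = dtw_distance(sample, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


def ca_posterior(class_probs: Dict[str, float],
                 context_prior: Dict[str, float]) -> Dict[str, float]:
    """Multiply p(c|x) by a context prior P(c) and renormalize to sum to 1."""
    unnorm = {c: p * context_prior.get(c, 0.0) for c, p in class_probs.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        # The context rules out every class with nonzero probability:
        # fall back to the context-free probabilities.
        return dict(class_probs)
    return {c: v / z for c, v in unnorm.items()}


if __name__ == "__main__":
    templates = {
        "punch": [[(0, 0), (1, 1), (2, 2)]],
        "kick": [[(0, 0), (0, 2), (0, 4)]],
        "jump": [[(0, 0), (1, 1), (2, 2.1)]],
    }
    sample = [(0, 0), (1, 1), (2, 2)]
    print(ca_dtw_classify(sample, templates, {"punch", "kick", "jump"}))  # punch
    print(ca_dtw_classify(sample, templates, {"kick", "jump"}))           # jump
    print(ca_posterior({"punch": 0.5, "kick": 0.3, "jump": 0.2},
                       {"punch": 0.0, "kick": 0.5, "jump": 0.5}))
```

In the usage example, the sample is an exact match for the "punch" template, but once "punch" is removed from the active set, the best remaining match is "jump". Pruning inactive gestures both saves DTW evaluations and removes competing classes, which is the intuition behind the accuracy gains reported for the contextually aware classifiers.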