Abstract

With growing interest in the operation of autonomous vehicles, there is an equally increasing need for efficient anticipatory gesture recognition systems for human-vehicle interaction. Existing gesture recognition algorithms have been primarily restricted to using historical data. In this paper, we propose a novel Context-and-Gap-aware Pose Prediction Framework (CGAP2), which predicts future pose data for anticipatory recognition of gestures in an online fashion. CGAP2 implements an encoder–decoder architecture paired with a pose prediction module to anticipate future frames, followed by a shallow classifier. CGAP2's pose prediction module uses 3D convolutional layers and is parameterized by the number of pose frames supplied (the context), the time difference between subsequent pose frames (the gap), and the number of predicted pose frames. The performance of CGAP2 is evaluated on the Human3.6M dataset with the mean per-joint position error (MPJPE) metric. For pose prediction 15 frames in advance, an error of 79 mm is achieved. The pose prediction module consists of only 26M parameters and runs at 50 FPS on an NVIDIA TITAN RTX. CGAP2 has a one-second time advantage over other gesture recognition systems, which can be crucial for autonomous vehicles. Furthermore, this work formalizes the problem of anticipatory gesture recognition by introducing two hyperparameters, context and gap, and establishes a relationship between the two through carefully designed ablation experiments. To the best of our knowledge, this work is the first to tackle all the challenges faced by anticipatory gesture recognition and the first to report a performance benchmark on the Human3.6M dataset.
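
As an illustrative aid only, the sketch below shows how a 3D-convolutional pose prediction module of this kind could be parameterized by context, gap, and the number of predicted frames. This is a minimal assumption-laden sketch, not the authors' exact architecture: the class name PosePredictor, the joint count, the channel sizes, and the frame-sampling scheme are all placeholders for the example.

    # Hypothetical sketch of a CGAP2-style pose prediction module (an
    # assumption, not the paper's actual implementation). It encodes
    # `context` past pose frames, sampled `gap` frames apart, with 3D
    # convolutions and decodes `n_pred` future pose frames.
    import torch
    import torch.nn as nn

    class PosePredictor(nn.Module):
        def __init__(self, n_joints=17, context=8, n_pred=15):
            super().__init__()
            # Encoder: 3D convs over the (time, joints, xyz) volume of
            # the context window; channel widths are illustrative.
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            # Decoder: collapse channels, then remap the time axis from
            # `context` input frames to `n_pred` predicted frames.
            self.decoder = nn.Conv3d(64, 1, kernel_size=3, padding=1)
            self.to_future = nn.Linear(context, n_pred)

        def forward(self, poses):
            # poses: (batch, 1, context, n_joints, 3),
            # with frames sampled `gap` raw frames apart.
            h = self.encoder(poses)
            h = self.decoder(h)              # (batch, 1, context, J, 3)
            h = h.permute(0, 1, 3, 4, 2)     # move time axis last
            h = self.to_future(h)            # (batch, 1, J, 3, n_pred)
            return h.permute(0, 1, 4, 2, 3)  # (batch, 1, n_pred, J, 3)

    # Usage: build the context window by taking every `gap`-th frame
    # from the pose history buffer, then predict 15 frames ahead.
    gap, context = 4, 8
    history = torch.randn(1, 1, context * gap, 17, 3)
    inputs = history[:, :, ::gap]                  # (1, 1, 8, 17, 3)
    pred = PosePredictor(context=context)(inputs)  # (1, 1, 15, 17, 3)

In this reading, context controls how much history the encoder sees and gap controls how far apart the sampled frames are, so together they fix the temporal span the module conditions on; the abstract's ablation experiments study exactly this trade-off.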
