Abstract

In this study, a vision-based in-car entertainment user interface is presented. The user interface is built on hand posture and hand gesture recognition algorithms within a deep learning framework. The hand posture recognition algorithm uses a convolutional neural network to perform the fundamental tasks in the user interface. The hand gesture recognition algorithm uses a long-term recurrent convolutional neural network to support more detailed, intuitive interaction with the touchless automotive user interface. In a recurrent deep learning framework, the gesture frames are typically taken from a uniformly sampled image sequence. In this work, the recurrent structure is enhanced by using a reduced number of input frames captured from the image sequence. These reduced input frames, or key frames, represent the action present in the video sequence. Sparse dictionary learning provides reliable key frame extraction from video sequences. However, it is computationally expensive and must be optimized individually for every video sequence. In this paper, we propose to approximate sparse dictionary learning using a non-linear regression framework, modeled with a multilayer perceptron. The optimal neural network architecture is identified after a detailed evaluation. We evaluate the proposed recognition methods on public datasets. The proposed methods yield recognition accuracies of 92% and 90% for postures and gestures, respectively. The combined hand posture and gesture recognition takes 82 ms, which is reasonable for real-time implementation.
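The difference between uniform sampling and key frame selection can be illustrated with a minimal sketch. It assumes that some regressor (for example, the multilayer perceptron mentioned above) has already produced one importance score per frame; the score values, the function names `uniform_sample` and `select_key_frames`, and the top-k selection rule are all hypothetical choices for illustration, not the method described in the paper.

```python
import numpy as np

def uniform_sample(num_frames: int, k: int) -> np.ndarray:
    """Return indices of k frames spaced evenly across the sequence."""
    return np.linspace(0, num_frames - 1, num=k).round().astype(int)

def select_key_frames(scores: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-scoring frames, in temporal order."""
    top = np.argsort(scores)[-k:]   # indices of the k largest scores
    return np.sort(top)             # restore temporal order for the recurrent model

# Hypothetical per-frame importance scores (assumed regressor output).
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.1, 0.05, 0.7, 0.1, 0.1, 0.6])

uniform_idx = uniform_sample(len(scores), 4)
key_idx = select_key_frames(scores, 4)
print("uniform:", uniform_idx.tolist())
print("key frames:", key_idx.tolist())
```

Uniform sampling ignores frame content, whereas the score-based selection concentrates the reduced input on the frames judged most representative of the action.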
