Abstract

Recently, frequency-modulated continuous-wave (FMCW) radar-based hand gesture recognition (HGR) using deep learning has achieved favorable performance. However, many existing methods use extracted features in isolation, i.e., they train convolutional neural networks (CNNs) on one of the range, Doppler, azimuth, or elevation angle dimensions, or on a combination of any two, which ignores the interrelations within the 5-D time-varying range-Doppler-azimuth-elevation feature space. Although some methods do use the full 5-D information, they do not sufficiently mine these interrelations, leaving room for improvement. This article proposes a new HGR processing scheme based on 5-D feature cubes that are jointly encoded by a 3-D fast Fourier transform (3-D-FFT)-based method. A CNN is then built from two novel blocks: the spatiotemporal deformable convolution (STDC) block and the adaptive spatiotemporal context-aware convolution (ASTCAC) block. Concretely, STDC is designed to cope with the large spatiotemporal geometric transformations that hand gestures undergo in the 5-D feature space. ASTCAC is designed to model long-distance global relationships, e.g., between pixels at the upper-left and lower-right corners of a feature map, and to exploit the global spatiotemporal context, thereby enhancing the target feature representation and suppressing interference. Finally, the presented method is verified on a large radar dataset of 19 760 sets of 16 common hand gestures, collected from 19 subjects. Our method achieves a recognition rate of 99.53% on the validation set and 97.22% on the test set, significantly outperforming state-of-the-art methods.
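The abstract names the 3-D-FFT-based joint encoding but gives no detail, so the following NumPy sketch only illustrates the generic FMCW FFT chain that such an encoding builds on: an FFT along fast time yields range bins, an FFT along slow time (chirps) yields Doppler bins, and FFTs across the two antenna axes yield azimuth and elevation bins. All array names, shapes, and the 2-D-array geometry are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch (assumed shapes, NumPy only) of encoding one frame of raw
# MIMO FMCW radar data into a range-Doppler-azimuth-elevation feature cube.
import numpy as np

# Hypothetical raw data cube for one frame:
# (fast-time samples per chirp, chirps per frame, azimuth antennas, elevation antennas)
num_samples, num_chirps, num_az, num_el = 256, 64, 8, 4
raw = (np.random.randn(num_samples, num_chirps, num_az, num_el)
       + 1j * np.random.randn(num_samples, num_chirps, num_az, num_el))

# FFT along fast time -> range bins.
range_fft = np.fft.fft(raw, axis=0)
# FFT along slow time (chirps) -> Doppler bins, shifted so zero velocity is centered.
doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)
# FFTs across the two antenna axes -> azimuth and elevation bins.
angle_fft = np.fft.fftshift(np.fft.fft2(doppler_fft, axes=(2, 3)), axes=(2, 3))

# Magnitude gives one range-Doppler-azimuth-elevation cube per frame; stacking
# such cubes over frames yields the time-varying 5-D feature space.
feature_cube = np.abs(angle_fft)
print(feature_cube.shape)  # (256, 64, 8, 4)
```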
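The STDC block itself is not specified in the abstract. As a rough illustration of the spatiotemporal-deformable-convolution idea it describes, the stand-in below factorizes the operation into a per-frame 2-D deformable convolution (torchvision.ops.DeformConv2d) whose sampling offsets are predicted from the input, followed by a 1-D temporal convolution across frames. Layer sizes and the factorization are assumptions for the sketch, not the authors' design.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class SpatioTemporalDeformBlock(nn.Module):
    """Hedged sketch: per-frame deformable 2-D conv + temporal 1-D conv."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Predicts (x, y) offsets for every kernel tap at every output location.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)
        # Kernel (3, 1, 1) mixes information across frames only.
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        b, c, t, h, w = x.shape
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        offsets = self.offset_pred(frames)          # learned sampling offsets
        frames = self.act(self.deform(frames, offsets))
        x = frames.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)
        return self.act(self.temporal(x))


block = SpatioTemporalDeformBlock(channels=16)
out = block(torch.randn(2, 16, 8, 32, 32))
print(out.shape)  # torch.Size([2, 16, 8, 32, 32])
```

Predicting offsets from the input lets the sampling grid deform to follow the gesture's geometry frame by frame, which is the property the abstract attributes to STDC.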
