Abstract

The dynamic hand gesture is an emerging and promising biometric trait that contains both physiological and behavioral characteristics. Possessing both kinds of characteristics theoretically gives dynamic hand gestures more identity information, enabling more accurate and secure authentication, but it also poses the challenge of efficient fine-grained spatiotemporal feature extraction. This challenge involves a seemingly paradoxical problem: high-frame-rate videos are required for behavioral characteristic analysis, yet they also introduce high computational costs. To mitigate this issue, we propose a Frequency Spatiotemporal Attention Network (FSTA-Net) that focuses on satisfying the high-performance and low-computation requirements of authentication systems. The FSTA-Net is built on a two-stage identity characteristic analysis paradigm for short- and long-term modeling. Specifically, considering that models tend to favor physiological characteristics, which are relatively straightforward to analyze, we first design a Behavior Enhanced (BE) module that emphasizes hand motions and reduces redundant information to facilitate local identity feature distillation in the first stage. We then present a Frequency Spatiotemporal Attention (FSTA) module that summarizes global identity features with modest FLOPs and GPU memory consumption in the second stage. Combining the BE and FSTA modules allows them to complement each other's strengths, yielding clear improvements in equal error rate and running speed. Extensive experiments on the SCUT-DHGA dataset demonstrate the superiority of the FSTA-Net. The code is available at https://github.com/SCUT-BIP-Lab/FSTA-Net.
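To make the two-stage paradigm concrete, the sketch below outlines one possible minimal pipeline in PyTorch: a first stage that enhances motion before local feature extraction, followed by a second stage that derives a global identity descriptor from frequency-domain attention. The module internals shown here (frame differencing for the BE stage, FFT-based channel attention for the FSTA stage) are illustrative assumptions, not the authors' implementation; the actual architecture is available in the linked repository.

```python
# Illustrative sketch only: a minimal two-stage pipeline inspired by the abstract.
# All module internals are assumptions; see https://github.com/SCUT-BIP-Lab/FSTA-Net
# for the official implementation.
import torch
import torch.nn as nn


class BehaviorEnhanced(nn.Module):
    """Stage 1 (assumed): emphasize hand motion by mixing each frame with its
    temporal difference, then extract local features frame by frame."""
    def __init__(self, channels=64):
        super().__init__()
        self.local = nn.Conv3d(3, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    def forward(self, x):                          # x: (B, 3, T, H, W)
        diff = x[:, :, 1:] - x[:, :, :-1]          # frame differences highlight motion
        diff = torch.cat([diff, diff[:, :, -1:]], dim=2)  # pad back to T frames
        return self.local(x + diff)                # motion-enhanced local features


class FrequencySpatiotemporalAttention(nn.Module):
    """Stage 2 (assumed): summarize global identity features cheaply via an
    FFT-based channel attention over the temporal axis, then global pooling."""
    def __init__(self, channels=64, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(channels, channels)
        self.head = nn.Linear(channels, feat_dim)

    def forward(self, f):                          # f: (B, C, T, H, W)
        spec = torch.fft.rfft(f, dim=2).abs().mean(dim=(2, 3, 4))  # per-channel frequency energy
        attn = torch.sigmoid(self.fc(spec))                         # channel attention weights
        pooled = f.mean(dim=(2, 3, 4)) * attn                       # attended global descriptor
        return self.head(pooled)                                    # identity embedding


if __name__ == "__main__":
    video = torch.randn(2, 3, 16, 112, 112)        # (batch, RGB, frames, height, width)
    net = nn.Sequential(BehaviorEnhanced(), FrequencySpatiotemporalAttention())
    print(net(video).shape)                        # torch.Size([2, 128])
```

In such a layout, the embedding produced by the second stage would be compared (e.g., via cosine distance) between an enrollment gesture and a probe gesture to accept or reject an identity claim.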
