Abstract

Biometric authentication technologies are rapidly gaining popularity, and hand gestures are emerging as a promising biometric trait due to their rich physiological and behavioral characteristics. Hand gesture authentication can be categorized into defined hand gesture authentication and random hand gesture authentication. Unlike defined hand gesture authentication, random hand gesture authentication is not constrained to specific hand gesture types, allowing users to perform hand gestures randomly during enrollment and verification, which makes it more flexible and user-friendly. However, random hand gesture authentication is more challenging: without gesture templates, the model must extract more generalized physiological and behavioral features across different viewpoints and positions. In this paper, we present a novel, efficient Temporal-Segment-Set-Network (TS2N) that directly extracts both behavioral and physiological features from a single RGB video to further enhance the performance of random hand gesture authentication. Our method adopts a new motion pseudo-modality and leverages a set-based representation to capture behavioral characteristics online. Additionally, we propose a channel-spatial attention mechanism, the Contextual Squeeze-and-Excitation Network (CoSEN), to better abstract and understand physiological characteristics by explicitly modeling channel-spatial interdependence, thereby adaptively recalibrating channel-specific and spatial-specific responses. Extensive experiments on the largest public hand gesture authentication dataset, SCUT-DHGA, demonstrate TS2N's superiority over 21 state-of-the-art models in terms of both EER (5.707% for the full version and 6.664% for the lite version) and computational cost (98.9022 GFLOPs for the full version and 46.3741 GFLOPs for the lite version). The code is available at https://github.com/SCUT-BIP-Lab/TS2N.
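For readers unfamiliar with squeeze-and-excitation attention, the family of mechanisms CoSEN builds on, the following is a minimal pure-Python sketch of the standard channel-recalibration idea (squeeze by global average pooling, excite through a bottleneck MLP, then rescale each channel). This illustrates the generic technique only; it is not the authors' CoSEN module, and the weight matrices `W1`/`W2` are hypothetical placeholders for learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Plain matrix-vector product over nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def se_recalibrate(feat, W1, W2):
    """Squeeze-and-Excitation style channel recalibration (generic sketch).

    feat: list of C channels, each an H x W grid (nested lists).
    W1:   (C/r) x C bottleneck weights; W2: C x (C/r) expansion weights.
    """
    # Squeeze: global average pooling per channel -> channel descriptor z.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: bottleneck MLP (ReLU, then sigmoid) -> per-channel gates s.
    h = [max(0.0, a) for a in matvec(W1, z)]
    s = [sigmoid(a) for a in matvec(W2, h)]
    # Scale: reweight each channel's spatial map by its gate.
    return [[[s[c] * v for v in row] for row in feat[c]] for c in range(len(feat))]

# Toy usage: 2 channels of 2x2 features, bottleneck reduced to 1 unit.
feat = [[[1.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
W1 = [[1.0, 0.0]]          # 1 x 2
W2 = [[2.0], [0.0]]        # 2 x 1
out = se_recalibrate(feat, W1, W2)
```

CoSEN, as described in the abstract, extends this channel-only recalibration by also modeling spatial interdependence, producing spatial-specific as well as channel-specific responses.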
