Abstract

Wearable ultrasound is a novel sensing approach that shows promise in multiple application domains, in particular hand gesture recognition. Ultrasound makes it possible to collect information from deep musculoskeletal structures at high spatiotemporal resolution and high signal-to-noise ratio, making it a strong candidate to complement surface electromyography for improved accuracy and on-the-edge classification. However, existing wearable solutions for ultrasound-based gesture recognition are not sufficiently low-power for continuous, long-term operation. Moreover, the practical hardware limitations of wearable ultrasound devices (limited power budget, reduced wireless throughput, restricted computational resources) call for compact models for feature extraction and classification. To overcome these limitations, this paper presents a novel end-to-end approach for feature extraction from raw musculoskeletal ultrasound data suited to edge computing, coupled with an armband for hand gesture recognition based on a truly wearable (12 cm², 9 g), ultra-low-power (16 mW) ultrasound probe. The proposed approach uses a 1D convolutional autoencoder to compress raw ultrasound data by 20× while preserving the main amplitude features of the envelope signal. The latent features of the autoencoder are used to train an XGBoost classifier for hand gesture recognition on datasets collected with a custom ultrasound armband, with armband removal and repositioning between sessions. Our approach achieves a classification accuracy of 96%. Furthermore, the proposed unsupervised feature extraction approach generalizes across subjects, as demonstrated by testing the pre-trained Encoder on a different subject and by a post-training analysis revealing that the operations performed by the Encoder are subject-independent. The autoencoder is also quantized to 8-bit integers and deployed on an ultra-low-power wearable ultrasound probe along with the XGBoost classifier, enabling a gesture recognition rate of ≥ 25 Hz and reducing power consumption by 21% (at 30 FPS) compared to the conventional approach of raw data transmission and remote processing.
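As a concrete illustration of the pipeline summarized above, the sketch below shows a 1D convolutional autoencoder that compresses each raw frame by 20× and an XGBoost classifier trained on the latent features. This is not the authors' implementation: the frame length, channel counts, kernel sizes, gesture count, and training settings are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch of the abstract's pipeline (illustrative, not the authors' code):
# a 1D convolutional autoencoder compresses a raw ultrasound frame 20x, and its
# latent vector feeds an XGBoost gesture classifier.
import numpy as np
import torch
import torch.nn as nn
import xgboost as xgb

FRAME_LEN = 2000   # assumed samples per raw ultrasound frame
LATENT_LEN = 100   # 2000 / 100 = 20x compression, matching the abstract

class ConvAutoencoder1D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                                  # (B, 1, 2000)
            nn.Conv1d(1, 8, 5, stride=2, padding=2), nn.ReLU(),       # (B, 8, 1000)
            nn.Conv1d(8, 4, 5, stride=2, padding=2), nn.ReLU(),       # (B, 4,  500)
            nn.Conv1d(4, 1, 5, stride=5),                             # (B, 1,  100)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(1, 4, 5, stride=5), nn.ReLU(),             # (B, 4,  500)
            nn.ConvTranspose1d(4, 8, 4, stride=2, padding=1), nn.ReLU(),  # (B, 8, 1000)
            nn.ConvTranspose1d(8, 1, 4, stride=2, padding=1),             # (B, 1, 2000)
        )

    def forward(self, x):
        z = self.encoder(x)          # latent features, 20x smaller than the input
        return self.decoder(z), z

# Unsupervised autoencoder training on raw frames (random stand-in data).
frames = torch.randn(256, 1, FRAME_LEN)        # placeholder for raw US frames
labels = np.random.randint(0, 8, size=256)     # placeholder for 8 gesture classes

model = ConvAutoencoder1D()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):                            # a few epochs, for illustration only
    recon, _ = model(frames)
    loss = nn.functional.mse_loss(recon, frames)  # preserve envelope amplitude
    opt.zero_grad()
    loss.backward()
    opt.step()

# Latent features -> XGBoost gesture classifier.
with torch.no_grad():
    _, z = model(frames)
features = z.squeeze(1).numpy()                # (256, LATENT_LEN)
clf = xgb.XGBClassifier(n_estimators=50, max_depth=4)
clf.fit(features, labels)
print("train accuracy:", clf.score(features, labels))
```

In an on-device deployment like the one described, only the encoder (here quantizable to 8-bit integers) and the tree ensemble would run on the probe, so the decoder is needed only during training.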
