Wearable ultrasound (US) is a novel sensing approach that shows promise in multiple application domains, particularly hand gesture recognition (HGR). US makes it possible to collect information from deep musculoskeletal structures at high spatiotemporal resolution and with a high signal-to-noise ratio, which makes it an ideal candidate to complement surface electromyography for improved accuracy and on-the-edge classification. However, existing wearable solutions for US-based gesture recognition are not low-power enough for continuous, long-term operation. Moreover, the practical hardware limitations of wearable US devices (limited power budget, reduced wireless throughput, and restricted computational power) call for compact models for feature extraction and classification. To overcome these limitations, this article presents a novel end-to-end approach, suited to edge computing, for feature extraction from raw musculoskeletal US data. It is coupled with an HGR armband based on a truly wearable (12 cm², 9 g), ultralow-power (ULP, 16 mW) US probe. The proposed approach uses a 1-D convolutional autoencoder (CAE) to compress raw US data by 20× while preserving the main amplitude features of the envelope signal. The latent features of the autoencoder are used to train an XGBoost classifier for HGR on datasets collected with a custom US armband, accounting for armband removal and repositioning between sessions. Our approach achieves a classification accuracy of 96%. Furthermore, the proposed unsupervised feature extraction approach generalizes to intersubject use. This is demonstrated by testing the pretrained encoder on a different subject and by a posttraining analysis revealing that the operations performed by the encoder are subject-independent.
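A strided 1-D convolutional encoder can reach a 20× compression by shrinking the signal length at each layer. The sketch below shows the arithmetic only. It assumes a hypothetical three-layer encoder (strides 2, 2, and 5, ending in one latent channel) and a hypothetical 1000-sample frame. These layer sizes, the `Conv1dLayer` type, and the helper functions are illustrative assumptions, not the CAE architecture used in the article.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Conv1dLayer:
    # Hypothetical layer hyperparameters (illustrative, not the article's CAE).
    out_channels: int
    kernel_size: int
    stride: int
    padding: int


def conv1d_out_len(length: int, layer: Conv1dLayer) -> int:
    # Standard output-length formula for a 1-D convolution with dilation 1.
    return (length + 2 * layer.padding - layer.kernel_size) // layer.stride + 1


def latent_shape(input_len: int, in_channels: int,
                 layers: List[Conv1dLayer]) -> Tuple[int, int]:
    # Propagate (channels, length) through the strided encoder layers.
    channels, length = in_channels, input_len
    for layer in layers:
        length = conv1d_out_len(length, layer)
        channels = layer.out_channels
    return channels, length


def compression_ratio(input_len: int, in_channels: int,
                      layers: List[Conv1dLayer]) -> float:
    # Number of input values per frame divided by number of latent values.
    channels, length = latent_shape(input_len, in_channels, layers)
    return (input_len * in_channels) / (channels * length)


# Hypothetical encoder: two stride-2 layers and one stride-5 layer, ending in
# a single latent channel, shorten the signal by 2 * 2 * 5 = 20x.
ENCODER = [
    Conv1dLayer(out_channels=8, kernel_size=4, stride=2, padding=1),
    Conv1dLayer(out_channels=16, kernel_size=4, stride=2, padding=1),
    Conv1dLayer(out_channels=1, kernel_size=5, stride=5, padding=0),
]

if __name__ == "__main__":
    # Hypothetical frame of 1000 raw samples from one transducer channel.
    print(latent_shape(1000, 1, ENCODER))       # (1, 50)
    print(compression_ratio(1000, 1, ENCODER))  # 20.0
```

In this configuration, the intermediate layers widen the channel dimension while the final layer collapses it to one channel. The compression therefore comes from the product of the strides. A different split between length and channels could give the same ratio.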
The autoencoder is also quantized to 8-bit integers and deployed, together with the XGBoost classifier, on the ULP wearable US probe. This enables a gesture recognition rate of at least 25 Hz and reduces power consumption by 21% at 30 frames/s (FPS) compared with the conventional approach of raw data transmission and remote processing.
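The abstract does not state which 8-bit quantization scheme is used. As one common option, the sketch below assumes symmetric per-tensor quantization, where each tensor is mapped onto [-127, 127] with a single scale factor. It also shows how an integer dot product, accumulated in integers and rescaled once, approximates the floating-point result. The function names and the example activations and weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    # Symmetric per-tensor quantization (an assumed scheme):
    # map [-max|v|, max|v|] onto the integer range [-127, 127].
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    # Recover approximate real values from the int8 codes.
    return [x * scale for x in q]


def int8_dot(qx: List[int], sx: float, qw: List[int], sw: float) -> float:
    # Integer multiply-accumulate (an int32 accumulator on a microcontroller),
    # followed by a single floating-point rescale by both scale factors.
    acc = sum(a * b for a, b in zip(qx, qw))
    return acc * sx * sw


if __name__ == "__main__":
    x = [0.25, -0.5, 1.0]    # hypothetical activations
    w = [0.5, -1.27, 1.27]   # hypothetical weights
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    exact = sum(a * b for a, b in zip(x, w))
    approx = int8_dot(qx, sx, qw, sw)
    print(qw)                # [50, -127, 127]
    print(exact, approx)
```

Storing weights as int8 cuts memory by 4× relative to float32. It also lets the inner loop run on integer arithmetic, which is one reason 8-bit models suit ULP microcontrollers. Whether the article uses per-tensor or per-channel scales, or asymmetric zero points, is not stated here.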