Finger joint angle (FJA) estimation, as a dynamic and fine-grained decoding mode, can support intuitive and natural human–machine interaction. This study is a pioneering effort toward accurate and clinically friendly FJA estimation. To this end, a novel multimodal multistream multilevel fusion network (MMMFNet) based on surface electromyography (sEMG) and force myography (FMG) is proposed. MMMFNet utilises gated recurrent unit (GRU) networks to capture spatial dependencies among features and adopts a multilevel fusion strategy to extract more comprehensive representations from the information sources than single-level fusion. A cross-stream interaction (CSI) block is added between the GRU networks to focus on the information most important for the prediction and to optimise the learning process and representation capability by facilitating information interaction among the branch networks. The experimental results suggest that MMMFNet has higher estimation accuracy (RMSE: 6.225±0.276 vs. 6.953±0.282; R²: 0.879±0.019 vs. 0.846±0.021), shorter training time (4.146±0.074 min vs. 16.536±0.430 min), and lower computational complexity (5.4×10⁶ FLOPs vs. 8.8×10⁶ FLOPs) than state-of-the-art fusion methods. Moreover, MMMFNet, which quantifies feature importance using attention weights, is the first explainable multimodal model for hand movement intention decoding. In summary, compared with state-of-the-art methods, MMMFNet not only estimates FJAs more accurately but is also more clinically friendly.
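For readers unfamiliar with the two accuracy metrics reported above, the following is a minimal sketch of how RMSE and R² are commonly computed for a predicted joint-angle trajectory. It assumes the standard textbook definitions; the abstract does not state how the authors aggregate over joints, subjects, or trials, and the angle values and function names below are invented purely for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and estimated angles."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical joint angles (illustrative values, not data from the study).
angles_true = np.array([10.0, 20.0, 30.0, 40.0])
angles_pred = np.array([12.0, 18.0, 33.0, 39.0])

print(f"RMSE = {rmse(angles_true, angles_pred):.3f}")
print(f"R^2  = {r2_score(angles_true, angles_pred):.3f}")
```

A lower RMSE and an R² closer to 1 both indicate that the estimated trajectory follows the measured one more closely, which is the sense in which the figures above are compared.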