BiMNet: A Multimodal Data Fusion Network for Continuous Circular Capsulorhexis Action Segmentation

Daniel Santos Da Silva, Victor Hugo C De Albuquerque, Chen Xin, Gui-Bin Bian, Jia-Ying Zheng, Zhen Li, Jie Wang, Pan Fu, Wan-Qing Wu

doi:10.1016/j.eswa.2023.121885

Abstract

Completing continuous circular capsulorhexis (CCC) requires the operator to perform fine operations, and these continuous fine actions are difficult to classify accurately because the action classes in CCC procedures are imbalanced. Multimodal deep learning can improve classifier performance, but the recognition accuracy of minority classes remains difficult to improve. To solve these problems, a bidirectional gated recurrent unit (Bi-GRU) attention-based multimodal, multi-timescale data fusion network (BiMNet) is proposed. It contains a data extraction module called the skip-concatenate gated recurrent unit (SC-GRU), a bimodal data fusion attention computation, and a decoder module. Together, these modules extract features at different temporal scales from multimodal action data and fuse them effectively. The model is validated on a multimodal dataset of ophthalmologists' CCC maneuvers, collected with the data collection platform constructed in this research. It achieves an accuracy of 0.9124 ± 0.0125 in continuous action sequence segmentation and raises the F1-score of minority class recognition to over 80%, making it more effective than the baseline algorithms.
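
The abstract reports both overall accuracy and the F1-score of minority classes. The minimal sketch below illustrates why the two are reported separately: on an imbalanced label sequence, accuracy can stay high while the minority-class F1-score is poor. The labels `majority` and `minority`, the frame counts, and the helper names are hypothetical and chosen only for illustration; they are not taken from the CCC dataset or from the BiMNet evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1-score."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical frame-level sequence: 18 majority-class frames, 2 minority-class frames.
y_true = ["majority"] * 18 + ["minority"] * 2
# Hypothetical prediction that misses one of the two minority-class frames.
y_pred = ["majority"] * 19 + ["minority"]

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}")
for label, score in f1.items():
    print(f"F1[{label}] = {score:.4f}")
```

In this toy case a single misclassified minority frame barely changes accuracy but lowers the minority-class F1-score substantially, which is why a per-class metric is the more informative measure for imbalanced action classes.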

Full Text