Abstract

Background: Multimodal medical image detection is a key technology in medical image analysis and plays an important role in tumor diagnosis. Lesions in multimodal lung tumor images vary in size and shape, which makes it difficult to effectively extract the key features of lung tumor lesions.

Methods: A Cross-modal Cross-scale Global-Local Attention YOLOv5 lung tumor detection model (CCGL-YOLOV5) is proposed in this paper. The main contributions are as follows: Firstly, the Cross-Modal Fusion Transformer Module (CMFTM) is designed to improve the extraction and fusion of key multimodal lesion features through interactive, mutually assisted fusion of multimodal features. Secondly, the Global-Local Feature Interaction Module (GLFIM) is proposed to enhance the interaction between multimodal global features and multimodal local features through bidirectional interactive branches. Thirdly, the Cross-Scale Attention Fusion Module (CSAFM) is designed to obtain rich multi-scale features through grouped multi-scale attention for feature fusion.

Results: Comparison experiments with advanced networks were conducted. The Acc, Rec, mAP, F1 score, and FPS of the CCGL-YOLOV5 model on a multimodal lung tumor PET/CT dataset are 97.83%, 97.39%, 96.67%, 97.61%, and 98.59, respectively. The experimental results show that the CCGL-YOLOV5 model outperforms other typical models.

Conclusion: The CCGL-YOLOV5 model can effectively exploit multimodal feature information. It has important implications for multimodal medical image research and clinical disease diagnosis.
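The interactive, mutually assisted fusion described for CMFTM can be illustrated with generic bidirectional cross-attention between two modalities. This is a minimal numpy sketch of that general technique, not the paper's implementation; the names `pet`, `ct`, `cross_attention`, the token/feature dimensions, and the additive fusion step are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """One modality's tokens (queries) attend to the other's (keys/values).

    Both inputs: (num_tokens, feature_dim). Scaled dot-product attention.
    """
    d_k = keys_values.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    return softmax(scores, axis=-1) @ keys_values     # (n_q, feature_dim)

# Toy PET and CT token maps: 4 tokens with 8-dim features each (hypothetical sizes)
rng = np.random.default_rng(0)
pet = rng.normal(size=(4, 8))
ct = rng.normal(size=(4, 8))

# Bidirectional interaction: each modality queries the other, then fuse additively
pet_enriched = cross_attention(pet, ct)
ct_enriched = cross_attention(ct, pet)
fused = pet_enriched + ct_enriched  # (4, 8) fused multimodal features
```

In a full detector, such fused features would feed the downstream neck and detection head; here the additive fusion merely stands in for whatever learned combination the model uses.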
