Deep learning models based on Transformers and CNNs are a current research hotspot. However, extracting global–local features with such models incurs high time and space complexity, and their feature extraction ability is insufficient for small lesions in CT images. This paper proposes a Re-parameterization Neighborhood Enhancement-based Dual-Stream Network (RNE-DSNet) for CT image recognition. The method improves the recognition of small lesions while reducing the number of parameters (Params) and FLOPs, offering a trade-off between performance and cost. The main contributions are as follows. First, a Re-parameterization Neighborhood Enhancement-based Dual-Stream Network is designed to improve the global–local feature extraction ability. Second, the Re-parameterization Group Residual Block is designed in the CNN branch; it improves local feature extraction through multi-scale and multi-branch structures while effectively reducing the number of Params and FLOPs. The Neighborhood Enhancement Transformer Block is designed in the Transformer branch; global attention is computed by a parallel Hadamard product and dot product, and the global attention maps are enhanced by a deep convolution operation to embed local features in global features. Third, the Adaptive-random Mask Cropping method is designed to improve feature extraction for small lesions in CT images through heat map and random mask techniques. RNE-DSNet is compared with other SOTA models. Its Accuracy (ACC), Precision (PRE), Recall (REC), F1 score (F1), and Area Under Curve (AUC) values reach 99.22%, 99.23%, 99.08%, 99.23% and 99.38%, respectively, and the number of Params and FLOPs are reduced by 62.78% and 84.20%, respectively. Heat map visualization further verifies the effectiveness of RNE-DSNet. The performance of RNE-DSNet is generally better than that of the other methods, which has positive significance for computer-aided diagnosis.
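The abstract does not specify how the Re-parameterization Group Residual Block merges its branches. As a rough illustration of the general idea behind structural re-parameterization (in the style popularized by RepVGG, not the paper's own block), the sketch below shows how a training-time structure with a 3x3 branch, a 1x1 branch and an identity shortcut can be folded into a single 3x3 kernel for inference. It assumes a single channel and no batch normalization, and every name in it (`conv2d_same`, `multi_branch`, `fuse_kernels`) is hypothetical.

```python
# Illustrative sketch of structural re-parameterization (single channel,
# no batch normalization). This is NOT the paper's block; all names here
# are hypothetical and chosen only for this example.


def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding (output size == input size)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity shortcut."""
    y3 = conv2d_same(image, k3)
    h, w = len(image), len(image[0])
    return [
        [y3[i][j] + k1 * image[i][j] + image[i][j] for j in range(w)]
        for i in range(h)
    ]


def fuse_kernels(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity shortcut
    into the centre tap of a copy of the 3x3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0
    return fused


# Small made-up input and weights, used only to check the equivalence.
image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 2.0],
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
]
k3 = [
    [0.25, -0.5, 0.0],
    [0.5, 0.5, -0.25],
    [0.0, 0.25, 0.5],
]
k1 = 0.5

y_multi = multi_branch(image, k3, k1)
y_fused = conv2d_same(image, fuse_kernels(k3, k1))

# Largest element-wise difference between the two forms.
max_diff = max(
    abs(a - b) for row_a, row_b in zip(y_multi, y_fused) for a, b in zip(row_a, row_b)
)
print("outputs match:", max_diff < 1e-9)
```

Because convolution is linear, the three branches sum to one kernel, so the fused form needs one convolution instead of three operations at inference time. This is the general mechanism by which re-parameterized blocks can lower inference cost; the reductions reported in the abstract depend on the paper's actual design, which this sketch does not reproduce.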