Convolutional Neural Networks (CNNs) are widely used in medical disease diagnosis and perform strongly in most cases. However, deep-learning-based medical image processing still faces several challenges. Medical image data are scarce and time-consuming to annotate, which restricts the scale and accuracy of model training, and the diversity and complexity of the data compound these challenges. To address these issues, we introduce the Double Branch Convolutional Transformer (DBCvT), a hybrid CNN-Transformer feature extractor that better captures diverse fine-grained features while remaining suitable for small datasets. In this model, separable downsampling convolution (SDConv) is used to mitigate the excessive information loss that standard convolutions incur during downsampling. Additionally, we propose the Dual branch Channel Efficient multi-head Self-Attention (DCESA) mechanism to make self-attention more efficient and thereby improve network performance. Moreover, we introduce a novel convolutional channel-enhanced attention mechanism that strengthens inter-channel relationships within feature maps after self-attention. Experiments on various medical image datasets demonstrate the outstanding classification performance and generalization capability of DBCvT.