Mammography is an effective method for diagnosing breast diseases, and computer-aided detection (CAD) systems play an important role in the detection of breast masses. However, low contrast and interference from surrounding tissues make mass detection challenging. In this paper, an efficient RetinaNet-based network named ERetinaNet is proposed to improve both the accuracy and the inference speed of breast mass detection in mammograms. Efficient modules are designed and introduced into the network to facilitate the extraction of comprehensive features, while the network structure is simplified to improve inference speed. A Faster RepVGG (FRepVGG) architecture is first proposed as the backbone network; it uses three strategies: 1) The multi-branch structure used during training enhances learning, and it is converted into an equivalent single-path structure for inference through re-parameterization, which speeds up detection. 2) The Extraction operation is proposed to condense the features of intermediate layers. 3) An effective Multi-spectral Channel Attention (eMCA) module is added to the last layer of each stage, enabling the network to pay more attention to the target region. In addition, a Vision Transformer (ViT) is added to ERetinaNet, enabling it to learn global semantic information, and the detection head is simplified to make ERetinaNet more efficient. The experimental results show that, compared with the original RetinaNet, ERetinaNet improves the mean Average Precision (mAP) from 79.16% to 85.01% and significantly shortens the inference time. Moreover, the detection accuracy of ERetinaNet exceeds that of other widely used object detection networks, such as Faster R-CNN, SSD, YOLOv3 and YOLOv7.
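The re-parameterization idea in strategy 1 can be illustrated with a minimal sketch. This is not the authors' FRepVGG implementation: it assumes a single channel, stride 1, zero padding, and no batch normalization, and every function name here (`conv2d_same`, `multi_branch`, `fuse_kernels`) is hypothetical. It only shows why a 3x3 branch, a 1x1 branch, and an identity branch that are summed during training can be folded into one 3x3 kernel for inference.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output size = input size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # positions outside the image count as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form (assumed): 3x3 branch + 1x1 branch + identity branch, summed."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, [[k1]])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


def fuse_kernels(k3, k1):
    """Inference-time form: fold the 1x1 weight and the identity into the 3x3 centre tap."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1.0  # a 1x1 conv and the identity both act only on the centre pixel
    return fused


# Illustrative values only; they are not taken from the paper.
image = [[1.0, 2.0, 0.5, 1.0],
         [0.0, 1.5, 2.0, 0.5],
         [3.0, 0.25, 1.0, 2.0]]
k3 = [[0.25, -0.5, 0.0],
      [1.0, 0.5, -0.25],
      [0.0, 0.75, 0.5]]
k1 = 0.5

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, fuse_kernels(k3, k1))
max_diff = max(abs(t - f) for tr, fr in zip(train_out, infer_out) for t, f in zip(tr, fr))
assert max_diff < 1e-9  # both forms produce the same feature map
```

The single fused convolution replaces three branch computations and two additions, which is the source of the inference speed-up claimed for re-parameterized backbones. A full multi-channel version would also need to fold each branch's batch-normalization statistics into the kernel and bias before summing.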