Improving the accuracy and speed of small-object and multi-object detection is an active research problem in traffic environments. Despite substantial advances in object detection algorithms based on deep neural networks, the inaccuracy and low efficiency of small-object and multi-object detection remain challenging. In this paper, we propose a bidirectional attention network called BANet, which includes multichannel attention (MCA) blocks, an alpha-effective intersection-over-union (α-EIoU) loss, and a multiple attention fusion (MAF) module. Each MCA block combines low-layer, medium-layer, and high-layer features to provide rich base information for feature fusion in the neck module. We introduce MAF to alleviate the loss of spatial location information and the poor semantic performance that result from the continuous downsampling in the path aggregation feature pyramid network (PAFPNet). Finally, α-EIoU serves as our regression loss, which measures the difference between the predicted box and the ground-truth (gt) box. Our study further demonstrates that these strategies yield significant performance improvements over some existing YOLO detectors. Compared with YOLOX, BANet improves mAP@0.5 by 0.39%–0.52% on the PASCAL VOC 2007 (VOC 07) dataset and by 0.55%–2.93% on the PASCAL VOC 2012 (VOC 12) dataset. It also improves mAP@0.5 by 0.3%–1.01% on the MS COCO 2017 (COCO 17) dataset, indicating that BANet has a significant effect on multi-object detection. Experiments with a parameter count approximately equal to that of YOLOX show that our strategy not only increases speed by 7.5 frames per second (FPS) but also reduces the average forward time by 0.97 ms.
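To illustrate the kind of box-regression loss referred to above, the following is a minimal sketch of an α-EIoU-style loss for a single pair of axis-aligned boxes. It assumes the usual EIoU decomposition (an IoU term, a normalized center-distance term, and normalized width and height terms) with each term raised to a power α; the exact formulation used in BANet is not given in the abstract and may differ. The function name `alpha_eiou_loss`, the `(x1, y1, x2, y2)` box format, and the default `alpha=3.0` are assumptions made for illustration only.

```python
def alpha_eiou_loss(pred, gt, alpha=3.0, eps=1e-9):
    """Illustrative alpha-EIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Assumed form (not taken from the paper):
        1 - IoU^a + (rho^2 / c^2)^a + (dw^2 / cw^2)^a + (dh^2 / ch^2)^a
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection area (zero when the boxes do not overlap).
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih

    # Union area and IoU; eps guards against division by zero.
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Smallest box enclosing both boxes: width, height, squared diagonal.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # Normalized penalty terms for center offset, width gap, and height gap.
    dist_term = rho2 / c2
    w_term = (pw - gw) ** 2 / (cw ** 2 + eps)
    h_term = (ph - gh) ** 2 / (ch ** 2 + eps)

    return (1.0 - iou ** alpha
            + dist_term ** alpha
            + w_term ** alpha
            + h_term ** alpha)
```

Under these assumptions, setting `alpha=1.0` reduces the expression to the plain EIoU loss, and identical boxes give a loss of approximately zero. For non-overlapping boxes the IoU term contributes nothing, but the center-distance and size terms still vary with the prediction, which is the usual motivation for adding such penalties to an IoU-based loss.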