Abstract

Previous research on deep convolutional neural networks has made significant progress toward improving the speed and accuracy of object detection. Despite these advances, however, accurately detecting small objects in multi-object scenes remains challenging in traffic environments. In this paper, we propose a new architecture, YOLOM, designed specifically to improve detection precision for small objects in multi-object scenes. YOLOM incorporates several innovations: a multi-spatial pyramid (MSP) module, an optimized focal loss (OFLoss) function, and an objectness loss based on the effective intersection over union (EIoU). Together, these features improve accuracy and reduce the miss rate for small objects, particularly in multi-object cases. We design the MSP module around receptive-field features pooled at different spatial scales. We adopt the optimized focal loss as the classification function in place of the cross-entropy loss, which alleviates the class imbalance that anchor-free detection encounters on disparate datasets. Because EIoU performs well in confidence scoring, we incorporate it into the objectness loss calculation, substituting an EIoU-based objectness loss for YOLOX's original objectness loss. Experimental results demonstrate that our strategies significantly outperform several end-to-end object detection methods.
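The abstract does not give the exact form of OFLoss; as a reference point, here is a minimal sketch of the standard binary focal loss it builds on, which down-weights well-classified (easy) examples relative to cross-entropy and thereby mitigates class imbalance. The `alpha` and `gamma` defaults are the common ones from the focal loss literature, not values stated by this paper:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class, in (0, 1).
    y: binary ground-truth label (0 or 1).
    alpha, gamma: common defaults from the focal loss literature;
    the paper's tuned OFLoss settings are not given in the abstract.
    """
    p = np.clip(p, eps, 1.0 - eps)             # avoid log(0)
    p_t = np.where(y == 1, p, 1.0 - p)         # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With `gamma = 2`, a confidently correct prediction (e.g. `p = 0.9`, `y = 1`) contributes orders of magnitude less loss than a hard one (`p = 0.1`, `y = 1`), which is what keeps abundant easy negatives from dominating training.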
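For the objectness term, the abstract says EIoU replaces YOLOX's objectness loss but does not show the computation. A minimal sketch of the standard EIoU loss for two axis-aligned boxes, which penalizes IoU, center distance, and width/height differences separately (box format and epsilon handling are illustrative assumptions):

```python
import numpy as np

def eiou_loss(box_a, box_b, eps=1e-7):
    """EIoU loss between two boxes in (x1, y1, x2, y2) format.

    L_EIoU = 1 - IoU
             + rho^2(centers) / c^2            # normalized center distance
             + (w_a - w_b)^2 / C_w^2           # normalized width gap
             + (h_a - h_b)^2 / C_h^2           # normalized height gap
    where c is the diagonal of the smallest enclosing box and
    C_w, C_h are its width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection-over-union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    # squared center distance
    rho2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    # width/height penalty terms
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    return (1 - iou + rho2 / c2
            + (wa - wb) ** 2 / (cw ** 2 + eps)
            + (ha - hb) ** 2 / (ch ** 2 + eps))
```

Identical boxes yield a loss near zero, while disjoint boxes are still penalized through the center-distance and aspect terms, which is why EIoU gives a more informative gradient than plain IoU for poorly localized (often small) objects.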
