Bounding box regression is a crucial step in most object detection algorithms and directly affects the localization accuracy and regression speed of convolutional neural networks (CNNs). The loss functions commonly used for bounding box regression suffer from two main disadvantages. First, the ℓn-norm loss does not match the evaluation metric, Intersection over Union (IOU), which leads to poor regression performance. Second, some recently proposed IOU-based loss functions benefit the IOU metric, but certain terms in these losses hinder bounding box regression, causing slow convergence and inaccurate results. To address these shortcomings, we propose a Manhattan-Distance IOU (MIOU) loss function. The Euclidean distance term in the Complete IOU (CIOU) loss and the Efficient IOU (EIOU) loss is unstable during training because of its huge gradient in the early stage of regression; MIOU adds a Manhattan distance term to alleviate this defect. In addition, the denominator of the Euclidean distance term in both losses works against loss reduction; treating it as a normalization coefficient that does not participate in backpropagation improves the convergence speed. The effectiveness of the proposed MIOU loss was verified with designed simulation experiments. Moreover, object detection is usually applied to natural scenes and remote sensing scenes, but the application of detection methods is often limited by the varied image characteristics of different scene settings. We incorporated the MIOU loss into YOLO v4 and other mainstream object detection networks to examine its effectiveness in remote sensing and natural object detection scenarios. 
Experimental results on the real remote sensing dataset DOTA and the natural-image dataset MS COCO demonstrate that the MIOU loss is robust in both remote sensing and natural object detection tasks. In summary, as a general regression loss function, the MIOU loss shows excellent performance in both types of scenes.
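
The mismatch between norm-based losses and the IOU metric noted above can be illustrated with a small sketch. It is not the MIOU loss itself, whose formula the abstract does not give; it only shows that two predicted boxes with the same L1 distance to a target can have different IOU values. The box format `(x1, y1, x2, y2)`, the helper names `iou` and `l1_distance`, and the example coordinates are assumptions chosen for illustration.

```python
def iou(a, b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """Sum of absolute coordinate differences (an l1-norm regression error)."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Hypothetical boxes, chosen only to make the arithmetic easy to follow.
target = (0.0, 0.0, 4.0, 4.0)
shifted = (1.0, 0.0, 5.0, 4.0)    # same size, moved one unit along x
stretched = (0.0, 0.0, 4.0, 6.0)  # same corner, two units taller

for name, box in (("shifted", shifted), ("stretched", stretched)):
    print(name, "L1 =", l1_distance(target, box), "IOU =", round(iou(target, box), 4))
```

Both predictions are at L1 distance 2 from the target, yet their IOU values differ, so minimizing a norm-based loss does not directly optimize the metric used for evaluation.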