Unifying Classification and Bounding Box Regression Head For Object Detection

Cunzhang Gao, Haitao Gu, Xingzhen Li, Siquan Yu

doi:10.1088/1742-6596/2216/1/012106

Abstract

Object detection usually consists of two parts: object classification and localization. Popular object detectors currently use two detection heads: one predicts the classification score, and the other predicts the bounding box (bbox). In this paper, we first stack the classification head after the feature-extraction convolutional layers of the bbox regression head. We then build the classification network on the bounding box feature. The bounding box feature is particularly useful when the classification head uses soft Intersection over Union (IoU) labels. In our experiments, training only on the PASCAL VOC 2007 dataset, soft Centerness labels and soft IoU labels reach 50.06 mAP and 52.08 mAP, respectively, on the VOC 2007 test set. These are improvements of 1.08% and 1.12%, respectively, over FCOS. Trained on the PASCAL VOC 2007 and 2012 datasets, our Union A*B head reaches 78.71 mAP after 12 epochs of training with ResNet-101 as the backbone and FPN as the neck. The experimental results indicate that the proposed algorithm outperforms the detection methods it is compared against.
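The soft labels mentioned above can be illustrated with a minimal sketch. It assumes boxes in (x1, y1, x2, y2) format and uses the centerness definition from FCOS. The function names `iou` and `centerness` are hypothetical and are not taken from the authors' code. In this sketch the soft target for a positive sample is a value in [0, 1] rather than a hard label of 1.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def centerness(left, top, right, bottom):
    """FCOS-style centerness from a location's distances to the box sides."""
    if max(left, right) <= 0 or max(top, bottom) <= 0:
        return 0.0
    ratio_lr = min(left, right) / max(left, right)
    ratio_tb = min(top, bottom) / max(top, bottom)
    return math.sqrt(ratio_lr * ratio_tb)


# Soft IoU label: overlap between a predicted box and its ground-truth box.
soft_iou_label = iou((0, 0, 2, 2), (1, 1, 3, 3))
# Soft centerness label: 1.0 at the exact center of the ground-truth box.
soft_centerness_label = centerness(1, 1, 1, 1)
```

How these targets enter the classification loss is not specified in the abstract, so that step is left out of the sketch.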

Full Text