Accurate single-shot object detection is extremely challenging in real environments because of complex scenes, occlusion, ambiguity, blur, and shadow; these factors are collectively referred to as the uncertainty problem. It leads to unreliable bounding box annotations and makes bounding box localization difficult for detectors to learn. Previous methods treated the ground truth box coordinates as a rigid distribution, omitting the localization uncertainty present in real datasets. This article proposes a novel bounding box encoding algorithm integrated into a single-shot detector (BBENet) that models a flexible distribution of bounding box locations. First, discretized ground truth labels are generated by decomposing each object boundary into multiple boundaries; this representation of ground truth boxes is flexible enough to cover arbitrary cases in complex scenes. During training, the detector directly learns the discretized box locations rather than regressing in the continuous domain. Second, the bounding box encoding algorithm reorganizes the bounding box predictions to make them more accurate. A further problem in existing methods is inconsistency in estimating detection quality: single-shot detection consists of classification and localization tasks, yet popular detectors use the classification score alone as the final detection quality. The resulting score lacks localization quality and hinders overall performance, since the two tasks are positively correlated. To overcome this, BBENet defines detection quality by combining localization and classification quality to rank detections during non-maximum suppression. The localization quality is computed from how uncertain the predicted boxes are, which is a new perspective in the detection literature. BBENet is evaluated on three benchmark datasets: MS-COCO, Pascal VOC, and CrowdHuman. Without bells and whistles, BBENet outperforms existing methods by a large margin at comparable speed, achieving state-of-the-art single-shot detection.
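To make the two core ideas of the abstract concrete, here is a minimal NumPy sketch, not the paper's implementation: a box side is predicted as a distribution over `REG_MAX + 1` discrete bins, the continuous coordinate is recovered as the expectation of that distribution, and a localization quality derived from the distribution's peakedness is combined with the classification score to rank detections. The bin count, the peak-probability proxy for uncertainty, and the geometric-mean combination are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of discretized box regression and quality-aware ranking.
# REG_MAX, the peak-probability uncertainty proxy, and the geometric-mean
# score combination are assumptions for illustration only.
import numpy as np

REG_MAX = 16  # assumed number of discrete bins per box side


def decode_side(logits):
    """Turn per-bin logits for one box side into (coordinate, loc_quality).

    The continuous coordinate is the expectation of the discrete
    distribution; the peak probability serves as a simple proxy for how
    certain the predicted boundary is (sharper distribution -> higher quality).
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()  # softmax over the bins
    coord = float(np.dot(probs, np.arange(REG_MAX + 1)))  # expected bin index
    loc_quality = float(probs.max())
    return coord, loc_quality


def detection_score(cls_score, loc_quality):
    """Joint detection quality for NMS ranking (assumed geometric mean)."""
    return float(np.sqrt(cls_score * loc_quality))


# Toy usage: one predicted side with a fairly sharp distribution.
rng = np.random.default_rng(0)
logits = rng.normal(size=REG_MAX + 1)
logits[9] += 4.0  # pretend the model strongly favors bin 9
coord, loc_q = decode_side(logits)
print(f"decoded offset ~ {coord:.2f}, localization quality {loc_q:.2f}")
print(f"joint NMS score: {detection_score(0.8, loc_q):.2f}")
```

A sharp distribution yields a high localization quality and boosts the joint score, while a flat, uncertain distribution suppresses it, which is the behavior the abstract attributes to quality-aware ranking.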