Abstract

Two-stage object detectors have achieved great success in recent years. However, recent work mostly focuses on optimizing loss functions or learning multi-level feature representations, while introducing an additional homogeneous task to improve detection remains under-explored. In this paper, we propose a novel framework named IdentifyNet, which incorporates an additional identification task to enhance the feature learning of region proposals. Specifically, besides classification and bounding box regression, IdentifyNet also learns to predict whether two different region proposals belong to the same object. This forces the network to learn more informative and representative features for different proposals, especially for proposals of objects from the same class. Moreover, current detectors apply greedy non-maximum suppression (NMS) to remove duplicate boxes whenever their Intersection-over-Union (IoU) exceeds a preset threshold. This fails when two boxes overlap heavily but belong to two different objects of the same class. To overcome this, we further propose a novel decode NMS algorithm that takes advantage of the identity information predicted for different proposals by the identification task. Extensive experiments on PASCAL VOC 2007 and VOC 2012 demonstrate that the proposed method can greatly improve detection performance.
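
The identification task needs a training label for every pair of proposals ("same object" or not). The abstract does not say how these labels are built, so the sketch below is only an illustration under stated assumptions: each proposal is matched to the ground-truth box it overlaps most, counts as foreground when that IoU is at least a hypothetical `fg_thresh` of 0.5, and a pair is labeled 1 when both proposals are foreground and matched to the same ground-truth object. The helpers `assign_gt` and `identity_targets` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) with x2 > x1 and y2 > y1
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_gt(proposals: Sequence[Box], gts: Sequence[Box], fg_thresh: float = 0.5) -> List[int]:
    """Index of the best-overlapping ground-truth box per proposal, or -1 (background)."""
    assigned = []
    for p in proposals:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        assigned.append(best_j if best_iou >= fg_thresh else -1)
    return assigned


def identity_targets(assigned: Sequence[int]) -> List[List[int]]:
    """Pairwise label: 1 if both proposals are foreground and match the same object."""
    n = len(assigned)
    return [
        [int(assigned[i] >= 0 and assigned[i] == assigned[j]) for j in range(n)]
        for i in range(n)
    ]


# Two overlapping objects of the same class, plus one background proposal.
gts = [(0, 0, 10, 10), (3, 0, 13, 10)]
proposals = [(0, 0, 10, 10), (0, 0, 9, 10), (3, 0, 13, 10), (50, 50, 60, 60)]
assigned = assign_gt(proposals, gts)  # [0, 0, 1, -1]
targets = identity_targets(assigned)  # row 0 is [1, 1, 0, 0]
```

Proposals 0 and 2 overlap with an IoU of about 0.54, yet they receive the label 0 because they cover different objects; this is exactly the distinction that IoU alone cannot make.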

Highlights

  • Object detection is at the core of computer vision; it aims to locate objects of certain categories in images.

  • Key improvements in object detection have come from introducing additional tasks [2], [3] that realize or improve traditional algorithms with convolutional networks. This raises two questions: are there still traditional algorithms in object detection that can be replaced by convolutional networks without additional annotations? And how can a traditional algorithm be converted into a task for a convolutional network?

  • We present a method that replaces the last remaining hand-crafted algorithm in the pipeline, greedy non-maximum suppression (NMS), with decode NMS based on the identification task, and thereby implement a fully end-to-end, deep-learning-based two-stage object detector called IdentifyNet.
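
The highlights describe replacing greedy NMS with decode NMS driven by the identification task. This page does not spell out how decode NMS decodes the identity predictions, so the following is only a simplified illustration of the general idea, not the authors' algorithm: a greedy NMS loop in which an overlapping box is suppressed only when a pairwise same-object probability also says it is the same object. The function name `identity_aware_nms`, the matrix input `same_prob` and both thresholds are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) with x2 > x1 and y2 > y1
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def identity_aware_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    same_prob: Sequence[Sequence[float]],
    iou_thresh: float = 0.5,
    same_thresh: float = 0.5,
) -> List[int]:
    """Greedy NMS that suppresses a box only if it is judged the SAME object as a kept box.

    same_prob[i][j] is a hypothetical predicted probability that boxes i and j
    belong to the same object (e.g. the output of an identification head).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A duplicate needs BOTH large spatial overlap and a "same object" prediction.
        duplicate = any(
            iou(boxes[i], boxes[k]) > iou_thresh and same_prob[i][k] > same_thresh
            for k in keep
        )
        if not duplicate:
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 9, 10), (3, 0, 13, 10)]
scores = [0.9, 0.85, 0.8]
# Boxes 0 and 1 cover one object; box 2 covers a different, overlapping object.
same_prob = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(identity_aware_nms(boxes, scores, same_prob))  # [0, 2]
```

Box 1 is removed as a duplicate of box 0, while box 2 survives even though its IoU with box 0 is about 0.54. With `same_prob` set to all ones, the function reduces to ordinary greedy NMS and keeps only box 0.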


Summary

INTRODUCTION

Object detection is at the core of computer vision; it aims to locate objects of certain categories in images. A detection is considered to contain an object if the IoU between the detected bounding box and the ground-truth box exceeds a given threshold. Under this definition, object detectors generate a set of boxes for every object at test time and eliminate the repeated boxes of the same object with greedy NMS. The principle of traditional NMS is to eliminate repeated boxes through the spatial relations between different boxes. This makes it inhomogeneous with object detection itself, which locates objects of certain categories based on the feature map of the image. Faster R-CNN [2] proposes ConvNet-based region proposals to replace classic region proposal methods; it trains the two-stage detector end to end and improves both the speed and the performance of object detection. Cascade R-CNN [27] gradually refines the detected boxes with multiple regression networks.
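
The IoU criterion and the purely spatial logic of greedy NMS described above can be sketched in a few lines. This is a generic textbook version, not code from the paper; the box format `(x1, y1, x2, y2)` and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.5) -> List[int]:
    """Classic greedy NMS: return kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Drop box i if any higher-scoring kept box overlaps it above the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# A detection counts as correct if its IoU with the ground truth exceeds the threshold.
print(iou((0, 0, 9, 10), (0, 0, 10, 10)) > 0.5)  # True (IoU = 0.9)

# Two distinct same-class objects whose boxes overlap with IoU = 70/130, about 0.54:
boxes = [(0, 0, 10, 10), (3, 0, 13, 10)]
scores = [0.9, 0.8]
print(greedy_nms(boxes, scores))  # [0]: the second object is wrongly suppressed
```

Because the suppression decision looks only at box geometry, greedy NMS cannot tell a duplicate of one object from a neighbouring object of the same class, which is the failure case the abstract points out.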

NON-MAXIMUM SUPPRESSION
METHODS
FEATURE FUSION
IDENTIFYNET
LOSS FUNCTION
EXPERIMENTS
Findings
CONCLUSION