Abstract

In deep learning, most neural networks used for visual object detection in recent years are based on bounding-box regression. Active detectors, which localize objects through multi-step decision-making, are limited in performance by their simplistic model design. From the perspective of cognitive science, however, recognition in the human visual system is a coarse-to-fine decision process. Following the principle of “see the forest first, then the trees”, this paper proposes AHDet, a dynamic coarse-to-fine gaze strategy for active object detection that uses key points as the carrier of the coarse-to-fine concept. Detection is divided into two steps, AIM and HIT. In the AIM step, which corresponds to a first glance, center points are detected to locate objects and provide prior bounding boxes. In the HIT step, which corresponds to careful observation, the prior boxes are dynamically adjusted with the help of corner points to obtain compact bounding boxes. With this two-step coarse-to-fine gaze process, AHDet outperforms traditional approaches, as a series of experiments on the MS-COCO and PASCAL VOC datasets demonstrates.
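To make the two-step idea concrete, the following is a minimal sketch of a generic coarse-to-fine key-point decoder. It is not the AHDet implementation: the heatmap inputs, the functions `aim_centers` and `hit_refine`, the 3x3 peak test, the per-pixel size map, and the "snap each corner to the strongest nearby corner response" rule are all hypothetical choices made only for illustration. AIM turns center-heatmap peaks into prior boxes, and HIT tightens each prior using top-left and bottom-right corner heatmaps.

```python
import numpy as np


def aim_centers(center_heatmap, size_map, threshold=0.5):
    """Hypothetical AIM (coarse) step: center peaks -> prior boxes.

    center_heatmap: (H, W) float scores; size_map: (H, W, 2) assumed (width, height)
    predictions per pixel. Returns a list of (x1, y1, x2, y2, score) tuples.
    """
    H, W = center_heatmap.shape
    padded = np.pad(center_heatmap, 1, mode="constant", constant_values=-np.inf)
    # A pixel is a peak if it equals the max of its 3x3 neighborhood and clears the threshold.
    neighborhood = np.stack([padded[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)])
    is_peak = (center_heatmap == neighborhood.max(axis=0)) & (center_heatmap >= threshold)
    boxes = []
    for y, x in zip(*np.nonzero(is_peak)):  # nonzero returns (rows, cols) = (y, x)
        w, h = size_map[y, x]
        boxes.append((float(x - w / 2), float(y - h / 2),
                      float(x + w / 2), float(y + h / 2), float(center_heatmap[y, x])))
    return boxes


def hit_refine(box, tl_heatmap, br_heatmap, radius=2, min_score=0.1):
    """Hypothetical HIT (fine) step: snap prior-box corners to nearby corner peaks."""
    x1, y1, x2, y2, score = box

    def snap(heatmap, x, y):
        H, W = heatmap.shape
        cx, cy = int(round(x)), int(round(y))
        ys = slice(max(cy - radius, 0), min(cy + radius + 1, H))
        xs = slice(max(cx - radius, 0), min(cx + radius + 1, W))
        window = heatmap[ys, xs]
        # Keep the coarse corner when there is no confident corner response nearby.
        if window.size == 0 or window.max() < min_score:
            return float(x), float(y)
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        return float(xs.start + dx), float(ys.start + dy)

    nx1, ny1 = snap(tl_heatmap, x1, y1)
    nx2, ny2 = snap(br_heatmap, x2, y2)
    if nx2 <= nx1 or ny2 <= ny1:
        return box  # fall back to the prior if refinement yields a degenerate box
    return (nx1, ny1, nx2, ny2, score)


if __name__ == "__main__":
    center = np.zeros((20, 20)); center[10, 10] = 0.9
    sizes = np.zeros((20, 20, 2)); sizes[10, 10] = (8, 6)
    tl = np.zeros((20, 20)); tl[8, 7] = 1.0
    br = np.zeros((20, 20)); br[12, 13] = 1.0
    for prior in aim_centers(center, sizes):
        print("prior:", prior, "refined:", hit_refine(prior, tl, br))
```

In this toy example the single center peak at (10, 10) yields the loose prior (6, 7, 14, 13), and the corner step tightens it to (7, 8, 13, 12). Searching only within a small radius of each prior corner is one simple way to let the coarse glance constrain the fine observation; the actual AHDet adjustment mechanism may differ.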
