Abstract

Objects often appear differently because of viewpoint changes or part deformation, and modeling these variations reasonably remains a major challenge for object detection. In this paper, we propose a novel Deformable Template Network (DTN), which exploits the pictorial structure model to capture possible variations of an object. DTN represents an object through a generated template in a deformable way. It has two key modules: the template generating module and the part matching module. The template generating module produces a template for a given object, which defines the anchor positions of its <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$k{\times }k$</tex-math></inline-formula> parts. Based on this template, the part matching module performs part alignment around the anchor positions. For each part, the matching process trades off maximizing the detection score against minimizing the deformation cost relative to the anchor position. Moreover, DTN is a fully convolutional network, which makes it competitive in terms of detection efficiency. We evaluate DTN on the PASCAL VOC and MSCOCO datasets, achieving state-of-the-art results: an accuracy of 82.7% on PASCAL VOC and 44.9% on MSCOCO.
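The score/deformation trade-off described for the part matching module can be illustrated with a minimal sketch. This is not the paper's implementation: the quadratic deformation cost, the weight `lam`, and the function name `match_part` are illustrative assumptions; the paper does not specify its exact cost function in the abstract.

```python
import numpy as np

def match_part(score_map, anchor, lam=0.1):
    """Illustrative sketch of deformable part matching (assumed form,
    not the paper's exact formulation): choose the position that
    maximizes detection score minus a quadratic deformation cost
    relative to the anchor position."""
    h, w = score_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumed quadratic deformation cost, weighted by lam.
    deform = lam * ((ys - anchor[0]) ** 2 + (xs - anchor[1]) ** 2)
    combined = score_map - deform
    pos = np.unravel_index(np.argmax(combined), combined.shape)
    return pos, combined[pos]
```

A larger `lam` penalizes displacement from the anchor more heavily, so the matched position stays closer to the template's anchor; a smaller `lam` lets a high detection score farther away win.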
