Abstract

In this paper, we aim to devise a new framework that equips a network with the ability to detect objects using only image-level class labels as supervision. The challenge of such a weakly supervised setting mainly lies in making the network accurately understand both the semantics and the objectness of a given proposal without bounding box annotations. To this end, we contribute a concise and elegant framework, named Class Prototypical Network (CPNet). Concretely, our CPNet defines a set of learnable class prototypes to help classify object proposals. To make the prototypes not only discriminative across classes but also sensitive to proposals' objectness, we conduct both class-aware cross-attention and location-aware cross-attention between the feature embeddings of the learnable prototypes and the object proposals. The learned attention scores are then used to aggregate proposal-level category information into image-level predictions, allowing the entire framework to be trained without any bounding box annotations. Moreover, by applying these two kinds of attention mechanisms, knowledge of both the proposals' locations and their class information can be successfully transferred into the corresponding prototypes. With the help of the prototypes, our CPNet detects true positive object proposals accurately, as the learned prototypes provide a strong basis for inference. In addition, inspired by recent progress on vision transformers, CPNet further introduces a multi-head detection head to perform complementary training, preventing the model from fixating on locally discriminative parts and improving performance on challenging non-rigid categories. We evaluate our CPNet on popular benchmarks, <i>i.e.</i>, PASCAL VOC 2007, 2012 and MS COCO 2014. Extensive experiments show that CPNet is a simple and effective framework.
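The abstract's core mechanism is attention-weighted aggregation of proposal-level class evidence into an image-level prediction that image-level labels can supervise. The sketch below illustrates that idea only; it is not the authors' implementation, and the prototype/proposal shapes, the dot-product similarity, and the `image_level_scores` helper are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_level_scores(proposals, prototypes):
    """
    proposals : (N, D) feature embeddings of N object proposals
    prototypes: (C, D) learnable class prototype embeddings
    Returns a (C,) vector of image-level class scores in [0, 1].
    """
    # Class-aware cross-attention: each class prototype attends over
    # the proposals (softmax over the proposal axis).
    sim = prototypes @ proposals.T               # (C, N) similarity logits
    attn = softmax(sim, axis=1)                  # attention over proposals

    # Proposal-level class scores (softmax over the class axis).
    cls = softmax(proposals @ prototypes.T, axis=1)  # (N, C)

    # Aggregate proposal-level category information into an image-level
    # score per class, so image-level labels alone can supervise training.
    return np.einsum('cn,nc->c', attn, cls)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))    # 5 proposals, 16-d embeddings
protos = rng.normal(size=(3, 16))   # 3 class prototypes
scores = image_level_scores(feats, protos)
```

In a full model, `scores` would feed a multi-label classification loss against the image-level labels, and gradients would update both the proposal features and the prototypes.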
