Abstract

Mid-level element based representations have proven very effective for visual recognition. This paper presents a method for discovering discriminative mid-level visual elements based on deep Convolutional Neural Networks (CNNs). We present a part-level CNN architecture, the Part-based CNN (P-CNN), which serves as the encoding module in a part-based representation model. The P-CNN can be attached to an arbitrary layer of a pre-trained CNN and trained using image-level labels. The training of P-CNN essentially corresponds to the optimization and selection of discriminative mid-level visual elements. For an input image, the output of P-CNN is the part-based coding, which can be used directly for image recognition. By applying P-CNN to multiple layers of a pre-trained CNN, more diverse visual elements can be obtained for visual recognition. We validate the proposed P-CNN on several visual recognition tasks, including scene categorization, action classification, and multi-label object recognition. Extensive experiments demonstrate the competitive performance of P-CNN in comparison with state-of-the-art methods.
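
To make the notion of a part-based coding concrete, the following is a minimal sketch of one common way such an encoding can be computed: each part detector is scored at every spatial location of a feature map, and the strongest response per part is kept. This is an illustration under stated assumptions, not the P-CNN architecture itself, whose details the abstract does not give. The names `part_encoding`, `fmap`, and `filters`, the use of 1x1 linear part detectors, and global max pooling are all hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional activations


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def part_encoding(feature_map: FeatureMap, part_filters: List[Vector]) -> Vector:
    """Return one score per part: its strongest response over all locations.

    Assumption: each part detector is a single C-dimensional weight vector
    (a 1x1 filter), and responses are aggregated by global max pooling.
    """
    encoding = []
    for weights in part_filters:
        responses = [dot(weights, cell) for row in feature_map for cell in row]
        encoding.append(max(responses))
    return encoding


# Toy 2x2 feature map with 3 channels (hypothetical values standing in for
# the activations of some layer of a pre-trained CNN).
fmap = [
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
    [[0.0, 0.0, 3.0], [1.0, 1.0, 1.0]],
]
# Two hypothetical part detectors.
filters = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

code = part_encoding(fmap, filters)
print(code)  # one entry per part detector
```

In this sketch the resulting vector has one entry per part detector and could be passed to any image-level classifier; applying the same procedure to feature maps from several layers and concatenating the results would correspond, loosely, to the multi-layer setting the abstract mentions.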
