Abstract

This paper introduces an explanatory graph representation to reveal object parts encoded inside the convolutional layers of a CNN. Given a pre-trained CNN, each filter in a conv-layer usually represents a mixture of object parts. We develop a simple yet effective method to learn an explanatory graph, which automatically disentangles object parts from each filter without any part annotations. Specifically, given the feature map of a filter, we mine neural activations from the feature map that correspond to different object parts. The explanatory graph organizes each mined part as a graph node. Each edge connects two nodes whose corresponding object parts usually co-activate and keep a stable spatial relationship. Experiments show that each graph node consistently represents the same object part across different images, which boosts the transferability of CNN features. We transferred part features in the explanatory graph to the task of part localization, and our method significantly outperformed other approaches.
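As a rough illustration of the mining step described above, the sketch below finds local activation peaks in one filter's feature map and treats them as candidate part locations. It is a minimal sketch in Python/NumPy: the function name, the thresholding rule, and the neighborhood size are our own assumptions, not the paper's implementation, and the paper's full learning procedure additionally exploits the spatial relationships between parts, which this sketch omits.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def mine_part_candidates(feature_map, threshold=0.5, neighborhood=3):
    """Find local activation peaks in one filter's feature map.

    Each peak is treated as a candidate object-part location. This is a
    simplified illustration only; threshold and neighborhood are
    hypothetical parameters, not values from the paper.
    """
    # A position is a local maximum if it holds the largest value
    # in its neighborhood.
    local_max = maximum_filter(feature_map, size=neighborhood) == feature_map
    # Keep only peaks whose activation exceeds a fraction of the
    # feature map's global maximum.
    strong = feature_map > threshold * feature_map.max()
    ys, xs = np.nonzero(local_max & strong)
    return list(zip(ys, xs))  # candidate (row, col) part positions
```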

Highlights

  • In this paper, we investigate the disentanglement of intermediate-layer feature representations of a CNN pre-trained for object classification

  • We propose a new metric to quantitatively evaluate whether a node consistently represents the same part in different images

  • It is assumed that if a graph node consistently represents the same object part, the distances between the inferred part and some ground-truth landmarks of the object should remain nearly constant across different images (a minimal sketch of this criterion follows the list)

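To make the consistency criterion in the last highlight concrete, the sketch below measures how much the normalized distance between a node's inferred part position and one ground-truth landmark varies across images. It is a minimal sketch under our own assumptions (positions stored as NumPy arrays, normalization by the object's diagonal length, standard deviation as the measure of change); it is not the paper's exact metric, which may aggregate over multiple landmarks and nodes.

```python
import numpy as np

def localization_instability(inferred_pos, landmark_pos, diag_lengths):
    """Deviation of the normalized part-to-landmark distance across images.

    inferred_pos, landmark_pos: arrays of shape (n_images, 2) holding the
    (row, col) positions of the node's inferred part and one ground-truth
    landmark in each image.
    diag_lengths: per-image diagonal length of the object, used to
    normalize away scale differences (an assumption of this sketch).

    A node that consistently represents the same part should yield a
    nearly constant normalized distance, hence a small deviation.
    """
    dists = np.linalg.norm(inferred_pos - landmark_pos, axis=1) / diag_lengths
    return float(np.std(dists))
```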

Summary

Introduction

In this paper, we investigate the disentanglement of intermediate-layer feature representations of a CNN pretrained for object classification. We notice that each filter in a CNN usually encodes a mixture of object-part and textural features. Given a CNN, we propose to learn an explanatory graph without any part annotations. The explanatory graph automatically reveals how object-part features are organized in the CNN: it (1) disentangles features of object parts from the mixed features in intermediate layers of the CNN, and (2) encodes which object parts are usually co-activated and keep a stable spatial relationship.
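The second point suggests a simple criterion for linking two part nodes by an edge: their relative position should stay nearly constant across the images in which both are activated. The sketch below illustrates such a check; it is our own simplified reading, with a hypothetical threshold, and is not the paper's actual graph-learning procedure.

```python
import numpy as np

def stable_edge(positions_a, positions_b, max_offset_std=0.05):
    """Decide whether two part nodes should be linked by an edge.

    positions_a, positions_b: arrays of shape (n_images, 2) holding the
    two parts' normalized positions in images where both are activated.
    An edge is kept when the displacement between the parts stays nearly
    constant, i.e. its standard deviation is small. The threshold is a
    hypothetical value for illustration only.
    """
    offsets = positions_b - positions_a            # per-image displacement
    spread = np.linalg.norm(offsets.std(axis=0))   # stability of the offset
    return spread < max_offset_std
```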

