Abstract

Fine-grained recognition is a challenging task due to small intra-category variances. Most of the top-performing fine-grained recognition methods leverage parts of objects for better performance. Therefore, part annotations which are extremely computationally expensive are required. In this paper, we propose a novel cascaded deep CNN detection framework for fine-grained recognition which is trained to detect a whole object without considering parts. Nevertheless, most of the current top-performing detection networks use N + 1 class (N object categories plus background) softmax loss. The background category with much more training samples dominates the feature learning progress where the features are not suitable for object categorisation with fewer samples. To address this issue, we here introduce two strategies: 1) We leverage a cascaded structure to eliminate the background. 2) We introduce a novel one-vs-rest loss function to capture more minute variances from different subordinate categories. Experiments show that our proposed recognition framework achieves comparable performance against the state-of-the-art, part-free, fine-grained recognition methods on the CUB-200-2011 Bird dataset. Meanwhile, our method outperforms most of the existing part annotation based methods and does not need part annotations at the training stage whilst being free from any annotations at the test stage.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.