Abstract

Fine-grained image recognition is a challenging computer vision task because of its subtle inter-class variance and large intra-class variance, and it has been drawing increasing attention. Although manual annotation can effectively improve performance on this task, it is extremely time-consuming and expensive. Recently, Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in image classification. We propose a fine-grained image recognition framework that uses a CNN as the raw feature extractor, together with several effective methods: a feature encoding method, a feature weighting method, and a strategy for better incorporating information from multi-scale images to further improve recognition ability. In addition, we investigate two dimension reduction methods and successfully integrate them into our framework to compact the final image representation. With this discriminative and compact framework, we achieve state-of-the-art classification accuracy on several fine-grained image recognition benchmarks under weak supervision.
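Because the framework is built on FV-CNN, the feature encoding step can be illustrated with a Fisher Vector (FV). The following is a minimal NumPy sketch, assuming the local descriptors are the D-dimensional activation vectors at each spatial position of a CNN convolutional layer and that a diagonal-covariance GMM with K components has already been fitted. Here the GMM parameters are random placeholders. The function name `fisher_vector` and the signed square-root plus L2 normalization (the common "improved FV" recipe) are assumptions; the layer, GMM size, and normalization the paper actually uses are not stated in this summary.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Improved Fisher Vector of local descriptors X (N x D) under a diagonal GMM.
    weights: (K,) mixture weights, means: (K, D), sigmas: (K, D) standard deviations."""
    N, D = X.shape
    # Normalized differences between every descriptor and every component: N x K x D.
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for each descriptor/component pair.
    log_p = (-0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2 * np.pi)
             + np.log(weights)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)      # soft assignments (posteriors), N x K
    # First- and second-order gradient statistics, each K x D.
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (N * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])   # length 2*K*D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                 # power (signed square-root) normalization
    return fv / max(np.linalg.norm(fv), 1e-12)             # L2 normalization

# Toy usage: 100 descriptors of dimension 8, a 4-component GMM with placeholder parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
K = 4
fv = fisher_vector(X, np.full(K, 1.0 / K), rng.normal(size=(K, 8)), np.ones((K, 8)))
print(fv.shape)  # (64,), i.e. 2 * K * D
```

In practice the GMM would be fitted on training descriptors (for example with EM), and the same GMM would be shared by all images so that their FVs are comparable.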

Highlights

  • As a popular topic in computer vision, fine-grained image recognition has been attracting increasing attention from both academia and industry in the past few years

  • We propose a fine-grained image recognition framework based on FV-CNN (Fisher Vectors computed on Convolutional Neural Network features) and equip it with a novel strategy for utilizing multi-scale information

  • Compact FV-CNN (Section 5): we propose concatenating Fisher Vectors and VLAD to form VLAD-FV, which inevitably increases the dimension of the image representation
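The dimension cost of VLAD-FV can be made concrete with a small sketch. It assumes hard-assignment VLAD over K cluster centers and plain concatenation with a Fisher Vector of the same K and D; the helpers `vlad` and `vlad_fv` are hypothetical, and D = 512, K = 64 are illustrative values (typical of VGG-style convolutional layers), not the paper's settings. VLAD contributes K·D dimensions and FV contributes 2·K·D, so VLAD-FV has 3·K·D.

```python
import numpy as np

def vlad(X, centers):
    """VLAD of local descriptors X (N x D) given K cluster centers (K x D)."""
    K, D = centers.shape
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # N x K squared distances
    assign = d2.argmin(axis=1)                                     # hard assignment to nearest center
    v = np.zeros((K, D))
    for k in range(K):
        members = X[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)              # sum of residuals per center
    v = v.ravel()                                                  # length K*D
    v = np.sign(v) * np.sqrt(np.abs(v))                            # power normalization
    return v / max(np.linalg.norm(v), 1e-12)                       # L2 normalization

def vlad_fv(vlad_vec, fv_vec):
    """Concatenate a VLAD (K*D) and an FV (2*K*D) into one 3*K*D vector."""
    return np.concatenate([vlad_vec, fv_vec])

# Dimension arithmetic with illustrative sizes.
D, K = 512, 64
print("VLAD:", K * D, "FV:", 2 * K * D, "VLAD-FV:", 3 * K * D)
# prints: VLAD: 32768 FV: 65536 VLAD-FV: 98304

# Toy example: 50 descriptors of dimension 8, 4 centers.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
v = vlad(X, rng.normal(size=(4, 8)))
fv_stand_in = rng.normal(size=2 * 4 * 8)   # placeholder standing in for a real Fisher Vector
rep = vlad_fv(v, fv_stand_in)              # shape (96,)
```

Whether the two parts are re-weighted or re-normalized after concatenation is not specified in this summary; the sketch simply stacks them.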


Summary

Introduction

As a popular topic in computer vision, fine-grained image recognition has been attracting increasing attention from both academia and industry in the past few years. To cope with the small inter-class variance and large intra-class variance, object bounding boxes or part annotations are often used to eliminate background noise and to highlight discriminative parts [1,2,3,4,5,6], such as the whole body, head, or torso of a bird. Such manual labeling is extremely time-consuming and expensive. Without this expensive manual label information, a powerful image representation becomes the key factor for fine-grained visual recognition. We propose a fine-grained image recognition framework based on FV-CNN and equip it with a novel strategy for utilizing multi-scale information. We also investigate two dimension reduction methods, Tensor Sketch approximation [9] and Mutual Information dimension selection [10], to compact FV-CNN, which is rather high-dimensional and therefore cannot easily be applied to large-scale tasks.
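The two dimension reduction methods can be sketched under stated assumptions. Tensor Sketch [9] is shown in its standard degree-2 form: two count sketches combined by FFT-based circular convolution, so that the inner product of two sketches equals the squared inner product of the inputs in expectation. Mutual Information selection [10] is shown as one plausible variant: binarize each dimension by its sign and keep the k dimensions whose bits share the most mutual information with the class label. Both helpers (`make_tensor_sketch`, `mi_select`) are hypothetical, and how the paper plugs each method into FV-CNN is not described in this summary.

```python
import numpy as np

def make_tensor_sketch(D, d, rng):
    """Degree-2 Tensor Sketch R^D -> R^d with E[<TS(x), TS(y)>] = <x, y>^2."""
    h1, h2 = rng.integers(0, d, size=D), rng.integers(0, d, size=D)            # hash buckets
    s1, s2 = rng.choice([-1.0, 1.0], size=D), rng.choice([-1.0, 1.0], size=D)  # random signs

    def count_sketch(x, h, s):
        c = np.zeros(d)
        np.add.at(c, h, s * x)     # add each signed entry into its bucket
        return c

    def sketch(x):
        # Circular convolution of the two count sketches, computed via FFT.
        f1 = np.fft.fft(count_sketch(x, h1, s1))
        f2 = np.fft.fft(count_sketch(x, h2, s2))
        return np.real(np.fft.ifft(f1 * f2))

    return sketch

def mi_select(F, y, k):
    """Indices of the k columns of F (N x P) whose sign-binarized values
    carry the most mutual information about the labels y (N,)."""
    B = F > 0
    mi = np.zeros(F.shape[1])
    for b in (False, True):
        pb = (B == b).mean(axis=0)                             # P(bit = b) per dimension
        for c in np.unique(y):
            pc = np.mean(y == c)                               # P(label = c)
            pbc = ((B == b) & (y == c)[:, None]).mean(axis=0)  # joint P(bit = b, label = c)
            m = pbc > 0
            mi[m] += pbc[m] * np.log(pbc[m] / (pb[m] * pc))
    return np.argsort(mi)[::-1][:k]

rng = np.random.default_rng(0)
sketch = make_tensor_sketch(D=64, d=256, rng=rng)
z = sketch(rng.normal(size=64))        # 256-d sketch of the 4096-d outer product x ⊗ x

y = rng.integers(0, 2, size=200)
F = rng.normal(size=(200, 10))
F[:, 3] = np.where(y == 1, 1.0, -1.0)  # column 3 is perfectly informative about y
keep = mi_select(F, y, k=3)            # column 3 should rank first
```

The sketch trades accuracy for size: d controls the variance of the approximation, whereas selection keeps a subset of the original dimensions unchanged.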
