Abstract

Extracting discriminative fine-grained features is essential for fine-grained image recognition. Many approaches rely on expensive human annotations to learn discriminative part models, which is often impractical in real-world applications. Recently, bilinear pooling has been widely adopted and has proven effective because it learns discriminative regions automatically. However, most bilinear pooling models still use all convolutional part/region features for recognition, including noisy or even harmful feature elements. In this paper, we devise a novel fine-grained image classification approach based on the Hierarchical Bilinear Pooling with Aggregated Slack Mask (HBPASM) model. The proposed model generates an RoI-aware image feature representation for better performance. We conduct experiments on three frequently used fine-grained image classification datasets. The experimental results demonstrate that HBPASM achieves performance competitive with, or even matching, state-of-the-art methods on CUB-200-2011, Stanford Cars, and FGVC-Aircraft.

Highlights

  • Owing to the development of deep learning, considerable progress has been made in many computer vision tasks

  • We develop a hierarchical bilinear pooling with aggregated slack mask model for fine-grained image recognition

  • We develop a novel Hierarchical Bilinear Pooling with Aggregated Slack Mask (HBPASM) model for fine-grained classification to generate a better RoI-aware image representation

Summary

INTRODUCTION

Owing to the development of deep learning, considerable progress has been made in many computer vision tasks. Many efforts have been made to design part-based models that localize object parts as the distinctive regions [7]–[12]. These models are obtained by analyzing the convolutional activations of a neural network in an unsupervised manner or by discriminatively training part detectors with supervised bounding-box/part annotations. Others use unsupervised learning schemes to locate informative regions without additional annotations [4], [11], [21]–[27]. They concentrate on learning attention models to detect objects or local regions and use the features extracted within the RoIs for improved classification. However, these models are too heuristic and risk discarding some important foreground regions. To address this issue, we propose a novel aggregated slack mask model to extract robust RoI-aware features for interaction. The aggregated mask yields a more reliable image mask by taking full advantage of multiple masks learned on different layers.
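
To make the idea of masking features before bilinear interaction more concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions that this summary does not specify: the thresholding rule and the `slack` parameter in `slack_mask`, the use of intersection in `aggregate_masks`, and the function names themselves. For simplicity it also uses plain outer-product bilinear pooling rather than any factorized variant, and it assumes all layers share the same spatial size.

```python
import numpy as np

def slack_mask(feat, slack=0.5):
    """Binary spatial mask for one layer's feature map of shape (C, H, W).

    Assumption: a location is kept when its channel-averaged activation
    exceeds `slack` times the spatial mean; slack < 1 loosens the mask.
    """
    energy = feat.mean(axis=0)  # (H, W) channel-averaged activation
    return (energy > slack * energy.mean()).astype(feat.dtype)

def aggregate_masks(masks):
    """Combine per-layer masks. Assumption: intersection (elementwise product)."""
    out = masks[0]
    for m in masks[1:]:
        out = out * m
    return out

def bilinear_pool(x, y):
    """Outer-product pooling of two (C, H, W) maps into a normalized vector."""
    c1, h, w = x.shape
    c2 = y.shape[0]
    z = x.reshape(c1, h * w) @ y.reshape(c2, h * w).T / (h * w)  # (c1, c2)
    z = z.ravel()
    z = np.sign(z) * np.sqrt(np.abs(z))      # signed square root
    return z / (np.linalg.norm(z) + 1e-12)   # L2 normalization

def masked_hierarchical_pooling(feats, slack=0.5):
    """Mask every layer with the aggregated mask, then pool each layer pair."""
    mask = aggregate_masks([slack_mask(f, slack) for f in feats])
    masked = [f * mask for f in feats]       # mask broadcasts over channels
    pairs = [(i, j) for i in range(len(feats)) for j in range(i + 1, len(feats))]
    return np.concatenate([bilinear_pool(masked[i], masked[j]) for i, j in pairs])
```

With three feature maps of shape (C, H, W), `masked_hierarchical_pooling` returns one vector of length 3·C², the concatenation of the three pairwise cross-layer interactions computed only over locations that every layer's mask retains.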

RELATED WORK
PRELIMINARY
HBP MODEL
HIERARCHICAL BILINEAR POOLING WITH AGGREGATED SLACK MASK MODEL
EXPERIMENTS
CONCLUSION