Abstract

The key to fine-grained image recognition is learning discriminative features that capture subtle visual cues. The triplet objective function is well suited to this task because it captures the semantic similarity between images. However, triplet loss requires many triplets containing hard negative samples, and collecting them is costly. To alleviate this problem, we propose a new framework that generates features of hard negative samples. The proposed framework consists of three stages: learning part-wise features, enriching refined hard negative samples, and fine-grained image recognition. Our method achieves state-of-the-art performance on the CUB-200-2011, Stanford Cars, FGVC-Aircraft, and DeepFashion datasets. Extensive experiments further demonstrate that each stage contributes to the final recognition performance.
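
For readers unfamiliar with the triplet objective, the following is a minimal sketch of the standard margin-based triplet loss and of why hard negatives matter. It is not the framework proposed in this work: the function names (`euclidean`, `triplet_loss`, `hardest_negative`), the Euclidean distance, the margin value, and the toy 2-D feature vectors are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor (one common notion of 'hard')."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Toy features (illustrative values only, not from any dataset).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negatives = [[3.0, 4.0], [0.0, 1.5]]

easy_loss = triplet_loss(anchor, positive, negatives[0])
hard = hardest_negative(anchor, negatives)
hard_loss = triplet_loss(anchor, positive, hard)
print(easy_loss, hard_loss)
```

In this toy case the distant negative already satisfies the margin and yields zero loss, so it provides no training signal, while the nearby negative yields a positive loss. This is the sense in which training depends on a supply of hard negatives.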
