Abstract

Fine-grained visual classification (FGVC) is a challenging task because it demands highly discriminative feature representations. Attention-based methods show great potential for FGVC, but they neglect that deeply mining inter-layer feature relations can further refine feature learning. Conversely, methods that associate cross-layer features achieve significant feature enhancement, but they lose the long-distance dependencies between elements. Most previous research treats these two approaches as independent of each other, overlooking that they are mutually correlated and can jointly reinforce feature learning. Thus, we combine the respective advantages of the two approaches to promote fine-grained feature representations. In this paper, we propose a novel network, CLNET, which effectively applies an attention mechanism and cross-layer features to obtain strong feature representations. Specifically, CLNET 1) adopts self-attention to capture long-range dependencies for each element, 2) associates cross-layer features to reinforce feature learning, and 3) integrates attention-based operations between output and input to cover more feature regions. Experiments verify that CLNET achieves new state-of-the-art performance on three widely used fine-grained benchmark datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. Our code is available at https://github.com/dlearing/CLNET.git.
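The two mechanisms the abstract names can be illustrated in isolation. The following is a minimal NumPy sketch, not CLNET's actual implementation: `self_attention` shows how scaled dot-product self-attention lets every element attend to every other (capturing long-range dependencies), and `fuse_cross_layer` is a hypothetical cross-layer association that combines a shallow feature map with attention-refined deep features. All function names and the additive fusion rule are illustrative assumptions.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a set of feature vectors.

    x: (n, d) array of n elements with d-dimensional features. Each
    output row is a weighted mix of ALL rows of x, so dependencies are
    captured regardless of spatial distance between elements.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise affinities (n, n)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # attention-refined features

def fuse_cross_layer(shallow, deep):
    """Hypothetical cross-layer association (illustrative only):
    refine the deep features with self-attention, then merge them
    element-wise with the shallow features of the same shape."""
    return shallow + self_attention(deep)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 8))   # features from an earlier layer
deep = rng.standard_normal((4, 8))      # features from a later layer
fused = fuse_cross_layer(shallow, deep)
print(fused.shape)                       # (4, 8)
```

In a real network the two feature maps would come from different backbone stages and be projected to a common shape before fusion; the sketch keeps them the same shape for brevity.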
