Abstract
Vegetable and fruit recognition can be considered a fine-grained visual categorization (FGVC) task, which is challenging due to the large intraclass variances and small interclass variances. A mainstream direction to address the challenge is to exploit fine-grained local/global features to enhance feature extraction and representation in the learning pipeline. However, unlike the human visual system, most existing FGVC methods only extract features from individual images during training. In contrast, human beings can learn discriminative features by comparing two different images. Inspired by this intuition, a recent FGVC method, named Attentive Pairwise Interaction Network (API-Net), takes an image pair as input for pairwise feature interaction and demonstrates superior performance on several open FGVC data sets. However, the accuracy of API-Net on VegFru, a domain-specific FGVC data set, is lower than expected, potentially due to the lack of spatial-wise attention. Following this direction, we propose an FGVC framework named Attention-aware Interactive Features Network (AIF-Net) that refines API-Net by integrating an attentive feature extractor into the backbone network. Specifically, we employ a region proposal network (RPN) to generate a collection of informative regions and apply a bi-attention module to learn global and local attentive feature maps, which are fused and fed into an interactive feature learning subnetwork. The novel neural structure is verified through extensive experiments and shows consistent performance improvement over the state of the art (SOTA) on the VegFru data set, demonstrating its superiority in fine-grained vegetable and fruit recognition. We also discover that a concatenation fusion operation applied in the feature extractor, along with three top-scoring regions suggested by the RPN, can effectively boost the performance.
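To make the extractor described above concrete, the following is a minimal PyTorch sketch of the attentive feature path: features are computed for the whole image and for the three top-scoring RPN regions, a bi-attention module produces a global (channel-wise) and a local (spatial-wise) attentive map, and the two maps are fused by concatenation before pooling. All class names, layer shapes, and the exact fusion point are illustrative assumptions rather than the authors' implementation; RPN training is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiAttention(nn.Module):
        """Channel-wise (global) and spatial-wise (local) attention
        over a convolutional feature map (names are hypothetical)."""
        def __init__(self, channels):
            super().__init__()
            self.channel_fc = nn.Linear(channels, channels)  # global branch
            self.spatial_conv = nn.Conv2d(channels, 1, 1)    # local branch

        def forward(self, x):                                # x: (B, C, H, W)
            g = x.mean(dim=(2, 3))                           # pooled descriptor (B, C)
            g = torch.sigmoid(self.channel_fc(g))[..., None, None]
            s = torch.sigmoid(self.spatial_conv(x))          # spatial map (B, 1, H, W)
            return x * g, x * s                              # global/local attentive maps

    class AttentiveFeatureExtractor(nn.Module):
        def __init__(self, backbone, channels, top_k=3):
            super().__init__()
            self.backbone = backbone     # e.g. a ResNet trunk up to the last conv stage
            self.bi_attn = BiAttention(channels)
            self.top_k = top_k           # three top-scoring RPN regions

        def forward(self, image, rpn_regions):
            # rpn_regions: top-k crops already resized to the input size,
            # shape (B, k, 3, H, W); region scoring/selection is assumed done upstream.
            feats = [self.backbone(image)]
            for i in range(self.top_k):
                feats.append(self.backbone(rpn_regions[:, i]))
            pooled = []
            for f in feats:
                g, s = self.bi_attn(f)
                # Concatenation fusion of the global and local attentive maps.
                fused = torch.cat([g, s], dim=1)
                pooled.append(F.adaptive_avg_pool2d(fused, 1).flatten(1))
            # Whole-image and region features are concatenated and passed on
            # to the interactive feature learning subnetwork.
            return torch.cat(pooled, dim=1)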
Highlights
Despite the consistent improvement in the application of convolutional neural networks (CNNs) to various computer vision tasks, fine-grained visual categorization (FGVC) is still a challenging task due to the large intraclass variance, small interclass variance, and the difficulties in obtaining part annotations [1,2]
We propose the Attention-aware Interactive Features Network (AIF-Net), which refines the Attentive Pairwise Interaction Network (API-Net) by integrating an attentive feature extractor into the backbone network
We discover that a concatenation fusion operation applied in the feature extractor, along with three top-scoring regions suggested by a region proposal network (RPN), can effectively boost the performance
Summary
Despite the consistent improvement in the application of convolutional neural networks (CNNs) to various computer vision tasks, fine-grained visual categorization (FGVC) remains challenging. A common goal of existing FGVC methods is to enhance a model's capability to exploit distinguishable fine-grained features from local or global regions for performance boosting. Their main difference is that the former focus on certain informative regions of an image, while the latter aim to find critical patterns in the whole image. Humans often recognize fine-grained objects by comparing image pairs to extract subtle visual differences that can serve as distinguishable features. Inspired by this intuition, recent efforts have explored ways to learn interactive features from image pairs, as sketched below. The proposed AIF-Net consists of three components, including: (1) an attentive feature extractor that allows the network to identify and learn from critical areas in an image where distinguishable patterns may reside.
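For context, the pairwise interaction step that AIF-Net inherits from API-Net can be sketched as follows: two pooled feature vectors are compared through a mutual vector and sigmoid gates, so each image yields a self-gated view and a view gated by its pair partner. This follows the gating scheme described in the API-Net paper in spirit, but the class name, layer sizes, and MLP depth here are assumptions.

    import torch
    import torch.nn as nn

    class PairwiseInteraction(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # Maps the concatenated feature pair to a mutual vector.
            self.mutual = nn.Sequential(
                nn.Linear(2 * dim, dim), nn.ReLU(),
                nn.Linear(dim, dim))

        def forward(self, x1, x2):                           # x1, x2: (B, D)
            xm = self.mutual(torch.cat([x1, x2], dim=1))     # mutual vector
            g1 = torch.sigmoid(xm * x1)                      # gate for image 1
            g2 = torch.sigmoid(xm * x2)                      # gate for image 2
            # Each image gets a self-gated and an other-gated feature, so the
            # classifier sees both its own salient channels and the channels
            # that distinguish it from its pair partner.
            return (x1 + x1 * g1, x1 + x1 * g2,
                    x2 + x2 * g2, x2 + x2 * g1)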