Fine-grained Image Recognition Research Articles

Introduction: The objective of fine-grained image classification of marine organisms is to distinguish subtle variations among the organisms so as to classify them accurately into subcategories. The key to accurate classification is locating the distinguishing feature regions, such as a fish’s eyes, fins, or tail. Images of marine organisms are hard to work with: they are often taken from multiple angles and in different scenes, they usually have complex backgrounds, and they often contain humans or other distractions. All of this makes it difficult to focus on the marine organism itself and identify its most distinctive features.

Related work: Most existing fine-grained image classification methods based on Convolutional Neural Networks (CNNs) cannot locate the distinguishing feature regions accurately enough, and the regions they identify also contain a large amount of background data. The Vision Transformer (ViT) has strong global information capturing abilities and performs strongly in traditional classification tasks. The core of ViT is a Multi-Head Self-Attention (MSA) mechanism, which first establishes connections between different patch tokens in a pair of images and then combines the information of all the tokens for classification.

Methods: However, not all tokens are conducive to fine-grained classification; many of them contain extraneous data (noise). We aim to eliminate the influence of interfering tokens, such as background data, on the identification of marine organisms, and then gradually narrow down the local feature area to determine the distinctive features accurately. To this end, this paper puts forward a novel Transformer-based framework, the Token-Selective Vision Transformer (TSVT), in which Token-Selective Self-Attention (TSSA) is proposed to select the discriminative tokens for attention computation, which helps limit the attention to more precise local regions. TSSA is applied to different layers, and the number of selected tokens in each layer decreases relative to the previous layer, so the method gradually locates the distinguishing regions in a hierarchical manner.

Results: The effectiveness of TSVT is verified on three marine organism datasets, and TSVT is shown to achieve state-of-the-art performance.
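
The token selection idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how tokens are scored or how many are kept per layer, so everything here is an assumption made for illustration: the scores stand in for something like class-token attention, the fixed `ratio` schedule is hypothetical, and the function names `select_tokens`, `token_selective_attention` and `keep_schedule` are not taken from the paper. Plain Python is used to keep the example dependency-free.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_tokens(scores, keep):
    """Indices of the `keep` highest-scoring tokens, returned in original order.

    `scores` is assumed to be one importance value per patch token
    (for example, attention received from a class token).
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def token_selective_attention(query, keys, values, selected):
    """Single-head attention for one query, restricted to the selected tokens."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in selected]
    weights = softmax(logits)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]


def keep_schedule(num_tokens, ratio, num_layers):
    """Tokens kept per layer; each layer keeps a fraction of the previous layer's count."""
    counts = []
    n = num_tokens
    for _ in range(num_layers):
        n = max(1, int(n * ratio))  # hypothetical schedule, never below one token
        counts.append(n)
    return counts
```

With 196 patch tokens and a hypothetical ratio of 0.5, `keep_schedule(196, 0.5, 3)` yields a shrinking budget per layer, which mirrors the hierarchical narrowing the abstract describes. Restricting the attention to `selected` means background tokens outside that set contribute nothing to the output.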

The aim of webly supervised fine-grained image recognition (FGIR) is to distinguish subordinate categories using data retrieved from the Internet, which can significantly reduce the dependence of deep learning on manually annotated labels. Most current fine-grained image recognition algorithms follow a large-scale, data-driven deep learning paradigm that relies heavily on manually annotated labels. However, a large amount of weakly labeled data is freely available on the Internet. To use fine-grained web data effectively, this paper proposes a Graph Representation and Metric Learning (GRML) framework that learns discriminative holistic–local features of web fine-grained images through graph representation and handles noisy labels at the same time, so that webly supervised data can be used effectively for training. Specifically, we first design an attention-focused module to locate the most discriminative regions with different aspect ratios and sizes. Next, a structured instance graph is constructed to correlate holistic and local features and model the holistic–local information interaction, while a graph prototype containing both holistic and local information for each category is introduced to learn a category-level graph representation that assists in processing the noisy labels. Finally, a graph matching module is employed to explore the holistic–local information interaction through intra-graph node information propagation and to evaluate the similarity score between each instance graph and its corresponding category-level graph prototype through inter-graph node information propagation. Extensive experiments were conducted on three webly supervised FGIR benchmark datasets, Web-Bird, Web-Aircraft and Web-Car, yielding classification accuracies of 76.62%, 85.79% and 82.99%, respectively. Compared with Peer-learning, the classification accuracies on the three datasets improved by 2.47%, 4.72% and 1.59%, respectively.
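
The idea of scoring an instance graph against a category-level prototype can be illustrated with a toy sketch. The abstract does not give the propagation rule, the similarity measure, or how the score is used against noisy labels, so all of the following are assumptions for illustration only: a fully connected graph with mean-mixing propagation, node-wise cosine similarity, and a hypothetical `threshold` for flagging samples. The names `propagate`, `graph_similarity` and `is_likely_clean` are not from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def propagate(nodes, alpha=0.5):
    """One intra-graph step: mix every node with the mean of all nodes.

    Assumes a fully connected graph whose nodes are the holistic feature
    and the local-region features of one image (or one category prototype).
    """
    dim = len(nodes[0])
    mean = [sum(n[d] for n in nodes) / len(nodes) for d in range(dim)]
    return [[(1 - alpha) * n[d] + alpha * mean[d] for d in range(dim)] for n in nodes]


def graph_similarity(instance, prototype):
    """Mean cosine similarity between corresponding nodes after propagation."""
    inst = propagate(instance)
    proto = propagate(prototype)
    return sum(cosine(a, b) for a, b in zip(inst, proto)) / len(inst)


def is_likely_clean(instance, prototype, threshold=0.5):
    """Treat a web sample as likely correctly labeled if it matches its prototype."""
    return graph_similarity(instance, prototype) >= threshold
```

Under these assumptions, a web image whose graph disagrees strongly with the prototype of its claimed category receives a low score and could be down-weighted or discarded during training.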


Articles published on Fine-grained Image Recognition

Adopting multiple vision transformer layers for fine-grained image representation

Fine-grained Image Recognition via Attention Interaction and Counterfactual Attention Network

Fine-Grained Image Recognition by Means of Integrating Transformer Encoder Blocks in a Robust Single-Stage Object Detector

SelectAugment: Hierarchical Deterministic Sample Selection for Data Augmentation

Token-Selective Vision Transformer for fine-grained image recognition of marine organisms

INTS-Net: Improved Navigator-Teacher-Scrutinizer Network for Fine-Grained Visual Categorization

Hybrid Granularities Transformer for Fine-Grained Image Recognition

Hierarchical full-attention neural architecture search based on search space compression

A few-shot fine-grained image recognition method

A teacher-student based attention network for fine-grained image recognition

Siamese transformer with hierarchical concept embedding for fine-grained image recognition

Class-attention-based lesion proposal convolutional neural network for strawberry diseases identification

Selecting and fusing coarse-and-fine granularity features for fine-grained image recognition

Associating multiple vision transformer layers for fine-grained image representation

Webly Supervised Fine-Grained Image Recognition with Graph Representation and Metric Learning

Fine-Grained Image Analysis With Deep Learning: A Survey

Fine-grained image recognition via trusted multi-granularity information fusion

Fine-grained Image Recognition Method using Discriminative Region-based Data Augmentation

3D Convolution ViT Network-based Fine-grained Image Recognition

Discriminative Feature Mining and Enhancement Network for Low-Resolution Fine-Grained Image Recognition
