Stanford Dogs Research Articles

Insect diversity monitoring is crucial for biological pest control in agriculture and forestry. Modern monitoring of insect species relies heavily on fine-grained image classification models. Fine-grained image classification faces challenges such as small inter-class differences and large intra-class variances, which are even more pronounced in insect scenes, where a species often exhibits significant morphological differences across its life stages. To address these challenges, we introduce segmentation and clustering operations into the image classification task and design a novel network training framework for fine-grained classification of insect images using multi-modality clustering and approximate mask methods, named PCAM-Frame. In the first stage of the framework, the Polymorphic Clustering Module employs segmentation and clustering operations to distinguish the various morphologies of insects at different life stages, allowing the model to differentiate between samples at different life stages during training. The second stage consists of a feature extraction network, called Basenet, which can be any mainstream network that performs well in fine-grained image classification tasks; its purpose is to provide pre-classification confidence for the next stage. In the third stage, we apply the Approximate Masking Module to mask the common attention regions of the most likely classes and continuously adjust the convergence direction of the model during training using a Deviation Loss function. We apply PCAM-Frame with multiple classification networks as the Basenet in the second stage and conduct extensive experiments on the Insecta dataset of iNaturalist 2017 and the IP102 dataset, achieving improvements of 2.2% and 1.4%, respectively. Generalization experiments on other fine-grained image classification datasets such as CUB200-2011 and Stanford Dogs also demonstrate positive effects. These experiments validate the pertinence and effectiveness of PCAM-Frame in fine-grained image classification tasks under complex conditions, particularly in insect scenes.
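The abstract does not specify how the Approximate Masking Module computes or applies its mask, so the following is only a minimal sketch of one plausible reading: given attention maps for the top-k most likely classes, suppress the locations that all of them attend to. The function names, the threshold, and the "all maps exceed the threshold" rule are assumptions made for illustration, not the authors' implementation.

```python
def common_attention_mask(attention_maps, threshold=0.5):
    """Build a 0/1 keep-mask from several 2-D attention maps.

    A cell is set to 0 (masked) only when every map is >= threshold there,
    i.e. when the region is attended to by all candidate classes.
    The threshold and the "all maps agree" rule are illustrative assumptions.
    """
    rows = len(attention_maps[0])
    cols = len(attention_maps[0][0])
    mask = []
    for r in range(rows):
        mask_row = []
        for c in range(cols):
            shared = all(m[r][c] >= threshold for m in attention_maps)
            mask_row.append(0 if shared else 1)
        mask.append(mask_row)
    return mask


def apply_mask(feature_map, mask):
    """Zero out the masked cells of a 2-D feature map."""
    return [
        [value * keep for value, keep in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]


# Hypothetical attention maps for the two most likely classes.
top2_maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.7], [0.4, 0.1]],
]
mask = common_attention_mask(top2_maps)
masked = apply_mask([[1.0, 2.0], [3.0, 4.0]], mask)
```

Only the top-left cell is attended to by both classes in this toy input, so it is the only cell suppressed; the intent is that the remaining regions are the ones that can still separate the candidate classes.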

The aim of weakly supervised object co-localization is to locate different objects of the same superclass in a dataset. Recent methods achieve impressive co-localization performance through multiple instance learning and self-supervised learning. However, these methods ignore the common part information shared by fine-grained objects and the influence of complementary parts on the co-localization of fine-grained objects. To solve these issues, we propose a complementary parts contrastive learning method for fine-grained weakly supervised object co-localization. The method rests on the assumption that fine-grained object parts with the same semantic meaning should have similar feature representations, and parts with different semantic meanings should have dissimilar ones. The proposed method tackles two critical issues in this task: (i) how to spread the model's attention and suppress complex background noise, and (ii) how to leverage cross-category common part information to mitigate the context co-occurrence problem. To address (i), we integrate local and context cues via three types of attention (self-supervised, channel, and spatial attention) to spread the model's attention toward automatically identifying and localizing the most discriminative parts of objects in fine-grained images. To solve (ii), we propose a cross-category object complementary part contrastive learning module that identifies the extracted part regions with different semantic information by pulling features of the same part closer and pushing features of different parts apart, which can mitigate the confounding bias caused by co-occurring surroundings within specific classes. Extensive qualitative and quantitative evaluations demonstrate the effectiveness of the proposed method on four fine-grained co-localization datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and Stanford Dogs. Code and models are available at https://github.com/Zhao-fan/CPCL.
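The pull-together and push-apart idea for part features can be illustrated with a generic supervised contrastive loss. This is a minimal sketch assuming cosine similarity, a softmax over all other parts, and a temperature of 0.1; the function names and the part labels are hypothetical, and the loss actually used in the paper may differ (see the linked repository for the authors' code).

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def part_contrastive_loss(features, part_ids, temperature=0.1):
    """Average InfoNCE-style loss over all (anchor, positive) part pairs.

    features: list of part feature vectors (assumed non-zero).
    part_ids: semantic part label per feature (hypothetical labels).
    Features sharing a part id are positives; all others are negatives.
    """
    n = len(features)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and part_ids[j] == part_ids[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(x) for x in logits.values()))
        for j in positives:
            total += log_denominator - logits[j]
            count += 1
    return total / count if count else 0.0


# Toy part features: two near [1, 0] and two near [0, 1].
feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_consistent = part_contrastive_loss(feats, [0, 0, 1, 1])
loss_mixed = part_contrastive_loss(feats, [0, 1, 0, 1])
```

When the part labels agree with the geometry of the features the loss is close to zero, and it grows when features labelled as the same part point in different directions, which is the behaviour the abstract describes.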

Articles published on Stanford Dogs

Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets.

Fine-Grained Few-Shot Image Classification Based on Feature Dual Reconstruction

Adversarially attack feature similarity for fine-grained visual classification

KLSANet: Key local semantic alignment Network for few-shot image classification

Polymorphic Clustering and Approximate Masking Framework for Fine-Grained Insect Image Classification

Feature alignment via mutual mapping for few-shot fine-grained visual classification

Keep the Faith: Faithful Explanations in Convolutional Neural Networks for Case-Based Reasoning

Robust fine‐grained visual recognition with images based on internet of things

Fine-grained image classification based on TinyVit object location and graph convolution network

Pyramid hybrid pooling quantization for efficient fine-grained image retrieval

Explicitly learning augmentation invariance for image classification by Consistent Augmentation

Complementary Parts Contrastive Learning for Fine-Grained Weakly Supervised Object Co-Localization

Improved transfer learning using textural features conflation and dynamically fine-tuned layers.

Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Abnormality Detection of Blast Furnace Tuyere Based on Knowledge Distillation and a Vision Transformer

Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification

How to train your pre-trained GAN models

Fine-Grained Visual Categorization: A Spatial–Frequency Feature Fusion Perspective

Hybrid Granularities Transformer for Fine-Grained Image Recognition

Loop and distillation: Attention weights fusion transformer for fine‐grained representation
