Large Intra-class Variations Research Articles

By characterizing each image set as a nonsingular covariance matrix on the symmetric positive definite (SPD) manifold, the approaches of visual content classification with image sets have made impressive progress. However, the key challenge of unhelpfully large intraclass variability and interclass similarity of representations remains open to date. Although, several recent studies have mitigated the two problems by jointly learning the embedding mapping and the similarity metric on the original SPD manifold, their inherent shallow and linear feature transformation mechanism are not powerful enough to capture useful geometric features, especially in complex scenarios. To this end, this article explores a novel approach, termed SPD manifold deep metric learning (SMDML), for image set classification. Specifically, SMDML first selects a prevailing SPD manifold neural network (SPDNet) as the backbone (encoder) to derive an SPD matrix nonlinear representation. To counteract the degradation of structural information during multistage feature embedding, we construct a Riemannian decoder at the end of the encoder, trained by a reconstruction error term (RT), to induce the generated low-dimensional feature manifold of the hidden layer to capture the pivotal information about the visual data describing the imaged scene. We demonstrate through theory and experiments that it is feasible to replace the Riemannian metric with Euclidean distance in RT. Then, the ReCov layer is introduced into the established Riemannian network to regularize the local statistical information within each input feature matrix, which enhances the effectiveness of the learning process. The theoretical analysis of the activation function used in the ReCov layer in terms of continuity and conditions for generating positive definite matrices is beneficial for network design. Inspired by the fact that the single cross-entropy loss used for training is unable to effectively parse the geometric distribution of the deep representations, we finally endow the suggested model with a novel metric learning regularization term. By explicitly incorporating the encoding and processing of the data variations into the network learning process, this term can not only derive a powerful Riemannian representation but also train an effective classifier. The experimental results show the superiority of the proposed approach on three typical visual classification tasks.

Insect diversity monitoring is crucial for biological pest control in agriculture and forestry. Modern monitoring of insect species relies heavily on fine-grained image classification models. Fine-grained image classification faces challenges such as small inter-class differences and large intra-class variances, which are even more pronounced in insect scenes where insect species often exhibit significant morphological differences across multiple life stages. To address these challenges, we introduce segmentation and clustering operations into the image classification task and design a novel network model training framework for fine-grained classification of insect images using multi-modality clustering and approximate mask methods, named PCAM-Frame. In the first stage of the framework, we adopt the Polymorphic Clustering Module, and segmentation and clustering operations are employed to distinguish various morphologies of insects at different life stages, allowing the model to differentiate between samples at different life stages during training. The second stage consists of a feature extraction network, called Basenet, which can be any mainstream network that performs well in fine-grained image classification tasks, aiming to provide pre-classification confidence for the next stage. In the third stage, we apply the Approximate Masking Module to mask the common attention regions of the most likely classes and continuously adjust the convergence direction of the model during training using a Deviation Loss function. We apply PCAM-Frame with multiple classification networks as the Basenet in the second stage and conduct extensive experiments on the Insecta dataset of iNaturalist 2017 and IP102 dataset, achieving improvements of 2.2% and 1.4%, respectively. Generalization experiments on other fine-grained image classification datasets such as CUB200-2011 and Stanford Dogs also demonstrate positive effects. These experiments validate the pertinence and effectiveness of our framework PCAM-Frame in fine-grained image classification tasks under complex conditions, particularly in insect scenes.

Large Intra-class Variations Research Articles

Related Topics

Articles published on Large Intra-class Variations

Multi-view visual semantic embedding for cross-modal image–text retrieval

KBCNet: Independently keypoint learning for Small Object Semantic Correspondence

A lightweight dual-attention network for tomato leaf disease identification.

SPD Manifold Deep Metric Learning for Image Set Classification.

MIST: Multi-instance selective transformer for histopathological subtype prediction

Hierarchical online contrastive anomaly detection for fetal arrhythmia diagnosis in ultrasound

Graph Representation and Prototype Learning for webly supervised fine-grained image recognition

A class distribution learning method for few-shot remote sensing scene classification

An efficient Fusion-Purification Network for Cervical pap-smear image classification

Polymorphic Clustering and Approximate Masking Framework for Fine-Grained Insect Image Classification

Few-Shot Class-Incremental Learning for Medical Time Series Classification.

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

Hyperspectral image classification using Walsh Hadamard transform- based key band selection and deep convolutional neural networks

Category attention guided network for semantic segmentation of Fine-Resolution remote sensing images

FPIRST: Fatigue Driving Recognition Method Based on Feature Parameter Images and a Residual Swin Transformer.

Prototype Correlation Matching and Class-Relation Reasoning for Few-Shot Medical Image Segmentation.

Dual-branch Branch Networks Based on Contrastive Learning for Long-Tailed Remote Sensing

Fine-grained Recognition with Learnable Semantic Data Augmentation.

Multi-Surface Multi-Technique (MUST) Latent Fingerprint Database

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Large Intra-class Variations Research Articles

Related Topics

Articles published on Large Intra-class Variations

Multi-view visual semantic embedding for cross-modal image–text retrieval

KBCNet: Independently keypoint learning for Small Object Semantic Correspondence

A lightweight dual-attention network for tomato leaf disease identification.

SPD Manifold Deep Metric Learning for Image Set Classification.

MIST: Multi-instance selective transformer for histopathological subtype prediction

Hierarchical online contrastive anomaly detection for fetal arrhythmia diagnosis in ultrasound

Graph Representation and Prototype Learning for webly supervised fine-grained image recognition

A class distribution learning method for few-shot remote sensing scene classification

An efficient Fusion-Purification Network for Cervical pap-smear image classification

Polymorphic Clustering and Approximate Masking Framework for Fine-Grained Insect Image Classification

Few-Shot Class-Incremental Learning for Medical Time Series Classification.

Delving into Multimodal Prompting for Fine-Grained Visual Classification

Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

Hyperspectral image classification using Walsh Hadamard transform- based key band selection and deep convolutional neural networks

Category attention guided network for semantic segmentation of Fine-Resolution remote sensing images

FPIRST: Fatigue Driving Recognition Method Based on Feature Parameter Images and a Residual Swin Transformer.

Prototype Correlation Matching and Class-Relation Reasoning for Few-Shot Medical Image Segmentation.

Dual-branch Branch Networks Based on Contrastive Learning for Long-Tailed Remote Sensing

Fine-grained Recognition with Learnable Semantic Data Augmentation.

Multi-Surface Multi-Technique (MUST) Latent Fingerprint Database