Classical Embedding Research Articles

With prior knowledge of seen objects, humans have a remarkable ability to recognize novel objects using shared and distinct local attributes. This is significant for the challenging tasks of zero-shot learning (ZSL) and fine-grained visual classification (FGVC), where the discriminative attributes of objects play an important role. Inspired by human visual attention, neural networks have widely exploited the attention mechanism to learn locally discriminative attributes for these tasks. Although such methods have greatly advanced both fields, existing works mainly focus on learning the region embeddings of different attribute features and neglect the importance of discriminative attribute localization. It is also unclear whether the learned attention truly matches real human attention. To tackle this problem, this paper proposes to employ real human gaze data so that visual recognition networks can learn from human attention. Specifically, we design a unified Attribute Attention Network (A²Net) that learns from human attention for both ZSL and FGVC tasks. The overall model consists of an attribute attention branch and a baseline classification network. On top of the image feature maps provided by the baseline classification network, the attribute attention branch employs attribute prototypes to produce attribute attention maps and attribute features. The attribute attention maps are converted to gaze-like attentions to be aligned with real human gaze attention. To guarantee the effectiveness of attribute feature learning, we further align the extracted attribute features with attribute-defined class embeddings. To facilitate learning from human gaze attention for visual recognition problems, we design a bird classification game to collect real human gaze data on the CUB dataset using an eye-tracker device. Experiments on ZSL and FGVC tasks with and without real human gaze data validate the benefits and accuracy of our proposed model. This work supports the promising benefits of collecting human gaze datasets and of automatic gaze estimation algorithms that learn from human attention for high-level computer vision tasks.
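
The abstract does not specify how attribute prototypes yield attention maps and attribute features. As a rough illustration only, the sketch below assumes one common choice: a dot-product score between a prototype and each spatial feature, a softmax over locations, and attention-weighted pooling. The function names `softmax` and `attribute_attention` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_attention(feature_map, prototype):
    """Attention map and pooled feature for one attribute (illustrative only).

    feature_map: list of H*W location vectors, each of length C.
    prototype:   one attribute prototype vector of length C.
    Assumption: dot-product similarity followed by a spatial softmax.
    """
    # Similarity between the prototype and the feature at every location.
    scores = [sum(f * p for f, p in zip(feat, prototype)) for feat in feature_map]
    # Normalise over locations so the map sums to 1.
    attention = softmax(scores)
    # Attention-weighted average of the location features.
    channels = len(prototype)
    pooled = [
        sum(a * feat[c] for a, feat in zip(attention, feature_map))
        for c in range(channels)
    ]
    return attention, pooled


# Four locations with two channels; the last location matches the prototype best.
feature_map = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [5.0, 0.0]]
attention, pooled = attribute_attention(feature_map, [1.0, 0.0])
```

In this toy input the attention mass concentrates on the location whose feature is most similar to the prototype. A map of this kind is what would then be compared against a human gaze map, by whatever alignment loss the paper actually uses.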


In this paper, we propose a technique for hierarchical yoga pose classification (YPC) in a multi-stage multi-tasking framework. We propose a three-stage, transfer-learning-based, end-to-end training methodology. The novelty lies in (a) the proposed supervised contrastive combined loss function for stage-1 training, (b) the proposed Encoder–Decoder network architecture with attention mechanism for stage-3 training, and (c) the proposed spatial-context-aware multi-tasking combined loss function for stage-3 training. First, for stage-1 training, we propose using a linear combination of three loss functions in a multi-tasking manner: cross-entropy, self-supervised contrastive loss and supervised contrastive loss. We introduce radial and cosine margins into the formulation of the self-supervised and supervised contrastive losses to pull feature embeddings of the same class closer together than feature embeddings of different classes. Weights learned during stage-1 training are subsequently fine-tuned with a cross-entropy multi-tasking loss in stage 2. These stage-2 weights are transferred and further fine-tuned in stage-3 training. For stage-3 training, we propose a spatial-context-aware multi-tasking combined loss function. This loss function leverages the fine-grained spatial features obtained from HiResCAM. These are processed in parallel with features obtained from XGrad-CAM (features satisfying the sensitivity and conservation axioms) to further supervise the learning of the hierarchical yoga pose classifier. We exemplify our methodology on the publicly available Yoga-82 large-scale dataset. We report peak Top-1 YPC accuracy of 95.89% over 6 pose classes (Yoga-6), 93.85% over 20 pose classes (Yoga-20) and 90.0% over 82 pose classes (Yoga-82). Our proposed method achieves a 6.1% improvement in Top-1 classification accuracy in the Yoga-6 hierarchy, a 9.3% improvement in the Yoga-20 hierarchy and a 10.9% improvement in the Yoga-82 hierarchy in comparison with the state-of-the-art (SOTA) methodology. We achieve the current best Top-1 classification accuracies in all three YPC hierarchies.
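
To make the stage-1 idea of a linear combination of losses concrete, here is a minimal sketch that combines cross-entropy with a standard supervised contrastive loss. It is not the authors' formulation: it omits the self-supervised term and the radial and cosine margins, and the weights `w_ce` and `w_supcon`, the temperature, and all function names are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp form)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss, without any margin terms."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(logits_batch, embeddings, labels, w_ce=1.0, w_supcon=1.0):
    """Weighted sum of mean cross-entropy and supervised contrastive loss."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return w_ce * ce + w_supcon * supervised_contrastive(embeddings, labels)


# Two tight clusters: labels that follow the clusters should score lower.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supervised_contrastive(embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive(embeddings, [0, 1, 0, 1])
```

The contrastive term is small when same-class embeddings already sit close together and large when they do not, which is the behaviour the abstract's margin-based variants are designed to strengthen.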


Articles published on Classical Embedding

Learning From Human Attention for Attribute-Assisted Visual Recognition.

On hyperelliptic curves of odd degree and genus g with 6 torsion points of order 2g + 1

Embedded Many‐Body Green's Function Methods for Electronic Excitations in Complex Molecular Systems

Adaptive class token knowledge distillation for efficient vision transformer

Relative injective modules, superstability and noetherian categories

On the modulus of continuity of fractional Orlicz-Sobolev functions

Fault diagnosis of induction motor in the cooling water supply system using a multi-channel data fusion transformer with limited sample conditions

Prompt-guided DETR with RoI-pruned masked attention for open-vocabulary object detection

Contrastive and uncertainty-aware nuclei segmentation and classification

Fate of charged wormhole structures utilizing Karmarkar approach

A global higher regularity result for the static relaxed micromorphic model on smooth domains

Federated Learning With Only Positive Labels by Exploring Label Correlations.

Particle Swarm Optimization for 5g and Beyond

Fair large kernel embedding with relation-specific features extraction for link prediction

TurboSVM-FL: Boosting Federated Learning through SVM Aggregation for Lazy Clients

Unsupervised twitter social bot detection using deep contrastive graph clustering

Zero-Shot Aerial Object Detection with Visual Description Regularization

CAM based fine-grained spatial feature supervision for hierarchical yoga pose classification using multi-stage transfer learning

An h‐principle for embeddings transverse to a contact structure

CR-TransR: A Knowledge Graph Embedding Model for Cultural Domain
