Query Image Research Articles

Few-shot learning (FSL) aims to transfer knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is an important research question with great practical value in real-world applications. Although extensive efforts have been made on few-shot learning tasks, we emphasize that most existing methods do not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such selection bias can induce a spurious correlation between the semantic causal features, which are causally and semantically related to the class label, and the other non-causal features. Critically, the former should be invariant across changes in distribution, highly related to the classes of interest, and thus generalize well to novel classes, while the latter are not stable under changes in distribution. To resolve this problem, we propose a novel data augmentation strategy dubbed PatchMix that breaks this spurious dependency by replacing the patch-level information and supervision of the query images with those of random gallery images from classes different from the query's. We theoretically show that such an augmentation mechanism, unlike existing ones, is able to identify the causal features. To make these features discriminative enough for classification, we further propose Correlation-guided Reconstruction (CGR) and a Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, the framework can be adapted to the unsupervised FSL scenario. The utility of our method is demonstrated by state-of-the-art results consistently achieved on several benchmarks, including miniImageNet, tieredImageNet, CIFAR-FS, CUB, Cars, Places and Plantae, in all settings of single-domain, cross-domain and unsupervised FSL. By studying the intra-variance property of the learned features and visualizing them, we further show quantitatively and qualitatively that these results are due to the method's effectiveness in learning causal features.
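The patch replacement described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `patchmix`, the fixed patch grid, the choice of a single random rectangle of patches, and the per-patch label map are all assumptions made for illustration. It also assumes the image height and width are divisible by `grid`.

```python
import numpy as np

def patchmix(query, query_label, gallery, gallery_labels, grid=4, rng=None):
    """Hypothetical PatchMix-style augmentation (illustrative only).

    query:          (H, W, C) image
    gallery:        (N, H, W, C) stack of candidate images
    gallery_labels: length-N class labels for the gallery images
    Returns the mixed image and a (grid, grid) map of per-patch labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    gallery_labels = np.asarray(gallery_labels)

    # Only gallery images from a different class than the query are eligible.
    candidates = np.flatnonzero(gallery_labels != query_label)
    if candidates.size == 0:
        raise ValueError("gallery has no image from a different class")
    pick = rng.choice(candidates)

    # Patch size in pixels (assumes H and W are divisible by `grid`).
    ph, pw = query.shape[0] // grid, query.shape[1] // grid

    # Random non-empty rectangle, expressed in patch coordinates.
    r0 = rng.integers(0, grid)
    r1 = rng.integers(r0 + 1, grid + 1)
    c0 = rng.integers(0, grid)
    c1 = rng.integers(c0 + 1, grid + 1)

    # Replace the patch-level information ...
    mixed = query.copy()
    mixed[r0 * ph:r1 * ph, c0 * pw:c1 * pw] = \
        gallery[pick, r0 * ph:r1 * ph, c0 * pw:c1 * pw]

    # ... and the patch-level supervision.
    patch_labels = np.full((grid, grid), query_label)
    patch_labels[r0:r1, c0:c1] = gallery_labels[pick]
    return mixed, patch_labels

# Usage: an all-zeros query of class 0 mixed with all-ones gallery images.
query = np.zeros((8, 8, 3))
gallery = np.ones((3, 8, 8, 3))
mixed, patch_labels = patchmix(query, 0, gallery, [0, 1, 1],
                               rng=np.random.default_rng(0))
```

In this sketch the label map changes together with the pixels, so each patch is supervised by the class of the image it came from. How the original method chooses patches and uses the labels in training is not stated in the abstract.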

The interpretation of medical images into natural language is a developing field of artificial intelligence (AI) called image captioning. This field integrates two branches of artificial intelligence: computer vision and natural language processing. It is a challenging topic that goes beyond object recognition, segmentation, and classification, since it demands an understanding of the relationships between the various components in an image and of how these objects function as visual representations. Content-based image retrieval (CBIR) uses an image captioning model to generate captions for the user's query image. The common architecture of medical image captioning systems consists mainly of an image feature extractor subsystem followed by a caption generation lingual subsystem. In this paper, we aim to build an optimized model for histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens. For the image feature extraction subsystem, we performed two evaluations. First, we tested five different vision models (VGG, ResNet, PVT, SWIN-Large, and ConvNEXT-Large) using LSTM, RNN, and bidirectional-RNN, and then compared the vision models under three configurations (LSTM without augmentation, LSTM with augmentation, and BioLinkBERT-Large as an embedding layer with augmentation) to find the most accurate one. Second, we tested three different concatenations of pairs of vision models (SWIN-Large, PVT_v2_b5, and ConvNEXT-Large) to find the most expressive extracted feature vector of the image. For the caption generation lingual subsystem, we compared a pre-trained language embedding model, BioLinkBERT-Large, against LSTM in both evaluations to select the more accurate model. Our experiments showed that a captioning system that uses a concatenation of the two models ConvNEXT-Large and PVT_v2_b5 as the image feature extractor, combined with the BioLinkBERT-Large language embedding model, produces the best results among the tested combinations.

Articles published on Query Image

Power amplifier circuit defect detection based on improved Patch SVDD

PatchMix Augmentation to Identify Causal Features in Few-shot Learning.

ISimLoc: Visual Global Localization for Previously Unseen Environments With Simulated Images

Enhanced descriptive captioning model for histopathological patches

A Large-scale Virtual Dataset and Egocentric Localization for Disaster Responses.

Attentional prototype inference for few-shot segmentation

Learned Data-aware Image Representations of Line Charts for Similarity Search

Efficient improvement of classification accuracy via selective test-time augmentation

Learning global image representation with generalized‐mean pooling and smoothed average precision for large‐scale CBIR

ConCoNet: Class-agnostic counting with positive and negative exemplars

Gaze-Dependent Image Re-Ranking Technique for Enhancing Content-Based Image Retrieval

A cross-view geo-localization method guided by relation-aware global attention

Semantic-guided de-attention with sharpened triplet marginal loss for visual place recognition

Query-based image tagging model using ensemble learning with enhanced artificial bee colony optimization

Quaternion-Valued Correlation Learning for Few-Shot Semantic Segmentation

Cross domain 2D-3D descriptor matching for unconstrained 6-DOF pose estimation

Knowledge transduction for cross-domain few-shot learning

Co-attention enabled content-based image retrieval

CLoc: Confident Initial Estimation of Long-Term Visual Localization Using a Few Sequential Images in Large-Scale Spaces

PIHA: Detection method using perceptual image hashing against query-based adversarial attacks
