Abstract

This paper addresses universal visual recognition, a challenging and practically important setting in which unlabeled data may contain novel classes absent from the labeled set and may follow a feature distribution that differs from that of the labeled data. The core challenge is to simultaneously recognize instances of known classes and discover novel classes within the unlabeled data, a difficulty compounded by two factors: category shift and distribution shift. Existing methods typically overlook the intrinsic structural relationship between labeled and unlabeled data, and they fail to separate the semantics of novel classes from those of known classes. To address these challenges, we introduce MESA (short for Manifold structure Exploitation with double Spaces dual Alignment), a novel framework for universal visual recognition. Our approach first identifies geometrically and semantically related nearest-neighbor pairs between labeled and unlabeled data, which serve as anchors establishing an intrinsic correspondence between the two sets. Weighting each anchor pair with a similarity-based affinity score, we construct a cross-view, anchor-guided self-supervised learning objective that propagates label information from known classes and incorporates novel semantics in the prediction space. To improve matching accuracy for known classes and sharpen the separation of novel classes, we further introduce a cross-view, confident-prototype-driven self-supervised objective that reveals the global category structure in the embedding space. Together, these components form a bidirectional dual-alignment mechanism operating simultaneously in the embedding and prediction spaces, making MESA more robust to both distribution and category shifts. Extensive experiments on several large-scale image recognition benchmarks show that MESA outperforms state-of-the-art methods in universal visual recognition. MESA is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/AimeeStar/MESA.
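
To make the two alignment objectives concrete, the sketch below illustrates, under our own assumptions, how cross-set anchor matching with an affinity score and a confident-prototype loss might look in PyTorch. The function names, the cosine-similarity affinity, the confidence threshold, and the temperature are illustrative choices rather than the authors' implementation (see the repository above for the official code), and the cross-view augmentation machinery is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def anchor_pairs_with_affinity(z_labeled, z_unlabeled):
    """For each unlabeled embedding, find its nearest labeled neighbor
    (the anchor) and return a cosine-similarity affinity score for the pair."""
    zl = F.normalize(z_labeled, dim=1)     # (Nl, d) labeled embeddings
    zu = F.normalize(z_unlabeled, dim=1)   # (Nu, d) unlabeled embeddings
    sim = zu @ zl.t()                      # (Nu, Nl) pairwise cosine similarity
    affinity, anchor_idx = sim.max(dim=1)  # nearest labeled anchor per unlabeled point
    return anchor_idx, affinity

def anchor_guided_loss(p_unlabeled, p_labeled, anchor_idx, affinity):
    """Prediction-space alignment: affinity-weighted cross-entropy between an
    unlabeled sample's prediction and its anchor's (detached) prediction."""
    targets = p_labeled[anchor_idx].detach()             # anchors act as soft targets
    ce = -(targets * torch.log(p_unlabeled + 1e-8)).sum(dim=1)
    return (affinity * ce).mean()

def confident_prototype_loss(z, pseudo_labels, confidence, num_classes,
                             tau=0.07, thresh=0.9):
    """Embedding-space alignment: build class prototypes from high-confidence
    samples, then pull every embedding toward its (pseudo-)class prototype."""
    z = F.normalize(z, dim=1)                            # (N, d)
    keep = confidence > thresh                           # confident samples only
    protos = torch.zeros(num_classes, z.size(1), device=z.device)
    protos.index_add_(0, pseudo_labels[keep], z[keep])   # sum confident embeddings per class
    protos = F.normalize(protos, dim=1)                  # mean direction per class
    logits = z @ protos.t() / tau                        # similarity to each prototype
    return F.cross_entropy(logits, pseudo_labels)
```

In this reading, the anchor-guided loss aligns the prediction space while the prototype loss shapes the embedding space, mirroring the bidirectional dual-alignment idea the abstract describes.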
