Abstract

This paper addresses universal visual recognition, a challenging and practically important setting in which unlabeled data may contain novel classes absent from the labeled set and may follow a feature distribution that differs from that of the labeled data. The core challenge is to simultaneously recognize instances of known classes and discover novel classes within the unlabeled data, a difficulty compounded by two factors: category shift and distribution shift. Existing methods typically overlook the intrinsic structural relationship between labeled and unlabeled data, and they fail to separate the semantics of novel classes from those of known classes. To address these challenges, we introduce MESA (short for Manifold structure Exploitation with double Spaces dual Alignment), a novel framework for universal visual recognition. Our approach first identifies geometrically and semantically related nearest-neighbor pairs between labeled and unlabeled data, which serve as anchors establishing an intrinsic correspondence between the two sets. Weighting each anchor pair with a similarity-based affinity score, we construct a cross-view, anchor-guided self-supervised learning objective that propagates label information from known classes and incorporates novel semantics in the prediction space. To improve matching accuracy for known classes and sharpen the separation of novel classes, we further introduce a cross-view, confident-prototype-driven self-supervised objective that reveals the global category structure in the embedding space. Together, these components form a bidirectional dual-alignment mechanism operating simultaneously in the embedding and prediction spaces, making MESA more robust to both distribution and category shifts. Extensive experiments on several large-scale image recognition benchmarks show that MESA outperforms state-of-the-art methods in universal visual recognition. MESA is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/AimeeStar/MESA.
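
To make the two alignment objectives concrete, the sketch below illustrates, under our own assumptions, how cross-set anchor matching with an affinity score and a confident-prototype loss might look in PyTorch. The function names, the cosine-similarity affinity, the confidence threshold, and the temperature are illustrative choices rather than the authors' implementation (see the repository above for the official code), and the cross-view augmentation machinery is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def anchor_pairs_with_affinity(z_labeled, z_unlabeled):
    """For each unlabeled embedding, find its nearest labeled neighbor
    (the anchor) and return a cosine-similarity affinity score for the pair."""
    zl = F.normalize(z_labeled, dim=1)     # (Nl, d) labeled embeddings
    zu = F.normalize(z_unlabeled, dim=1)   # (Nu, d) unlabeled embeddings
    sim = zu @ zl.t()                      # (Nu, Nl) pairwise cosine similarity
    affinity, anchor_idx = sim.max(dim=1)  # nearest labeled anchor per unlabeled point
    return anchor_idx, affinity

def anchor_guided_loss(p_unlabeled, p_labeled, anchor_idx, affinity):
    """Prediction-space alignment: affinity-weighted cross-entropy between an
    unlabeled sample's prediction and its anchor's (detached) prediction."""
    targets = p_labeled[anchor_idx].detach()             # anchors act as soft targets
    ce = -(targets * torch.log(p_unlabeled + 1e-8)).sum(dim=1)
    return (affinity * ce).mean()

def confident_prototype_loss(z, pseudo_labels, confidence, num_classes,
                             tau=0.07, thresh=0.9):
    """Embedding-space alignment: build class prototypes from high-confidence
    samples, then pull every embedding toward its (pseudo-)class prototype."""
    z = F.normalize(z, dim=1)                            # (N, d)
    keep = confidence > thresh                           # confident samples only
    protos = torch.zeros(num_classes, z.size(1), device=z.device)
    protos.index_add_(0, pseudo_labels[keep], z[keep])   # sum confident embeddings per class
    protos = F.normalize(protos, dim=1)                  # mean direction per class
    logits = z @ protos.t() / tau                        # similarity to each prototype
    return F.cross_entropy(logits, pseudo_labels)
```

In this reading, the anchor-guided loss aligns the prediction space while the prototype loss shapes the embedding space, mirroring the bidirectional dual-alignment idea the abstract describes.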
