X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Yinan He, Ziwei Liu, Kun Wang, Jing Shao, Yu Qiao, Zhenfei Yin, Gengshi Huang, Lu Sheng, Jianing Teng, Siyu Chen

doi:10.1007/978-3-031-19809-0_29

Abstract

In computer vision, pre-trained models based on large-scale supervised learning have proven effective over the past few years. However, existing works mostly focus on learning from an individual task with a single data source (e.g., ImageNet for classification or COCO for detection). This restricted form limits their generalizability and usability because it lacks the rich semantic information available from diverse tasks and data sources. Here, we demonstrate that jointly learning from heterogeneous tasks and multiple data sources contributes to a universal visual representation, leading to better transfer results on various downstream tasks. Learning how to bridge the gaps among different tasks and data sources is therefore the key, but it remains an open question. In this work, we propose a representation learning framework called X-Learner, which learns a universal feature for multiple vision tasks supervised by various sources, in two stages: 1) Expansion Stage: X-Learner learns task-specific features to alleviate task interference and enriches the representation through a reconciliation layer. 2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal, generalizable representation for transfer to various tasks. Extensive experiments demonstrate that X-Learner achieves strong performance on different tasks without extra annotations, modalities, or computational costs compared to existing representation learning methods. Notably, a single X-Learner model shows remarkable gains of 3.0%, 3.3% and 1.8% over current pre-trained models on 12 downstream datasets for classification, object detection and semantic segmentation.
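The two-stage idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the architecture, the form of the reconciliation layer, or the squeeze objective, so everything below is an assumption made for illustration: features are plain linear maps, the hypothetical `ExpandedModel` adds one residual branch per task on top of a shared trunk, and the squeeze step is stood in for by feature mimicking (fitting one compact matrix to the average of the task-specific features). It is not the authors' implementation.

```python
import random

def matvec(m, x):
    # Multiply a square matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def rand_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

class ExpandedModel:
    """Expansion stage (assumed form): shared trunk plus one branch per task."""
    def __init__(self, tasks, dim, rng):
        self.shared = rand_matrix(dim, rng)
        self.branches = {t: rand_matrix(dim, rng) for t in tasks}

    def features(self, x, task):
        h = matvec(self.shared, x)
        # Hypothetical stand-in for a reconciliation layer: a residual
        # connection that adds the task-specific branch to the shared feature.
        r = matvec(self.branches[task], h)
        return [a + b for a, b in zip(h, r)]

def teacher_target(model, x):
    # Average the task-specific features into one target vector.
    feats = [model.features(x, t) for t in model.branches]
    return [sum(col) / len(feats) for col in zip(*feats)]

def mimic_loss(student, model, data):
    total = 0.0
    for x in data:
        pred, target = matvec(student, x), teacher_target(model, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(data)

def squeeze(model, data, dim, lr=0.05, epochs=200):
    """Squeeze stage (assumed form): fit one compact matrix by SGD."""
    student = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            target = teacher_target(model, x)
            pred = matvec(student, x)
            for i in range(dim):
                err = pred[i] - target[i]
                for j in range(dim):
                    student[i][j] -= lr * err * x[j]
    return student

rng = random.Random(0)
DIM = 4
TASKS = ["classification", "detection", "segmentation"]
expanded = ExpandedModel(TASKS, DIM, rng)
data = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(32)]

loss_before = mimic_loss([[0.0] * DIM for _ in range(DIM)], expanded, data)
student = squeeze(expanded, data, DIM)
loss_after = mimic_loss(student, expanded, data)
print(loss_before, loss_after)
```

In this toy setting the expanded model holds one matrix per task plus the trunk, while the squeezed student holds a single matrix. That mirrors only the high-level claim that the squeeze stage condenses the model to a reasonable size.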
