Existing cross-domain classification and detection methods usually apply a consistency constraint between a target sample and its self-augmentation for unsupervised learning, without considering the essential source knowledge. In this paper, we propose a Source-guided Target Feature Reconstruction (STFR) module for cross-domain visual tasks, which applies source visual words to reconstruct target features. Since the reconstructed target features contain source knowledge, they can serve as a bridge between the source and target domains; using them for consistency learning therefore enhances the target representation and reduces the domain bias. Technically, source visual words are selected and updated according to the source feature distribution and applied to reconstruct a given target feature via a weighted combination strategy. After that, consistency constraints are built between the reconstructed and original target features for domain alignment. Furthermore, STFR is theoretically connected with the optimal transport algorithm, which explains the rationality of the proposed module. Extensive experiments on nine benchmarks and two cross-domain visual tasks demonstrate the effectiveness of the proposed STFR module, e.g., 1) cross-domain image classification: average accuracies of 91.0%, 73.9%, and 87.4% on Office-31, Office-Home, and VisDA-2017, respectively; 2) cross-domain object detection: an mAP of 44.50% on Cityscapes → Foggy Cityscapes, an AP of 78.10% for the car category on Cityscapes → KITTI, and MR⁻² of 8.63%, 12.27%, 22.10%, and 40.58% on COCOPersons → Caltech, CityPersons → Caltech, COCOPersons → CityPersons, and Caltech → CityPersons, respectively.
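To make the reconstruction step concrete, the sketch below illustrates one plausible reading of the weighted combination strategy: each target feature is softly assigned to a bank of source visual words by similarity, reconstructed as their weighted sum, and tied to the original feature by a consistency loss. The function names, the cosine-similarity weighting, the temperature, and the MSE consistency loss are all illustrative assumptions, not the authors' exact design; likewise, how the visual words are selected and updated from the source feature distribution (e.g., by clustering or a momentum-updated memory) is omitted here.

```python
# Minimal sketch of source-guided target feature reconstruction.
# Assumptions (not from the paper): cosine-similarity soft assignment,
# a softmax temperature of 0.1, and an MSE consistency loss.
import torch
import torch.nn.functional as F

def reconstruct_with_source_words(target_feats, source_words, temperature=0.1):
    """Reconstruct target features as a weighted combination of source visual words.

    target_feats: (N, D) target-domain features
    source_words: (K, D) visual words summarizing the source feature distribution
    """
    # Cosine similarity between each target feature and each source visual word.
    sim = F.normalize(target_feats, dim=1) @ F.normalize(source_words, dim=1).t()  # (N, K)
    # Soft assignment weights over the source visual words.
    weights = F.softmax(sim / temperature, dim=1)
    # Weighted combination: each target feature is rebuilt from source knowledge.
    return weights @ source_words  # (N, D)

def consistency_loss(target_feats, source_words):
    """Consistency constraint between reconstructed and original target features."""
    recon = reconstruct_with_source_words(target_feats, source_words)
    return F.mse_loss(recon, target_feats)
```

Because the reconstruction lies in the span of the source visual words, minimizing this consistency term pulls target features toward source-compatible representations, which is one way to interpret the domain-alignment effect described above.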