Abstract
Unsupervised domain adaptation (UDA) is an important research topic in semantic segmentation tasks, wherein pixel-wise annotations are often difficult to collect in a test environment due to their high labeling costs. Previous UDA-based studies trained their segmentation networks using labeled synthetic data and unlabeled realistic data as source and target domains, respectively. However, they often fail to distinguish semantically similar classes, such as person vs. rider and road vs. sidewalk, because these classes are prone to confusion in domain-shifted environments. In this paper, we introduce a Language-Conditioned Masked Segmentation Model (LC-MSM), which is a new framework for the joint learning of context relations and domain-agnostic information for domain-adaptive semantic segmentation. Specifically, we reconstruct semantic labels with masked image conditions on the generalized text embeddings of the corresponding semantic class from OpenCLIP, which contains domain-invariant knowledge from large-scale data. To this end, we correlate the generalized text embeddings onto the per-pixel image feature of a masked image that learned the spatial context to further append domain-agnostic language information to the semantic decoder. This facilitates the generalization of our model to the target domain via the learning of context information within individual training instances, while considering cross-domain representations spanning the entire dataset. LC-MSM achieves an unprecedented UDA performance of 71.8 and 62.8 mIoU on GTA→Cityscapes and SYNTHIA→Cityscapes, respectively, which corresponds to an improvement of +3.5 and +1.9 percent points over the baseline method.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.