Abstract

Domain adaptation aims to extract knowledge from an auxiliary source domain to assist the learning task in a target domain. When the data distribution of the target domain differs from that of the source domain, directly using the source data to build a classifier for the target task often yields poor performance. In this work, we propose a novel unsupervised domain adaptation method called Feature Selection for Domain Adaptation (FSDA), which selects a set of informative features. The benefits are two-fold. The first is to reduce the mismatch between the data distributions of the source and target domains by selecting features on which the two domains share similar properties. The second is to remove noisy features in the source domain so that learning performance can be enhanced. We formulate a new sparse learning model with structured multiple outputs, including a vector that selects informative features to jointly minimize the domain discrepancy and eliminate noisy features, and a classifier that performs prediction on the selected features. We develop a cutting-plane algorithm to solve the resulting optimization problem. Extensive experiments on real-world data sets demonstrate the effectiveness of the proposed method compared with existing methods.

Highlights

  • In standard machine learning, obtaining an effective classifier usually requires collecting and labeling a large amount of training data, which is often labor-intensive and expensive

  • We develop a cutting-plane algorithm to iteratively pick up informative features and train the classifier on them

  • In the domain adaptation scenario, to pick up informative features, we introduce a binary vector β ∈ {0, 1}^d that scales an instance x via the element-wise product x ⊙ β, where a value of 1 indicates that the corresponding feature is selected
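The idea in the highlights above can be illustrated with a small sketch. Here the binary mask β keeps only the selected features, and the domain discrepancy on the selected features is measured with a linear-kernel Maximum Mean Discrepancy (the squared distance between domain means); the paper's actual discrepancy term and optimization may differ, and `select_features` and `mmd_linear` are illustrative helper names, not functions from the paper.

```python
import numpy as np

def select_features(X, beta):
    """Keep only the features where the binary mask beta is 1."""
    return X[:, beta.astype(bool)]

def mmd_linear(Xs, Xt):
    """Empirical MMD with a linear kernel: squared distance
    between the source and target feature means."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)

# Toy data: 5 features; features 1 and 3 are shifted in the target domain,
# so they cause a distribution mismatch between the two domains.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 5))
Xt = Xs + np.array([0.0, 3.0, 0.0, 3.0, 0.0])

beta = np.array([1, 0, 1, 0, 1])  # drop the two mismatched features

print(mmd_linear(Xs, Xt))          # discrepancy over all features (large)
print(mmd_linear(select_features(Xs, beta),
                 select_features(Xt, beta)))  # discrepancy after selection (near zero)
```

Choosing β to make the second quantity small, while a classifier trained on the masked source data stays accurate, captures the joint objective described in the abstract.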

Summary

Introduction

In order to obtain an effective classifier, one usually has to collect and label a large amount of training data, which is often labor-intensive and expensive. To reduce the effort of collecting labeled training data in a target domain of interest, domain adaptation is employed to leverage abundant labeled data from an auxiliary source domain [1]–[3]. According to the availability of labeled target data, domain adaptation can be divided into two categories: the supervised setting, which requires some labeled target data for training [12], and the unsupervised setting, which uses only unlabeled target data for training [13], [14]. We consider unsupervised domain adaptation, which is more challenging than the supervised setting.


