Abstract

In various applications such as spam mail classification, the performance of classifiers deteriorates over time. Although retraining classifiers using labeled data helps to maintain the performance, continuously preparing labeled data is quite expensive. In this paper, we propose a method to learn classifiers by using newly obtained unlabeled data, which are easy to prepare, as well as labeled data collected beforehand. A major reason for the performance deterioration is the emergence of new features that do not appear in the training phase. Another major reason is the change of the distribution between the training and test phases. The proposed method learns the latest classifiers that overcome both problems. With the proposed method, the conditional distribution of new features given existing features is learned using the unlabeled data. In addition, the proposed method estimates the density ratio between training and test distributions by using the labeled and unlabeled data. We approximate the classification error of a classifier, which exploits new features as well as existing features, at the test phase by incorporating both the conditional distribution of new features and the densityratio, simultaneously. By minimizing the approximated error while integrating out new feature values, we obtain a classifier that exploits new features and fits on the test phase. The effectiveness of the proposed method is demonstrated with experiments using synthetic and real-world data sets.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.