Abstract

In traditional machine learning, predictions rest on the assumption that the data follow the same distribution in the training and test datasets. In real-world problems, however, this condition often does not hold; for instance, the distribution of the covariates may change while the conditional distribution of the targets given the covariates remains unchanged. This situation is known as covariate shift, and under it standard error estimation may no longer be accurate. In this context, the importance is a measure commonly used to alleviate the influence of covariate shift on error estimation. Its main drawback is that it is not easy to compute. The Kullback–Leibler Importance Estimation Procedure (KLIEP) has been shown to estimate the importance in a promising way. Despite its good performance, this procedure ignores target information, since it uses only the covariates to compute the importance. This paper therefore explores the potential improvement in performance when information about the targets is taken into account in the computation of the importance, which requires redefining the importance so that it generalizes in this way. Besides the potential improvement in performance, including target information makes it possible to apply the method to the real-world plankton classification problem that motivates this research and that is characterized by its high dimensionality, since considering targets rather than covariates reduces both the computation and the noise in the covariates. The impact of taking target information into account is also explored when the importance is estimated by Logistic Regression (LR), Kernel Mean Matching (KMM), Ensemble Kernel Mean Matching (EKMM), and Kernel Density Estimation (KDE), the naive predecessor of KLIEP. The experimental results lead to the conclusion that error estimation is more accurate with target information when either densities or probabilities are involved in the importance computation, especially for KLIEP, the more promising method.
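To make the role of the importance concrete, the sketch below shows importance-weighted error estimation on a toy example. It is only an illustration under stated assumptions, not the paper's procedure: the importance is taken as the ratio of test to training frequencies of a single discrete value, which stands in for KLIEP, KMM, or the other estimators named above. The function names `estimate_importance` and `weighted_error` and all data are hypothetical. The discrete value could be a discretized covariate or, in the spirit of the abstract, a target label; which one the paper uses, and how, is not stated here.

```python
from collections import Counter


def estimate_importance(train_values, test_values):
    """Empirical ratio p_test(v) / p_train(v) for every value v seen in training."""
    train_freq = Counter(train_values)
    test_freq = Counter(test_values)
    n_train, n_test = len(train_values), len(test_values)
    return {
        v: (test_freq.get(v, 0) / n_test) / (count / n_train)
        for v, count in train_freq.items()
    }


def weighted_error(train_losses, train_values, importance):
    """Importance-weighted mean of per-example training losses."""
    total = sum(importance[v] * loss for v, loss in zip(train_values, train_losses))
    return total / len(train_losses)


# Toy data (made up): value "b" is rare in training but common at test time.
train_values = ["a", "a", "a", "b"]
test_values = ["a", "b", "b", "b"]
# Hypothetical 0/1 losses on the training set: the model errs only on "b".
train_losses = [0.0, 0.0, 0.0, 1.0]

importance = estimate_importance(train_values, test_values)
plain_error = sum(train_losses) / len(train_losses)
shifted_error = weighted_error(train_losses, train_values, importance)
```

With these toy numbers the unweighted training error is 0.25, while the importance-weighted estimate is 0.75, which is the error rate the same model would have on the test sample if it kept failing on every "b".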

