Abstract

Tabular data is a widely used data form in many fields such as product marketing. In some cases, the domain shift between source and target domain of tabular data may occur with the changing of collection conditions such as time. The extant methods on tabular data mainly consist of neural-network-based methods and tree-based methods. They both meet challenges induced by domain shift on tabular data. First, neural-network-based methods are lack of effective mechanism to extract the features of tabular data and the performance may not be higher than tree-based models. Second, tree-based methods are lack of effective feature representations to model the associations between source domain and target domain. To improve the performance of tree-based methods for domain shift, a novel pseudo-label based domain adaptation method is proposed for the tree-based method called Xgboost. The proposed method consists of pseudo-label generation and selection strategies. The pseudo-label generation strategy can control the effects of pseudo-labels on Xgboost in a more flexible way by setting proper values of pseudo-labels. The pseudo-label selection strategy can select the pseudo-labels with high confidences under a consistency condition based on the outputs of Xgboost. The quality of pseudo-labels for the data in target domain is improved and so does the performance of Xgboost trained by the data in both source domain and target domain. In the experiment, several UCI datasets and 5G terminal datasets are used to show that the proposed methods can effectively improve the performance of Xgboost.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call