Abstract

With the emerging of new data collection ways, the features are incremental and accumulated gradually. Due to the expansion of feature spaces, it is more common that there are unknown biases between the distribution of training and testing datasets. It is known as the unknown data selection bias, which belongs to the learning scenario with non-i.i.d samples. The performance of traditional approaches, which need the i.i.d. assumption, will be aggravated seriously. How to design an algorithm to address the problem of data selection bias in this feature incremental scenario is crucial but rarely studied. In this paper, we propose a feature incremental classification algorithm with causality. Firstly, we embed the confounding variable balance algorithm in causal learning into the prediction modeling and utilize the logical regression algorithm with balancing regular terms as a baseline. Then, to satisfy the special requirement of feature increment, we design a new regularizer, which maintains the consistency of the regression coefficients between the data in the current and previous stages. It retains the correlation between the old features and labels. Finally, we propose the Multiple Balancing Logistic Regression model (MBRLR) to jointly optimize the balancing regularizer and weighted logistic regression model with multiple feature sets. We also present theoretical results to show that our proposed algorithm can make precise and stable predictions. Besides, the numerical results also demonstrate that our MBRLR algorithm is superior to other methods.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.