Abstract

Multi-source causal feature selection captures the causal relevance of features to the class attribute across different datasets and is important for improving the stability and reliability of prediction models. Multi-source Causal Feature Selection (MCFS) is the most advanced method that can select features on multiple datasets simultaneously. However, it considers only the causal relevance between a single feature and the class attribute, ignoring the causal relevance among multiple features. In addition, MCFS uses an exhaustive search to obtain the optimal causal feature set on multiple datasets, which is time-consuming. To address these two problems, we first propose Multiple Causal Relevance, which can remove redundant information hidden in pairwise causal relevance. Second, we analyze the Markov blanket of the multi-source class attribute and prove upper and lower bounds on the optimal causal feature set, which narrow the feature search range and improve the efficiency of the algorithm. Finally, we propose a multi-source causal Feature Selection method based on Multiple Causal Relevance (MCRFS). Extensive experiments on synthetic datasets and on real binary and multi-class classification datasets, compared against two feature selection methods, show that MCRFS achieves better accuracy and efficiency than both comparison methods with SVM and KNN classifiers.
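To illustrate why bounds on the optimal feature set shrink the search, the following is a minimal sketch and not the MCRFS algorithm itself. It assumes a lower-bound set and an upper-bound set of feature indices are already known, and it enumerates only the feature sets that lie between them. The names `bounded_subsets`, `best_subset`, and the toy `score` function are hypothetical and do not come from the paper.

```python
from itertools import combinations


def bounded_subsets(lower, upper):
    """Yield every feature set S with lower <= S <= upper (set inclusion)."""
    lower, upper = frozenset(lower), frozenset(upper)
    if not lower <= upper:
        raise ValueError("lower bound must be a subset of the upper bound")
    # Only features in the gap between the bounds are undecided.
    free = sorted(upper - lower)
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield lower | frozenset(extra)


def best_subset(lower, upper, score):
    """Return the highest-scoring candidate between the two bounds."""
    return max(bounded_subsets(lower, upper), key=score)


# Toy setup (assumed values): 10 features, bounds given as index sets.
n_features = 10
lower = {0, 1}
upper = {0, 1, 2, 3, 4}

candidates = list(bounded_subsets(lower, upper))
exhaustive_count = 2 ** n_features                  # all subsets of 10 features
bounded_count = 2 ** (len(upper) - len(lower))      # subsets between the bounds


# Hypothetical score: reward overlap with a target set, penalize extras.
def score(s):
    target = {0, 1, 3}
    return len(s & target) - len(s - target)


chosen = best_subset(lower, upper, score)
```

In this toy case the bounded search visits 8 candidate sets instead of the 1024 that an exhaustive search over 10 features would visit. In practice the scoring function would be whatever criterion the selection method optimizes.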
