Abstract

交互的特征是指那些分开考虑对目标集不相关或弱相关,但合在一起考虑却对目标集高度相关的特征。特征交互现象广泛存在,但找出有交互作用的特征却是一项具有挑战性的任务。本文旨在对基于聚类的FAST特征选择算法进行改进,在其基础上考虑特征的交互作用,首先去掉FAST的移除不相关特征的部分,接着加入交互权值变量,使得在移除不相关和冗余特征的同时,保留有交互作用的特征。为了对两个算法进行对比分析,我们选取了5个不同领域的16个公开数据集进行实证分析,并使用4种分类器对实验结果进行评估,包括C5.0、Bayes Net、Neural Net和Logistic,接着从选择的特征个数、算法运行时间和分类器的准确率3个方面对两个算法进行比较。实验结果表明,两者选择的特征个数相差不大,有时IWFAST甚至可以减少特征个数,同时IWFAST能提高分类器的准确率,尤其对于特征数量较多的情形,以及Game和Life领域。美中不足的是,IWFAST的运行时间较长,但仍在可接受的范围内。 Interacting features are those appear to be irrelevant or weakly relevant with the class individually, but it may highly correlate to the class when it combined. Feature interaction is almost everywhere, but discovering feature interaction is a challenging task in feature selection. The purpose of this paper is to improve the FAST feature selection algorithm based on cluster by considering feature interaction. Firstly, deleted the irrelevant feature removal section, then brought in an interaction weight factor, so that we can retain interacted features when removed the irrelevant and redundant ones. In order to do the comparison between this two algorithms, we selected 16 public data sets which cover 5 different domains on the empirical analysis, and used 4 types of classifier to evaluate the results, namely, C5.0, Bayes Net, Neural Net and Logistic. Finally, we compared these two algorithms according to the number of selected features, running time of algorithm and the accuracy of classifier. The experimental result showed that it has little difference on the number of selected features, and sometimes IWFAST can produce smaller subsets of features. Meanwhile, IWFAST can improve the accuracy of the classifier, especially for the high- dimensional data set, or especially for the Game and Life area. The defect is that the running time of IWFAST is long, but is acceptable computational complexity.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call