Abstract

With the existing abundance of intelligent and expert systems, there is a need for selecting a subset of highly relevant features with low redundancy. In filter approaches, the feature subsets are iteratively computed by evaluating the candidate features in terms of their relevance to the target class and their pairwise redundancies. The use of mutual information-based metrics has been extensively studied as an approach to quantifying the relevance and redundancy of candidate features. In this study, a novel filter approach based on the ranks of positive instances is proposed. In this approach, redundancy is replaced by diversity, which quantifies the complementarity of a candidate feature with respect to the already selected subset. Both relevance and diversity are computed in terms of the ranks of positive instances, analogous to the computation of the area under the receiver operating characteristic curve (AUC). Experiments conducted on 15 UCI and microarray gene expression data sets confirm that the proposed multivariate filter feature selection approach achieves better performance scores than competing multivariate methods as well as benchmark univariate filters.
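To illustrate the rank-based computation that the abstract refers to, the following minimal sketch computes the AUC of a single feature from the ranks of the positive instances, using the standard Mann-Whitney rank-sum identity. It is not the relevance or diversity measure proposed in the study, whose exact definitions are not given here. The function names `average_ranks` and `rank_auc` are hypothetical, and the sketch assumes binary labels with 1 marking the positive class.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_auc(feature, labels):
    """AUC of one feature, computed from the ranks of the positive instances.

    Assumption: labels are binary, with 1 denoting the positive class.
    Uses AUC = (R_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg),
    where R_pos is the sum of the ranks of the positive instances.
    """
    ranks = average_ranks(feature)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy data (illustrative only, not taken from the study).
feature = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(rank_auc(feature, labels))
```

In this toy example the positive instances receive ranks 2 and 4, so three of the four positive-negative pairs are ordered correctly. A univariate filter could rank features by such a score; how the proposed method combines it with diversity is left to the full text.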
