Abstract

In this work, we address the problem of supervised feature selection (FS) for high-dimensional datasets with a small number of instances. We propose a novel heuristic FS approach, Conditional Priority Coverage Maximization (CPCM), which seeks to leverage the local information provided by the small set of instances. We define the vote assigned by an instance to a feature as the local relevance of that feature, and we show that the proposed voting scheme is asymptotically related to the Bayes’ decision rule for minimum-risk classification. Next, we exploit the instance votes for feature selection by posing it as a set-covering problem: we seek a subset of features that together cover the instances. This approach selects relevant features while avoiding redundant ones. In addition, we formulate a stopping criterion that yields a compact subset of features. Through experiments on synthetic and real datasets, we demonstrate that CPCM outperforms other graph-based FS techniques and state-of-the-art FS approaches employing mutual information (MI). Further, we evaluate the stability of CPCM under minor variations in the training data and find it to be reasonably robust.
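
The set-covering view of feature selection can be illustrated with a generic greedy maximum-coverage heuristic. The sketch below is not the CPCM algorithm itself: the binary vote matrix, the function name `greedy_cover`, and the stopping rule (stop once no remaining feature covers a new instance) are all assumptions made for illustration, and the abstract does not specify how votes are computed or prioritized.

```python
def greedy_cover(votes):
    """Greedy maximum-coverage selection over a binary vote matrix.

    votes[i][j] is True when instance i votes for feature j.
    This input format is a hypothetical simplification, not the paper's.
    Returns (selected feature indices, set of instances left uncovered).
    """
    n_instances = len(votes)
    n_features = len(votes[0]) if votes else 0
    uncovered = set(range(n_instances))
    selected = []

    while uncovered:
        # Find the unselected feature covering the most uncovered instances.
        best_feature, best_gain = None, 0
        for j in range(n_features):
            if j in selected:
                continue
            gain = sum(1 for i in uncovered if votes[i][j])
            if gain > best_gain:
                best_feature, best_gain = j, gain

        # Assumed stopping rule: no feature adds coverage, so stop.
        if best_feature is None:
            break

        selected.append(best_feature)
        uncovered -= {i for i in uncovered if votes[i][best_feature]}

    return selected, uncovered


# Toy example: feature 2 only covers instances that features 0 and 1
# already cover, and instance 3 votes for no feature.
toy_votes = [
    [True, False, True],
    [True, False, False],
    [False, True, True],
    [False, False, False],
]
selected, uncovered = greedy_cover(toy_votes)
```

In this toy setting, a feature whose votes come only from already-covered instances has zero marginal gain and is never picked, which is one simple way a coverage objective can discourage redundant features.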
