Abstract

Feature selection is one of the crucial steps in supervised learning, and it influences the entire subsequent classification (or regression) process. The approaches to this task can largely be divided into two categories: filter-based and wrapper-based methods. Generally, wrapper-based methods produce better results than filter-based ones for a given learning method, though they consume more computational resources in searching the feature subset space. In this paper, we propose an Efficient wRapper based on a Paired t-Test (ERPT) for choosing features from large-scale data consisting of thousands of variables, such as microarrays. Statistical tests are a reasonable option when the number of features is very large because they behave more predictably and can be more efficient than most search methods. The proposed method consists of two phases: a decrement phase and an increment phase. In the decrement phase, it selects strongly relevant features. In the increment phase, it adds weakly relevant features, given the previously selected features. Our method, combined with naive Bayes classifiers, was tested in an extensive set of experiments on University of California Irvine (UCI) Machine Learning Repository data. The results showed that the performance of the proposed method is comparable to that of the backward search-based wrapper and superior to that of the forward search-based wrapper. Furthermore, it performed much better than the forward search-based wrapper on three microarray data sets, for which the backward search-based wrapper was impractical because of the computational burden involved. The proposed method has three merits: (1) it is applicable to data sets with thousands of variables, (2) it provides a theoretically sound and controllable criterion for thresholding features, and (3) it finds feature subsets that maximize classification performance on sparse domains.
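
The two-phase idea can be illustrated with a minimal sketch. This is not the authors' exact procedure, which the abstract does not specify. It assumes that "strongly relevant" means removing the feature from the full set significantly lowers per-fold accuracy, and that "weakly relevant" means adding the feature to the already selected set significantly raises it, with significance judged by a one-sided paired t-test over cross-validation folds. The names `paired_t`, `select_features`, `score_folds`, and `t_crit` are hypothetical; `score_folds` stands in for any learner (for example, a naive Bayes classifier) evaluated on the same folds for every feature subset, and the default `t_crit` of 1.833 is the one-sided 5% critical value for 9 degrees of freedom (10 folds).

```python
from math import inf, sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic for the mean of the paired differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    m, s = mean(d), stdev(d)
    if s == 0:  # all differences identical: no variance to scale by
        return 0.0 if m == 0 else (inf if m > 0 else -inf)
    return m / (s / sqrt(len(d)))


def select_features(all_features, score_folds, t_crit=1.833):
    """Two-phase selection sketch (an assumption, not the paper's algorithm).

    score_folds(features) must return one score per fold, using the same
    folds on every call so that the scores are paired.
    """
    full = list(all_features)
    base = score_folds(full)

    # Decrement phase: keep a feature if dropping it from the full set
    # significantly lowers the per-fold scores.
    selected = []
    for f in full:
        without_f = score_folds([g for g in full if g != f])
        if paired_t(base, without_f) > t_crit:
            selected.append(f)

    # Increment phase: one greedy pass that adds a remaining feature if it
    # significantly improves on the currently selected set.
    for f in full:
        if f in selected:
            continue
        current = score_folds(selected)
        if paired_t(score_folds(selected + [f]), current) > t_crit:
            selected.append(f)
    return selected


def toy_score_folds(features):
    """Hypothetical stand-in for a learner: correct predictions per fold.

    'a' helps on its own; 'b' and 'c' are redundant with each other;
    'd' carries no signal.
    """
    scores = []
    for fold in range(10):
        s = 50
        if "a" in features:
            s += 20 + fold % 3
        if "b" in features or "c" in features:
            s += 10 + fold % 2
        scores.append(s)
    return scores


chosen = select_features(["a", "b", "c", "d"], toy_score_folds)
print(chosen)
```

In the toy data, the decrement phase keeps only `a`, because removing `b` or `c` alone changes nothing while the other is still present. The increment phase then adds `b`, after which `c` no longer brings a significant gain. Each phase needs a number of learner evaluations that grows linearly with the number of features, which is the kind of cost that stays practical with thousands of variables.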

