Abstract

Dimensionality reduction is a necessary task in data mining when working with high-dimensional data. One type of dimensionality reduction is feature selection. Feature selection based on feature ranking has received much attention from researchers, mainly because of its scalability, ease of use, and fast computation. Feature ranking methods fall into different categories and may use different measures to rank features. Recently, ensemble methods have entered the field of ranking and achieved higher accuracy than other approaches. Accordingly, this paper proposes a heterogeneous ensemble-based algorithm for feature ranking. The base ranking methods in this ensemble are chosen from different categories, such as information-theoretic, distance-based, and statistical methods. The results of the base ranking methods are then fused into a final feature subset by means of a genetic algorithm. The diversity of the base methods improves the quality of the genetic algorithm's initial population and thus reduces its convergence time. In most ranking methods, it is the user's task to determine the threshold for choosing an appropriate subset of features, which may force the user to try many different values before finding a good one. The proposed algorithm reduces the difficulty of determining a proper threshold. The performance of the algorithm is evaluated on four different text datasets, and the experimental results show that the proposed method outperforms all five feature ranking methods used for comparison. One advantage of the proposed method is that it is independent of the classification method used.
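
The fusion step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the encoding, operators, or fitness function, so everything here is an assumption. In particular, `seed_population`, `evolve`, the binary-mask encoding, the one-point crossover, the use of several top-k cut-offs in place of a single user threshold, and the toy rankings and fitness are all hypothetical choices made for illustration.

```python
import random

def seed_population(rankings, n_features, sizes):
    """Build initial binary masks from the top-k features of each base ranking.

    Assumption: several cut-offs k are used so no single threshold is required.
    """
    population = []
    for ranking in rankings:
        for k in sizes:
            top = set(ranking[:k])
            population.append([1 if i in top else 0 for i in range(n_features)])
    return population

def evolve(population, fitness, generations=30, mutation_rate=0.05, seed=0):
    """Simple elitist genetic algorithm over binary feature masks."""
    rng = random.Random(seed)
    pop = [list(ind) for ind in population]
    n = len(pop[0])
    for _ in range(generations):
        # Keep the better half (at least two individuals) unchanged.
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: max(2, len(scored) // 2)]
        children = []
        while len(elite) + len(children) < len(pop):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Hypothetical rankings (best feature first) from three base rankers on 8 features.
rankings = [
    [0, 3, 1, 5, 2, 4, 6, 7],
    [3, 5, 0, 2, 7, 1, 4, 6],
    [5, 0, 4, 3, 6, 2, 1, 7],
]
relevant = {0, 3, 5}  # toy ground truth, used only by the toy fitness below

def toy_fitness(mask):
    # Reward selected relevant features, penalize subset size.
    return 2 * sum(mask[i] for i in relevant) - sum(mask)

initial = seed_population(rankings, n_features=8, sizes=(2, 4))
best_mask = evolve(initial, toy_fitness)
```

In a real setting the toy fitness would be replaced by whatever subset-quality measure is appropriate, for example the validation accuracy of a classifier trained on the selected features. Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.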
