Abstract

Effective feature selection improves the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has sought to improve the performance of individual feature selection methods, yet it remains difficult to devise a single method that outperforms the others in most cases. In this paper, we explore whether overall performance can be improved by combining multiple feature selection methods. In particular, we propose combining them using an information fusion paradigm called Combinatorial Fusion Analysis (CFA). A rank-score function and its associated graph, called the rank-score graph, are adopted to measure the diversity of different feature selection methods. Our experimental results demonstrate that a combination of multiple feature selection methods can outperform a single method only if each individual method has unique scoring behavior and relatively high performance. Moreover, the rank-score function and rank-score graph are shown to be useful for selecting which feature selection methods to combine.
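As a rough illustration of the rank-score idea, the following Python sketch treats each feature selection method as a scoring system over terms. It is not the paper's implementation. The term scores, the method names, the min-max normalization, the use of mean absolute difference as a diversity measure, and the averaging of normalized scores are all assumptions made for the example.

```python
# Hypothetical scores assigned to four terms by two feature selection methods.
# The values are invented for illustration only.
chi_square = {"goal": 9.0, "match": 6.0, "the": 1.0, "vote": 5.0}
info_gain = {"goal": 0.5, "match": 0.125, "the": 0.0, "vote": 0.375}


def normalize(scores):
    """Min-max scale a {term: score} mapping into [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def rank_score_function(scores):
    """Return the normalized scores in descending order.

    Element i - 1 of the result is the score held by the term at rank i,
    which is the mapping a rank-score graph would plot.
    """
    return sorted(normalize(scores).values(), reverse=True)


def diversity(scores_a, scores_b):
    """Mean absolute gap between two rank-score functions (assumed measure)."""
    fa = rank_score_function(scores_a)
    fb = rank_score_function(scores_b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def score_combination(*systems):
    """Average the normalized scores of several methods, term by term."""
    normed = [normalize(s) for s in systems]
    return {t: sum(n[t] for n in normed) / len(normed) for t in normed[0]}


combined = score_combination(chi_square, info_gain)
ranking = sorted(combined, key=combined.get, reverse=True)
print("diversity:", diversity(chi_square, info_gain))
print("combined ranking:", ranking)
```

A larger diversity value here means the two methods distribute their scores across ranks more differently, which is the kind of unique scoring behavior the abstract ties to a successful combination.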
