Abstract

High-dimensional text data typically contain a large number of noisy terms that degrade text classification performance. Feature selection is a common solution for dimensionality reduction in text classification, and the choice of feature selection method has a significant impact on classification accuracy. According to our literature review, few recent studies of feature selection focus on comparing the performance of feature selection methods. To fill this gap, this paper compares the performance of typical feature selection methods commonly used in previous text classification studies. First, we introduce and discuss in detail a series of typical feature selection methods from previous studies on text classification. Second, we conduct experiments on four benchmark datasets to compare the effectiveness of twenty typical feature selection methods in text classification. Finally, we draw conclusions on the performance of these methods. The results of this paper provide guidelines for selecting appropriate feature selection methods in both academic text classification research and real-world text classification applications.
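The abstract does not list the twenty methods it compares. As a hedged illustration of what a filter-style feature selection method does, the following sketch scores each term with the chi-square statistic (one widely used criterion, chosen here as an assumption) on a 2x2 term/class contingency table, takes the maximum score over classes, and keeps the top k terms. The function names `chi_square` and `select_features`, the toy corpus, and the max-over-classes aggregation are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document): no evidence
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs: Sequence[Set[str]], labels: Sequence[str],
                    k: int) -> List[Tuple[str, float]]:
    """Hypothetical filter: score terms by max chi-square over classes, keep top k."""
    n = len(docs)
    class_sizes = Counter(labels)
    # Document frequency of each term, overall and per class
    df: Counter = Counter()
    df_class: Dict[str, Counter] = {c: Counter() for c in class_sizes}
    for terms, label in zip(docs, labels):
        for t in terms:
            df[t] += 1
            df_class[label][t] += 1
    scores: Dict[str, float] = {}
    for t, df_t in df.items():
        best = 0.0
        for c, size in class_sizes.items():
            a = df_class[c][t]   # in class, has term
            b = df_t - a         # outside class, has term
            cc = size - a        # in class, lacks term
            d = n - size - b     # outside class, lacks term
            best = max(best, chi_square(a, b, cc, d))
        scores[t] = best
    # Highest score first; ties broken alphabetically for a deterministic result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Toy corpus (hypothetical): each document is a set of terms
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"vote", "election", "the"},
    {"vote", "party", "the"},
]
labels = ["sports", "sports", "politics", "politics"]

print(select_features(docs, labels, 2))  # [('goal', 4.0), ('vote', 4.0)]
```

In this toy example the stop word "the" occurs in every document, so its contingency table is degenerate and it scores 0, while class-specific terms such as "goal" and "vote" rank highest. Real comparisons like the one described in the abstract would apply such scores to a full vocabulary and then train a classifier on the retained terms.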
