Abstract
The high-dimensional text data always contains a large quantity of noisy terms which bring negative effects on the performance of text classification. Feature selection is the common solution for dimension reduction in text classification. The choices of feature selection methods for text classification have significant impacts on classification accuracy. According to our literature review, few recent studies of feature selection focus on performance comparisons on feature selection methods. To fill this gap, this paper conducts discussions to compare performances of typical feature selection methods which are commonly involved in previous studies for text classification. Firstly, we introduce and discuss a series of typical feature selection methods in previous studies for text classification in details. Secondly, we conduct comparison experiments on four benchmark datasets to compare the effectiveness of twenty typical feature selection methods in text classification. Finally, we give conclusions on performance of the typical feature selection methods. The result of this paper gives a guideline for selecting appropriate feature selection methods for text classification academic analysis or real-world text classification applications.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.