Abstract

Feature selection is the core step in a text categorization system, and the selected feature subset directly influences the categorization results. This paper first briefly analyzes several classic feature selection methods and summarizes their deficiencies, then introduces the concept of feature distinguishability. It next brings fractal dimension into rough sets and provides an attribute reduction algorithm based on fractal dimension. Finally, it combines the reduction algorithm with feature distinguishability to propose a comprehensive feature selection method. The comprehensive method first uses feature distinguishability to select features and filter out some terms, which reduces the sparsity of the feature space, and then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the comprehensive method is promising.
