Abstract
Feature selection plays an important role in text categorization. Classic feature selection methods such as document frequency (DF), information gain (IG), and mutual information (MI) are commonly applied, but they usually take only plain text into account. Knowledge Gain (KG) is a feature selection method proposed in our previous paper that measures an attribute's importance based on rough set theory. Experiments show that it performs well in traditional text classification and has a clear advantage in recall on unbalanced corpora. Unlike traditional text, microblogs are characterized by short text and special network structures, including the user social network and the behavior network. As a result, microblog users offer less textual information and more behavioral and social information, so the classic feature selection algorithms, which were designed for text features, are not directly applicable. In this paper, we validate that KG, being based on rough set knowledge, consistently selects the optimal features in the multi-type feature space of microblog user classification. Experiments show that it performs better in multi-type feature selection than the other classic feature selection methods.
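For readers unfamiliar with the classic baselines, the following is a minimal sketch of information gain (IG) scoring for a single discrete feature. It is not the KG method, whose definition is not given in this abstract. The function names and the toy data (a hypothetical term-presence feature and a hypothetical bucketed behavior feature) are assumptions made only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(C; F) = H(C) - sum over v of P(F=v) * H(C | F=v), discrete F."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four users in two classes.
labels = ["a", "a", "b", "b"]
has_term = [1, 1, 0, 0]          # assumed text feature: term present or not
activity_bucket = [1, 0, 1, 0]   # assumed behavior feature, already bucketed

ig_text = information_gain(has_term, labels)
ig_behavior = information_gain(activity_bucket, labels)
```

In this toy case the text feature separates the classes perfectly and the behavior feature carries no class information, so the first score is the full class entropy and the second is zero. Scoring of this kind assumes discrete feature values, so continuous behavior or social features would need to be discretized first.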