Abstract

Text categorization (TC) is the problem of assigning documents to predefined classes. One of the most important issues in TC is feature selection. In this paper, we propose a new feature selection approach called Strong Class Information Words (SCIW). Unlike many existing feature selection methods, our method takes many kinds of information into account. Moreover, it can easily exploit some implicit regularities of natural language. In extensive experiments, a linear classifier using the SCIW feature selection method achieved good precision. The experiments also show that this classifier is most attractive as a component of a combined categorization system, and that the combined system outperforms conventional classifiers.
