Abstract

The digital revolution that started over fifteen years ago is contributing to exponential growth in text documents, which appear in many forms such as web pages, emails, resumes, scientific reports, and digital archives. It is therefore of great importance to develop techniques for automatic text document classification as a service to information consumers. Earlier text document classification techniques used ‘keyword-based’ features and related statistics to achieve good results. More recently, some of these techniques have been extended to include ‘phrase-based’ and ‘concept-based’ features to achieve better results. The majority of these techniques use a very large number of features extracted from the training set of documents. We present a hierarchical method for selecting a smaller number of high-quality features to improve classification efficiency.
