Hierarchical Approach to Select Feature Vectors for Classification of Text Documents

N Kapalavayi, Gongzhu Hu, S.N.J Murthy

doi:10.1109/aiccsa.2006.205241

Abstract

The digital revolution that started over fifteen years ago is contributing to exponential growth in text documents, which appear in many forms such as web pages, emails, resumes, scientific reports, and digital archives. It is of great importance to develop techniques for automatic text document classification as a service to information consumers. Earlier text document classification techniques used ‘keyword-based’ features and related statistics to achieve good results. More recently, some of these techniques have been extended to include ‘phrase-based’ and ‘concept-based’ features to achieve better results. The majority of these techniques utilize a very large number of features extracted from the training set of documents. We present a hierarchical method for selecting a smaller number of quality features to improve classification efficiency.

Full Text