Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach

S N Bharath Bhushan, Steven Lawrence Fernandes, Ajit Danti

doi:10.1007/978-981-10-7563-6_50

Abstract

Text document classification is attracting growing interest because more and more information is available in textual or electronic form. Conventional approaches generally treat the representation of text data and the classification of text documents as independent issues. In this research article, we consider that the overall efficiency of a text classification system depends both on an effective representation of the text data and on an efficient methodology for classifying the text documents. We first propose an effective compressed (integer) representation for text documents, and then adapt a B-Tree-based methodology for classification. The proposed compressed representation and B-Tree methodology are evaluated on a large, publicly available corpus to validate their effectiveness.
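The abstract does not say how the integer representation is built or how the B-Tree drives classification. As a hedged illustration only, the Python sketch below assumes one plausible reading: every distinct term receives an integer code, each document becomes a sorted list of codes, and a B-Tree keyed by those codes stores per-class term counts that vote on the label of a new document. The helper names (`build_vocabulary`, `encode`, `BTree`, `train`, `classify`), the minimum degree `t=2`, the whitespace tokenizer, and the toy corpus are all hypothetical and are not the authors' method.

```python
import bisect
from collections import Counter

# Hypothetical sketch: the encoding and the voting scheme are assumptions, not the paper's method.

def build_vocabulary(docs):
    """Assign each distinct lower-cased term an integer code in order of first appearance."""
    vocab = {}
    for doc in docs:
        for term in doc.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab

def encode(doc, vocab):
    """Integer representation: sorted distinct term codes; out-of-vocabulary terms are dropped."""
    return sorted({vocab[t] for t in doc.lower().split() if t in vocab})

class BTreeNode:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []      # sorted integer term codes
        self.values = []    # per-key Counter of class labels
        self.children = []  # len(keys) + 1 children when not a leaf

class BTree:
    """B-Tree of minimum degree t with CLRS-style search and single-pass insertion."""
    def __init__(self, t=2):
        self.t = t
        self.root = BTreeNode()

    def search(self, key, node=None):
        node = self.root if node is None else node
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        return None if node.leaf else self.search(key, node.children[i])

    def add(self, key, label):
        """Increment the count of `label` for `key`, inserting the key if absent."""
        counts = self.search(key)
        if counts is not None:
            counts[label] += 1
            return
        if len(self.root.keys) == 2 * self.t - 1:  # full root: grow the tree by one level
            old_root, self.root = self.root, BTreeNode(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, key, Counter({label: 1}))

    def _split_child(self, parent, i):
        t, full = self.t, parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        parent.keys.insert(i, full.keys[t - 1])      # median key moves up
        parent.values.insert(i, full.values[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, full.keys = full.keys[t:], full.keys[:t - 1]
        right.values, full.values = full.values[t:], full.values[:t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]

    def _insert_nonfull(self, node, key, value):
        i = bisect.bisect_left(node.keys, key)
        if node.leaf:
            node.keys.insert(i, key)
            node.values.insert(i, value)
            return
        if len(node.children[i].keys) == 2 * self.t - 1:
            self._split_child(node, i)
            if key > node.keys[i]:
                i += 1
        self._insert_nonfull(node.children[i], key, value)

def train(tree, docs, labels, vocab):
    for doc, label in zip(docs, labels):
        for code in encode(doc, vocab):
            tree.add(code, label)

def classify(tree, doc, vocab):
    """Sum the per-class counts of every known term; return the top class, or None if no term is known."""
    votes = Counter()
    for code in encode(doc, vocab):
        counts = tree.search(code)
        if counts is not None:
            votes.update(counts)
    return votes.most_common(1)[0][0] if votes else None

docs = ["stock market shares rise", "team wins football match",
        "market prices fall", "player scores goal in match"]
labels = ["finance", "sport", "finance", "sport"]
vocab = build_vocabulary(docs)
tree = BTree(t=2)
train(tree, docs, labels, vocab)
print(classify(tree, "shares and market news", vocab))  # finance
print(classify(tree, "football match tonight", vocab))  # sport
```

The small minimum degree forces frequent node splits on the toy data, which exercises the insertion logic; a practical system would use a much larger `t` so that each node fills a disk page.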
