Abstract
In the recent years, the volume of text documents in the form of digital way has grown up extremely in size. As significance, there is a need to be competent to automatically bring together and classify the documents based on their content. The main goal of text classification is to partition the unstructured set of documents into their respective categories based on its content. The main aim of this research work is to automatically classify the documents which are stored in the personal computer into their relevant categories. This work has two significant phases. In the first phase, the important features are selected for classification and the second phase is the classification of text documents. For selecting the optimal features, this research work proposes a new algorithm, optimization technique for feature selection (OTFS) algorithm. To estimate the proficiency of proposed feature selection algorithm, the OTFS algorithm was compared with the existing approaches artificial bee colony, firefly algorithm, ant colony optimization and particle swarm optimization. In the second phase, this research work proposed machine learning-based automatic text classification (MLearn-ATC) algorithm for text classification. In classification, the MLearn-ATC algorithm was compared with widely used classification techniques probabilistic neural network, support vector machine, K-nearest neighbor and Naive Bayes. From this, the output of first phase is used as the input for classification phase. The decisive results establish that the proposed algorithms achieve the better accuracy for optimizing the features and classifying the text documents based on their content.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.