Abstract
Document classification is the task of automatically assigning electronic documents to predefined classes based on their contents. It has grown in importance in recent years with the advent of large amounts of data in digital form. For several decades, document classification in the form of text classification systems has been widely used in applications such as spam filtering, e-mail categorization, knowledge repositories, and ontology mapping. The main objective of this work is to propose a text classification approach based on feature selection and preprocessing, thereby reducing the dimensionality of the feature vector and increasing classification accuracy. We study the advantages and disadvantages of K-nearest neighbor (KNN) classification and Support Vector Machine (SVM) classification in performing their classification tasks. In our investigation, we found that the KNN approach, although it performs well, can be less accurate than SVM classification.
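To make the pipeline concrete, the following is a minimal sketch of preprocessing, feature selection, and KNN classification, using only the Python standard library. It is an illustration under stated assumptions, not the method evaluated in this work: the toy corpus, the stop-word list, the use of document frequency as the feature selection criterion, cosine similarity as the distance measure, and all function names are hypothetical. The SVM side is omitted because it requires a numerical optimizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "your", "you"}

# Hypothetical toy corpus of (document, class label) pairs.
TRAIN = [
    ("Win a free prize now, claim your cash reward", "spam"),
    ("Free cash offer, click now to win", "spam"),
    ("Claim your free reward today", "spam"),
    ("Meeting agenda for the project review", "ham"),
    ("Project report and meeting notes attached", "ham"),
    ("Please review the attached project agenda", "ham"),
]


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def select_features(token_lists, max_features):
    """Keep the max_features terms that occur in the most documents."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Descending document frequency, ties broken alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    return ranked[:max_features]


def vectorize(tokens, vocabulary):
    """Map a token list to a term-count vector over the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Build the reduced feature space: 18 distinct terms cut down to 8.
TRAIN_TOKENS = [preprocess(text) for text, _ in TRAIN]
TRAIN_LABELS = [label for _, label in TRAIN]
VOCABULARY = select_features(TRAIN_TOKENS, max_features=8)
TRAIN_VECTORS = [vectorize(tokens, VOCABULARY) for tokens in TRAIN_TOKENS]


def classify(text, k=3):
    """Run the full pipeline on one new document."""
    return knn_predict(vectorize(preprocess(text), VOCABULARY), TRAIN_VECTORS, TRAIN_LABELS, k)


print(classify("free cash prize, win now"))        # spam
print(classify("agenda for the project meeting"))  # ham
```

Note that KNN stores every training vector and compares each query against all of them, so prediction cost grows with the size of the corpus and the length of the feature vector. Reducing the vocabulary before classification lowers that cost directly.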