Abstract
Text categorization is the process of assigning documents to a set of predefined categories. It is widely used in data-oriented management applications. Many popular algorithms for text categorization have been proposed, such as Naive Bayes, k-Nearest Neighbor (k-NN), and Support Vector Machine (SVM). However, these classification approaches do not perform well in every case. For example, SVM cannot identify the categories of documents correctly when the texts lie in zones where multiple categories intersect, and k-NN cannot effectively handle overlapping category borders. In this paper, we propose an approach named Multi-class SVM-kNN (MSVM-kNN), which combines SVM and k-NN. In this approach, SVM is first used to identify category borders, and k-NN then classifies the documents that lie among those borders. MSVM-kNN can overcome the shortcomings of SVM and k-NN and improve the performance of multi-class text classification. The experimental results show that MSVM-kNN performs better than either SVM or k-NN alone.
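The two-stage idea described above (SVM first, k-NN near the borders) can be illustrated with a minimal sketch. This is not the MSVM-kNN procedure itself, whose details the abstract does not give. It assumes that a document counts as lying "near a border" when the gap between its two highest one-vs-rest SVM scores falls below a threshold, and that k-NN votes over the full training set. All function names, the `margin` parameter, the subgradient-descent trainer, and the toy 2-D vectors (standing in for document feature vectors) are hypothetical choices made for illustration.

```python
from collections import Counter

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=200):
    """Binary linear SVM (labels +1/-1) fitted by subgradient descent on the
    L2-regularized hinge loss. Returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            w = [wj * (1.0 - lr * lam) for wj in w]    # regularization shrink
            if yi * score < 1.0:                        # hinge margin violated
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def train_one_vs_rest(X, labels):
    """One binary SVM per category (an assumed multi-class scheme)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}

def svm_scores(models, x):
    """Signed decision score of x under each category's binary SVM."""
    return {c: sum(wj * xj for wj, xj in zip(w, x)) + b
            for c, (w, b) in models.items()}

def knn_predict(X, labels, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - c) ** 2 for a, c in zip(X[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def hybrid_predict(models, X, labels, x, margin=0.5, k=3):
    """Trust the SVM when its top two scores are well separated; otherwise
    treat x as lying near a category border and defer to k-NN.
    The `margin` threshold is an illustrative assumption."""
    scores = svm_scores(models, x)
    ranked = sorted(scores, key=scores.get, reverse=True)
    gap = scores[ranked[0]] - scores[ranked[1]]
    if gap >= margin:
        return ranked[0]
    return knn_predict(X, labels, x, k)

# Toy 2-D vectors standing in for document feature vectors (hypothetical data).
X = [[0, 0], [0, 1], [1, 0],
     [5, 5], [5, 6], [6, 5],
     [0, 5], [0, 6], [1, 6]]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
models = train_one_vs_rest(X, labels)

if __name__ == "__main__":
    for query in ([0.2, 0.2], [5.2, 5.2], [2.5, 5.5]):
        print(query, hybrid_predict(models, X, labels, query))
```

Setting `margin` to 0 reduces the sketch to a plain one-vs-rest SVM, and setting it to infinity reduces it to plain k-NN, so the threshold controls how much of the input space is handed to the second stage.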