Abstract

Text mining is a technique for processing textual data, and news classification is one of its applications. Support Vector Machine (SVM) is an algorithm that can be used for news classification. However, SVM performance is less than optimal on large datasets, and the number of attributes (features) used also affects the performance of the classifier. This research aims to increase the accuracy of SVM by applying N-gram and Chi-square feature selection. Without N-gram and feature selection, SVM achieved an accuracy of 96.40%. With bigram and Chi-square feature selection at 70% feature reduction, accuracy increased by 0.95 percentage points to 97.35%. With unigram and Chi-square feature selection at 90% feature reduction, accuracy increased by 1.58 percentage points to 97.98%, the highest value obtained. These configurations were then evaluated on the testing data, and the results also show improvement: SVM without N-gram and feature selection achieved 76.80%; SVM with bigram and Chi-square at 70% feature reduction achieved 82%; and SVM with unigram and Chi-square at 90% feature reduction achieved the highest accuracy, 82.40%. These results indicate that SVM performance is influenced by applying N-gram and Chi-square, both of which affect the number of features. The best text classification performance is obtained when the N-gram value and the number of features are chosen carefully.
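The feature-preparation steps named in the abstract (N-gram extraction followed by Chi-square feature selection with a fixed percentage of features removed) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ngrams`, `chi_square_scores`, and `select_features`, the whitespace tokenization, the presence/absence contingency table, and the four toy documents are all assumptions made for illustration, and the SVM training step is omitted.

```python
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi_square_scores(docs, labels, n):
    """Score each n-gram by its largest Chi-square statistic over all classes.

    Assumption: each (term, class) pair is scored from a 2x2 table of
    document presence/absence, not from raw term frequencies.
    """
    total = len(docs)
    class_counts = Counter(labels)
    term_counts = Counter()        # number of documents containing the term
    term_class_counts = Counter()  # same, restricted to one class
    for doc, label in zip(docs, labels):
        for term in set(ngrams(doc, n)):
            term_counts[term] += 1
            term_class_counts[(term, label)] += 1

    scores = {}
    for term, n_term in term_counts.items():
        best = 0.0
        for cls, n_cls in class_counts.items():
            a = term_class_counts[(term, cls)]  # in class, contains term
            b = n_term - a                      # other classes, contains term
            c = n_cls - a                       # in class, lacks term
            d = total - n_term - c              # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, total * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(scores, reduction_percent):
    """Drop `reduction_percent` of the features, keeping the top-scoring ones."""
    keep = max(1, len(scores) * (100 - reduction_percent) // 100)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:keep]


# Toy corpus (made up for illustration; not the paper's dataset).
docs = [
    "goal scored in the match",
    "team wins the match",
    "market shares fall",
    "bank raises market rates",
]
labels = ["sport", "sport", "economy", "economy"]

unigram_scores = chi_square_scores(docs, labels, n=1)
selected = select_features(unigram_scores, reduction_percent=75)
print(selected)
```

The surviving features would then form the input vectors for the classifier. In practice, a library such as scikit-learn offers comparable building blocks (for example `CountVectorizer` with `ngram_range`, `chi2` with `SelectPercentile`, and `LinearSVC`), which would normally be preferred over hand-written scoring.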
