Abstract

This paper categorizes English documents on the topic “Network Attack” using the Multinomial Naive Bayes (MNB) method and compares it with K-Nearest Neighbors (KNN), linear Support Vector Machine (SVM Linear), and Random Forest. Classification was carried out with three feature extraction methods: Term Frequency-Inverse Document Frequency (TF-IDF), Count Vector, and Document Vector (Doc2vec). In the experiments, MNB with TF-IDF reached an accuracy of 76.00%. With TF-IDF, KNN, SVM Linear, and Random Forest reached accuracies of 72.66%, 78.66%, and 81.66%, respectively; with Count Vector, MNB, KNN, SVM Linear, and Random Forest reached 60.00%, 77.00%, 70.66%, and 81.00%, respectively. An additional experiment with Random Forest as the classifier and Doc2vec as the feature extraction method obtained an accuracy of 63.33%. MNB classified the documents somewhat better than KNN, but SVM and Random Forest performed better than both MNB and KNN. Overall, TF-IDF generally gave better results than Count Vector and Doc2vec. However, Count Vector gave a better result than TF-IDF under the MNB classifier.
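To make the combination of Count Vector features and an MNB classifier concrete, the following is a minimal pure-Python sketch, not the authors' implementation. The four toy documents, the labels `dos` and `phishing`, the helper names `tokenize`, `train_mnb`, and `predict`, and the smoothing value `alpha=1.0` are all assumptions chosen for illustration; they do not come from the paper's dataset or settings.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_mnb(docs, labels, alpha=1.0):
    """Fit Multinomial Naive Bayes on raw term counts (count vectors)."""
    vocab = {tok for doc in docs for tok in tokenize(doc)}
    docs_per_class = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(tokenize(doc))

    model = {}
    for label, n_docs in docs_per_class.items():
        total_terms = sum(term_counts[label].values())
        # Laplace (add-alpha) smoothing over the whole vocabulary.
        denom = total_terms + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                t: math.log((term_counts[label][t] + alpha) / denom)
                for t in vocab
            },
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokenize(text):
            # Tokens never seen in training are ignored.
            if tok in params["log_lik"]:
                score += params["log_lik"][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (assumed for illustration only).
docs = [
    "ddos flood traffic overwhelms server",
    "phishing email steals login password",
    "botnet flood attack on server",
    "fake email asks for password",
]
labels = ["dos", "phishing", "dos", "phishing"]

model = train_mnb(docs, labels)
print(predict(model, "flood of traffic hits server"))
print(predict(model, "email asking for password"))
```

Replacing the raw counts with TF-IDF weights would change only the feature values passed to the classifier, which is the kind of feature-extraction comparison the abstract reports.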

