This paper presents a comparative study of four machine learning models, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree, and Logistic Regression, on two datasets: 20Newsgroups and Wine. The models were assessed on accuracy, precision, recall, and F1 score. SVM and Logistic Regression outperformed KNN and Decision Tree, especially on the 20Newsgroups dataset, which consists of high-dimensional text data. KNN had poor recall and Decision Tree performed moderately. On the Wine dataset, whose structure is comparatively simple, all models yielded nearly identical results, with accuracy and precision very close to 1.0, indicating that the choice of model matters little for simple data. These results underline the importance of choosing an appropriate model for tasks with complex data: more sophisticated models are most effective on complicated tasks, but they are not required for simple, structured data.
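For readers unfamiliar with the four evaluation measures named above, the following is a minimal sketch of how they can be computed for a multi-class problem from true and predicted labels. The function name `macro_scores` and the choice of macro-averaging (an unweighted mean over classes) are assumptions made for illustration; the abstract does not state which averaging scheme the study used.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption here, not the paper's stated method.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a wrong sample
            fn[truth] += 1   # true class missed one of its samples

    labels = set(y_true) | set(y_pred)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not results from the study.
scores = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

A classifier with high accuracy but low recall on some classes, as reported here for KNN, would show a macro recall noticeably below its accuracy under this scheme.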