Abstract

The paper considers a solution to the problem of developing two-stage hybrid SVM-kNN classifiers with the aim of increasing data classification quality by refining classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed; the training dataset is formed on the basis of the initial dataset, and either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the development of the kNN classifier: a variant that uses all objects from the initial training dataset located inside the strip dividing the classes, and a variant that uses only those objects from the initial training dataset located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using the new training dataset described above. The values of the parameters of the kNN classifier are determined during training so as to maximize the data classification quality. The quality of data classification by the two-stage hybrid SVM-kNN classifier was assessed using various indicators on the test dataset. If the kNN classifier improves the classification quality near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers in data classification problems.
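The two-stage scheme described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the dataset, the scikit-learn classes, and the choice of |decision function| < 1 as the class-dividing strip are all assumptions made for the sketch (the paper's second variant, which restricts the strip to the area of misclassified objects, is omitted here).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Toy overlapping two-class dataset (a stand-in for the paper's datasets).
X, y = make_classification(n_samples=600, n_features=4, class_sep=0.8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: binary SVM classifier with default parameter values.
svm = SVC(kernel="rbf").fit(X_train, y_train)

# Form the kNN training set from objects inside the class-dividing strip;
# here the strip is taken as the margin band |f(x)| < 1 (an assumption).
f_train = svm.decision_function(X_train)
in_strip = np.abs(f_train) < 1.0
X_strip, y_strip = X_train[in_strip], y_train[in_strip]

# Stage 2: kNN classifier developed on the new (strip-only) training set.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_strip, y_strip)

# Hybrid prediction: keep SVM decisions away from the boundary and
# refine decisions for test objects that fall inside the strip with kNN.
f_test = svm.decision_function(X_test)
pred = svm.predict(X_test)
near = np.abs(f_test) < 1.0
pred[near] = knn.predict(X_test[near])
```

In a full implementation, the number of neighbors and other kNN parameters would be tuned during training to maximize classification quality, as the abstract describes.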

Highlights

  • The main goal of the paper is as follows: in the context of developing hybrid classifiers using the support vector machine (SVM) algorithm, we explored the possibility of creating various versions of two-stage SVM-k-nearest neighbors (kNN) classifiers, in which, in the first stage, the one-class SVM classifier is used alongside the binary SVM classifier.

  • For each of the versions of the classifiers, we proposed implementing two variants of the formation of the training set for the kNN classifier: a variant using all objects from the initial training set located inside the strip dividing the classes, and a variant using only those objects from the initial training set that are located within the area containing all of the misclassified objects from the strip dividing the classes.

  • The obtained results allow us to speak about the prospects of using two-stage hybrid SVM-kNN classifiers with the aim of increasing data classification quality.


Introduction

Data classification problems arise and are solved in many areas of human activity [1,2,3,4,5,6,7]. Such problems include the problems of credit risk analysis [1], medical diagnostics [2], text categorization [4], the identification of facial images [7], etc. Dozens of classification algorithms and methods have been developed to date, among which linear and logistic regression [8], the Bayesian classifier [8,9], decision rules [10], decision trees [8,11], the random forest (RF) algorithm [12], neural-network-based algorithms [13], the k-nearest neighbors (kNN) algorithm [14,15,16], and the support vector machine (SVM) algorithm [17,18,19,20] should be highlighted. During the development of any classifier, it is trained and tested. The development of a classifier can be performed using the principles of k-fold validation. A well-trained classifier can be applied to classify new objects.
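The k-fold validation principle mentioned above can be shown with a short sketch; the dataset and classifier here are illustrative choices, not those used in the paper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold serves once as the test part
# while the classifier is trained on the remaining four folds.
scores = cross_val_score(SVC(), X, y, cv=5)
mean_accuracy = scores.mean()
```

The mean of the per-fold accuracies gives a more stable quality estimate than a single train/test split, which is why k-fold validation is commonly used when developing classifiers.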

