Abstract

The Support Vector Machine (SVM) is widely used for classification in machine learning owing to the flexibility offered by its various kernels. One popular kernel is based on the radial basis function (RBF); to use it, the parameter sigma must be selected properly. This parameter is often chosen by grid search with cross-validation. Although grid search yields an accurate parameter, it is very time-consuming. Distance Between Two Classes (DBTC) is another promising technique for finding sigma: where accuracy is the concern it ranks after grid search, but where execution time matters it is preferred over grid search. Outliers are points that appear to markedly deviate from the other observations, and they are removed from the data to obtain more uniformity. This paper studies the effect of outlier removal on grid search and DBTC. Outliers are detected with the Local Outlier Factor (LOF) technique; between 1% and 10% of the points are removed, after which grid search and DBTC are applied to find the optimal hyperparameter sigma. This sigma is then used to measure classification accuracy and determine which technique is better. The findings reveal that, under outlier removal, the accuracy of grid-search SVM behaves randomly, with no definite pattern observed for 4 out of 7 datasets. Similarly, for DBTC, accuracy behaves randomly with outlier removal for 5 out of 7 datasets. In addition, outlier removal does not affect DBTC for 6 out of 7 datasets for the predefined sigma values considered in this study.
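The pipeline the abstract describes (LOF-based outlier removal followed by a cross-validated grid search for the RBF width) can be sketched as below. This is a minimal illustration, not the paper's actual experimental setup: the dataset, the 10% contamination level, and the candidate sigma grid are all assumptions, and scikit-learn's `SVC` parametrizes the RBF kernel by `gamma = 1 / (2 * sigma**2)` rather than by sigma directly.

```python
# Hedged sketch of the abstract's pipeline: LOF outlier removal, then
# grid search over sigma via cross-validation. Dataset, contamination
# level, and sigma grid are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import LocalOutlierFactor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Step 1: detect outliers with LOF and drop them (here ~10% of points).
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.10)
inlier_mask = lof.fit_predict(X) == 1  # +1 = inlier, -1 = outlier
X_clean, y_clean = X[inlier_mask], y[inlier_mask]

# Step 2: grid search over candidate sigma values.
# SVC's RBF kernel uses gamma = 1 / (2 * sigma**2).
sigmas = np.logspace(-2, 2, 9)
param_grid = {"gamma": 1.0 / (2.0 * sigmas**2)}
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clean, y_clean, test_size=0.3, random_state=0
)
search = GridSearchCV(SVC(kernel="rbf", C=1.0), param_grid, cv=5)
search.fit(X_tr, y_tr)

print("best gamma:", search.best_params_["gamma"])
print("test accuracy:", search.score(X_te, y_te))
```

The same cleaned data could then be fed to a DBTC-style sigma selection for comparison; only the grid-search branch is shown here.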
