Region based Support Vector Machine algorithm for medical diagnosis on Pima Indian Diabetes dataset

Savvas Karatsiolis, Christos N Schizas

doi:10.1109/bibe.2012.6399663

Abstract

The problem of diagnosing diabetes from the Pima Indian Diabetes dataset, obtained from the UCI Repository of Machine Learning Databases [6], is handled with a modified Support Vector Machine strategy. A performance comparison with previous studies is presented to demonstrate the proposed algorithm's advantages over various classification methods. The goal of the paper is to present a methodology that can be used efficiently to raise classification success rates above those obtained with conventional approaches such as Neural Networks, RBF networks and K-nearest neighbors. The suggested algorithm divides the training set into two subsets: one formed by joining coherent data regions and one comprising the portion of the data that is difficult to cluster. The first subset is then used to train a Support Vector Machine with an RBF kernel, and the second subset is used to train another Support Vector Machine with a polynomial kernel. During classification, the algorithm is able to identify which of the two Support Vector Machine models to use. The intuition behind the suggested algorithm relies on the expectation that the RBF Support Vector Machine model is appropriate for data with different characteristics than those suited to the polynomial kernel model. In the specific case study, the suggested algorithm raised the average classification success rate to 82.2%, while the best performance obtained by previous studies was 81%, given by a fine-tuned, highly complex ARTMAP-IC model.
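The two-model idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how coherent regions are found or how a sample is routed at classification time, so everything specific here is an assumption made for illustration: k-means clusters stand in for "regions", a cluster counts as coherent when one class reaches a hypothetical purity threshold, and a new sample is routed by its nearest cluster centre. The class name `TwoRegionSVM`, its parameters, and the synthetic data are likewise hypothetical; this is not the authors' implementation and does not reproduce the reported results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class TwoRegionSVM:
    """Hypothetical sketch: route each sample to an RBF or a polynomial SVM."""

    def __init__(self, n_clusters=8, purity=0.8, random_state=0):
        self.n_clusters = n_clusters
        self.purity = purity  # assumed threshold for calling a cluster "coherent"
        self.random_state = random_state

    @staticmethod
    def _fit(model, X, y):
        # SVC needs at least two classes; otherwise use a constant predictor.
        if len(np.unique(y)) < 2:
            model = DummyClassifier(strategy="most_frequent")
        return model.fit(X, y)

    def _route(self, Xs):
        # True where the nearest cluster centre belongs to a coherent cluster.
        return self.coherent_[self.kmeans_.predict(Xs)]

    def fit(self, X, y):
        self.scaler_ = StandardScaler().fit(X)
        Xs = self.scaler_.transform(X)
        self.kmeans_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        ).fit(Xs)
        labels = self.kmeans_.labels_

        # Assumption: a cluster is coherent when one class dominates it.
        self.coherent_ = np.zeros(self.n_clusters, dtype=bool)
        for c in range(self.n_clusters):
            members = y[labels == c]
            if members.size:
                top_share = np.bincount(members).max() / members.size
                self.coherent_[c] = top_share >= self.purity

        easy = self.coherent_[labels]
        # If one subset turns out empty, train that model on all the data.
        easy_idx = easy if easy.any() else np.ones_like(easy)
        hard_idx = ~easy if (~easy).any() else np.ones_like(easy)

        self.rbf_ = self._fit(SVC(kernel="rbf", gamma="scale"), Xs[easy_idx], y[easy_idx])
        self.poly_ = self._fit(
            SVC(kernel="poly", degree=3, gamma="scale"), Xs[hard_idx], y[hard_idx]
        )
        return self

    def predict(self, X):
        Xs = self.scaler_.transform(X)
        easy = self._route(Xs)
        out = np.empty(len(Xs), dtype=int)
        if easy.any():
            out[easy] = self.rbf_.predict(Xs[easy])
        if (~easy).any():
            out[~easy] = self.poly_.predict(Xs[~easy])
        return out


# Synthetic two-class data, used only so that the sketch runs end to end.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = TwoRegionSVM().fit(X_tr, y_tr)
pred = model.predict(X_te)
accuracy = float((pred == y_te).mean())
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Routing by nearest cluster centre is only one way to decide which model applies to a new sample; the purity threshold and the number of clusters would need tuning on real data, and the kernel hyperparameters are left at library defaults here.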

Full Text