Gene expression cancer classification using modified K-Nearest Neighbors technique

Sarah M Ayyad, Ahmed I Saleh, Labib M Labib

doi:10.1016/j.biosystems.2018.12.009

Abstract

Gene expression microarray classification is a crucial research field because it is employed in cancer prediction and diagnosis systems. Gene expression data typically comprise dozens of samples, each characterized by thousands of genes, so accurate and effective classification of such samples is a challenge. Machine learning techniques have been broadly used to build precise classification models. This paper proposes a new classification technique for gene expression data, called modified k-nearest neighbor (MKNN). MKNN is applied in two scenarios, namely smallest modified KNN (SMKNN) and largest modified KNN (LMKNN). Both implementations aim to enhance the performance of KNN. The key idea is to employ robust neighbors from the training data by using a new weighting strategy. Several experiments have been performed on six different gene expression datasets. The experiments show that MKNN, in both of its scenarios, outperforms traditional as well as recent techniques. MKNN has been compared against (i) KNN, (ii) weighted KNN, (iii) support vector machine (SVM), (iv) fuzzy support vector machine, and (v) brain emotional learning (BEL) in terms of classification accuracy, precision, and recall. In addition, the results show that MKNN requires less testing time than both KNN and weighted KNN.
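
For orientation, the two nearest-neighbor baselines named above (KNN and weighted KNN) can be sketched as follows. This is a minimal illustration only and is not the MKNN, SMKNN, or LMKNN method, whose weighting strategy the abstract does not specify. The function names, the inverse-distance weighting, the Euclidean metric, and the toy two-gene data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify `query` by a vote among its k nearest training samples.

    weighted=False: plain KNN, every neighbor casts one vote.
    weighted=True:  inverse-distance weighting (an assumed, common variant
                    of weighted KNN; not the paper's weighting strategy).
    """
    # Rank all training samples by distance to the query and keep the k closest.
    ranked = sorted(
        ((euclidean(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for dist, label in ranked:
        # The small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical toy data: each sample has two "gene expression" features.
train_X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (10.0, 10.0)]
train_y = ["tumor", "normal", "normal", "tumor"]
query = (0.5, 0.5)

plain = knn_predict(train_X, train_y, query, k=3)
by_distance = knn_predict(train_X, train_y, query, k=3, weighted=True)
print(plain)        # normal: two of the three nearest neighbors are "normal"
print(by_distance)  # tumor: the single very close "tumor" sample dominates
```

The toy query shows why neighbor weighting matters: the majority vote and the distance-weighted vote disagree when one close neighbor is outnumbered by more distant ones.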

Full Text