Abstract

Gene expression data record the levels at which genes encoded in DNA are expressed as proteins in cells such as muscle or brain cells. However, abnormal cells may arise from abnormal expression levels. Finding a subset of informative genes would therefore benefit biologists, because it helps identify discriminative genes. Unfortunately, the number of genes quickly grows into the tens of thousands, which makes classification difficult because of problems such as the curse of dimensionality and misclassification. This paper proposes classification models for gene expression data based on an incremental learning algorithm and feature selection. Three feature selection methods, Correlation-based Feature Selection (Cfs), Gain Ratio (GR), and Information Gain (Info), are each combined with the Incremental Learning Algorithm based on Mahalanobis Distance (ILM). The experimental results show that the proposed models, CfsILM, GRILM, and InfoILM, not only reduce the number of dimensions substantially and save time and resources but also improve the accuracy rate. In particular, CfsILM outperformed the other models on three public gene expression datasets.
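
To make the filter-style feature selection step more concrete, the following is a minimal sketch of ranking genes by Information Gain or Gain Ratio. It is not the paper's implementation. It assumes expression values have already been discretized into categories such as "high" and "low", and the names `entropy`, `information_gain`, `gain_ratio`, `rank_genes`, `gene_a`, and `gene_b` are hypothetical, chosen only for illustration. The ILM classifier itself is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class-label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


def rank_genes(columns, labels, score=information_gain):
    """Return gene names sorted from most to least informative."""
    scores = {name: score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed, not taken from the paper): two discretized genes, four samples.
labels = ["tumor", "tumor", "normal", "normal"]
columns = {
    "gene_a": ["high", "high", "low", "low"],   # separates the classes perfectly
    "gene_b": ["high", "low", "high", "low"],   # unrelated to the class
}
ranking = rank_genes(columns, labels)
```

In a pipeline of the kind the abstract describes, only the top-ranked genes would be passed on to the classifier, which is how the dimensionality is reduced before learning.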
