Abstract

Microarray technology is a novel method for monitoring the expression levels of an enormous number of genes simultaneously. These gene expression data are being used to detect various forms of disease. The problem is that not all genes are important; some are redundant or irrelevant, and these irrelevant genes add computational workload to the prediction process. This study therefore aims at (1) identifying the most important genes that cause Alzheimer's Disease (AD) by using feature (gene) selection to reduce the size of the high-dimensional data, where gene selection is twofold: removing the irrelevant genes and selecting the informative genes; and (2) predicting AD patients based on the selected subset of genes. In this paper, two gene selection methods are implemented: Analysis of Variance (ANOVA) and Mutual Information (MI). In addition, the k-means algorithm is proposed as a gene selection method. It is presumed that the relevant genes lie in the same cluster, while the insignificant genes do not belong to any cluster. The proposed system is applied to a high-dimensional AD dataset that contains 16382 genes. After the informative genes are selected, prediction is performed with a Convolutional Neural Network (CNN), a model commonly used in many prediction tasks. The performance of the proposed system was evaluated using prediction accuracy. The results show that k-means clustering-based gene selection can produce a subset of key genes. The k-means algorithm with the CNN model achieves an accuracy of 0.929 on the gene subset from the ANOVA method and an accuracy of 0.886 on the gene subset from the MI method. Thus, the selected gene subset achieves better prediction accuracy with less processing time.
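The ANOVA filtering step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `anova_f_score` and `select_top_genes`, the toy expression matrix, and the choice of keeping the two highest-scoring genes are all assumptions made for illustration, and the k-means and CNN stages are omitted. Each gene is scored with the one-way ANOVA F statistic (between-class variance divided by within-class variance), and the highest-scoring genes are kept.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one gene (assumes at least two classes)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_genes(expression, labels, top_k):
    """Return the sorted column indices of the top_k genes by F score.

    expression: list of samples, each a list of gene expression values.
    """
    n_genes = len(expression[0])
    scores = [
        anova_f_score([sample[j] for sample in expression], labels)
        for j in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_k])


# Toy data (hypothetical): 6 samples x 4 genes, labels 0 = control, 1 = AD.
# Genes 0 and 2 differ between classes; genes 1 and 3 do not.
expression = [
    [1.0, 2.0, 5.0, 0.5],
    [1.2, 2.1, 5.1, 0.7],
    [0.9, 1.9, 4.9, 0.6],
    [3.0, 2.0, 1.0, 0.6],
    [3.1, 2.1, 1.1, 0.5],
    [2.9, 1.9, 0.9, 0.7],
]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_genes(expression, labels, top_k=2)
print(selected)
```

On real microarray data the same ranking would run over all 16382 genes, and the retained subset would then be passed to the clustering and prediction stages described in the abstract.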
