Abstract

Gene expression datasets typically contain a small number of tissue samples and thousands or tens of thousands of genes, many of them noisy or redundant. This can lead to overfitting and the curse of dimensionality, or even to complete failure of the microarray analysis. It also degrades the performance of classification algorithms and increases the computational burden. To overcome these challenges, we propose a new hybrid wrapper approach for determining optimal gene subsets from gene expression profiles. The approach, called TLBOSA, combines Teaching Learning-based Optimization (TLBO) with Simulated Annealing (SA); it can help to reveal the hidden patterns of tumors and enhance the interpretability of the selected genes. The proposed method comprises two steps. In the first step, Correlation-based Feature Selection (CFS) is applied to select subsets of optimal genes and filter redundant genes out of the biological datasets. In the second step, SA is incorporated into the TLBO algorithm and used to improve solution quality after each TLBO iteration. This step identifies the subset of the most informative genes that contribute to accurate cancer detection. A new encoding scheme is also introduced to transform the continuous version of TLBOSA into a binary one. The method uses a Support Vector Machine (SVM) classifier as the fitness function to select biomarkers that can classify biological tissues of binary and multi-class cancers. The performance of the proposed approach is evaluated on ten microarray datasets and compared with well-known wrapper methods from the literature. Experimental results and statistical analysis demonstrate that the proposed method selects significantly discriminating genes and achieves high classification accuracy. Specifically, it achieves a prediction accuracy of 99.91% on the Small-Blue-Round-Cell Tumour (SBRCT) data with 5 gene subsets.
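The second step, in which SA refines a candidate gene subset after a TLBO iteration, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sa_refine`, `toy_fitness`), the single-bit-flip neighbourhood, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration. The toy fitness stands in for the SVM classification accuracy that the abstract names as the fitness function.

```python
import math
import random


def sa_refine(mask, fitness, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Refine a binary gene mask by simulated annealing (maximises fitness).

    mask    : list of 0/1 flags, one per gene (1 = gene selected)
    fitness : callable scoring a mask; hypothetical stand-in for SVM accuracy
    """
    rng = random.Random(seed)
    current = list(mask)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temp = t0
    for _ in range(steps):
        # Neighbour: flip one randomly chosen gene in or out of the subset.
        cand = list(current)
        i = rng.randrange(len(cand))
        cand[i] = 1 - cand[i]
        cand_fit = fitness(cand)
        delta = cand_fit - current_fit
        # Metropolis rule: always accept improvements, and accept a worse
        # candidate with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_fit = cand, cand_fit
            if current_fit > best_fit:
                best, best_fit = list(current), current_fit
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_fit


def toy_fitness(mask, informative=frozenset({1, 4, 7})):
    """Toy score: reward informative genes, penalise subset size."""
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


start = [0] * 10  # start with no genes selected
best, score = sa_refine(start, toy_fitness)
print(best, score)
```

In a wrapper setting, `toy_fitness` would be replaced by a function that trains and validates a classifier on the genes selected by the mask, which makes each fitness call far more expensive than in this sketch.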
