Abstract
Microarray gene expression data contain a large number of genes with different expression levels. Analyzing and classifying datasets over the entire gene space is difficult because only a few genes are informative. Identifying biomarker genes is significant because it improves cancer diagnosis and allows personalized medicine to be suggested accordingly. Initially, the parallelized minimum redundancy and maximum relevance ensemble (mRMRe) is employed to select the top m informative genes. The selected genes are then fed into a Genetic Algorithm (GA), which uses Mahalanobis Distance (MD) as the distance measure and heuristically selects the optimal set of genes. The proposed method (mRMRe-GA) is applied to four microarray datasets using a Support Vector Machine (SVM) as the classifier. Leave-One-Out Cross-Validation (LOOCV) is used to analyze the performance of the classifier. A comparative study of the proposed mRMRe-GA method against other methods is carried out. The proposed mRMRe-GA method significantly improves classification accuracy with fewer selected genes.
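The LOOCV protocol named in the abstract can be illustrated with a minimal sketch: each sample is held out once, a classifier is trained on the remaining samples, and accuracy is the fraction of held-out samples predicted correctly. The function names and toy data below are hypothetical, and a simple nearest-centroid rule is used as a stand-in for the SVM purely to keep the example dependency-free; it is not the classifier used in the paper.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): store the mean vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, model[c]))
    return min(sorted(model), key=sqdist)


def loocv_accuracy(X, y, fit, predict):
    """Leave-one-out cross-validation: hold out each sample exactly once."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]   # all samples except the i-th
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: 6 samples, 2 selected genes, labels 0 = normal, 1 = cancer.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2],
         [5.0, 6.0], [5.2, 5.8], [4.9, 6.1]]
y_toy = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X_toy, y_toy, nearest_centroid_fit, nearest_centroid_predict)
```

Because `fit` and `predict` are passed in as arguments, any classifier with the same two-function shape, including an SVM from a library, could be substituted without changing the LOOCV loop.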
Highlights
Identification and selection of informative genes is the main challenge in analyzing high-dimensional microarray data
The selected genes are fed into a Genetic Algorithm (GA), which uses Mahalanobis Distance (MD) as the distance measure and heuristically selects the optimal set of genes
The minimum redundancy and maximum relevance ensemble (mRMRe) was used to select the topmost informative genes from four microarray benchmark datasets, and a Support Vector Machine (SVM) classifier was employed for classification, which resulted in the highest accuracy
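The gene-selection idea behind the highlights above can be sketched as a greedy minimum-redundancy, maximum-relevance loop. This is a simplified, single-run illustration and not the parallelized ensemble mRMRe procedure itself: it assumes absolute Pearson correlation as a proxy for both relevance (gene vs. class label) and redundancy (gene vs. already selected genes), and all names and data below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(genes, labels, m):
    """Greedily pick m genes maximizing relevance minus mean redundancy.

    genes  : dict mapping gene name -> expression values across samples
    labels : class label per sample (e.g. 0 = normal, 1 = cancer)
    """
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    selected = []
    remaining = set(genes)
    while remaining and len(selected) < m:
        def score(g):
            if not selected:
                return relevance[g]  # first pick: relevance only
            redundancy = sum(abs(pearson(genes[g], genes[s]))
                             for s in selected) / len(selected)
            return relevance[g] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: g2 is a scaled copy of g1, so it adds no new information.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_genes = {
    "g1": [1, 1, 2, 5, 6, 5],
    "g2": [2, 2, 4, 10, 12, 10],
    "g3": [1, 3, 2, 3, 2, 4],
    "g4": [3, 1, 2, 2, 3, 1],
}
top_genes = mrmr_select(toy_genes, toy_labels, 2)
```

In this toy case the redundancy penalty keeps the duplicate gene out of the second slot even though it is as relevant as the first pick, which is the behavior that motivates mRMR-style selection over ranking by relevance alone.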
Summary
Identification and selection of informative genes is the main challenge in analyzing high-dimensional microarray data. Measuring gene expression levels with DNA microarrays helps researchers address issues in cancer classification and paves the way for personalized medicine. Cancer datasets are usually vast, and the number of features is the main influence on analytical accuracy. The most difficult challenge is the lack of a powerful method to analyze the data for all genes simultaneously. The entire dataset can be reduced to a minimal set of differentially expressed genes that classifies the samples into cancer vs. normal cases. Identification of differentially expressed genes is the primary task in microarray analysis [1].