Abstract
We detect key information in high-dimensional microarray profiles using wavelet analysis and a genetic algorithm. First, the wavelet transform is used to extract approximation coefficients at the 2nd level, which removes noise and reduces dimensionality. A genetic algorithm (GA) is then used to select the optimized features. Experiments on four datasets show that approximation coefficients are an efficient way to characterize microarray data. Furthermore, to detect the key genes in the classification of cancer tissue, we reconstruct the approximation part of the gene profiles from the orthogonal approximation coefficients. The significant genes are then selected from the reconstructed approximation information using the genetic algorithm. Experiments show that good classification performance is achieved with the selected key genes.
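The extraction of 2nd-level approximation coefficients can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Haar wavelet, the padding rule, and the function names `haar_step` and `approximation_coefficients` are assumptions made here, since the abstract does not name the wavelet family.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Assumed padding rule: repeat the last value for odd-length input.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def approximation_coefficients(profile, level=2):
    """Approximation coefficients of `profile` after `level` Haar decompositions."""
    approx = list(profile)
    for _ in range(level):
        approx, _detail = haar_step(approx)  # detail coefficients are discarded
    return approx

# Toy expression profile (made-up values, not from any of the four datasets).
profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
features = approximation_coefficients(profile, level=2)
print(len(profile), "->", len(features), "features")
```

Each decomposition level roughly halves the length of the profile, so two levels leave about a quarter of the original dimensionality as input to the feature selection step.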
Highlights
Huge advances in DNA microarrays have allowed scientists to test thousands of genes in normal or tumor tissues on a single array and check whether those genes are active, hyperactive, or silent.
In Huang and Zheng’s study [7], the dataset was reshuffled randomly. They performed the experiments with 20 random splits of the original datasets, in which each randomized training and test set contains the same number of samples of each class as the original training and test set. They reported the results of different methods, such as 92.86% for least squares support vector machine (LS-SVM), 94.40% for principal component analysis (PCA), 93.58% for kernel PCA (KPCA), 94.65% for penalized independent component regression (P-ICR), 93.83% for penalized principal component regression (P-PCR), and nearest shrunken centroid classifier (PAM).
After we reconstruct the approximation at the 2nd level from the approximation coefficients, 7 genetic algorithm (GA) features selected from the reconstructed approximation achieve 100% classification performance; these correspond to 7 significant genes (“32556_at,” “33415_at,” “33725_at,” “34775_at,” “36122_at,” “36340_at,” “40578_s_at”).
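The link between reconstructed approximation values and individual genes can be illustrated with a small sketch. It assumes a Haar wavelet, a profile length divisible by 2**level, and hypothetical probe names; none of these come from the source. The reconstruction inverts the decomposition with all detail coefficients set to zero, so the result has one value per original gene position.

```python
import math

def haar_approx(signal):
    """Approximation half of one orthonormal Haar decomposition step."""
    s = math.sqrt(2.0)
    return [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]

def haar_upsample(approx):
    """Inverse Haar step with every detail coefficient set to zero."""
    s = math.sqrt(2.0)
    out = []
    for a in approx:
        out.extend([a / s, a / s])
    return out

def reconstruct_approximation(profile, level=2):
    """Approximation part of `profile` at `level`, mapped back to full length."""
    if len(profile) % (2 ** level):
        raise ValueError("toy sketch: length must be divisible by 2**level")
    approx = list(profile)
    for _ in range(level):
        approx = haar_approx(approx)
    for _ in range(level):
        approx = haar_upsample(approx)
    return approx

# Hypothetical probe names and made-up expression values.
probe_ids = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
smooth = reconstruct_approximation(profile, level=2)

# A feature index chosen on `smooth` points directly at one probe.
selected_index = 5
print(probe_ids[selected_index], smooth[selected_index])
```

Because the reconstructed vector is aligned with the original profile, a feature selected at position i can be read back as the gene at position i, which is what allows selected features to be reported as named genes.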
Summary
Huge advances in DNA microarrays have allowed scientists to test thousands of genes in normal or tumor tissues on a single array and check whether those genes are active, hyperactive, or silent. Tan and Gilbert [11] focus on three supervised machine learning techniques for cancer classification, namely the C4.5 decision tree and bagged and boosted decision trees. They performed classification tasks on seven publicly available cancer microarray datasets and compared the classification/prediction performance of these methods. Instead of producing uncorrelated components, as PCA and LDA do, independent component analysis (ICA) attempts to achieve statistically independent components in the transform for feature extraction. None of these methods detects the localized features of microarray data. Approximation coefficients compress the microarray data and retain the major information of the data without losing its time property. Transforms such as PCA, LDA, and ICA are based on the training dataset. The third important advantage of the wavelet transform is that the significant genes can be detected from the reconstruction information of the decomposition coefficients at different levels. Experiments are carried out on four datasets, and key genes are detected based on GA features selected from the reconstructed approximation.
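GA-based feature selection of the kind described above can be sketched as a search over binary feature masks. The text does not specify the fitness function, classifier, or GA operators, so everything here is an assumption for illustration: leave-one-out nearest-centroid accuracy with a small per-feature penalty as fitness, truncation selection, one-point crossover, and bit-flip mutation. The data are made up.

```python
import random

def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dist = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Assumed fitness: accuracy minus a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))

# Toy data: 6 samples x 4 features; only feature index 2 separates the classes.
X = [
    [0.5, 1.0, 0.1, 0.9],
    [0.4, 0.2, 0.2, 0.1],
    [0.6, 0.7, 0.0, 0.5],
    [0.5, 0.9, 5.0, 0.2],
    [0.4, 0.1, 5.2, 0.8],
    [0.6, 0.6, 4.9, 0.4],
]
y = [0, 0, 0, 1, 1, 1]
best = ga_select(X, y)
print("selected feature indices:", [j for j, keep in enumerate(best) if keep])
```

In the setting described here, the columns of `X` would be approximation coefficients or reconstructed approximation values rather than raw toy numbers, and the per-feature penalty is one simple way to favor small feature subsets.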