Abstract
Background
In discriminant analysis of microarray data, a small number of samples is usually characterized by the expression levels of a large number of genes. Conducting the discriminant analysis with all the genes is not only difficult but also unnecessary. Hence, gene selection is usually performed to select the important genes.
Results
A gene selection method searches for an optimal or near-optimal subset of genes with respect to a given evaluation criterion. In this paper, we propose a new evaluation criterion, named the leave-one-out calculation (LOOC) measure. A gene selection method, named the leave-one-out calculation sequential forward selection (LOOCSFS) algorithm, is then presented by combining the LOOC measure with the sequential forward selection scheme. Further, a novel gene selection algorithm, the gradient-based leave-one-out gene selection (GLGS) algorithm, is also proposed. Both algorithms originate from an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM). The proposed approaches are applied to two microarray datasets and compared with other well-known gene selection methods using code available from the second author.
Conclusion
The proposed gene selection approaches can provide gene subsets that lead to more accurate classification results, while their computational complexity is comparable to that of the existing methods. The GLGS algorithm also scales better to datasets with a very large number of genes.
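Both algorithms rest on the fact that the leave-one-out (LO0) residuals of an LS-SVM can be obtained exactly from a single training run instead of n retrainings. The sketch below illustrates that idea under stated assumptions: it uses the standard LS-SVM linear system H [alpha; b] = [y; 0] with H = [[K + I/gamma, 1], [1^T, 0]], a linear kernel K on the selected genes, and the identity r_i = alpha_i / C_ii with C = H^{-1}, which follows from block inversion of H. The function names (`lssvm_fit`, `loo_residuals`, `loo_error_rate`, `loo_residuals_naive`) and the regularization parameter `gamma` are hypothetical and not taken from the paper; a brute-force loop is included only to check the shortcut.

```python
import numpy as np


def lssvm_fit(X, y, gamma=1.0):
    """Train an LS-SVM with a linear kernel (illustrative sketch).

    Solves [[K + I/gamma, 1], [1^T, 0]] @ [alpha; b] = [y; 0]
    and also returns C = H^{-1}, which the LOO shortcut needs.
    """
    n = X.shape[0]
    K = X @ X.T                          # linear kernel on the given genes
    H = np.zeros((n + 1, n + 1))
    H[:n, :n] = K + np.eye(n) / gamma    # ridge term from the LS-SVM loss
    H[:n, n] = 1.0                       # bias column
    H[n, :n] = 1.0                       # constraint sum(alpha) = 0
    C = np.linalg.inv(H)
    theta = C @ np.append(y, 0.0)
    return theta[:n], theta[n], C


def loo_residuals(X, y, gamma=1.0):
    """Exact LOO residuals y_i - yhat_i^(-i) from a single fit."""
    alpha, _, C = lssvm_fit(X, y, gamma)
    return alpha / np.diag(C)[:-1]


def loo_error_rate(X, y, gamma=1.0):
    """LOO misclassification rate for labels y in {-1, +1}."""
    yhat_loo = y - loo_residuals(X, y, gamma)
    return float(np.mean(np.sign(yhat_loo) != y))


def loo_residuals_naive(X, y, gamma=1.0):
    """Reference check: retrain n times, each time without sample i."""
    n = len(y)
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        alpha, b, _ = lssvm_fit(X[keep], y[keep], gamma)
        r[i] = y[i] - (X[i] @ X[keep].T @ alpha + b)
    return r
```

On random data, `loo_residuals` and `loo_residuals_naive` should agree up to floating-point error, while the fast version costs one (n+1)-by-(n+1) inversion rather than n of them.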
Highlights
In discriminant analysis of microarray data, a small number of samples is usually characterized by the expression levels of a large number of genes
We present the performance of the proposed gene selection algorithms, i.e. the leave-one-out calculation sequential forward selection (LOOCSFS) and the gradient-based leave-one-out gene selection (GLGS) algorithms, on two public-domain datasets
We have proposed two gene selection algorithms, LOOCSFS and GLGS, based on an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM)
Summary
In discriminant analysis of microarray data, a small number of samples is usually characterized by the expression levels of a large number of genes. Conducting the discriminant analysis with all the genes is difficult and unnecessary, so gene selection is usually performed to select the important genes. Given microarray data characterized by the expression levels of a large number of genes, a typical discriminant analysis constructs a classifier from the data to distinguish between different disease types. A gene selection procedure is usually employed to select the most informative genes from the whole gene set. A preceding gene selection procedure can …
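Sequential forward selection, the search scheme behind LOOCSFS, grows the gene subset greedily: at each step it adds the candidate gene that gives the best value of the evaluation criterion. The sketch below is a hypothetical illustration, not the authors' LOOC measure, whose exact definition is not given here. As an assumed stand-in, it scores a subset by the sum of squared exact leave-one-out residuals of a linear-kernel LS-SVM (a PRESS-style score, lower is better); the names `loo_press` and `sfs_select` and the parameter `gamma` are made up for this example.

```python
import numpy as np


def loo_press(X, y, gamma=1.0):
    """Sum of squared exact LOO residuals of a linear-kernel LS-SVM.

    Hypothetical stand-in for a LOO-based evaluation criterion.
    """
    n = X.shape[0]
    H = np.zeros((n + 1, n + 1))
    H[:n, :n] = X @ X.T + np.eye(n) / gamma   # kernel plus ridge term
    H[:n, n] = 1.0                            # bias column
    H[n, :n] = 1.0                            # constraint sum(alpha) = 0
    C = np.linalg.inv(H)
    alpha = (C @ np.append(y, 0.0))[:n]
    r = alpha / np.diag(C)[:n]                # r_i = y_i - yhat_i^(-i)
    return float(np.sum(r ** 2))


def sfs_select(X, y, k, gamma=1.0, criterion=loo_press):
    """Greedy sequential forward selection of k gene (column) indices."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        # Score every candidate subset "selected + one more gene".
        scores = [criterion(X[:, selected + [g]], y, gamma) for g in remaining]
        best = remaining[int(np.argmin(scores))]   # lower criterion is better
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each candidate subset is scored from a single matrix inversion, one criterion evaluation costs O(n^3) in the number of samples n, which stays small for typical microarray sample sizes even when many candidate genes are scanned at each step.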