Abstract
Background
Selecting influential genes from microarray data is difficult because the number of genes is large and the group of subjects is relatively small. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for possible dependence among subjects who associate similarly with the disease, and may restrict the selection of influential genes.
Results
A novel approach to gene selection is proposed based on kernel similarities and kernel weights. We do not assume that subjects contribute uniformly. Weights are calculated via regularized least squares support vector regression (RLS-SVR) of class levels on kernel similarities and are used to weight subject contribution. The cumulative sum of weighted expression levels is then ranked to select responsible genes. These procedures also work for multiclass classification. We demonstrate this algorithm on acute leukemia, colon cancer, small, round blue cell tumors of childhood, breast cancer, and lung cancer studies, using kernel Fisher discriminant analysis and support vector machines as classifiers. Other procedures are compared as well.
Conclusion
This approach is easy to implement and fast to compute for both binary and multiclass problems. The gene set provided by the RLS-SVR weight-based approach contains fewer genes and achieves higher accuracy than other procedures.
Highlights
Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects
We introduce the proposed gene selection algorithm, discuss briefly the regularized least squares support vector regression (RLS-SVR), and outline classification rules based on the selected genes
In the following we introduce the principle of the proposed gene selection procedures, and illustrate the regularized least squares (RLS)-SVR algorithm for assigning weights and support vector machines (SVMs) classification
Summary
Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. Golub et al. [1] and Brown et al. [2] considered the classification of known disease status (called class prediction or supervised learning) using microarray data. These gene expression values are recorded from a large number of genes, of which only a small subset is associated with the disease class labels. In the machine learning community, many procedures, termed gene selection, variable selection, or feature selection, have been developed to identify or select a subset of genes with distinctive features. Both the proportion of "relevant" genes and the number of tissues (subjects) are usually small compared with the number of genes, which makes it difficult to find a stable solution. Dimension reduction, both for gene selection and for finding influential genes, is essential.
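The weighting-and-ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bias-free regularized least squares solve, the absolute value used for ranking, the binary labels coded as +1/-1, and all names and defaults (gaussian_kernel, rls_weights, rank_genes, gamma, lam) are assumptions made here for illustration only.

```python
import numpy as np


def gaussian_kernel(X, gamma):
    """Subject-by-subject Gaussian kernel similarities (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def rls_weights(K, y, lam):
    """Regularized least squares fit of class labels on kernel similarities.

    Solves (K + lam * I) alpha = y; alpha holds one weight per subject.
    The bias term is omitted for brevity (an assumption of this sketch).
    """
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def rank_genes(X, y, gamma=None, lam=1.0):
    """Rank genes by the magnitude of their weighted sum over subjects.

    X: (n_subjects, n_genes) expression matrix; y: labels coded +1/-1.
    Returns gene indices (most influential first), scores, subject weights.
    """
    n, p = X.shape
    if gamma is None:
        gamma = 1.0 / p  # hypothetical default bandwidth
    K = gaussian_kernel(X, gamma)
    alpha = rls_weights(K, np.asarray(y, dtype=float), lam)
    scores = np.abs(alpha @ X)  # weighted sum of expression levels per gene
    order = np.argsort(-scores)
    return order, scores, alpha
```

Calling rank_genes(X, y) on an expression matrix with subjects in rows returns the gene ordering, and the first few indices in that ordering would form the selected gene set. The multiclass case mentioned in the abstract is not covered by this sketch.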