Abstract

An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the very large number of screened genes and the small number of samples make this task difficult. The usual approach to this problem has been either to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, both aims, gene selection and class prediction, can be achieved at the same time by using methods based on Partial Least Squares (PLS), as reported in the present work. We present an iterative PLS algorithm based on backward variable elimination through the “Variable Influence on Projection” (VIP) statistic, which finds an optimal PLS model using training and test sets. It simultaneously reduces the number of selected genes through an iterative procedure and finds the number of PLS factors that gives optimal classification performance. It is a simple approach that uses only one mathematical method, retains the identification of discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to further interpret the system through additional PLS options, such as scores and loadings plots, or class assignment of new samples. The proposed algorithm was tested on simulated data under different scenarios that occur in microarray analysis. The results are also compared with those of other classification methods such as KNN, PAM, SVM, RF and standard PLS.

Keywords: Classification, gene selection, microarray, partial least squares, PLS, VIP statistic.
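
The idea of backward elimination driven by the VIP statistic can be illustrated with a minimal Python sketch. It is not the SIMFIT module and does not reproduce the exact procedure of the paper. It assumes a two-class response coded 0/1 and fitted by single-response PLS (NIPALS), a hypothetical elimination rule that drops the 20% of genes with the lowest VIP at each iteration, and model choice by misclassification rate on the test set. All function names, parameters and the simulated data are illustrative assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Single-response PLS by NIPALS on centred data (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, T, q = [], [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = f @ t / tt                  # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": np.array(W).T,
            "P": np.array(P).T, "T": np.array(T).T, "q": np.array(q)}

def pls1_predict(model, X):
    """Predict by repeating the deflation steps on new, centred samples."""
    E = X - model["x_mean"]
    y_hat = np.full(X.shape[0], model["y_mean"])
    for a in range(model["q"].size):
        t = E @ model["W"][:, a]
        y_hat += model["q"][a] * t
        E = E - np.outer(t, model["P"][:, a])
    return y_hat

def vip(model):
    """VIP_j = sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a), with unit-norm w_a."""
    W, T, q = model["W"], model["T"], model["q"]
    ss = q**2 * np.sum(T**2, axis=0)    # y sum of squares explained per factor
    return np.sqrt(W.shape[0] * (W**2 @ ss) / ss.sum())

def vip_backward_selection(X_tr, y_tr, X_te, y_te,
                           max_comp=3, drop_frac=0.2, min_genes=5):
    """Assumed rule: drop the lowest-VIP genes, keep the best test-set model."""
    genes = np.arange(X_tr.shape[1])
    best = None                         # (test error, n factors, gene indices)
    while genes.size >= min_genes:
        fits = []
        for a in range(1, max_comp + 1):
            m = pls1_fit(X_tr[:, genes], y_tr, a)
            pred = pls1_predict(m, X_te[:, genes]) > 0.5
            fits.append((np.mean(pred != (y_te > 0.5)), a, m))
        err, a, m = min(fits, key=lambda fit: (fit[0], fit[1]))
        if best is None or err <= best[0]:   # ties favour fewer genes
            best = (err, a, genes.copy())
        n_drop = max(1, int(drop_frac * genes.size))
        keep = np.argsort(vip(m))[n_drop:]   # discard the lowest-VIP genes
        genes = genes[np.sort(keep)]
    return best

# Simulated example: 200 genes, of which the first 10 differ between classes.
rng = np.random.default_rng(0)
n, p, informative = 60, 200, 10
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, :informative] += 2.0 * y[:, None]
idx = rng.permutation(n)
tr, te = idx[:40], idx[40:]
err, n_comp, genes = vip_backward_selection(X[tr], y[tr], X[te], y[te])
print(f"test error={err:.2f}, factors={n_comp}, genes kept={genes.size}")
```

Because the weight vectors are normalised, the squared VIP values average to one, which is why a VIP near or above 1 is commonly read as "more influential than average". Using the same test set both to choose the model and to report its error gives an optimistic estimate, so a separate validation set would be needed for an unbiased figure.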
