Abstract

Three methods for variable selection are described, namely the t-statistic, Partial Least Squares Discriminant Analysis (PLS-DA) weights and regression coefficients, with the aim of determining which variables are the most significant markers for discriminating between two groups: a variable’s level of significance is related to its magnitude. Monte-Carlo methods are employed to determine empirical significance of variables, by permuting randomly the class membership 5000 times to obtain null distributions, and comparing the observed statistic for each variable with the null distribution. Seven simulations consisting of 200 samples, divided equally between two classes, and 300 variables, are constructed; in one dataset there are no induced correlations between variables, in two datasets correlations are induced but there is no induced separation between the classes, and in four datasets, separation is induced by selecting 20 of the variables to be discriminators. In addition two metabolomic datasets were analysed consisting of the GCMS of urinary extracts from mice both to determine the effect of stress and to determine the effect of diet on the urinary chemosignal. It is shown that the t-statistic combined with Monte-Carlo permutations provides similar results to PLS weights. PLS regression coefficients find the least number of markers but, for the simulations, the lowest False Positives rates.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.