Abstract

The use of modern statistical methodology to overcome the known pitfalls of classical regression models in the analysis of large numbers of highly correlated variables has increased considerably in recent years. Statisticians working in chemometrics and omics research have developed a method called orthogonal projections to latent structures (OPLS). Compared with regular partial least squares (PLS) regression, OPLS is a simpler method, with the additional advantage that the orthogonal variation can be analyzed separately. Use of the OPLS model has spread beyond the fields in which it originated, but it has not yet been applied in epidemiology, which is a broad field of research. In public health and clinical research, there are situations in which large numbers of correlated variables need to be modeled. The authors applied OPLS discriminant analysis (OPLS-DA) to model large numbers of variables in a case-control study and compared it with discriminant analysis by partial least squares regression (PLS-DA). Before the models were fitted, the dataset was split into two parts: a training set and a prediction set. Models fitted on the training set were then tested for validity on the prediction set. OPLS-DA was compared with PLS-DA in terms of model fit, diagnostics and interpretability. Both models suited the data, but OPLS-DA was preferable. The authors encourage the use of these methods to increase study power and statistical validity in epidemiology and in similar settings in which large numbers of correlated variables need to be modeled.

Key words: Partial least squares regression, orthogonal projections to latent structures, logistic regression, multicollinearity, injury epidemiology, burns.

Highlights

  • Traditional regression models in classical statistics become problematic when there are large numbers of variables and a small sample size. Multicollinearity and missing values make the situation even more complex

  • Use of the orthogonal projections to latent structures (OPLS) model has spread beyond the fields in which it originated, but it has not yet been applied in epidemiology, which is a broad field of research

  • The authors applied OPLS-DA to model large numbers of variables in a case-control study and compared it with discriminant analysis by partial least squares regression

Introduction

Traditional regression models in classical statistics become problematic when there are large numbers of variables and a small sample size. Multicollinearity and missing values make the situation even more complex. Multicollinearity increases the standard errors of regression coefficients, decreases power, and makes it difficult to separate the individual effects of predictor variables, so the regression coefficients become less reliable (Dohoo et al, 1997b). Such limitations may lead to either bias or loss of power in testing hypotheses. Partial least squares (PLS) regression is a method of analysis known to statisticians in many fields. It attenuates the abovementioned problems, but PLS has limitations of its own, such as interpretability problems, multicomponent results and, in some situations, biased coefficients, leading to a higher risk of overlooking real correlations (Eriksson et al, 2006a; Richard and Cramer, 1993). A newer statistical method, orthogonal projections to latent structures (OPLS), has since been introduced. It is a modification of the NIPALS PLS algorithm. Later extensions of OPLS gave rise to OPLS-DA in 2005, making it appropriate for discriminant analysis as well as for prediction purposes (Bylesjo et al, 2006a).
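
The effect of multicollinearity on standard errors can be illustrated with the variance inflation factor (VIF). The sketch below is not taken from the study; it assumes a linear model with only two predictors, in which case the VIF of either predictor is 1 / (1 - r^2), where r is the correlation between them. The simulated data, the function names and the noise levels are all hypothetical choices made for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a two-predictor linear model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def simulate(noise_sd, n=200, seed=0):
    """Hypothetical data: x2 is x1 plus Gaussian noise, so a smaller
    noise_sd means the two predictors are more strongly correlated."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, noise_sd) for a in x1]
    return x1, x2


weak = vif_two_predictors(*simulate(noise_sd=3.0))    # weakly correlated pair
strong = vif_two_predictors(*simulate(noise_sd=0.1))  # nearly collinear pair

# The standard error of a coefficient scales with the square root of its VIF.
print(f"weak correlation:   VIF = {weak:.2f}, SE multiplier = {weak ** 0.5:.2f}")
print(f"strong correlation: VIF = {strong:.2f}, SE multiplier = {strong ** 0.5:.2f}")
```

With nearly collinear predictors the VIF grows sharply, which is the loss of precision described above. With many predictors the same idea applies, but each VIF is computed from the R^2 of regressing one predictor on all the others.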
