Abstract
Background: To address high-dimensional genomic data, most of the proposed prediction methods make use of genomic data alone without considering clinical data, which are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions. We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones.
Results: Using partial least squares (PLS), we propose some one-step approaches based on three extensions of the least squares (LS)-PLS method for logistic regression. A comparison of their prediction performances via a simulation and on real data sets from cancer studies is conducted.
Conclusion: In general, those methods using only clinical data or only genomic data perform poorly. The advantage of using LS-PLS methods for classification and their performances are shown and then used to analyze clinical and genomic data. The corresponding prediction results are encouraging and stable regardless of the data set and/or number of selected features. These extensions have been implemented in the R package lsplsGlm to enhance their use.
Highlights
Introduction: We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones.
To address high-dimensional genomic data, most of the proposed prediction methods make use of genomic data alone without considering clinical data, which are often available and known to have predictive value.
The four methods combining clinical and genomic data provide similar misclassification rates and areas under the curve (AUC), both significantly better than those of the generalized linear model (GLM) and R-partial least squares (R-PLS).
Summary
We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones. We focus on binary class prediction, where the outcome can be, for instance, alive/dead or therapeutic success/failure. Most of these studies [3,4,5,6,7] include clinical data in addition to genomic data, yet most of the proposed prediction methods use only the genomic data, which involves some statistical issues. Unless a preliminary step of variable selection is performed, the standard classification methods are not appropriate. To address this "large p small n" problem, variable selection, dimensionality reduction, or a combination of both can be used. An alternative method is the partial least squares (PLS) method [9], which takes the link between the outcome and the predictors into account.
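The idea of reducing only the genomic block while keeping the clinical variables as they are can be illustrated with a minimal sketch. It is not the authors' LS-PLS extensions and does not use the lsplsGlm package. It is a simplified two-stage stand-in that assumes (i) a least squares fit of the outcome on the clinical variables, (ii) a single PLS component of the genomic data computed against the residuals, and (iii) a ridge-penalized logistic regression on the clinical variables plus that component. All function names, the ridge value, and the simulated data are hypothetical.

```python
import numpy as np

def ls_residuals(D, y):
    """Residuals of y after least squares on the clinical block D (intercept added)."""
    D1 = np.column_stack([np.ones(len(y)), D])
    coef, *_ = np.linalg.lstsq(D1, y, rcond=None)
    return y - D1 @ coef

def first_pls_score(X, r):
    """First PLS component: unit-norm direction of maximal covariance between X and r."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (r - r.mean())
    w = w / np.linalg.norm(w)
    return Xc @ w, w

def fit_logistic(Z, y, ridge=1.0, n_iter=25):
    """Ridge-penalized logistic regression by Newton-Raphson; Z includes the intercept.
    The penalty (an assumption of this sketch) keeps the fit finite under separation."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)        # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (y - p) - ridge * beta
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None]) + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated "large p small n" data (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
n, p = 80, 200
D = rng.normal(size=(n, 2))                         # clinical variables
X = rng.normal(size=(n, p))                         # genomic variables
eta_true = D[:, 0] + X[:, :5].sum(axis=1)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

r = ls_residuals(D, y)                              # step (i)
t, w = first_pls_score(X, r)                        # step (ii)
t = t / t.std()                                     # put the score on a unit scale
Z = np.column_stack([np.ones(n), D, t])             # clinical block left unreduced
beta = fit_logistic(Z, y)                           # step (iii)
probs = 1.0 / (1.0 + np.exp(-(Z @ beta)))
train_acc = np.mean((probs > 0.5) == (y == 1))
```

Because the component is built using the outcome, the training accuracy computed here is optimistic; an honest assessment would recompute the component inside each cross-validation fold.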