Abstract

Background: To address high-dimensional genomic data, most of the proposed prediction methods make use of genomic data alone without considering clinical data, which are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions. We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones.

Results: Using partial least squares (PLS), we propose some one-step approaches based on three extensions of the least squares (LS)-PLS method for logistic regression. A comparison of their prediction performances via a simulation and on real data sets from cancer studies is conducted.

Conclusion: In general, those methods using only clinical data or only genomic data perform poorly. The advantage of using LS-PLS methods for classification and their performances are shown and then used to analyze clinical and genomic data. The corresponding prediction results are encouraging and stable regardless of the data set and/or number of selected features. These extensions have been implemented in the R package lsplsGlm to enhance their use.

Highlights

  • We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones

  • To address high-dimensional genomic data, most of the proposed prediction methods make use of genomic data alone without considering clinical data, which are often available and known to have predictive value

  • The four methods combining clinical and genomic data provide similar and significantly better misclassification rates and area under the curve (AUC) compared to those of both the generalized linear model (GLM) and R-partial least squares (PLS)


Summary

Introduction

We consider here methods for classification purposes that simultaneously use both types of variables but apply dimensionality reduction only to the high-dimensional genomic ones. We focus on binary class prediction, where the outcome can be, for instance, alive/dead or therapeutic success/failure. Most of these studies [3,4,5,6,7] include clinical data in addition to genomic data, yet most of the proposed prediction methods use only the genomic data, which involves some statistical issues. Unless a preliminary step of variable selection is performed, the standard classification methods are not appropriate. To address this “large p small n” problem, variable selection, dimensionality reduction, or a combination of both can be used. An alternative method is the partial least squares (PLS) method [9], which takes this link into account.
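The general idea of reducing only the genomic block while entering the clinical variables directly into a logistic model can be illustrated with a minimal sketch. It is not the LS-PLS extensions proposed by the authors, nor the lsplsGlm package API. Everything in it is an assumption made for illustration: the simulated data, the use of a single PLS component, the small ridge penalty added for numerical stability, plain gradient descent as the fitting routine, and all function and variable names.

```python
import math
import random

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def center_columns(rows):
    """Subtract each column mean (PLS works on centered predictors)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_pls_component(Z, y):
    """Return unit-norm weights and scores of the first PLS component."""
    y_bar = sum(y) / len(y)
    # Each gene's weight is proportional to its covariance with the outcome.
    w = [sum(z * (yi - y_bar) for z, yi in zip(col, y)) for col in zip(*Z)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    scores = [sum(z * wj for z, wj in zip(row, w)) for row in Z]
    return w, scores

def fit_logistic(F, y, steps=500, lr=0.5, ridge=1e-2):
    """Ridge-penalised logistic regression fitted by gradient descent."""
    n, d = len(F), len(F[0])
    beta = [0.0] * d
    for _ in range(steps):
        grad = [ridge * b for b in beta]  # penalty applies to all terms here
        for row, yi in zip(F, y):
            err = sigmoid(sum(b * f for b, f in zip(beta, row))) - yi
            for k in range(d):
                grad[k] += err * row[k] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical simulated data: 2 clinical variables, 50 genomic variables.
random.seed(0)
n, n_clin, n_genes = 80, 2, 50
X = [[random.gauss(0, 1) for _ in range(n_clin)] for _ in range(n)]
Z = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n)]
y = []
for x_row, z_row in zip(X, Z):
    lin = 1.0 * x_row[0] + 0.8 * sum(z_row[:5])
    y.append(1 if random.random() < sigmoid(lin) else 0)

# Dimensionality reduction is applied to the genomic block only.
w, t = first_pls_component(center_columns(Z), y)
sd = math.sqrt(sum(v * v for v in t) / n)  # scores have mean zero

# Design matrix: intercept, raw clinical variables, standardized PLS score.
F = [[1.0] + x_row + [ti / sd] for x_row, ti in zip(X, t)]
beta = fit_logistic(F, y)
probs = [sigmoid(sum(b * f for b, f in zip(beta, row))) for row in F]
accuracy = sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / n
print("coefficients:", [round(b, 3) for b in beta])
print("training accuracy:", accuracy)
```

The accuracy printed here is computed on the training data and is therefore optimistic; a fair comparison of methods would rely on held-out data or cross-validation.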
