Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures

Isabella Zwiener, Barbara Frisch, Harald Binder

doi:10.1371/journal.pone.0085150

Abstract

Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency, or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
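To make the transformations named in the abstract concrete, the following is a minimal sketch for the counts of a single gene across samples. It is an illustration under stated assumptions, not the authors' implementation: the pseudocount of 1, the fixed Box-Cox parameter `lam`, the normal-scores variant of the rank transformation, and all function names are hypothetical choices made here. The variance-stabilizing transformation is omitted because it depends on a fitted dispersion model.

```python
import math
from statistics import NormalDist, mean, stdev


def standardize(x):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


def log_transform(x, pseudocount=1.0):
    """Natural log after adding a pseudocount (assumed here) so zero counts are defined."""
    return [math.log(v + pseudocount) for v in x]


def box_cox(x, lam, shift=1.0):
    """Box-Cox transform of the shifted counts; lam is fixed here, not estimated."""
    if lam == 0:
        return [math.log(v + shift) for v in x]
    return [((v + shift) ** lam - 1.0) / lam for v in x]


def average_ranks(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_to_normal_scores(x):
    """Map ranks to standard normal quantiles (one possible rank-based transform)."""
    n = len(x)
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in average_ranks(x)]


# Hypothetical counts for one gene in six samples, with a tie and one extreme value.
counts = [0, 3, 3, 12, 40, 950]
scores = rank_to_normal_scores(counts)
```

In this sketch the extreme count of 950 dominates the standardized values, whereas the rank-based scores depend only on the ordering of the samples, which is one way to see why rank-based transformations are robust to extreme values.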

Highlights

RNA-Seq is a relatively new approach for measuring gene expression by making use of next-generation sequencing technology.
We focus on regularized regression techniques for building signatures from RNA-Seq data, as these simultaneously consider all RNA-Seq measurements, can provide automated selection of important genes, and have generally been a popular class of multivariable approaches for microarray gene expression data.
In the following we briefly describe the prominent types of regression models where regularized regression techniques are used, namely generalized linear models and the Cox proportional hazards model.

Summary

Introduction

RNA-Seq is a relatively new approach for measuring gene expression by making use of next-generation sequencing technology. It produces count data with low background noise, allows detection of transcripts even at low expression levels, and provides a large dynamic range in terms of fold-changes [1,2]. RNA-Seq is on its way to replacing microarray technology, which has been widely used over the past decades. We focus on regularized regression techniques for building signatures from RNA-Seq data, as these simultaneously consider all RNA-Seq measurements, can provide automated selection of important genes, and have generally been a popular class of multivariable approaches for microarray gene expression data. We will consider the lasso [10] and componentwise likelihood-based boosting [11,12] as representative approaches for regularized regression with variable selection.
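The preferential selection of large-variance covariates by the lasso can be illustrated with a small sketch. It assumes a linear model with squared-error loss and the objective (1/2n)||y - Xb||^2 + lambda * ||b||_1, for which the first covariate to enter the lasso path is the one with the largest absolute inner product between the centered covariate and the centered response, divided by n. This is a simplification of the Cox and generalized linear model settings considered in the paper; the data, gene names, and function names below are hypothetical.

```python
from statistics import mean, pstdev


def center(v):
    """Subtract the mean from every element."""
    m = mean(v)
    return [a - m for a in v]


def entry_lambda(x, y):
    """Penalty value below which covariate x alone would get a nonzero lasso coefficient."""
    xc, yc = center(x), center(y)
    return abs(sum(a * b for a, b in zip(xc, yc))) / len(y)


def first_selected(covariates, y, standardize=False):
    """Name of the covariate that enters the lasso path first."""
    best_name, best_lambda = None, -1.0
    for name, x in covariates.items():
        if standardize:
            s = pstdev(x)  # scale to unit (population) standard deviation
            x = [v / s for v in x]
        lam = entry_lambda(x, y)
        if lam > best_lambda:
            best_name, best_lambda = name, lam
    return best_name


# Hypothetical data: gene_a tracks y closely but has a small variance,
# gene_b is more weakly related to y but has a much larger variance.
y = [1, 2, 3, 4, 5, 6]
covariates = {
    "gene_a": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9],
    "gene_b": [10, 40, 20, 60, 30, 50],
}

raw_choice = first_selected(covariates, y)
standardized_choice = first_selected(covariates, y, standardize=True)
```

On the raw scale the large-variance covariate `gene_b` enters first even though `gene_a` is more strongly correlated with the response; after standardization the order reverses. With RNA-Seq counts, where variance grows with the mean, the same mechanism would favor highly expressed genes unless the data are transformed.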



Journal: PLoS ONE
Publication Date: Jan 8, 2014
Citations: 149
License type: CC BY 4.0
