Gene set analysis controlling for length bias in RNA-seq experiments

Xing Ren, Qiang Hu, Jianmin Wang, Jeffrey C Miecznikowski, Song Liu

doi:10.1186/s13040-017-0125-9

Open Access

Abstract

Background
In gene set analysis, researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. With the rapid development of high-throughput sequencing technologies, ribonucleic acid sequencing (RNA-seq) has become an important alternative to traditional expression arrays in gene expression studies. Adapting existing algorithms to RNA-seq data is challenging because the technologies, and the data they produce, differ intrinsically. In RNA-seq experiments, the measure of gene expression is correlated with gene length, and this inherent correlation may bias gene set analysis.

Results
We develop SeqGSA, a new method for gene set analysis with length bias adjustment for RNA-seq data. It extends the R package GSA, which was designed for microarrays. Our method compares the maxmean statistic of a gene set against permutations, while also taking into account the statistics of the other gene sets. To adjust for gene length bias, we implement a flexible weighted sampling scheme in the restandardization step of our algorithm. We show that our method improves the power to identify significant gene sets that are affected by length bias. We also show that our method maintains the type I error rate in comparison with another representative method for gene set enrichment testing.

Conclusions
SeqGSA is a promising tool for testing significant gene pathways with RNA-seq data while adjusting for the inherent gene length effect. It enhances the power to detect gene sets affected by the bias and maintains the type I error rate under various situations.
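The weighted sampling idea in the restandardization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the SeqGSA implementation: genes are grouped into length-quantile bins, and each gene's sampling weight is the share of its bin within the tested gene set divided by that bin's share among all genes, so random gene sets of the same size roughly reproduce the set's length distribution. The binning rule, the weights, and the function names (`length_bins`, `sampling_weights`, `weighted_sample`, `randomization_score`) are hypothetical. Only the randomization half of restandardization is shown, i.e. standardizing the observed set statistic by the mean and standard deviation of the same statistic over random gene sets; the combination with the sample-permutation distribution is omitted. Any set-level statistic can be passed as `stat_fn` (GSA uses maxmean).

```python
import random
import statistics
from bisect import bisect_right
from collections import Counter


def length_bins(lengths, n_bins=5):
    """Assign each gene to a length-quantile bin numbered 0 .. n_bins-1 (hypothetical binning)."""
    ordered = sorted(lengths)
    cuts = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    return [bisect_right(cuts, x) for x in lengths]


def sampling_weights(bins, gene_set):
    """Hypothetical weights: a bin's share in the gene set divided by its share overall.

    Genes in length bins over-represented in the set are drawn more often, so random
    sets mimic the set's length distribution; bins absent from the set get weight 0.
    """
    all_freq = Counter(bins)
    set_freq = Counter(bins[g] for g in gene_set)
    n_all, n_set = len(bins), len(gene_set)
    return [(set_freq[b] / n_set) / (all_freq[b] / n_all) for b in bins]


def weighted_sample(weights, k, rng):
    """Draw k distinct gene indices with probability proportional to weight.

    Uses Efraimidis-Spirakis keys u**(1/w): keeping the k largest keys is equivalent
    to sequential weighted sampling without replacement. Zero-weight genes are skipped.
    """
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


def randomization_score(z, lengths, gene_set, stat_fn, n_draws=1000, seed=0):
    """Standardize a set statistic against length-matched random gene sets of equal size."""
    rng = random.Random(seed)
    weights = sampling_weights(length_bins(lengths), gene_set)
    observed = stat_fn([z[g] for g in gene_set])
    null = [
        stat_fn([z[g] for g in weighted_sample(weights, len(gene_set), rng)])
        for _ in range(n_draws)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

Here `z` and `lengths` are per-gene lists and `gene_set` holds indices into them, e.g. `randomization_score(z, lengths, gene_set, statistics.mean)`. If all weights were equal, the draws would be uniform random gene sets of the same size, as in unweighted randomization; with length-aware weights, a set enriched for long genes is compared against other long-gene sets, so a set whose signal is explained by length alone stands out less.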

Highlights

Maxmean statistic and restandardization in GSA: in GSA, the gene-level test statistics are first converted to z statistics using quantile functions, and the z values are aggregated into a gene-set-level maxmean statistic.
The unweighted version of SeqGSA is similar to the original GSA, except that the t test is replaced with the exact negative binomial test of edgeR, which is considered more appropriate for ribonucleic acid sequencing (RNA-seq) count data.
We develop a gene set analysis method for RNA-seq data affected by gene length bias.
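The conversion to z values and the maxmean aggregation can be sketched in a few lines. This is a minimal sketch, not the GSA or SeqGSA code: since edgeR's exact test runs in R, the example starts from gene-level p-values and assumes they are one-sided for up-regulation (small p means up-regulated), so that z = Φ⁻¹(1 − p); that sign convention and the helper names `p_to_z` and `maxmean` are assumptions of this example. The maxmean statistic follows Efron and Tibshirani: average the positive parts and the negative parts of the z values separately, and report the larger one with its sign.

```python
from statistics import NormalDist


def p_to_z(p_values):
    """Map one-sided (up-regulation) p-values to z scores: z = Phi^{-1}(1 - p).

    The probability is clamped away from 0 and 1 so the normal quantile stays finite.
    """
    nd = NormalDist()
    return [nd.inv_cdf(min(max(1.0 - p, 1e-12), 1.0 - 1e-12)) for p in p_values]


def maxmean(z):
    """Maxmean: larger of mean positive part and mean negative part, carrying its sign."""
    n = len(z)
    s_pos = sum(max(v, 0.0) for v in z) / n   # average of max(z_i, 0)
    s_neg = sum(max(-v, 0.0) for v in z) / n  # average of max(-z_i, 0)
    return s_pos if s_pos > s_neg else -s_neg
```

For a gene set with z values 1.0, 2.0, −0.5 and 0.0, the mean positive part is 0.75 and the mean negative part is 0.125, so maxmean is 0.75. Averaging the two parts separately means a set with a few strongly up-regulated genes is not cancelled out by a few down-regulated ones, which is the motivation for maxmean over a plain mean.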

Summary

Introduction

Ribonucleic acid sequencing (RNA-seq) is a revolutionary tool for gene expression profiling, and adapting the existing algorithms for expression arrays to RNA-seq data is a challenge in data analysis. In gene set analysis, researchers are interested in determining the gene sets that are significantly correlated with an outcome, e.g. disease status or treatment. Given the RNA-seq protocol, a longer gene is expected to produce more counts than a shorter gene expressed at the same level, because more sequenced fragments come from longer transcripts. This length effect can bias gene set analysis [1,2,3].
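Why length matters can be seen with a rough approximation that is not part of SeqGSA: treat a gene's count as Poisson with mean μ equal to a per-base expression rate times gene length. By the delta method, log of the count has variance of about 1/μ, so a Wald-type z statistic for the log fold change between two conditions grows with the square root of gene length at a fixed fold change. The function `wald_z` and the rates used are hypothetical.

```python
import math


def wald_z(rate_a, rate_b, length):
    """Approximate z for the log fold change between two Poisson counts.

    Expected counts are rate * length; Var(log Y) is approximated by 1/mu (delta method).
    """
    mu_a, mu_b = rate_a * length, rate_b * length
    return math.log(mu_b / mu_a) / math.sqrt(1.0 / mu_a + 1.0 / mu_b)
```

With the same twofold change, `wald_z(0.01, 0.02, 5000)` is larger than `wald_z(0.01, 0.02, 500)` by a factor of √10, so the longer gene is far more likely to reach significance even though the biological effect is identical.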

Journal: BioData Mining
Publication Date: Feb 6, 2017
Citations: 7
License type: open-access

Similar Papers

Linear combination test for gene set analysis of a continuous phenotype
Irina Dinu ... Xiaoming Wang
BMC Bioinformatics | VOL. 14 | 01 Jul 2013

An application of gene set analysis for a comparison of two groups
01 Nov 2011

GSA-SNP: a general approach for gene set analysis of polymorphisms
Dougu Nam ... Seon-Young Kim
Nucleic Acids Research | VOL. 38 | 25 May 2010

Linear Combination Test for Hierarchical Gene Set Analysis
Xiaoming Wang ... Wei Liu
Statistical Applications in Genetics and Molecular Biology | VOL. 10 | 01 Jan 2010
