Abstract

RNA-Seq is quickly becoming the preferred method for comprehensively characterizing whole transcriptome activity, and the analysis of count data from RNA-Seq requires new computational tools. We developed GSAASeqSP, a novel toolset for genome-wide gene set association analysis of sequence count data. This toolset offers a variety of statistical procedures via combinations of multiple gene-level and gene set-level statistics, each with its own strengths under different sample and experimental conditions. These methods can be employed independently, or results generated from multiple or all methods can be integrated to determine more robust profiles of significantly altered biological pathways. Using simulations, we demonstrate the ability of these methods to identify association signals and to measure the strength of the association. We show that GSAASeqSP analyses of RNA-Seq data from diverse tissue samples provide meaningful insights into the biological mechanisms that differentiate these samples. GSAASeqSP is a powerful platform for investigating the molecular underpinnings of complex traits and diseases arising from differential activity within biological pathways. GSAASeqSP is available at http://gsaa.unc.edu.

Highlights

  • GSAASeqSP employs a multi-layer statistical framework that consists of two key steps, illustrated in Figure 1: (1) differential expression analysis of individual genes between two phenotypic groups; and (2) gene set association analysis based on differential gene activity

  • We have evaluated three gene-level statistics for differential expression analysis: Signal2Noise, log2Ratio, and Signal2Noise_log2Ratio, and ten gene set-level statistics for gene set association analysis: Weighted_KS, L2Norm, Mean, WeightedSigRatio, SigRatio, GeometricMean, TruncatedProduct, FisherMethod, MinP, and RankSum
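
The two-step framework described above can be sketched in a few lines of Python. This is a minimal illustration, not the GSAASeqSP implementation: the formulas are assumptions, since the text names the statistics without defining them. Here Signal2Noise follows the common GSEA-style definition (difference of group means over the sum of group standard deviations), log2Ratio is the log2 of the ratio of group means with a pseudocount, the Mean and L2Norm set statistics are taken over the gene-level scores, and significance is estimated by permuting sample labels. All function names and the toy counts are hypothetical.

```python
import math
import random
import statistics


def log2_ratio(group_a, group_b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((statistics.mean(group_a) + pseudocount)
                     / (statistics.mean(group_b) + pseudocount))


def signal2noise(group_a, group_b, eps=1e-8):
    """GSEA-style signal-to-noise (assumed definition): mean difference over summed SDs."""
    spread = statistics.stdev(group_a) + statistics.stdev(group_b)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / (spread + eps)


def gene_scores(counts, labels, gene_stat):
    """Step 1: one differential expression score per gene.

    counts maps gene -> per-sample counts; labels holds 1 or 0 per sample.
    """
    scores = {}
    for gene, values in counts.items():
        group_a = [v for v, lab in zip(values, labels) if lab == 1]
        group_b = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = gene_stat(group_a, group_b)
    return scores


def mean_set_stat(scores, gene_set):
    """Step 2 (assumed form of Mean): average absolute gene-level score in the set."""
    return statistics.mean([abs(scores[g]) for g in gene_set])


def l2norm_set_stat(scores, gene_set):
    """Step 2 (assumed form of L2Norm): Euclidean norm of the set's gene-level scores."""
    return math.sqrt(sum(scores[g] ** 2 for g in gene_set))


def permutation_pvalue(counts, labels, gene_set, gene_stat, set_stat,
                       n_perm=200, seed=0):
    """Empirical p-value from shuffled sample labels (an assumed null model)."""
    rng = random.Random(seed)
    observed = set_stat(gene_scores(counts, labels, gene_stat), gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = set_stat(gene_scores(counts, shuffled, gene_stat), gene_set)
        if null >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: three genes, six samples (three per phenotypic group).
toy_counts = {
    "g1": [10, 12, 11, 40, 42, 38],
    "g2": [5, 6, 5, 20, 22, 21],
    "g3": [30, 29, 31, 30, 31, 29],
}
toy_labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    toy_scores = gene_scores(toy_counts, toy_labels, log2_ratio)
    print(toy_scores)
    print(permutation_pvalue(toy_counts, toy_labels, ["g1", "g2"],
                             signal2noise, mean_set_stat))
```

Because the gene-level and set-level statistics are passed in as plain functions, any pairing of the two can be evaluated with the same call, which mirrors the combinations of statistics the toolset is described as offering.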

Introduction

RNA-Seq is quickly becoming the preferred method for comprehensively characterizing whole transcriptome activity, and the analysis of count data from RNA-Seq requires new computational tools. Identifying biological pathways with differential activity between phenotypically distinct samples is a powerful way to uncover molecular mechanisms underlying complex traits, diseases, and diverse cell types. Towards this end, we previously developed GSAA [1] (Gene Set Association Analysis), which identifies differentially expressed pathways through the integration of microarray gene expression and single nucleotide polymorphism (SNP) data. A variety of alternative statistical and computational methods have also been developed, such as GSEA [2], SAM-GS [3], PAGE [4], GAGE [5], T-profiler [6], GT [7], AGT [8], and GLAPA [9]. These programs, including GSAA, can evaluate differential pathway activity only from real-valued microarray data, not from RNA-Seq count data. These modules include different sets of analytical methods and allow for the analysis of different types of transcriptomics data and genomics data (see Supplementary Table S1 for a description of each).

