Abstract

Motivation: Gene set enrichment analysis is a widely accepted expression analysis tool which aims at detecting coordinated expression change within a pre-defined gene sets rather than individual genes. The benefit of gene set analysis over individual differentially expressed (DE) gene analysis includes more reproducible and interpretable results and detecting small but consistent change among gene set which could not be detected by DE gene analysis. There have been many successful gene set analysis applications in human diseases. However, when the sample size of a disease study is small and no other public data sets of the same disease are available, it will lead to lack of power to detect pathways of importance to the disease.Results: We have developed a novel joint gene set analysis statistical framework which aims at improving the power of identifying enriched gene sets through integrating multiple similar disease data sets. Through comprehensive simulation studies, we demonstrated that our proposed frameworks obtained much better AUC scores than single data set analysis and another meta-analysis method in identification of enriched pathways. When applied to two real data sets, the proposed framework could retain the enriched gene sets identified by single data set analysis and exclusively obtained up to 200% more disease-related gene sets demonstrating the improved identification power through information shared between similar diseases. We expect that the proposed framework would enable researchers to better explore public data sets when the sample size of their study is limited.

Highlights

  • High-throughput technology like microarray and next-generation sequencing (NGS) allows researchers measure the expression levels of thousands of genes or microRNAs in one sample simultaneously

  • We proposed two novel joint gene set enrichment analysis frameworks: JointNormKS and JointPathway aiming at borrowing shared information across similar disease from gene-level and pathway-level, respectively

  • Compared our previously developed joint gene analysis framework, the framework proposed here focused on pathway-level detection and demonstrated that assumption of similar disease sharing similar pathways is valid

Read more

Summary

Introduction

High-throughput technology like microarray and next-generation sequencing (NGS) allows researchers measure the expression levels of thousands of genes or microRNAs in one sample simultaneously. These high-throughput genomic data have enabled researchers to better identification of disease related genes and pathways (Gu et al, 2014, 2017; Zheng et al, 2015, 2016; Liu et al, 2016, 2017, 2018; Gong et al, 2018). Zhao et al combined gene set enrichment analysis information and microRNA target gene sets to identify cancer-related microRNAs (Zhao et al, 2014). Identifying the enriched gene set will provide crucial information of molecular functions and mechanisms underlying different diseases

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.