Abstract

Gene set testing has become a focus of microarray data analysis. A gene set is a group of genes defined by a priori biological knowledge. Several statistical methods have been proposed to determine whether functional gene sets are differentially expressed (enrichment and/or deletion) across phenotypes. However, little attention has been given to analyzing the dependence structure among gene sets. In this study, we propose a novel statistical method of gene set association analysis that identifies significantly associated gene sets using the coefficient of intrinsic dependence. Simulation studies show that the proposed method outperforms conventional methods in detecting general forms of association, in terms of both type I error control and power. The coefficient of intrinsic dependence was applied to a breast cancer microarray dataset to quantify the unsupervised relationship between two sets of genes in tumor and non-tumor samples. The presence of gene-set association was observed to differ across clinical cohorts. In addition, a supervised learning approach was employed to illustrate how gene sets in signal transduction pathways, or in subnetworks regulated by a set of transcription factors, can be discovered using microarray data. In conclusion, the coefficient of intrinsic dependence provides a powerful tool for detecting general types of association and can therefore be useful for associating gene sets using microarray expression data. By connecting relevant gene sets, our approach has the potential to reveal underlying associations by drawing a statistically relevant network in a given population, and it can also complement conventional gene set analysis.
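
To make the idea of testing association between two gene sets concrete, the following is a minimal sketch of a permutation test built on the first canonical correlation, one of the conventional methods named in this summary. It is not the coefficient of intrinsic dependence, whose definition is not given here. The function names, the synthetic data, and the number of permutations are illustrative assumptions, and the sketch assumes more samples than genes in each set, with full-rank expression matrices.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between x (n x p) and y (n x q).

    Assumes n > p, n > q and full column rank after centering.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered matrices.
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    # Canonical correlations are the singular values of qx^T qy.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))  # guard against rounding slightly above 1


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for association between two gene sets.

    Rows are samples; shuffling the rows of y breaks any link to x
    while keeping the within-set structure of each gene set.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = first_canonical_correlation(x, y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        if first_canonical_correlation(x, y[perm]) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    set_a = rng.normal(size=(60, 3))  # hypothetical gene set A: 60 samples, 3 genes
    weights = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
    set_b = set_a @ weights + 0.5 * rng.normal(size=(60, 2))  # linearly linked set B
    rho, p = permutation_pvalue(set_a, set_b)
    print(f"first canonical correlation = {rho:.3f}, permutation p-value = {p:.4f}")
```

Canonical correlation captures only linear association, which is the kind of limitation that motivates measures aimed at general forms of dependence.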

Highlights

  • The interactions of genes usually take place in the signaling pathways, networks, or other biological systems

  • In the simulation study, we explore the performance of our proposed method in identifying the enriched correlation between two gene sets by observing their mRNA expression levels

  • The methods considered were the coefficient of intrinsic dependence (CID), canonical correlation (CanCor), projection pursuit regression (PPR), the Kullback-Leibler distance (KLD), and the Hellinger distance (HD)
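
Two of the measures listed above, the Kullback-Leibler distance and the Hellinger distance, have standard closed forms for discrete distributions. The sketch below computes both. This summary does not say how the distances were turned into association measures; comparing a joint distribution with the product of its marginals, as done here, is one common approach and is an assumption of this example, as are the function names and the toy probability tables.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum p * log(p / q) for discrete p, q."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0  # terms with p == 0 contribute nothing
    if np.any(q[mask] == 0):
        return float("inf")  # p puts mass where q has none
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def hellinger_distance(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def dependence_from_joint(joint):
    """Distance between a joint table and the product of its marginals.

    Assumption of this sketch: dependence is scored as the distance from
    independence. Both values are 0 when the two variables are independent.
    """
    joint = np.asarray(joint, dtype=float)
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return kl_divergence(joint, product), hellinger_distance(joint, product)


if __name__ == "__main__":
    independent = np.outer([0.3, 0.7], [0.4, 0.6])   # toy independent table
    dependent = np.array([[0.5, 0.0], [0.0, 0.5]])   # toy perfectly linked table
    print("independent:", dependence_from_joint(independent))
    print("dependent:  ", dependence_from_joint(dependent))
```

With the joint-versus-product construction, the Kullback-Leibler value is the mutual information of the two variables, while the Hellinger value stays bounded and is therefore easier to compare across tables.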

Summary

Introduction

Gene interactions usually take place within signaling pathways, networks, or other biological systems. Interactions between or among multi-dimensional gene sets in a given biological system have been demonstrated in functional networks [1,2,3,4,5,6]. If the expression levels of a gene set are significantly associated with clinical outcomes or phenotypes, the gene set is said to be ‘differentially expressed’. Many statistical approaches, such as gene set enrichment analysis (GSEA) methods [7,8], are used to determine whether functional gene sets are differentially expressed (enrichment and/or deletion) across phenotypes. Readers are referred to [9] for a review of current GSEA algorithms.

