Abstract

Deciphering gene regulatory networks requires identification of gene expression modules. We describe a novel bottom-up approach to identify gene modules regulated by cis-regulatory motifs from a human gene co-expression network. Target genes of a cis-regulatory motif were identified from the network via the motif’s enrichment or biased distribution towards transcription start sites in the promoters of co-expressed genes. A gene sub-network containing the target genes was extracted and used to derive gene modules. The analysis revealed known and novel gene modules regulated by the NF-Y motif. The binding of NF-Y proteins to the gene promoters of these modules was verified using ENCODE ChIP-Seq data. The analyses also identified 8,048 Sp1 motif target genes, many of which, interestingly, were not detected by ENCODE ChIP-Seq. These target genes assemble into housekeeping, tissue-specific developmental, and immune response modules. Integration of Sp1 modules with genomic and epigenomic data indicates epigenetic control of the expression of Sp1 targets in a cell/tissue-specific manner. Finally, known and novel target genes and modules regulated by the YY1, RFX1, IRF1, and 34 other motifs were also identified. The study described here provides a valuable resource to understand transcriptional regulation of various human developmental, disease, or immunity pathways.
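The motif enrichment idea in the abstract can be illustrated with a small sketch. The abstract does not say which statistic was used, so the one-sided hypergeometric test below is an assumed stand-in, and the function name, parameter names, and toy counts are all hypothetical. It asks how surprising it is to see a given number of motif-containing promoters among the genes co-expressed with a candidate target.

```python
from math import comb


def motif_enrichment_pvalue(n_genes, n_with_motif, n_neighbors, n_neighbors_with_motif):
    """One-sided hypergeometric tail P(X >= n_neighbors_with_motif).

    Assumption (not from the source): the co-expressed neighbors are treated
    as a draw without replacement from all genes in the network.
    """
    upper = min(n_with_motif, n_neighbors)
    tail = sum(
        comb(n_with_motif, i) * comb(n_genes - n_with_motif, n_neighbors - i)
        for i in range(n_neighbors_with_motif, upper + 1)
    )
    return tail / comb(n_genes, n_neighbors)


# Toy numbers (hypothetical): 1,000 genes, 100 carry the motif in their
# promoter, and 8 of a gene's 20 co-expressed neighbors carry it.
p_enriched = motif_enrichment_pvalue(1000, 100, 20, 8)
p_expected = motif_enrichment_pvalue(1000, 100, 20, 2)
print(f"8 of 20 neighbors with motif: p = {p_enriched:.2e}")
print(f"2 of 20 neighbors with motif: p = {p_expected:.2f}")
```

A small p-value for a gene's neighborhood would flag that gene as a candidate target of the motif. A real analysis would also need multiple-testing correction across genes and motifs.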

Highlights

  • Human gene expression in various tissues, during development, or under diverse environmental conditions has been cataloged systematically in NCBI GEO or ArrayExpress databases

  • We constructed a gene co-expression network for 19,718 human genes based on the graphical Gaussian model (GGM)[14, 15] using Affymetrix U133 Plus 2.0 microarray data deposited in the ArrayExpress database[13]

  • More importantly, using motif position bias methods, many target genes were identified with high confidence for the well-studied nuclear factor Y (NF-Y), specificity protein 1 (Sp1), and other TF motifs


Summary

Introduction

Human gene expression in various tissues, during development, or under diverse environmental conditions has been cataloged systematically in the NCBI GEO and ArrayExpress databases. These large datasets have been used to generate gene co-expression networks, in which genes with similar expression patterns are connected[6]. These networks effectively group genes that have similar functions or act in the same processes, and have been used to analyze the transcriptome of the human brain, primary cell lines, and various tissues. This has advanced the identification of, for example, specific molecular pathways in autism and amyotrophic lateral sclerosis[7,8,9,10,11]. The modules enabled the integration of various genomic and epigenomic data into a coherent regulatory system, providing a valuable resource to identify transcriptional regulators for various human developmental, disease, or immunity pathways.
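The graphical Gaussian model mentioned in the highlights differs from a plain correlation network in how it scores an edge. The sketch below assumes the usual GGM convention of scoring gene pairs by partial correlation, that is, the correlation that remains after accounting for other genes. The simulated data, function names, and three-gene setup are hypothetical and are not taken from the study.

```python
import random
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated expression (hypothetical): genes x and y are both driven by a
# shared regulator z but do not influence each other directly.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + rng.gauss(0, 0.5) for v in z]
y = [v + rng.gauss(0, 0.5) for v in z]

r_plain = pearson(x, y)
r_partial = partial_corr(x, y, z)
print(f"Pearson r(x, y)        = {r_plain:.2f}")
print(f"partial r(x, y | z)    = {r_partial:.2f}")
```

In this toy case the plain correlation between x and y is high, while the partial correlation given z is close to zero, so a partial-correlation network would not draw a direct x–y edge. With thousands of genes, the same quantity is typically obtained from an estimated inverse covariance matrix rather than from this three-variable formula.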
