Abstract

Large sets of genomic regions are generated by the initial analysis of various genome-wide sequencing data, such as ChIP-seq and ATAC-seq experiments. Gene set enrichment (GSE) methods are commonly employed to determine the pathways associated with them. Given the pathways and other gene sets (e.g., GO terms) of significance, it is of great interest to know the extent to which each is driven by binding near transcription start sites (TSS) or near enhancers. Currently, no tool performs such an analysis. Here, we present a method that addresses this question to complement GSE methods for genomic regions. Specifically, the new method tests whether the genomic regions in a gene set are significantly closer to a TSS (or to an enhancer) than expected by chance given the total list of genomic regions, using a non-parametric test. Combining the results from a GSE test with our novel method provides additional information regarding the mode of regulation of each pathway, and additional evidence that the pathway is truly enriched. We illustrate our new method with a large set of ENCODE ChIP-seq data, using the chipenrich Bioconductor package. The results show that our method is a powerful complementary approach to help researchers interpret large sets of genomic regions.
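To make the idea of the test concrete, the following is a minimal sketch, not the authors' implementation. The abstract states only that a non-parametric test is used; a one-sided permutation test on the median distance is swapped in here as one simple non-parametric choice. The function name `proximity_permutation_test` and its parameters are hypothetical, and the input is assumed to be one absolute distance to the nearest TSS per peak, plus a flag marking whether the peak's assigned gene belongs to the gene set.

```python
import random
from statistics import median


def proximity_permutation_test(distances, in_gene_set, n_perm=10000, seed=0):
    """One-sided permutation test (illustrative stand-in, not the paper's test).

    distances   -- absolute distance in bp from each peak to its nearest TSS
    in_gene_set -- parallel list of booleans, True if the peak's assigned
                   gene belongs to the gene set of interest
    Returns (observed_median, p_value), where a small p_value suggests the
    gene-set peaks lie closer to a TSS than random subsets of all peaks.
    """
    if len(distances) != len(in_gene_set):
        raise ValueError("distances and in_gene_set must have equal length")
    k = sum(1 for flag in in_gene_set if flag)
    if k == 0 or k == len(distances):
        raise ValueError("gene set must contain some, but not all, peaks")

    observed = median([d for d, flag in zip(distances, in_gene_set) if flag])

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Draw a random subset of peaks of the same size as the gene set.
        sample = rng.sample(distances, k)
        if median(sample) <= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

Drawing the null subsets from the full peak list mirrors the abstract's phrase "expected by chance given the total list of genomic regions". The same sketch would apply to enhancers by supplying distances to the nearest enhancer instead.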

Highlights

  • Cell development and differentiation depend on complex gene expression patterns which are precisely and spatiotemporally controlled

  • We developed a new method, Proximity Regulation (ProxReg), to test the proximity of peaks to transcription start sites (TSS) or enhancers in a gene set of interest

  • We first measure the distance from the midpoint of each peak to the nearest regulatory region, and assign each peak to the gene with the nearest TSS (NTSS) (Welch et al, 2014)
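The distance and assignment step described in the last highlight can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the chipenrich code. It handles a single chromosome, ignores strand, and uses an integer midpoint. The function name `assign_peaks_to_nearest_tss` and the tuple layouts are hypothetical.

```python
from bisect import bisect_left


def assign_peaks_to_nearest_tss(peaks, tss_sites):
    """Assign each peak to the gene with the nearest TSS.

    peaks     -- list of (start, end) coordinates on one chromosome
    tss_sites -- list of (position, gene_name) pairs on the same chromosome
    Returns a list of (gene_name, distance_bp), one entry per peak.
    """
    if not tss_sites:
        raise ValueError("at least one TSS is required")
    sites = sorted(tss_sites)
    positions = [pos for pos, _ in sites]

    assignments = []
    for start, end in peaks:
        midpoint = (start + end) // 2
        # Index of the first TSS at or after the midpoint.
        i = bisect_left(positions, midpoint)
        # Only the TSSs flanking the midpoint can be the nearest one.
        candidates = sites[max(i - 1, 0):i + 1]
        pos, gene = min(candidates, key=lambda site: abs(site[0] - midpoint))
        assignments.append((gene, abs(pos - midpoint)))
    return assignments
```

Sorting the TSS positions once and using a binary search keeps each lookup logarithmic in the number of TSSs, which matters for the large peak sets the abstract mentions. When a midpoint is equidistant from two TSSs, this sketch picks the one with the smaller coordinate.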


Summary

Introduction

Cell development and differentiation depend on complex gene expression patterns that are precisely controlled in space and time. Gene expression converts DNA to functional elements, and this process is regulated by many cis-regulatory elements across the genome (Wittkopp and Kalay, 2011). Cis-regulatory elements include promoters, enhancers, silencers, and insulators; promoters and enhancers can initiate transcription and are the best studied (Andersson, 2015). Both promoters and enhancers are DNA regions that are typically a few hundred base pairs in length (Nguyen et al, 2016). Some transcription factors (TFs), such as ESR1, bind different sets of target genes in a cell-type-specific manner (Gertz et al, 2012), resulting in complex and dynamic TF regulatory programs. Deciphering the rules of TF binding events is a key step toward understanding gene expression patterns and the associated biological pathways.

