Abstract

Background: Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate the transcriptional response to a stimulus, or to identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. Failing to control for sequence biases, such as those often found in CpG islands, can obscure the enrichment of biologically relevant motifs.

Results: We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs enriched in differentially active regulatory regions after interferon-beta stimulus. We find that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate that MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs.

Conclusions: Our results demonstrate the importance of controlling for sequence bias when identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.

Highlights

  • Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate the transcriptional response to a stimulus, or to identify sequence features found near a target protein in a Chromatin Immunoprecipitation (ChIP)-seq experiment

  • Motif Enrichment In Ranked Lists of Peaks (MEIRLOP) uses covariates to accurately call enrichment of relevant TF motifs. To show the utility of our method, we performed a differential motif enrichment analysis of regulatory elements modulated by interferon beta (IFN-β) treatment in HCT116 cells, as measured by histone H3 lysine 27 acetylation (H3K27ac) ChIP-seq

  • IFN-β treatment stimulates the type I interferon pathway, which leads to the activation of Signal Transducer and Activator of Transcription (STAT) and Interferon Regulatory Factor (IRF) family transcription factors to stimulate the expression of genes with antiviral activity [42]

Summary

Introduction

Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate the transcriptional response to a stimulus, or to identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. MEA on differential H3K27ac ChIP-seq data can reveal which motifs and TFs regulate transcription in regions that change their activity in response to stimulation [8]. Researchers typically filter these regulatory regions by a score threshold and place them into sets to yield contrasting categories (e.g., regulatory regions with higher activation levels after stimulation vs. those with lower activation levels after stimulation) (Fig. 1a,b). Motif scanners detect motifs in the sequences within those categories, and set enrichment tests (e.g., the Fisher exact test) then determine the overrepresentation of motifs in each category. This allows the imputation of motifs and transcription factors that influence transcriptional response in those conditions (Fig. 1b). We term this process set-based MEA, for its thresholding of sequences into categorical sets.
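
The set-based workflow described above can be illustrated with a minimal Python sketch. It is not the implementation of MEIRLOP or of any published tool. It assumes that motif scanning has already been done, so each region is reduced to a hypothetical `(score, has_motif)` pair, and it computes a one-sided Fisher exact p-value directly from the hypergeometric distribution. The function names, the symmetric threshold, and the toy data are all assumptions made for illustration.

```python
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric: the number of
    motif-containing regions among the a + b regions of the first set.
    """
    n_total = a + b + c + d   # all regions kept after thresholding
    n_motif = a + c           # regions that contain the motif
    n_first = a + b           # regions in the first (e.g., "up") set
    tail = sum(
        comb(n_motif, k) * comb(n_total - n_motif, n_first - k)
        for k in range(a, min(n_motif, n_first) + 1)
    )
    return tail / comb(n_total, n_first)


def set_based_mea(regions, threshold):
    """Threshold scored regions into two sets, then test motif enrichment.

    regions: iterable of (score, has_motif) pairs (hypothetical input format).
    Regions with |score| below the threshold are discarded.
    """
    up = [has_motif for score, has_motif in regions if score >= threshold]
    down = [has_motif for score, has_motif in regions if score <= -threshold]
    a, b = sum(up), len(up) - sum(up)          # up: with / without motif
    c, d = sum(down), len(down) - sum(down)    # down: with / without motif
    return (a, b, c, d), fisher_upper_tail(a, b, c, d)


# Toy data: score could be, e.g., a log fold change in H3K27ac signal.
toy_regions = [
    (2.5, True), (1.8, True), (1.2, True), (1.1, False),
    (0.2, True), (-0.1, False),
    (-1.0, False), (-1.5, True), (-2.0, False), (-2.2, False),
]

table, p_value = set_based_mea(toy_regions, threshold=1.0)
print(table, round(p_value, 4))  # prints: (3, 1, 1, 3) 0.2429
```

In this sketch the two regions with scores between the thresholds never enter the test, and the magnitude of each remaining score is ignored once a region is assigned to a set. This is the thresholding step that gives set-based MEA its name.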
