Abstract

The expression of eukaryotic genes is regulated by cis-regulatory elements such as promoters and enhancers, which bind sequence-specific DNA-binding proteins. One of the great challenges in the gene regulation field is to characterise these elements. This involves the identification of transcription factor (TF) binding sites within regulatory elements that are occupied in a defined regulatory context. Digestion with DNase and the subsequent analysis of regions protected from cleavage (DNase footprinting) has for many years been used to identify specific binding sites occupied by TFs at individual cis-elements with high resolution. This methodology has recently been adapted for high-throughput sequencing (DNase-seq). In this study, we describe an imbalance in the DNA strand-specific alignment information of DNase-seq data surrounding protein–DNA interactions that allows accurate prediction of occupied TF binding sites. Our study introduces a novel algorithm, Wellington, which considers the imbalance in this strand-specific information to efficiently identify DNA footprints. This algorithm significantly enhances specificity by reducing the proportion of false positives and requires significantly fewer predictions than previously reported methods to recapitulate an equal amount of ChIP-seq data. We also provide an open-source software package, pyDNase, which implements the Wellington algorithm to interface with DNase-seq data and expedite analyses.
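The strand imbalance described above can be illustrated with a small sketch. It is not the published Wellington implementation and does not use the pyDNase API. It assumes that, around an occupied site, forward-strand cuts pile up in the upstream flank and reverse-strand cuts in the downstream flank, and that a one-sided binomial test against a length-proportional null is a reasonable way to score this. The function names, the `shoulder` parameter, and the choice to multiply the two p-values are all hypothetical choices made for illustration.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def strand_imbalance_score(fwd_cuts, rev_cuts, start, end, shoulder):
    """Score a candidate footprint [start, end) by strand imbalance.

    fwd_cuts / rev_cuts: per-base cut counts on each strand (hypothetical input).
    shoulder: width of the flanking region on each side (hypothetical parameter).
    A smaller score means a stronger imbalance. Illustrative only.
    """
    if start - shoulder < 0 or end + shoulder > len(fwd_cuts):
        raise ValueError("footprint plus shoulders must lie inside the arrays")

    fw_up = sum(fwd_cuts[start - shoulder:start])   # forward cuts, upstream flank
    fw_fp = sum(fwd_cuts[start:end])                # forward cuts, inside footprint
    rv_down = sum(rev_cuts[end:end + shoulder])     # reverse cuts, downstream flank
    rv_fp = sum(rev_cuts[start:end])                # reverse cuts, inside footprint

    # Null assumption: cuts fall in flank vs footprint in proportion to length.
    p_null = shoulder / (shoulder + (end - start))

    p_fwd = binom_upper_tail(fw_up, fw_up + fw_fp, p_null)
    p_rev = binom_upper_tail(rv_down, rv_down + rv_fp, p_null)
    return p_fwd * p_rev


if __name__ == "__main__":
    # Toy data: a protected 3 bp site with strand-specific flanking cuts,
    # compared with a region of uniform cutting.
    protected = strand_imbalance_score(
        [5, 5, 5, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 5, 5, 5], 3, 6, 3
    )
    flat = strand_imbalance_score([1] * 9, [1] * 9, 3, 6, 3)
    print(protected, flat)
```

On the toy data the protected region should score far lower than the uniformly cut region. Real DNase-seq data would also need read alignment, cut-site extraction per strand, and multiple-testing control, none of which is attempted here.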

Highlights

  • The correct tissue-specific and temporal function of the genome is tightly controlled by transcription factors (TFs) that recognise specific DNA sequences and regulate the expression of specific genes

  • They do not act as single molecules but interact with each other to form large multi-protein assemblies that act as platforms for the recruitment of members of the epigenetic regulatory machinery [1,2]

  • Although previous studies have shown a direct link between the sequence and tissue specificity of a number of TFs and gene expression patterns [3,4], the mechanisms by which defined DNA sequences and the assembly of TF complexes translate into global gene expression patterns remain to be fully understood


Introduction

The correct tissue-specific and temporal function of the genome is tightly controlled by transcription factors (TFs) that recognise specific DNA sequences and regulate the expression of specific genes. They do not act as single molecules but interact with each other to form large multi-protein assemblies that act as platforms for the recruitment of members of the epigenetic regulatory machinery [1,2]. One of the significant challenges facing gene regulation studies is the identification of sites where TFs are bound to specific genes in a specific regulatory context.

