Abstract
Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Performance assessments with both a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher’s combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source code and the binaries for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The TF binding sites identified with iFORM across human cell and tissue types have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
Highlights
Gene expression is coordinately regulated by the interactions of many transcription factors (TFs), many of which bind promoter and enhancer DNA preferentially at characteristic sequence ‘motifs’
iFORM integrates the motif instances identified by five classical algorithms, FIMO [1], Consensus [2, 3], STORM [4], RSAT [5] and HOMER [6], using Fisher’s method
We extracted the core source code of the motif discovery functions of these five algorithms and integrated them into the framework of incorporating Find Occurrence of Regulatory Motifs (iFORM) instead of combining the resulting p-values obtained from these algorithms
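To make the statistic behind Fisher’s method concrete, the following is a minimal sketch and not iFORM’s own code. It assumes the per-algorithm p-values for a candidate site are independent, and the function name `fisher_combined_pvalue` and the example p-values are hypothetical. The test statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom for k p-values; because the degrees of freedom are even, the tail probability has a closed form and no external library is needed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's combined probability test."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Half of the test statistic X = -2 * sum(ln p_i).
    half_x = -sum(math.log(p) for p in pvalues)

    # Chi-square survival function for 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


if __name__ == "__main__":
    # Hypothetical p-values reported by five scanners for one candidate site.
    site_pvalues = [1e-4, 5e-4, 2e-3, 1e-3, 8e-4]
    print(fisher_combined_pvalue(site_pvalues))
```

A single p-value is returned unchanged, and several moderately small p-values combine into a smaller one, which is why agreement among scanners strengthens a call.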
Summary
Gene expression is coordinately regulated by the interactions of many transcription factors (TFs), many of which bind promoter and enhancer DNA preferentially at characteristic sequence ‘motifs’. Motifs are short patterns described as position weight matrices (PWMs) that tend to be conserved by purifying selection. Identifying and understanding these TF motifs can provide critical insight into the mechanisms of transcriptional regulation and human disease. S1 Table summarizes the features of iFORM and the five classical algorithms it integrates (FIMO, Consensus, STORM, RSAT and HOMER) as motif scanners. Each of these methods has its own merits for identifying potential TF binding; it is still a major challenge to integrate superiorities and to preclude
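As an illustration of what scanning a sequence with a PWM involves, here is a minimal sketch under stated assumptions. It is not taken from iFORM or from any of the five scanners. The function names, the uniform background frequency of 0.25, the pseudocount and the toy TATA-like motif are all hypothetical. Only the forward strand is scored; real scanners typically also score the reverse complement and convert scores to p-values.

```python
import math

BASES = "ACGT"


def pwm_from_probabilities(prob_rows, background=0.25, pseudocount=1e-3):
    """Convert per-position base probabilities (A, C, G, T) to log-odds weights."""
    pwm = []
    for row in prob_rows:
        # The pseudocount avoids log(0); renormalise so the row sums to 1.
        adjusted = [p + pseudocount for p in row]
        total = sum(adjusted)
        pwm.append(
            {b: math.log2((p / total) / background) for b, p in zip(BASES, adjusted)}
        )
    return pwm


def scan(sequence, pwm):
    """Return (offset, log-odds score) for each forward-strand window."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy motif with consensus TATA; rows are probabilities for A, C, G, T.
    toy_motif = pwm_from_probabilities([
        [0.01, 0.01, 0.01, 0.97],
        [0.97, 0.01, 0.01, 0.01],
        [0.01, 0.01, 0.01, 0.97],
        [0.97, 0.01, 0.01, 0.01],
    ])
    for offset, score in scan("GGTATAGG", toy_motif):
        print(offset, round(score, 2))
```

The highest-scoring window is the one that matches the consensus, and a threshold on the score (or on a p-value derived from it) decides which windows are reported as motif instances.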