Abstract

Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time-consuming or have sub-optimal accuracy when processing large-scale datasets. In this article, we present FisherMP, a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets that uses Fisher's combined probability method and an OpenMP parallel design. Large-scale validations on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets but also highly accurate compared with multiple popular methods. Using FisherMP, we detected combinatorial motifs of CTCF, YY1, MAZ, STAT3 and USF2 on chromosome X, suggesting that these TFs are functional co-players in gene regulation and chromosomal organization. Integrative and statistical analysis of their binding peaks demonstrates that they are not only highly coordinated with each other but also correlated with histone modifications. FisherMP can be applied to integrative analysis of binding motifs and to predicting cis-regulatory modules from large numbers of ChIP-seq datasets.
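The abstract names Fisher's combined probability method as the statistical core of FisherMP but does not say which p-values it combines. As a generic illustration only, the sketch below shows the standard method: k independent p-values are merged into the statistic X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combine` and the idea of applying it to, for example, per-dataset enrichment p-values of one candidate motif are assumptions for illustration, not details taken from the paper.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    follows a chi-squared distribution with 2k degrees of freedom under
    the null hypothesis (k = number of p-values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    if half_stat == 0.0:  # every p-value equals 1
        return 0.0, 1.0

    # For even degrees of freedom 2k the chi-squared survival function has a
    # closed form: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    # Each term is evaluated in log space to avoid overflow for large k.
    log_half = math.log(half_stat)
    combined_p = sum(
        math.exp(j * log_half - math.lgamma(j + 1) - half_stat) for j in range(k)
    )
    return 2.0 * half_stat, min(1.0, combined_p)


# Hypothetical example: one candidate motif scored in three datasets.
stat, p = fisher_combine([0.04, 0.10, 0.02])
print(f"X = {stat:.3f}, combined p = {p:.4g}")
```

The closed form for even degrees of freedom keeps the sketch free of third-party dependencies; where SciPy is available, `scipy.stats.combine_pvalues(p_values, method="fisher")` computes the same statistic and p-value.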

Highlights

  • To evaluate how well FisherMP finds transcription factor (TF) motifs, we ran it on all 350 chromatin immunoprecipitation sequencing (ChIP-seq) experiments, covering 51 TFs, downloaded from the ENCODE project

  • Since the predicted motifs are defined by their consensus sequences, we considered the real motif of a TF recalled if its similarity to at least one of its predicted motifs exceeded a given similarity threshold
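The highlights state the recall criterion only in words and do not name the similarity measure. One way to make it concrete, assuming a simple measure, is sketched below: similarity is taken as the best fraction of identical bases over all ungapped alignments of two consensus sequences on either strand, normalized by the shorter length. The functions `consensus_similarity`, `is_recalled` and `recall_rate`, the reverse-complement handling and the 0.8 default threshold are hypothetical choices, not the paper's definitions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_similarity(a, b, min_overlap=4):
    """Best fraction of identical bases over ungapped alignments of a and b.

    Both strands of b are tried; the score is normalized by the shorter
    sequence, so partial overlaps are penalized. (Hypothetical measure.)
    """
    short_len = min(len(a), len(b))
    if short_len == 0:
        return 0.0
    required = min(min_overlap, short_len)
    best = 0.0
    for cand in (b, reverse_complement(b)):
        # offset = position in a where cand[0] is placed
        for offset in range(-(len(cand) - 1), len(a)):
            overlap = matches = 0
            for i, base in enumerate(cand):
                j = i + offset
                if 0 <= j < len(a):
                    overlap += 1
                    matches += base == a[j]
            if overlap >= required:
                best = max(best, matches / short_len)
    return best


def is_recalled(real_motif, predicted_motifs, threshold=0.8):
    """A real motif counts as recalled if any prediction exceeds the threshold."""
    return any(consensus_similarity(real_motif, p) > threshold for p in predicted_motifs)


def recall_rate(real_motifs, predictions, threshold=0.8):
    """real_motifs: {tf: consensus}; predictions: {tf: [consensus, ...]}."""
    hits = sum(
        is_recalled(motif, predictions.get(tf, []), threshold)
        for tf, motif in real_motifs.items()
    )
    return hits / len(real_motifs)


# Toy, made-up consensus sequences for two hypothetical TFs.
print(recall_rate({"TF1": "ACGTAC", "TF2": "GGGCCC"},
                  {"TF1": ["ACGTAC"], "TF2": ["TTTTTT"]}))  # → 0.5
```

Comparing against the reverse complement reflects that a double-stranded binding site can be reported on either strand; a matrix-based measure would be a natural substitute if predictions were kept as frequency matrices rather than consensus strings.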

Summary

Introduction

In the past two decades, the motif-finding problem has been an important issue in sequence feature recognition. A motif represents a set of binding sites recognized by a transcription factor (TF). Using either the co-regulation of genes or phylogenetic footprinting, TF motifs can be discovered from the upstream non-coding DNA sequences of co-regulated or orthologous genes, respectively. Many motif-finding algorithms have been developed over this period.
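To make the idea of a motif as a set of binding sites concrete, the sketch below summarizes a handful of aligned, equal-length sites as a position frequency matrix and reads off a consensus sequence, the most frequent base at each position. The example sites are made up, and breaking ties in A, C, G, T order is an arbitrary assumption. The highlights note that FisherMP's predicted motifs are defined by consensus sequences, but this sketch does not reproduce how FisherMP derives them.

```python
from collections import Counter


def position_frequency_matrix(sites):
    """Count bases at each position of aligned, equal-length binding sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Most frequent base per position; ties go to the first of A, C, G, T."""
    pfm = position_frequency_matrix(sites)
    return "".join(max("ACGT", key=lambda base: column[base]) for column in pfm)


# Made-up aligned sites for a hypothetical TF.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
print(consensus(sites))  # → TGACTCA
```

A consensus string discards the variability recorded in the matrix (here, the G/T mix at position 2 and the C/G mix at position 4), which is why similarity between consensus sequences is only an approximate way to compare motifs.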
