WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data

Hongbo Zhang, De-Shuang Huang, Lin Zhu

doi:10.1038/s41598-017-03554-7


Journal: Scientific Reports
Publication Date: Jun 12, 2017
Citations: 22
License type: open-access

Affiliation: Tongji University

Abstract

Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, most existing DMD methods, because of their computational expense, must resort to approximate schemes that greatly restrict the search space, which leads to a significant loss of predictive accuracy. In this paper, we propose Weakly-Supervised Motif Discovery (WSMD) to discover motifs from ChIP-seq datasets. In contrast to the learning strategies adopted by previous DMD methods, WSMD allows a “global” optimization of the motif parameters in continuous space, thereby reducing the information loss of the model representation and improving the quality of the resulting motifs. Meanwhile, by exploiting the connection between the DMD framework and existing weakly supervised learning (WSL) techniques, we also present highly scalable learning strategies for the proposed method. Experimental results on both real ChIP-seq datasets and synthetic datasets show that WSMD substantially outperforms earlier DMD methods (including DREME, HOMER, XXmotif, motifRG and DECOD) in predictive accuracy, while also achieving competitive computational speed.
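The abstract contrasts WSMD's continuous motif parameters with methods whose motifs are confined to the discrete IUPAC space. Below is a minimal Python sketch of that representational gap. It assumes a plain position weight matrix (PWM) scored with log-odds against a uniform background. The helper names `iupac_to_pwm` and `log_odds`, the pseudocount, and the example probabilities are illustrative assumptions, not taken from the paper.

```python
import math

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "ACGT"


def iupac_to_pwm(consensus):
    """Hypothetical helper: turn an IUPAC consensus into a PWM whose columns
    spread probability uniformly over the allowed bases (only 15 column shapes exist)."""
    pwm = []
    for letter in consensus:
        allowed = IUPAC[letter]
        pwm.append({b: (1.0 / len(allowed) if b in allowed else 0.0) for b in BASES})
    return pwm


def log_odds(pwm, kmer, background=0.25, pseudo=1e-3):
    """Log-odds score of a k-mer against a uniform background (pseudocount avoids log(0))."""
    return sum(math.log((col[b] + pseudo) / background) for col, b in zip(pwm, kmer))


# Discrete motif: the consensus "RG" cannot tell AG from GG.
discrete = iupac_to_pwm("RG")
# Continuous motif: each column may be any point on the probability simplex.
continuous = [
    {"A": 0.8, "C": 0.0, "G": 0.2, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]

print(log_odds(discrete, "AG") == log_odds(discrete, "GG"))     # True
print(log_odds(continuous, "AG") > log_odds(continuous, "GG"))  # True
```

An IUPAC column can take only one of 15 shapes, so a search over IUPAC consensus strings cannot express that A is four times as likely as G at a position. This is one way to read the "information loss of model representation" that a continuous parameterization avoids.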

Highlights

As the main regulators of the transcription process, transcription factors (TFs) can modulate gene expression by binding to specific DNA regions, known as TF binding sites (TFBS).
ChIP-seq poses two challenges for motif discovery methods: (i) the enormous number of potential TF binding regions yielded by a single experiment requires highly scalable motif discovery (MD) tools [8, 9]; (ii) the large size of the datasets increases the chance of finding multiple enriched sequence features, most of which may be either false positives or not directly related to the problem of interest, so MD tools must be able to understand the nature of motif signals and identify the relevant ones [10, 11, 12].
We point out the inherent similarities between discriminative motif discovery (DMD) and object detection (OD) and, on that basis, propose a novel method for identifying motifs from ChIP-seq datasets.
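The DMD/object-detection analogy can be made concrete with multiple-instance learning, one common form of weakly supervised learning. Each ChIP-seq sequence is treated as a "bag" of overlapping k-mer windows, much as an image is a bag of candidate boxes, and only the bag carries a label. The sketch below assumes a single forward-strand weight matrix, max-pooling over windows, and a logistic loss. These choices and all function names are illustrative assumptions, not WSMD's published objective.

```python
import math

BASES = "ACGT"


def window_scores(weights, seq):
    """Score every length-k window of seq (the 'instances' in the bag).
    weights[j][b] is the weight of base BASES[b] at motif position j."""
    k = len(weights)
    return [
        sum(weights[j][BASES.index(seq[i + j])] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]


def bag_score(weights, seq):
    # Multiple-instance assumption: a sequence (bag) is bound if at least one
    # window is a binding site, so the bag score is its best window score.
    return max(window_scores(weights, seq))


def best_window(weights, seq):
    # Analogue of the highest-scoring candidate box in object detection.
    scores = window_scores(weights, seq)
    return scores.index(max(scores))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def bag_logistic_loss(weights, positives, negatives, bias=0.0):
    """Mean logistic loss using only sequence-level labels (+1 bound, -1 unbound)."""
    labelled = [(s, 1) for s in positives] + [(s, -1) for s in negatives]
    return sum(softplus(-y * (bag_score(weights, s) + bias)) for s, y in labelled) / len(labelled)


# Toy +1/-1 weights for the hypothetical motif GATA.
gata = [[1 if b == c else -1 for b in BASES] for c in "GATA"]
print(bag_score(gata, "CCGATACC"), best_window(gata, "CCGATACC"))  # 4 2
print(bag_logistic_loss(gata, ["CCGATACC"], ["CCCCCCCC"]))
```

Because `bag_score` takes a max over windows, the loss in this sketch is differentiable in the weights everywhere except where two windows tie. That property is what lets an objective of this shape be optimized over continuous parameters instead of by enumerating discrete motifs.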

Summary

Introduction

As the main regulators of the transcription process, transcription factors (TFs) can modulate gene expression by binding to specific DNA regions, known as TF binding sites (TFBS). Previous research has concluded that TFs are relatively conserved over long-term evolution and tend to bind DNA sequences that follow specific patterns, commonly called TFBS motifs [1, 2, 3]. Recognizing these motifs is fundamental to understanding regulatory mechanisms and remains a challenging task in computational biology [4, 5].

The objectives of DMD methods are generally nonconvex, nondifferentiable, and even discontinuous, which makes them difficult to optimize. To circumvent these difficulties and improve scalability, current DMD methods typically do not search for motifs directly over the complete parameter space. Instead, they adopt approximate schemes that can sacrifice both accuracy and expressive power. The motifs learned by DREME [19] and motifRG [18] are limited to the discrete IUPAC space, while HOMER [20] and XXmotif [21] refine motifs by tuning only some external parameters.
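To see why DMD objectives can be discontinuous, consider a hypothetical count-based objective. It is similar in spirit to count-based enrichment statistics: the number of positive sequences containing a hit minus the number of negatives containing one. This is a simplified illustration under stated assumptions (forward strand only, a fixed score threshold, +1/-1 weights), not the exact statistic used by DREME, HOMER, or any other tool named here. Nudging a single weight leaves the objective flat until some window crosses the threshold, and then it jumps:

```python
BASES = "ACGT"


def bag_score(weights, seq):
    """Best window score of seq under weights[j][b] (forward strand only)."""
    k = len(weights)
    return max(
        sum(weights[j][BASES.index(seq[i + j])] for j in range(k))
        for i in range(len(seq) - k + 1)
    )


def hit_count_objective(weights, positives, negatives, threshold):
    """Hypothetical count-based DMD objective: positives with a hit minus negatives with a hit."""
    pos_hits = sum(bag_score(weights, s) >= threshold for s in positives)
    neg_hits = sum(bag_score(weights, s) >= threshold for s in negatives)
    return pos_hits - neg_hits


def perturbed(weights, pos, base, delta):
    """Copy of weights with one entry shifted by delta (the original is not mutated)."""
    new = [row[:] for row in weights]
    new[pos][BASES.index(base)] += delta
    return new


gata = [[1.0 if b == c else -1.0 for b in BASES] for c in "GATA"]
positives = ["CCGATACC", "CCGATTCC"]  # the second holds a near-miss window, GATT
negatives = ["CCCCCCCC"]

for delta in (0.0, 0.5, 0.999, 1.0):
    w = perturbed(gata, 3, "T", delta)
    print(delta, hit_count_objective(w, positives, negatives, threshold=3.0))
# 0.0 1
# 0.5 1
# 0.999 1
# 1.0 2
```

The objective is piecewise constant in the weights. Its gradient is therefore zero almost everywhere and undefined at the jumps, which gives a gradient-based optimizer no signal. This is the kind of difficulty that pushes methods toward the restricted searches described above.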

Similar Papers

Soft-bag based motif discovery for ChIP-seq datasets
Hongbo Zhang ... De-Shuang Huang
01 Nov 2017

Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme
Qinhu Zhang ... Dailun Wang
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 18
18 Sep 2020

Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences
Jong Kyoung Kim ... Seungjin Choi
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 8
01 Sep 2011

LMMO: A Large Margin Approach for Refining Regulatory Motifs
Lin Zhu ... De-Shuang Huang
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 15
05 Apr 2017