DeepD2V: A Novel Deep Learning-Based Framework for Predicting Transcription Factor Binding Sites from Combined DNA Sequence.

Lei Deng, Hui Liu, Xuejun Liu, Hui Wu

doi:10.3390/ijms22115521

Abstract

Predicting in vivo protein–DNA binding sites is a challenging but pressing task in a variety of fields such as drug design and development. Most promoters contain a number of transcription factor (TF) binding sites, but only a small minority have been identified by biochemical experiments, which are time-consuming and laborious. To tackle this challenge, many computational methods have been proposed to predict TF binding sites from DNA sequence. Although previous methods have achieved remarkable performance in the prediction of protein–DNA interactions, there is still considerable room for improvement. In this paper, we present a hybrid deep learning framework, termed DeepD2V, for transcription factor binding site prediction. First, we construct the input matrix from an original DNA sequence and three kinds of variant sequences: its inverse, complementary, and complementary inverse sequences. A sliding window of size k with a specific stride is used to obtain the k-mer representation of the input sequences. Next, we use word2vec to obtain a pre-trained distributed representation model of the k-mer words. Finally, the probability of protein–DNA binding is predicted by a combined recurrent and convolutional neural network. Experimental results on 50 public ChIP-seq benchmark datasets demonstrate the superior performance and robustness of DeepD2V. Moreover, we verify that DeepD2V performs better with the word2vec-based k-mer distributed representation than with one-hot encoding, and that the integrated framework of a convolutional neural network (CNN) and a bidirectional LSTM (bi-LSTM) outperforms either the CNN or the bi-LSTM model used alone. The source code of DeepD2V is available at the GitHub repository.
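The input construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, "inverse" is assumed to mean the sequence read in reverse order, and the values of k and the stride are placeholders because this section does not state the ones used in the paper.

```python
# Hypothetical helpers illustrating the four-sequence input and k-mer tokenization.
# Assumption: "inverse" means the reversed string; k and stride are illustrative only.

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sequence_variants(seq):
    """Return [original, inverse, complementary, complementary inverse]."""
    seq = seq.upper()
    inverse = seq[::-1]
    complementary = seq.translate(COMPLEMENT)
    complementary_inverse = complementary[::-1]
    return [seq, inverse, complementary, complementary_inverse]


def kmer_tokens(seq, k=3, stride=1):
    """Slide a window of size k over seq, advancing by stride each step."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def combined_kmer_sentences(seq, k=3, stride=1):
    """Tokenize the original sequence and its three variants into k-mer 'words'."""
    return [kmer_tokens(variant, k, stride) for variant in sequence_variants(seq)]


if __name__ == "__main__":
    for sentence in combined_kmer_sentences("ACGTT", k=3, stride=1):
        print(sentence)
```

Each list of k-mers plays the role of a sentence of words, which is the form of input a word2vec model expects for learning distributed k-mer representations.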

Highlights

A transcription factor (TF) is a protein that controls the activity of genes by binding to upstream regulatory elements located in the promoter and enhancer regions.
DeepD2V was compared with simplified models that use only a convolutional neural network (CNN), a recurrent neural network (RNN), or a bidirectional long short-term memory (bi-LSTM) module.
The results consistently indicate that DeepD2V outperforms DeepBind, DanQ, and WSCNNLSTM.

Summary

Introduction

A transcription factor (TF) is a protein that controls the activity of genes by binding to upstream regulatory elements located in the promoter and enhancer regions. During the past few decades, with the advancement of high-throughput sequencing technology, a few experimental methods, such as chromatin immunoprecipitation sequencing (ChIP-seq), have been developed to identify protein–DNA binding sites [3,4,5]. Although the number of available protein–DNA binding sites is increasing rapidly, the DNA sequences extracted directly from ChIP-seq are still noisy [6]. Computational methods have therefore been developed to predict protein–DNA binding sites. These methods can be roughly classified into conventional [7,8,9,10,11] and deep-learning algorithms [12,13,14,15,16,17,18,19,20,21].



Journal: International Journal of Molecular Sciences
Publication Date: May 24, 2021
Citations: 19
License type: CC BY 4.0
