Abstract
Background: Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale is an essential step toward understanding gene regulation networks. Previously we developed a structure-based method for prediction of transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. While the method performs well, it is not computationally efficient because the number of binding sequences to be evaluated grows exponentially with binding site length. In this paper, we present an efficient pentamer algorithm that splits DNA binding sequences into overlapping fragments, along with a simplified integrative energy function for transcription factor binding site prediction.
Results: A DNA binding sequence is split into overlapping pentamers (5 base pairs) for calculating the transcription factor-pentamer interaction energy. To combine the scores from overlapping pentamers, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach, together with the simplified integrative energy function, improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites.
Conclusion: Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding site prediction.
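The fragment idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the assumption that every window scores all 4^5 possible pentamers, and the reading of Kmer-Sum as a plain sum of per-window energies are illustrative assumptions, and the energies passed in are placeholders rather than values from the integrative energy function.

```python
def split_into_pentamers(sequence, k=5):
    """Return the overlapping k-mers (pentamers by default) of a binding sequence."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the fragment length")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def evaluation_counts(site_length, k=5):
    """Compare how many sequences must be scored under each strategy.

    Full-length enumeration scores every possible sequence (4 ** site_length).
    The fragment strategy is assumed here to score every possible k-mer
    (4 ** k) once per overlapping window.
    """
    full_length = 4 ** site_length
    windows = site_length - k + 1
    fragment_based = windows * 4 ** k
    return full_length, fragment_based


def kmer_sum_score(sequence, window_energies, k=5):
    """Hypothetical Kmer-Sum: add the energy of the pentamer found in each window.

    window_energies[i] maps a pentamer to a placeholder interaction energy
    for window i; a real energy function would supply these values.
    """
    fragments = split_into_pentamers(sequence, k)
    return sum(window_energies[i][frag] for i, frag in enumerate(fragments))


if __name__ == "__main__":
    print(split_into_pentamers("ACGTACGT"))
    print(evaluation_counts(12))
```

Under these assumptions, a 12 bp site needs 4^12 = 16,777,216 full-length evaluations but only 8 × 1,024 = 8,192 pentamer evaluations, which is why the saving grows with binding site length.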
Highlights
Gene expression is regulated by transcription factors binding to specific target DNA sites
To take advantage of the unique features of both knowledge-based and physics-based potentials for structure-based transcription factor binding site (TFBS) prediction, we recently developed an integrative energy (IE) function that consists of three terms: a residue-level knowledge-based multibody (MB) potential, an explicit hydrogen bond (HB) energy, and an electrostatic potential for π-interaction energy
A Wilcoxon signed-rank test showed no significant difference between the new IE function and the original IE function when tested on the non-redundant dataset of 27 TF-DNA complex structures using the full-length prediction algorithm [12]
Summary
Gene expression is regulated by transcription factors binding to specific target DNA sites. We present an efficient pentamer algorithm that splits DNA binding sequences into overlapping fragments, along with a simplified integrative energy function for transcription factor binding site prediction. Transcription factors (TFs) interact with specific DNA sequences, called transcription factor binding sites (TFBSs), to regulate gene expression [1, 2]. Genome-wide TFBS identification, a crucial step in deciphering transcription regulatory networks and annotating genomic sequences, remains a key challenge in post-genomics research. Both high-throughput experimental methods and computational approaches have been developed to tackle this problem. Each method has its unique advantages and limitations [3].