Abstract
DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind specific sequences in the genome and thereby initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step toward understanding many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to co-crystallized protein-DNA structures to generate PWMs that are substantially consistent with experimental data. However, this success has not been extended to DNA-binding proteins that lack co-crystallized structures. This study investigates whether the DNA sequences bound by DNA-binding proteins can be predicted from the proteins' unbound structures. Given an unbound query protein and a template complex, the proposed method first uses structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is used to predict PWMs characterizing the sequences to which the query protein can bind. The method is evaluated on seven DNA-binding proteins that have both DNA-bound and unbound structures for prediction, as well as annotated PWMs for validation. Since this work is the first attempt to predict the target sequences of DNA-binding proteins from their unbound structures, three types of structural variation that presumably influence prediction accuracy were examined and discussed. The analyses conducted in this study identified the conformational change of proteins upon DNA binding as the key factor.
This study sheds light on the challenge of predicting the target DNA sequences of a protein that lacks co-crystallized structures and encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.
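The abstract does not spell out how per-position energies from the knowledge-based potential are turned into a PWM. One common convention, assumed here rather than taken from this study, is Boltzmann weighting: at each position, the probability of a base is proportional to exp(-E/kT). The function `energies_to_pwm`, the scaling parameter `kT`, and the toy energies are all hypothetical and serve only as a sketch.

```python
import math

BASES = "ACGT"


def energies_to_pwm(energy, kT=1.0):
    """Turn per-position base energies into a PWM via Boltzmann weighting.

    energy: list with one dict per DNA position mapping base -> energy
            (lower = more favourable); a hypothetical stand-in for scores
            from a knowledge-based potential on a protein-DNA complex.
    kT:     assumed scaling parameter, not a value from the study.
    """
    pwm = []
    for row in energy:
        # Subtract the minimum energy so exp() stays well-behaved.
        e_min = min(row[b] for b in BASES)
        weights = {b: math.exp(-(row[b] - e_min) / kT) for b in BASES}
        total = sum(weights.values())
        # Normalise so the four probabilities at this position sum to 1.
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


# Toy energies: position 1 favours A; position 2 has no preference.
toy_energies = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
for i, column in enumerate(energies_to_pwm(toy_energies), start=1):
    print(i, {b: round(p, 3) for b, p in column.items()})
```

A smaller `kT` exaggerates energy differences and yields a sharper (more specific) PWM; a larger one flattens it toward uniform.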
Highlights
DNA-binding proteins are important to many biological processes in organisms
DNA sequences recognized by the same DNA-binding domain (DBD) are usually characterized by a probabilistic model called a position weight matrix (PWM), which accommodates the sequence variability among transcription factor (TF) binding sites
When the structure pair with the best (lowest) root-mean-square deviation (RMSD) was chosen to investigate the conformational changes of a protein upon binding DNA, the proportions of proteins that underwent secondary structure element (SSE) and D2O transitions dropped to 13.8% and 39.4%, respectively
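A PWM of the kind described in the highlights can be sketched as column-wise base frequencies over aligned binding sites, with a pseudocount so that unseen bases keep a nonzero probability; a candidate site is then scored by summing log-odds ratios against a uniform background. The function names, the pseudocount of 0.5, and the toy sites below are assumptions for illustration only.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Column-wise base probabilities from equal-length, aligned binding sites.

    Sites must use only A, C, G, T. The pseudocount keeps unseen bases at a
    small nonzero probability.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_odds_score(pwm, window, background=0.25):
    """Sum of log2(p / background) over positions; higher means a better match."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of the PWM's width; return (best_offset, best_score)."""
    sequence = sequence.upper()
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    scores = [(i, log_odds_score(pwm, sequence[i:i + width]))
              for i in range(len(sequence) - width + 1)]
    return max(scores, key=lambda item: item[1])


toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
pwm = build_pwm(toy_sites)
print(scan(pwm, "gggtgactcaggg"))
```

Position 4 of the toy sites splits evenly between C and G, so the PWM assigns both the same probability there; this is exactly the kind of variability a single consensus string cannot express.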
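Selecting the bound/unbound structure pair with the lowest RMSD can be sketched as follows. This minimal version assumes the structures have already been superposed (for example by a structure alignment tool) and that atoms, such as C-alpha atoms, are matched one-to-one; it performs no fitting itself. `rmsd` and `best_pair` are hypothetical helper names, and the coordinates are toy values.

```python
import math
from itertools import product


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and atoms are matched
    one-to-one; no rotation or translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def best_pair(bound_structures, unbound_structures):
    """Return (bound_index, unbound_index, rmsd) for the lowest-RMSD pair."""
    return min(((i, j, rmsd(b, u))
                for (i, b), (j, u) in product(enumerate(bound_structures),
                                              enumerate(unbound_structures))),
               key=lambda item: item[2])


bound_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
unbound_models = [[(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]]
# Lowest RMSD here: bound model 0 vs unbound model 0.
print(best_pair(bound_models, unbound_models))
```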
Summary
DNA-binding proteins are important to many biological processes in organisms. For example, transcription factors (TFs) activate or repress gene expression by using their DNA-binding domains (DBDs) to recognize specific nucleotide sequences in the genome. With a profile representation of TF binding sites (TFBSs), researchers can discover novel target genes regulated by known TFs. Accurate prediction of such target DNA sequences for DNA-binding proteins is an important step toward understanding many biological processes [1,2,3]. The most widely used technique for inferring the PWM of a TF is to collect the promoter sequences of genes known to be regulated by the TF and detect frequently observed (overrepresented) subsequences in the collection [4,5,6,7,8]. Such methods require sufficient sequences for pattern discovery, which are currently available for only a small number of DNA-binding proteins. Moreover, when the interaction involves multiple proteins, sequence-based approaches cannot tell how many DBDs are required to interact with the DNA.
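The overrepresentation idea behind sequence-based PWM inference can be illustrated with a simple enrichment heuristic: count how many promoter sequences contain each k-mer, compare with a background set, and keep k-mers whose smoothed frequency ratio exceeds a threshold. This toy sketch is not a reimplementation of the cited methods [4,5,6,7,8]; the function names, `k`, the threshold, the pseudocount, and the sequences are all assumptions.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Number of sequences containing each k-mer (each counted once per sequence)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def overrepresented_kmers(promoters, background, k=6, min_ratio=2.0, pseudocount=1.0):
    """Return (kmer, promoter_count, ratio) tuples sorted by enrichment ratio.

    ratio = smoothed fraction of promoters containing the k-mer divided by
    the smoothed fraction of background sequences containing it.
    """
    fg = count_kmers(promoters, k)
    bg = count_kmers(background, k)
    n_fg, n_bg = len(promoters), len(background)
    hits = []
    for kmer, c in fg.items():
        fg_frac = (c + pseudocount) / (n_fg + pseudocount)
        bg_frac = (bg[kmer] + pseudocount) / (n_bg + pseudocount)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            hits.append((kmer, c, ratio))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


promoters = ["AATGACTCAT", "CCTGACTCAG", "GTGACTCATT"]
background = ["AAAAAAAAAA", "CCCCCCCCCC", "GTGTGTGTGT"]
for kmer, n, ratio in overrepresented_kmers(promoters, background)[:3]:
    print(kmer, n, round(ratio, 2))
```

With only three promoters, many k-mers that occur once also pass the threshold, which illustrates why such methods need a sufficient number of sequences before the enriched patterns become distinguishable from noise.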