Abstract

Nucleotides are involved in several cellular processes, ranging from the transmission of genetic information to energy transfer and storage. Both sequence- and structure-based methods have been developed to predict the location of nucleotide-binding sites in proteins. Here we propose a novel methodology that leverages the observation that nucleotide-binding sites have a modular structure. Nucleotides are composed of identifiable fragments, i.e. the phosphate, the nucleobase and the carbohydrate moieties. These fragments are bound by specific structural motifs that recur in proteins of different folds. Moreover, these motifs behave as modules and are found in different combinations across fold space. Our method predicts binding sites for each nucleotide fragment by comparing a query protein with a database of templates extracted from proteins of known structure. Whenever a similarity is found, the fragment bound by the template is transferred onto the query protein, thus identifying a putative binding site. Predictions falling inside the surface of the protein are discarded, and the remaining ones are scored using clustering and conservation. The method ranks a correct prediction first in 48%, 48% and 68% of the analyzed proteins for the nucleobase, carbohydrate and phosphate, respectively; when the top five predictions are considered, these figures rise to 71%, 65% and 86%. Furthermore, we attempted to reconstruct the full structure of the binding site, starting from the predicted positions of the fragments. In 59% of the analyzed proteins, the method ranks a reconstructed binding site, or part of one, first. Finally, we tested the reliability of our method in a real-world case in which it has to predict nucleotide-binding sites in unbound proteins. We analyzed proteins whose structure has been solved both with and without the nucleotide and observed only small variations in the method's performance.
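The abstract does not say how a matched template's fragment is placed on the query protein. A common choice, assumed in this minimal Python sketch, is a least-squares rigid superposition (the Kabsch algorithm) of the template motif atoms onto the matched query atoms, with the same transform then applied to the fragment bound by the template. The names `kabsch`, `transfer_fragment`, `is_buried` and `top_k_success` are hypothetical, and the 2.5 Å distance cutoff in `is_buried` is only an illustrative proxy for the surface-based filter, not the authors' criterion. `top_k_success` shows how "correct prediction ranked first" and "within the first five" rates can be computed from ranked predictions.

```python
import numpy as np


def kabsch(template_xyz: np.ndarray, query_xyz: np.ndarray):
    """Least-squares rigid superposition (Kabsch algorithm).

    Returns (R, t) such that template_xyz @ R.T + t best matches query_xyz.
    Both inputs are (n, 3) arrays of matched atom coordinates.
    """
    c_tmpl = template_xyz.mean(axis=0)
    c_query = query_xyz.mean(axis=0)
    # 3x3 covariance matrix of the centred coordinate sets
    H = (template_xyz - c_tmpl).T @ (query_xyz - c_query)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_query - c_tmpl @ R.T
    return R, t


def transfer_fragment(template_motif, query_motif, fragment_xyz):
    """Place a template's bound fragment (e.g. a phosphate) onto the query
    using the transform that superposes the template motif on the query motif."""
    R, t = kabsch(template_motif, query_motif)
    return fragment_xyz @ R.T + t


def is_buried(fragment_xyz, protein_xyz, cutoff=2.5):
    """Hypothetical stand-in for the 'inside the protein surface' test:
    True if any fragment atom lies closer than `cutoff` angstroms
    to any protein atom."""
    diff = fragment_xyz[:, None, :] - protein_xyz[None, :, :]
    return bool((np.linalg.norm(diff, axis=-1) < cutoff).any())


def top_k_success(ranked_hits, k):
    """Fraction of proteins whose top-k ranked predictions include a correct one.

    ranked_hits: one list per protein; element i is True if the
    i-th ranked prediction for that protein is correct.
    """
    return sum(any(hits[:k]) for hits in ranked_hits) / len(ranked_hits)
```

With `k=1` this yields the "ranked as first" rate and with `k=5` the top-five rate quoted in the abstract; predictions for which `is_buried` returns True would be dropped before ranking.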

Highlights

  • Nucleotides are ubiquitous molecules in the cellular environment and they are involved in key cellular processes

  • Overview: We developed a method for identifying nucleotide-binding sites on protein structures

  • Results on the sc-PDB dataset: The first analyzed dataset is composed of 924 protein structures binding AMP, ADP, ATP, GDP, GTP, ANP, GNP, flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN), nicotinamide adenine dinucleotide (NAD) and NAP


Summary

Introduction

Nucleotides are ubiquitous molecules in the cellular environment and are involved in key cellular processes. They store and transfer energy and serve as building blocks of nucleic acids and enzyme cofactors. Nucleotides were among the earliest cofactors to be bound by proteins [1]. Nucleotide-binding folds, such as the Rossmann-type fold [2] and the P-loop containing nucleotide hydrolase fold [3], are ancient and widespread. Several methods for predicting nucleotide-binding sites have been developed, relying on either sequence or structural information. Sequence-based methods use machine learning techniques to identify nucleotide-binding residues from features such as conservation or residue properties like hydrophobicity, solvent accessibility or net charge.
