Abstract

The idea that folding pathways are encoded in the amino acid sequence is challenged when proteins with no sequence homology, different functions, and no apparent evolutionary link adopt a similar fold. Stated otherwise, a limited fold space is available to a diverse repertoire of sequences. The key question is which factors lead diverse sequences to form the same fold. Here, taking the NAD(P)-binding Rossmann fold domains as a case study and using concepts from network theory, we identify the consensus structural features that drive the formation of this fold. We propose a graph-theoretic formalism that captures structural details in terms of the atomic interactions conserved in the global milieu, and hence extracts the essential topological features from diverse sequences. A unified mathematical representation of the different structures, together with a judicious combination of several network parameters, enabled us to probe the structural features driving the adoption of the NAD(P)-binding Rossmann fold. The atomic interactions at key positions appear to be better conserved than the residues participating in these interactions. We propose a “spatial motif” and several “fold specific hot spots” that form the signature structural blueprints of the NAD(P)-binding Rossmann fold domain. Excellent agreement of our data with previous experimental and theoretical studies supports the robustness of the approach. Additionally, comparison of our results with statistical coupling analysis (SCA) provides further support. The methodology proposed here is general and can be applied to similar problems of interest.

Highlights

  • We examined the 84 structures in our dataset using a fold-specific representation, the fold-specific Combined Adjacency Matrix (f-CAM), of the atomic interactions at the side-chain level in the global milieu

  • The f-CAM combines the length-normalized individual adjacency matrices of the dataset (8 families that adopt the NAD(P)-binding Rossmann fold domain according to the SCOP classification) and represents the spatially conserved interactions among the members of this fold/superfamily
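
The idea of combining length-normalized adjacency matrices can be illustrated with a minimal Python sketch. It is not the authors' procedure: the proportional binning in `normalize_length`, the averaging in `combine`, the cutoff in `conserved_pairs`, and all function names are assumptions made only for illustration, and binary contacts stand in for the side-chain atomic interactions used in the study.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def normalize_length(adj: List[List[int]], target_len: int) -> Matrix:
    """Map an n x n contact matrix onto a target_len x target_len grid.

    Assumption: residue index i is sent to bin (i * target_len) // n.
    A structure-based alignment would be a more faithful choice.
    """
    n = len(adj)
    out = [[0.0] * target_len for _ in range(target_len)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                out[i * target_len // n][j * target_len // n] = 1.0
    return out


def combine(mats: List[List[List[int]]], target_len: int) -> Matrix:
    """Average the normalized matrices: each entry is the fraction of
    structures that show a contact between the two bins."""
    total = [[0.0] * target_len for _ in range(target_len)]
    for adj in mats:
        norm = normalize_length(adj, target_len)
        for i in range(target_len):
            for j in range(target_len):
                total[i][j] += norm[i][j]
    return [[v / len(mats) for v in row] for row in total]


def conserved_pairs(fcam: Matrix, threshold: float) -> List[Tuple[int, int]]:
    """Return bin pairs (i < j) whose contact frequency meets the cutoff."""
    size = len(fcam)
    return [(i, j) for i in range(size) for j in range(i + 1, size)
            if fcam[i][j] >= threshold]


# Toy input: a 4-residue and a 2-residue "protein", each with one contact.
a = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
b = [[0, 1], [1, 0]]
fcam = combine([a, b], target_len=2)
print(conserved_pairs(fcam, threshold=1.0))
```

In this toy case both structures place their single contact in the same pair of bins, so that pair survives a cutoff of 1.0. With real data the cutoff would be a tunable choice.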

Summary

Introduction

The relationship between protein sequence, structure, and associated function has been an oft-visited subject in the biological literature [1,2]. The three-dimensional structure of a protein is proposed to be encoded in its amino acid sequence, which enables it to fold rapidly into a unique structure suited to its function [3]. With the large amount of sequence and structural data now available, it has been realized that only a limited structural space is available to the enormous repertoire of protein sequences [4,5,6,7]. It has been proposed that optimization of backbone packing drives the selection of a limited number of folds for a diverse sequence space [11,12]. Insights into the interdependence of sequence, structure, and function in proteins, and into their divergence, can be obtained from a thorough and rigorous study of the available fold space and the detection of conserved structural features for a given fold from a global perspective [1].
