Abstract

Proteins are grouped into various families according to their evolutionary origin. Analyzing such families based on their inter-residue interactions is crucial because algorithms that search for pairwise homologies can miss important relations and produce false hits. Several statistical models have been created to aid in the classification, but so far they have had only partial success. In this work, we analyzed the variation of long-range contacts in different bin intervals and characterized the long-range order in a set of 37 families of homologous proteins belonging to different structural classes. The results reveal the specific long-range contacts as well as the variation of long-range order in different structural classes. The pairwise residue preference to form long-range contacts reveals the dominance of hydrophobic residues irrespective of the structural class. We also provide visual examples of long-range contact network patterns in the different structural classes.

Background

Proteins that evolved from a common ancestor are said to be homologues and to constitute a family with potentially similar structures, functions, and interactions. Analysis of a set of similarly folded proteins with distinct amino acid sequences, such as homologues, can help in identifying residues and regions of polypeptide chains that are likely to be important in the formation and stability of the fold. The problem of identifying real protein families based on amino acid sequence conservation has been the subject of extensive debate, because algorithms that search for pairwise homologies can miss important relations and produce false hits. Automatic classification of proteins into homologous superfamilies by looking at their three-dimensional structure has been a long-standing goal for scientists studying proteins. Several statistical models have been created to aid in the classification, but so far they have had only partial success.
Correct functional and evolutionary classification of new structures is difficult for distantly related proteins and error-prone when using simple statistical scores based on sequence or structure similarity. There are databases that contain homologous families of proteins classified by their structural classes and folds. HOMSTRAD (1) is a fully automated database of protein sequence patterns derived from the analysis of conserved residues that are predicted to be functional in structurally aligned homologous families, and PALI (2) is a database that consists of 1922 protein families containing over 13,500 protein domains. The SCOP (Structural Classification of Proteins) (3) data-
