Abstract

The function of proteins, or of a network of interacting proteins, often involves communication between residues that are well separated in sequence. The classic example is the participation of distant residues in allosteric regulation. Bioinformatic and structural analysis methods have been introduced to infer residues that are correlated. Recently, increasing attention has been paid to identifying the sequence properties that determine the tendency of disease-related proteins (Abeta peptides, prion proteins, transthyretin, etc.) to aggregate and form fibrils. Motivated in part by the need to identify sequence characteristics that indicate a tendency to aggregate, we introduce a general method that probes covariations in charged residues along the sequence in a given protein family. The method, which involves computing the sequence correlation entropy (SCE) using the quenched probability P(sk)(i,j) of finding a residue pair at a given sequence separation, sk, allows us to classify protein families in terms of their SCE. Our general approach may be a useful way of obtaining evolutionary covariations of amino acid residues on a genome-wide level. We use a combination of SCE and clustering based on principal component analysis to classify the protein families. From an analysis of 839 families, covering approximately 500,000 sequences, we find that proteins with relatively low values of SCE are predominantly associated with various diseases. In several families, residues that give rise to peaks in P(sk)(i,j) are clustered in the three-dimensional structure. For the class of proteins with low SCE values, there are significant numbers of mixed charged-hydrophobic (CH) and charged-polar (CP) runs. Our findings suggest that low values of SCE and the presence of (CH) and/or (CP) runs may be indicative of disease association or a tendency to aggregate. Our results lead to the hypothesis that the functions of proteins with similar SCE values may be linked. The hypothesis is validated with a few anecdotal examples. The present results also lead to the prediction that the overall charge correlations in proteins affect the kinetics of amyloid formation, a feature that is common to all proteins implicated in neurodegenerative diseases.
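
The abstract does not give the formula for SCE, so the following is only an illustrative sketch of the general idea: count charged residue pairs at a fixed sequence separation across a family, normalize the counts into a probability distribution, and take its Shannon entropy. Everything here is an assumption made for illustration rather than the authors' definition: the reading of (i,j) as charge signs, the choice of D/E as negative and K/R as positive, the function names, and the use of the natural logarithm. The precise definition of P(sk)(i,j) and of SCE is in the full text.

```python
import math
from collections import Counter

# Assumed charge assignment (illustrative): D, E negative; K, R positive.
CHARGED = {"D": "-", "E": "-", "K": "+", "R": "+"}


def pair_probabilities(sequences, separation):
    """Normalized frequencies of charged pairs found `separation` apart.

    Keys are (sign_i, sign_j) tuples, an assumed reading of the (i, j)
    indices in P(sk)(i,j). Returns an empty dict if no charged pair occurs.
    """
    counts = Counter()
    for seq in sequences:
        for pos in range(len(seq) - separation):
            a, b = seq[pos], seq[pos + separation]
            if a in CHARGED and b in CHARGED:
                counts[(CHARGED[a], CHARGED[b])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def shannon_entropy(probabilities):
    """Shannon entropy (natural log) of a dict of probabilities."""
    return -sum(p * math.log(p) for p in probabilities.values() if p > 0)


# Toy family of two short sequences (hypothetical data).
family = ["KAE", "KAK"]
p = pair_probabilities(family, separation=2)
entropy = shannon_entropy(p)
```

In this toy family the two charged pairs at separation 2 are (+,-) and (+,+), each with probability 0.5. Under this assumed definition, a family whose charged pairs concentrate on few sign combinations would give a lower entropy than one where they are spread evenly.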
