Protein sequences are believed to encode hints about their structures. To uncover the relationship between protein sequences and their structures, this study examines the dynamic interactions among various protein sequence features and identifies the sequence differences between different protein structures. Protein sequence data from all structural classes in CATH and SCOP, structurally disordered proteins from DisProt, and structural motifs from PROSITE are analyzed. Betweenness and closeness centrality measures are used to capture the topology of the networks constructed from amino acid feature interactions, and statistical tests are applied to compare the feature series distributions. Key findings suggest that, in all structural classes, the features for Ala and the α-helix and bend preference property, Ala and side-chain size, Ala and Gly, and Met and Leu interact significantly with each other. The features for Leu, Val, and Asn act as critical sources of feature interactions, whereas Cys, His, Trp, and Met exhibit weak intra-type interactions with other features. This implies that these feature interactions may have little impact on encoding the structural differences. For α structures, Glu, Pro, and the side-chain size and hydrophobicity properties show high importance in feature interactions, whereas for β structures, Gly, Thr, and physical properties such as α-helix and bend preference, extended structural preference, pK-C value, and surrounding hydrophobicity show particularly high importance.
Both α and β structures show Ser as a common source of feature interactions. The mixed α and β structures not only share characteristics with the α and β structures but also show preferred interactions between Met, Lys, and the double-bend preference property, and between the sequence arrangements of Cys, His, Met, and Tyr and the amino acid composition features. The intrinsically disordered proteins (IDPs) show a high frequency of repetition patterns for certain amino acids, and the different structural motifs also show distinctive characteristics. Further sequence differences between the structures can be identified from k-mer statistics and feature series distributions. These findings shed light on the mechanics of amino acid feature interactions and indicate the importance of these interactions in encoding the different types of protein structures. The results may contribute to future molecular design of protein-based vaccines or drugs and may also inform the development of new protein structural classifiers.
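
The two centrality measures named above can be illustrated on a small network. The following is a minimal sketch only, not the study's pipeline: it assumes an unweighted, undirected graph stored as an adjacency dictionary, and the node names (`feature_a` to `feature_e`) and their edges are hypothetical placeholders rather than results from the paper. Betweenness is computed with Brandes' algorithm and left unnormalized. How the study weights, directs, or normalizes its feature networks is not stated in this abstract.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness = (number of reachable nodes) / (sum of shortest-path distances)."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        stack = []                                # nodes in order of discovery
        preds = {node: [] for node in graph}      # shortest-path predecessors
        sigma = {node: 0 for node in graph}       # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            stack.append(node)
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    sigma[nbr] += sigma[node]
                    preds[nbr].append(node)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in graph}
        while stack:
            node = stack.pop()
            for pred in preds[node]:
                delta[pred] += sigma[pred] / sigma[node] * (1 + delta[node])
            if node != source:
                scores[node] += delta[node]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in scores.items()}


# Hypothetical toy network: five placeholder features connected in a chain.
toy_graph = {
    "feature_a": ["feature_b"],
    "feature_b": ["feature_a", "feature_c"],
    "feature_c": ["feature_b", "feature_d"],
    "feature_d": ["feature_c", "feature_e"],
    "feature_e": ["feature_d"],
}

closeness = closeness_centrality(toy_graph)
betweenness = betweenness_centrality(toy_graph)
```

In this toy chain the middle node lies on the most shortest paths and is closest to all others, so it scores highest on both measures. A feature that scores high in a real interaction network would be read in the same way, as a hub or bridge among features.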