Abstract

Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity often share the same fold. This observation leads to the concept of protein designability. Elucidating the relationship between a protein sequence and the three-dimensional structure into which it folds is an important problem in computational structural biology. Forty-five protein chains of 40 residues each (40-mers) from the PDB were analyzed. Hydrophobic-polar (H/P) sequences were generated, and contact energies were calculated by threading each sequence onto Cα coarse-grained protein structures. The minimum-energy structure for each sequence was identified, and the number of sequences folding into each structure (its designability) was obtained. The highly designable structures were found to correspond to common structural motifs.

An H/P mutational analysis of the sequences folding into each conformation showed that the total number of mutations increases with designability. The sequences folding into the most designable structure (a helix-turn-helix motif) were also analyzed. The degree of connectivity at each residue position correlates inversely with its degree of solvent exposure: surface residues have fewer interactions than buried residues. Highly connected residues were also found to be more conserved than the other residue positions. That is, sequence diversity increases with designability, yet some positions remain conserved.

Using tripeptide percentages of the most and least designable sequences as features, ten-fold cross-validation showed that designable sequences can be distinguished from poorly designable ones (accuracies > 85%, AUC > 0.87). The same set of sequences was then used as a training set, with real binary (H/P) protein sequences as the test set; the designable sequences obtained mimic real protein sequences with an accuracy of nearly 60%. Highly and poorly designable classes can therefore be used to train machine-learning algorithms to identify which real protein sequences are designable.
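The designability calculation summarized above (thread every H/P sequence onto every structure, keep its minimum-energy structure, count sequences per structure) can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a 6.5 Å Cα contact cutoff, a minimum chain separation of three residues, the classic HP energy in which only H-H contacts score -1, and that sequences whose minimum energy is shared by several structures are discarded. The helper names `contact_map`, `threading_energy` and `designability` are hypothetical.

```python
import itertools
import math
from collections import Counter

# Assumed HP contact energies (a common textbook choice, not necessarily the
# paper's): only H-H contacts are favourable.
HP_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def contact_map(coords, cutoff=6.5, min_sep=3):
    """Return residue pairs (i, j) whose C-alpha atoms lie within `cutoff` angstroms.

    Pairs fewer than `min_sep` positions apart along the chain are skipped,
    since their proximity is fixed by chain connectivity. Both defaults are
    illustrative assumptions.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
            pairs.append((i, j))
    return pairs


def threading_energy(seq, contacts):
    """Contact energy of an H/P string threaded onto a structure's contact list."""
    return sum(HP_ENERGY[(seq[i], seq[j])] for i, j in contacts)


def designability(structures, sequences):
    """Count the sequences that have each structure as their unique energy minimum.

    `structures` maps a structure name to its contact list. A sequence whose
    minimum energy is shared by two or more structures is not counted
    (assumed convention).
    """
    counts = Counter({name: 0 for name in structures})
    for seq in sequences:
        energies = {name: threading_energy(seq, c) for name, c in structures.items()}
        best = min(energies.values())
        winners = [name for name, e in energies.items() if e == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts
```

For short toy chains every sequence can be enumerated with `itertools.product("HP", repeat=n)`; for 40-mers there are 2^40 (about 1.1 × 10^12) H/P sequences, so in practice a sampled subset would be threaded instead.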
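The per-position analysis of the sequences folding into a given structure can be sketched in the same spirit. Here connectivity is assumed to be the number of contacts a residue position makes in the structure's contact list (pairs `(i, j)`), and conservation is assumed to be the fraction of sequences carrying the most common residue (H or P) at that position. Both definitions and the helper names `connectivity` and `conservation` are illustrative assumptions, not formulas stated in the abstract.

```python
from collections import Counter


def connectivity(contacts, length):
    """Number of contacts made by each residue position of one structure."""
    degree = [0] * length
    for i, j in contacts:
        degree[i] += 1
        degree[j] += 1
    return degree


def conservation(sequences):
    """Fraction of equal-length H/P sequences sharing the most common residue at each position."""
    n = len(sequences)
    # zip(*sequences) walks the alignment column by column.
    return [Counter(column).most_common(1)[0][1] / n for column in zip(*sequences)]
```

Comparing the two lists position by position (for example with a rank correlation) is one way to check whether highly connected positions are the more conserved ones; a Shannon-entropy score per column would be an equally reasonable choice of conservation measure.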
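Over the binary H/P alphabet there are 2^3 = 8 possible tripeptides, so each sequence can be summarized as an 8-element vector of tripeptide percentages. The sketch below computes these percentages from overlapping windows and generates index splits for ten-fold cross-validation. Overlapping windows, contiguous folds, and the names `tripeptide_percentages` and `k_fold_splits` are assumptions; the abstract does not name the classifier, so any model could be trained on the resulting vectors.

```python
from itertools import product

# All 2**3 = 8 tripeptides over the binary H/P alphabet, in a fixed order.
TRIPEPTIDES = ["".join(t) for t in product("HP", repeat=3)]


def tripeptide_percentages(seq):
    """Percentage of overlapping length-3 windows of `seq` matching each tripeptide."""
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if not windows:
        raise ValueError("sequence must contain at least 3 residues")
    return [100.0 * windows.count(t) / len(windows) for t in TRIPEPTIDES]


def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first n_samples % k folds receive one extra sample. Shuffle the data
    beforehand so that each fold mixes both designability classes.
    """
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```

For example, `tripeptide_percentages("HHHP")` has two windows (`HHH` and `HHP`), so those two entries are 50.0 and the remaining six are 0.0.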
