This study presents an enhanced protein design algorithm that aims to emulate natural heterogeneity of protein sequences. Initial analysis revealed that natural proteins exhibit a permutation composition lower than the theoretical maximum, suggesting a selective utilization of the 20-letter amino acid alphabet. By not constraining the amino acid composition of the protein sequence but instead allowing random reshuffling of the composition, the resulting design algorithm generates sequences that maintain lower permutation compositions in equilibrium, aligning closely with natural proteins. Folding free energy computations demonstrated that the designed sequences refold to their native structures with high precision, except for proteins with large disordered regions. In addition, direct coupling analysis showed a strong correlation between predicted and actual protein contacts, with accuracy exceeding 82% for a large number of top pairs (>4L). The algorithm also resolved biases in previous designs, ensuring a more accurate representation of protein interactions. Overall, it not only mimics the natural heterogeneity of proteins but also ensures correct folding, marking a significant advancement in protein design and engineering.
Read full abstract