Cyclic peptides have emerged as a promising class of therapeutics. However, their de novo design remains challenging, and many cyclic peptide drugs are simply natural products or their derivatives. Most cyclic peptides, including the current cyclic peptide drugs, adopt multiple conformations in water. The ability to characterize cyclic peptide structural ensembles would greatly aid their rational design. In a previous pioneering study, our group demonstrated that machine learning models trained on molecular dynamics results can efficiently predict the structural ensembles of cyclic pentapeptides. With this method, termed StrEAMM (Structural Ensembles Achieved by Molecular Dynamics and Machine Learning), linear regression models predicted the structural ensembles of an independent test set of cyclic pentapeptides with R² = 0.94 between the predicted populations of specific structures and the populations observed in molecular dynamics simulations. An underlying assumption of these StrEAMM models is that cyclic peptide structural preferences are predominantly influenced by neighboring interactions, namely, interactions between (1,2) and (1,3) residues. Here we demonstrate that for larger cyclic peptides such as cyclic hexapeptides, linear regression models that include only (1,2) and (1,3) interactions fail to produce satisfactory predictions (R² = 0.47), and that further including (1,4) interactions leads to only moderate improvement (R² = 0.75). We show that convolutional neural networks and graph neural networks, which can incorporate complex nonlinear interaction patterns, achieve R² = 0.97 and R² = 0.91 for cyclic pentapeptides and hexapeptides, respectively.
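To make the notion of (1,2), (1,3), and (1,4) interactions concrete, the following minimal Python sketch counts residue pairs at a given separation around the ring and computes an R² score. It is an illustration only, not the StrEAMM implementation: the function names `cyclic_pair_features` and `r_squared`, the example sequences, and the feature layout are all hypothetical, and R² is assumed here to be the coefficient of determination, which the abstract does not specify.

```python
from collections import Counter


def cyclic_pair_features(sequence, offsets=(1, 2)):
    """Count ordered residue pairs at the given ring offsets.

    Offset 1 corresponds to (1,2) neighbors, offset 2 to (1,3),
    and offset 3 to (1,4). Indices wrap around because the
    peptide is cyclic. (Hypothetical encoding for illustration.)
    """
    n = len(sequence)
    features = Counter()
    for i, residue in enumerate(sequence):
        for k in offsets:
            partner = sequence[(i + k) % n]  # wrap around the ring
            features[(k, residue, partner)] += 1
    return features


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Pentapeptide with (1,2) and (1,3) pairs only.
penta = cyclic_pair_features("GAVSF")

# Hexapeptide with (1,4) pairs added via offset 3.
hexa = cyclic_pair_features("GAVSFN", offsets=(1, 2, 3))
```

In a linear model of this kind, each pair feature would carry a fitted weight for each structure of interest. The nonlinear patterns mentioned above are exactly what such an additive pairwise encoding cannot capture.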