Elucidating aggregation rules for short peptides (e.g., tetrapeptides and pentapeptides) is crucial for precisely controlling their aggregation. In this study, we derive comprehensive aggregation rules for tetrapeptides and pentapeptides across the entire sequence space, based on aggregation propensity values predicted by a transformer-based deep learning model. Our analysis focuses on three quantitative aspects. First, we investigate how amino acid type and position affect aggregation, considering both first- and second-order contributions. By identifying specific amino acids and amino acid pairs that promote or attenuate aggregation, we gain insight into the underlying aggregation mechanisms. Second, we examine the transferability of aggregation propensities between tetrapeptides and pentapeptides to assess whether aggregation can be enhanced or mitigated by appending or removing specific amino acids at the termini. Finally, we evaluate the aggregation morphologies of over 20,000 tetrapeptides, analyzing the morphology distribution and the type and positional contributions of each amino acid. This work extends existing aggregation rules from tripeptide sequences to millions of tetrapeptide and pentapeptide sequences, offering experimentalists an explicit roadmap for fine-tuning the aggregation behavior of short peptides for diverse applications, including hydrogels, emulsions, and pharmaceuticals.