Generative β-hairpin design using a residue-based physicochemical property landscape

Gemechis D Degaga, Yui Tik Pang, Julie C Mitchell, James C Gumbart, Vardhan Satalkar, Wei Li, Matthew P Torres, Andrew C Mcshan

doi:10.1016/j.bpj.2024.01.029

Abstract

De novo peptide design is a new frontier with broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are based largely on sequence homology, which can restrict them to evolutionarily derived protein sequences, and they lack the physicochemical context essential to protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but distinct from, the observable universe. In this study, we created and tested a custom peptide generative adversarial neural network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation for the generative approach based on physicochemical and conformational properties of the 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial neural network model robustly distinguishes β-hairpin peptides from α-helical and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities of around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
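To make the idea of a residue-based property representation concrete, the following minimal sketch maps a peptide sequence to per-residue (hydrophobicity, volume) pairs. It is an illustration only: the Kyte-Doolittle hydropathy scale, the Zamyatnin residue volumes, and the function name encode_peptide are assumptions chosen for this example and are not necessarily the features, scales, or code used in the study.

```python
# Illustrative per-residue property tables (assumed scales, not
# necessarily those used by the authors).

# Kyte-Doolittle hydropathy index for the 20 canonical amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue volumes in cubic angstroms (Zamyatnin scale).
VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}


def encode_peptide(sequence):
    """Return a list of (hydropathy, volume) tuples, one per residue.

    Hypothetical helper for illustration. Raises ValueError if the
    sequence contains anything other than the 20 canonical residues.
    """
    sequence = sequence.upper()
    unknown = sorted(set(sequence) - set(HYDROPATHY))
    if unknown:
        raise ValueError(f"non-canonical residues: {unknown}")
    return [(HYDROPATHY[aa], VOLUME[aa]) for aa in sequence]


if __name__ == "__main__":
    # Example input: an arbitrary 16-residue peptide sequence.
    for residue, features in zip("GEWTYDDATKTFTVTE",
                                 encode_peptide("GEWTYDDATKTFTVTE")):
        print(residue, features)
```

A fixed-length numeric encoding of this kind is one plausible input format for a discriminator or generator network; the actual feature set and network inputs are described in the full text.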

Full Text