Peptides, defined as sequences of amino acids up to approximately 50 residues in length, represent an extremely large reservoir of potentially bioactive compounds, referred to here as the peptide chemical space. Recent advances in computer hardware and software have led to the wide application of computational methods to explore this chemical space. Here, we review different in silico approaches, including structure-based design, genetic algorithms, and machine learning. We also review the use of molecular fingerprints to sample virtual libraries and to visualize the peptide chemical space. Finally, we present an overview of the known peptide chemical space in the form of an interactive map representing 40,531 peptides collected from eleven open-access peptide and peptide-containing databases, accessible at https://tm.gdb.tools/map4/peptide_databases_tmap/. These peptides are displayed as a TMAP (Tree-Map) according to their molecular fingerprint similarity, computed using MAP4, a MinHashed atom pair fingerprint well suited to analyzing large molecules.