We combine our generalized energy-based fragmentation (GEBF) approach with machine learning (ML) techniques to construct quantum mechanics (QM) quality force fields for proteins. In our scheme, the training sets for a protein are constructed only from its small subsystems, which capture all short-range interactions in the target system. The energy of a given protein is expressed as a sum of atomic contributions obtained from QM calculations on the various subsystems, corrected by long-range Coulomb and van der Waals interactions. With the Gaussian approximation potential (GAP) method, our protocol generates training sets automatically and efficiently. To facilitate the construction of training sets for proteins, we store all trained subsystem data in a library. If subsystems already in the library are detected in a new protein, the corresponding datasets can be reused directly as part of the training set for that protein. For two polypeptides, 4ZNN and a 1XQ8 segment, the energies and forces predicted by GEBF-GAP are in good agreement with those from conventional QM calculations, and the dihedral angle distributions from GEBF-GAP molecular dynamics (MD) simulations closely reproduce those from ab initio MD simulations. In addition, using the training set generated by GEBF-GAP, we demonstrate that GEBF-ML force fields constructed with neural network (NN) methods also achieve QM quality. The present work therefore provides an efficient and systematic way to build QM-quality force fields for biological systems.
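
The energy expression described above (atomic contributions plus long-range corrections) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the distance cutoff used to separate short-range from long-range pairs, the Lennard-Jones form of the van der Waals term, the Lorentz-Berthelot combining rules, and the approximate Coulomb conversion constant are all assumptions made for illustration. In an actual GEBF-ML force field the atomic energies would come from the trained model, whereas here they are simply passed in as numbers.

```python
import math

# Approximate Coulomb constant in kcal mol^-1 Angstrom e^-2 (assumed unit system).
COULOMB_K = 332.0637


def long_range_correction(coords, charges, lj_params, cutoff):
    """Sum Coulomb and Lennard-Jones terms over atom pairs farther apart than `cutoff`.

    coords    : list of (x, y, z) tuples in Angstrom
    charges   : list of partial charges in units of e
    lj_params : list of (epsilon, sigma) per atom (hypothetical parameters)
    cutoff    : pairs at or below this distance are assumed to be described
                by the subsystem (short-range) atomic energies and are skipped
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if r <= cutoff:
                continue  # short-range pair: no correction added
            # Point-charge Coulomb interaction.
            total += COULOMB_K * charges[i] * charges[j] / r
            # Lennard-Jones with Lorentz-Berthelot combining rules (assumption).
            eps = math.sqrt(lj_params[i][0] * lj_params[j][0])
            sigma = 0.5 * (lj_params[i][1] + lj_params[j][1])
            sr6 = (sigma / r) ** 6
            total += 4.0 * eps * (sr6 * sr6 - sr6)
    return total


def total_energy(atomic_energies, coords, charges, lj_params, cutoff=4.0):
    """Total energy = sum of atomic contributions + long-range pair corrections."""
    return sum(atomic_energies) + long_range_correction(
        coords, charges, lj_params, cutoff
    )
```

The sketch keeps the two parts of the energy separate, so the short-range model (here a plain list of atomic energies) can be swapped for a GAP or NN predictor without touching the long-range term.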