Several message passing neural network (MPNN)-based methods have recently been introduced that use backbone atom coordinates to efficiently recover native amino acid sequences of proteins and to predict modifications that yield better-expressing, more soluble, and more stable variants. However, the target backbone conformations have usually been defined using X-ray structures or artificial structures generated by algorithms trained on X-ray structures. Here, we show that the commonly used algorithms ProteinMPNN and SolubleMPNN display low sequence recovery on structures determined using NMR. We subsequently propose a computational approach, which we successfully apply to re-engineer AstaP, a protein that natively binds the large hydrophobic ligand astaxanthin (C40H52O4) and for which only an NMR-derived structure is currently available. The engineered variants, designated NeuroAstaP, are 51 amino acids shorter than the 22 kDa parent protein, share 38%-42% sequence identity with it, exhibit good yields, are expressed in a soluble, mostly monomeric form, and demonstrate efficient binding of carotenoids in vitro and in cells. Altogether, our work further tests the limits of using machine learning for protein engineering and paves the way for MPNN-based modification of proteins based on NMR-derived structures.