Multidomain proteins with long flexible linkers and full-length intrinsically disordered proteins (IDPs) are best described as ensembles of conformations rather than as single structures. Determining high-resolution structural ensembles of such proteins with the tools of experimental structural biophysics is challenging. Integrative approaches that combine available low-resolution, ensemble-averaged experimental data with in silico biomolecular reconstructions are now often used for this purpose. However, extensive Boltzmann-weighted conformational sampling of large proteins, especially those in which folded and disordered domains coexist in the same polypeptide chain, remains a challenge. In this work, we present SOP-MULTI, a force field with two interaction sites per amino acid, for simulating coarse-grained models of multidomain proteins. SOP-MULTI combines two well-established self-organized polymer models: (i) SOP-SC for folded systems and (ii) SOP-IDP for IDPs. In SOP-MULTI, we introduce cross-interaction terms between beads belonging to the folded and disordered regions to generate conformational ensembles for full-length multidomain proteins such as hnRNP A1, TDP-43, G3BP1, hGHR-ECD, TIA1, HIV-1 Gag, polyubiquitin, and FUS. When back-mapped to all-atom resolution, SOP-MULTI trajectories faithfully recapitulate the scattering data over the range of reciprocal space. We also show that individual folded domains preserve their native contacts with respect to the solved folded structures, and that the root-mean-square fluctuations of residues in folded domains match those obtained from all-atom molecular dynamics simulation trajectories of the same folded systems. The SOP-MULTI force field is available as a LAMMPS-compatible user package, along with setup codes for generating the required files for any full-length protein with folded and disordered regions.
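The two folded-domain checks mentioned above, native-contact preservation and per-residue root-mean-square fluctuation (RMSF), can be illustrated with a minimal NumPy sketch. This is not the authors' analysis code: the function names, the 8 Å contact cutoff, the minimum sequence separation of 3, and the 1.2 tolerance factor are all illustrative assumptions, and the RMSF routine assumes the frames have already been superimposed on a common reference.

```python
import numpy as np

def native_contact_fraction(traj, native, cutoff=8.0, min_seq_sep=3, tol=1.2):
    """Fraction of native contacts retained in each frame.

    traj   : array (n_frames, n_beads, 3), e.g. backbone-bead coordinates
    native : array (n_beads, 3), reference folded structure
    cutoff, min_seq_sep, tol are illustrative choices, not published values.
    """
    n = native.shape[0]
    # All bead pairs separated by at least min_seq_sep along the chain.
    i, j = np.triu_indices(n, k=min_seq_sep)
    d0 = np.linalg.norm(native[i] - native[j], axis=1)
    # Keep only pairs that are in contact in the reference structure.
    in_contact = d0 < cutoff
    i, j, d0 = i[in_contact], j[in_contact], d0[in_contact]
    # Pair distances in every frame: shape (n_frames, n_contacts).
    d = np.linalg.norm(traj[:, i] - traj[:, j], axis=2)
    # A contact counts as formed if it is within tol times its native distance.
    return (d < tol * d0).mean(axis=1)

def rmsf(traj):
    """Per-bead RMSF about the mean position (frames assumed pre-aligned)."""
    mean = traj.mean(axis=0)
    return np.sqrt(((traj - mean) ** 2).sum(axis=2).mean(axis=0))

def ideal_helix(n_beads):
    """Toy alpha-helix-like reference: 2.3 A radius, 100 deg and 1.5 A per bead."""
    k = np.arange(n_beads)
    theta = np.deg2rad(100.0) * k
    return np.stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * k], axis=1)
```

For a trajectory made of copies of the reference structure, `native_contact_fraction` returns 1 for every frame and `rmsf` returns 0 for every bead; comparing an RMSF profile from a coarse-grained run with one from an all-atom run of the same domain would then be a per-residue comparison of the two arrays.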