Abstract

Many proteins consist of folded domains connected by regions with higher flexibility. The details of the resulting conformational ensemble play a central role in controlling interactions between domains and with binding partners. Small-Angle Scattering (SAS) is well suited to studying the conformational states adopted by proteins in solution. However, analysis is complicated by the limited information content of SAS data, and care must be taken to avoid constructing overly complex ensemble models and fitting to noise in the experimental data. To address these challenges, we developed a method based on Bayesian statistics that infers conformational ensembles from a structural library generated by all-atom Monte Carlo simulations. The first stage of the method involves a fast model selection based on variational Bayesian inference that maximizes the model evidence of the selected ensemble. This is followed by a complete Bayesian inference of population weights in the selected ensemble. Experiments with simulated ensembles demonstrate that the model evidence is capable of identifying the correct ensemble and that the correct number of ensemble members can be recovered up to a high level of noise. Using experimental data, we demonstrate how the method can be extended to include data from Nuclear Magnetic Resonance (NMR) and structural energies of conformers extracted from the all-atom energy functions. We show that data from Small-Angle X-ray Scattering (SAXS), NMR chemical shifts, and energies calculated from conformers can work synergistically to improve the definition of the conformational ensemble.
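
To make the second stage concrete, the following minimal sketch infers population weights for a fixed set of conformers from a single scattering curve. It is an illustration under stated assumptions, not the method described above: the toy conformer curves use the Guinier approximation, the prior on the weights is a flat Dirichlet, and the posterior is estimated by plain importance sampling from that prior rather than by variational inference. All function names (`ensemble_curve`, `log_likelihood`, `posterior_weights`) and all numerical values are hypothetical.

```python
import math
import random


def ensemble_curve(weights, curves):
    """Population-weighted average of per-conformer intensity curves."""
    n_q = len(curves[0])
    return [sum(w * c[k] for w, c in zip(weights, curves)) for k in range(n_q)]


def log_likelihood(weights, curves, data, sigma):
    """Gaussian log-likelihood of the data, up to a weight-independent constant."""
    model = ensemble_curve(weights, curves)
    chi2 = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))
    return -0.5 * chi2


def sample_flat_dirichlet(n, rng):
    """Draw one weight vector uniformly from the simplex (assumed prior)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [x / total for x in draws]


def posterior_weights(curves, data, sigma, n_samples=5000, seed=0):
    """Importance sampling from the prior.

    Returns the posterior mean weights and a Monte Carlo estimate of the
    log model evidence (up to the same constant as log_likelihood).
    """
    rng = random.Random(seed)
    samples = [sample_flat_dirichlet(len(curves), rng) for _ in range(n_samples)]
    log_l = [log_likelihood(w, curves, data, sigma) for w in samples]
    shift = max(log_l)  # subtract the maximum for numerical stability
    imp = [math.exp(l - shift) for l in log_l]
    norm = sum(imp)
    mean_w = [
        sum(i * w[j] for i, w in zip(imp, samples)) / norm
        for j in range(len(curves))
    ]
    log_evidence = shift + math.log(norm / n_samples)
    return mean_w, log_evidence


# Toy data (all values are illustrative assumptions): two conformers with
# radii of gyration of 15 and 30 Angstrom, mixed 70:30, on a small q grid.
q = [0.01 * (k + 1) for k in range(30)]


def guinier(rg):
    """Guinier approximation I(q) = exp(-(q * Rg)^2 / 3), used only as a toy curve."""
    return [math.exp(-((qk * rg) ** 2) / 3.0) for qk in q]


curves = [guinier(15.0), guinier(30.0)]
data = ensemble_curve([0.7, 0.3], curves)  # noise-free, for reproducibility
sigma = [0.02] * len(q)                    # assumed experimental uncertainty

mean_w, log_ev = posterior_weights(curves, data, sigma)
print("posterior mean weights:", [round(w, 3) for w in mean_w])
print("log evidence estimate:", round(log_ev, 2))
```

Averaging the likelihood over prior samples gives a crude evidence estimate, which is the quantity the model selection stage compares between candidate ensembles. Sampling from the prior scales poorly as the number of conformers grows, so this sketch is only practical for very small ensembles.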

Highlights

  • Proteins are highly dynamic systems [1], often with large-scale conformational dynamics facilitated by regions of flexible or disordered amino acid sequence linking stably folded domains [2]

  • Small Angle X-ray Scattering (SAXS) is uniquely suited to study the conformational ensembles adopted by these kinds of proteins

  • Because of the limited information provided by SAXS, ensemble models must be built in combination with other information sources, and care has to be taken to avoid constructing ensembles that are more complex than the data can support

Summary

Introduction

Proteins are highly dynamic systems [1], often with large-scale conformational dynamics facilitated by regions of flexible or disordered amino acid sequence linking stably folded domains [2]. Close to half of the proteins coded in the human genome contain significant disordered regions of greater than 30 residues [3], and there is a multitude of multi-domain proteins with shorter flexible linkers or hinges that are important for their biological function. Successful 3D modelling against SAS data depends upon restraining the conformational space to be sampled by a priori knowledge of protein structure and, wherever possible, by other experimental data.
