Computational methods, including machine learning and molecular dynamics simulations, have strong potential to characterize, understand, and ultimately predict the properties of proteins relevant to their stability and function as therapeutics. Such methods would streamline the development pathway by minimizing the experimental testing currently required for many protein variants and formulations. The molecular understanding of thermostability and aggregation propensity has advanced significantly, along with predictive algorithms based on sequence-level or structure-level information about a protein. However, these approaches focus largely on comparing protein sequence variants to correlate the properties of proteins with their stability, solubility, and aggregation propensity. For therapeutic protein development, it is equally important to account for the impact of formulation conditions in order to elucidate and predict the stability of antibody drugs. At the macroscopic level, changes in temperature, pH, and ionic strength, as well as the addition of excipients, can significantly alter the kinetics of protein aggregation. The mechanisms controlling aggregation kinetics have been traced back to a combination of molecular features, including conformational stability, partial unfolding to aggregation-prone states, and colloidal stability governed by surface charges and hydrophobicity. However, very little has been done to evaluate these features in the context of protein dynamics in different formulations. In this work, we have combined a range of molecular features calculated from the Fab A33 protein sequence and from molecular dynamics simulations. Using advanced yet interpretable statistical tools, we uncovered greater insight into the mechanisms behind protein stability, validating previous findings, and developed models that can predict the aggregation kinetics across a range of 49 different solution conditions.