Choosing an appropriate collective variable (CV) for a biomolecular process is a challenging task. A variety of approaches have been developed to address this problem, most recently ones based on machine learning (ML). In this work, we investigate the mechanism of the collapse transition in polymer systems of various lengths, using multiple adaptively sampled short trajectories analyzed within the time-lagged independent component analysis (TICA) framework. The TICA analysis reveals that the radius of gyration (Rg) and the end-to-end distance serve as good order parameters (OPs) for describing the overall energy landscapes of these systems. Markov state model (MSM) and mean first passage time (MFPT) analyses suggest that hydration water (Nw) plays a determining role in dictating the time scale and barrier of the collapse transition for the C40 system. P-fold (committor) analysis, used to identify the transition state ensemble (TSE), further supports the role of Nw in this transition. TICA, MSM, and committor analyses of the collapse transition for the C45 system reveal similarities with the C40 system in several respects. Furthermore, we propose a pipeline that integrates XGBoost regression with an interpretable ML method, SHapley Additive exPlanations (SHAP), to elucidate precisely the contribution of each OP locally at the TSE. Through this approach, we observe that the collapse transition is primarily driven by Nw for both polymer systems. A carefully designed protocol for the collapse transition of the C60 system indirectly reinforces this result. Overall, our results suggest that while the end-to-end distance should be considered for better resolution of metastable states in the landscape, Nw is the crucial coordinate to use in enhanced sampling when exploring the actual collapse transition of linear hydrophobic polymer systems.
The Python code for analyzing the contribution of different OPs in the TSE using an ML-aided protocol is available on GitHub (https://github.com/saikat-ai/linear_polymer_project).
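As a rough illustration of the committor (p-fold) idea used above to locate the TSE, the following minimal sketch computes the forward committor on a toy five-state Markov chain and flags states with a committor near 0.5. The transition matrix `T`, the function name `committor`, the choice of source and sink states, and the 0.1 window around 0.5 are all assumptions made for illustration. They are not taken from the paper or its repository, which may estimate p-fold values differently (for example, from shooting trajectories).

```python
# Toy sketch: forward committor on a small Markov state model.
# All numbers here are illustrative assumptions, not data from the paper.

def committor(T, source, sink, tol=1e-10, max_iter=100_000):
    """Return q, where q[i] is the probability of reaching `sink`
    before `source` when starting from state i.

    T is a row-stochastic transition matrix given as a list of lists.
    Boundary conditions: q = 0 on source states, q = 1 on sink states.
    Interior states satisfy q[i] = sum_j T[i][j] * q[j], which is
    solved here by simple fixed-point sweeps.
    """
    n = len(T)
    q = [1.0 if i in sink else 0.0 for i in range(n)]
    interior = [i for i in range(n) if i not in source and i not in sink]
    for _ in range(max_iter):
        delta = 0.0
        for i in interior:
            new = sum(T[i][j] * q[j] for j in range(n))
            delta = max(delta, abs(new - q[i]))
            q[i] = new
        if delta < tol:
            break
    return q


# Hypothetical 5-state chain: state 0 ~ "extended", state 4 ~ "collapsed".
T = [
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.50, 0.50],
]

q = committor(T, source={0}, sink={4})

# Transition-state-like microstates: committor close to 0.5
# (the 0.1 window is an arbitrary illustrative choice).
tse = [i for i, qi in enumerate(q) if abs(qi - 0.5) < 0.1]

print("committor:", [round(qi, 3) for qi in q])
print("TSE states:", tse)
```

For this symmetric toy chain the committor rises linearly from the source to the sink, so only the middle state falls in the window around 0.5. In a real MSM, the OP values (Rg, end-to-end distance, Nw) of the configurations assigned to such states would be the natural inputs to a subsequent feature-importance analysis.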