Abstract
An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning (ML) methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF; Zhang, Z. J. Chem. Theory Comput. 2018, 14, 5045) by identifying the relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct an ML model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the "comparison" concept, and the resultant RF models were tested on CASF-2013 (Li, Y. J. Chem. Inf. Model. 2014, 54, 1700). In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we created two artificially designed potential function sets to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. A comparison of the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
All code and data used in this work are available at https://github.com/JunPei000/random_forest_protein_ligand_decoy_detection.
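The two control potential sets described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes a toy representation in which each atom pair maps to a list of (peak position, peak height) tuples. The names `toy_potential`, `scramble`, and `uniform`, the numerical values, and the use of a cyclic shift for scrambling are all hypothetical choices made for illustration. The actual GARF probability functions and the authors' scrambling procedure may differ.

```python
import random

# Hypothetical toy stand-in for a pair potential (values are invented):
# each atom pair maps to a list of (peak_position_in_angstrom, peak_height).
toy_potential = {
    ("C", "C"): [(3.8, 1.2)],
    ("N", "O"): [(2.9, 2.5), (4.8, 0.7)],
    ("O", "O"): [(2.7, 1.9)],
    ("C", "N"): [(3.5, 0.9)],
}


def scramble(potential, seed=0):
    """Reassign whole functions to other atom pairs.

    Assumption: a cyclic shift by a random non-zero offset is used so that
    no atom pair keeps its original function. Requires at least two pairs.
    """
    rng = random.Random(seed)
    pairs = list(potential)
    functions = [potential[pair] for pair in pairs]
    offset = rng.randrange(1, len(pairs))  # never 0, so every pair changes
    shifted = functions[offset:] + functions[:offset]
    return dict(zip(pairs, shifted))


def uniform(potential, height=1.0):
    """Keep every peak position but replace each peak height with a constant."""
    return {
        pair: [(position, height) for position, _ in peaks]
        for pair, peaks in potential.items()
    }


scrambled_set = scramble(toy_potential)
uniform_set = uniform(toy_potential)
```

In this sketch the scrambled set keeps each function intact but attaches it to the wrong atom pair, which tests whether pair-specific peak positions matter. The uniform set keeps the peak positions and discards the height information, which tests whether the heights matter.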