Abstract
Machine learning (ML) techniques are widely used to compensate for the lack of simple molecular design guidelines for newer therapeutic modalities in the extended and beyond rule of five chemical space (eRo5, bRo5). These techniques predict molecular properties directly from the structure, allowing promising compounds to be prioritized. However, model performance varies greatly among ML use cases. Caco-2 permeability is one molecular property for which generalizing global models still struggle to reach sufficient performance. Accurate regression predictions have proven especially challenging in the lower permeability ranges, which are characteristic of the larger molecules in the e/bRo5 space. The present study therefore identifies a suitable combination of ML algorithm and descriptors, namely the LightGBM algorithm with RDKit molecular property descriptors, that predicts Caco-2 permeability efficiently with a simple global model. An additionally introduced local model uses the same algorithm and descriptors but selects its training data by Tanimoto fingerprint similarity to the individual test compound's structure. Evaluating this adaptive model while systematically varying the number of most similar structures used for training shows only marginal improvement over the global model, and only for specific training data configurations. Because these improvements appear to be random, general rules for local model parametrization cannot be derived a priori for the chosen algorithm and descriptor combination, and preselecting training data does not seem advantageous over global ML based on all available data. Creating more data-efficient models, however, proved generally possible.
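The training-data selection step of the local model can be illustrated with a minimal sketch. It is not the study's implementation: fingerprints are represented here as plain Python sets of on-bit indices, and the names `tanimoto`, `select_nearest`, the compound labels, and the toy bit sets are all hypothetical. In practice the fingerprints would come from a cheminformatics toolkit, and the selected subset would then be used to train the regressor for that one test compound.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def select_nearest(
    query: Set[int], pool: Dict[str, Set[int]], k: int
) -> List[Tuple[str, float]]:
    """Return the k pool compounds most similar to the query fingerprint.

    Ties are broken alphabetically by compound name so the result is deterministic.
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in pool.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical toy fingerprints; real ones would have far more bits.
pool = {
    "cmpd_a": {1, 2, 3, 4},
    "cmpd_b": {1, 2, 3, 9},
    "cmpd_c": {7, 8, 9},
    "cmpd_d": {1, 2},
}
query = {1, 2, 3, 5}

# k is the quantity varied systematically in the evaluation described above.
neighbours = select_nearest(query, pool, k=2)
for name, score in neighbours:
    print(f"{name}: {score:.2f}")
```

Here `k` plays the role of the number of most similar structures used for training. Because each test compound gets its own training subset, a local model of this kind has to be refitted once per prediction, whereas the global model is trained a single time on all available data.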