Abstract

This study introduces Chemical SuperLearner (ChemSL), a novel automated framework for building interpretable machine-learning models that predict molecular properties from chemical representations. ChemSL helps select a suitable combination of molecular representation and a data-driven SuperLearner, a stacked ensemble built from a pool of 40 base learners. The top-ranked base learners are combined using weights assigned by a meta-learner. Models generated by ChemSL were compared against models reported in the literature on three regression benchmark datasets from MoleculeNet (ESOL, FreeSolv, and Lipophilicity). The ChemSL-generated models achieved superior performance while maintaining interpretability. Finally, the framework's applicability was demonstrated on the Yield Sooting Index (YSI) database from Harvard Dataverse. The resulting model showed excellent predictive capability, highlighting ChemSL's potential as a tool for researchers in fields including cheminformatics, materials science, drug discovery, and fuel design.
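
The stacking idea described above (rank a pool of base learners, keep the top few, and combine them with weights) can be illustrated with a minimal sketch. This is not the ChemSL implementation: the data are synthetic, the four toy learners stand in for the 40-learner pool, the names `POOL`, `TOP_K`, `out_of_fold`, and `super_learner` are hypothetical, and the inverse-error weighting is an assumed stand-in for the meta-learner, whose details the abstract does not give.

```python
import random
import statistics

# Synthetic 1-D data: x stands in for a molecular descriptor, y for a property.
random.seed(0)
X = [random.uniform(0.0, 10.0) for _ in range(120)]
y = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in X]

# Each base learner is a fit function that returns a predict function.
def fit_mean(xs, ys):
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x - mx)

def fit_knn(k):
    def fit(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean(p[1] for p in nearest)
        return predict
    return fit

# Hypothetical toy pool (ChemSL uses 40 base learners).
POOL = {"mean": fit_mean, "linear": fit_linear,
        "knn1": fit_knn(1), "knn5": fit_knn(5)}
TOP_K = 3  # assumed number of top-ranked learners to keep

def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict every sample with a model that never saw it during fitting."""
    preds = [0.0] * len(xs)
    for f in range(n_folds):
        held_out = set(range(f, len(xs), n_folds))
        train_x = [v for i, v in enumerate(xs) if i not in held_out]
        train_y = [v for i, v in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        for i in held_out:
            preds[i] = model(xs[i])
    return preds

def rmse(pred, true):
    return statistics.fmean((p - t) ** 2 for p, t in zip(pred, true)) ** 0.5

# 1. Rank base learners by cross-validated RMSE and keep the best TOP_K.
scores = {name: rmse(out_of_fold(fit, X, y), y) for name, fit in POOL.items()}
top = sorted(scores, key=scores.get)[:TOP_K]

# 2. Assumed meta-learner: weights proportional to inverse squared error.
inverse_mse = {name: 1.0 / scores[name] ** 2 for name in top}
total = sum(inverse_mse.values())
weights = {name: inverse_mse[name] / total for name in top}

# 3. Refit the selected learners on all data and combine their predictions.
models = {name: POOL[name](X, y) for name in top}

def super_learner(x):
    return sum(weights[name] * models[name](x) for name in top)

print(top, weights, super_learner(5.0))
```

Because the final prediction is an explicit weighted sum, the contribution of each base learner can be read directly from `weights`, which is one simple way an ensemble of this kind can remain inspectable.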
