Abstract

This study introduces Chemical SuperLearner (ChemSL), a novel automated framework for building interpretable machine-learning models that predict molecular properties from chemical representations. ChemSL helps select a suitable combination of a molecular representation and a data-driven SuperLearner, a stacked ensemble model built from a pool of 40 base learners. The top-ranked base learners are combined using weights assigned by a meta-learner. Three regression benchmark datasets from MoleculeNet (ESOL, FreeSolv, and Lipophilicity) were used to compare the performance of ChemSL-generated models against models reported in the literature. The ChemSL-generated models achieved superior performance while maintaining interpretability. Finally, the framework's applicability was demonstrated using the Yield Sooting Index (YSI) database from Harvard Dataverse. The resulting model showed excellent predictive capabilities, highlighting its potential as a powerful tool for researchers in fields including cheminformatics, materials science, drug discovery, and fuel design.
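
The rank-then-weight idea behind a stacked ensemble can be illustrated with a minimal sketch. It is not the ChemSL implementation: the three toy learners, the helper names (`rank_learners`, `inverse_error_weights`, `ensemble_predict`), and the inverse-RMSE weighting used in place of a trained meta-learner are all assumptions made for illustration, and a real pool would hold 40 learners operating on molecular representations rather than on a single number.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "learner" here is just a function from one feature value to a prediction.
Model = Callable[[float], float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between targets and predictions."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def rank_learners(learners: Dict[str, Model],
                  x_val: Sequence[float],
                  y_val: Sequence[float]) -> List[Tuple[str, float]]:
    """Score every base learner on validation data; best (lowest RMSE) first."""
    scores = [(name, rmse(y_val, [model(x) for x in x_val]))
              for name, model in learners.items()]
    return sorted(scores, key=lambda item: item[1])


def inverse_error_weights(ranked: List[Tuple[str, float]],
                          top_k: int,
                          eps: float = 1e-9) -> Dict[str, float]:
    """Keep the top_k learners and weight each by 1/RMSE, normalised to sum to 1.

    Assumption: this simple rule stands in for a trained meta-learner.
    """
    raw = {name: 1.0 / (err + eps) for name, err in ranked[:top_k]}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def ensemble_predict(learners: Dict[str, Model],
                     weights: Dict[str, float],
                     x: float) -> float:
    """Weighted sum of the selected base learners' predictions."""
    return sum(w * learners[name](x) for name, w in weights.items())


# Hypothetical toy pool; the true relationship in this example is y = 2x.
learners: Dict[str, Model] = {
    "good": lambda x: 2 * x + 0.1,
    "ok": lambda x: 2 * x - 0.3,
    "bad": lambda x: 0.0,
}
x_val = [0.0, 1.0, 2.0, 3.0]
y_val = [0.0, 2.0, 4.0, 6.0]

ranked = rank_learners(learners, x_val, y_val)
weights = inverse_error_weights(ranked, top_k=2)
prediction = ensemble_predict(learners, weights, 5.0)
```

In this toy setup the worst learner is dropped before weighting, so it contributes nothing to the final prediction, and the two retained learners are blended in proportion to how well they did on the validation data.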
