MolPipeline: A Python Package for Processing Molecules with RDKit in Scikit-learn.

Jochen Sieg, Christian W Feldmann, Jennifer Hemmerich, Conrad Stork, Frederik Sandfort, Philipp Eiden, Miriam Mathea

doi:10.1021/acs.jcim.4c00863

Abstract

The open-source package scikit-learn provides various machine learning algorithms and data processing tools, including the Pipeline class, which allows users to prepend custom data transformation steps to the machine learning model. We introduce the MolPipeline package, which extends this concept to cheminformatics by wrapping standard RDKit functionality, such as reading and writing SMILES strings or calculating molecular descriptors from a molecule object. We aimed to build an easy-to-use Python package for creating fully automated end-to-end pipelines that scale to large data sets. We placed particular emphasis on handling erroneous instances, which in default pipelines would require manual intervention to resolve. MolPipeline provides building blocks that integrate common cheminformatics tasks, such as scaffold splits and molecular standardization, into scikit-learn's pipeline framework, so that pipelines can be adapted easily to diverse project requirements.
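
The idea of handling erroneous instances automatically can be illustrated with a minimal sketch. It does not use MolPipeline, RDKit, or scikit-learn, and it is not the package's API: the names `InvalidInstance`, `run_pipeline`, `parse_formula`, and `heavy_atom_count` are hypothetical, and the toy parser merely stands in for SMILES parsing. The sketch assumes one plausible strategy, in which a sample that fails in any step is replaced by a marker, skipped by later steps, and filled with a placeholder at the end so that the output stays aligned with the input.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


class InvalidInstance:
    """Hypothetical marker that replaces a sample once a step has failed on it."""

    def __init__(self, step_name: str, message: str) -> None:
        self.step_name = step_name
        self.message = message


def run_pipeline(
    steps: Sequence[Tuple[str, Callable[[Any], Any]]],
    samples: Sequence[Any],
    fill_value: Any = math.nan,
) -> List[Any]:
    """Apply the steps to each sample; failed samples become fill_value."""
    results: List[Any] = []
    for sample in samples:
        value: Any = sample
        for name, func in steps:
            try:
                value = func(value)
            except Exception as exc:
                # Record where the failure happened and skip the remaining steps.
                value = InvalidInstance(name, str(exc))
                break
        results.append(value)
    # Replace markers with the placeholder so output positions match the input.
    return [fill_value if isinstance(r, InvalidInstance) else r for r in results]


def parse_formula(text: str) -> dict:
    """Toy stand-in for SMILES parsing: accepts only the letters C, N, and O."""
    allowed = "CNO"
    if not text or any(ch not in allowed for ch in text):
        raise ValueError(f"cannot parse {text!r}")
    return {element: text.count(element) for element in allowed}


def heavy_atom_count(counts: dict) -> int:
    """Toy descriptor: total number of atoms counted by parse_formula."""
    return sum(counts.values())


steps = [("parse", parse_formula), ("count", heavy_atom_count)]
features = run_pipeline(steps, ["CCO", "not-a-molecule", "CN"])
print(features)  # the unparsable second sample is reported as nan
```

Because the failed sample keeps its position, the feature list can still be matched to the original input rows without manual cleanup, which is the kind of behavior the abstract describes as the goal.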

Journal of Chemical Information and Modeling