This study reports a machine-learning (ML) approach for developing multi-purpose prediction strategies for the formation of cyclodextrin inclusion complexes (ICs) in aqueous solution. A balanced dataset of pharmaceutically relevant molecules was constructed with experimental verification. Three ML models (artificial neural network, support vector machine, and logistic regression) were established and optimized to predict IC formation. To meet different prediction requirements more reliably, ML-based linear, recall-first, and precision-first strategies were then built on these models, with the latter two targeting maximum recall and maximum precision, respectively. The recall-first strategy identified all positive samples, so that no positives were missed in the prediction, whereas the precision-first strategy identified positive samples accurately, reducing the number of validation experiments required. ML-based prediction strategies for IC formation were established here for the first time and showed high accuracy and reliability. These strategies offer more efficient and lower-cost solutions for IC screening.
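The trade-off between the recall-first and precision-first strategies can be illustrated with a minimal sketch. The abstract does not state how the three models are combined, so the rules below are an assumption made purely for illustration: a recall-first rule that labels a sample positive if any model does, and a precision-first rule that requires all models to agree. The function names and the toy labels are hypothetical and are not data from the study.

```python
from typing import List, Sequence


def recall_first(votes: Sequence[Sequence[int]]) -> List[int]:
    """Assumed rule: positive if ANY model predicts positive (union of votes)."""
    return [int(any(sample)) for sample in zip(*votes)]


def precision_first(votes: Sequence[Sequence[int]]) -> List[int]:
    """Assumed rule: positive only if ALL models predict positive (intersection)."""
    return [int(all(sample)) for sample in zip(*votes)]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy labels (1 = IC forms, 0 = no IC); not data from the study.
y_true = [1, 1, 1, 0, 0, 0]
ann = [1, 1, 0, 0, 1, 0]  # stand-in for artificial neural network output
svm = [1, 0, 1, 0, 0, 0]  # stand-in for support vector machine output
lr = [1, 1, 0, 1, 0, 0]   # stand-in for logistic regression output

union = recall_first([ann, svm, lr])
intersection = precision_first([ann, svm, lr])

print("recall-first:   ", union,
      "recall =", recall(y_true, union),
      "precision =", precision(y_true, union))
print("precision-first:", intersection,
      "recall =", recall(y_true, intersection),
      "precision =", precision(y_true, intersection))
```

On this toy data the union rule recovers every positive at the cost of extra false positives, while the intersection rule returns only positives that every model agrees on. This mirrors the stated aims of the two strategies: missing no positive samples in one case, and reducing the number of validation experiments in the other.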