Abstract

Rapid solvent selection is of great significance in chemistry, yet solubility prediction remains a major challenge. This study aimed to develop machine learning models that accurately predict compound solubility in organic solvents. A dataset of 5081 experimental temperature-solubility data points for compounds in organic solvents was extracted and standardized. Molecular fingerprints were used to characterize structural features. LightGBM was compared with deep learning and traditional machine learning methods (PLS, Ridge regression, kNN, DT, ET, RF, SVM) for predicting solubility in organic solvents at different temperatures. Compared with the other models, LightGBM exhibited significantly better overall generalization (logS ± 0.20). For unseen solutes, the model gave a prediction accuracy (logS ± 0.59) close to the expected noise level of experimental solubility data. LightGBM also revealed the physicochemical relationship between solubility and structural features. The method enables rapid solvent screening in chemistry and may be applicable to solubility prediction in other solvents.

Highlights

  • Organic solvents play an important role in the chemical industry

  • Molecular dynamics (MD) provides the dynamic evolution of a system, including the structure, motion, and energy of molecules. It explains solubility through different aspects, such as van der Waals forces and electrostatic forces between molecules [23]. Fitted equations such as the general solubility equation (GSE), proposed by Yalkowsky et al., consider the logP and melting point of a substance [24, 25]. logP measures the balance between the lipophilicity and hydrophilicity of a compound and is determined experimentally

  • The 5081 solubility values in the dataset range from −7 to 3, as shown in the histogram (Fig. 2A)
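
The general solubility equation mentioned in the highlights can be illustrated with a few lines of Python. This is a minimal sketch of the commonly cited form of the GSE for aqueous solubility, logS = 0.5 − 0.01 (MP − 25) − logP, with the melting point MP in °C; it is not the model developed in this study. The function name `gse_log_s` and the input values are hypothetical and chosen only to show the arithmetic.

```python
def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """Estimate aqueous log10 molar solubility with the general solubility equation.

    Assumes the commonly cited form: logS = 0.5 - 0.01 * (MP - 25) - logP,
    where MP is the melting point in degrees Celsius.
    """
    # Compounds that are liquid at 25 °C receive no melting-point penalty.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


# Hypothetical inputs, not taken from the dataset described here.
estimate = gse_log_s(log_p=2.0, melting_point_c=125.0)
print(round(estimate, 2))
```

The equation needs only two descriptors, which is why it is often used as a quick baseline. It targets water, however, so it does not address solubility in organic solvents.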


Summary

Introduction

Organic solvents play an important role in the chemical industry. They are used in synthesis, catalysis, separation, quantitative analysis, and pharmaceutical formulation. Computational methods, including mechanistic models and QSPR (quantitative structure–property relationship) approaches, have been applied to estimate the solubility of compounds before experiments [1,2,3,4,5,6], but almost exclusively for water. They include the simplistic rule of thumb “like dissolves like”, empirical equations, and thermodynamically based mathematical expressions. Hildebrand and Hansen solubility parameters were introduced to compute the solubility of drugs, excipients, and surfactants for
