Abstract

The physicochemical properties of a drug molecule, such as lipophilicity and aqueous solubility, have profound effects on the ADMET (absorption, distribution, metabolism, elimination, and toxicity) profile of the drug; thus, ready access to accurate log P and solubility values for compounds of interest helps in making the right decisions about the fate of those compounds. To develop predictive and reliable quantitative structure–property relationship (QSPR) models, it is worthwhile to invest effort in developing novel molecular descriptor systems that decipher the relevant structural features encoded in two-dimensional chemical structures or one-dimensional SMILES strings. The development of an atom-type-based molecular descriptor system is presented in detail. The system started with an atom type casting tree, constructed on the basis of a chemist's knowledge and intuition. The structure of the tree was then optimized for its ability to predict the log P values of the 10,851 compounds in the Starlist dataset, through recursive error analysis and variable importance analysis. Discrepancy produces learning. Without removing any outliers or preselecting a subset of descriptors, the QSPR models based on the optimized molecular descriptors accurately predicted the 10,851 experimental log P values, with a correlation coefficient r² of 0.90 and a root mean square error (RMSE) of 0.54 log units, using a linear partial least squares (PLS) regression model with only seven components. Incorporating nonlinearity through the ν-support vector regression (SVR) method significantly improved the accuracy of the model, to an r² of 0.98 and an RMSE of 0.05 log units. To address the concern about overfitting raised by the extremely low prediction error, a rigorous validation was performed: an SVR model was constructed with a randomly selected 20% of the Starlist compounds and used to predict the remaining 80% of the data. The SVR model exhibited high predictive power on both the training set, with an r² of 0.99 and an RMSE of 0.03 log units, and the validation set, with an r² of 0.91 and a root mean square error of prediction (RMSEP) of 0.25 log units. The most important atom types and correction factors derived from the PLS and SVR models overlap extensively. This information offers medicinal chemists valuable guidance on manipulating molecular lipophilicity by modifying molecular structures.
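The two accuracy measures quoted above, and the 20%/80% validation split, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy log P values are hypothetical, and r² is assumed here to be the squared Pearson correlation between observed and predicted values, since the abstract does not state which definition was used.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def split_indices(n_compounds, train_fraction=0.2, seed=0):
    """Randomly assign compound indices to training and validation sets."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    n_train = round(n_compounds * train_fraction)
    return indices[:n_train], indices[n_train:]


# Hypothetical log P values, invented purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(rmse(observed, predicted))       # 0.5
print(r_squared(observed, predicted))  # 0.8

# A 20%/80% split of 10,851 compounds, as in the validation described above.
train, validation = split_indices(10851)
print(len(train), len(validation))     # 2170 8681
```

With a split of this kind, the RMSE computed on the held-out 80% corresponds to the RMSEP reported for the validation set, while the RMSE on the 20% used for fitting corresponds to the training-set figure.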
