Abstract

Accurate and efficient prediction of the human skin permeability of chemical compounds is useful for developing dermatological medicines and cosmetics. However, previous work has two problems: 1) the quality of the databases used, and 2) the methods used to build prediction models. In this paper, we attempt to solve both problems. We first compile, by carefully screening the literature, a novel dataset of chemical compounds with permeability coefficients measured under consistent experimental conditions. We then apply machine learning techniques such as support vector regression (SVR) and random forest (RF) to this database to develop prediction models. Molecular descriptors are obtained entirely by computation, and greedy stepwise selection is employed for descriptor selection. The prediction models are validated both internally and externally. The resulting original database covers the human skin permeability of 211 different compounds from aqueous donors. Among linear SVR, nonlinear SVR, and RF, nonlinear SVR achieved the best performance. In external validation, its coefficient of determination, root mean square error, and mean absolute error were 0.910, 0.342, and 0.282, respectively. We thus provide one of the largest datasets with purely experimental log kp, together with reliable and accurate prediction models for screening active ingredients and for exploring not-yet-synthesized compounds for dermatological medicines and cosmetics.
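
For readers unfamiliar with the three validation metrics named in the abstract, the following minimal Python sketch shows how they are commonly computed from observed and predicted log kp values. It assumes the usual definition of the coefficient of determination, 1 - SS_res / SS_tot; the abstract does not state which definition the authors used. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Assumption: R^2 is defined as 1 - SS_res / SS_tot, which may differ
    from the definition used in the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)       # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)                 # root mean square error
    mae = sum(abs(r) for r in residuals) / n     # mean absolute error
    return r2, rmse, mae


# Hypothetical log kp values, for illustration only.
observed = [-3.0, -2.0, -1.0, 0.0]
predicted = [-2.5, -2.0, -1.5, 0.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R^2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Under this definition, RMSE and MAE are in the same units as log kp, so the reported values of 0.342 and 0.282 can be read as typical prediction errors on the log scale.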
