Abstract

… condensed systems; their halogen, nitro, nitroso, cyano, and sulfo derivatives; cycloalkanes and cycloalkenes; adamantanes; heterocyclic and polycyclic structures; sulfones; sulfoxides; amino acids and amides; hydrazines; hydrazides; semicarbazides; amines; imines; quaternary ammonium salts; carboxylic, phosphoric, and sulfonic acids; carbamates; crown ethers; sugars; steroids; prostaglandins; alkaloids; antibiotics; and compounds of boron, mercury, germanium, selenium, lead, gold, etc. In compiling the database, we used only the most reliable data, which are marked by asterisks in the monograph [12]. For modeling, the database was randomly divided into three parts: a training set (86%), a validation set (7%), and a set for estimating predictive ability (7%). Under identical initial conditions (the same partitioning and descriptor types), we compared the prediction quality of linear-regression and neural-network models built on different combinations of these descriptor types. We constructed 17 linear-regression models and, for each number of hidden-layer neurons (varied from 6 to 11), 85 neural-network models. The optimal number of hidden neurons was then selected by the root-mean-square error (RMSE) on the validation set. Parameters of the models obtained for selected descriptor combinations are presented in Table 1. For modeling lipophilicity with this database, the optimal number of hidden neurons was nine. The data in Table 1 also show that fragment descriptors are important for constructing a model of log P: as the maximum size of the fragment descriptors increases, the predictive ability improves significantly for both the linear-regression and the neural-network models.
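The partitioning and hidden-layer selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The 86/7/7 random split and the candidate sizes 6 to 11 come from the text. All function names are hypothetical, and `knn_fit` is a toy stand-in for training a network with `h` hidden neurons. It is used only so the sketch runs with the Python standard library, and in it `h` is a neighbour count rather than a neuron count. The synthetic data in the demo is not log P data.

```python
import math
import random


def random_split(n, fractions=(0.86, 0.07, 0.07), seed=0):
    """Shuffle indices 0..n-1 and cut them into training, validation and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    # The remainder (about 7%) is held out for estimating predictive ability.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_hidden_size(fit, X, y, train, val, sizes=range(6, 12)):
    """Train one model per candidate size on the training set and keep the
    (size, validation RMSE, model) triple with the lowest validation RMSE."""
    best = None
    for h in sizes:
        model = fit(h, [X[i] for i in train], [y[i] for i in train])
        score = rmse([y[i] for i in val], [model(X[i]) for i in val])
        if best is None or score < best[1]:
            best = (h, score, model)
    return best


def knn_fit(k, X_train, y_train):
    """HYPOTHETICAL stand-in for neural-network training: a 1-D k-nearest-
    neighbour average, where k merely plays the role of the tuned size."""
    pairs = list(zip(X_train, y_train))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


if __name__ == "__main__":
    rng = random.Random(1)
    X = [rng.uniform(-3, 3) for _ in range(300)]
    y = [0.5 * x + rng.gauss(0, 0.3) for x in X]  # synthetic data only
    train, val, test = random_split(len(X))
    h, val_rmse, model = select_hidden_size(knn_fit, X, y, train, val)
    # The held-out part is touched only once, after the size has been chosen.
    test_rmse = rmse([y[i] for i in test], [model(X[i]) for i in test])
    print(h, round(val_rmse, 3), round(test_rmse, 3))
```

Keeping the third part out of the selection loop means the reported error on it is not biased by the choice of hidden-layer size, which is the role the text assigns to the set for estimating predictive ability.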
