Abstract

Among the many QSPR/QSAR approaches to predicting the physical and chemical properties and biological activity of organic compounds, methods based on fragmental descriptors play a special role [1, 2]. The values of such descriptors are either the occurrence numbers of certain fragments in the structures of chemical compounds or indicators of their presence. Their advantages are a transparent meaning and the possibility of fast automatic generation from the structural formula alone. Fragmental descriptors can be calculated without knowledge of the 3D or electronic structure of molecules and can therefore be easily applied to large databases. One of their disadvantages is the problem of rare fragments, which may be absent from the training set but present in the compounds for which the prediction is performed. Since the contributions of rare fragments cannot be determined from the training set, considerable prediction errors are expected for compounds containing such fragments. We suggest solving this problem by introducing additional descriptors whose values are to some extent related to the contributions of fragments to the predicted property. For this purpose, we suggest using special fragmental descriptors whose values are calculated by combining the properties of the atoms that constitute these fragments. Such descriptors are referred to as pseudofragmental descriptors to distinguish them from “proper” fragmental descriptors, whose values are the occurrence numbers or indicators of the presence of certain fragments in the structures of chemical compounds.
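The rare-fragment problem can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the molecule representation (a list of element symbols plus bonded index pairs, hydrogens implicit), the restriction to one-atom and two-atom fragments, and the function names `fragment_counts`, `fragment_indicators`, and `unseen_fragments`.

```python
from collections import Counter

# Hypothetical toy representation: (element symbols, bonded index pairs).
# Hydrogens are left implicit.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
CHLOROETHANE = (["C", "C", "Cl"], [(0, 1), (1, 2)])
BROMOETHANE = (["C", "C", "Br"], [(0, 1), (1, 2)])


def fragment_counts(molecule):
    """Occurrence numbers of one-atom and bonded two-atom fragments."""
    atoms, bonds = molecule
    counts = Counter(atoms)
    for i, j in bonds:
        # Sort the two symbols so that C-O and O-C name the same fragment.
        counts["-".join(sorted((atoms[i], atoms[j])))] += 1
    return counts


def fragment_indicators(molecule):
    """Indicator form: 1 for every fragment present (absent ones are omitted)."""
    return {fragment: 1 for fragment in fragment_counts(molecule)}


def unseen_fragments(training_set, query):
    """Fragments of the query molecule that never occur in the training set."""
    known = set()
    for molecule in training_set:
        known.update(fragment_counts(molecule))
    return sorted(set(fragment_counts(query)) - known)


if __name__ == "__main__":
    print(dict(fragment_counts(ETHANOL)))
    # Bromine fragments have no fitted contribution in this training set.
    print(unseen_fragments([ETHANOL, CHLOROETHANE], BROMOETHANE))
```

With ethanol and chloroethane as the training set, the bromine-containing fragments of bromoethane are reported as unseen, so a model built only on proper fragmental descriptors would have no contribution to assign to them.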
Atomic properties that are believed to influence the contributions of fragmental descriptors to the predicted property, for example the atomic weight, number of electrons, covalent radius, electronegativity, and ionization potential, can be used for predicting the physical and chemical properties of organic molecules. It is also important that the combinations of properties used have a clear physical meaning, since this improves the chance that their values correlate with fragmental contributions. If such a correlation exists, a small number of pseudofragmental descriptors enter the statistical models instead of numerous proper fragmental descriptors, including potentially rare ones, and thus act as a compressed generalization of the latter. This largely solves the problem of rare fragments, provided that the pseudofragmental descriptors are constructed on the basis of frequently encountered fragments consisting of single atoms or short chains of arbitrary atoms, which are present in almost all molecules.
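As a rough sketch of the idea, the following computes a few pseudofragmental descriptors for "any atom" and "any bonded pair" fragments. The abstract does not state how atomic properties are combined, so the choices here are assumptions: a plain sum over atoms, and a sum of absolute property differences over bonded pairs. The molecule representation and all function names are likewise hypothetical. The property tables hold Pauling electronegativities and standard atomic weights.

```python
# Atomic property tables for a few elements.
ELECTRONEGATIVITY = {"C": 2.55, "N": 3.04, "O": 3.44, "Cl": 3.16, "Br": 2.96}
ATOMIC_WEIGHT = {"C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45, "Br": 79.904}

# Hypothetical toy representation: (element symbols, bonded index pairs).
# Hydrogens are left implicit, so they do not contribute to any sum.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
BROMOETHANE = (["C", "C", "Br"], [(0, 1), (1, 2)])


def atom_descriptor(molecule, prop):
    """Sum an atomic property over all one-atom fragments (assumed combination)."""
    atoms, _bonds = molecule
    return sum(prop[symbol] for symbol in atoms)


def bond_descriptor(molecule, prop):
    """Sum |p_i - p_j| over all bonded atom pairs (assumed combination)."""
    atoms, bonds = molecule
    return sum(abs(prop[atoms[i]] - prop[atoms[j]]) for i, j in bonds)


def pseudofragmental_vector(molecule):
    """A short, fixed-length descriptor vector defined for any molecule."""
    return [
        atom_descriptor(molecule, ATOMIC_WEIGHT),
        atom_descriptor(molecule, ELECTRONEGATIVITY),
        bond_descriptor(molecule, ELECTRONEGATIVITY),
    ]


if __name__ == "__main__":
    for name, molecule in [("ethanol", ETHANOL), ("bromoethane", BROMOETHANE)]:
        print(name, pseudofragmental_vector(molecule))
```

Because the underlying fragments (any atom, any bonded pair) occur in practically every molecule, the vector has the same length and a well-defined value for bromoethane even if no bromine compound was in the training set. The element identity enters only through the property values.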
