Abstract

The molecular electronegativity distance vector based on 13 atomic types (MEDV-13) has at most 91 descriptors. With this many descriptors, it is impossible to use multiple linear regression (MLR) directly to derive a quantitative structure-activity relationship (QSAR) model. Although principal component regression (PCR) or partial least-squares regression (PLSR) can be employed to develop a latent QSAR model, it remains difficult to determine the principal components (PCs) and to interpret their physical meaning. Therefore, a genetic algorithm (GA) is first employed to select an optimal subset of descriptors from the original MEDV-13 descriptor set. MLR is then used to build a QSAR model relating the optimal subset to the biological activities of each of three sets of compounds. For 31 benchmark steroids, a 5-descriptor QSAR model (M1) relating the corticosteroid-binding globulin (CBG) binding affinity of the steroids to a 5-descriptor subset is developed. The root-mean-square error of estimation (RMSEE) and the correlation coefficient of estimation (r) between the observed CBG binding affinity (BA) and the BA estimated by M1 are 0.422 and 0.9182, respectively. The root-mean-square error of prediction (RMSEP) and the correlation coefficient of prediction (q) between the observed BA and the BA predicted by leave-one-out cross-validation are 0.504 and 0.8818, respectively. For 58 dipeptides inhibiting angiotensin-converting enzyme (ACE), a 5-variable QSAR model (M2) relating the pIC50 of the peptides to a 5-descriptor subset is derived. M2 is of high quality, with RMSEE = 0.339, r = 0.9398, RMSEP = 0.370, and q = 0.9280. For 16 indomethacin amides and esters (ImAE) inhibiting cyclooxygenase-2 (COX-2), a 6-variable QSAR model (M3) is built, with RMSEE = 0.079, r = 0.9839, RMSEP = 0.151, and q = 0.9413.
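
The workflow described above (GA subset selection, then MLR, then leave-one-out validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the GA operators, population size, or fitness function, so the truncation selection, union-based crossover, single-swap mutation, and the use of leave-one-out RMSEP as fitness are all assumptions made for illustration. The function names (`fit_mlr`, `loo_predictions`, `ga_select`) are hypothetical, the data are synthetic rather than MEDV-13 descriptors, and the RMSE is assumed to be a plain root-mean-square over all samples with no degrees-of-freedom correction.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def loo_predictions(X, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i] = predict_mlr(fit_mlr(X[keep], y[keep]), X[i:i + 1])[0]
    return preds

def rmse(y, y_hat):
    # Assumption: root-mean-square over n samples, no degrees-of-freedom correction.
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def corr(y, y_hat):
    return float(np.corrcoef(y, y_hat)[0, 1])

def ga_select(X, y, k, pop_size=20, generations=15, mutation_rate=0.3, seed=0):
    """Toy GA over k-descriptor subsets; fitness is LOO RMSEP (lower is better)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = rmse(y, loo_predictions(X[:, list(subset)], y))
        return cache[subset]

    def random_subset():
        return tuple(sorted(int(j) for j in rng.choice(p, size=k, replace=False)))

    pop = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(pop, key=fitness)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw k descriptors from the union of both parents.
            pool = sorted(set(parents[a]) | set(parents[b]))
            child = {int(j) for j in rng.choice(pool, size=k, replace=False)}
            if rng.random() < mutation_rate:
                # Mutation: drop one descriptor and draw a replacement.
                child.remove(int(rng.choice(sorted(child))))
                outside = [j for j in range(p) if j not in child]
                child.add(int(rng.choice(outside)))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return min(pop, key=fitness)

# Synthetic stand-in data: 30 "compounds", 12 "descriptors", 3 of them informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 12))
y = 2.0 * X[:, 0] - 1.5 * X[:, 4] + 0.8 * X[:, 9] + rng.normal(scale=0.1, size=30)

best = ga_select(X, y, k=3)
Xs = X[:, list(best)]
y_fit = predict_mlr(fit_mlr(Xs, y), Xs)   # estimates from the full-data model
y_loo = loo_predictions(Xs, y)            # leave-one-out predictions

print("selected descriptors:", best)
print(f"RMSEE = {rmse(y, y_fit):.3f}, r = {corr(y, y_fit):.4f}")
print(f"RMSEP = {rmse(y, y_loo):.3f}, q = {corr(y, y_loo):.4f}")
```

For an ordinary least-squares model on a fixed descriptor subset, each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, so RMSEP is never smaller than RMSEE. The three models reported in the abstract follow the same pattern.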
