Abstract

A central problem in forming accurate regression equations in QSAR studies is the selection of appropriate descriptors for the compounds under study. We describe a novel procedure that uses inductive logic programming (ILP) to discover new indicator variables (attributes) for QSAR problems, and show that these improve the accuracy of the derived regression equations. ILP techniques have previously been shown to work well on drug design problems where there is a large structural component or where clear, comprehensible rules are required. However, ILP techniques have had the disadvantage of being able to make only qualitative predictions (e.g. active, inactive) and not to predict real numbers (regression). We unify ILP and linear regression techniques to give a QSAR method that combines the strength of ILP at describing steric structure with the familiarity and power of linear regression. We evaluated the utility of this new QSAR technique by comparing the prediction of biological activity with and without the addition of new structural indicator variables formed by ILP. In three of the five datasets examined, the addition of ILP variables produced statistically significantly better results (P < 0.01) than the original description alone. The new ILP variables did not increase the overall complexity of the derived QSAR equations and added insight into possible mechanisms of action. We conclude that ILP can aid the process of drug design.
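
To illustrate the idea of adding a structural indicator variable to a QSAR regression, the following is a minimal sketch and not the procedure used in the study. It assumes a single hypothetical descriptor (`log_p`), a hand-assigned 0/1 column (`ilp_indicator`) standing in for "a learned structural rule matches this compound", and fully synthetic activity values. The ILP step that would discover such a rule is not shown, and the significance testing mentioned above is not reproduced. The helper names (`fit_ols`, `residual_sum_of_squares`, `solve_linear_system`) are illustrative only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    An intercept column of ones is prepended to every row.
    """
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def residual_sum_of_squares(rows, y, beta):
    """Sum of squared differences between observed and fitted values."""
    total = 0.0
    for r, t in zip(rows, y):
        pred = beta[0] + sum(b * v for b, v in zip(beta[1:], r))
        total += (t - pred) ** 2
    return total


# Synthetic data (assumption, not taken from the study).
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
# Hypothetical indicator: 1 if an assumed structural rule matches the compound.
ilp_indicator = [0, 1, 0, 1, 1, 0]
# Activities are generated so that the indicator genuinely matters.
activity = [1.0 + 0.5 * x + 2.0 * i for x, i in zip(log_p, ilp_indicator)]

baseline_rows = [[x] for x in log_p]
augmented_rows = [[x, float(i)] for x, i in zip(log_p, ilp_indicator)]

beta_base = fit_ols(baseline_rows, activity)
beta_aug = fit_ols(augmented_rows, activity)
rss_base = residual_sum_of_squares(baseline_rows, activity, beta_base)
rss_aug = residual_sum_of_squares(augmented_rows, activity, beta_aug)

print("baseline RSS: ", rss_base)
print("augmented RSS:", rss_aug)
```

Because the synthetic activities were constructed from the indicator, the augmented model fits them almost exactly while the baseline model cannot. Real data would show a smaller gain, which is why a comparison on held-out compounds with a significance test is needed before concluding that an indicator helps.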
