Abstract

Background: Quantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: limited amount of available biological activity data and noise or uncertainty in the activity data themselves. To address these challenges, we introduce and explore a QSAR model based on custom distance metrics in the structure-activity space.

Methods: The model is built on top of the k-nearest neighbor model, incorporating non-linearity not only in the chemical structure space, but also in the biological activity space. The model is tuned and evaluated using activity data for human estrogen receptor from the US EPA ToxCast and Tox21 databases.

Results: The model closely trails the CERAPP consensus model (built on top of 48 individual human estrogen receptor activity models) in agonist activity predictions and consistently outperforms the CERAPP consensus model in antagonist activity predictions.

Discussion: We suggest that incorporating non-linear distance metrics may significantly improve QSAR model performance when the available biological activity data are limited.

Highlights

  • Identifying and understanding the connection between chemical structure and biological activity is a central problem in contemporary pharmacology and toxicology

  • We introduced the generalized k-nearest neighbor (GkNN) quantitative structure-activity relationship (QSAR) model, based on a custom non-linear distance metric in the combined chemical structure and biological activity space, and explored how this non-linearity influences model performance

  • Using the human estrogen receptor (hER) data from the ToxCast [9] and Tox21 [10] databases, we compared the accuracy of the GkNN model against that of other variants of the k-nearest neighbor (kNN) model with non-linear weighting schemes and the collaborative estrogen receptor activity prediction project (CERAPP) consensus model [16]
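To make the idea of a kNN model with a non-linear weighting scheme concrete, here is a minimal sketch of a generic distance-weighted kNN predictor. It is not the GkNN model itself, whose distance metric in the structure-activity space is not specified in this excerpt. The Tanimoto distance on binary fingerprints, the inverse-power weighting, and the names `tanimoto_distance`, `weighted_knn_predict`, `power`, and `eps` are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Return 1 - Tanimoto similarity for two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical (assumption).
    return 0.0 if either == 0 else 1.0 - both / either


def weighted_knn_predict(
    query: Sequence[int],
    training: List[Tuple[Sequence[int], float]],
    k: int = 3,
    power: float = 2.0,
    eps: float = 1e-9,
) -> float:
    """Predict activity as a weighted mean over the k nearest neighbors.

    Each neighbor's weight is 1 / (distance + eps) ** power, so a larger
    `power` makes the influence of a neighbor fall off more sharply
    (non-linearly) with structural distance. power = 0 gives a plain average.
    """
    # Sort by distance only, then keep the k closest (distance, activity) pairs.
    neighbors = sorted(
        ((tanimoto_distance(query, fp), activity) for fp, activity in training),
        key=lambda pair: pair[0],
    )[:k]
    weights = [1.0 / (dist + eps) ** power for dist, _ in neighbors]
    weighted_sum = sum(w * activity for w, (_, activity) in zip(weights, neighbors))
    return weighted_sum / sum(weights)
```

With `power=0` the sketch reduces to an unweighted kNN average, which gives a simple baseline for judging how much the non-linear weighting changes a prediction.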

Summary

Introduction

Identifying and understanding the connection between chemical structure and biological activity is a central problem in contemporary pharmacology and toxicology. Advances in such understanding could facilitate in silico discovery of novel drug candidates and give rise to more efficient methods for computational screening of environmental chemicals for potential adverse effects on human health [1, 2]. Quantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: a limited amount of available biological activity data, and noise or uncertainty in the activity data themselves.

