Abstract
Quantitative Structure Activity Relationship (QSAR) models capture the correlation between activities and structure-based molecular descriptors. This information is important for understanding the factors that govern molecular properties and for designing new compounds with favorable properties. Due to the large number of calculable descriptors and, consequently, the much larger number of descriptor combinations, the derivation of QSAR models can be treated as an optimization problem. For continuous responses, the metrics typically optimized in this process relate to model performance on the training set, for example, R2 and Q2. Similar metrics, calculated on an external set of data (e.g., Q2F1/F2/F3), are used to evaluate the performance of the final models. A common theme of these metrics is that they are context-"ignorant". In this work we propose that QSAR models should be evaluated based on their intended usage. More specifically, we argue that QSAR models developed for Virtual Screening (VS) should be derived and evaluated using a virtual screening-aware metric, e.g., an enrichment-based metric. To demonstrate this point, we developed 21 Multiple Linear Regression (MLR) models for seven targets (three models per target), evaluated them first on validation sets and subsequently tested their performance on two additional test sets constructed to mimic small-scale virtual screening campaigns. As expected, we found no correlation between model performance evaluated by "classical" metrics, e.g., R2 and Q2, and the number of active compounds picked by the models from within a pool of random compounds. In particular, in some cases models with favorable R2 and/or Q2 values were unable to pick a single active compound from within the pool, whereas in other cases, models with poor R2 and/or Q2 values performed well in the context of virtual screening.
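To illustrate what a virtual screening-aware metric looks like, the sketch below computes a standard enrichment factor for a score-ranked compound library: the hit rate among the top-ranked fraction, divided by the hit rate of the whole library. The function name and the default cutoff are our own choices for illustration; the paper's exact enrichment-based metric is not reproduced here.

```python
import numpy as np

def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a score-ranked library.

    EF = (hit rate among the top fraction) / (hit rate of the whole library).
    EF = 1 corresponds to random picking; higher is better.
    """
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    n_top = max(1, int(round(fraction * len(scores))))
    top_idx = np.argsort(scores)[::-1][:n_top]  # highest predicted activity first
    hit_rate_top = is_active[top_idx].mean()
    hit_rate_all = is_active.mean()
    return hit_rate_top / hit_rate_all
```

For example, a model that places both actives of a 10-compound pool in its top 20% achieves the maximum EF of 5 at that cutoff, whereas a random ranking yields an EF of about 1.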
We also found no significant correlation between the number of active compounds correctly identified by the models in the training, validation and test sets. Next, we developed a new algorithm for the derivation of MLR models that optimizes an enrichment-based metric and tested its performance on the same datasets. We found that the best models derived in this manner showed, in most cases, much more consistent results across the training, validation and test sets and outperformed the corresponding MLR models in most virtual screening tests. Finally, we demonstrated that when tested as binary classifiers, models derived for the same targets by the new algorithm outperformed Random Forest (RF) and Support Vector Machine (SVM)-based models across the training/validation/test sets in most cases. We attribute the better performance of the Enrichment Optimizer Algorithm (EOA) models in VS to better handling of inactive random compounds. Optimizing an enrichment-based metric is therefore a promising strategy for the derivation of QSAR models for classification and virtual screening.
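The EOA itself is not specified in this summary; as a hypothetical illustration of the general idea, the sketch below forward-selects descriptors for an MLR model by scoring each candidate subset on the enrichment of its predictions rather than on R2. All function names, parameters, and the greedy selection scheme are our own assumptions, not the published algorithm.

```python
import numpy as np

def ef(scores, active, frac=0.1):
    """Enrichment factor at the top `frac` of the score-ranked list."""
    n = max(1, int(round(frac * len(scores))))
    top = np.argsort(scores)[::-1][:n]
    return active[top].mean() / active.mean()

def greedy_enrichment_mlr(X, y, active, n_desc=2, frac=0.1):
    """Forward-select descriptors for an MLR model, scoring each candidate
    descriptor subset by the enrichment of its predictions instead of by R^2.

    X: (n_samples, n_descriptors) descriptor matrix
    y: continuous activities; active: 0/1 labels used for enrichment scoring
    """
    chosen, coef = [], None
    for _ in range(n_desc):
        best = None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            A = np.column_stack([X[:, cols], np.ones(len(y))])  # intercept column
            c, *_ = np.linalg.lstsq(A, y, rcond=None)           # ordinary least squares
            score = ef(A @ c, active, frac)                      # enrichment, not R^2
            if best is None or score > best[0]:
                best = (score, j, c)
        chosen.append(best[1])
        coef = best[2]
    return chosen, coef
```

The design choice this sketch highlights is that the fitting step stays ordinary least squares, while the *selection* criterion is the VS-aware metric, so descriptors are kept for their ability to rank actives highly rather than to minimize residuals.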
Highlights
Quantitative Structure Activity Relationship (QSAR) analysis could be broadly defined as the application of mathematical/statistical methods to find an empirical relationship between dependent variables obtained for a set of objects and independent variables that describe these objects in some way.
The two main conclusions emerging from this work are the following: (1) QSAR models derived to be used for virtual screening (VS) are best evaluated using a metric that reflects their success in a VS campaign.
(2) Deriving QSAR models by directly optimizing an enrichment-based metric is a promising strategy for the development of QSAR models that can favorably be used as classifiers and for VS.
Summary
Quantitative Structure Activity Relationship (QSAR) analysis could be broadly defined as the application of mathematical/statistical methods to find an empirical relationship between dependent variables obtained for a set of objects and independent variables that describe these objects in some way. In the most common QSAR applications, the dependent variables are activities (defined in the broadest possible way), the objects are molecules/materials and the independent variables are structure-based molecular/materials descriptors. When the number of samples in the input dataset is too small to allow for a reasonably large test set, models are usually evaluated using cross validation. The most common metrics for evaluating QSAR equations developed for continuous responses are R2 (for the training set) and Q2F1/F2/F3 (for the external set) [12,13,14]. Classification-based models, i.e., models derived for categorized responses, are evaluated by metrics derived from the confusion matrix (e.g., the Matthews Correlation Coefficient, MCC).
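For reference, the MCC mentioned above can be computed directly from the four cells of the binary confusion matrix. This is a minimal NumPy sketch of the standard definition (the helper name is ours, not from the paper):

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient from the binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    ranging from -1 (total disagreement) to +1 (perfect prediction).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # true positives
    tn = np.sum(~y_true & ~y_pred)  # true negatives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike plain accuracy, the MCC stays informative on the strongly imbalanced active/inactive ratios typical of virtual screening pools, which is why it is a common confusion-matrix metric in this setting.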