Abstract

Background

Recently, novel 3D alignment-free molecular descriptors (known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, the present manuscript reports an assessment and statistical validation of the performance of these novel descriptors in QSAR studies.

Results

To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. QSAR models with three to nine variables, based on Multiple Linear Regression, are built for each chemical dataset according to the original division into training/test sets. Comparisons with respect to the leave-one-out cross-validation correlation coefficients ($Q_{loo}^{2}$) reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as Partial Least Squares, Neural Networks, Support Vector Machines and others. In addition, superior external correlation coefficients ($Q_{ext}^{2}$) are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models. Non-parametric statistical tests were performed on the $Q_{ext}^{2}$ values; they demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions than 11 of the 12 QSAR procedures used in the comparison. Lastly, a study of the performance of the indices according to several conformer generation methods was performed. It showed that the quality of the predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the differences among the results are not statistically significant.

Conclusions

In conclusion, the QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds.

Graphical abstract

Comparative graphical representation of the performance of the novel QuBiLS-MIDAS 3D-MDs with respect to other methodologies in QSAR modeling of eight chemical datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-016-0122-x) contains supplementary material, which is available to authorized users.
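The two statistics reported in the Results can be illustrated with a minimal sketch. It assumes the commonly used definitions (one minus the ratio of the prediction error sum of squares to the total sum of squares, with the training-set mean as reference for the external set); the abstract does not state the exact formulas used by the authors, and the function names `fit_mlr`, `predict_mlr`, `q2_loo` and `q2_ext` are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for descriptor matrix X with fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def q2_loo(X, y):
    """Leave-one-out Q^2 (assumed definition): 1 - PRESS / SS_total."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i from the fit
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_ext(X_train, y_train, X_test, y_test):
    """External Q^2 (assumed definition), referenced to the training-set mean."""
    X_train, y_train = np.asarray(X_train, float), np.asarray(y_train, float)
    X_test, y_test = np.asarray(X_test, float), np.asarray(y_test, float)
    coef = fit_mlr(X_train, y_train)
    resid = y_test - predict_mlr(coef, X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)

if __name__ == "__main__":
    # Synthetic descriptors and activities, for illustration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.2, size=30)
    print("Q2_loo:", q2_loo(X[:22], y[:22]))
    print("Q2_ext:", q2_ext(X[:22], y[:22], X[22:], y[22:]))
```

Several variants of the external statistic exist in the QSAR literature, so values from this sketch are not necessarily comparable with those reported in the article.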

Highlights

  • Novel 3D alignment-free molecular descriptors based on two-linear, three-linear and four-linear algebraic forms have been introduced

  • (1) Computation of the molecular vectors according to the selected atomic properties; (2) computation, from the 3D Cartesian coordinates of each atom of a molecule, of the non-stochastic two-tuple, three-tuple or four-tuple total spatial-(dis)similarity matrices for k = 1; (3) consideration of atom-types or local-fragments; (4) computation of the simple-stochastic, double-stochastic and mutual probability matrices, and determination of the kth matrices through the Hadamard product up to the selected k value; (5) splitting of the calculated matrices into atom-level matrices; (6) computation of the atom-level indices using the molecular vectors calculated in step (1); and (7) application of the selected aggregation operators over the vector of atom-level descriptors

  • The a(Q2) values computed from scrambling tests are in all cases below 0.4, indicating a reduced propensity to chance correlation

  • Only in the dihydrofolate reductase inhibitors (DHFR) and glycogen phosphorylase b (GPB) datasets does the utilization of the local-fragment QuBiLS-MIDAS molecular descriptors (MDs) not influence the performance of the developed Quantitative Structure–Activity Relationship (QSAR) models
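Parts of the calculation steps listed in the highlights can be sketched for the two-tuple (bilinear) case. This is an illustrative sketch under stated assumptions, not the QuBiLS-MIDAS implementation: the Euclidean distance stands in for one of the several (dis)similarity metrics, row normalization is assumed for the simple-stochastic matrix, division by the grand total is assumed for the mutual probability matrix, and all function names are hypothetical. The double-stochastic matrix, the atom-level split and the aggregation operators are omitted.

```python
import numpy as np

def distance_matrix(coords):
    """Step (2), assumed metric: pairwise Euclidean distances between atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def hadamard_power(m, k):
    """Step (4): k-th matrix as the element-wise (Hadamard) product of k copies of m."""
    out = m.copy()
    for _ in range(k - 1):
        out = out * m
    return out

def simple_stochastic(m):
    """Assumed normalization: each row divided by its sum (all-zero rows stay zero)."""
    row_sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, row_sums, out=np.zeros_like(m), where=row_sums != 0)

def mutual_probability(m):
    """Assumed normalization: every entry divided by the sum of all entries."""
    total = m.sum()
    return m / total if total != 0 else np.zeros_like(m)

def bilinear_index(x, m, y):
    """Total two-linear form x^T M y for property vectors x and y (step (1))."""
    return float(x @ m @ y)

if __name__ == "__main__":
    # Hypothetical three-atom geometry and property vector, for illustration only.
    coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    props = np.array([1.0, 2.0, 3.0])
    m2 = hadamard_power(distance_matrix(coords), 2)
    print("non-stochastic index:", bilinear_index(props, m2, props))
    print("simple-stochastic index:", bilinear_index(props, simple_stochastic(m2), props))
    print("mutual-probability index:", bilinear_index(props, mutual_probability(m2), props))
```

Using the same property vector on both sides of the form is only one choice; two different atomic properties can be combined in the same way.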


Summary

Introduction

Novel 3D alignment-free molecular descriptors (known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. Computational methods that employ statistical and/or artificial intelligence procedures are widely used in the drug discovery process, where Quantitative Structure–Activity Relationship (QSAR) studies play an important role [1–4]. These studies are based on the principle that the biological activity (or property) of compounds depends on their structural and physicochemical features, and they are primarily aimed at finding good correlations between molecular features and specific biological activities [5]. 3D molecular descriptors take into account the geometric features of molecules, which can be computed from the information represented in a grid through an alignment process with respect to a reference compound or a pharmacophore [2, 10, 11], by procedures based on Cartesian coordinates [8, 12, 13], molecular spectra [14, 15] and molecular transforms [16], or by adapting 2D methods to take three-dimensional (3D) aspects into account [17–21].

Methods
Results
Conclusion
