Abstract

In the present work, support vector machine (SVM) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of the retention time (tR) of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) in standardized liquid chromatography–UV–mass spectrometry, based on molecular descriptors calculated from the optimized 3D structures. The most relevant descriptors were selected by applying missing-value, zero and multicollinearity tests with a cutoff value of 0.95, followed by genetic algorithm variable selection, and were then used to build MLR and SVM models. The robustness of the QSPR models was characterized by statistical validation and by the applicability domain (AD). The predictions from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability, measured by r2 and q2, are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of the different descriptors on the retention times are described.
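As an illustration of the applicability-domain check mentioned above, the quantities shown in a Williams plot (leverage against standardized residual) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name `williams_plot_data`, the inclusion of an intercept column, the warning leverage h* = 3(p + 1)/n, the ±3 residual limit, and the simple residual standardization are conventional choices assumed here.

```python
import numpy as np

def williams_plot_data(X, y_obs, y_pred, resid_limit=3.0):
    """Return leverages, warning leverage, standardized residuals and outlier flags.

    X      : (n, p) descriptor matrix of the compounds (hypothetical input)
    y_obs  : observed retention times
    y_pred : retention times predicted by the model
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Assumption: the model has an intercept, so a column of ones is added.
    Xc = np.column_stack([np.ones(n), X])
    # Hat matrix H = Xc (Xc^T Xc)^-1 Xc^T; its diagonal holds the leverages.
    hat = Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T
    leverage = np.diag(hat)
    # Assumed warning leverage h* = 3(p + 1)/n.
    h_star = 3.0 * (p + 1) / n
    resid = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    # Simplified standardization: divide by the sample standard deviation.
    std_resid = resid / resid.std(ddof=1)
    # A compound is flagged if it has high leverage or a large residual.
    outside = (leverage > h_star) | (np.abs(std_resid) > resid_limit)
    return leverage, h_star, std_resid, outside
```

Compounds with leverage above h* are structurally far from the bulk of the data, and compounds with large standardized residuals are poorly predicted; either case suggests a prediction should be treated with caution.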

Highlights

  • Fungi are major plant and insect pathogens, but they are not nearly as important as agents of disease in vertebrates, i.e., the number of medically important fungi is relatively low

  • We apply support vector regression (SVR) to correlation problems in QSAR and compare its performance with the multiple linear regression (MLR) method

  • Increasing the electronic energy (ElcE), dipole length (DPLL) and lowest unoccupied molecular orbital energy (LUMO) decreases tR, whereas increasing C logP increases the tR of the compounds


Summary

Introduction

Fungi are major plant and insect pathogens, but they are not nearly as important as agents of disease in vertebrates, i.e., the number of medically important fungi is relatively low. The other methods are more empirical, based on QSPR approaches using multiple linear regression (MLR) and support vector machine (SVM) techniques. Among previous studies that aimed to predict retention time, the most promising method has been the QSPR approach, which has been used successfully to predict many physicochemical properties. After the calculation of molecular descriptors, many different chemometric methods, such as multiple linear regression (MLR), partial least squares regression (PLS), different types of artificial neural networks (ANN), genetic algorithms (GAs), and support vector machines (SVM), can be employed to derive correlation models between molecular structures and properties. After zero and multicollinearity tests with a cutoff value of 0.95 and variable selection by genetic algorithm, the number of descriptors was reduced to 22. The stepwise regression routine was used to develop the linear model for the prediction of the retention time of mycotoxins using calculated structural descriptors.
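The descriptor pre-filtering described above (missing-value, zero and multicollinearity tests with a 0.95 cutoff) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `filter_descriptors` and the greedy rule that keeps the first descriptor of each highly correlated pair are assumptions, and the subsequent genetic algorithm selection is not shown.

```python
import numpy as np

def filter_descriptors(X, names, corr_cutoff=0.95):
    """Drop descriptors with missing values, constant columns,
    and one member of every pair with |r| >= corr_cutoff."""
    X = np.asarray(X, dtype=float)
    # Missing-value and zero (constant-column) tests.
    candidates = [
        j for j in range(X.shape[1])
        if not np.isnan(X[:, j]).any() and X[:, j].max() > X[:, j].min()
    ]
    selected = []
    for j in candidates:
        # Multicollinearity test: keep column j only if it is not highly
        # correlated with any descriptor that has already been kept.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_cutoff
               for k in selected):
            selected.append(j)
    return X[:, selected], [names[j] for j in selected]
```

Which member of a correlated pair is retained depends on column order in this sketch; other rules, such as keeping the descriptor better correlated with tR, are equally plausible.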

Definition of the Applicability Domain of the Model
Interpretation of Descriptors
Data Set
Descriptor Generation and Reduction
Descriptor Selection and Model Building
Theory of SVM
Validation Test
Conclusions
