Abstract

Retention time is a key parameter for identifying small molecules when screening for illegal additives. To provide a basis for identification in small-scale databases that lack corresponding chemical standards, a quantitative structure-retention relationship (QSRR) model was developed using two liquid chromatography databases acquired on the same column but with different mobile phases. The QSRR workflow comprises feature extraction, feature selection, model construction, and evaluation. In the feature extraction stage, molecular descriptors were calculated from molecules optimized with the Consistent Force Field (CFF) and the Chemistry at HARvard Macromolecular Mechanics (CHARMm) force field. In the feature selection stage, two selection algorithms, minimum redundancy-maximum relevance (MRMR) and the F-test, were compared. In the model construction stage, five categories of machine learning algorithms were compared: Regression Trees (Reg-T), Support Vector Machines (SVM), Gaussian Process Regression (GPR) models, Ensembles of Trees, and Kernel approximation models, yielding 14 models in total. The best-performing model was the Exponential Gaussian Process Regression model (E-GPR), which achieved a coefficient of determination (R²) of 0.84 ± 0.02 on Database 1 and 0.83 ± 0.03 on Database 2. These results indicate that the QSRR model can accurately predict retention times for small molecules in small-scale databases.
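The selection and modeling stages of the workflow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and may not match their toolchain: it assumes scikit-learn, uses a synthetic descriptor matrix in place of the CFF/CHARMm-derived descriptors, and approximates the exponential GPR kernel with scikit-learn's Matern kernel at nu=0.5, which is mathematically the exponential kernel. The function names, the number of selected features, and the data are hypothetical, and MRMR is omitted because scikit-learn does not provide it.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_qsrr_pipeline(n_features=10):
    """Hypothetical QSRR pipeline: scaling, F-test selection, exponential-kernel GPR."""
    # Matern with nu=0.5 is the exponential (absolute exponential) kernel.
    # The WhiteKernel term models measurement noise in the retention times.
    kernel = (
        ConstantKernel(1.0) * Matern(length_scale=1.0, nu=0.5)
        + WhiteKernel(noise_level=0.1)
    )
    return make_pipeline(
        StandardScaler(),
        # Univariate F-test ranking of descriptors against retention time.
        SelectKBest(score_func=f_regression, k=n_features),
        GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0),
    )


def make_synthetic_data(n_molecules=120, n_descriptors=50, seed=0):
    """Synthetic stand-in for a descriptor matrix and retention times (assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_molecules, n_descriptors))
    # Only the first three descriptors carry signal; the rest are noise.
    y = (
        5.0
        + 2.0 * X[:, 0]
        - 1.5 * X[:, 1]
        + 0.5 * X[:, 2] ** 2
        + rng.normal(scale=0.3, size=n_molecules)
    )
    return X, y


X, y = make_synthetic_data()
model = build_qsrr_pipeline(n_features=10)

# 5-fold cross-validated R2, reported as mean and standard deviation across folds.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"R2 = {scores.mean():.2f} +/- {scores.std():.2f}")
```

Placing the selector inside the pipeline means the F-test is refit on each training fold, so the cross-validated R² is not inflated by descriptors chosen with knowledge of the held-out molecules. This matters most for small databases like those the abstract targets.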
