Studying the chromatographic retention of natural products can speed up their identification in complex biological matrices. In this work, six variables were used to study, with chemoinformatics tools, the reversed-phase liquid chromatographic retention behavior of 39 sesquiterpene lactones (SLs) from an in-house database. To evaluate the retention of the SLs, retention parameters were obtained experimentally on an ODS C-18 column in two solvent systems, namely MeOH-H₂O 55:45 and MeCN-H₂O 35:75. The chemoinformatics approach involved three types of descriptor sets (one 2D and two 3D), each comprising three groups (four, five, and six descriptors); two different training and test sets; four algorithms for variable selection (best first, linear forward, greedy stepwise, and genetic algorithm); and two modeling methods (partial least-squares regression and back-propagation artificial neural network). The influence of the six variables was assessed in a holistic context, and their effects on the best model for each solvent system were analyzed. The best model for MeOH-H₂O showed acceptable correlation statistics, with training R² = 0.91, cross-validation Q² = 0.88, and external validation P² = 0.80; the best MeCN-H₂O model showed higher correlation statistics, with training R² = 0.96, cross-validation Q² = 0.92, and external validation P² = 0.91. Consensus models were built for each chromatographic system. Although all of them showed improved statistical performance, only one, for the MeCN-H₂O system, was able to separate isomers in addition to improving performance. The approach described herein can therefore be used to generate reproducible and robust models for QSRR studies of natural products, and as an aid for dereplication of complex biological matrices using plant metabolomics-based techniques.
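To make the three reported statistics concrete, the following is a minimal sketch of how a training R², a cross-validated Q², and an external-validation P² might be computed for a retention model. It is not the authors' implementation. Several things here are assumptions made for illustration: a one-descriptor ordinary least-squares line stands in for the PLS and neural-network models, leave-one-out is assumed as the cross-validation scheme, all three statistics use the common 1 - SS_res/SS_tot form referenced to the training-set mean, and the descriptor and retention values are synthetic, not data from the study. The function names (`fit_line`, `qsrr_statistics`, and so on) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (assumed stand-in for PLS/ANN)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def explained(y_true, y_pred, reference_mean):
    """1 - SS_res / SS_tot, with SS_tot taken about reference_mean."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - reference_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def qsrr_statistics(x_train, y_train, x_test, y_test):
    """Return (R2, Q2, P2) under the assumptions stated in the lead-in."""
    mean_train = sum(y_train) / len(y_train)
    model = fit_line(x_train, y_train)

    # R2: fit quality on the compounds used to build the model.
    r2 = explained(y_train, predict(model, x_train), mean_train)

    # Q2: leave-one-out cross-validation (assumed scheme); each training
    # compound is predicted by a model refitted without it.
    loo_predictions = []
    for i in range(len(x_train)):
        xs = x_train[:i] + x_train[i + 1:]
        ys = y_train[:i] + y_train[i + 1:]
        loo_predictions.append(predict(fit_line(xs, ys), [x_train[i]])[0])
    q2 = explained(y_train, loo_predictions, mean_train)

    # P2: predictions for external compounds never seen during fitting.
    p2 = explained(y_test, predict(model, x_test), mean_train)
    return r2, q2, p2


# Synthetic illustration only: 39 hypothetical compounds, one descriptor each.
random.seed(0)
descriptor = [random.uniform(0.0, 5.0) for _ in range(39)]
retention = [0.5 + 0.8 * x + random.gauss(0.0, 0.2) for x in descriptor]

r2, q2, p2 = qsrr_statistics(descriptor[:30], retention[:30],
                             descriptor[30:], retention[30:])
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}, P2 = {p2:.2f}")
```

In this sketch Q² can never exceed R², because each leave-one-out prediction comes from a model that did not see the compound being predicted. P² is the most demanding of the three, since the external compounds play no part in fitting or in model selection.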