In the present study, computational molecular descriptors of 90 saturated esters and seven poly(siloxane) stationary phases of different polarity (SE-30, OV-7, DC-710, OV-25, XE-60, OV-225 and Silar-5CP) were combined into quantitative structure-retention relationship (QSRR) models aimed at predicting the Kováts retention indices (RIs) of the solutes. The 174 molecular descriptors of the stationary phases included in the models were computed using Dragon software from poly(siloxane) oligomers made of 20 siloxane units reflecting the nominal composition of each stationary phase, whereas 439 molecular descriptors were adopted to represent the esters. Different QSRR models were generated by means of Partial Least Squares (PLS) regression to assess the accuracy of this approach in predicting the RIs of unexplored solutes on both known and external stationary phases. After calibration of each PLS model, the descriptors were retained or discarded according to their relevance, evaluated by Covariance Selection (CovSel), and the PLS models were rebuilt, which resulted in a noticeable improvement of their predictive ability. First, all the available data were equally divided into a training and a test set; the model built on the training set was used to predict the RIs of the test observations. Subsequently, seven distinct PLS models were built following a “leave-one-column-out” procedure, each aimed at estimating the RIs of the 90 esters on a single stationary phase, with the calibration model computed on the remaining data. All the estimated models provided successful results on the external stationary phase, and predictive performance further increased after variable selection based on CovSel analysis.
The final models provided a Root Mean Square Error in Cross Validation (RMSECV) in the range 12–20, a Root Mean Square Error in Prediction (RMSEP) in the range 11–26, and Mean Absolute Percentage Errors in Prediction (MAPEPs) in the range 0.7–1.5, revealing accurate cross-column prediction. Finally, to test the robustness of the proposed approach, the 90 solutes were equally partitioned into a calibration and a test set and two further QSRR strategies were applied. The first PLS model was calibrated on all seven stationary phases, and the RIs of the 45 external solutes on the same seven columns were simultaneously predicted. The last QSRR approach followed a “leave-one-column-out” scheme, in which the RIs of the 45 test solutes on an external stationary phase were predicted by a PLS model calibrated with the data of the 45 remaining solutes and the six remaining stationary phases. After selection of the significant molecular descriptors, PLS regression provided RMSECV values in the range 6–19, RMSEPs in the range 10–14, and MAPEPs in the range 0.9–2.4, revealing the suitability of the approach to estimate the RIs of unknown solutes on uncharted stationary phases.
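For readers unfamiliar with the figures of merit quoted above, the following is a minimal sketch of how RMSEP, MAPEP, and a "leave-one-column-out" partition could be computed. It assumes the standard textbook definitions of the two error metrics, which may differ in detail from those used in the study. The function names, the record layout, and the retention-index values are hypothetical and serve only as an illustration; the PLS and CovSel steps are not reproduced here.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mapep(observed, predicted):
    """Mean absolute percentage error, expressed in percent."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


def leave_one_column_out(records):
    """Yield (held_out_column, training_records, test_records) per column.

    `records` is a list of (column_name, solute_name, retention_index)
    tuples; this layout is an assumption made for the example.
    """
    columns = sorted({col for col, _, _ in records})
    for held_out in columns:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Made-up retention indices, for illustration only.
    toy = [
        ("SE-30", "ester A", 800.0),
        ("SE-30", "ester B", 900.0),
        ("OV-7", "ester A", 850.0),
        ("OV-7", "ester B", 950.0),
    ]
    for column, train, test in leave_one_column_out(toy):
        print(column, len(train), len(test))
```

In a full workflow, each training partition would be used to calibrate a regression model, and the two metrics would be evaluated on the predictions for the held-out column.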