Abstract

A back-propagation artificial neural network (ANN) was used to create a 10-fold leave-10%-out cross-validated ensemble model of high performance liquid chromatography retention index (HPLC-RI) for a data set of 498 diverse druglike compounds. A 10-fold multiple linear regression (MLR) ensemble model of the same data was developed for comparison. Molecular structure was described using IGroup E-state indices, a novel set of structure-information representation (SIR) descriptors, along with molecular connectivity chi and kappa indices and other SIR descriptors previously reported. The same input descriptors were used to develop models by both learning algorithms. The MLR model yielded marginally acceptable statistics with training correlation r(2) = 0.65, mean absolute error (MAE) = 83 RI units. External validation of 104 compounds not used for model development yielded validation v(2) = 0.49 and MAE = 73 RI units. The distribution of residuals for the fit and validate data sets suggest a nonlinear relationship between retention index and molecular structure as described by the SIR indices. Not surprisingly, the ANN model was significantly more accurate for both training and validation with training set r(2) = 0.93, MAE = 30 RI units and validation v(2) = 0.84, MAE = 41 RI units. For the ANN model, a total of 91% of validation predictions were within 100 RI units of the experimental value.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.