Advantages of Relative versus Absolute Data for the Development of Quantitative Structure-Activity Relationship Classification Models.

Irene Luque Ruiz, Miguel Ángel Gómez-Nieto

doi:10.1021/acs.jcim.7b00492

Abstract

The usability and reproducibility of a quantitative structure-activity relationship (QSAR) classification model are determined by four factors: the appropriate selection of the chemical space represented by the data set, the choice of its chemical data representation, a correct modeling process based on a robust and reproducible algorithm, and exhaustive training and external validation. In this paper, we show that representing the data sets with relative rather than absolute data produces better classification models when the other steps are left unchanged. Relative data measure the chemical characteristics involved in the classification model against a reference frame, which refines the data set representation and compensates for missing chemical information. Three data sets with different characteristics were used in this study, and classification models were built by applying the support vector machine algorithm. For randomly selected training and test sets, values of accuracy and area under the receiver operating characteristic curve close to 100% were obtained in all cases, both for model generation and for external validation.
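
The two figures of merit named in the abstract, accuracy and area under the receiver operating characteristic curve (AUC), can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the AUC is computed through its standard pairwise-ranking interpretation rather than through any procedure the authors describe.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active, 0 = inactive; scores stand in for
# the decision values of any binary classifier (illustrative only).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen decision threshold, whereas AUC depends only on how the classifier ranks the samples, which is why the two values are usually reported together.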

Full Text