Linear and quadratic discriminant analysis and K-nearest-neighbor analysis were investigated for classifying a set of 51 compounds divided into five therapeutic categories. Each compound was superimposed on a pattern structure, as first proposed by Cammarata, so that eight positions were assigned on the molecule. Each position was coded with the numerical value of a descriptor index. Relative molar refraction, the index used by Cammarata, was compared with a number of molecular connectivity indices. For each index studied, only four of the eight positions contributed significantly to between-class differences. First-order molecular connectivity, calculated as the sum of the contributions of each of the bonds joining a given position, gave consistently fewer misclassifications than the other indices. Using first-order molecular connectivity, validation procedures were performed on the original set of compounds, on random samples drawn from this set, and on a set of ten compounds not included in the analysis. The results were highly data dependent, but they nevertheless suggest that molecular connectivity indices should prove useful in structural classification procedures.
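The position-wise first-order connectivity value described above can be illustrated with a minimal sketch. It assumes the usual Kier-Hall bond term, 1/sqrt(δi·δj), where δ is the number of non-hydrogen neighbors of an atom; the abstract does not state the formula, so this choice is an assumption. The function name `position_chi1` and the bond-list representation are hypothetical and used only for illustration.

```python
from math import sqrt

def position_chi1(bonds, position):
    """Sum the first-order connectivity terms of all bonds joining `position`.

    `bonds` is a list of (atom, atom) pairs for a hydrogen-suppressed graph.
    Assumption: each bond contributes 1/sqrt(delta_i * delta_j), where delta
    is the atom's degree in that graph (Kier-Hall convention).
    """
    # Count the heavy-atom neighbors (degree) of every atom.
    degree = {}
    for a, b in bonds:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Add up the contributions of the bonds incident to the chosen position.
    return sum(
        1.0 / sqrt(degree[a] * degree[b])
        for a, b in bonds
        if position in (a, b)
    )

# Hydrogen-suppressed n-butane: C0-C1-C2-C3 (illustrative input only).
butane = [(0, 1), (1, 2), (2, 3)]
terminal_value = position_chi1(butane, 0)  # one bond: 1/sqrt(1*2)
interior_value = position_chi1(butane, 1)  # two bonds: 1/sqrt(1*2) + 1/sqrt(2*2)
```

In a classification setting of the kind described, each compound would yield one such value per pattern position, and the resulting vector would be the input to the discriminant or nearest-neighbor procedure.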