Comparative study of class data analysis with PCA-LDA, SIMCA, PLS, ANNs, and k-NN

Yukio Tominaga

doi:10.1016/s0169-7439(99)00034-9

Abstract

Three types of chemotherapeutic agents (antibacterials, antineoplastics, and antifungals) registered in the MDL Drug Data Report (MDDR) database were used as the training data set, and a classification study was performed using the following seven methods: principal component analysis–linear discriminant analysis (PCA-LDA), soft independent modeling by class analogy (SIMCA), partial least squares 2 (PLS2), artificial neural networks (ANNs), the nearest neighbor method (NN), a combined method of Ward clustering and NN (W-NN), and a combined method of genetic algorithms (GAs) and NN (GA-NN). The number of correctly classified samples decreased in the following order: NN, ANNs, GA-NN, SIMCA, PLS2, W-NN, and PCA-LDA. Using these models, a prediction study was then performed on a test set consisting of drugs registered in the Comprehensive Medicinal Chemistry (CMC) database. The number of correctly predicted samples decreased in the following order: NN, GA-NN, W-NN, SIMCA, PCA-LDA, ANNs, and PLS2. NN gave the best model in terms of both classification and prediction, whereas overfitting was observed for ANNs and PLS2: ANNs fell from second place in classification to sixth in prediction, and PLS2 from fifth to last. Although the fit and predictive ability of GA-NN and W-NN were inferior to those of NN, the predictive ability of both methods was superior to that of PCA-LDA, SIMCA, ANNs, and PLS2.
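
The two counts compared above (samples correctly classified in the training set, and samples correctly predicted in the test set) can be illustrated for the NN method with a minimal sketch. The abstract does not state the descriptors, the distance measure, or how training-set classification was scored, so everything here is an assumption made for illustration: the two-dimensional descriptor values are invented toy numbers, the distance is Euclidean, and training-set classification is scored by leave-one-out. The function names are hypothetical.

```python
import math


def nearest_label(query, points, labels, skip=None):
    """Return the label of the point closest to `query` (Euclidean distance).

    `skip` is an optional index to ignore, used for leave-one-out scoring.
    """
    best_dist, best_label = math.inf, None
    for i, (point, label) in enumerate(zip(points, labels)):
        if i == skip:
            continue
        dist = math.dist(query, point)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def count_correct_leave_one_out(points, labels):
    """Count training samples whose nearest *other* training sample shares their class."""
    return sum(
        nearest_label(p, points, labels, skip=i) == labels[i]
        for i, p in enumerate(points)
    )


def count_correct_test(train_x, train_y, test_x, test_y):
    """Count test samples whose nearest training sample has the true class."""
    return sum(
        nearest_label(q, train_x, train_y) == y for q, y in zip(test_x, test_y)
    )


# Toy data only: invented 2-D descriptors, NOT values from MDDR or CMC.
train_x = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.0, 5.0), (0.3, 5.2)]
train_y = [
    "antibacterial", "antibacterial",
    "antineoplastic", "antineoplastic",
    "antifungal", "antifungal",
]
test_x = [(0.1, 0.3), (4.7, 5.2), (0.2, 4.6)]
test_y = ["antibacterial", "antineoplastic", "antifungal"]

fit_correct = count_correct_leave_one_out(train_x, train_y)
pred_correct = count_correct_test(train_x, train_y, test_x, test_y)
print(f"training set: {fit_correct}/{len(train_y)} correctly classified")
print(f"test set: {pred_correct}/{len(test_y)} correctly predicted")
```

A method that ranks high on the first count but low on the second, as reported for ANNs and PLS2, is the pattern the abstract describes as overfitting.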
