The CLAS program for classification and evaluation

Jan B Hemel

doi:10.1016/s0003-2670(00)86295-9

Abstract

Multivariate classification methods help to extract information from analytical data, but the most appropriate method must be chosen for each problem. The applicability of a method depends mainly on the distributional characteristics of the data population (normality, correlations between variables, separation of classes, nature of variables) and on the characteristics of the available data sample (numbers of objects, variables and classes, missing values, measurement errors). The CLAS program is designed to combine classification methods with an evaluation of their performance, for batch data processing. It incorporates two-group linear discriminant analysis (SLDA), independent class modelling with principal components (SIMCA), kernel density estimation (ALLOC), and principal component class modelling with kernel density estimation (CLASSY). Most of these methods are implemented so as to give probabilistic classifications. Multiple linear regression is also provided, and further methods are planned. CLAS evaluates each classification method using the training set data (resubstitution), independent test data, and pseudo test data (leave-one-out method). The leave-one-out evaluation is optimized for faster computation. Criteria are computed for classification performance and for the reliability of the reported probabilities, among others. The package offers flexible facilities for data manipulation, variable transformation and missing data handling.
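The difference between resubstitution and leave-one-out evaluation can be illustrated with a small sketch. This is not the CLAS code: it assumes a plain Gaussian kernel density classifier with a fixed, hand-picked bandwidth and priors proportional to class size, and it uses the brute-force leave-one-out loop rather than the optimized computation mentioned in the abstract. All function names and the toy data are hypothetical.

```python
import math


def kernel_sum(x, points, bandwidth):
    """Sum of isotropic Gaussian kernels centred on `points`, evaluated at x."""
    two_h2 = 2.0 * bandwidth ** 2
    return sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / two_h2)
        for p in points
    )


def posterior(x, train, bandwidth):
    """Class probabilities for x, with priors proportional to class size.

    With such priors, prior * mean kernel density reduces to the plain kernel
    sum per class, because the shared normalising constants cancel.
    """
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    scores = {c: kernel_sum(x, pts, bandwidth) for c, pts in by_class.items()}
    total = sum(scores.values())
    if total == 0.0:  # x is far from every training object
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def classify(x, train, bandwidth):
    """Assign x to the class with the highest posterior probability."""
    post = posterior(x, train, bandwidth)
    return max(post, key=post.get)


def resubstitution_error(data, bandwidth):
    """Error rate when every object is classified by a model that contains it."""
    wrong = sum(classify(x, data, bandwidth) != y for x, y in data)
    return wrong / len(data)


def leave_one_out_error(data, bandwidth):
    """Error rate when each object is classified by a model built without it."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # pseudo test set of size one
        wrong += classify(x, rest, bandwidth) != y
    return wrong / len(data)


# Hypothetical toy data: two compact classes plus one isolated object of each
# class lying near the other class.
DATA = [
    ((0.0, 0.0), "A"), ((0.4, 0.0), "A"), ((0.0, 0.4), "A"), ((0.4, 0.4), "A"),
    ((3.2, 3.8), "A"),
    ((3.0, 3.0), "B"), ((3.4, 3.0), "B"), ((3.0, 3.4), "B"), ((3.4, 3.4), "B"),
    ((0.2, 0.8), "B"),
]

print("resubstitution error:", resubstitution_error(DATA, bandwidth=0.3))
print("leave-one-out error: ", leave_one_out_error(DATA, bandwidth=0.3))
```

On this toy data resubstitution reports no errors, because each isolated object is supported by its own kernel, whereas leave-one-out misclassifies both isolated objects (error rate 0.2). The gap shows why an evaluation based on the training set alone tends to be optimistic.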

Full Text