Classification and Clustering Methods

T. Agami Reddy

doi:10.1007/978-1-4419-9613-8_8

Abstract

This chapter covers two widely used classes of multivariate data analysis methods: classification and clustering. Classification methods are meant (i) to statistically distinguish, or “discriminate”, between two or more groups when one knows beforehand that such groupings exist in the data set of measurements provided, and (ii) to subsequently assign or allocate a future unclassified observation to a specific group with the smallest misclassification error. Numerous classification techniques, divided into three groups (parametric, heuristic, and regression trees), are described and illustrated by way of examples. Clustering involves situations in which the number of clusters or groups is not known beforehand, and the intent is to allocate a set of observations into groups whose members are similar or “close” to one another with respect to certain attribute(s) or characteristic(s). In general, the number of clusters is not predefined and has to be gleaned from the data set. This and the fact that one does not have a training data set with which to build a model make clustering a much more difficult problem than classification. Two types of clustering techniques, namely partitional and hierarchical, are described. This chapter provides a non-mathematical overview of these techniques, with conceptual understanding enhanced by simple examples as well as actual case studies.
