2 - Theoretical Aspects of Pattern Analysis

Arjen Van Ooyen

doi:10.1016/b978-044450740-2/50003-4

Abstract

In this chapter, simple examples of both principal component analysis and cluster analysis are given to explain the ideas behind the methods. Principal component analysis studies large data sets by reducing the number of characters. This is achieved by forming new characters that are combinations of the old ones. Cluster analysis is a procedure that starts with a data set containing information about a set of objects and then attempts to organize these objects into groups that are in some sense optimal for the data set under consideration. Cluster analysis can be used for a variety of goals, including developing typologies or classifications, generating concepts or hypotheses through data exploration, and testing whether typologies or classifications generated by other procedures, or by using other data, are present in the data set under consideration. Cluster analysis can best be seen as a heuristic, rather than a statistical, method for exploring the diversity in a data set by means of pattern generation. The result of a cluster analysis study can, and usually does, depend on the similarity measure used, the clustering method used, the set of objects in the study, the characters used to describe the objects, and the relative weight different characters are given in calculating the similarity between objects.
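The dependence of the result on the clustering method can be illustrated with a minimal sketch. It is not taken from the chapter: the seven objects, their single character values, and the helper name `agglomerate` are hypothetical, chosen only for illustration. The sketch applies a basic agglomerative procedure to the same objects with the same distance measure (absolute difference) and varies only the linkage rule, single or complete.

```python
from itertools import combinations


def agglomerate(points, k, linkage):
    """Merge the two closest clusters repeatedly until k clusters remain.

    `linkage` is `min` (single linkage) or `max` (complete linkage); it is
    applied to all pairwise distances between the members of two clusters.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        # i < j always holds, so deleting j does not shift index i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical objects, each described by one character (illustrative values).
objects = [0, 10, 21, 33, 46, 65, 66]

single = agglomerate(objects, 2, min)    # single linkage
complete = agglomerate(objects, 2, max)  # complete linkage

print(single)    # [[0, 10, 21, 33, 46], [65, 66]]
print(complete)  # [[0, 10, 21, 33], [46, 65, 66]]
```

With these values the object at 46 changes groups depending on the linkage rule alone, even though the objects, the character, and the distance measure are identical in both runs.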
