Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering

Hans-Hermann Bock

doi:10.1007/3-540-28397-8_1

Abstract

‘Symbolic Data Analysis’ (SDA) provides tools for analyzing ‘symbolic’ data, i.e., data matrices X = (x_kj) whose entries x_kj are intervals, sets of categories, or frequency distributions instead of ‘single values’ (a real number, a category) as in the classical case. A large number of empirical algorithms generalize classical data analysis methods (PCA, clustering, factor analysis, etc.) to the ‘symbolic’ case. In this context, various optimization problems are formulated (optimum class centers, optimum clustering, optimum scaling, ...). This paper presents some cases related to dissimilarities and class centers where explicit solutions are possible. These results can be integrated into an appropriate k-means clustering algorithm. Moreover, as a first step towards probabilistically based results in SDA, we consider the definition and determination of set-valued class ‘centers’ in SDA and relate them to theorems on the ‘approximation of distributions by sets’.
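To make the notion of an explicitly computable class center concrete, the following minimal sketch treats one column of interval-valued entries. It assumes one particular dissimilarity that the abstract does not specify: the squared Euclidean distance between the (lower, upper) bound pairs of two intervals. Under that assumption the optimum center is the interval of mean bounds. The names `interval_center`, `sq_dist`, and `criterion` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def sq_dist(u: Interval, v: Interval) -> float:
    """Assumed dissimilarity: squared Euclidean distance between bound pairs."""
    return (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2


def criterion(intervals: List[Interval], center: Interval) -> float:
    """Sum of dissimilarities between the class members and a candidate center."""
    return sum(sq_dist(x, center) for x in intervals)


def interval_center(intervals: List[Interval]) -> Interval:
    """Center minimizing `criterion` for the assumed dissimilarity.

    For squared Euclidean distance on the bounds, the minimizer is the
    interval whose bounds are the arithmetic means of the member bounds.
    """
    if not intervals:
        raise ValueError("a class must contain at least one interval")
    n = len(intervals)
    lower = sum(a for a, _ in intervals) / n
    upper = sum(b for _, b in intervals) / n
    return (lower, upper)


# A small class of three interval-valued observations (illustrative data).
cluster = [(1.0, 3.0), (2.0, 6.0), (3.0, 6.0)]
center = interval_center(cluster)
print(center, criterion(cluster, center))
```

In a k-means-type scheme, a center update of this kind would alternate with reassigning each observation to the class with the nearest center. Other dissimilarities between intervals generally lead to different optimum centers.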

Full Text