Abstract

‘Symbolic Data Analysis’ (SDA) provides tools for analyzing ‘symbolic’ data, i.e., data matrices X = (x_kj) whose entries x_kj are intervals, sets of categories, or frequency distributions instead of ‘single values’ (a real number, a category) as in the classical case. A large number of empirical algorithms generalize classical data analysis methods (PCA, clustering, factor analysis, etc.) to the ‘symbolic’ case. In this context, various optimization problems are formulated (optimum class centers, optimum clustering, optimum scaling, …). This paper presents some cases related to dissimilarities and class centers where explicit solutions are possible. These results can be integrated into an appropriate k-means clustering algorithm. Moreover, as a first step toward probabilistically based results in SDA, we consider the definition and determination of set-valued class ‘centers’ in SDA and relate them to theorems on the ‘approximation of distributions by sets’.
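To make the notion of an explicit class center concrete, here is a minimal Python sketch for interval-valued data. It is an illustration under stated assumptions, not the paper's method: the dissimilarity is assumed to be the squared Euclidean distance between interval endpoints, and the names `interval_dissimilarity` and `class_center` are hypothetical. Under that assumed dissimilarity, the interval that minimizes the total dissimilarity to a class has the mean lower bound and the mean upper bound in each variable.

```python
# Each row is one object; each entry is an interval (lower, upper).
# ASSUMPTION: dissimilarity = squared distance between endpoints.
# This choice is illustrative and is not taken from the paper.

def interval_dissimilarity(x, y):
    """Sum over variables of squared differences of lower and upper bounds."""
    return sum((a - c) ** 2 + (b - d) ** 2 for (a, b), (c, d) in zip(x, y))


def class_center(rows):
    """Interval-valued center: mean lower and mean upper bound per variable.

    This is the explicit minimizer of the summed dissimilarity above,
    because the criterion separates into ordinary least-squares problems
    for the lower bounds and for the upper bounds.
    """
    n = len(rows)
    p = len(rows[0])
    return [
        (sum(r[j][0] for r in rows) / n, sum(r[j][1] for r in rows) / n)
        for j in range(p)
    ]


def total_dissimilarity(rows, candidate):
    """Criterion value of a candidate center for one class."""
    return sum(interval_dissimilarity(r, candidate) for r in rows)


# Toy class with three objects and two interval-valued variables.
rows = [
    [(1.0, 3.0), (10.0, 12.0)],
    [(2.0, 4.0), (11.0, 15.0)],
    [(3.0, 8.0), (9.0, 12.0)],
]
center = class_center(rows)
print(center)  # [(2.0, 5.0), (10.0, 13.0)]
```

In a k-means style algorithm, a center of this kind would be recomputed for each class after every reassignment step. Other dissimilarities (for example, Hausdorff-type distances) lead to different centers.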
