Characterizing the scale dimension of a high-dimensional classification problem

David J Marchette, Carey E Priebe

doi:10.1016/s0031-3203(02)00042-0

Abstract

Classification of high-dimensional data is inherently difficult. We present an exploratory data analysis methodology for characterizing the scale dimension of a classification problem. The idea is to characterize the support of one distinguished target class as a collection of balls covering the class, with each ball centered at an observation in that class and given the maximal radius for which it contains no observations from the other classes. The scale dimension is defined to be the number of distinct radii (ball sizes) required to cover the class without covering observations from the other classes. A greedy algorithm is used to fit the balls. The balls then provide a description of the support of the target class, with information about the complexity of the classification problem implicit in the number, radii, adjacency and position of the balls. Clustering the balls by radius and pruning the cluster tree yields an estimate of the scale dimension for the problem. We illustrate the methodology with pedagogical simulations and a chemical sensor data analysis application.
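
The covering step described above can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes Euclidean distance, open balls (so the nearest non-target observation lies on the boundary rather than inside), and a standard greedy set-cover rule that repeatedly picks the ball covering the most still-uncovered target observations, with ties broken by lowest index. The function name `greedy_ball_cover` and its arguments are hypothetical.

```python
import numpy as np

def greedy_ball_cover(target, others):
    """Greedily cover target-class points with balls that exclude all other points.

    Assumptions (not from the paper): Euclidean distance, open balls,
    ties broken by lowest index.
    """
    target = np.asarray(target, dtype=float)
    others = np.asarray(others, dtype=float)

    # Radius of the ball at each target point: distance to the nearest
    # non-target observation, i.e. the largest radius that excludes them all.
    d_to = np.linalg.norm(target[:, None, :] - others[None, :, :], axis=2)
    radii = d_to.min(axis=1)

    # covers[i, j] is True when the open ball centered at target i contains target j.
    d_tt = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=2)
    covers = d_tt < radii[:, None]

    uncovered = np.ones(len(target), dtype=bool)
    chosen = []
    while uncovered.any():
        # Number of still-uncovered target points each candidate ball would cover.
        gains = (covers & uncovered).sum(axis=1)
        best = int(gains.argmax())
        if gains[best] == 0:
            # Remaining points coincide with a non-target point (radius 0).
            break
        chosen.append(best)
        uncovered &= ~covers[best]

    chosen = np.array(chosen, dtype=int)
    return chosen, radii[chosen]

# Toy one-dimensional example: a tight group near 0 and an isolated point at 10.
centers, ball_radii = greedy_ball_cover(
    target=[[0.0], [1.0], [2.0], [10.0]],
    others=[[5.0], [12.0]],
)
```

In this toy example the cover uses two balls with clearly different radii, one large ball for the group near 0 and one small ball for the isolated point. Estimating the scale dimension would then involve clustering the selected radii and pruning the cluster tree, a step this sketch does not attempt.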

Full Text