Comparing representative selection strategies for dissimilarity representations

Zane Reynolds, Abraham Kandel, Horst Bunke, Mark Last

doi:10.1002/int.20180

Abstract

Many of the computational intelligence techniques currently in use scale poorly with respect to data type or computational performance, so selecting the right dimensionality reduction technique for the data is essential. By employing a dimensionality reduction technique called representative dissimilarity to create an embedded space, large spaces of complex patterns can be simplified to a fixed-dimensional Euclidean space of points. The only current suggestions as to how the representatives should be selected are principal component analysis, projection pursuit, and factor analysis. Several alternative representative selection strategies are proposed and empirically evaluated on a set of term vectors constructed from HTML documents. The results indicate that a representative dissimilarity representation with at least 50 representatives can achieve a significant increase in classification speed with a minimal sacrifice in accuracy. When the representatives are selected randomly, the time required to create the embedded space is also significantly reduced, again with a small penalty in accuracy. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 1093–1109, 2006.
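The embedding idea described in the abstract can be illustrated with a minimal sketch: each pattern is mapped to a fixed-length vector of its dissimilarities to k chosen representatives. The sketch below is an illustration under stated assumptions, not the authors' implementation. The function names (`euclidean`, `select_random_representatives`, `embed`), the use of Euclidean distance as the dissimilarity measure, and the toy term vectors are all hypothetical. Only the random selection strategy, which the abstract mentions, is shown.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_random_representatives(patterns, k, seed=0):
    """Pick k distinct patterns uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(patterns, k)


def embed(pattern, representatives, dissimilarity=euclidean):
    """Map one pattern to a k-dimensional point: its dissimilarity to each representative."""
    return [dissimilarity(pattern, r) for r in representatives]


# Toy term-frequency vectors standing in for HTML documents (made-up data).
documents = [
    [3, 0, 1, 0],
    [2, 0, 2, 0],
    [0, 4, 0, 1],
    [0, 3, 0, 2],
    [1, 1, 1, 1],
]

# With k = 2 representatives, every document becomes a 2-dimensional point,
# regardless of how long the original term vectors are.
representatives = select_random_representatives(documents, k=2)
embedded = [embed(d, representatives) for d in documents]
```

Any classifier that operates on fixed-length vectors could then be trained on `embedded` instead of the original patterns. The number of representatives k sets the dimensionality of the embedded space, which is the speed versus accuracy trade-off the abstract refers to.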
