Spectral embedding methods have played a very important role in dimensionality reduction and feature generation in machine learning. Supervised spectral embedding methods additionally improve the classification of labeled data, using proximity information that considers both features and class labels. However, these calculate the proximity information by treating all intraclass similarities homogeneously for all classes, and similarly for all interclass samples. In this paper, we propose a very novel and generic method which can treat all the intra- and interclass sample similarities heterogeneously by potentially using a different proximity function for each class and each class pair. To handle the complexity of selecting these functions, we employ evolutionary programming as an automated powerful formula induction engine. In addition, for computational efficiency and expressive power, we use a compact matrix tree representation equipped with a broad set of functions that can build most currently used similarity functions as well as new ones. Model selection is data driven, because the entire model is symbolically instantiated using only problem training data, and no user-selected functions or parameters are required. We perform thorough comparative experimentations with multiple classification datasets and many existing state-of-the-art embedding methods, which show that the proposed algorithm is very competitive in terms of classification accuracy and generalization ability.
Read full abstract