Abstract

We propose unsupervised representation learning and feature extraction from dendrograms. The commonly used Minimax distance measures correspond to building a dendrogram with the single linkage criterion, together with specific forms of a level function and of a distance function defined over it. We therefore extend this method to arbitrary dendrograms. We develop a generalized framework in which different distance measures and representations can be inferred from different types of dendrograms, level functions, and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, so that many numerical machine learning algorithms can employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances separately in solution space and in representation space, in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the combination of different distances and features sequentially, in the spirit of multi-layered architectures, to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies.
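A minimal sketch of how pairwise distances can be read off a dendrogram, assuming (our choice, not necessarily the paper's exact definitions) that the level of a merge is the linkage distance at which two clusters join, and that the inferred distance between two objects is the level of the first cluster containing both. With `criterion="single"` the result should coincide with the Minimax distance, which is computed here independently by a Floyd-Warshall-style (min, max) closure. All function names are hypothetical, and the naive agglomeration is meant only for small inputs.

```python
from itertools import combinations


def linkage_distance(d, a, b, criterion):
    """Distance between clusters a and b (lists of object indices) under a linkage criterion."""
    pair = [d[i][j] for i in a for j in b]
    if criterion == "single":
        return min(pair)
    if criterion == "complete":
        return max(pair)
    if criterion == "average":
        return sum(pair) / len(pair)
    raise ValueError(f"unknown criterion: {criterion}")


def dendrogram_distances(d, criterion="single"):
    """Agglomerate greedily; record, for every pair, the level at which it first shares a cluster.

    Assumption: level of a merge = linkage distance of the two merged clusters.
    """
    n = len(d)
    clusters = [[i] for i in range(n)]
    inferred = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Pick the closest pair of current clusters under the chosen criterion.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage_distance(d, clusters[pq[0]], clusters[pq[1]], criterion),
        )
        level = linkage_distance(d, clusters[p], clusters[q], criterion)
        # Every cross pair between the two merged clusters gets this merge level.
        for i in clusters[p]:
            for j in clusters[q]:
                inferred[i][j] = inferred[j][i] = level
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return inferred


def minimax_distances(d):
    """Minimax distance: minimum over paths of the largest edge on the path."""
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m


if __name__ == "__main__":
    x = [0.0, 1.0, 3.0, 7.0]
    d = [[abs(a - b) for b in x] for a in x]
    print(dendrogram_distances(d, "single") == minimax_distances(d))
```

Swapping `"single"` for `"complete"` or `"average"` yields other dendrogram-based distances from the same routine, which is the kind of generalization the abstract describes.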

Highlights

  • Real-world datasets often consist of complex and a priori unknown patterns and structures, which calls for improving the basic representation

  • We investigate inferring pairwise distances from a dendrogram computed according to an arbitrary criterion, i.e., beyond the single linkage criterion

  • We investigate different feature extraction methods with three different clustering algorithms
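To illustrate how dendrogram-based distances can become features for vector-based clustering algorithms, here is a sketch using classical multidimensional scaling (MDS). This is an assumed choice of embedding, not necessarily the one used in the paper, and `classical_mds` is a hypothetical helper. Distances read off a monotone dendrogram (such as single-linkage/Minimax distances) form an ultrametric, and finite ultrametrics embed isometrically in Euclidean space, so the embedding should reproduce them up to round-off.

```python
import numpy as np


def classical_mds(dist, dim=None, tol=1e-10):
    """Embed a symmetric distance matrix into Euclidean coordinates via classical MDS."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # sort descending
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > tol                        # drop null and (numerically) negative directions
    if dim is not None:
        keep &= np.arange(n) < dim           # optionally truncate to the leading directions
    return vecs[:, keep] * np.sqrt(vals[keep])


if __name__ == "__main__":
    # Hypothetical ultrametric, e.g. single-linkage distances of four objects.
    u = [[0, 1, 2, 4], [1, 0, 2, 4], [2, 2, 0, 4], [4, 4, 4, 0]]
    x = classical_mds(u)
    print(x.shape)
```

The rows of the returned array can then be passed to any vector-based algorithm, for example k-means.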

Summary

Introduction

Real-world datasets often consist of complex and a priori unknown patterns and structures, which calls for improving the basic representation. Kernel methods are commonly used for this purpose (Hofmann et al 2008; Shawe-Taylor and Cristianini 2004). However, their applicability is confined by several limitations (von Luxburg 2007; Nadler and Galun 2007; Chehreghani 2017b). For example, the proper values of the parameters usually lie in a very narrow range, which makes cross-validation critical, even in the presence of labeled data. To overcome such challenges, some graph-based distance measures have been developed in the context of algorithmic graph theory. For one such measure, the distance between two nodes is obtained by summing up the path-specific distances of all paths between them. This distance measure can be obtained by inverting the Laplacian of the base distance matrix, which is related to the Markov diffusion kernel (Fouss et al 2012; Yen et al 2008). It requires O(n³) runtime, where n is the number of objects.
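As a concrete instance of an all-paths distance computed through a Laplacian inverse, the sketch below uses the effective-resistance distance and the commute-time distance (graph volume times resistance), a well-known random-walk distance of this kind. This is an illustrative choice, not necessarily the exact measure the cited works use. Assumptions: the input is a symmetric, nonnegative affinity matrix of a connected graph, and the helper names are hypothetical. The pseudoinverse is the cubic-time step.

```python
import numpy as np


def resistance_distances(w):
    """Effective-resistance distances; assumes w is a symmetric affinity matrix of a connected graph."""
    w = np.asarray(w, dtype=float)
    lap = np.diag(w.sum(axis=1)) - w     # graph Laplacian L = D - W
    lp = np.linalg.pinv(lap)             # Moore-Penrose pseudoinverse: the O(n^3) step
    diag = np.diag(lp)
    # R_ij = L+_ii + L+_jj - 2 L+_ij
    return diag[:, None] + diag[None, :] - 2.0 * lp


def commute_time_distances(w):
    """Commute-time distances: graph volume (sum of degrees) times effective resistance."""
    w = np.asarray(w, dtype=float)
    return w.sum() * resistance_distances(w)


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # three nodes in a line, unit weights
    print(resistance_distances(path)[0, 2])
    print(commute_time_distances(path)[0, 2])
```

On the three-node path graph, the two endpoints are at resistance 2 and commute time 8, matching the expected number of random-walk steps to go from one end to the other and back.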

