Following basic principles of information-theoretic learning, we propose a novel approach to data clustering, referred to as minimal entropy encoding (MEE), which is based on a set of functions (features) that project each input onto a minimum-entropy configuration (code). Inspired by traditional parsimony principles, we seek solutions in reproducing kernel Hilbert spaces and prove that the encoding functions can be expressed as kernel expansions. To avoid trivial solutions, the learned features are required to be as different as possible, which is enforced through a soft constraint on the empirical estimate of the entropy associated with the encoding functions. This leads to an unconstrained optimization problem that can be solved efficiently by conjugate gradient. We also investigate an optimization strategy based on concave-convex algorithms. We study the relationship with maximum margin clustering and show that MEE overcomes some of its critical issues, such as the lack of a multiclass extension and the need to handle a large number of constraints. An extensive evaluation of the proposed approach on several benchmarks shows improvements over state-of-the-art techniques in terms of both accuracy and computational complexity.
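
As a rough illustration of the ingredients named above, the following sketch evaluates an entropy-based clustering objective over kernel expansions. It is not the formulation of the paper: the softmax normalization of the codes, the particular entropy terms, the weights `mu` and `lam`, and all function names are assumptions made only for this example. Each encoding function is written as a kernel expansion over the training points, the per-sample code entropy is pushed down, and the entropy of the average code usage is pushed up as a soft guard against the trivial solution in which every input receives the same code.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def entropy(p, axis=None):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def codes(alpha, K):
    """Soft codes from kernel expansions f_j(x_i) = sum_l alpha[l, j] K[i, l].

    The softmax normalization is an assumption of this sketch.
    """
    F = K @ alpha
    F = F - F.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(F)
    return P / P.sum(axis=1, keepdims=True)


def mee_like_objective(alpha, K, mu=2.0, lam=1e-3):
    """Hypothetical objective: low per-sample entropy, high usage entropy."""
    P = codes(alpha, K)
    per_sample = entropy(P, axis=1).mean()   # each input should get a sharp code
    usage = entropy(P.mean(axis=0))          # codes should not collapse to one
    rkhs_norm = np.trace(alpha.T @ K @ alpha)  # parsimony (regularization) term
    return per_sample - mu * usage + lam * rkhs_norm


# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
K = rbf_kernel(X, gamma=1.0)

# All-zero coefficients give uniform, uninformative codes.
alpha_zero = np.zeros((6, 2))
# Hand-picked coefficients that assign one code to each group.
alpha_good = np.array([[2.0, -2.0]] * 3 + [[-2.0, 2.0]] * 3)

obj_zero = mee_like_objective(alpha_zero, K)
obj_good = mee_like_objective(alpha_good, K)
labels = codes(alpha_good, K).argmax(axis=1)
print(obj_zero, obj_good, labels)
```

Because the objective is an unconstrained function of the coefficient matrix `alpha`, it could in principle be handed to a generic conjugate-gradient routine after flattening `alpha` into a vector; the sketch only compares two hand-picked coefficient matrices to show that a clean two-group encoding scores lower than the uninformative one.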