Abstract

Clustering is an unsupervised process for determining which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They rest on the assumption that a cluster is a subset with the minimal possible degree of “disorder”, and they attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information rests on the assumption that the elements of a cluster are “similar” to each other in accordance with some statistical measure. As a consequence of this principle, distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters is a hard combinatorial problem, which precludes the use of traditional optimization techniques; genetic algorithms are a good alternative for solving it. We benchmark our method against the best theoretical performance, given by the Bayes classifier when data are normally distributed, and against a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform an unsupervised one, since, in the first case, the elements of the classes are known a priori. In what follows, we show that our method’s effectiveness is comparable to that of a supervised one, which clearly exhibits the strength of our approach.
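The core quantity the abstract refers to can be made concrete with a short sketch. The snippet below (an illustration only, not the paper's implementation; the function name `shannon_entropy` and the toy labeling are assumptions) computes the Shannon entropy of a candidate assignment's cluster-size distribution — the kind of value a maximum-entropy criterion would score.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Shannon entropy H(p) = -sum(p * log2(p)) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A candidate assignment of 10 objects to k = 2 clusters (toy data).
labels = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

# Empirical distribution of objects over the clusters.
n = len(labels)
dist = [c / n for c in Counter(labels).values()]

# The maximum entropy principle favors, among the assignments that meet the
# similarity constraints, those whose distribution has the highest entropy;
# two equally sized clusters give H = log2(2) = 1 bit.
print(shannon_entropy(dist))  # → 1.0
```

In the paper's setting this score would be maximized subject to the similarity-based constraints, rather than in isolation.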

Highlights

  • Pattern recognition is a scientific discipline whose methods allow us to describe and classify objects. The descriptive process involves the symbolic representation of these objects through a numerical vector x = [x1, x2, . . ., xn] ∈ ℝⁿ.

  • We propose a numerical clustering method that lies in the group of meta-heuristic clustering methods, where the optimization criterion is based on information theory.

  • We rely on the conclusions of previous analyses [48,49], which showed that a breed of genetic algorithms (GAs), called the eclectic genetic algorithm (EGA), achieves the best relative performance.
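The GA-based search the highlights mention can be sketched as follows. This is a bare-bones (1+1) evolutionary loop over cluster assignments, not the paper's eclectic genetic algorithm (which uses a richer scheme); the fitness, which rewards balanced partitions via entropy, and all names here are illustrative assumptions.

```python
import math
import random

def entropy_fitness(labels, k):
    """Entropy of the cluster-size distribution (higher means more balanced)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in range(k)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutate(labels, k):
    """Reassign one randomly chosen object to a random cluster."""
    child = labels[:]
    child[random.randrange(len(child))] = random.randrange(k)
    return child

# (1+1) evolutionary loop: keep the child whenever it is at least as fit.
random.seed(0)
k, n = 3, 12
best = [random.randrange(k) for _ in range(n)]
for _ in range(300):
    child = mutate(best, k)
    if entropy_fitness(child, k) >= entropy_fitness(best, k):
        best = child
print(round(entropy_fitness(best, k), 3))
```

A full GA would maintain a population with crossover and selection; the hill-climbing loop above only conveys the encoding (one cluster label per object) and the fitness-driven search.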


Introduction

Pattern recognition is a scientific discipline whose methods allow us to describe and classify objects. Given a set of objects (“dataset”) X, there are two approaches to attempt the classification: (1) supervised; and (2) unsupervised. In the unsupervised case, no prior class information is used. Such an approach aims at finding a hypothesis about the structure of X based only on the similarity relationships among its elements. These relationships allow us to divide the space of X into k subsets, called clusters. The process to find the appropriate clusters is typically denoted as a clustering method.
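The similarity-based partitioning described above can be illustrated with a minimal sketch (an illustration of the general idea only, not the paper's method; the Euclidean metric, the fixed centroids, and all names are assumptions):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numerical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_clusters(points, centroids):
    """Assign each object to the cluster whose centroid is nearest."""
    return [min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points]

# Four objects and k = 2 reference points; similar objects land together.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
print(assign_clusters(points, centroids))  # → [0, 0, 1, 1]
```

Real clustering methods must also discover the cluster representatives (or, as in this paper, the assignment itself) rather than assume them.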

Clustering Methods
Determining the Optimal Value of k
Evaluating the Clustering Process
Finding the Best Partition of X
Choosing the Meta-Heuristic
Related Works
Organization of the Paper
Maximum Entropy Principle and Clustering
Solving the Problem through EGA
Encoding a Clustering Solution
Finding The Probability Distribution of the Elements of a Cluster
Determining the Parameters of CBE
Datasets
Synthetic Dataset
Methodology to Gauge the Effectiveness of a Clustering Method
Determining the Effectiveness Using Synthetic Gaussian Datasets
Determining the Effectiveness Using Synthetic Non-Gaussian Datasets
Determining the Statistical Significance of the Effectiveness
Results
Synthetic Gaussian Datasets
Synthetic Non-Gaussian Datasets
Conclusions
Eclectic Genetic Algorithm
Result
Ensuring Normality in an Experimental Distribution