Evaluation of Clustering Results in the Aspect of Information Theory

Peter Grabusts

doi:10.1109/itms51158.2020.9259227

Abstract

Clustering is the process of finding possible groups in a given set, taking into account the features of similarity or difference between the elements of the set. Cluster analysis requires classifying the data in some way or finding regularities in them, so the concept of “regularity” is becoming increasingly important in the context of intelligent data processing systems. It is necessary to determine how the data are related to each other, how similar or different they are, and what measure should be used to compare them. For these purposes, various clustering algorithms can be used, which divide the data into groups according to certain criteria, namely metrics. A metric in this context means the distance between points in a cluster. Existing entropy clustering methods take an information-theoretic approach to the clustering problem. In this work, clustering quality was evaluated using clustering quality criteria and in the context of information theory concepts.
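
The two notions mentioned in the abstract, a distance metric between points and an information-theoretic quantity over a clustering, can be illustrated with a minimal Python sketch. This is not the method evaluated in the paper: the Euclidean distance and the Shannon entropy of the cluster label distribution are assumed here only as common examples, and the function names `euclidean` and `label_entropy` as well as the sample data are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def label_entropy(labels):
    """Shannon entropy (in bits) of the distribution of cluster labels.

    Assumption for illustration: the entropy is taken over the relative
    cluster sizes, not over any quantity defined in the paper.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sample data.
distance = euclidean((0.0, 0.0), (3.0, 4.0))
balanced = label_entropy([0, 0, 1, 1])   # two clusters of equal size
single = label_entropy([0, 0, 0, 0])     # all points in one cluster
```

Under these assumptions, two equally sized clusters give an entropy of one bit, while a clustering that puts every point into a single group gives zero.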

Full Text