The Cluster Analysis in Intellectual Systems

Mikhail Z Zgurovsky, Yuriy P Zaychenko

doi:10.1007/978-3-319-35162-9_7

Abstract

The term cluster analysis (first introduced by Tryon in 1939) covers a set of various unsupervised classification algorithms [1]. A general question asked by researchers in many fields is how to organize observed data into meaningful structures, i.e. to develop a taxonomy. For example, biologists aim to divide animals into different types in order to describe the distinctions between them. According to the modern system accepted in biology, humans belong to the primates, the mammals, the vertebrates, and the animals. Notice that in this classification, the higher the aggregation level, the less similar the members of the corresponding class are. Humans have more in common with other primates (e.g. monkeys) than with “remote” members of the mammals (for example, dogs), and so on. Clustering is applied in a wide variety of areas. For example, in medicine the clustering of diseases, treatments, or symptoms leads to widely used taxonomies. In psychiatry the correct diagnosis of clusters of symptoms, such as paranoia, schizophrenia, etc., is decisive for successful therapy. In archeology researchers use cluster analysis to build taxonomies of stone tools, funeral objects, etc. Broad applications of cluster analysis in market research are well known. In general, whenever “mountains” of information must be sorted into groups suitable for further processing, cluster analysis is very useful and effective. In recent years cluster analysis has been widely used in intelligent data analysis (data mining) as one of the principal methods [1].

The purpose of this chapter is to consider modern methods of cluster analysis: crisp methods (the C-means method, Ward’s method, the nearest-neighbor and farthest-neighbor methods), fuzzy methods, and robust probabilistic and possibilistic clustering methods. In Sect. 7.2 the problem of cluster analysis is formulated, and the main criteria and metrics are considered and discussed. In Sect. 7.3 a classification of cluster analysis methods is presented and several crisp methods are considered, in particular the hard C-means method and Ward’s method. In Sect. 7.4 the fuzzy C-means method is described. In Sect. 7.5 methods for the initial location of cluster centers are considered, namely peak and differential grouping, and their properties are analyzed.

Full Text