Abstract
Unlike other existing algorithms, the preclustering algorithm requires neither a priori information about cluster locations nor additional means of control. The preclustering algorithm is multipurpose and promising for the primary analysis of input data. This article presents the main part of the preclustering algorithm: the modified decision rule. The modification consists in replacing the calculation of the mean distances in a precluster (as in the classical decision rule) with the mean distance from the centre of the precluster to all objects in the chosen precluster. The proposed decision rule determines the centre of the group as a local density maximum of the group of objects (before clustering) or of the precluster (after clustering). The results obtained while testing the decision rule were compared with those obtained using the spherical resolution criteria. The analysis also identified the advantages and disadvantages of the proposed decision rule.
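To make the difference between the two quantities concrete, the following minimal sketch computes both for a small set of points. It is an illustration only, not the authors' implementation: the function names are hypothetical, Euclidean distance is assumed, "mean distances in a precluster" is read here as the mean over all pairs of objects, and the centre is passed in by the caller rather than located as a local density maximum.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of objects (assumed reading
    of the classical rule's 'mean distances in a precluster')."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def mean_distance_to_centre(points, centre):
    """Mean Euclidean distance from a given centre to every object.
    The centre is supplied by the caller (an assumption of this sketch)."""
    if not points:
        raise ValueError("at least one point is required")
    return sum(math.dist(centre, p) for p in points) / len(points)


# Hypothetical precluster: the four corners of a square, centre in the middle.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centre = (1.0, 1.0)
print(mean_pairwise_distance(points))
print(mean_distance_to_centre(points, centre))
```

The centre-based quantity needs one distance per object, whereas the pairwise quantity needs one distance per pair, so the former grows linearly and the latter quadratically with the number of objects.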
Highlights
Cluster analysis, or clustering, is the process of dividing a set of data objects into two or more subsets such that objects in one subset are characterized by a high degree of similarity but differ from objects in other clusters.
Most well-known preclustering algorithms require the user to set certain input parameters; one example is the canopy clustering algorithm presented in [3]. It is often used for the preliminary analysis of input data or as an initial clustering step for the k-means or hierarchical clustering algorithms. The aim of this method is to find the approximate number of clusters, which serves as input for other clustering algorithms.
The preclustering algorithm offers the possibilities of “artificial intelligence”, that is, determining the number of clusters in the input data set without a priori information about the input data and without additional means of checking.
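For readers unfamiliar with the canopy clustering algorithm mentioned in the highlights above, here is a minimal sketch of the commonly described scheme with a loose threshold and a tight threshold. It is not taken from [3]: the names `canopy`, `t1` and `t2` are hypothetical, Euclidean distance is assumed, and the first remaining point is used as each canopy centre for reproducibility (descriptions of the method often pick it at random). The two thresholds are exactly the kind of user-set input parameters that the preclustering algorithm aims to avoid.

```python
import math


def canopy(points, t1, t2):
    """Group points into (possibly overlapping) canopies.

    t1: loose threshold, points closer than t1 to a centre join its canopy.
    t2: tight threshold, points closer than t2 are removed from further
        consideration as centres or members of later canopies.
    Both thresholds are user-supplied and must satisfy t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        centre = remaining.pop(0)  # assumption: take the first remaining point
        members = [centre]
        still_remaining = []
        for p in remaining:
            d = math.dist(centre, p)
            if d < t1:
                members.append(p)
            if d >= t2:
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((centre, members))
    return canopies


# Hypothetical data: two well-separated pairs of points.
data = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
result = canopy(data, t1=3.0, t2=1.0)
print(len(result))  # approximate number of clusters
```

The number of canopies returned can then be passed as the cluster count to an algorithm such as k-means, which is the usage described above.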
Summary
Cluster analysis, or clustering, is the process of dividing a set of data objects into two or more subsets such that objects in one subset (cluster) are characterized by a high degree of similarity but differ from objects in other clusters. The concept and applications of clustering are quite broad and have been described repeatedly in the literature. It therefore seems reasonable to omit the well-known features of cluster analysis, its applications in different fields of science and technology [1], and the description of popular clustering algorithms [2], and to focus on the preclustering algorithm.