Abstract

Unlike other existing algorithms, the preclustering algorithm requires neither a priori information about cluster locations nor additional means of control. The algorithm is general-purpose and promising for the primary analysis of input data. This article presents the main part of the preclustering algorithm: the modified decision rule. The modification replaces the mean distances within a precluster, as calculated in the classical decision rule, with the mean distance from the centre of the precluster to all objects in the chosen precluster. The proposed decision rule determines the centre of a group as the local density maximum of the group of objects (before clustering) or of the precluster (after clustering). The results obtained while testing the decision rule were compared with those obtained using the criteria of spherical resolution. The analysis also identified the advantages and disadvantages of the proposed decision rule.
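
The difference between the two quantities named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the sample points, and the way the centre is chosen (the object with the most neighbours inside a hypothetical `radius`, used here as a simple stand-in for a local density maximum) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Classical quantity: mean distance over all pairs of objects."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def density_centre(points, radius):
    """Assumed centre choice: the object with the most neighbours within `radius`."""
    def neighbour_count(p):
        return sum(1 for q in points if math.dist(p, q) <= radius)
    return max(points, key=neighbour_count)


def mean_distance_from_centre(points, centre):
    """Modified quantity: mean distance from the centre to all objects."""
    return sum(math.dist(centre, p) for p in points) / len(points)


# Hypothetical precluster: one object in the middle, four around it.
precluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

centre = density_centre(precluster, radius=1.0)
pairwise = mean_pairwise_distance(precluster)
from_centre = mean_distance_from_centre(precluster, centre)
```

For this sample the middle object is selected as the centre, and the centre-based mean is smaller than the pairwise mean because distances between peripheral objects no longer contribute.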

Highlights

  • Clustering analysis, or clustering, is the process of dividing a set of data objects into two or more subsets such that objects in one subset are highly similar to each other but differ from objects in other clusters.

  • The best-known preclustering algorithms require the user to set certain input parameters; one example is the canopy clustering algorithm presented in [3]. It is often used for the preliminary analysis of input data or as an initial clustering step for the k-means or hierarchical clustering algorithms. The aim of this method is to find the approximate number of clusters, which then serves as input for other clustering algorithms.

  • The preclustering algorithm offers a capability akin to “artificial intelligence”: it determines the number of clusters in the input data set without a priori information about the input data and without additional means of checking.
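
To make concrete why canopy clustering depends on user-set parameters, here is a minimal sketch of its commonly described form, with a loose threshold `t1` and a tight threshold `t2`. The function name, the sample data, and the deterministic choice of the next centre (the usual description picks one at random) are assumptions for illustration, not details taken from [3].

```python
import math


def canopy(points, t1, t2):
    """Return a list of (centre, members) canopies; requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 (loose) must be larger than t2 (tight)")
    remaining = list(points)
    canopies = []
    while remaining:
        # Assumption: take the first remaining point instead of a random one.
        centre = remaining[0]
        # Every remaining point within the loose threshold joins the canopy.
        members = [p for p in remaining if math.dist(centre, p) <= t1]
        canopies.append((centre, members))
        # Points within the tight threshold can no longer start a canopy.
        remaining = [p for p in remaining if math.dist(centre, p) > t2]
    return canopies


# Hypothetical one-dimensional-looking data with two visible groups.
data = [(0.0, 0.0), (0.5, 0.0), (0.8, 0.0), (10.0, 0.0), (10.5, 0.0)]
result = canopy(data, t1=2.0, t2=1.0)
estimated_clusters = len(result)
```

The number of canopies, and therefore the cluster-count estimate handed to k-means or a hierarchical method, changes with `t1` and `t2`, which is the dependence on user settings that the highlight describes.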

Summary

Introduction

Clustering analysis, or clustering, is the process of dividing a set of data objects into two or more subsets such that objects in one subset (cluster) are highly similar to each other but differ from objects in other clusters. The concept and applications of clustering are broad and have been described repeatedly in the literature. It therefore seems reasonable to omit the well-known features of cluster analysis, its applications in different fields of science and technology [1], and the description of popular clustering algorithms [2], and to focus on the preclustering algorithm.

Analysis of published data and problem statement
Purpose and objectives of the study
Modified decision rule
Choosing the centre of the cluster
Experimental results obtained by the modified decision rule
Conclusions