Clustering is an unsupervised learning technique that groups unlabeled data into clusters. K-means is a popular clustering algorithm that starts from randomly chosen cluster centers. Most clustering techniques pick their initial centers at random, which may compromise the results they produce. The cluster centers are then recalculated iteratively until convergence is achieved, which may further compromise the accuracy of the results. Across these iterations, data elements continue to switch between neighboring clusters, which may bias the clustering results. To address this, a new clustering technique, “Clustering through correlation and congruence modulo (CCCM),” is developed based on the correlation reward (reinforcement factor) and the congruence modulo operator. In the CCCM technique, cluster centroids are selected in the first iteration and then kept fixed. To select them, all the involved variables are arranged in order of importance, calculated using Spearman rank correlation analysis. After the variables are arranged, the congruence modulo operator is used to distribute them into equally sized buckets. The correlation values for the elements placed in these buckets are then recalculated, and the difference is reinforced by rearranging the buckets. Once the initial cluster centers are selected, the data instances are assigned to clusters as in the conventional K-means clustering algorithm. The newly developed algorithm is tested on energy data from 40 countries, with 16 energy parameters per country collected from online sources over a period of ten years. The proposed technique produced more accurate clusters in less time than the K-means algorithm.
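The two building blocks named in the abstract, Spearman rank correlation and the congruence modulo operator, can be illustrated with a minimal sketch. This is not the CCCM implementation: the abstract does not define the importance score or the bucketing rule, so the sketch assumes that a variable's importance is its mean absolute Spearman correlation with the other variables, and that the item at ranked position i goes to bucket i mod k. The function names `spearman_matrix`, `rank_variables`, and `modulo_buckets` are hypothetical, the data are synthetic random numbers, and the bucket rearrangement (reinforcement) step is omitted because the abstract gives no details for it.

```python
import numpy as np

def spearman_matrix(data):
    """Spearman rank correlation between columns (assumes no tied values)."""
    # Double argsort converts each column's values to ordinal ranks 0..n-1.
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    # Pearson correlation of the ranks is the Spearman correlation.
    return np.corrcoef(ranks, rowvar=False)

def rank_variables(data):
    """Order column indices from most to least important (hypothetical score)."""
    rho = spearman_matrix(data)
    n_vars = rho.shape[0]
    # Assumed importance: mean absolute correlation with the other variables.
    # The diagonal entry (always 1) is subtracted before averaging.
    score = (np.abs(rho).sum(axis=1) - 1.0) / (n_vars - 1)
    return np.argsort(-score, kind="stable")

def modulo_buckets(ordered, k):
    """Place the item at ordered position i into bucket i mod k."""
    buckets = [[] for _ in range(k)]
    for position, item in enumerate(ordered):
        buckets[position % k].append(int(item))
    return buckets

# Synthetic stand-in data: 40 rows and 16 variables, random values only.
rng = np.random.default_rng(0)
data = rng.normal(size=(40, 16))
order = rank_variables(data)
buckets = modulo_buckets(order, 4)
```

Under this assumed rule the buckets are exactly equal in size only when the number of items is divisible by k; otherwise their sizes differ by at most one.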