Abstract

As a branch of statistics, cluster analysis has been extensively studied and widely used in many applications, and it has recently become a highly active topic in data mining research. As a data mining function, cluster analysis can be used as a standalone tool to gain insight into the distribution of data and to observe the characteristics of each cluster. Alternatively, it may serve as a preprocessing step for other algorithms. Clustering is known as unsupervised learning because class label information is not available. For this reason, clustering is a form of learning by observation rather than learning by examples, which means that we do not always know whether a clustering partition is good. In general, a clustering partition needs to be evaluated for quality and effectiveness. In this paper, a new evaluation algorithm based on information entropy is proposed to evaluate the quality of clustering. To improve the evaluation results, it takes full advantage of marked (labeled) data and other available information, and it constrains the number of clusters to enhance the credibility of the process. The membership degree is defined according to the distance between tuples. The method extends the original information entropy method to non-convex data sets while still performing well on convex data sets. It is validated by experiments on the R15 and Jain data sets, which demonstrate its effectiveness on different types of data sets.

