Abstract

The limitations of general methods for evaluating clustering will remain difficult to overcome if the verification of clustering validity continues to be based on clustering results and evaluation index values. This study instead focuses on the clustering process to analyze the validity of crisp clustering. First, we define the properties that a valid clustering process must satisfy and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying, by model checking, whether the model of a clustering process satisfies the specified properties. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, when the results are invalid, can also detect the objects that led to the invalidity.

Highlights

  • Clustering analysis attempts to discover distribution patterns of data objects [1]

  • Data objects in crisp clustering are divided into distinct clusters, where each object belongs to exactly one cluster; in fuzzy clustering, the boundaries between clusters are not sharp, so an object may belong to more than one cluster

  • We focus on clustering processes, trying to verify the validity of the clustering results by analyzing whether a clustering process satisfies corresponding properties


Summary

Introduction

Clustering analysis attempts to discover distribution patterns of data objects [1]. We should then confirm whether the clustering reflects the intrinsic character of the data; this evaluation can be called verifying the validity of clustering results. For two-dimensional data objects, visualization methods can intuitively reflect the validity of clustering results. Although appropriate visualization tools can describe the data, it is difficult to discern the cluster distribution in high-dimensional space, so we need a metric to evaluate the validity of the results of clustering analysis. Clustering analysis can be divided into crisp and fuzzy clustering according to the boundaries between clusters. In this study, we discuss a method for verifying the validity of crisp clustering results.
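To make the crisp-clustering condition concrete, the following is a minimal sketch in Python of checking it as an invariant over the states of a clustering process. It is not the authors' model: the study uses program graphs, transition systems, and model checking, whereas this sketch only walks one finite, recorded trace of states. The names `State`, `violating_objects`, and `check_trace`, and the representation of a state as a mapping from cluster labels to sets of object ids, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: a state maps each cluster label to the set of
# object ids currently assigned to that cluster.
State = Dict[str, Set[int]]


def violating_objects(state: State, objects: Set[int]) -> Set[int]:
    """Return the objects that are NOT in exactly one cluster in this state."""
    counts = {obj: 0 for obj in objects}
    for members in state.values():
        for obj in members:
            # Ids outside the known object set are ignored in this sketch.
            if obj in counts:
                counts[obj] += 1
    return {obj for obj, n in counts.items() if n != 1}


def check_trace(
    trace: List[State], objects: Set[int]
) -> Optional[Tuple[int, Set[int]]]:
    """Check the invariant on every state of a recorded trace.

    Returns (step index, offending objects) for the first violating state,
    or None if every state satisfies the invariant.
    """
    for step, state in enumerate(trace):
        bad = violating_objects(state, objects)
        if bad:
            return step, bad
    return None


objects = {1, 2, 3, 4}

# Every object sits in exactly one cluster at each step.
valid_trace = [
    {"A": {1, 2}, "B": {3, 4}},
    {"A": {1}, "B": {2, 3, 4}},
]

# At step 1, object 2 is in both clusters and object 4 is in none.
invalid_trace = [
    {"A": {1, 2}, "B": {3, 4}},
    {"A": {1, 2}, "B": {2, 3}},
]

print(check_trace(valid_trace, objects))
print(check_trace(invalid_trace, objects))
```

Returning the offending objects along with the step mirrors, in a very reduced form, the idea that a process-based check can point to the objects responsible for an invalid result rather than only reporting a score.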

