Abstract

Clustering is one of the most important unsupervised approaches to organizing data in many practical applications. Applying different methods to the same dataset generally produces different clusterings of that dataset; for that reason, there is a need to evaluate the similarity of the clusterings produced by different methods. This can be accomplished with the help of external validity indices. Although many external validity measures exist in the literature, most of them suffer from drawbacks: they rely on very strong null hypotheses, which assume independence of the clusterings, a fixed number of clusters, and fixed cluster sizes. In addition, an acceptable measure should satisfy the properties of a metric; if it does not, its values are not easy to interpret. In this study, a novel Set Matching Index (SMI) is proposed, based on extended derivations of the Jaccard, Dice, and Cosine measures. Three new indices are thus introduced as set matching metrics: the Set Matching Index based on the Extended Jaccard similarity measure (SMI-EJ), the Set Matching Index based on the Extended Dice similarity measure (SMI-ED), and the Set Matching Index based on the Extended Cosine similarity measure (SMI-EC). All of the metric properties are proved in the present research. To assess the performance of the proposed indices, they are tested on ten real-world and synthetic datasets, and the results are compared with those of four popular external indices (NMI, RI, ARI, and Purity) and a newer index (PSI). To further verify the performance of the three proposed similarity indices, additional experiments are conducted: we apply the proposed indices to generated clustering solutions and compare the results. We show that the proposed indices achieve desirable results with respect to the properties expected of an external validity index. Moreover, SMI has a simple formula and yields moderate, appropriate results that can be easily interpreted.
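
For orientation, the classical set-based Jaccard, Dice, and Cosine similarities that the proposed indices extend can be sketched as follows. This is only an illustration of the standard pairwise measures between two clusters treated as sets of object identifiers; it is not the SMI-EJ, SMI-ED, or SMI-EC formulas, which the abstract does not state. The function names and the convention of returning 1.0 for two empty sets are assumptions made for this sketch.

```python
import math


def jaccard(a, b):
    """Classical Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption for this sketch: two empty sets count as identical.
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Classical Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def cosine(a, b):
    """Set-based cosine similarity: |A ∩ B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Two clusters, from two different clusterings, that share objects 3 and 4.
cluster_a = {1, 2, 3, 4}
cluster_b = {3, 4, 5, 6}

print(jaccard(cluster_a, cluster_b))  # 2 shared / 6 in the union
print(dice(cluster_a, cluster_b))     # 2*2 / (4 + 4)
print(cosine(cluster_a, cluster_b))   # 2 / sqrt(4 * 4)
```

All three measures equal 1 for identical clusters and 0 for disjoint ones, and they differ only in how the overlap is normalized.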
