Abstract

In clustering, finding the optimal number of clusters is usually one of the most crucial steps of the whole partitioning process. The decision, however, is not easy to make, and the term "optimal" is itself rather vague. In general, the optimal number of clusters depends directly on the method used to measure similarity and on the parameters selected for the partitioning method. Moreover, certain inherent characteristics of a dataset, such as clusters that overlap or clusters that contain subclusters, often make the task more difficult. To estimate such an optimum in each distinct clustering case, various kinds of indicators have been proposed over the years. In this study, a large number of such indicators, called validity indices and based on the approach of the so-called relative criteria, are examined comparatively. Specifically, a total of 26 validity indices are examined in two separate case studies: one on real-world data and one on artificially generated data. Every index is applied under 9 different clustering methods, which together incorporate 5 different distance metrics. The results are presented in various explanatory forms.

