Abstract
A variety of algorithms have recently emerged in the field of cluster analysis. In principle, an appropriate algorithm can be chosen according to how the data are distributed; in practice, it is difficult for a user to decide a priori which algorithm is the most appropriate for a given dataset. Graph-based algorithms provide good results for this task. However, these algorithms are vulnerable to outliers, and they rely on the limited information carried by the edges of the tree when splitting a dataset. Consequently, several fields have an urgent need for robust and dynamic algorithms that improve and simplify the whole process of data clustering. In this paper, we propose a novel distance-based clustering algorithm called the critical distance clustering algorithm. The algorithm depends on the Euclidean distance between data points and on some basic statistical operations. It is simple, robust, and flexible; it works with quantitative, real-valued data of different dimensions, but not with qualitative or categorical data. In this work, 26 experiments are conducted using different types of real and synthetic datasets taken from different fields. The results show that the new algorithm outperforms some popular clustering algorithms such as MST-based clustering, K-means, and DBSCAN. Moreover, the algorithm produces more reasonable clusters even when the dataset contains outliers, and it does so without requiring any parameters to be specified in advance. It also provides a number of indicators for evaluating the resulting clusters and assessing the validity of the clustering.