Abstract

AbstractIn recent years, graph‐based data clustering algorithms have become popular as they perform connectivity‐based rather than centroid‐based partitioning. Methods related to minimum spanning tree (MST)‐based data clustering are types of graph‐based algorithms that can recognize arbitrary shapes of clusters by eliminating inconsistent edges from MST graphs. In all MST‐based data clustering algorithms, definition of inconsistent edges is the main problem that needs to be addressed. The longest edges in MST graphs are considered as inconsistent edges under ideal conditions. Nevertheless, outliers often exist in real‐world tasks, which makes the longest edges inaccurate cluster separation indicators. In this paper, we propose a new data clustering algorithm using MST and a critical distance method. The proposed algorithm solves the main issue of MST‐based data clustering, namely identifying and removing inconsistent edges to obtain clusters even in the event that the dataset contains some outliers. It begins by constructing the MST over a given weighted graph based on Euclidean distance and then splits up the graph into clusters by eliminating inconsistent edges using critical distance as a threshold. Integration of the advantages of both MST and critical distance methodology to obtain optimal clusters is the main contribution of this work. The conducted experimental analysis and results using different datasets prove that our proposed clustering algorithm yields better overall performance compared with the most common data clustering algorithms. Taking the Liver and Tumor datasets as an example, the proposed algorithm outperforms all other clustering algorithms with clustering accuracy equal to 0.579 and 0.660, respectively.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call