Abstract

Abstract Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity Graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object–object (two-dimensional) space where visualization of clusters similarities is available instead of object-features (multi-dimensional) space. In this paper, sparse ϵ -similarity graphs are constructed and decomposed into strong components using appropriate methods such as Dulmage–Mendelsohn permutation (DMperm) and/or Reverse Cuthill–McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. Parameter ϵ i is estimated locally, at each data point i from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the clusters similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two and three dimensional, low and high-sized synthetic datasets as well as on an astronomical real-dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts with a 0.029 standard deviation of their differences.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call