Abstract

Assessing cluster tendency in big data is challenging, particularly for non-compactly separated (non-CS) datasets with irregular boundaries. This paper introduces a novel Spectral-Based Visual Technique (SVT) to address this limitation. Determining the similarity between data objects is a crucial computation in data clustering, and distance measures such as Euclidean and cosine are widely employed for it. The Visual Assessment of Cluster Tendency (VAT) and cosine-based VAT (cVAT) algorithms determine cluster tendency before clustering, which supports the quality of the resulting clusters. VAT and cVAT use the Euclidean and cosine distance measures, respectively, to identify the similarity features of objects. For big data cluster tendency assessment, an extended concept of VAT, Clustering using Improved Visual Assessment of Tendency (ClusiVAT), derives clusters with scalable time and memory costs. However, it operates efficiently only for compactly separated (CS) datasets. The research gap lies in the need to deliver high-quality big data partitions (clusters) for non-CS datasets. Thus, this paper proposes a spectral-based visual cluster tendency technique to address the challenge of big data clustering for non-CS datasets. Experiments on benchmark datasets illustrate the performance of the proposed technique compared with other techniques.
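To make the VAT and cVAT baselines concrete, the following is a minimal sketch of the standard VAT reordering step applied to a Euclidean or cosine dissimilarity matrix. It is not the SVT proposed in this paper, and the function names `pairwise_dissimilarity` and `vat_order` are hypothetical names chosen for illustration. The sketch assumes a small dataset that fits in memory and, for the cosine case, that no row is the zero vector.

```python
import numpy as np


def pairwise_dissimilarity(X, metric="euclidean"):
    """Return the n x n dissimilarity matrix for the rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        # Assumes no zero-length rows; cosine distance = 1 - cosine similarity.
        unit = X / np.linalg.norm(X, axis=1, keepdims=True)
        return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)
    raise ValueError(f"unsupported metric: {metric}")


def vat_order(D):
    """Reorder a dissimilarity matrix in the VAT style (Prim-like ordering).

    Returns the object ordering and the reordered matrix. Displayed as an
    image, dark blocks along the diagonal suggest cluster structure.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[j]: distance from object j to its nearest already-ordered object.
    min_dist = D[start].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    order = np.array(order)
    return order, D[np.ix_(order, order)]


# Usage: two interleaved, well-separated groups of points.
X = np.array([[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]])
order, reordered = vat_order(pairwise_dissimilarity(X, "euclidean"))
```

Passing `metric="cosine"` instead gives a cVAT-style input to the same reordering. Because the ordering grows by nearest-neighbor links, it works best when clusters are compact and well separated, which is the limitation for non-CS data that the abstract describes.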
