Abstract

Clustering multi-density, large-scale, high-dimensional datasets is a challenging task due to the scalability limits of most clustering algorithms. Modern data collection tools produce large amounts of data, so fast and scalable algorithms are a vital requirement for clustering such data. In this paper, a fast and scalable algorithm called dimension-based partitioning and merging clustering (DPM) is proposed. In DPM, data are partitioned into small dense volumes by processing the value range of each dimension. Next, noise is filtered out using the dimensional densities of the generated partitions. Finally, a merging process is invoked to construct clusters based on the boundary data samples of the partitions. DPM automatically detects the number of clusters using three insensitive tuning parameters, which reduces the burden of using it. Performance evaluation on different datasets shows that the proposed algorithm is very fast and scalable while remaining accurate compared with other large-scale clustering algorithms.
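
The partition, filter, and merge stages described above can be illustrated with a toy grid-based sketch. This is not the DPM algorithm: the abstract does not specify how partitions are formed, how dimensional densities are computed, or how boundary samples drive merging. The function name `grid_cluster` and the parameters `bin_width` and `min_points` are assumptions made for illustration, and they are not the paper's three tuning parameters. The sketch bins each dimension into fixed-width intervals, drops sparse cells as noise, and merges dense cells that touch.

```python
from collections import defaultdict
from itertools import product


def grid_cluster(points, bin_width=1.0, min_points=2):
    """Toy partition/filter/merge clustering (illustrative, not DPM).

    Returns one label per point; -1 marks points treated as noise.
    """
    # 1. Partition: bin the value range of every dimension into
    #    fixed-width intervals, so each point falls into one grid cell.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(v // bin_width) for v in point)
        cells[key].append(idx)

    # 2. Filter: cells holding fewer than min_points samples are noise.
    dense = {k: v for k, v in cells.items() if len(v) >= min_points}

    # 3. Merge: dense cells that are neighbours (sharing a face, edge,
    #    or corner) are joined into one cluster via a flood fill.
    dims = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        while stack:
            cell = stack.pop()
            for i in dense[cell]:
                labels[i] = cluster_id
            for off in offsets:
                neighbour = tuple(c + o for c, o in zip(cell, off))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    sample = [(0.1, 0.2), (0.3, 0.4), (1.1, 0.5), (1.2, 0.6),
              (5.0, 5.0), (5.1, 5.2), (5.3, 5.4), (9.9, 0.0)]
    print(grid_cluster(sample))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

As in the abstract, the number of clusters is not supplied in advance; it falls out of the merge step. Note that this sketch checks 3^d - 1 neighbouring cells per dense cell, so it does not scale to high-dimensional data the way the abstract claims DPM does.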

