Abstract

Clustering is an essential area of data analysis and a key data mining tool for investigating big data. Applying clustering techniques to big data is difficult because of new challenges that arise at this scale. Big data refers to terabytes and petabytes of data, and clustering algorithms generally have high computational costs, so the question is how to deploy clustering methods on big data and obtain results in a reasonable time. For several decades, k-means has remained the most popular clustering algorithm because of its simplicity. Recently, as data volumes have continued to grow into what is called big data, many researchers have studied various clustering algorithms for big data with the aim of high performance. This chapter proposes the distributed-parallel particle swarm optimization with k-means (D-PPSOK) clustering algorithm, applied with data sampling to large data sets, to obtain the clusters in fewer computational steps. According to the experimental results, the proposed D-PPSOK with data sampling on large-scale data sets achieves high performance compared with the existing algorithms.
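
The abstract does not spell out how D-PPSOK works, so the following is only a minimal sequential sketch of the general idea of pairing particle swarm optimization with k-means on a data sample. It is not the chapter's algorithm. It assumes that PSO searches for good starting centroids on a random sample and that k-means then refines them on the full data set. The distributed-parallel part is omitted, and every function name and parameter value here (`pso_centroids`, `pso_kmeans_on_sample`, `w`, `c1`, `c2`, and so on) is hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans(points, centroids, iters=20):
    # Plain Lloyd iterations starting from the given centroids.
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def pso_centroids(sample, k, particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, rng=None):
    # Assumed PSO setup: each particle is a candidate set of k centroids,
    # and its fitness is the SSE on the sample (lower is better).
    rng = rng or random.Random(0)
    dim = len(sample[0])
    pos = [[list(p) for p in rng.sample(sample, k)] for _ in range(particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(particles)]
    best = [[c[:] for c in p] for p in pos]
    best_fit = [sse(sample, p) for p in pos]
    g = min(range(particles), key=lambda i: best_fit[i])
    gbest, gfit = [c[:] for c in best[g]], best_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + personal pull + global pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(sample, pos[i])
            if fit < best_fit[i]:
                best[i], best_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gfit:
                    gbest, gfit = [c[:] for c in pos[i]], fit
    return [tuple(c) for c in gbest]

def pso_kmeans_on_sample(data, k, sample_size, seed=0):
    # Assumption: PSO runs only on a random sample; k-means refines on all data.
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    start = pso_centroids(sample, k, rng=rng)
    return kmeans(data, start)
```

Calling `pso_kmeans_on_sample(data, k=2, sample_size=40)` on a list of equal-length numeric tuples returns `k` centroids as tuples. Running the swarm on a sample keeps the costly fitness evaluations small, which is one plausible reading of how sampling could reduce the number of computational steps.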
