Abstract

The K-means algorithm is one of the most popular clustering algorithms. However, it is sensitive to the initial partition and to circular datasets. To address these problems, this paper introduces CK-means, a clustering algorithm that combines the K-means algorithm with the Canopy algorithm and uses the MapReduce programming model of the Hadoop platform. The experimental results show that the CK-means algorithm has strong advantages for processing large datasets. The theoretical analysis shows that the CK-means algorithm and the traditional algorithm are of the same order of magnitude in complexity. The experimental results on artificial data show that the improved algorithm outperforms the traditional algorithm in terms of acceleration ratio, accuracy and expansion rate. An experiment on real data is performed to obtain appropriate parameters.
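
The abstract does not spell out how the two algorithms are combined, so the following is only a minimal single-machine sketch of one common arrangement, not the paper's MapReduce implementation: a Canopy pass produces the number of clusters and their initial centers, and K-means then refines them. The function names (canopy_centers, kmeans), the thresholds t1 and t2, and the choice of the canopy mean as the seed center are all assumptions made for illustration.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_centers(points, t1, t2):
    """Return one center per canopy (assumed variant: center = canopy mean).

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (removal from the candidate list), with t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Every candidate within t1 of the seed joins this canopy.
        members = [seed] + [p for p in remaining if math.dist(seed, p) < t1]
        centers.append(mean(members))
        # Candidates within t2 of the seed can no longer start a canopy.
        remaining = [p for p in remaining if math.dist(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]

    def nearest(p):
        return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p) for p in points]
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = canopy_centers(data, t1=3.0, t2=2.0)
    final_centers, labels = kmeans(data, seeds)
    print(len(seeds), labels)
```

In this arrangement the Canopy pass replaces the random initialisation that K-means is sensitive to, and the number of clusters follows from t1 and t2 rather than being fixed in advance; suitable thresholds depend on the data.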
