Abstract
Machine learning has become a core part of computing and has improved existing systems across many sectors. Machine learning algorithms use various methods to organize and learn from data, and clustering is one such method. Clustering, as the name suggests, groups the points of a dataset into clusters based on their characteristics. However, clustering can be computationally expensive, and the cost grows as the number of clusters or the number of data points increases. Parallelizing the algorithms is one way to reduce the time taken: the workload is shared across multiple CPUs or across multiple cores of a single CPU. This paper focuses on the performance analysis of parallelized clustering algorithms and other mainstream clustering algorithms. DBSCAN (Density-Based Spatial Clustering of Applications with Noise), K-Means, Mini-Batch K-Means, and Mean Shift were chosen from different types of clustering to diversify the comparison. The paper provides a comparative analysis of the performance of these algorithms in a controlled environment that is either single-threaded or multi-threaded.
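As an illustration of what sharing the workload can mean, the sketch below splits the assignment step of K-Means (finding the nearest centroid for each point) into chunks that are handled by separate workers. It is a minimal example under stated assumptions, not the implementation evaluated in this paper: the function names `nearest_centroid`, `assign_chunk`, and `assign_labels`, and the `n_workers` parameter, are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def assign_chunk(chunk, centroids):
    """Label every point in one chunk; this is the unit of work given to a worker."""
    return [nearest_centroid(point, centroids) for point in chunk]


def assign_labels(points, centroids, n_workers=1):
    """Assign each point to its nearest centroid.

    n_workers=1 runs serially; larger values split the points into
    contiguous chunks and label the chunks concurrently.
    (Hypothetical helper, written only for this illustration.)
    """
    if n_workers <= 1 or not points:
        return assign_chunk(points, centroids)
    chunk_size = -(-len(points) // n_workers)  # ceiling division
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves chunk order, so labels stay aligned with the input points.
        results = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    return [label for part in results for label in part]
```

The serial and chunked paths return the same labels, which is what makes a single-threaded versus multi-threaded timing comparison meaningful. A thread pool is used here only to keep the sketch short: in standard CPython builds the global interpreter lock prevents pure-Python, CPU-bound code from running on several cores at once, so a real speedup would typically need a process pool or a library whose numerical kernels run outside the interpreter.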