Parallel K-Means Algorithm for Shared Memory Multiprocessors

Tayfun Kucukyilmaz

doi:10.4236/jcc.2014.211002

Abstract

Clustering is the task of assigning a set of instances into groups in such a way that the dissimilarity of instances within each group is minimized. Clustering is widely used in areas such as data mining, pattern recognition, machine learning, image processing, and computer vision. K-means is a popular clustering algorithm that iteratively partitions instances into a fixed number of clusters. Although k-means is considered a poor clustering algorithm in terms of result quality, its simplicity, speed in practical applications, and iterative nature led to its selection as one of the top 10 algorithms in data mining [1]. Parallelization of k-means has also been studied over the last two decades. Most of this work concentrates on shared-nothing architectures. With recent advances in GPU technology, implementations of the k-means algorithm on shared memory architectures have started to attract some attention. However, to the best of our knowledge, no in-depth analysis of the performance of k-means on shared memory multiprocessors has been done in the literature. In this work, our aim is to fill this gap by providing a theoretical analysis of the performance of the k-means algorithm and presenting extensive tests on a shared memory architecture.

Highlights

Clustering groups similar objects according to their resemblance.
It is widely used in several areas of computer science, such as data mining, pattern recognition, image processing, and computer vision.
Many parallel implementations of various clustering techniques [2] have been studied in the literature, and the k-means algorithm is one of them.

Summary

Introduction

Clustering groups similar objects according to their resemblance. It is widely used in several areas of computer science, such as data mining, pattern recognition, image processing, and computer vision. The k-means algorithm is one of the most popular techniques for clustering. Its simplicity, speed of convergence, and parallelizable nature make k-means an attractive clustering technique. Parallelization of clustering techniques is receiving increasing attention due to ever-increasing data sizes. Many parallel implementations of various clustering techniques [2] have been studied in the literature, and the k-means algorithm is one of them. Being a very simple iterative clustering algorithm that relies on Lloyd’s iteration, the k-means algorithm has an almost “embarrassingly parallel” nature.
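To make the "embarrassingly parallel" remark concrete, the following is a minimal sketch of one way Lloyd's iteration can be split across workers that share the same data: each worker assigns its own chunk of points to the nearest centroid and accumulates per-cluster sums and counts, and a short serial reduction then produces the new centroids. This is an illustration only, not the implementation evaluated in the paper. The function names (`nearest`, `partial_sums`, `lloyd_step`, `kmeans`), the chunking scheme, and the choice to keep the old centroid for an empty cluster are all assumptions. Threads in CPython are also limited by the global interpreter lock, so the sketch shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))

def partial_sums(chunk, centroids):
    """Assignment step for one chunk: per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        c = nearest(point, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += point[d]
    return sums, counts

def lloyd_step(points, centroids, workers=4):
    """One Lloyd iteration: parallel assignment, then a serial reduction."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda ch: partial_sums(ch, centroids), chunks))
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for c in range(k):
        total = sum(counts[c] for _, counts in partials)
        if total == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[c]))
        else:
            new_centroids.append(
                [sum(sums[c][d] for sums, _ in partials) / total for d in range(dim)])
    return new_centroids

def kmeans(points, centroids, workers=4, max_iter=100):
    """Repeat Lloyd steps until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = lloyd_step(points, centroids, workers)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

The only data the workers need to exchange is the small table of per-cluster sums and counts, which is why the assignment step parallelizes so easily; the reduction over those tables is the one point where results from different workers meet.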

Journal: Journal of Computer and Communications
Publication Date: Jan 1, 2014
Citations: 27
License type: CC BY 4.0
