ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use.

Piotr Kraj, Ashok Sharma, Nikhil Garge, Richard A McIndoe, Robert Podolsky

doi:10.1186/1471-2105-9-200

Abstract

Background: During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique used to organize and analyze microarray data is cluster analysis. While many clustering algorithms have been developed, they all suffer a significant decrease in computational performance as the dataset being analyzed becomes very large. For example, clustering 10000 genes from an experiment containing 200 microarrays can be time consuming and challenging on a desktop PC. One solution to the scalability problem of clustering algorithms is to distribute or parallelize the algorithm across multiple computers.

Results: The software described in this paper is a high performance multithreaded application that implements a parallelized version of the K-means clustering algorithm. Most parallel processing applications are not accessible to the general public and require specialized software libraries (e.g. MPI) and specialized hardware configurations. The parallel nature of the application comes from the use of a web service to perform the distance calculations and cluster assignments. Here we show that our parallel implementation provides significant performance gains over a wide range of datasets using as few as seven nodes. The software was written in C# and was designed in a modular fashion to provide flexibility in both deployment and the user interface.

Conclusion: ParaKMeans was designed to provide the general scientific community with an easy and manageable client-server application that can be installed on a wide variety of Windows operating systems.
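The division of labor described in the Results can be illustrated with a small sketch. It is not the ParaKMeans implementation: the paper's software is written in C# and sends work to a web service on each compute node, whereas this Python sketch uses local threads as stand-ins for nodes. The function names (`assign_chunk`, `split`, `parallel_assign`) and the `n_workers` parameter are hypothetical. It shows only the idea that the distance calculations and cluster assignments for different slices of the data are independent and can be handed to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_chunk(chunk, centroids):
    """Worker step: index of the nearest centroid for every point in one chunk."""
    result = []
    for p in chunk:
        # Squared Euclidean distance from this point to each centroid.
        distances = [sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids]
        result.append(distances.index(min(distances)))
    return result


def split(points, n_chunks):
    """Cut the point list into at most n_chunks contiguous slices."""
    size = max(1, -(-len(points) // n_chunks))  # ceiling division, at least 1
    return [points[i:i + size] for i in range(0, len(points), size)]


def parallel_assign(points, centroids, n_workers=4):
    """Assign every point to its nearest centroid, one chunk per worker.

    Threads stand in for the remote nodes here (an assumption made for
    illustration); pool.map returns results in chunk order, so the merged
    list lines up with the original point order.
    """
    chunks = split(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(assign_chunk, chunks, [centroids] * len(chunks))
        return [a for part in partial for a in part]


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (9.0,), (10.0,), (0.5,)]
    print(parallel_assign(data, [(0.0,), (10.0,)], n_workers=2))
```

In CPython, threads will not speed up this CPU-bound loop; the sketch is meant to show the decomposition, with each chunk being the unit of work that a separate machine could process.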

Highlights

During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data.
Clustering algorithms are used in various fields such as computer graphics, statistics, data mining and biomedical research.
A serial k-means algorithm has a complexity of N*k*R, where k is the number of clusters, R is the number of iterations and N is the number of arrays.

Summary

Introduction

The use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique used to organize and analyze microarray data is cluster analysis. Data clustering is the process of partitioning a dataset into separate groups ("clusters") of "similar" data items based on some distance function; it does not require a priori knowledge of the groups to which data members belong. The application of high-throughput technologies, e.g. microarrays, in biomedical research generates an enormous amount of high-dimensional data. The k-means algorithm, introduced by J.B. MacQueen in 1967, is one of the more popular partitioning methods. This algorithm groups data into k groups of similar means. A serial k-means algorithm has a complexity of N*k*R, where R is the number of iterations and N is the number of arrays.
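The N*k*R cost can be seen in a minimal serial k-means sketch. This is a plain-Python illustration written for this summary, not code from the paper: the function names, the squared Euclidean distance, the random choice of initial centroids and the convergence test are all assumptions. The counter `distance_evaluations` grows by k for each of the N items in each of the R passes, which is where the N*k*R term comes from.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Serial k-means on a list of tuples; assumes max_iterations >= 1.

    Returns (centroids, assignments, iterations, distance_evaluations).
    """
    rng = random.Random(seed)
    # Assumed initialization: k distinct items picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    distance_evaluations = 0

    for iteration in range(1, max_iterations + 1):
        # Assignment step: k distance evaluations for each of the N items.
        new_assignments = []
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            distance_evaluations += k
            new_assignments.append(distances.index(min(distances)))

        # Stop when no item changed cluster in this pass.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    return centroids, assignments, iteration, distance_evaluations


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
    centroids, labels, r, evaluations = kmeans(data, k=2)
    # The evaluation count equals N * k * R by construction.
    print(labels, r, evaluations, len(data) * 2 * r)
```

Because each distance evaluation in the inner loop depends only on one item and the current centroids, the assignment step is the natural part to distribute across machines.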

Journal: BMC Bioinformatics	Publication Date: Apr 16, 2008
Citations: 39	License type: cc-by
