Abstract

Because data analysis with a computational model profoundly influences how final results are interpreted, the intended users of such tools need a basic understanding of the underlying model in order to design experiments well. Despite the wide variety of clustering techniques, cluster analysis has become a generic term in bioinformatics for discovering the natural grouping(s) of a set of patterns, points, or sequences. This paper analyzes k-means through a step-by-step, graphically guided walk-through intended to make the operational mechanism of the algorithm clear. A scatter plot was created from theoretical microarray gene expression data, a simplified view of the data from a typical microarray experiment. We designated the first three data points as the initial centroids and applied the Euclidean distance metric in the k-means algorithm, so that these three points served as the reference points for forming each cluster. Before each new iteration, a test determines whether any centroid has shifted. After convergence, we were able to trace which data points fell in the same cluster. We observed that, as both the dimensionality of the data and the length of the gene list increase for the hybridization matrix of microarray data, the computational implementation of k-means becomes more demanding. Understanding this approach should stimulate new ideas for further development and improvement of the k-means clustering algorithm, especially within the biology of diseases and beyond. Its major advantage is to give improved cluster output for interpreting microarray experimental results and to help bioinformaticians and algorithm experts tune k-means for better clustering run time.
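The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kmeans_first_k`, the `max_iter` cap, the rule for empty clusters, and the toy expression values are all hypothetical. It seeds the centroids with the first k data points, assigns each point to its nearest centroid by Euclidean distance, and stops when no centroid shifts.

```python
import math


def kmeans_first_k(points, k=3, max_iter=100):
    """Lloyd-style k-means seeded with the first k points (illustrative sketch)."""
    # Initial centroids: the first k data points, as in the abstract.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Membership step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Boolean test: did any centroid shift since the last iteration?
        shifted = new_centroids != centroids
        centroids = new_centroids
        if not shifted:
            break
    return centroids, labels


# Hypothetical expression values for six genes under two conditions.
genes = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
         (1.5, 1.2), (5.2, 4.8), (8.8, 1.1)]
centroids, labels = kmeans_first_k(genes, k=3)
print(labels)  # [0, 1, 2, 0, 1, 2]
```

The exact-equality test on centroids mirrors the "has any centroid shifted" check; with real floating-point data, a small tolerance on the shift is a common alternative.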

Highlights

  • Because data analysis with a computational model profoundly influences how final results are interpreted, the intended users of such tools need a basic understanding of the underlying model in order to design experiments well

  • The name “k-means” indicates that the algorithm takes a user-defined number of clusters as input (the k in its name), while “means” refers to the average location of all the members of a particular cluster

  • Its operational mechanism is categorised into several tasks under four headings: Choosing Initial Centers, Computing Cluster Membership, Taking Decisions Based on a Boolean Variable, and Re-arranging Gene-Cluster Assignments, illustrated using theoretical microarray gene expression data
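
The quantities named in these highlights can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the paper: x_i is the expression vector of gene i, mu_j is the center of cluster j, C_j is the set of genes assigned to cluster j, and t counts iterations.

```latex
% Cluster membership: assign gene i to the nearest center (Euclidean distance)
c_i^{(t)} = \arg\min_{j \in \{1,\dots,k\}} \left\lVert x_i - \mu_j^{(t)} \right\rVert_2

% "Means": each center is the average location of its current members
\mu_j^{(t+1)} = \frac{1}{\lvert C_j^{(t)} \rvert} \sum_{i \in C_j^{(t)}} x_i,
\qquad C_j^{(t)} = \{\, i : c_i^{(t)} = j \,\}

% Boolean decision: iterate again only if some center has shifted
\text{changed}^{(t)} = \exists\, j : \mu_j^{(t+1)} \neq \mu_j^{(t)}
```

When the Boolean test is false, the gene-cluster assignment no longer changes and the algorithm has converged.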


Summary

Introduction

Because data analysis with a computational model profoundly influences how final results are interpreted, the intended users of such tools need a basic understanding of the underlying model in order to design experiments well. Malaria, an infectious disease, is caused by the lethal pathogenic protozoan Plasmodium falciparum, which is responsible for major losses and deaths in Sub-Saharan Africa [4,5]. The disease has attracted intense attention from microarray studies, with large data generation efforts [5,6,7,8,9,10,11,12,13]. “Cluster analysis” first appeared as a phrase in 1954, when it was suggested as a tool for understanding anthropological data [16]. Biologists called it “numerical taxonomy”, owing to early research on hierarchical clustering, a technique that helped them build hierarchies of species in order to analyze their relationships systematically and understand their phylogeny. The most popular partitional clustering algorithm, k-means, was proposed by Lloyd [19] and MacQueen [20].
