Abstract

The need for high-quality clustering is pressing in the modern era of information processing. Clustering is one of the most important data analysis methods, and k-means clustering is commonly used in diverse applications. Despite its simplicity and ease of implementation, the k-means algorithm is computationally expensive, and the quality of its clusters depends on the random choice of initial centroids. Various methods have been proposed to improve the accuracy and efficiency of the k-means algorithm. In this paper, we propose a new approach that improves the accuracy of clustering microarray-based gene expression data sets. In the proposed method, the initial centroids are determined using a Red Black Tree, and an improved heuristic approach is used to assign the data items to the nearest centroids. Experimental results show that the proposed algorithm performs better than other existing algorithms.

Highlights

  • Clustering is the process of grouping a set of data items into disjoint clusters so that the similarity between items in the same cluster is high and the similarity between items in different clusters is low [1]. This paper describes an improved method for cluster analysis of microarray gene expression data [2]

  • The proposed algorithm consists of two phases: the first phase determines the initial centroids, and the second phase assigns data points to the appropriate clusters

  • Algorithm 3 describes the first phase of determining the initial centroids


Summary

Introduction

Clustering is the process of grouping a set of data items into disjoint clusters so that the similarity between items in the same cluster is high and the similarity between items in different clusters is low [1]. This paper describes an improved method for cluster analysis of microarray gene expression data [2]. Microarray data mainly consist of a large number of gene sequences measured under multiple conditions. Depending on the application, either genes or samples may need to be clustered. We use a Red Black Tree based approach in the first phase to obtain the initial centroids. The second phase assigns each data item to the nearest centroid. The similarity between each data item and the centroids is determined using the cosine similarity measure.
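The second phase described above can be illustrated with a minimal sketch in Python. It shows only a plain nearest-centroid assignment under cosine similarity; it is not the improved heuristic of the paper, and the Red Black Tree initialization of the first phase is not reproduced. The function names and the toy expression values are hypothetical and chosen purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_to_centroids(items, centroids):
    """For each item, return the index of the centroid with the highest cosine similarity."""
    labels = []
    for item in items:
        sims = [cosine_similarity(item, c) for c in centroids]
        # "Nearest" here means most similar, so take the arg-max of the similarities.
        labels.append(max(range(len(centroids)), key=sims.__getitem__))
    return labels

# Hypothetical expression profiles: one row per gene, one column per condition.
items = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.1],
    [3.0, 1.0, 0.5],
    [6.0, 2.0, 1.0],
]
# Assumed initial centroids; in the paper these would come from the first phase.
centroids = [[1.0, 2.0, 3.0], [3.0, 1.0, 0.5]]

labels = assign_to_centroids(items, centroids)
print(labels)
```

Because cosine similarity compares the direction of two profiles rather than their magnitude, the second and fourth rows are grouped with the centroids they are roughly scaled copies of, even though their absolute values differ.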

