Abstract

An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to the research purpose, and the data are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method, as well as the hierarchic clustering method, the K-mean clustering method, and the self-organized map method, to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).

Highlights

  • Data analysis is a key step in obtaining information from large-scale gene expression data

  • As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. [12]

  • This work proposed the local maximum clustering (LMC) method and evaluated its performance against some typical clustering methods on designed model data sets. The method is unsupervised and can generate hierarchic cluster structures with minimal input. It allows a magnitude property of research interest to be chosen for clustering.

Summary

Introduction

Data analysis is a key step in obtaining information from large-scale gene expression data. Many methods and algorithms have been developed for the analysis of the gene expression matrix [1, 2, 3, 4, 5, 6, 7, 8, 9]. A reasonable hypothesis is that genes with similar expression profiles, that is, genes that are coexpressed, may have something in common in their regulatory mechanisms, that is, they may be coregulated. By clustering together genes with similar expression profiles, one can find groups of potentially coregulated genes and search for putative regulatory signals. The analysis methods can be divided into two categories: supervised and unsupervised methods. Some widely used methods in the unsupervised category are the hierarchic clustering method [6], the K-mean clustering method [10], and the self-organized map clustering method [9, 11].
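To make the idea of clustering around local maxima of a magnitude property more concrete, the following Python sketch shows one possible reading of it. It is not the authors' algorithm, whose details are not given in this summary: the function name `local_maximum_clustering`, the fixed `radius` neighborhood, and the tie-breaking rule are all assumptions made for illustration. Each point is linked to the highest-magnitude point in its neighborhood, and points are then labeled by the local maximum they reach when the links are followed uphill.

```python
import math


def local_maximum_clustering(points, magnitude, radius):
    """Label each point with the index of the local maximum it climbs to.

    points    : list of coordinate tuples
    magnitude : list of floats, one per point (the "magnitude property")
    radius    : neighborhood size (hypothetical parameter, an assumption here)
    """
    n = len(points)

    # Step 1: link every point to the highest-magnitude point within `radius`
    # (itself included). Ties go to the lower index so the result is
    # deterministic and the links can never form a cycle.
    parent = []
    for i in range(n):
        best = i
        for j in range(n):
            if math.dist(points[i], points[j]) <= radius:
                if (magnitude[j], -j) > (magnitude[best], -best):
                    best = j
        parent.append(best)

    # Step 2: follow the links uphill until reaching a point that links to
    # itself; that point is a local maximum and serves as the cluster label.
    labels = []
    for i in range(n):
        k = i
        while parent[k] != k:
            k = parent[k]
        labels.append(k)
    return labels


# Toy 1-D example with two bumps in the magnitude property.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
magnitude = [1.0, 3.0, 2.0, 0.5, 2.0, 4.0, 1.5]
print(local_maximum_clustering(points, magnitude, radius=1.0))
# prints [1, 1, 1, 1, 5, 5, 5]
```

In this toy example the two local maxima (indices 1 and 5) become the cluster centers. The point at index 3 sits in the valley between them with equally high neighbors on both sides, and it joins the left cluster only because of the assumed tie-breaking rule. In this sketch the number of clusters is not an input; it follows from the magnitude values and the chosen neighborhood size.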

