Knowledge-assisted recognition of cluster boundaries in gene expression data

Yoshifumi Okada, Takehiko Sahara, Hikaru Mitsubayashi, Satoru Ohgiya, Tomomasa Nagashima

doi:10.1016/j.artmed.2005.02.007

Abstract

DNA microarray technology has made it possible to determine the expression levels of thousands of genes in parallel under multiple experimental conditions. Genome-wide analyses using DNA microarrays contribute greatly to exploring the dynamic state of genetic networks and, in turn, to the development of new disease diagnosis technologies. An important step in the analysis of gene expression data is to classify genes with similar expression patterns into the same groups. To this end, hierarchical clustering algorithms have been widely used. Their major advantages are that investigators do not need to specify the number of clusters in advance and that results are presented visually in the form of a dendrogram. However, because traditional hierarchical clustering methods group genes solely on the statistical characteristics of the expression data, the resulting clusters are not easy to interpret biologically, and uncovering the hidden biological processes regulated by the members of each cluster is laborious, even for experts. Here, we propose a novel algorithm in which cluster boundaries are determined by referring to functional annotations stored in genome databases. The algorithm first performs hierarchical clustering of gene expression profiles. Then, the cluster boundaries are determined by the Variance Inflation Factor among the Gene Function Vectors, which represent the distribution of gene functions in each cluster. Our algorithm automatically specifies a cutoff that leads to functionally independent agglomerations of genes on the dendrogram derived from similarities among gene expression patterns. Finally, each cluster is annotated according to the dominant gene functions within that cluster. In this paper, we apply our algorithm to two gene expression datasets related to the cell cycle and the cold stress response in the budding yeast Saccharomyces cerevisiae.
We show that the algorithm recognizes cluster boundaries that characterize fundamental biological processes, such as the Early G1, Late G1, S, G2 and M phases of the cell cycle, and that it also provides novel annotation information not obtained by traditional hierarchical clustering methods. In addition, using formal cluster validity indices, we verify the high validity of our algorithm by comparison with other popular clustering algorithms: K-means, self-organizing maps and AutoClass.
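The abstract does not define the Gene Function Vector or say how the Variance Inflation Factor is turned into a cutoff, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a Gene Function Vector is the fraction of a cluster's genes in each functional category, that each gene carries exactly one category label, and that the VIF takes its standard regression form, 1 / (1 - R^2), with the functional categories acting as observations. The category names, the toy counts and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical functional categories; a real analysis would take them
# from the annotations in a genome database.
CATEGORIES = ["cell cycle", "DNA synthesis", "stress response",
              "metabolism", "transport"]

def gene_function_vector(labels, categories=CATEGORIES):
    """Fraction of a cluster's genes in each category (assumed definition)."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def vif(target, others):
    """Standard VIF, 1 / (1 - R^2), from an ordinary least-squares fit of
    `target` on the vectors in `others` plus an intercept."""
    cols = [[1.0] * len(target)] + others
    gram = [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]
    rhs = [sum(x * y for x, y in zip(u, target)) for u in cols]
    beta = solve(gram, rhs)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols))
              for i in range(len(target))]
    mean = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    r2 = 1.0 - ss_res / ss_tot
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

def labels_from_counts(counts):
    """Expand per-category gene counts into a flat list of labels."""
    return [c for c, k in zip(CATEGORIES, counts) for _ in range(k)]

# Toy clusters of 10 genes each (made-up counts, not data from the paper).
cluster_a = gene_function_vector(labels_from_counts([5, 3, 1, 1, 0]))
cluster_b = gene_function_vector(labels_from_counts([4, 4, 1, 1, 0]))
cluster_c = gene_function_vector(labels_from_counts([1, 1, 5, 2, 1]))

print(vif(cluster_a, [cluster_b]))  # about 8.0: A and B overlap functionally
print(vif(cluster_a, [cluster_c]))  # about 1.15: A and C look independent
```

Under these assumptions, a high VIF means that one cluster's functional profile is largely explained by the others, so a boundary between them adds little. A low VIF is consistent with the "functionally independent agglomerations" the abstract describes. Any threshold used to separate the two cases would be a further assumption.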

Journal: Artificial Intelligence in Medicine
Publication Date: Jul 27, 2005
Citations: 35

