Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study

Junbai Wang, Hans Christian Aasheim, Ola Myklebost, Jan Delabie, Erlend Smeland

doi:10.1186/1471-2105-3-36

Open Access

https://doi.org/10.1186/1471-2105-3-36

Journal: BMC Bioinformatics
Publication Date: Jan 1, 2002
Citations: 133
License type: cc-by

Affiliation: Norwegian Cancer Society

Abstract

Background: A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, dimensional reduction of the raw expression data and the feature selection needed for tasks such as classification of disease samples remain a challenge. To solve the problem we propose a two-level analysis. First, a self-organizing map (SOM) is used. The SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes each individual tumor sample in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples.

Results: We tested the two-level analysis on public data from diffuse large B-cell lymphomas. The analysis easily distinguished four major gene expression patterns without the need for supervision: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related pattern. The first three matched the patterns described in the original publication, which used supervised clustering analysis, whereas the fourth was novel.

Conclusions: Our study shows that gene expression patterns can be revealed more easily when a SOM is used as an intermediate step in the analysis of genome-wide gene expression data. The "expression display" by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
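The first level of the analysis described in the abstract can be illustrated with a minimal SOM written in NumPy. This is only a sketch under stated assumptions, not the authors' implementation: the map size, learning-rate and neighborhood schedules, the function names `train_som` and `component_plane`, and the synthetic expression matrix are all chosen here for illustration. Genes are treated as the data vectors and samples as their dimensions, so the component plane for one sample shows how that sample's expression values are distributed over the map units.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns prototypes of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Grid coordinates of every map unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the prototypes from randomly chosen data rows (an illustrative choice).
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood radius shrinks to 0.5
        x = data[rng.integers(0, n_samples)]
        # Best-matching unit: the prototype closest to the drawn data vector.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighborhood around the best-matching unit on the map grid.
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def component_plane(weights, rows, cols, sample_index):
    """Values of one input dimension (here: one sample) laid out on the map grid."""
    return weights[:, sample_index].reshape(rows, cols)

# Synthetic stand-in for an expression matrix: 200 genes (rows) x 6 samples (columns).
rng = np.random.default_rng(1)
up = rng.normal(2.0, 0.3, size=(100, 6))
down = rng.normal(-2.0, 0.3, size=(100, 6))
expr = np.vstack([up, down])

som = train_som(expr, rows=4, cols=4)
plane = component_plane(som, 4, 4, sample_index=0)
```

Plotting `plane` as a heat map gives one "expression display" per sample. The prototypes in `som` are the reduced representation that the second level (hierarchical or K-means clustering) would operate on.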

Highlights

A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression
We propose a two-level analysis [14] for the study of complex gene expression data
The Davies-Bouldin index was used to find the optimum number of clusters (12) in the K-means clustering of the self-organizing map (SOM) [14]
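To make the last highlight concrete, the sketch below shows how a Davies-Bouldin index can be used to pick the number of K-means clusters. It is an illustration under assumptions, not the procedure used in the study: the K-means routine, its farthest-first initialization, the helper names, and the synthetic two-dimensional "prototypes" (three well-separated groups) are all hypothetical choices made here. In the setting described above, `prototypes` would be the SOM prototype vectors, and the candidate with the lowest index would be selected.

```python
import numpy as np

def farthest_first(points, k, start=0):
    """Pick k initial centers: begin at one point, then repeatedly take the farthest point."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen].astype(float)

def assign(points, centers):
    """Index of the nearest center for every point."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers)

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance(center_i, center_j)."""
    ids = np.unique(labels)
    centers = np.array([points[labels == j].mean(axis=0) for j in ids])
    scatter = np.array([
        np.linalg.norm(points[labels == j] - c, axis=1).mean()
        for j, c in zip(ids, centers)
    ])
    sep = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # a cluster is never compared with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()

def choose_k(points, k_values):
    """Score every candidate k; a lower Davies-Bouldin index is better."""
    scores = {k: davies_bouldin(points, kmeans(points, k)) for k in k_values}
    return min(scores, key=scores.get), scores

# Synthetic stand-in for SOM prototype vectors: three tight, well-separated groups.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
prototypes = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in blob_centers])

best, scores = choose_k(prototypes, range(2, 7))
```

On real SOM prototypes the index is usually less clear-cut than on this toy data, so it is common to inspect the whole `scores` curve rather than rely on the minimum alone.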

Summary

Introduction

A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, dimensional reduction of the raw expression data and the feature selection needed for tasks such as classification of disease samples remain a challenge. Hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples. To reliably identify expression patterns associated with tumor type, prognosis or therapy, hundreds of samples need to be studied, and powerful data mining tools are needed. These tools have to be developed so that they reveal a maximum of information to generate new hypotheses [9] with minimal supervision.

Methods

Results

Conclusion