Abstract

Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm EP_GOS_Clust.

Results: We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the proposed iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves.

Conclusion: The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse.

Highlights

  • DNA microarray technology allows for the measurement of genome-wide expression patterns

  • Example classes of clustering algorithms include (a) single and complete link hierarchical clustering [4], (b) K-family of clustering algorithms [5,6,7], (c) optimization-based clustering approaches [8,9,10], (d) fuzzy clustering [11,12], (e) quality threshold clustering (QTClust) [13], (f) artificial neural networks for clustering, such as the self-organizing map (SOM) [14] and a variant that combines the SOM with hierarchical clustering, the self-organizing tree algorithm (SOTA) [15], (g) information-based clustering [16,17], and (h) stochastic approaches such as clustering by simulated annealing [18,19]

  • We perform an initial clustering run on a given data set, as previously described, to reach the optimal number of clusters [10], and then apply a Gene Ontology (GO) analysis of the data to obtain a preliminary assessment of the level of biological coherence


Summary

Introduction

DNA microarray technology allows for the measurement of genome-wide expression patterns. Example classes of clustering algorithms include (a) single and complete link hierarchical clustering [4], (b) K-family of clustering algorithms [5,6,7], (c) optimization-based clustering approaches [8,9,10], (d) fuzzy clustering [11,12], (e) quality threshold clustering (QTClust) [13], (f) artificial neural networks for clustering, such as the self-organizing map (SOM) [14] and a variant that combines the SOM with hierarchical clustering, the self-organizing tree algorithm (SOTA) [15], (g) information-based clustering [16,17], and (h) stochastic approaches such as clustering by simulated annealing [18,19]. Some of these algorithms, while novel in their own right, suffer from certain shortcomings. Whichever clustering algorithm is used, we need an intuitive and relevant tool to first assess the quality and significance of the clusters formed.
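One common way to put a number on the significance of a cluster is a hypergeometric enrichment test for a single GO term: given how many genes in the whole data set carry the annotation, how surprising is the count observed inside the cluster? The sketch below is illustrative only and is not necessarily the assessment procedure used in this study; the function name, its parameters, and the example numbers are all hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       cluster_size, annotated_in_cluster):
    """Upper-tail hypergeometric probability P(X >= annotated_in_cluster).

    Hypothetical helper for illustration: X is the number of annotated
    genes obtained when cluster_size genes are drawn without replacement
    from a population in which annotated_in_population genes carry the
    GO term of interest.
    """
    total_draws = comb(population_size, cluster_size)
    # X cannot exceed the cluster size or the number of annotated genes.
    upper = min(cluster_size, annotated_in_population)
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total_draws


# Made-up numbers: 6000 genes in total, 120 of them annotated with the
# term, and a 40-gene cluster that contains 10 annotated genes.
p = enrichment_p_value(6000, 120, 40, 10)
print(f"enrichment p-value: {p:.3e}")
```

A small p-value indicates that the cluster holds more genes with the annotation than random sampling would be expected to produce. In practice a correction for testing many GO terms at once would also be needed, which this sketch omits.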

