Partial mixture model for tight clustering of gene expression time-course

Yinyin Yuan, Roland Wilson, Chang-Tsun Li

doi:10.1186/1471-2105-9-287

Open Access

https://doi.org/10.1186/1471-2105-9-287

Journal: BMC Bioinformatics
Publication Date: Jun 18, 2008
Citations: 48
License type: cc-by

Affiliation: University of Warwick

Abstract

Background

Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, little work in the literature is dedicated to this area of research. Moreover, while maximum likelihood techniques have been used extensively for model parameter estimation, the minimum distance estimator has been largely ignored.

Results

In this paper we show the inherent robustness of the minimum distance estimator, which makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, we formulate a partial mixture model that naturally incorporates replicate information and allows for scattered genes. We provide experimental results of simulated data fitting, in which the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores highest in a Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms.

Conclusion

For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm demonstrates that the combination of the partial mixture model and the minimum distance estimator is well suited to this field. We show that tight clustering is not only capable of generating a more profound understanding of the dataset under study, in good accordance with established biological knowledge, but also suggests interesting new hypotheses during the interpretation of clustering results. In particular, we provide biological evidence that, contrary to prevailing opinion, scattered genes can be relevant and are interesting subjects for study.
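
The abstract contrasts minimum distance and maximum likelihood estimation without stating which distance is minimized. The sketch below is a minimal illustration, not the authors' implementation. It assumes Scott's L2E (integrated squared error) criterion as the minimum distance and a single weighted Gaussian component w·N(μ, σ²) as the "partial" fit. For fixed (μ, σ), the criterion w²/(2σ√π) − (2w/n)Σφ(xᵢ; μ, σ) is quadratic in w, so w is solved in closed form and (μ, σ) are found by grid search. The data, grid ranges, and the function name `l2e_partial_fit` are hypothetical choices for this example, and the paper's time-course regression model is not reproduced.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def l2e_partial_fit(data, mus, sigmas):
    """Hypothetical helper: fit one weighted component w * N(mu, sigma^2) by
    minimising the L2E (integrated squared error) criterion over a grid."""
    n = len(data)
    best = None
    for mu in mus:
        for sigma in sigmas:
            # a = integral of N(x; mu, sigma^2)^2 dx = 1 / (2 * sigma * sqrt(pi))
            a = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
            # b = average density the component assigns to the observed points
            b = sum(normal_pdf(x, mu, sigma) for x in data) / n
            # L2E criterion (constant integral of f^2 dropped) is a*w^2 - 2*b*w,
            # a convex quadratic in w; its minimiser b/a is clipped to [0, 1]
            w = min(max(b / a, 0.0), 1.0)
            crit = a * w * w - 2.0 * b * w
            if best is None or crit < best[0]:
                best = (crit, mu, sigma, w)
    _, mu_hat, sigma_hat, w_hat = best
    return mu_hat, sigma_hat, w_hat

# Illustrative data: a tight group around 0 plus a few scattered values
cluster = [0.1 * ((i % 7) - 3) for i in range(42)]
scattered = [6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 20.0]
data = cluster + scattered

ml_mean = sum(data) / len(data)               # ML location under one Gaussian
mus = [-2.0 + 0.1 * i for i in range(241)]    # grid from -2 to 22
sigmas = [0.05 * k for k in range(1, 61)]     # grid from 0.05 to 3.0
mu_hat, sigma_hat, w_hat = l2e_partial_fit(data, mus, sigmas)

print(f"ML mean: {ml_mean:.2f}")
print(f"L2E partial fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}, w={w_hat:.2f}")
```

The scattered values add almost nothing to the average density near the tight group, so the L2E fit stays on that group and lets the weight w absorb the unmodelled fraction. The sample mean, which is the maximum likelihood location under a single Gaussian, is instead pulled toward the scattered values.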

Highlights

Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies
We propose within-cluster compactness (WCC), a measure of the functional closeness of genes within a cluster based on the corresponding Gene Ontology (GO) relationship graph
Experiments are conducted on the Yeast Galactose dataset [42], which consists of gene expression measurements of galactose utilization in Saccharomyces cerevisiae
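
The highlight names WCC but does not define it on this page. The sketch below is a purely hypothetical illustration of scoring functional closeness on a GO graph, and it is not the paper's WCC definition. It averages the shortest-path distance between the GO terms annotated to each pair of genes in a cluster. The one-term-per-gene annotation, the undirected treatment of GO edges, the helper names, and all term and gene names are assumptions. Under this proxy, lower values indicate a functionally tighter cluster.

```python
import math
from collections import deque
from itertools import combinations

def undirected_adjacency(edges):
    """Build an undirected adjacency map from (child, parent) GO-style edges."""
    adj = {}
    for child, parent in edges:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj

def path_length(adj, src, dst):
    """Breadth-first search for the number of edges between two terms."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == dst:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return math.inf  # terms in disconnected parts of the graph

def mean_pairwise_distance(genes, annotation, adj):
    """Hypothetical compactness proxy: mean GO-graph distance over gene pairs."""
    pairs = list(combinations(genes, 2))
    total = sum(path_length(adj, annotation[g], annotation[h]) for g, h in pairs)
    return total / len(pairs)

# Toy ontology and annotations (all names hypothetical)
edges = [("A", "root"), ("B", "root"), ("A1", "A"), ("A2", "A"), ("B1", "B")]
adj = undirected_adjacency(edges)
annotation = {"g1": "A1", "g2": "A2", "g3": "B1"}

tight = mean_pairwise_distance(["g1", "g2"], annotation, adj)  # siblings under A
loose = mean_pairwise_distance(["g1", "g3"], annotation, adj)  # meet only at root
print(tight, loose)
```

Genes whose terms share a close common parent score lower than genes whose terms meet only at the root. Any real compactness measure on GO would also need to handle multiple annotations per gene and the directed, typed relations of the ontology.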

Summary

Introduction

Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Based on the assumption that co-expression indicates co-regulation, gene expression data clustering aims to reveal groups of genes with similar functions in biological pathways. This biological rationale is readily supported by both empirical observations and systematic analysis [1]. Various model-based methods have been proposed to meet the data-mining needs of such massive datasets. The basic approach of these methods is to fit a finite mixture model to the observed data, assuming that there is an underlying true model (density), and to systematically find the optimal parameters so that the fitted density is as close as possible to the true density. Current methods can be problematic, however, as they often fail to show how clustering can assist in mining gene expression data.
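
The paragraph above describes fitting a finite mixture model by optimizing its parameters. As a hedged illustration of the conventional maximum likelihood route, and not the authors' method, the sketch below runs expectation-maximization (EM) for a one-dimensional Gaussian mixture. The function name `em_gmm_1d`, the initialization of the means at the data extremes, the fixed iteration count, and the variance floor are assumptions made for this example.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, init_means, n_iter=100, min_var=1e-6):
    """Hypothetical helper: maximum likelihood fit of a 1-D Gaussian mixture by EM."""
    n, k = len(data), len(init_means)
    means = list(init_means)
    overall_mean = sum(data) / n
    # start every component with the overall variance and an equal weight
    variances = [sum((x - overall_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component j
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted updates of weights, means, variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # guard against component collapse
    return weights, means, variances

# Illustrative data: two well-separated groups around 0 and 5
group_a = [0.1 * ((i % 7) - 3) for i in range(28)]
group_b = [5.0 + 0.1 * ((i % 7) - 3) for i in range(28)]
data = group_a + group_b

weights, means, variances = em_gmm_1d(data, [min(data), max(data)])
print("weights:", [round(w, 3) for w in weights])
print("means:", [round(m, 3) for m in means])
```

Every point receives a nonzero responsibility under every component, so in this likelihood-based fit scattered points always influence the component parameters. The partial mixture approach described in the abstract is motivated by the wish to avoid exactly this influence.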

Similar Papers

Partial Mixture Model for Tight Clustering in Exploratory Gene Expression Analysis
Yinyin Yuan ... Chang-Tsun Li
01 Oct 2007

Estimation for u-shaped beta distributions: minimum hellinger distance and related methods
D Richard Cutler ... Adele Cutler
Communications in Statistics - Theory and Methods | VOL. 29
01 Jan 1999

Minimum distance procedures
Rudolf Beran
Handbook of Statistics | VOL. 4, Chapter 30
01 Jan 1984

Asymptotic normality of an adaptive kernel density estimator for finite mixture models
R.J Karunamuni ... J Wu
Statistics and Probability Letters | VOL. 76
24 Aug 2005
