Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes

Ujjwal Maulik, Sanghamitra Bandyopadhyay, Anirban Mukhopadhyay

doi:10.1186/1471-2105-10-27

Open Access

https://doi.org/10.1186/1471-2105-10-27

Abstract

Background

The landscape of biological and biomedical research is changing rapidly with the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. Clustering algorithms have been actively applied to microarray data sets to identify groups of co-expressed genes. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem that simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions carries some information about the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resultant Pareto-optimal set. This approach first identifies the genes that most of the Pareto-optimal solutions assign to a particular cluster with a high membership degree. Using this set of genes as the training set, the remaining genes are classified by a supervised learning algorithm. In this work, a Support Vector Machine (SVM) classifier is used for this purpose.

Results

The performance of the proposed clustering technique has been demonstrated on five publicly available benchmark microarray data sets, viz., Yeast Sporulation, Yeast Cell Cycle, Arabidopsis thaliana, Human Fibroblasts Serum and Rat Central Nervous System. Comparative studies of different SVM kernels and of several widely used microarray clustering techniques are reported. Moreover, statistical significance tests have been carried out to establish that the improvement obtained by the proposed clustering approach is statistically significant. Finally, biological significance tests have been carried out using a web-based gene annotation tool to show that the proposed method is able to produce biologically relevant clusters of co-expressed genes.

Conclusion

The proposed clustering method has been shown to perform better than other well-known clustering algorithms in finding clusters of co-expressed genes. The clusters of genes produced by the proposed technique are also found to be biologically significant, i.e., they consist of genes that belong to the same functional groups. This indicates that the proposed clustering method can be used effectively to identify co-expressed genes in microarray gene expression data.

Supplementary Website

The pre-processed and normalized data sets, the MATLAB code and other related materials are available at the supplementary website.
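The notion of a Pareto-optimal set of clustering solutions can be illustrated with a small sketch. It assumes that each candidate clustering has already been scored on two validity indices and that both indices are to be minimized; the score values and the function names `dominates` and `pareto_front` are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b (all objectives minimized).

    a dominates b when a is no worse than b in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated score vectors."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (index_1, index_2) scores for five candidate clusterings.
scores = [(0.20, 5.0), (0.25, 4.0), (0.30, 4.5), (0.15, 6.0), (0.20, 5.5)]
front = pareto_front(scores)
print(front)
```

No solution in the returned set can be improved on one index without getting worse on the other, which is why the whole set, rather than a single solution, is carried forward to the voting step.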

Highlights

The landscape of biological and biomedical research is changing rapidly with the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points
Thereafter, we examined the use of different kernel functions and compared their performances
A crisp version of Multiobjective GA (MOGA)-Support Vector Machine (SVM) clustering (MOGAcrisp-SVM) is considered for comparison in order to establish the utility of incorporating fuzziness

Summary

Introduction

The landscape of biological and biomedical research is changing rapidly with the invention of microarrays, which enable a simultaneous view of the transcription levels of a huge number of genes across different experimental conditions or time points. This article poses the problem of fuzzy clustering in microarray data as a multiobjective optimization problem that simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. Each of these clustering solutions carries some information about the clustering structure of the input data. Motivated by this fact, a novel fuzzy majority voting approach is proposed to combine the clustering information from all the solutions in the resultant Pareto-optimal set. This approach first identifies the genes that most of the Pareto-optimal solutions assign to a particular cluster with a high membership degree. A fuzzy clustering algorithm produces a K × n membership matrix, where K is the number of clusters and n is the number of genes.
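The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that cluster labels have already been made consistent across the Pareto-optimal solutions, and the thresholds `alpha` (minimum membership degree for a vote to count) and `beta` (minimum fraction of solutions that must agree) are hypothetical parameter names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # K x n membership matrix: rows = clusters, columns = genes


def majority_vote(
    memberships: List[Matrix], alpha: float = 0.5, beta: float = 0.5
) -> Tuple[Dict[int, int], List[int]]:
    """Split genes into a confidently labelled training set and the rest.

    A solution votes for a gene's best cluster only if the membership
    degree is at least alpha. A gene enters the training set when at
    least a fraction beta of all solutions vote for the same cluster.
    """
    n_solutions = len(memberships)
    n_clusters = len(memberships[0])
    n_genes = len(memberships[0][0])
    train: Dict[int, int] = {}
    rest: List[int] = []
    for g in range(n_genes):
        votes: Counter = Counter()
        for u in memberships:
            best = max(range(n_clusters), key=lambda k: u[k][g])
            if u[best][g] >= alpha:
                votes[best] += 1
        if votes:
            label, count = votes.most_common(1)[0]
            if count / n_solutions >= beta:
                train[g] = label
                continue
        rest.append(g)
    return train, rest


# Three hypothetical solutions, K = 2 clusters, n = 4 genes.
u1 = [[0.90, 0.20, 0.55, 0.60], [0.10, 0.80, 0.45, 0.40]]
u2 = [[0.80, 0.10, 0.45, 0.80], [0.20, 0.90, 0.55, 0.20]]
u3 = [[0.95, 0.25, 0.50, 0.40], [0.05, 0.75, 0.50, 0.60]]
train, rest = majority_vote([u1, u2, u3], alpha=0.7, beta=0.6)
print(train, rest)
```

In this toy input, genes 0 and 1 receive consistent high-membership votes and form the training set, while genes 2 and 3 are left for the supervised stage. In the paper that stage is an SVM classifier trained on the expression profiles of the training genes.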


Journal: BMC Bioinformatics	Publication Date: Jan 20, 2009
Citations: 93	License type: CC BY 2.0
