Multi-class clustering of cancer subtypes through SVM based ensemble of pareto-optimal solutions for gene marker identification.

Anirban Mukhopadhyay,Ujjwal Maulik,Sanghamitra Bandyopadhyay

doi:10.1371/journal.pone.0013803


Open Access

https://doi.org/10.1371/journal.pone.0013803


Abstract

With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in the diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting near-Pareto-optimal set contains a number of non-dominated solutions. We propose a novel approach that combines the clustering information held by the non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Statistical significance tests are conducted to establish the statistical superiority of the proposed method. Furthermore, relevant gene markers are identified from the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and may have an important impact on unsupervised cancer classification as well as on gene marker identification for multiple cancer subtypes.
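The idea of keeping only non-dominated clusterings when compactness and separation are optimized together can be illustrated with a small sketch. The objective definitions below (summed sample-to-center distance for compactness, summed pairwise center distance for separation), the toy data, and all function names are assumptions made for illustration; the abstract does not specify the exact formulas, and this is not the authors' implementation.

```python
from math import dist

def assign(points, centers):
    """Label each sample with the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]

def compactness(points, labels, centers):
    """Assumed objective 1: total sample-to-center distance (lower is better)."""
    return sum(dist(p, centers[k]) for p, k in zip(points, labels))

def separation(centers):
    """Assumed objective 2: total pairwise center distance (higher is better)."""
    return sum(dist(centers[i], centers[j])
               for i in range(len(centers))
               for j in range(i + 1, len(centers)))

def dominates(a, b):
    """True if score a = (compactness, separation) is no worse than b in both
    objectives and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def non_dominated(scores):
    """Indices of the solutions that no other solution dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Toy samples (hypothetical) and three candidate sets of cluster centers.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = [
    [(0.0, 0.5), (10.0, 0.5)],   # tight clusters, moderate separation
    [(0.0, 0.5), (5.0, 0.5)],    # worse on both objectives than the first
    [(-5.0, 0.5), (15.0, 0.5)],  # looser clusters, larger separation
]
scores = []
for centers in candidates:
    labels = assign(points, centers)
    scores.append((compactness(points, labels, centers), separation(centers)))

front = non_dominated(scores)
print(front)
```

In this toy case the second candidate is dominated by the first, while the first and third trade compactness against separation and both stay on the front. A set like this is what the SVM-based ensemble step would then combine.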

Highlights

The advent of microarray technology has made it possible to study the expression profiles of a huge number of genes across different experimental conditions or tissue samples simultaneously.
When microarray datasets are organized in a samples-versus-genes fashion, they are very helpful for classifying different tissue types and for identifying genes whose expression levels are good diagnostic indicators.
If the samples come from different subtypes of cancer, the task becomes one of multi-class cancer classification.

Summary

Introduction

The advent of microarray technology has made it possible to study the expression profiles of a huge number of genes across different experimental conditions or tissue samples simultaneously. The clustering solution produced by the proposed MOGASVM clustering technique has been used to identify the gene markers that are most responsible for distinguishing a particular tumor class from the remaining ones.
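One way to rank genes that separate one cluster from the rest is sketched below. The summary does not say which statistic was used, so the signal-to-noise ratio, the toy expression matrix, and the function names here are assumptions chosen for illustration rather than the method reported in the paper.

```python
from statistics import mean, stdev

def snr(class_values, rest_values):
    """Assumed statistic: signal-to-noise ratio (mu1 - mu2) / (sigma1 + sigma2).
    Returns 0.0 when both groups have zero spread, to avoid dividing by zero."""
    denom = stdev(class_values) + stdev(rest_values)
    if denom == 0:
        return 0.0
    return (mean(class_values) - mean(rest_values)) / denom

def rank_marker_genes(expr, labels, target, top=2):
    """Return indices of the `top` genes with the largest |SNR| for `target`.

    expr   : list of samples, each a list of per-gene expression values
    labels : cluster label of each sample (e.g. from a clustering result)
    target : the cluster to contrast against all remaining samples
    """
    scored = []
    for g in range(len(expr[0])):
        inside = [row[g] for row, lab in zip(expr, labels) if lab == target]
        outside = [row[g] for row, lab in zip(expr, labels) if lab != target]
        scored.append((abs(snr(inside, outside)), g))
    return [g for _, g in sorted(scored, reverse=True)[:top]]

# Hypothetical 4 samples x 3 genes; the first two samples form cluster 0.
expr = [
    [5.0, 1.0, 2.0],
    [5.2, 1.1, 3.0],
    [1.0, 1.2, 2.5],
    [1.2, 0.9, 2.6],
]
labels = [0, 0, 1, 1]
markers = rank_marker_genes(expr, labels, target=0, top=2)
print(markers)
```

Each group needs at least two samples for the standard deviation to be defined. Repeating the call with each cluster as `target` gives a one-versus-rest marker list per tumor class.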

Results

Conclusion


Journal: PloS one	Publication Date: Nov 12, 2010
Citations: 74	License type: CC BY 4.0


