Simultaneous feature selection and unsupervised clustering for gene-expression data in multiobjective optimization framework

Abhay Kumar Alok, Asif Ekbal, Sriparna Saha, Neha Kanekar

doi:10.1109/iciinfs.2014.7036594

Abstract

In this paper, the problem of simultaneous feature selection and automatic clustering is formulated as a multiobjective optimization task. Studying the patterns hidden in gene-expression data helps in understanding the functionality of genes. However, because of the large number of genes and the high complexity of biological networks, sophisticated techniques are required to study data consisting of a large number of measurements. Clustering techniques are generally used as a first step in studying gene-expression data, to identify the natural partitioning of the data and to detect interesting patterns in it. Not all features present in a data set are necessarily important for clustering, so selecting an appropriate subset of features is highly relevant from a clustering point of view. A modern simulated-annealing-based multiobjective optimization technique, AMOSA, is used as the underlying optimization method. Features and cluster centers are encoded together in the form of a string. Three optimization criteria are used: (i) a function representing the total compactness of the partitioning based on the Euclidean distance, (ii) a function representing the total compactness of the partitioning based on the point-symmetry-based distance, and (iii) a function counting the number of features. The objective is to optimize the values of the cluster validity indices while increasing the number of features, in order to remove the bias of internal cluster validity indices with respect to dimensionality. The appropriate subset of features, the proper number of clusters, and the proper partitioning are determined using the search capability of AMOSA. To assign cluster labels to all points, a recently introduced distance, the point-symmetry-based distance, is used. The effectiveness of the proposed Fea-GenClustMOO technique is shown for automatically clustering publicly available gene-expression data sets, and the results are compared with existing techniques for gene-expression data clustering.
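
The string encoding and the three criteria can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the flat layout (one 0/1 flag per feature followed by the center coordinates), the function names, the use of plain sums of distances as compactness measures, and the particular form of the point-symmetry-based distance (mean distance from the reflected point to its `knear` nearest data points, multiplied by the Euclidean distance to the center). The validity indices actually optimized by Fea-GenClustMOO may differ, and the AMOSA search itself is not shown.

```python
import math


def decode(string, n_features):
    """Split a flat solution string into (feature mask, list of centers).

    Assumed layout: n_features 0/1 flags, then K * n_features coordinates.
    """
    mask = [bool(flag) for flag in string[:n_features]]
    coords = string[n_features:]
    centers = [coords[i:i + n_features]
               for i in range(0, len(coords), n_features)]
    return mask, centers


def project(point, mask):
    """Keep only the coordinates of the selected features."""
    return [value for value, keep in zip(point, mask) if keep]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ps_distance(point, center, data, knear=2):
    """Point-symmetry-based distance (assumed form, see lead-in)."""
    # Reflect the point through the center, then measure how close the
    # reflection lies to actual data points.
    reflected = [2 * c - p for p, c in zip(point, center)]
    nearest = sorted(euclidean(reflected, q) for q in data)[:knear]
    return (sum(nearest) / len(nearest)) * euclidean(point, center)


def objectives(string, data, n_features):
    """Return (Euclidean compactness, symmetry compactness, feature count)."""
    mask, centers = decode(string, n_features)
    points = [project(p, mask) for p in data]
    centers = [project(c, mask) for c in centers]
    euclid_total = 0.0
    sym_total = 0.0
    for p in points:
        # Cluster labels are assigned with the point-symmetry-based distance.
        best = min(centers, key=lambda c: ps_distance(p, c, points))
        euclid_total += euclidean(p, best)
        sym_total += ps_distance(p, best, points)
    return euclid_total, sym_total, sum(mask)


if __name__ == "__main__":
    # Toy data: two square-shaped groups; the third feature is noise.
    data = [(0, 0, 5), (0, 1, -3), (1, 0, 7), (1, 1, 0),
            (10, 10, 1), (10, 11, -2), (11, 10, 4), (11, 11, 9)]
    solution = [1, 1, 0, 0.5, 0.5, 0.0, 10.5, 10.5, 0.0]
    print(objectives(solution, data, n_features=3))
```

In this sketch the first two values would be minimized and the third maximized by the search, which mirrors the trade-off described above: dropping features tends to improve compactness scores, so the feature count acts as a counterweight.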

Similar Papers

Simultaneous feature selection and semi-supervised clustering for gene-expression data
Abhay Kumar Alok ... Neha Kanekar
01 Feb 2015

Use of Semisupervised Clustering and Feature-Selection Techniques for Identification of Co-expressed Genes.
Sriparna Saha ... Asif Ekbal
IEEE journal of biomedical and health informatics | VOL. 20
20 Jul 2015

Semi-supervised clustering for gene-expression data in multiobjective optimization framework
Abhay Kumar Alok ... Asif Ekbal
International Journal of Machine Learning and Cybernetics | VOL. 8
15 Feb 2015

A symmetry based multiobjective clustering technique for automatic evolution of clusters
Sriparna Saha ... Sanghamitra Bandyopadhyay
Pattern Recognition | VOL. 43
16 Jul 2009
