Abstract

In this paper, the problem of simultaneous feature selection and automatic clustering is formulated as a multi-objective optimization task. Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the high complexity of biological networks, sophisticated techniques are required to study data sets consisting of a large number of measurements. As a first step in studying gene expression data, clustering techniques are generally used to identify natural partitionings and to detect interesting patterns in the data. Not all of the features present in a data set are necessarily important for clustering, so selecting an appropriate subset of features is highly relevant from the clustering point of view. A modern simulated annealing based multi-objective optimization technique, namely AMOSA, is utilized as the underlying optimization methodology. Features and cluster centers are encoded together in the form of a string. Three optimization criteria are utilized: (i) a function representing the total compactness of the partitioning based on the Euclidean distance, (ii) a function representing the total compactness of the partitioning based on the point symmetry based distance, and (iii) a function counting the number of features. The objective is to optimize the values of the cluster validity indices while increasing the number of features, in order to remove the bias of internal cluster validity indices with respect to dimensionality. The appropriate subset of features, the proper number of clusters, and the proper partitioning are determined using the search capability of AMOSA. To assign cluster labels to all points, a recently introduced distance, namely the point symmetry based distance, is utilized. The effectiveness of the proposed Fea-GenClustMOO technique is shown for automatically clustering publicly available gene expression data sets. Results are compared with existing techniques for gene expression data clustering.
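
To make the three criteria concrete, the following is a minimal sketch of how one candidate solution (a feature mask plus a set of cluster centers) might be scored. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact validity indices, so plain sums of distances stand in for the two compactness functions; the point symmetry based distance is assumed to take the commonly used form (mean distance from the reflected point to its `knear` nearest data points, multiplied by the Euclidean distance to the center); and the names `euclidean`, `ps_distance`, `objectives`, `mask`, and `knear` are hypothetical.

```python
import math

def euclidean(a, b, mask):
    """Euclidean distance using only the features whose mask bit is 1."""
    return math.sqrt(sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m))

def ps_distance(x, center, data, mask, knear=2):
    """Point symmetry based distance of x from center (assumed form)."""
    # Reflect x about the center and measure how close the reflection
    # lies to actual data points; a small value indicates symmetry.
    reflected = [2 * c - v for v, c in zip(x, center)]
    nearest = sorted(euclidean(reflected, p, mask) for p in data)[:knear]
    d_sym = sum(nearest) / knear
    return d_sym * euclidean(x, center, mask)

def objectives(data, centers, mask):
    """Return (Euclidean compactness, symmetry compactness, feature count).

    Assumption: simple distance sums replace the paper's validity indices.
    The first two values are to be minimized, the third maximized.
    """
    eucl_total = sym_total = 0.0
    for x in data:
        # Assign each point to the center with the smallest
        # point symmetry based distance.
        d_ps = [ps_distance(x, c, data, mask) for c in centers]
        k = d_ps.index(min(d_ps))
        sym_total += d_ps[k]
        eucl_total += euclidean(x, centers[k], mask)
    return eucl_total, sym_total, sum(mask)

# Toy data: the third feature is noise and is switched off by the mask.
data = [[0, 0, 5], [0, 1, 9], [10, 10, 1], [10, 11, 7]]
centers = [[0, 0.5, 0], [10, 10.5, 0]]
mask = [1, 1, 0]
print(objectives(data, centers, mask))  # (2.0, 1.0, 2)
```

In a multi-objective search such as AMOSA, a tuple like this would be compared across candidate strings by Pareto dominance rather than collapsed into a single score.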
