Evolutionary multi-objective optimization based overlapping subspace clustering

Dipanjyoti Paul, Abhishek Kumar, Jimson Mathew, Sriparna Saha

doi:10.1016/j.patrec.2021.02.012

Abstract

• It reports the first attempt at integrating multi-objective optimization with the generation of overlapped subspace clusters.
• The proposed method defines a new objective function to select the optimum subspace feature set for each cluster.
• To allow objects to overlap, an objective function is defined that optimizes the membership degree.
• The mutation operators used in this approach are modified versions of those used in the ChameleoClust method.
• The proposed subspace clustering method is applied to a real-life application: bi-clustering gene expression data.

Subspace clustering techniques divide a data set into groups, where each group is represented by a subset of features, known as its subspace feature set, that are relevant to the objects in the group. The grouping is performed so that similar objects are placed in the same group and dissimilar objects in different groups. Most previous subspace clustering methods do not allow an object to be part of more than one cluster. However, in many real-life situations, an object may belong to more than one cluster. Moreover, subspace clustering algorithms developed in the past are based on a single-objective optimization framework, which limits them to optimizing only a particular shape or property of the clusters. To this end, we have developed an evolutionary overlapped subspace clustering method using a multi-objective optimization framework. Various mutation operators are used to explore the search space effectively. The objectives optimized simultaneously in this algorithm are the ICC-index, MNR-index and PSM-index. The developed algorithm is evaluated on 7 real-life and 16 synthetic data sets. In addition, to check the efficiency of using multiple objectives, the proposed algorithm is also tested on 3 big data sets. An application of the proposed method to bi-clustering gene expression data is shown. The results obtained on these 23 data sets and 3 big data sets are compared with those of many state-of-the-art algorithms. The comparative study illustrates the efficacy of the proposed algorithm over state-of-the-art algorithms.
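
As a rough illustration of what optimizing several objectives simultaneously means, the sketch below filters a set of candidate solutions down to the non-dominated (Pareto-optimal) ones. It is not the authors' algorithm: the abstract does not define the ICC-, MNR- or PSM-index, so the three-component objective vectors are made-up placeholder values, all objectives are assumed to be minimized, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Placeholder objective vectors for four hypothetical candidate clusterings.
# The three components stand in for three objectives; the values are invented.
candidates = [
    (0.2, 0.5, 0.3),
    (0.3, 0.4, 0.3),
    (0.4, 0.6, 0.5),
    (0.2, 0.5, 0.4),
]

front = pareto_front(candidates)
print(front)
```

Here the first two candidates trade off against each other (one is better on the first objective, the other on the second), so both survive, while the last two are each beaten by the first candidate. A single-objective method would instead have to collapse the three values into one score before comparing.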

Full Text