Improved subspace clustering algorithm using multi-objective framework and subspace optimization

Dipanjyoti Paul,Sriparna Saha,Jimson Mathew

doi:10.1016/j.eswa.2020.113487

Abstract

Subspace clustering technique divides the data set into different groups or clusters where each cluster comprises of objects that share some similar properties. Again, the feature sets or the subspace features that are used to represent clusters are different for different clusters. Moreover, in subspace clustering, the grouping of similar objects and the subspace feature set representing that group are identified simultaneously. In evolutionary-based machine learning problems, two critical measures to determine the quality of the generated clusters are compactness within and separation between the clusters. However, the distance-based separation between two clusters may not be useful in the context of subspace clustering, as the clusters may belong to two different subspaces. Again, in the case of subspace clustering, the selection of relevant subspace features plays a primary role in generating good quality subspace clusters. Therefore, the proposed approach optimizes the subspace features by considering two new objective functions, feature non-redundancy (FNR) and feature per cluster (FPC) represented in the form of PSM-index. Another objective function, intra-cluster compactness (ICC-index), is modified and used to optimize the compactness among objects within the cluster. Finally, an evolutionary-based multi-objective subspace clustering technique is developed in this paper optimizing these validity indices. A new mutation operator, namely duplication and deletion along with the modified version of the exogenous genetic material uptake, are developed to explore the search space effectively. The developed algorithm is tested on sixteen synthetic data sets and seven standard real-life data sets for identifying different subspace clusters. Again, to show the effectiveness of using multiple objectives, the algorithm is also tested on three big data sets and a MNIST data set. Also, an application of the proposed method is shown in bi-clustering the gene expression data. The results obtained by the proposed algorithm are compared against some state-of-the-art methods. Experimentation reveals that the proposed algorithm can take advantage of its evolvable genomic structure and the newly defined objective functions on the multi-objective based framework.

Full Text