A hybrid heuristic for the overlapping cluster editing problem

Luiz Antonio Nogueira Lorena,Guilherme Oliveira Chagas,Rafael Duarte Coelho Dos Santos

doi:10.1016/j.asoc.2019.105482

Luiz Antonio Nogueira Lorena, Guilherme Oliveira Chagas + Show 1 more

Open Access

https://doi.org/10.1016/j.asoc.2019.105482

Copy DOI

Abstract

Clustering problems are applicable to several areas of science. Approaches and algorithms are as varied as the applications. From a graph theory perspective, clustering can be generated by partitioning an input graph into a vertex-disjoint union of cliques (clusters) through addition and deletion of edges. Finding the minimum number of edges additions and deletions required to cluster data that can be represented as graphs is a well-known problem in combinatorial optimization, often referred to as cluster editing problem. However, many real-world clustering applications are characterized by overlapping clusters, that is, clusters that are non-disjoint. In these situations cluster editing cannot be applied to these problems. Literature concerning a relaxation of the cluster editing, where clusters can overlap, is scarce. In this work, we propose the overlapping cluster editing problem, a variation of the cluster editing where the goal is to partition a graph, also by editing edges, into maximal cliques that are not necessarily disjoint. In addition, we also present three slightly different versions of a hybrid heuristic to solve this problem. Each hybrid heuristic is based on coupling two metaheuristicsthat, together, generate a set of clusters; and one of three mixed-integer linear programming models, also introduced in this paper, that uses these clusters as input. The objective with the metaheuristics is to limit the solution exploration space in the models’ resolution, therefore reducing its computational time.Tests results show that the all proposed hybrid heuristic versions are able to generate good-quality overlapping cluster editing solutions. In particular, one version of the hybrid heuristic achieved, at a low computational cost, the best results in 51 of 112 randomly-generated graphs. Although the other two hybrid heuristic versions have harder to solve models, they obtained reasonable results in medium-sized randomly-generated graphs. In addition, the hybrid heuristic achieved good results identifying labeled overlapping clusters in a supervised data set experiment. Furthermore, we also show that, with our new problem definition, clustering a vertex in more than one cluster can reduce the edges editing cost.

Full Text