Abstract

Semi-supervised clustering combines information from unsupervised and supervised learning to overcome the limitations of each. This information is supplied during the clustering process in the form of class labels and the data distribution. In this paper, the problem of semi-supervised clustering is formulated under the framework of multiobjective optimization (MOO). A multiobjective clustering technique is then extended to solve the semi-supervised clustering problem. The newly developed semi-supervised multiobjective clustering algorithm (Semi-GenClustMOO) is used to partition data into an appropriate number of clusters. Four objective functions are optimized: the first three use unsupervised information and the last one uses supervised information. These four objective functions represent, respectively, the total compactness of the partitioning, the total symmetry present in the clusters, the cluster connectedness, and the Adjusted Rand Index. They are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method. Results show that Semi-GenClustMOO can detect both the appropriate number of clusters and the appropriate partitioning for data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Seven artificial and four real-life data sets have been used for evaluation to show the effectiveness of the Semi-GenClustMOO technique. In each case, the class information of 10% randomly chosen data points is assumed to be known.
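As an illustration of the supervised objective, the sketch below computes the Adjusted Rand Index from its standard pair-counting definition, restricted to the data points whose class labels are known. It is a minimal sketch, not the paper's implementation: the abstract does not specify how the index is evaluated, and the names `adjusted_rand_index` and `supervised_objective`, as well as the dictionary of labeled indices, are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same points."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one point: the labelings trivially agree

    # Contingency counts: how many points share each (class, cluster) pair.
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_cells - expected) / (max_index - expected)


def supervised_objective(labeled, assignment):
    """Hypothetical helper: ARI over the labeled subset only.

    labeled    -- dict mapping point index -> known class label
    assignment -- cluster id for every point, indexed by point index
    """
    idx = sorted(labeled)
    return adjusted_rand_index(
        [labeled[i] for i in idx],
        [assignment[i] for i in idx],
    )
```

The index does not depend on how clusters are numbered, so a candidate partitioning is rewarded whenever it groups the labeled points the same way their classes do. In a multiobjective setting this value would be maximized alongside the three unsupervised objectives.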
