An Ensemble of Locally Reliable Cluster Solutions

Huan Niu, Mohammad Reza Mahmoudi, Amin Beheshti, Hamid Parvin, Nasim Khozouie, Hamid Alinejad-Rokny

doi:10.3390/app10051891

Abstract

A clustering ensemble is an approach in which a number of (usually weak) base clusterings are produced and their consensus clustering is used as the final clustering. Given that democratic decisions tend to be better than dictatorial ones, it seems clear that ensemble (here, clustering ensemble) decisions are better than simple model (here, clustering) decisions. However, it is not guaranteed that every ensemble is better than a simple model. An ensemble is considered better if its members are valid or of high quality and if they participate in constructing the consensus clustering in proportion to their quality. In this paper, we propose a clustering ensemble framework that uses a simple clustering algorithm based on the kmedoids clustering algorithm. Our simple clustering algorithm guarantees that the discovered clusters are valid. In addition, our clustering ensemble framework is guaranteed to use a mechanism that exploits each discovered cluster according to its quality. To implement this mechanism, an auxiliary ensemble named the reference set is created by running several kmeans clustering algorithms.

Highlights

Clustering, as a task in statistics, pattern detection, data mining, and machine learning, is considered very important [1,2,3,4,5].
The cluster-based similarity partitioning algorithm (CSPA), hyper-graph partitioning algorithm (HGPA), and meta clustering algorithm (MCLA) are applied to the output ensemble of Algorithm 1.
The proposed ensemble clustering method has the advantages of the kmedoids clustering algorithm, including its high speed.

Summary

Introduction

Clustering, as a task in statistics, pattern detection, data mining, and machine learning, is considered very important [1,2,3,4,5]. Its purpose is to assign a set of data points to several groups in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to objects in other clusters, and differ as much as possible from the objects in those other clusters [6]. The definition of clustering often assumes that each data object belongs to at least one cluster (i.e., all of the data must be clustered, not just part of it) and to at most one cluster (i.e., clusters must not overlap).
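Under the assumption that every object belongs to exactly one cluster, a clustering can be stored as one label per object, and several such labelings can be compared pairwise. The following is a minimal sketch, not the authors' algorithm: it builds a co-association matrix, the pairwise similarity that CSPA-style consensus functions are commonly described as starting from. The function name `co_association` and the three toy labelings are hypothetical and chosen only for illustration.

```python
def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    base clusterings that put objects i and j in the same cluster."""
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("every labeling must cover the same objects")

    # Count, for each pair, how many base clusterings agree on it.
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    # Normalise counts by the ensemble size.
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of five objects.
# Label values are arbitrary ids; only equality within a row matters.
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
sim = co_association(base)
```

Here objects 0 and 1 are grouped together by all three base clusterings, so `sim[0][1]` is 1.0, while objects 0 and 3 are never grouped together, so `sim[0][3]` is 0.0. A consensus function would then partition the objects using this similarity; that step is omitted here.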

Methods

Results

Conclusion