Abstract

Among multi-source data clustering tasks, a frequently encountered case is one in which only one of the datasets is available, for privacy or other reasons. The available dataset is called the local dataset, and the others are called external datasets. The horizontal collaborative fuzzy clustering (HCFC) model is a typical approach to such tasks. In HCFC, each external dataset is used not directly but through the knowledge mined from it. This knowledge, expressed as a knowledge partition matrix, is fused into the clustering process of the local dataset. A review of the existing HCFC models reveals three issues that need improvement. First, the existing models quantify the collaboration contribution of each piece of external knowledge with a single dataset-level hyperparameter and do not distinguish between the contributions of individual objects within the same external dataset, which may lead to counterintuitive clustering results. To address this issue, this paper proposes an enhanced HCFC (EHCFC) algorithm that extends collaboration from the dataset level to the object level and assigns each object a weight based on the amount of information it provides. EHCFC thus allows more flexible collaboration and yields more intuitive clustering results. Second, the collaboration mechanisms of the existing HCFC models require the partition matrices of the external datasets and the local dataset to have the same dimensionality, which prevents the HCFC algorithms from working in many real situations. To address this limitation, a knowledge adaption mechanism based on relative entropy and spectral clustering is proposed, resulting in a further refined algorithm, EHCFC-KA (EHCFC with knowledge adaption). This mechanism allows both the HCFC algorithms and the EHCFC algorithm to be applied in more scenarios. Finally, to evaluate collaborative clustering performance, we define two indexes that measure consistency, that is, the consistency of the clustering result with the external knowledge. Experiments on synthetic datasets and UCI public datasets demonstrate that the proposed EHCFC and EHCFC-KA algorithms outperform the existing HCFC algorithms and achieve significantly better, more intuitive collaborative clustering performance.
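
The idea of weighting objects by the amount of information they provide can be illustrated with a small sketch. The abstract does not give the weighting formula, so the code below makes an assumption: an object's weight is one minus the normalised Shannon entropy of its membership row in the external knowledge partition matrix, so that a crisp membership counts for more than a near-uniform one. The function name object_weights, the row-per-object layout, and the example matrix are hypothetical and are not taken from the paper.

```python
import math


def object_weights(partition):
    """Return one weight in [0, 1] per object (hypothetical helper).

    `partition` is assumed to be a list of rows, one row per object,
    where each row holds fuzzy memberships over the clusters and sums to 1.
    ASSUMPTION: weight = 1 - H(row) / log(c); this is an illustrative
    choice, not the formula used by EHCFC.
    """
    weights = []
    for row in partition:
        c = len(row)
        if c < 2:
            # A single cluster carries no uncertainty.
            weights.append(1.0)
            continue
        # Shannon entropy of the row; terms with zero membership contribute 0.
        entropy = -sum(u * math.log(u) for u in row if u > 0)
        # Divide by the maximum entropy log(c) to normalise into [0, 1].
        weights.append(1.0 - entropy / math.log(c))
    return weights


# Made-up external knowledge for three objects and three clusters.
external = [
    [0.98, 0.01, 0.01],  # nearly crisp membership
    [0.34, 0.33, 0.33],  # nearly uniform membership
    [0.10, 0.80, 0.10],  # in between
]
weights = object_weights(external)
```

Under this assumption the first object receives the largest weight and the second the smallest, which matches the intuition that an external object with an ambiguous membership should influence the local clustering less.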
