A novel intelligent clustering approach for high dimensional data in a big data environment

Qian Tao, Wenyuan Chen, Haojie Lin, Weiqiang Lin, Zhenyu Wang, Chunqin Gu

doi:10.1109/fskd.2017.8393016

Abstract

Complex, large-scale applications in a big data environment produce large amounts of high dimensional multi-view data. Traditional clustering algorithms, however, treat all features as equally relevant, which makes them poorly suited to such data. To address this challenge, we propose the intelligent weighting k-means clustering approach (IWKM), which combines swarm intelligence with the k-means algorithm. Because k-means is sensitive to its initial cluster centers, IWKM uses the global search capability of swarm intelligence to find the initial cluster centers together with the view weights and feature weights. The weighting k-means approach then assigns objects to clusters using these initial cluster centers and weights. IWKM has the following characteristics: in the clustering model, every view and every feature has its own weight, and these weights affect the cluster to which an object is assigned; the view and feature weights are calculated by a swarm intelligence algorithm; and the degree of coupling between clusters is introduced into the clustering model to enlarge the dissimilarity between clusters. Comprehensive experiments are conducted on three high dimensional multi-view data sets from a machine learning repository. The results are compared with those of five other state-of-the-art clustering algorithms using the Rand index, the Jaccard coefficient, and Folkes Russel as evaluation metrics. The experiments show that the new approach generates better clustering results on high dimensional multi-view data in a big data environment.
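To make the role of the weights concrete, the following is a minimal sketch of a weighted k-means step, not the authors' implementation. It assumes that the initial centers, view weights, and feature weights are already given (in IWKM they would come from the swarm intelligence search, which is not shown), and that the two kinds of weights combine multiplicatively into a single per-feature weight. That combination rule, the omission of the cluster-coupling term, and all names (`weighted_assign`, `update_centers`, `weighted_kmeans`, `view_of_feature`) are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np


def weighted_assign(X, centers, feature_w, view_w, view_of_feature):
    """Assign each object to its nearest center under a weighted squared distance.

    Assumption: the effective weight of feature j is
    feature_w[j] * view_w[view_of_feature[j]].
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    feature_w = np.asarray(feature_w, dtype=float)
    view_w = np.asarray(view_w, dtype=float)
    view_of_feature = np.asarray(view_of_feature, dtype=int)

    # One combined weight per feature.
    w = feature_w * view_w[view_of_feature]
    # diff has shape (n_objects, n_clusters, n_features).
    diff = X[:, None, :] - centers[None, :, :]
    dist = (w * diff ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def update_centers(X, labels, centers):
    """Move each center to the mean of its members; empty clusters stay put."""
    X = np.asarray(X, dtype=float)
    new_centers = np.array(centers, dtype=float)
    for k in range(len(new_centers)):
        members = X[labels == k]
        if len(members) > 0:
            new_centers[k] = members.mean(axis=0)
    return new_centers


def weighted_kmeans(X, centers, feature_w, view_w, view_of_feature, n_iter=20):
    """Alternate assignment and center updates with fixed weights."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = weighted_assign(X, centers, feature_w, view_w, view_of_feature)
        centers = update_centers(X, labels, centers)
    return labels, centers


# Toy data: feature 0 separates two groups, feature 1 is large-scale noise.
X = np.array([[0.0, 10.0], [0.1, -10.0], [5.0, 10.0], [5.1, -10.0]])
init_centers = [[0.0, 0.0], [5.0, 0.0]]
labels, centers = weighted_kmeans(
    X,
    init_centers,
    feature_w=[1.0, 0.0],    # hypothetical weights: ignore the noisy feature
    view_w=[1.0, 1.0],       # two views, one feature each
    view_of_feature=[0, 1],
)
print(labels)
```

With the second feature weighted to zero, the assignment depends only on the first feature, which is the effect the abstract describes when it says the weights influence the cluster an object is assigned to. Because the weights are held fixed during the iterations, the plain mean remains the optimal center for each cluster.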

Full Text