Abstract

Advances to traditional clustering algorithms address problems such as the curse of dimensionality and the sparsity of data with many attributes. The traditional H-K clustering algorithm resolves the randomness and a priori selection of the initial centers in the K-means clustering algorithm. When applied to high-dimensional data, however, it suffers from the dimensional disaster problem because of its high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve performance on high-dimensional datasets in different respects and to different extents, but each improves performance from a single perspective only. The objective of the proposed model is to improve the performance of traditional H-K clustering and to overcome its limitations on high-dimensional data, namely high computational complexity and poor accuracy, by combining three clustering approaches: subspace clustering, ensemble clustering and H-K clustering.

Highlights

  • As an important technique in data mining, clustering analysis groups observations with similar properties. It can be regarded as unsupervised classification [1] and helps to extract relevant information from high-dimensional data

  • The proposed model combines three techniques, subspace clustering, H-K clustering and ensemble clustering, and their advantages to improve clustering results on high-dimensional data while overcoming the limitations of the H-K clustering algorithm on such data

  • Much work has been done in the area of clustering. Based on the research to date, approaches to clustering high-dimensional datasets fall into four general categories: (1) dimension reduction, (2) subspace clustering, (3) ensemble clustering and (4) H-K clustering [1] [11] [14]


Summary

INTRODUCTION

As an important technique in data mining, clustering analysis groups observations with similar properties. It can be regarded as unsupervised classification [1] and helps to extract relevant information from high-dimensional data. Ensemble clustering, ‘the knowledge reuse framework’ first proposed by Strehl and Ghosh [11], uses two mechanisms: a generation mechanism, which produces a set of clusterings using different criteria, and a consensus function, which chooses the most appropriate solution from that set. It overcomes the challenges posed by high-dimensional data and gives high performance on real-world datasets in applications such as Internet applications and medical diagnostics [2,3,12,13,19,20]. The proposed model combines three techniques, subspace clustering, H-K clustering and ensemble clustering, and their advantages to improve clustering results on high-dimensional data while overcoming the limitations of the H-K clustering algorithm on such data (high computational complexity and poor accuracy).
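The interplay of the three techniques can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random choice of feature subsets as "subspaces", centroid-linkage agglomeration for the hierarchical step, and the co-association threshold used as the consensus function are all assumptions made for illustration. Each ensemble member runs H-K clustering (hierarchical clustering supplies the initial centers for K-means) on a low-dimensional projection, and the consensus step groups points that most members place together.

```python
import random
from itertools import combinations

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def hierarchical_centers(points, k):
    """Merge the two groups with the closest centroids until k groups remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda ij: sq_dist(mean(groups[ij[0]]), mean(groups[ij[1]])))
        merged = groups.pop(j)          # j > i, so index i stays valid
        groups[i].extend(merged)
    return [mean(g) for g in groups]

def kmeans(points, centers, max_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean(members)
    return labels

def hk_cluster(points, k):
    """H-K clustering: hierarchical step picks the initial K-means centers."""
    return kmeans(points, hierarchical_centers(points, k))

def generate_partitions(data, k, n_members, subspace_size, seed=0):
    """Generation mechanism (assumed): H-K clustering on random feature subsets."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_members):
        dims = rng.sample(range(len(data[0])), subspace_size)
        projected = [[row[d] for d in dims] for row in data]
        partitions.append(hk_cluster(projected, k))
    return partitions

def consensus(partitions, threshold=0.5):
    """Consensus function (assumed): link points co-clustered by more than
    `threshold` of the members, then take connected components."""
    n, m = len(partitions[0]), len(partitions)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1:
                    agree = sum(p[i] == p[j] for p in partitions) / m
                    if agree > threshold:
                        labels[j] = current
                        stack.append(j)
        current += 1
    return labels

# Toy data (hypothetical): two well-separated groups in four dimensions.
data = [[0.0, 0.1, 0.2, 0.1], [0.2, 0.0, 0.1, 0.3], [0.1, 0.3, 0.0, 0.2],
        [10.0, 10.1, 9.9, 10.2], [10.2, 9.8, 10.1, 10.0], [9.9, 10.0, 10.3, 10.1]]
partitions = generate_partitions(data, k=2, n_members=5, subspace_size=2)
final_labels = consensus(partitions)
print(final_labels)  # [0, 0, 0, 1, 1, 1]
```

Because each member works on only `subspace_size` features, the quadratic-cost hierarchical step operates on short vectors, which is one plausible way the combination could reduce the cost of H-K clustering on high-dimensional data. On larger datasets the hierarchical step would normally be run on a sample rather than on every point.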

MOTIVATION
RELATED WORK
Dimension reduction
Subspace clustering
Ensemble Clustering
H-K clustering
Method
Findings
CONCLUSION